OSDN Git Service

uclinux-h8/linux.git
6 years ago IB/mthca: Fix gup usage in mthca_map_user_db()
Davidlohr Bueso [Thu, 25 Jan 2018 19:27:27 +0000 (11:27 -0800)]
IB/mthca: Fix gup usage in mthca_map_user_db()

get_user_pages() must be called with mmap_sem held, but currently
it is not; it is called under the user db_table->mutex instead.
To fix this, convert gup to the fast alternative, which avoids
taking mmap_sem where possible. This is also safe wrt the mutex,
as the other callers that take the lock (unmap and alloc_db) are
not called under mmap_sem, so no lock-order deadlock is possible.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years ago RDMA/qedr: lower print level of flushed CQEs
Kalderon, Michal [Thu, 25 Jan 2018 11:23:20 +0000 (13:23 +0200)]
RDMA/qedr: lower print level of flushed CQEs

There are races where we can still get flushed CQEs before the QP
enters the error state. This is not an error and should be treated as
debug information.

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years ago RDMA/uverbs: Use an unambiguous errno for method not supported
Jason Gunthorpe [Thu, 25 Jan 2018 02:58:34 +0000 (19:58 -0700)]
RDMA/uverbs: Use an unambiguous errno for method not supported

Returning EOPNOTSUPP is problematic because it can also be
returned by the method function, and we use it in quite a few
places in drivers these days.

Instead, dedicate EPROTONOSUPPORT to indicate that the ioctl framework
is enabled but the requested object and method are not supported by
the kernel. No other case will return this code, and it lets userspace
know to fall back to write().

grep says we do not use it today in the drivers/infiniband subsystem.
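
The fallback rule described above can be sketched in userspace C. This is a minimal illustration, not the rdma-core implementation: issue_verb() and the stub callbacks are hypothetical names, and each callback is assumed to return 0 or a negative errno.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical transport callback: returns 0 or a negative errno. */
typedef int (*verb_call_fn)(void *ctx);

/*
 * Try the ioctl path first. Fall back to the write() path only when the
 * ioctl framework reports that the object/method pair is unknown.
 * Any other result, including -EOPNOTSUPP from the method itself,
 * is passed back to the caller unchanged.
 */
static int issue_verb(verb_call_fn via_ioctl, verb_call_fn via_write,
                      void *ctx)
{
    int ret = via_ioctl(ctx);

    if (ret == -EPROTONOSUPPORT)
        return via_write(ctx);
    return ret;
}

/* Stubs that model the three interesting outcomes. */
static int ioctl_unknown_method(void *ctx) { (void)ctx; return -EPROTONOSUPPORT; }
static int ioctl_method_refuses(void *ctx) { (void)ctx; return -EOPNOTSUPP; }
static int write_succeeds(void *ctx)       { (void)ctx; return 0; }
```

Because EPROTONOSUPPORT is reserved for the framework, a method that legitimately returns EOPNOTSUPP does not trigger a second attempt through write().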

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years ago MAINTAINERS: Fix the location of the rdma git repo
Doug Ledford [Thu, 25 Jan 2018 15:54:22 +0000 (10:54 -0500)]
MAINTAINERS: Fix the location of the rdma git repo

When Jason Gunthorpe and I became co-maintainers of the rdma tree, we
moved the official git repo location to a name-neutral location.
However, that update did not make it here as well.  Fix that mistake.

Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years ago MAINTAINERS: Remove Ram Amrani from Q-Logic RDMA driver
Amrani, Ram [Wed, 24 Jan 2018 07:29:53 +0000 (09:29 +0200)]
MAINTAINERS: Remove Ram Amrani from Q-Logic RDMA driver

Remove myself from maintaining the qedr module as my period
of working with Cavium/Q-Logic has come to an end. It has been
a pleasure working with the community, cheers!

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years ago RDMA/srpt: Fix RCU debug build error
Leon Romanovsky [Wed, 24 Jan 2018 06:56:05 +0000 (08:56 +0200)]
RDMA/srpt: Fix RCU debug build error

The combination of CONFIG_DEBUG_OBJECTS_RCU_HEAD=y and
CONFIG_INFINIBAND_SRPT=m produces the following build error:

ERROR: "init_rcu_head" [drivers/infiniband/ulp/srpt/ib_srpt.ko] undefined!
make[1]: *** [scripts/Makefile.modpost:92: __modpost] Error 1
make: *** [Makefile:1216: modules] Error 2

The reason is that init_rcu_head() is not exported and is not supposed
to be used in modules. It is needed for dynamic initialization of
statically allocated rcu_head structures.

Fixes: 795bc112cd5a ("IB/srpt: Make it safe to use RCU for srpt_device.rch_list")
Fixes: a11253142e6d ("IB/srpt: Rework multi-channel support")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years ago IB/srp: Add target_can_queue login parameter
Bart Van Assche [Mon, 22 Jan 2018 22:27:13 +0000 (14:27 -0800)]
IB/srp: Add target_can_queue login parameter

Although I'm not sure this parameter is useful for regular SRP users,
setting this parameter to 1 has proven invaluable for testing the
block layer core, SCSI core and device mapper queue running mechanisms.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years ago IB/srp: Add RDMA/CM support
Bart Van Assche [Mon, 22 Jan 2018 22:27:12 +0000 (14:27 -0800)]
IB/srp: Add RDMA/CM support

Since the SRP_LOGIN_REQ defined in the SRP standard is larger than
what fits in the RDMA/CM login request private data, introduce a new
login request format for the RDMA/CM.

Note: since srp_daemon and ibsrpdm rely on the subnet manager and
since there is no equivalent of the IB subnet manager in non-IB
networks, login has to be performed manually for non-IB networks.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years ago kobject: Export kobj_ns_grab_current() and kobj_ns_drop()
Bart Van Assche [Mon, 22 Jan 2018 22:27:11 +0000 (14:27 -0800)]
kobject: Export kobj_ns_grab_current() and kobj_ns_drop()

Make it possible to call these two functions from a kernel module.
Note: despite their name, these two functions can be used meaningfully
independent of kobjects. A later patch will add calls to these
functions from the SRP driver because this patch series modifies the
SRP driver such that it can hold a reference to a namespace that can
last longer than the lifetime of the process through which the
namespace reference was obtained.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years ago RDMA/cma: Update RoCE multicast routines to use net namespace
Parav Pandit [Tue, 9 Jan 2018 13:58:57 +0000 (15:58 +0200)]
RDMA/cma: Update RoCE multicast routines to use net namespace

rdma_dev_addr contains the net namespace pointer. When referring to
the bound_dev_if of the rdma_dev_addr, also refer to the net namespace
of the rdma_cm_id stored in rdma_dev_addr.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years ago RDMA/cma: Update cma_validate_port to honor net namespace
Parav Pandit [Tue, 9 Jan 2018 13:58:56 +0000 (15:58 +0200)]
RDMA/cma: Update cma_validate_port to honor net namespace

cma_validate_port uses rdma_dev_addr to validate the port of the cm_id.
When finding the netdevice, it needs to honor the net namespace that is
set up during cm_id creation.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years ago RDMA/cma: Refactor to access multiple fields of rdma_dev_addr
Parav Pandit [Tue, 9 Jan 2018 13:58:55 +0000 (15:58 +0200)]
RDMA/cma: Refactor to access multiple fields of rdma_dev_addr

Pass the rdma_cm_id so that multiple fields of the rdma_dev_addr
structure can be accessed, instead of passing each field individually.

This is needed to access some additional fields in followup patches.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years ago RDMA/cma: Check existence of netdevice during port validation
Parav Pandit [Tue, 9 Jan 2018 13:58:54 +0000 (15:58 +0200)]
RDMA/cma: Check existence of netdevice during port validation

If a valid netdevice is not found for RoCE, the GID table should not be
searched with a NULL netdevice.

Doing so causes the search routines to ignore the netdev argument and may
match the wrong GID table entry if the netdev is deleted.
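
The hazard can be illustrated with a small userspace model. The sketch below is not the kernel's GID cache: struct gid_entry, find_gid() and validate_port() are hypothetical, GIDs are reduced to integers, and the -ENODEV return value is an assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified GID table entry; not the kernel's structure. */
struct gid_entry {
    unsigned int gid;   /* stand-in for a 128-bit GID */
    const char *ndev;   /* name of the associated netdevice */
};

/*
 * Search in the style described above: a NULL ndev acts as a wildcard,
 * so the first entry with a matching GID wins regardless of netdevice.
 */
static int find_gid(const struct gid_entry *tbl, size_t n,
                    unsigned int gid, const char *ndev)
{
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].gid != gid)
            continue;
        if (ndev == NULL || strcmp(tbl[i].ndev, ndev) == 0)
            return (int)i;
    }
    return -1;
}

/* Guarded lookup: refuse to search at all when no netdevice was found. */
static int validate_port(const struct gid_entry *tbl, size_t n,
                         unsigned int gid, const char *ndev)
{
    int idx;

    if (ndev == NULL)
        return -ENODEV;
    idx = find_gid(tbl, n, gid, ndev);
    return idx < 0 ? -ENODEV : idx;
}
```

With two entries that share a GID but belong to different netdevices, the unguarded search with a NULL netdevice silently returns the first one, while the guarded variant reports an error instead.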

Fixes: abae1b71dd37 ("IB/cma: cma_validate_port should verify the port and netdevice")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years ago RDMA/mlx5: Remove redundant allocation warning print
Leon Romanovsky [Fri, 19 Jan 2018 11:07:11 +0000 (13:07 +0200)]
RDMA/mlx5: Remove redundant allocation warning print

The kmalloc() failure to allocate memory generates enough information
and doesn't need to be accompanied by another driver print.

Fixes: d69a24e03659 ("IB/mlx5: Move IB event processing onto a workqueue")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years ago RDMA/core: Simplify rdma_addr_get_sgid() to not support RoCE
Parav Pandit [Thu, 18 Jan 2018 08:11:19 +0000 (10:11 +0200)]
RDMA/core: Simplify rdma_addr_get_sgid() to not support RoCE

Now that all callers that care about RoCE addresses have been
converted to use rdma_read_gids(), simplify rdma_addr_get_sgid()
to only support real GID addresses.

Callers should only use it for OPA and IB transports.

The now deleted implementation for RoCE had several bugs related to IPv6
support and incorrect/inconsistent 'GID' addresses compared to the CM
paths.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years ago net/rds: Use rdma_read_gids to read connection GIDs
Parav Pandit [Thu, 18 Jan 2018 08:11:18 +0000 (10:11 +0200)]
net/rds: Use rdma_read_gids to read connection GIDs

Use the newly introduced rdma_read_gids() to read the SGID and DGID for
the connection; it returns the correct GIDs for the RoCE transport as well.

For RoCE, rdma_addr_get_dgid() returns the MAC address instead of the
DGID on client side connections, and rdma_addr_get_sgid() does not
return the correct SGID for IPv6 or when more than one IP address is
assigned to the netdevice.

Therefore use the transport agnostic rdma_read_gids() API provided by
the rdma_cm module.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years ago RDMA/ucma: Use rdma cm API to query GID
Parav Pandit [Thu, 18 Jan 2018 08:11:17 +0000 (10:11 +0200)]
RDMA/ucma: Use rdma cm API to query GID

Make use of the rdma_read_gids() API to read the SGID and DGID; it
returns the correct GIDs for RoCE and other transports.

For RoCE, rdma_addr_get_dgid() returns the MAC address instead of the
DGID on client side connections, and rdma_addr_get_sgid() does not
return the correct SGID for IPv6 or when more than one IP address is
assigned to the netdevice.

Therefore use the transport agnostic rdma_read_gids() API provided by
the rdma_cm module.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years ago RDMA/cma: Introduce API to read GIDs for multiple transports
Parav Pandit [Thu, 18 Jan 2018 08:11:16 +0000 (10:11 +0200)]
RDMA/cma: Introduce API to read GIDs for multiple transports

This patch introduces an API that allows legacy applications to query
the GIDs of an rdma_cm_id that is used during connection establishment.

GIDs are stored and created differently for iWARP, IB and RoCE transports.
Therefore rdma_read_gids() returns the GIDs for all transports, hiding
such internal details from the caller.
It is usable for client side and server side connections.

In general continued use of GID based addressing outside of IB is
discouraged, so rdma_read_gids() should not be used by any new ULPs.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years ago IB/srpt: Move the code for parsing struct ib_cm_req_event_param
Bart Van Assche [Wed, 17 Jan 2018 00:14:17 +0000 (16:14 -0800)]
IB/srpt: Move the code for parsing struct ib_cm_req_event_param

This patch does not change any functionality but makes srpt_cm_req_recv()
independent of the IB/CM and hence simplifies the patch that introduces
RDMA/CM support.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years ago IB/srpt: Preparations for adding RDMA/CM support
Bart Van Assche [Wed, 17 Jan 2018 00:14:16 +0000 (16:14 -0800)]
IB/srpt: Preparations for adding RDMA/CM support

Introduce a union in struct srpt_rdma_ch for member variables that
depend on the type of connection manager. Stop reporting the IB/CM ID
in error messages.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years ago IB/srpt: Don't allow reordering of commands on wait list
Bart Van Assche [Wed, 17 Jan 2018 00:14:15 +0000 (16:14 -0800)]
IB/srpt: Don't allow reordering of commands on wait list

If a receive I/O context is removed from the wait list and
srpt_handle_new_iu() fails to allocate a send I/O context then
re-adding the receive I/O context to the wait list can cause
reordering. Avoid this by only removing a receive I/O context from the
wait list after a send I/O context has been allocated successfully.
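
The ordering argument can be modelled in a few lines of userspace C. This is only a sketch of the idea, not the ib_srpt code: struct wait_list, wl_push() and wl_process() are hypothetical, and send_budget stands in for the number of send I/O contexts that can currently be allocated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WAIT_MAX 8

/* Hypothetical FIFO of receive I/O context ids waiting to be handled. */
struct wait_list {
    int ids[WAIT_MAX];
    size_t head;    /* index of the oldest waiting entry */
    size_t tail;    /* index one past the newest entry */
};

static bool wl_push(struct wait_list *wl, int id)
{
    if (wl->tail == WAIT_MAX)
        return false;
    wl->ids[wl->tail++] = id;
    return true;
}

/*
 * Handle waiting entries in FIFO order and return how many were handled.
 * The "allocation" is attempted while the entry is still at the head of
 * the list, so a failure leaves the list untouched and ordering intact.
 */
static size_t wl_process(struct wait_list *wl, int *send_budget,
                         int *handled, size_t cap)
{
    size_t n = 0;

    while (wl->head < wl->tail && n < cap) {
        if (*send_budget == 0)
            break;                      /* allocation failed: keep entry */
        (*send_budget)--;               /* allocation succeeded */
        handled[n++] = wl->ids[wl->head++]; /* only now remove the entry */
    }
    return n;
}
```

Removing the entry first and re-adding it on failure would place it behind entries that arrived later, which is the reordering the patch avoids.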

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years ago IB/srpt: Fix a race condition related to wait list processing
Bart Van Assche [Wed, 17 Jan 2018 00:14:14 +0000 (16:14 -0800)]
IB/srpt: Fix a race condition related to wait list processing

Wait list processing only occurs if the channel state >= CH_LIVE. Hence
set the channel state to CH_LIVE before triggering wait list processing
asynchronously.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years ago IB/srpt: Fix login-related race conditions
Bart Van Assche [Wed, 17 Jan 2018 00:14:13 +0000 (16:14 -0800)]
IB/srpt: Fix login-related race conditions

Make sure that sport->mutex is not released between the duplicate
channel check, adding a channel to the channel list and performing
the sport enabled check. Prevent srpt_disconnect_ch() from being
invoked concurrently with the ib_send_cm_rep() call by
srpt_cm_req_recv().

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years ago IB/srpt: Log all zero-length writes and completions
Bart Van Assche [Wed, 17 Jan 2018 00:14:12 +0000 (16:14 -0800)]
IB/srpt: Log all zero-length writes and completions

The new pr_debug() statements are useful when debugging the ib_srpt
driver.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years ago IB/srpt: Simplify srpt_close_session()
Bart Van Assche [Wed, 17 Jan 2018 00:14:11 +0000 (16:14 -0800)]
IB/srpt: Simplify srpt_close_session()

Move a mutex lock and unlock statement from srpt_close_session()
into srpt_disconnect_ch_sync(). Since the previous patch removed
the last user of the return value of that function, change the
return value of srpt_disconnect_ch_sync() into void.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years ago IB/srpt: Rework multi-channel support
Bart Van Assche [Wed, 17 Jan 2018 00:14:10 +0000 (16:14 -0800)]
IB/srpt: Rework multi-channel support

Store initiator and target port IDs once per nexus instead of in each
channel data structure. This change simplifies the duplicate connection
check in srpt_cm_req_recv().

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years ago IB/srpt: Use the source GID as session name
Bart Van Assche [Wed, 17 Jan 2018 00:14:09 +0000 (16:14 -0800)]
IB/srpt: Use the source GID as session name

Use the source GID as session name instead of the initiator port ID
from the SRP login request. The only functional change in this patch
is that it changes the session name shown in debug messages.

Note: the fifth argument that is passed to target_alloc_session() is
what the SCSI target core uses as key for lookups in the ACL (access
control list) information.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years ago IB/srpt: One target per port
Bart Van Assche [Wed, 17 Jan 2018 00:14:08 +0000 (16:14 -0800)]
IB/srpt: One target per port

In multipathing setups where a target system is equipped with
dual-port HCAs it is useful to have one connection per target port
instead of one connection per target HCA. Hence move the connection
list (rch_list) from struct srpt_device into struct srpt_port.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years ago IB/srpt: Add P_Key support
Bart Van Assche [Wed, 17 Jan 2018 00:14:07 +0000 (16:14 -0800)]
IB/srpt: Add P_Key support

Correctly process connection requests that use a P_Key other than
the default.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years ago IB/srpt: Rework srpt_disconnect_ch_sync()
Bart Van Assche [Wed, 17 Jan 2018 00:14:06 +0000 (16:14 -0800)]
IB/srpt: Rework srpt_disconnect_ch_sync()

This patch fixes a use-after-free issue for ch->release_done when
running the SRP protocol on top of the rdma_rxe driver.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years ago IB/srpt: Make it safe to use RCU for srpt_device.rch_list
Bart Van Assche [Wed, 17 Jan 2018 00:14:05 +0000 (16:14 -0800)]
IB/srpt: Make it safe to use RCU for srpt_device.rch_list

The next patch will iterate over rch_list from a context from which
it is not allowed to block. Hence make rch_list RCU-safe.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years ago IB/srp: Refactor srp_send_req()
Bart Van Assche [Tue, 16 Jan 2018 18:39:44 +0000 (10:39 -0800)]
IB/srp: Refactor srp_send_req()

This patch does not change any functionality but prepares for the patch
that adds RDMA_CM support by making the RDMA_CM patch much easier to
read.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years ago IB/srp: Improve path record query error message
Bart Van Assche [Tue, 16 Jan 2018 18:39:43 +0000 (10:39 -0800)]
IB/srp: Improve path record query error message

Show all path record query parameters if a path record query fails.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years ago IB/srp: Use kstrtoull() instead of simple_strtoull()
Bart Van Assche [Tue, 16 Jan 2018 18:39:42 +0000 (10:39 -0800)]
IB/srp: Use kstrtoull() instead of simple_strtoull()

Use kstrtoull() since simple_strtoull() is deprecated. This patch
improves error checking but otherwise does not change any functionality.
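
To illustrate what stricter error checking means here, the following userspace sketch contrasts with a lenient parser that stops at the first invalid character and reports no error. kstrtoull() itself is kernel-only, so the sketch uses strtoull() from the C library as a stand-in; parse_u64_strict() is a hypothetical helper and does not reproduce every detail of the kernel function (for example, it rejects a leading '+' and a trailing newline).

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Strict parse: the whole string must be a number that fits in
 * unsigned long long. Returns 0 on success or a negative errno.
 */
static int parse_u64_strict(const char *s, unsigned long long *out)
{
    char *end;
    unsigned long long v;

    /* Reject NULL, empty input, signs and leading whitespace. */
    if (s == NULL || !isdigit((unsigned char)*s))
        return -EINVAL;

    errno = 0;
    v = strtoull(s, &end, 0);   /* base 0: accepts 10, 0x10, 010 */
    if (errno == ERANGE)
        return -ERANGE;         /* value does not fit */
    if (*end != '\0')
        return -EINVAL;         /* trailing garbage such as "42abc" */

    *out = v;
    return 0;
}
```

A lenient parser would turn "42abc" into 42 without complaint; the strict variant reports the malformed input to the caller, which can then reject the login parameter.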

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years ago RDMA/hns: Remove unnecessary platform_get_resource() error check
weiyongjun (A) [Wed, 17 Jan 2018 11:28:38 +0000 (11:28 +0000)]
RDMA/hns: Remove unnecessary platform_get_resource() error check

devm_ioremap_resource() already checks if the resource is NULL, so
remove the unnecessary platform_get_resource() error check.

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years ago IB/mlx5: Mmap the HCA's clock info to user-space
Feras Daoud [Tue, 16 Jan 2018 18:08:41 +0000 (20:08 +0200)]
IB/mlx5: Mmap the HCA's clock info to user-space

This patch maps the new page to user space applications to
allow converting a user space completion timestamp to system wall
time at the lowest possible latency cost.
By using a versioning scheme we allow compatibility between current
and future userspace libraries.
The change moves mlx5_ib_mmap_cmd enum from mlx5_ib.h to the
abi header file mlx5-abi.h.

Reviewed-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Eitan Rabin <rabin@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years ago net/mlx5e: Add clock info page to mlx5 core devices
Feras Daoud [Tue, 16 Jan 2018 18:08:40 +0000 (20:08 +0200)]
net/mlx5e: Add clock info page to mlx5 core devices

Adds a new page to mlx5 core containing clock info data that allows
user level applications to translate a CQE timestamp to
nanoseconds. The information stored in this page is represented
through mlx5_ib_clock_info.

In order to synchronize between kernel and user space, a sequence
number is incremented at the beginning and end of each update.
An odd number means the data is being updated, while an even number
means the update is complete. To guarantee that the data structure
was read consistently, the user will:

repeat:
        seq1 = <read sequence>
        if seq1 is odd goto repeat
        <read data structure>
        seq2 = <read sequence>
        if seq1 != seq2 goto repeat
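
The reader loop above can be written as a minimal C11 sketch. The layout below is an assumption for illustration only; the real page is described by mlx5_ib_clock_info, and struct clock_page, read_clock() and write_clock() are hypothetical names. write_clock() is a stand-in for the kernel side so that the example is self-contained.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared page; fields are atomic to avoid data races. */
struct clock_page {
    _Atomic uint32_t seq;       /* odd while an update is in progress */
    _Atomic uint64_t cycles;
    _Atomic uint64_t nsec;
};

struct clock_snapshot {
    uint64_t cycles;
    uint64_t nsec;
};

/* Reader: retry until the same even sequence is seen before and after. */
static struct clock_snapshot read_clock(struct clock_page *p)
{
    struct clock_snapshot s;
    uint32_t seq1, seq2;

    do {
        do {
            seq1 = atomic_load_explicit(&p->seq, memory_order_acquire);
        } while (seq1 & 1);     /* writer is mid-update, read again */

        s.cycles = atomic_load_explicit(&p->cycles, memory_order_relaxed);
        s.nsec = atomic_load_explicit(&p->nsec, memory_order_relaxed);

        /* Order the data loads before the second sequence read. */
        atomic_thread_fence(memory_order_acquire);
        seq2 = atomic_load_explicit(&p->seq, memory_order_relaxed);
    } while (seq1 != seq2);

    return s;
}

/* Writer stand-in: bump to odd, update the data, bump to even. */
static void write_clock(struct clock_page *p, uint64_t cycles, uint64_t nsec)
{
    uint32_t seq = atomic_load_explicit(&p->seq, memory_order_relaxed);

    atomic_store_explicit(&p->seq, seq + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&p->cycles, cycles, memory_order_relaxed);
    atomic_store_explicit(&p->nsec, nsec, memory_order_relaxed);
    atomic_store_explicit(&p->seq, seq + 2, memory_order_release);
}
```

The reader never blocks the writer; it only retries when an update overlapped its read, which keeps the timestamp conversion path cheap in the common case.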

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Eitan Rabin <rabin@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years ago IB/cq: Don't force IB_POLL_DIRECT poll context for ib_process_cq_direct
Sagi Grimberg [Sun, 14 Jan 2018 15:07:50 +0000 (17:07 +0200)]
IB/cq: Don't force IB_POLL_DIRECT poll context for ib_process_cq_direct

Polling the completion queue directly does not interfere
with the existing polling logic, hence drop the requirement.
Be aware that running ib_process_cq_direct with a non-IB_POLL_DIRECT
CQ may trigger concurrent CQ processing.

This can be used for polling mode ULPs.

Cc: Bart Van Assche <bart.vanassche@wdc.com>
Reported-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
[maxg: added wcs array argument to __ib_process_cq]
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years ago IB/core: postpone WR initialization during queue drain
Max Gurtovoy [Sun, 14 Jan 2018 15:07:48 +0000 (17:07 +0200)]
IB/core: postpone WR initialization during queue drain

There is no need to initialize the completion and the WR if we fail
during QP modification.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years ago RDMA/rxe: Fix rxe_qp_cleanup()
Bart Van Assche [Fri, 12 Jan 2018 23:11:59 +0000 (15:11 -0800)]
RDMA/rxe: Fix rxe_qp_cleanup()

rxe_qp_cleanup() can sleep so it must be run in thread context and
not in atomic context. This patch prevents the following bug from
being triggered:

Kernel BUG at 00000000560033f3 [verbose debug info unavailable]
BUG: sleeping function called from invalid context at net/core/sock.c:2761
in_atomic(): 1, irqs_disabled(): 0, pid: 7, name: ksoftirqd/0
INFO: lockdep is turned off.
Preemption disabled at:
[<00000000b6e69628>] __do_softirq+0x4e/0x540
CPU: 0 PID: 7 Comm: ksoftirqd/0 Not tainted 4.15.0-rc7-dbg+ #4
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
Call Trace:
 dump_stack+0x85/0xbf
 ___might_sleep+0x177/0x260
 lock_sock_nested+0x1d/0x90
 inet_shutdown+0x2e/0xd0
 rxe_qp_cleanup+0x107/0x140 [rdma_rxe]
 rxe_elem_release+0x18/0x80 [rdma_rxe]
 rxe_requester+0x1cf/0x11b0 [rdma_rxe]
 rxe_do_task+0x78/0xf0 [rdma_rxe]
 tasklet_action+0x99/0x270
 __do_softirq+0xc0/0x540
 run_ksoftirqd+0x1c/0x70
 smpboot_thread_fn+0x1be/0x270
 kthread+0x117/0x130
 ret_from_fork+0x24/0x30

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Moni Shoua <monis@mellanox.com>
Cc: stable@vger.kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years ago RDMA/rxe: Fix a race condition in rxe_requester()
Bart Van Assche [Fri, 12 Jan 2018 23:11:58 +0000 (15:11 -0800)]
RDMA/rxe: Fix a race condition in rxe_requester()

The rxe driver works as follows:
* The send queue, receive queue and completion queues are implemented as
  circular buffers.
* ib_post_send() and ib_post_recv() calls are serialized through a spinlock.
* Removing elements from various queues happens from tasklet
  context. A given tasklet is guaranteed to run on at most one CPU at a
  time. This serializes access to these queues. See also rxe_completer(),
  rxe_requester() and rxe_responder().
* rxe_completer() processes the skbs queued onto qp->resp_pkts.
* rxe_requester() handles the send queue (qp->sq.queue).
* rxe_responder() processes the skbs queued onto qp->req_pkts.

Since rxe_drain_req_pkts() processes qp->req_pkts, calling
rxe_drain_req_pkts() from rxe_requester() is racy. Hence this patch.

Reported-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: stable@vger.kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years ago RDMA/bnxt_re: Add SRQ support for Broadcom adapters
Devesh Sharma [Thu, 11 Jan 2018 16:52:11 +0000 (11:52 -0500)]
RDMA/bnxt_re: Add SRQ support for Broadcom adapters

A shared receive queue (SRQ) is defined as a pool of
receive buffers shared among multiple QPs which belong
to the same protection domain in a given process context.
Use of an SRQ reduces the memory footprint of IB applications.

Broadcom adapters support SRQ; add code changes to enable
shared receive queues.

Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years ago RDMA/bnxt_re: expose detailed stats retrieved from HW
Selvin Xavier [Thu, 11 Jan 2018 16:52:10 +0000 (11:52 -0500)]
RDMA/bnxt_re: expose detailed stats retrieved from HW

Broadcom's adapter supports more granular statistics
to allow a better understanding of the state of the
chip when data traffic is flowing.

Expose the detailed stats to the consumer through
the standard hook available in the kverbs interface.
In order to retrieve all the information, the driver
implements a firmware command.

Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years ago RDMA/bnxt_re: Add support for MRs with Huge pages
Somnath Kotur [Thu, 11 Jan 2018 16:52:09 +0000 (11:52 -0500)]
RDMA/bnxt_re: Add support for MRs with Huge pages

Depending on the OS page-table configuration, applications
may request MRs with a page size alignment other than 4K.

The underlying provider driver needs to adjust its PBL boundaries
according to the incoming page boundaries in the PA list.

Add the capability to register MRs with page sizes other
than 4K (hugepages).

Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years ago RDMA/bnxt_re: Add support for query firmware version
Selvin Xavier [Thu, 11 Jan 2018 16:52:08 +0000 (11:52 -0500)]
RDMA/bnxt_re: Add support for query firmware version

The device now reports its firmware version, so remove
the hard-coded values of the FW version string and the
redundant fw_rev hook from sysfs. Add code to query the
firmware version from the underlying device and report it
through the kernel verb to get the firmware version string.

Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years ago RDMA/bnxt_re: Enable RoCE on virtual functions
Selvin Xavier [Thu, 11 Jan 2018 16:52:07 +0000 (11:52 -0500)]
RDMA/bnxt_re: Enable RoCE on virtual functions

RoCE can be used by virtual functions (VFs) as well. Add
code changes to allow resource reservation and initialization,
and to make the resources available to the RDMA applications
running on those VFs.

Currently, fifty percent of the total available resources
are reserved for the PF and the remainder is divided equally
among the active VFs.
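
As a worked example of the split described above, the arithmetic can be sketched as follows. This is not the bnxt_re code: struct res_split and split_resources() are hypothetical, and the rounding behavior and the handling of zero active VFs are assumptions made for the illustration.

```c
#include <assert.h>

/* Hypothetical result of dividing one resource type between PF and VFs. */
struct res_split {
    unsigned int pf;        /* share reserved for the physical function */
    unsigned int per_vf;    /* share handed to each active VF */
};

/*
 * Reserve half of the total for the PF and divide the remainder equally
 * among the active VFs. Integer division rounds down, so a few units may
 * stay unassigned; with no active VFs, per_vf is reported as 0.
 */
static struct res_split split_resources(unsigned int total,
                                        unsigned int active_vfs)
{
    struct res_split r;

    r.pf = total / 2;
    r.per_vf = active_vfs ? (total - r.pf) / active_vfs : 0;
    return r;
}
```

For instance, with 1024 units of a resource and 4 active VFs, the PF keeps 512 and each VF receives 128.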

Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years ago i40iw: Free IEQ resources
Mustafa Ismail [Fri, 12 Jan 2018 00:10:54 +0000 (18:10 -0600)]
i40iw: Free IEQ resources

The iWARP Exception Queue (IEQ) resources are not freed when a QP is
destroyed. Fix this by freeing IEQ resources when freeing QP resources.

Fixes: d37498417947 ("i40iw: add files for iwarp interface")
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years ago i40iw: Remove setting of rem_addr.len
Mustafa Ismail [Fri, 12 Jan 2018 00:10:53 +0000 (18:10 -0600)]
i40iw: Remove setting of rem_addr.len

Remove setting of rem_addr.len before calling iw_rdma_write,
iw_inline_rdma_write and rdma_read. rem_addr.len is not used in those
functions.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years ago i40iw: Remove limit on re-posting AEQ entries to HW
Sindhu Devale [Fri, 12 Jan 2018 00:10:52 +0000 (18:10 -0600)]
i40iw: Remove limit on re-posting AEQ entries to HW

Currently, if the number of processed Asynchronous Event Queue (AEQ)
entries exceeds 255, they are not returned to HW for re-use. During
scale-up, the unreturned AEQ entries can grow to the max AEQ size and
cause the HW to report an AEQ overflow.

Remove the check which limits the number of processed AEQ entries returned
to HW.

Fixes: 86dbcd0f12e9 ("RDMA/i40iw: add file to handle cqp calls")
Signed-off-by: Sindhu Devale <sindhu.devale@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years ago i40iw: Zero-out consumer key on allocate stag for FMR
Shiraz Saleem [Fri, 12 Jan 2018 00:10:51 +0000 (18:10 -0600)]
i40iw: Zero-out consumer key on allocate stag for FMR

If the application invalidates the MR before the FMR WR, HW parses the
consumer key portion of the stag and returns an invalid stag key
Asynchronous Event (AE) that tears down the QP.

Fix this by zeroing out the consumer key portion of the allocated stag
returned to the application for FMR.

Fixes: ee855d3b93f3 ("RDMA/i40iw: Add base memory management extensions")
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years ago i40iw: Remove extra call to i40iw_est_sd()
Shiraz Saleem [Fri, 12 Jan 2018 00:10:50 +0000 (18:10 -0600)]
i40iw: Remove extra call to i40iw_est_sd()

Remove the redundant estimate SD function call. sd_needed should already
be updated at the end of the do-while resource reduction loop.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years ago RDMA/hns: Set the guid for hip08 RoCE device
oulijun [Wed, 10 Jan 2018 06:39:53 +0000 (14:39 +0800)]
RDMA/hns: Set the guid for hip08 RoCE device

This patch assigns a GUID (Globally Unique Identifier) value to the
hip08 device.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years ago RDMA/hns: Update the verbs of polling for completion
oulijun [Wed, 10 Jan 2018 06:39:52 +0000 (14:39 +0800)]
RDMA/hns: Update the verbs of polling for completion

If the port is a RoCEv2 port, the remote port address and QP information
returned for UD are modified.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/hns: Assign zero for pkey_index of wc in hip08
oulijun [Wed, 10 Jan 2018 06:39:51 +0000 (14:39 +0800)]
RDMA/hns: Assign zero for pkey_index of wc in hip08

Because the pkey is fixed for hip08 RoCE, pkey_index of the wc must be set
to zero. Otherwise, an error occurs when establishing a connection through
the communication management mechanism.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/hns: Fill sq wqe context of ud type in hip08
oulijun [Wed, 10 Jan 2018 06:39:50 +0000 (14:39 +0800)]
RDMA/hns: Fill sq wqe context of ud type in hip08

This patch mainly configures the fields of the ud type sq wqe when posting
a wr on a gsi qp.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/hns: Add gsi qp support for modifying qp in hip08
oulijun [Wed, 10 Jan 2018 06:39:49 +0000 (14:39 +0800)]
RDMA/hns: Add gsi qp support for modifying qp in hip08

Some fields in the qp context need to be assigned when the qp type is
gsi in hip08.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/hns: Create gsi qp in hip08
oulijun [Wed, 10 Jan 2018 06:39:48 +0000 (14:39 +0800)]
RDMA/hns: Create gsi qp in hip08

The gsi qp and rc qp use the same qp context structure and creation
flow; they are differentiated only by qpn and qp type.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/hns: Assign the correct value for tx_cqn
oulijun [Wed, 10 Jan 2018 06:39:47 +0000 (14:39 +0800)]
RDMA/hns: Assign the correct value for tx_cqn

When modifying a qp from init to init, the cqn of the send cq must be
assigned to the tx cqn field of the qp context. Otherwise, errors occur when
the send and recv cq sizes are different.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/cma: use strlcpy() instead of strncpy()
Xiongfeng Wang [Fri, 12 Jan 2018 07:56:05 +0000 (15:56 +0800)]
IB/cma: use strlcpy() instead of strncpy()

gcc-8 reports

drivers/infiniband/core/cma_configfs.c: In function 'make_cma_dev':
./include/linux/string.h:245:9: warning: '__builtin_strncpy' specified
bound 64 equals destination size [-Wstringop-truncation]

We need to use strlcpy() to make sure the string is nul-terminated.
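
A small userspace sketch of the termination guarantee, assuming a 64-byte
destination as in the warning above. bounded_copy() is a hypothetical
stand-in with strlcpy()-like semantics, not the kernel implementation, and
DEV_NAME_LEN is an illustrative constant.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEV_NAME_LEN 64 /* hypothetical; mirrors the bound in the warning */

/*
 * Hypothetical stand-in with strlcpy()-like semantics: copy at most
 * size - 1 bytes, always nul-terminate, and return strlen(src) so the
 * caller can detect truncation.
 */
static size_t bounded_copy(char *dst, const char *src, size_t size)
{
	size_t len;

	assert(dst != NULL && src != NULL);
	len = strlen(src);
	if (size != 0) {
		size_t n = len < size - 1 ? len : size - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return len;
}

/* Returns 1 if buf holds a nul byte within its first size bytes. */
static int is_terminated(const char *buf, size_t size)
{
	return memchr(buf, '\0', size) != NULL;
}
```

By contrast, strncpy(dst, src, sizeof(dst)) writes no terminator when src
has sizeof(dst) or more characters, which is the case gcc-8 warns about.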

Signed-off-by: Xiongfeng Wang <xiongfeng.wang@linaro.org>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/core: Clarify rdma_ah_find_type
Parav Pandit [Fri, 12 Jan 2018 05:58:42 +0000 (07:58 +0200)]
RDMA/core: Clarify rdma_ah_find_type

iWARP does not use rdma_ah_attr_type, and for this reason we do not have a
RDMA_AH_ATTR_TYPE_IWARP. rdma_ah_find_type should not even be called on iWARP
ports and for clarity it shouldn't have a special test for iWARP.

This changes the result from RDMA_AH_ATTR_TYPE_ROCE to RDMA_AH_ATTR_TYPE_IB
when wrongly called on an iWARP port.

Fixes: 44c58487d51a ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/core: Fix ib_wc structure size to remain in 64 bytes boundary
Bodong Wang [Fri, 12 Jan 2018 05:58:41 +0000 (07:58 +0200)]
IB/core: Fix ib_wc structure size to remain in 64 bytes boundary

The change of slid from u16 to u32 results in sizeof(struct ib_wc)
crossing the 64B boundary, which causes more cache misses. This patch
rearranges the fields to keep the size at 64B.

Pahole output before this change:

struct ib_wc {
        union {
                u64                wr_id;                /*           8 */
                struct ib_cqe *    wr_cqe;               /*           8 */
        };                                               /*     0     8 */
        enum ib_wc_status          status;               /*     8     4 */
        enum ib_wc_opcode          opcode;               /*    12     4 */
        u32                        vendor_err;           /*    16     4 */
        u32                        byte_len;             /*    20     4 */
        struct ib_qp *             qp;                   /*    24     8 */
        union {
                __be32             imm_data;             /*           4 */
                u32                invalidate_rkey;      /*           4 */
        } ex;                                            /*    32     4 */
        u32                        src_qp;               /*    36     4 */
        int                        wc_flags;             /*    40     4 */
        u16                        pkey_index;           /*    44     2 */

        /* XXX 2 bytes hole, try to pack */

        u32                        slid;                 /*    48     4 */
        u8                         sl;                   /*    52     1 */
        u8                         dlid_path_bits;       /*    53     1 */
        u8                         port_num;             /*    54     1 */
        u8                         smac[6];              /*    55     6 */

        /* XXX 1 byte hole, try to pack */

        u16                        vlan_id;              /*    62     2 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        u8                         network_hdr_type;     /*    64     1 */

        /* size: 72, cachelines: 2, members: 17 */
        /* sum members: 62, holes: 2, sum holes: 3 */
        /* padding: 7 */
        /* last cacheline: 8 bytes */
};

Pahole output after this change:

struct ib_wc {
        union {
                u64                wr_id;                /*           8 */
                struct ib_cqe *    wr_cqe;               /*           8 */
        };                                               /*     0     8 */
        enum ib_wc_status          status;               /*     8     4 */
        enum ib_wc_opcode          opcode;               /*    12     4 */
        u32                        vendor_err;           /*    16     4 */
        u32                        byte_len;             /*    20     4 */
        struct ib_qp *             qp;                   /*    24     8 */
        union {
                __be32             imm_data;             /*           4 */
                u32                invalidate_rkey;      /*           4 */
        } ex;                                            /*    32     4 */
        u32                        src_qp;               /*    36     4 */
        u32                        slid;                 /*    40     4 */
        int                        wc_flags;             /*    44     4 */
        u16                        pkey_index;           /*    48     2 */
        u8                         sl;                   /*    50     1 */
        u8                         dlid_path_bits;       /*    51     1 */
        u8                         port_num;             /*    52     1 */
        u8                         smac[6];              /*    53     6 */

        /* XXX 1 byte hole, try to pack */

        u16                        vlan_id;              /*    60     2 */
        u8                         network_hdr_type;     /*    62     1 */

        /* size: 64, cachelines: 1, members: 17 */
        /* sum members: 62, holes: 1, sum holes: 1 */
        /* padding: 1 */
};
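
The effect of the reordering can be reproduced in userspace with a
simplified sketch. The two structs below are stand-ins, not the kernel
definitions: unions, enums and the qp pointer are replaced by fixed-width
integers, and the sizes noted in the comments assume a typical LP64 ABI
where uint64_t is 8-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field order before the change: slid sits after a 2-byte hole. */
struct wc_before {
	uint64_t wr_id;
	uint32_t status;
	uint32_t opcode;
	uint32_t vendor_err;
	uint32_t byte_len;
	uint64_t qp;            /* stand-in for struct ib_qp * */
	uint32_t ex;            /* stand-in for the imm_data/rkey union */
	uint32_t src_qp;
	int32_t  wc_flags;
	uint16_t pkey_index;
	uint32_t slid;          /* needs 4-byte alignment: 2-byte hole above */
	uint8_t  sl;
	uint8_t  dlid_path_bits;
	uint8_t  port_num;
	uint8_t  smac[6];
	uint16_t vlan_id;
	uint8_t  network_hdr_type;
};

/* Field order after the change: slid moved next to the other u32s. */
struct wc_after {
	uint64_t wr_id;
	uint32_t status;
	uint32_t opcode;
	uint32_t vendor_err;
	uint32_t byte_len;
	uint64_t qp;
	uint32_t ex;
	uint32_t src_qp;
	uint32_t slid;
	int32_t  wc_flags;
	uint16_t pkey_index;
	uint8_t  sl;
	uint8_t  dlid_path_bits;
	uint8_t  port_num;
	uint8_t  smac[6];
	uint16_t vlan_id;
	uint8_t  network_hdr_type;
};

/* Number of 64-byte cache lines needed to hold size bytes. */
static size_t cachelines(size_t size)
{
	assert(size > 0);
	return (size + 63) / 64;
}
```

Comparing sizeof() and offsetof(..., slid) for the two layouts shows the
same movement as the pahole listings: the 2-byte hole before slid goes away
and network_hdr_type no longer spills into a second cache line.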

Cc: <stable@vger.kernel.org> # v4.13
Fixes: 7db20ecd1d97 ("IB/core: Change wc.slid from 16 to 32 bits")
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/mlx4: Fix incorrectly releasing steerable UD QPs when have only ETH ports
Jack Morgenstein [Fri, 12 Jan 2018 05:58:40 +0000 (07:58 +0200)]
IB/mlx4: Fix incorrectly releasing steerable UD QPs when have only ETH ports

Allocating steerable UD QPs depends on having at least one IB port,
while releasing those QPs does not.

As a result, when there are only ETH ports, the IB (RoCE) driver
requests releasing a qp range whose base qp is zero, with
qp count zero.

When SR-IOV is enabled, and the VF driver is running on a VM over
a hypervisor which treats such qp release calls as errors
(rather than NOPs), we see lines in the VM message log like:

 mlx4_core 0002:00:02.0: Failed to release qp range base:0 cnt:0

Fix this by adding a check for a zero count in mlx4_release_qp_range()
(which thus treats releasing 0 qps as a nop), and eliminating the
check for device managed flow steering when releasing steerable UD QPs.
(Freeing ib_uc_qpns_bitmap unconditionally is also OK, since it
remains NULL when steerable UD QPs are not allocated).
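
A minimal sketch of the zero-count guard, using hypothetical types and
names. The counter stands in for the command that would otherwise reach the
hypervisor; none of this is the mlx4 code itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the device state. */
struct qp_table_sketch {
	int release_cmds_sent; /* commands that would reach the hypervisor */
};

/* Release cnt QPs starting at base_qpn; releasing zero QPs is a no-op. */
static void release_qp_range_sketch(struct qp_table_sketch *t,
				    int base_qpn, int cnt)
{
	assert(t != NULL);
	(void)base_qpn; /* unused in this sketch */

	if (cnt == 0)
		return; /* nothing was allocated, so nothing is sent */

	t->release_cmds_sent++;
}
```

Because the callee handles the empty range, the caller can release
unconditionally instead of repeating the allocation-time condition.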

Cc: <stable@vger.kernel.org>
Fixes: 4196670be786 ("IB/mlx4: Don't allocate range of steerable UD QPs for Ethernet-only device")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/qedr: Fix endian problems around imm_data
Jason Gunthorpe [Thu, 11 Jan 2018 21:43:07 +0000 (14:43 -0700)]
RDMA/qedr: Fix endian problems around imm_data

The double swap matches what user space rdma-core does to imm_data.

wc->imm_data is not used in the kernel so this change has no practical
impact.

Acked-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/hns: Fix endian problems around imm_data and rkey
Jason Gunthorpe [Thu, 11 Jan 2018 21:43:06 +0000 (14:43 -0700)]
RDMA/hns: Fix endian problems around imm_data and rkey

This matches the changes made recently to the userspace hns
driver when it was made sparse clean.

See rdma-core commit bffd380cfe56 ("libhns: Make the provider sparse
clean")

wc->imm_data is not used in the kernel so this change has no practical
impact.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA: Mark imm_data as be32 in the verbs uapi header
Jason Gunthorpe [Thu, 11 Jan 2018 21:43:05 +0000 (14:43 -0700)]
RDMA: Mark imm_data as be32 in the verbs uapi header

This matches what the userspace copy of this header has been doing
for a while. imm_data is an opaque 4 byte array carried over the network,
and invalidate_rkey is in CPU byte order.
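
A userspace sketch of the distinction, with hypothetical names. imm_data is
stored in network (big-endian) byte order and converted only at the edges
with htonl()/ntohl(), while invalidate_rkey would be used directly in CPU
byte order.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the two views of the same 4-byte field. */
union ex_sketch {
	uint32_t imm_data;        /* big-endian, as carried on the wire */
	uint32_t invalidate_rkey; /* CPU byte order */
};

/* Store a host-order value as wire-order immediate data. */
static void set_imm_data(union ex_sketch *ex, uint32_t host_value)
{
	assert(ex != NULL);
	ex->imm_data = htonl(host_value);
}

/* Read the immediate data back as a host-order value. */
static uint32_t get_imm_data(const union ex_sketch *ex)
{
	assert(ex != NULL);
	return ntohl(ex->imm_data);
}

/* Copy out the raw bytes exactly as they would travel over the network. */
static void imm_wire_bytes(const union ex_sketch *ex, uint8_t out[4])
{
	assert(ex != NULL);
	memcpy(out, &ex->imm_data, 4);
}
```

The raw bytes come out most-significant first regardless of host
endianness, which is what marking the field as be32 lets sparse check.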

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/core: Limit DMAC resolution to RoCE Connected QPs
Parav Pandit [Tue, 9 Jan 2018 13:24:53 +0000 (15:24 +0200)]
IB/core: Limit DMAC resolution to RoCE Connected QPs

Resolving DMAC for RoCE is applicable only to Connected mode QPs.
So resolve DMAC only for Connected mode QPs.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/core: Attempt DMAC resolution for only RoCE
Parav Pandit [Tue, 9 Jan 2018 13:24:52 +0000 (15:24 +0200)]
IB/core: Attempt DMAC resolution for only RoCE

Instead of returning 0 (success) for RoCE scenarios where DMAC should
not be resolved, avoid the attempt altogether and make the code consistent
with ib_create_user_ah().

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/core: Limit DMAC resolution to userspace QPs
Parav Pandit [Tue, 9 Jan 2018 13:24:51 +0000 (15:24 +0200)]
IB/core: Limit DMAC resolution to userspace QPs

Currently ah_attr is initialized by the ib_cm layer for rdma_cm
based applications. For RoCE transport ah_attr.roce.dmac is already
initialized by ib_cm, rdma_cm either from wc, path record, route
resolve, explicit path record setting depending on active or passive
side QP. Therefore avoid resolving DMAC for QP of kernel consumers.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/core: Perform modify QP on real one
Parav Pandit [Tue, 9 Jan 2018 13:24:50 +0000 (15:24 +0200)]
IB/core: Perform modify QP on real one

Currently qp->port stores the port number whenever IB_QP_PORT
QP attribute mask is set (during QP state transition to INIT state).
This port number should be stored for the real QP when XRC target QP
is used.

Follow the ib_modify_qp() implementation and hide the access to ->real_qp.

Fixes: a512c2fbef9c ("IB/core: Introduce modify QP operation with udata")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoinfiniband: fix sw/rdmavt/* kernel-doc notation
Randy Dunlap [Sat, 6 Jan 2018 00:22:32 +0000 (16:22 -0800)]
infiniband: fix sw/rdmavt/* kernel-doc notation

Use correct parameter names and formatting in function kernel-doc notation
to eliminate warnings from scripts/kernel-doc.

../drivers/infiniband/sw/rdmavt/mr.c:784: warning: Excess function parameter 'ibmfr' description in 'rvt_map_phys_fmr'
../drivers/infiniband/sw/rdmavt/vt.c:234: warning: Excess function parameter 'intex' description in 'rvt_query_pkey'
../drivers/infiniband/sw/rdmavt/vt.c:266: warning: Excess function parameter 'index' description in 'rvt_query_gid'
../drivers/infiniband/sw/rdmavt/vt.c:306: warning: Excess function parameter 'data' description in 'rvt_alloc_ucontext'
../drivers/infiniband/sw/rdmavt/cq.c:65: warning: Excess function parameter 'sig' description in 'rvt_cq_enter'
../drivers/infiniband/sw/rdmavt/qp.c:279: warning: Excess function parameter 'qpt' description in 'rvt_free_all_qps'
../drivers/infiniband/sw/rdmavt/mcast.c:282: warning: Excess function parameter 'igd' description in 'rvt_attach_mcast'
../drivers/infiniband/sw/rdmavt/mcast.c:345: warning: Excess function parameter 'igd' description in 'rvt_detach_mcast'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: linux-doc@vger.kernel.org
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoinfiniband: fix ulp/opa_vnic/opa_vnic_vema.c kernel-doc notation
Randy Dunlap [Sat, 6 Jan 2018 00:22:04 +0000 (16:22 -0800)]
infiniband: fix ulp/opa_vnic/opa_vnic_vema.c kernel-doc notation

Use correct parameter name and description in kernel-doc notation to
eliminate a kernel-doc warning.

../drivers/infiniband/ulp/opa_vnic/opa_vnic_vema.c:730: warning: Excess function parameter 'cport' description in 'opa_vnic_vema_send_trap'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: linux-doc@vger.kernel.org
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoinfiniband: fix core/fmr_pool.c kernel-doc notation
Randy Dunlap [Sat, 6 Jan 2018 00:21:53 +0000 (16:21 -0800)]
infiniband: fix core/fmr_pool.c kernel-doc notation

Fix kernel-doc warning for ib_fmr_pool_map_phys() and also format it
with function description and text spacing.

../drivers/infiniband/core/fmr_pool.c:404: warning: Excess function parameter 'pool' description in 'ib_fmr_pool_map_phys'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoinfiniband: fix core/verbs.c kernel-doc notation
Randy Dunlap [Sat, 6 Jan 2018 00:21:40 +0000 (16:21 -0800)]
infiniband: fix core/verbs.c kernel-doc notation

Change function parameter name in kernel-doc notation and other comments
to eliminate a kernel-doc warning.

../drivers/infiniband/core/verbs.c:1790: warning: Excess function parameter 'wq_init_attr' description in 'ib_create_wq'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/cma: Fix rdma_cm path querying for RoCE
Parav Pandit [Mon, 8 Jan 2018 15:04:48 +0000 (17:04 +0200)]
RDMA/cma: Fix rdma_cm path querying for RoCE

The 'if' logic in ucma_query_path was broken when OPA was introduced
and started to treat RoCE paths as OPA paths. Invert the logic
of the 'if' so only OPA paths are treated as OPA paths.

Otherwise the path records returned to rdma_cma users are mangled
when in RoCE mode.
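
The inversion can be sketched as follows. The enum and both predicates are
hypothetical stand-ins for the path record types and the branch condition;
they are not the ucma code.

```c
#include <assert.h>

/* Hypothetical stand-in for the path record types. */
enum path_type_sketch {
	PATH_IB,
	PATH_ROCE_V1,
	PATH_ROCE_V2,
	PATH_OPA
};

/* Broken shape: everything that is not IB falls into the OPA branch. */
static int treated_as_opa_before(enum path_type_sketch t)
{
	assert(t <= PATH_OPA);
	return t != PATH_IB;
}

/* Fixed shape: only OPA takes the OPA branch. */
static int treated_as_opa_after(enum path_type_sketch t)
{
	assert(t <= PATH_OPA);
	return t == PATH_OPA;
}
```

Testing for the one special type, rather than for the absence of the
default type, stays correct when further path types are added.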

Fixes: 57520751445b ("IB/SA: Add OPA path record type")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/cma: Fix rdma_cm raw IB path setting for RoCE
Parav Pandit [Mon, 8 Jan 2018 15:04:47 +0000 (17:04 +0200)]
RDMA/cma: Fix rdma_cm raw IB path setting for RoCE

rdma_set_ib_path() missed setting path record fields for RoCE
transport when RoCE support was added.

This results in an incorrect ndev, destination mac address, GID type
and similar errors when user space attempts to set a raw IB path using
the RoCE IB path compatibility mapping.

Fixes: 3c86aa70bf67 ("RDMA/cm: Add RDMA CM support for IBoE devices")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/{cma, ucma}: Simplify and rename rdma_set_ib_paths
Parav Pandit [Mon, 8 Jan 2018 15:04:45 +0000 (17:04 +0200)]
RDMA/{cma, ucma}: Simplify and rename rdma_set_ib_paths

Since 2006, no rdmacm based application has made use of setting
multiple path records through the rdma_set_ib_paths API.

Therefore the code is simplified to allow setting one path record entry.
Now that it sets only a single path, it is renamed to reflect that.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/cma: Provide a function to set RoCE path record L2 parameters
Parav Pandit [Mon, 8 Jan 2018 15:04:44 +0000 (17:04 +0200)]
RDMA/cma: Provide a function to set RoCE path record L2 parameters

Introduce a helper function to set path record L2 fields for RoCE.
This includes setting GID type, destination mac address and netdev
ifindex.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/cma: Use the right net namespace for the rdma_cm_id
Parav Pandit [Mon, 8 Jan 2018 15:04:43 +0000 (17:04 +0200)]
RDMA/cma: Use the right net namespace for the rdma_cm_id

The net namespace is set in addr during create_rdma_id(), so
cma_resolve_iboe_route() should use that instead of the
init namespace.

The original code was added in commit fa20105e09e9 ("IB/cma: Add support
for network namespaces"), but this path wasn't in use back then.

This patch updates the code to use the right namespace, as preparation
for improving namespace support.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/core: Increase number of char device minors
Huy Nguyen [Mon, 8 Jan 2018 10:15:38 +0000 (12:15 +0200)]
IB/core: Increase number of char device minors

The number of possible char devices needs to increase to support a
large number of SR-IOV instances. The current limit is in the range of
64-128 devices/ports. Increase it to support up to 1024.

The patch performs the following steps to refactor the code:
1. Remove the split bitmap for fixed and overflow dev numbers.
2. Pre-allocate the non-legacy major number range during driver
   initialization, chosen for simplicity.
3. Add a new define (RDMA_MAX_PORTS) that is shared between all drivers.
   This is the maximum total number of ports on all struct ib_devices.
4. Set RDMA_MAX_PORTS to 1024.
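
A rough userspace sketch of a single bitmap sized by RDMA_MAX_PORTS, in
place of separate fixed and overflow maps. The struct and helper names are
hypothetical, and the kernel's own bitmap helpers are not used here.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define RDMA_MAX_PORTS 1024
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define MAP_WORDS ((RDMA_MAX_PORTS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Hypothetical single map covering every device number. */
struct dev_map_sketch {
	unsigned long bits[MAP_WORDS];
};

/* Claim the lowest free device number, or return -1 if all are in use. */
static int dev_num_alloc(struct dev_map_sketch *m)
{
	size_t i;

	assert(m != NULL);
	for (i = 0; i < RDMA_MAX_PORTS; i++) {
		unsigned long mask = 1UL << (i % BITS_PER_WORD);

		if (!(m->bits[i / BITS_PER_WORD] & mask)) {
			m->bits[i / BITS_PER_WORD] |= mask;
			return (int)i;
		}
	}
	return -1;
}

/* Return a previously claimed device number to the map. */
static void dev_num_free(struct dev_map_sketch *m, int nr)
{
	size_t idx;

	assert(m != NULL && nr >= 0 && nr < RDMA_MAX_PORTS);
	idx = (size_t)nr;
	m->bits[idx / BITS_PER_WORD] &= ~(1UL << (idx % BITS_PER_WORD));
}
```

With one map there is no fallback path to a second range, so allocation and
release each touch a single data structure.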

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/core: Remove the locking for character device bitmaps
Huy Nguyen [Mon, 8 Jan 2018 10:15:37 +0000 (12:15 +0200)]
IB/core: Remove the locking for character device bitmaps

Remove the locks that protect character device bitmaps of
uverbs, umad and issm.

The character device bitmaps are accessed in "client->add" and
"client->remove" calls from ib_register_device and ib_unregister_device
respectively. These calls are already protected by the "device_mutex"
mutex. Thus, the spinlocks are not needed.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/rxe: Fix a race condition related to the QP error state
Bart Van Assche [Tue, 9 Jan 2018 19:23:40 +0000 (11:23 -0800)]
RDMA/rxe: Fix a race condition related to the QP error state

The following sequence:
* Change queue pair state into IB_QPS_ERR.
* Post a work request on the queue pair.

Triggers the following race condition in the rdma_rxe driver:
* rxe_qp_error() triggers an asynchronous call of rxe_completer(), the function
  that examines the QP send queue.
* rxe_post_send() posts a work request on the QP send queue.

If rxe_completer() runs prior to rxe_post_send(), it will drain the send
queue and the driver will assume no further action is necessary.
However, once we post the send to the send queue, because the queue is
in error, no send completion will ever happen and the send will get
stuck.  In order to process the send, we need to make sure that
rxe_completer() gets run after a send is posted to a queue pair in an
error state.  This patch ensures that happens.
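
The ordering problem can be modelled in a single thread. Everything below
is a hypothetical sketch rather than the rxe code: run_completer() stands
in for the asynchronous task, and the reschedule_on_error flag toggles the
behaviour described above.

```c
#include <assert.h>
#include <stddef.h>

enum qp_state_sketch { QPS_RTS, QPS_ERR };

/* Hypothetical model of a queue pair and its completer task. */
struct qp_sketch {
	enum qp_state_sketch state;
	int sq_pending;          /* work requests waiting on the send queue */
	int flushed;             /* work requests completed with a flush */
	int completer_scheduled; /* nonzero if the completer will run */
};

/* Move the QP into the error state and schedule one completer run. */
static void qp_error(struct qp_sketch *qp)
{
	assert(qp != NULL);
	qp->state = QPS_ERR;
	qp->completer_scheduled = 1;
}

/* Run the completer if scheduled; in the error state it drains the queue. */
static void run_completer(struct qp_sketch *qp)
{
	assert(qp != NULL);
	if (!qp->completer_scheduled)
		return;
	qp->completer_scheduled = 0;
	if (qp->state == QPS_ERR) {
		qp->flushed += qp->sq_pending;
		qp->sq_pending = 0;
	}
}

/* Post one send; optionally reschedule the completer on an errored QP. */
static void post_send(struct qp_sketch *qp, int reschedule_on_error)
{
	assert(qp != NULL);
	qp->sq_pending++;
	if (qp->state == QPS_ERR && reschedule_on_error)
		qp->completer_scheduled = 1;
}
```

Calling qp_error(), run_completer(), post_send(qp, 0) and run_completer()
in that order leaves one request pending forever; passing 1 instead lets
the second completer run flush it.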

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Moni Shoua <monis@mellanox.com>
Cc: <stable@vger.kernel.org> # v4.8
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/mlx5: remove redundant assignment of mdev
Colin Ian King [Tue, 9 Jan 2018 15:55:43 +0000 (15:55 +0000)]
IB/mlx5: remove redundant assignment of mdev

The initial assignment to mdev is redundant as mdev is re-assigned
later and the first assigned value is never read. Remove this
redundant assignment.

Cleans up clang warning:
drivers/infiniband/hw/mlx5/main.c:359:24: warning: Value stored
to 'mdev' during its initialization is never read

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/rxe: remove unnecessary skb_clone in xmit
Zhu Yanjun [Mon, 8 Jan 2018 05:14:25 +0000 (00:14 -0500)]
IB/rxe: remove unnecessary skb_clone in xmit

In xmit, there is a skb_clone. This function copies the struct sk_buff.
And some parameters are changed to the new skb. Then the new skb is sent
while the old skb is freed.

With skb_clone removed, the parameter changes are made on the old skb,
which is then sent. This also works well.

The following tests are made.

 server                       client
---------                    ---------
|1.1.1.1|<----rxe-channel--->|1.1.1.2|
---------                    ---------

On server: rping -s -a 1.1.1.1 -v -C 1000 -S 512
On client: rping -c -a 1.1.1.1 -v -C 1000 -S 512

The kernel config CONFIG_DEBUG_KMEMLEAK is enabled on both server
and client.

This test ran for several hours with no memory leak, and the whole
system kept working well.

On the same network as above, the following tests are made.

Server: ibv_rc_pingpong -d rxe0 -g 1
Client: ibv_rc_pingpong -d rxe0 -g 1 1.1.1.1

The result on Server.
Before:
8192000 bytes in 0.88 seconds = 74.36 Mbit/sec
1000 iters in 0.88 seconds = 881.30 usec/iter

After:
8192000 bytes in 0.81 seconds = 81.15 Mbit/sec
1000 iters in 0.81 seconds = 807.62 usec/iter

The throughput is enhanced and the latency is reduced.

CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Joe Jin <joe.jin@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/rxe: add the static type to the variable
Zhu Yanjun [Sun, 7 Jan 2018 12:08:48 +0000 (07:08 -0500)]
IB/rxe: add the static type to the variable

The variable recv_sockets is only used in the file rxe_net.c, so
it should be declared static.

CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Joe Jin <joe.jin@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoMerge branch 'bart-srpt-for-next' into k.o/wip/dl-for-next
Doug Ledford [Mon, 8 Jan 2018 21:06:20 +0000 (16:06 -0500)]
Merge branch 'bart-srpt-for-next' into k.o/wip/dl-for-next

Merging in a 12-patch series from Bart that required changes in the
current for-rc branch in order to apply cleanly.

Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/srpt: Micro-optimize I/O context state manipulation
Bart Van Assche [Mon, 8 Jan 2018 19:00:51 +0000 (11:00 -0800)]
IB/srpt: Micro-optimize I/O context state manipulation

Since all I/O context state changes are already serialized, it is
not necessary to protect I/O context state changes with the I/O
context spinlock. Hence remove that spinlock.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/srpt: Inline srpt_get_cmd_state()
Bart Van Assche [Mon, 8 Jan 2018 19:00:50 +0000 (11:00 -0800)]
IB/srpt: Inline srpt_get_cmd_state()

It is not necessary to obtain ioctx->spinlock when reading the ioctx
state. Since after removal of this locking only a single line remains,
inline the srpt_get_cmd_state() function.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/srpt: Introduce srpt_format_guid()
Bart Van Assche [Mon, 8 Jan 2018 19:00:49 +0000 (11:00 -0800)]
IB/srpt: Introduce srpt_format_guid()

Introduce a function for converting a GUID into an ASCII string. This
patch does not change any functionality.
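
A userspace sketch of such a conversion, assuming the GUID is given as
eight bytes in network order and printed as four colon-separated 16-bit
hex groups. format_guid_sketch() is a hypothetical name and the output
format is an assumption, not a copy of the driver function.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format an 8-byte GUID (network byte order) as "xxxx:xxxx:xxxx:xxxx".
 * Returns the snprintf() result: the length that would have been written,
 * excluding the terminating nul.
 */
static int format_guid_sketch(char *buf, size_t size, const uint8_t guid[8])
{
	assert(buf != NULL && guid != NULL);
	return snprintf(buf, size, "%04x:%04x:%04x:%04x",
			(unsigned int)guid[0] << 8 | guid[1],
			(unsigned int)guid[2] << 8 | guid[3],
			(unsigned int)guid[4] << 8 | guid[5],
			(unsigned int)guid[6] << 8 | guid[7]);
}
```

A 20-byte buffer is enough for the 19 characters plus the terminator.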

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/srpt: Reduce frequency of receive failure messages
Bart Van Assche [Mon, 8 Jan 2018 19:00:48 +0000 (11:00 -0800)]
IB/srpt: Reduce frequency of receive failure messages

Disabling an SRP target port causes the state of all QPs associated
with a port to be changed into IB_QPS_ERR. Prevent this from causing
one error message per I/O context to be reported.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/srpt: Convert a warning into a debug message
Bart Van Assche [Mon, 8 Jan 2018 19:00:47 +0000 (11:00 -0800)]
IB/srpt: Convert a warning into a debug message

At least when running the ib_srpt driver on top of the rdma_rxe
driver it is easy to trigger a zero-length write completion in
the CH_DISCONNECTED state. Hence make the message that reports
this less noisy.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/srpt: Use the IPv6 format for GIDs in log messages
Bart Van Assche [Mon, 8 Jan 2018 19:00:46 +0000 (11:00 -0800)]
IB/srpt: Use the IPv6 format for GIDs in log messages

Make the ib_srpt driver use the IPv6 format for GIDs in log messages
to improve consistency of this driver with other RDMA kernel drivers.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/srpt: Verify port numbers in srpt_event_handler()
Bart Van Assche [Mon, 8 Jan 2018 19:00:45 +0000 (11:00 -0800)]
IB/srpt: Verify port numbers in srpt_event_handler()

Verify whether port numbers are in the expected range before using
these as an array index. Complain if a port number is out of range.
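
A minimal sketch of such a range check, with hypothetical types and names.
RDMA port numbers are 1-based, so the index is port_num - 1, and unsigned
arithmetic lets one comparison reject both 0 and values that are too large.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define SKETCH_PORT_CNT 2 /* hypothetical number of physical ports */

struct port_sketch {
	int events_handled;
};

struct dev_sketch {
	struct port_sketch port[SKETCH_PORT_CNT];
};

/* Returns 0 on success, -1 if port_num is outside 1..SKETCH_PORT_CNT. */
static int handle_port_event(struct dev_sketch *d, unsigned int port_num)
{
	unsigned int idx;

	assert(d != NULL);
	idx = port_num - 1; /* port 0 wraps to UINT_MAX and fails the check */
	if (idx >= SKETCH_PORT_CNT) {
		fprintf(stderr, "event for port %u out of range 1..%d\n",
			port_num, SKETCH_PORT_CNT);
		return -1;
	}
	d->port[idx].events_handled++;
	return 0;
}
```

Doing the subtraction on an unsigned value avoids a separate test for
port 0.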

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/srpt: Reduce the severity level of a log message
Bart Van Assche [Mon, 8 Jan 2018 19:00:44 +0000 (11:00 -0800)]
IB/srpt: Reduce the severity level of a log message

Since the SRQ event message is only useful for debugging purposes,
reduce its severity from "informational" to "debug".

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/srpt: Rename a local variable, a member variable and a constant
Bart Van Assche [Mon, 8 Jan 2018 19:00:43 +0000 (11:00 -0800)]
IB/srpt: Rename a local variable, a member variable and a constant

Rename rsp_size into max_rsp_size and SRPT_RQ_SIZE into MAX_SRPT_RQ_SIZE.
The new names better reflect the role of this member variable and constant.
Since the prefix "srp_" is superfluous in the context of the function
that creates an RDMA channel, rename srp_sq_size into sq_size.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/srpt: Document all structure members in ib_srpt.h
Bart Van Assche [Mon, 8 Jan 2018 19:00:42 +0000 (11:00 -0800)]
IB/srpt: Document all structure members in ib_srpt.h

This patch ensures that the following command reports no warnings:

scripts/kernel-doc -none drivers/infiniband/ulp/srpt/ib_srpt.h

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/srpt: Fix kernel-doc warnings in ib_srpt.c
Bart Van Assche [Mon, 8 Jan 2018 19:00:41 +0000 (11:00 -0800)]
IB/srpt: Fix kernel-doc warnings in ib_srpt.c

Prevent warnings about missing parameter descriptions from being reported
when building with W=1.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/srpt: Remove an unused structure member
Bart Van Assche [Mon, 8 Jan 2018 19:00:40 +0000 (11:00 -0800)]
IB/srpt: Remove an unused structure member

Fixes: commit a42d985bd5b2 ("ib_srpt: Initial SRP Target merge for v3.3-rc1")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agonet/mlx5: Set num_vhca_ports capability
Daniel Jurgens [Thu, 4 Jan 2018 15:25:44 +0000 (17:25 +0200)]
net/mlx5: Set num_vhca_ports capability

Set the current capability to the max capability. Doing so enables dual
port RoCE functionality if supported by the firmware.

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/mlx5: Don't advertise RAW QP support in dual port mode
Daniel Jurgens [Thu, 4 Jan 2018 15:25:43 +0000 (17:25 +0200)]
IB/mlx5: Don't advertise RAW QP support in dual port mode

When operating in dual port RoCE mode, FW doesn't support steering for
raw QPs on the slave port. They still work on the master port, but
the user has no way of knowing which port is the master. The
capability is reported per device, not per port.

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/mlx5: Route MADs for dual port RoCE
Daniel Jurgens [Thu, 4 Jan 2018 15:25:42 +0000 (17:25 +0200)]
IB/mlx5: Route MADs for dual port RoCE

Route performance query MADs to the correct mlx5_core_dev when using
dual port RoCE mode.

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>