OSDN Git Service

uclinux-h8/linux.git
5 years agobridge: Switch to bitmap_zalloc()
Andy Shevchenko [Thu, 30 Aug 2018 10:33:18 +0000 (13:33 +0300)]
bridge: Switch to bitmap_zalloc()

Switch to bitmap_zalloc() to show clearly what we are allocating.
Besides that it returns pointer of bitmap type instead of opaque void *.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: mv88e6xxx: Share main switch IRQ
Marek Behún [Thu, 30 Aug 2018 00:13:50 +0000 (02:13 +0200)]
net: dsa: mv88e6xxx: Share main switch IRQ

On some boards the interrupt can be shared between multiple devices.
For example on Turris Mox the interrupt is shared between all switches.

Signed-off-by: Marek Behun <marek.behun@nic.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/ipv6: Do not reset nl_net in ip6_route_info_create
David Ahern [Wed, 29 Aug 2018 23:54:01 +0000 (16:54 -0700)]
net/ipv6: Do not reset nl_net in ip6_route_info_create

nl_net is set on entry to ip6_route_info_create. Only devices
within that namespace are considered so no need to reset it
before returning.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/ipv4: Add extack message that dev is required for ONLINK
David Ahern [Wed, 29 Aug 2018 23:53:27 +0000 (16:53 -0700)]
net/ipv4: Add extack message that dev is required for ONLINK

Make IPv4 consistent with IPv6 and return an extack message that the
ONLINK flag requires a nexthop device.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotcp: change IPv6 flow-label upon receiving spurious retransmission
Yuchung Cheng [Wed, 29 Aug 2018 21:53:56 +0000 (14:53 -0700)]
tcp: change IPv6 flow-label upon receiving spurious retransmission

Currently a Linux IPv6 TCP sender will change the flow label upon
timeouts to potentially steer away from a data path that has gone
bad. However this does not help if the problem is on the ACK path
and the data path is healthy. In this case the receiver is likely
to receive repeated spurious retransmission because the sender
couldn't get the ACKs in time and has recurring timeouts.

This patch adds another feature to mitigate this problem. It
leverages the DSACK states in the receiver to change the flow
label of the ACKs to speculatively re-route the ACK packets.
In order to allow triggering on the second consecutive spurious
RTO, the receiver changes the flow label upon sending a second
consecutive DSACK for a sequence number below RCV.NXT.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet_sched: add missing tcf_lock for act_connmark
Cong Wang [Wed, 29 Aug 2018 17:15:36 +0000 (10:15 -0700)]
net_sched: add missing tcf_lock for act_connmark

According to the new locking rule, we have to take tcf_lock
for both ->init() and ->dump(), as RTNL will be removed.
However, it is missing for act_connmark.

Cc: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoveth: add software timestamping
Michael Walle [Wed, 29 Aug 2018 15:24:11 +0000 (17:24 +0200)]
veth: add software timestamping

Provide a software TX timestamp as well as the ethtool query interface
and report the software timestamp capabilities.

Tested with "ethtool -T" and two linuxptp instances each bound to a
tunnel endpoint.

Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoRevert "net: sched: act: add extack for lookup callback"
Cong Wang [Wed, 29 Aug 2018 17:15:35 +0000 (10:15 -0700)]
Revert "net: sched: act: add extack for lookup callback"

This reverts commit 331a9295de23 ("net: sched: act: add extack for lookup callback").

This extack is never used after 6 months... In fact, it can be just
set in the caller, right after ->lookup().

Cc: Alexander Aring <aring@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
David S. Miller [Sat, 1 Sep 2018 00:41:08 +0000 (17:41 -0700)]
Merge git://git./linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2018-09-01

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Add AF_XDP zero-copy support for i40e driver (!), from Björn and Magnus.

2) BPF verifier improvements by giving each register its own liveness
   chain which allows to simplify and getting rid of skip_callee() logic,
   from Edward.

3) Add bpf fs pretty print support for percpu arraymap, percpu hashmap
   and percpu lru hashmap. Also add generic percpu formatted print on
   bpftool so the same can be dumped there, from Yonghong.

4) Add bpf_{set,get}sockopt() helper support for TCP_SAVE_SYN and
   TCP_SAVED_SYN options to allow reflection of tos/tclass from received
   SYN packet, from Nikita.

5) Misc improvements to the BPF sockmap test cases in terms of cgroup v2
   interaction and removal of incorrect shutdown() calls, from John.

6) Few cleanups in xdp_umem_assign_dev() and xdpsock samples, from Prashant.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoxsk: i40e: get rid of useless struct xdp_umem_props
Magnus Karlsson [Fri, 31 Aug 2018 11:40:02 +0000 (13:40 +0200)]
xsk: i40e: get rid of useless struct xdp_umem_props

This commit gets rid of the structure xdp_umem_props. It was there to
be able to break a dependency at one point, but this is no longer
needed. The values in the struct are instead stored directly in the
xdp_umem structure. This simplifies the xsk code as well as af_xdp
zero-copy drivers and as a bonus gets rid of one internal header file.

The i40e driver is also adapted to the new interface in this commit.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agoi40e: fix possible compiler warning in xsk TX path
Magnus Karlsson [Fri, 31 Aug 2018 11:40:01 +0000 (13:40 +0200)]
i40e: fix possible compiler warning in xsk TX path

With certain gcc versions, it was possible to get the warning
"'tx_desc' may be used uninitialized in this function" for the
i40e_xmit_zc. This was not possible, however this commit simplifies
the code path so that this warning is no longer emitted.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agobpf: add selftest for bpf's (set|get)_sockopt for SAVE_SYN
Nikita V. Shirokov [Fri, 31 Aug 2018 16:43:47 +0000 (09:43 -0700)]
bpf: add selftest for bpf's (set|get)_sockopt for SAVE_SYN

adding selftest for feature, introduced in commit 9452048c79404 ("bpf:
add TCP_SAVE_SYN/TCP_SAVED_SYN options for bpf_(set|get)sockopt").

Signed-off-by: Nikita V. Shirokov <tehnerd@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agosamples/bpf: xdpsock, minor fixes
Prashant Bhole [Fri, 31 Aug 2018 01:00:49 +0000 (10:00 +0900)]
samples/bpf: xdpsock, minor fixes

- xsks_map size was fixed to 4, changed it MAX_SOCKS
- Remove redundant definition of MAX_SOCKS in xdpsock_user.c
- In dump_stats(), add NULL check for xsks[i]

Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agoxsk: remove unnecessary assignment
Prashant Bhole [Fri, 31 Aug 2018 00:59:16 +0000 (09:59 +0900)]
xsk: remove unnecessary assignment

Since xdp_umem_query() was added one assignment of bpf.command was
missed from cleanup. Removing the assignment statement.

Fixes: 84c6b86875e01a0 ("xsk: don't allow umem replace at stack level")
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agobpf: add TCP_SAVE_SYN/TCP_SAVED_SYN sample program
Nikita V. Shirokov [Thu, 30 Aug 2018 14:51:54 +0000 (07:51 -0700)]
bpf: add TCP_SAVE_SYN/TCP_SAVED_SYN sample program

Sample program which shows TCP_SAVE_SYN/TCP_SAVED_SYN usage example:
bpf program which is doing TOS/TCLASS reflection (server would reply
with a same TOS/TCLASS as client).

Signed-off-by: Nikita V. Shirokov <tehnerd@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agobpf: add TCP_SAVE_SYN/TCP_SAVED_SYN options for bpf_(set|get)sockopt
Nikita V. Shirokov [Thu, 30 Aug 2018 14:51:53 +0000 (07:51 -0700)]
bpf: add TCP_SAVE_SYN/TCP_SAVED_SYN options for bpf_(set|get)sockopt

Adding support for two new bpf get/set sockopts: TCP_SAVE_SYN (set)
and TCP_SAVED_SYN (get). This would allow for bpf program to build
logic based on data from ingress SYN packet (e.g. doing tcp's tos/
tclass reflection (see sample prog)) and do it transparently from
userspace program point of view.

Signed-off-by: Nikita V. Shirokov <tehnerd@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agoxdp: remove redundant variable 'headroom'
Colin Ian King [Thu, 30 Aug 2018 14:27:18 +0000 (15:27 +0100)]
xdp: remove redundant variable 'headroom'

Variable 'headroom' is being assigned but is never used hence it is
redundant and can be removed.

Cleans up clang warning:
variable ‘headroom’ set but not used [-Wunused-but-set-variable]

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agoMerge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next...
David S. Miller [Fri, 31 Aug 2018 21:08:34 +0000 (14:08 -0700)]
Merge branch '40GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2018-08-30

This series contains updates to i40e, i40evf and virtchnl.

Jake implements helper functions to use an array to handle the queue
stats which reduces the boiler plate code as well as keep the complexity
localized to a few functions.

Paweł adds the ability to change a VF's MAC address from the host side
without having to reload the VF driver on the guest side.

Paul adds a check to ensure that the number of queues that the PF sends
to the VF is equal to or less than the maximum number of queues the VF
can support.

Mitch fixes an issue caught by GCC 8, where we need to not include the
terminating null in the length of the string for strncpy().

Lihong fixes a VF issue to ensure that it does not enter into
promiscuous mode when macvlan is added to the VF.  Fixed a potential
crash after a VF is removed, since the workqueue sync for the adminq
task was not being cancelled.

Harshitha fixes the type for field_flags in the virtchnl_filter struct.

Martyna removes an unnecessary check in a conditional if statement.

Björn fixes an issue reported by Jesper Dangaard Brouer, where the
driver was reporting incorrect statistics when XDP was enabled.

Jan fixes the potential reporting of incorrect speed settings.

Patryk fixed an issue where the flag
I40EVF_FLAG_AQ_ENABLE_VLAN_STRIPPING was getting set when any offload is
set via ethtool.  Resolved by only setting this flag when VLAN offload
is enabled.  Also ensure we hold the rtnl lock when we are clearing the
interrupt scheme.  Added a check when deleting the MAC address from the
VF to ensure that the MAC address was not set by the PF and if it was,
do not delete it.

v2: updated patch 2 in the series based on community feedback from David
    Miller to inline a function
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoi40e: Prevent deleting MAC address from VF when set by PF
Patryk Małek [Tue, 28 Aug 2018 17:16:09 +0000 (10:16 -0700)]
i40e: Prevent deleting MAC address from VF when set by PF

To prevent VF from deleting MAC address that was assigned by the
PF we need to check for that scenario when we try to delete a MAC
address from a VF.

Signed-off-by: Patryk Małek <patryk.malek@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoi40evf: cancel workqueue sync for adminq when a VF is removed
Lihong Yang [Tue, 28 Aug 2018 17:16:08 +0000 (10:16 -0700)]
i40evf: cancel workqueue sync for adminq when a VF is removed

If a VF is being removed, there is no need to continue with the
workqueue sync for the adminq task, thus cancel it. Without this call,
when VFs are created and removed right away, there might be a chance for
the driver to crash with events stuck in the adminq.

Signed-off-by: Lihong Yang <lihong.yang@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoi40e: hold the rtnl lock on clearing interrupt scheme
Patryk Małek [Tue, 28 Aug 2018 17:16:03 +0000 (10:16 -0700)]
i40e: hold the rtnl lock on clearing interrupt scheme

Hold the rtnl lock when we're clearing interrupt scheme
in i40e_shutdown and in i40e_remove.

Signed-off-by: Patryk Małek <patryk.malek@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoi40evf: Don't enable vlan stripping when rx offload is turned on
Patryk Małek [Tue, 28 Aug 2018 17:16:02 +0000 (10:16 -0700)]
i40evf: Don't enable vlan stripping when rx offload is turned on

With current implementation of i40evf_set_features when user sets
any offload via ethtool we set I40EVF_FLAG_AQ_ENABLE_VLAN_STRIPPING
as a required aq which triggers driver to call
i40evf_enable_vlan_stripping. This shouldn't take place.
This patches fixes it by setting the flag only when VLAN offload
is turned on.

Signed-off-by: Patryk Małek <patryk.malek@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoi40e: Check and correct speed values for link on open
Jan Sokolowski [Tue, 28 Aug 2018 17:16:01 +0000 (10:16 -0700)]
i40e: Check and correct speed values for link on open

If our card has been put in an unstable state due to
other drivers interacting with it, speed settings
might be incorrect. If incorrect, forcefully reset them
on open to known default values.

Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoi40e: report correct statistics when XDP is enabled
Björn Töpel [Fri, 24 Aug 2018 11:21:59 +0000 (13:21 +0200)]
i40e: report correct statistics when XDP is enabled

When XDP is enabled, the driver will report incorrect
statistics. Received frames will reported as transmitted frames.

This commits fixes the i40e implementation of ndo_get_stats64 (struct
net_device_ops), so that iproute2 will report correct statistics
(e.g. when running "ip -stats link show dev eth0") even when XDP is
enabled.

Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Fixes: 74608d17fe29 ("i40e: add support for XDP_TX action")
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoi40e: static analysis report from community
Martyna Szapar [Mon, 20 Aug 2018 15:12:33 +0000 (08:12 -0700)]
i40e: static analysis report from community

Static analysis tools report a problem from original driver submission.
Removing unnecessary check in condition.

Signed-off-by: Martyna Szapar <martyna.szapar@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agovirtchnl: use u8 type for a field in the virtchnl_filter struct
Harshitha Ramamurthy [Mon, 20 Aug 2018 15:12:32 +0000 (08:12 -0700)]
virtchnl: use u8 type for a field in the virtchnl_filter struct

The virtchnl_filter struct has a field called field_flags. A previous
commit mistakenly had the type to be a __u8. What we want is for the
field to be an unsigned 8 bit value, so let's just use the existing
kernel type u8 for that.

Signed-off-by: Harshitha Ramamurthy <harshitha.ramamurthy@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoi40evf: set IFF_UNICAST_FLT flag for the VF
Lihong Yang [Mon, 20 Aug 2018 15:12:31 +0000 (08:12 -0700)]
i40evf: set IFF_UNICAST_FLT flag for the VF

Set IFF_UNICAST_FLT flag for the VF to prevent it from entering
promiscuous mode when macvlan is added to the VF.

Signed-off-by: Lihong Yang <lihong.yang@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoi40e: use correct length for strncpy
Mitch Williams [Mon, 20 Aug 2018 15:12:30 +0000 (08:12 -0700)]
i40e: use correct length for strncpy

Caught by GCC 8. When we provide a length for strncpy, we should not
include the terminating null. So we must tell it one less than the size
of the destination buffer.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoi40evf: Validate the number of queues a PF sends
Paul M Stillwell Jr [Mon, 20 Aug 2018 15:12:29 +0000 (08:12 -0700)]
i40evf: Validate the number of queues a PF sends

A PF can send any number of queues to the VF and the VF may not
be able to support that many. Check to see that the number of
queues is less than or equal to the max number of queues the
VF can have.

Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoi40evf: Change a VF mac without reloading the VF driver
Paweł Jabłoński [Mon, 20 Aug 2018 15:12:26 +0000 (08:12 -0700)]
i40evf: Change a VF mac without reloading the VF driver

Add possibility to change a VF mac address from host side
without reloading the VF driver on the guest side. Without
this patch it is not possible to change the VF mac because
executing i40evf_virtchnl_completion function with
VIRTCHNL_OP_GET_VF_RESOURCES opcode resets the VF mac
address to previous value.

Signed-off-by: Paweł Jabłoński <pawel.jablonski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoi40evf: update ethtool stats code and use helper functions
Jacob Keller [Mon, 20 Aug 2018 15:12:25 +0000 (08:12 -0700)]
i40evf: update ethtool stats code and use helper functions

Fix a bug in the way we handled VF queues, by always showing stats for
the maximum number of queues, even if they aren't allocated. It is not
safe to change the number of strings reported to ethtool, as grabbing
statistics occurs over multiple ethtool ops for which the rtnl_lock()
cannot be held the entire time.

Avoid this by always reporting queue stats for the maximum number of
queues in the netdevice. Share some of the helper functionality for
adding stats with the PF code in i40e_ethtool_stats.h

This should reduce the chance of potential future bugs, and make adding
new statistics easier.

Note for the queue stats, unlike the PF driver we do not keep an array
of queue pointers, but an array of queues, so care must be taken to
avoid accessing queue memory that hasn't yet been allocated.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoi40e: move ethtool stats boiler plate code to i40e_ethtool_stats.h
Jacob Keller [Mon, 20 Aug 2018 15:12:24 +0000 (08:12 -0700)]
i40e: move ethtool stats boiler plate code to i40e_ethtool_stats.h

Move the boiler plate structures and helper functions we recently
added into their own header file, so that the complete collection is
located together.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoi40e: convert queue stats to i40e_stats array
Jacob Keller [Mon, 20 Aug 2018 15:12:23 +0000 (08:12 -0700)]
i40e: convert queue stats to i40e_stats array

Use an i40e_stats array to handle the queue stats, instead of coding
similar functionality separately. Because of how the queue stats are
accessed on some kernels, we can't easily use i40e_add_ethtool_stats.

Instead, implement a separate helper, i40e_add_queue_stats, which we'll
use instead. This helper will correctly implement the
u64_stats_fetch_begin_irq logic and allow retries until successful. We
share the most complex code by re-using i40e_add_one_ethtool_stat.

This logic additionally easily supports skipping disabled rings by using
a ternary operator before calling the u64_stats_fetch_begin_irq()
function, so that we correctly zero-out the stats values without having
to perform two separate sections of code.

This significantly reduces the boiler plate code in
i40e_get_ethtool_stats, and helps keep the complex logic contained to as
few functions as possible.

With this patch, we've finally converted all the statistics to use the
helpers and the i40e_stats function.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoxsk: include XDP meta data in AF_XDP frames
Björn Töpel [Thu, 30 Aug 2018 13:12:48 +0000 (15:12 +0200)]
xsk: include XDP meta data in AF_XDP frames

Previously, the AF_XDP (XDP_DRV/XDP_SKB copy-mode) ingress logic did
not include XDP meta data in the data buffers copied out to the user
application.

In this commit, we check if meta data is available, and if so, it is
prepended to the frame.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agoMerge branch 'bpf-bpffs-bpftool-dump-with-btf'
Daniel Borkmann [Thu, 30 Aug 2018 12:03:54 +0000 (14:03 +0200)]
Merge branch 'bpf-bpffs-bpftool-dump-with-btf'

Yonghong Song says:

====================
Commit a26ca7c982cb ("bpf: btf: Add pretty print support to the
basic arraymap") and Commit 699c86d6ec21 ("bpf: btf: add pretty print
for hash/lru_hash maps") added bpffs pretty print for array, hash and
lru hash maps. The pretty print gives users a structurally formatted
dump for keys/values which much easy to understand than raw bytes.

This patch set implemented bpffs pretty print support for
percpu arraymap, percpu hashmap and percpu lru hashmap.
For complex key/value types, the pretty print here is even more useful
due to:

  . large volumne of data making it even harder to correlate bytes
    to a particular field in a particular cpu.
  . kernel rounds the value size for each cpu to multiple of 8.
    User has to be aware of this otherwise wrong value may be
    derived from cpu 1/2/...

For example, we may have a bpffs pretty print like below:
   43602: {
        cpu0: {43602,0,-43602,0x3,0xaa52,0x3,{43602|[82,170,0,0,0,0,0,0]},ENUM_TWO}
        cpu1: {43602,0,-43602,0x3,0xaa52,0x3,{43602|[82,170,0,0,0,0,0,0]},ENUM_TWO}
        cpu2: {43602,0,-43602,0x3,0xaa52,0x3,{43602|[82,170,0,0,0,0,0,0]},ENUM_TWO}
        cpu3: {43602,0,-43602,0x3,0xaa52,0x3,{43602|[82,170,0,0,0,0,0,0]},ENUM_TWO}
   }
for a percpu map.

This patch also added percpu formatted print on bpftool. For example,
bpftool may print like below:
    {
        "key": 0,
        "values": [{
                "cpu": 0,
                "value": {
                    "ui32": 0,
                    "ui16": 0,
                }
            },{
                "cpu": 1,
                "value": {
                    "ui32": 1,
                    "ui16": 0,
                }
            },{
                "cpu": 2,
                "value": {
                    "ui32": 2,
                    "ui16": 0,
                }
            },{
                "cpu": 3,
                "value": {
                    "ui32": 3,
                    "ui16": 0,
                }
            }
        ]
    }

Patch #1 implemented bpffs pretty print for percpu arraymap/hash/lru_hash
in kernel. Patch #2 added the test case in tools bpf selftest test_btf.
Patch #3 added percpu map btf based dump.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agotools/bpf: bpftool: add btf percpu map formated dump
Yonghong Song [Wed, 29 Aug 2018 21:43:15 +0000 (14:43 -0700)]
tools/bpf: bpftool: add btf percpu map formated dump

The btf pretty print is added to percpu arraymap,
percpu hashmap and percpu lru hashmap.
For each <key, value> pair, the following will be
added to plain/json output:

   {
       "key": <pretty_print_key>,
       "values": [{
             "cpu": 0,
             "value": <pretty_print_value_on_cpu0>
          },{
             "cpu": 1,
             "value": <pretty_print_value_on_cpu1>
          },{
          ....
          },{
             "cpu": n,
             "value": <pretty_print_value_on_cpun>
          }
       ]
   }

For example, the following could be part of plain or json formatted
output:
    {
        "key": 0,
        "values": [{
                "cpu": 0,
                "value": {
                    "ui32": 0,
                    "ui16": 0,
                }
            },{
                "cpu": 1,
                "value": {
                    "ui32": 1,
                    "ui16": 0,
                }
            },{
                "cpu": 2,
                "value": {
                    "ui32": 2,
                    "ui16": 0,
                }
            },{
                "cpu": 3,
                "value": {
                    "ui32": 3,
                    "ui16": 0,
                }
            }
        ]
    }

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agotools/bpf: add bpffs percpu map pretty print tests in test_btf
Yonghong Song [Wed, 29 Aug 2018 21:43:14 +0000 (14:43 -0700)]
tools/bpf: add bpffs percpu map pretty print tests in test_btf

The bpf selftest test_btf is extended to test bpffs
percpu map pretty print for percpu array, percpu hash and
percpu lru hash.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agobpf: add bpffs pretty print for percpu arraymap/hash/lru_hash
Yonghong Song [Wed, 29 Aug 2018 21:43:13 +0000 (14:43 -0700)]
bpf: add bpffs pretty print for percpu arraymap/hash/lru_hash

Added bpffs pretty print for percpu arraymap, percpu hashmap
and percpu lru hashmap.

For each map <key, value> pair, the format is:
   <key_value>: {
cpu0: <value_on_cpu0>
cpu1: <value_on_cpu1>
...
cpun: <value_on_cpun>
   }

For example, on my VM, there are 4 cpus, and
for test_btf test in the next patch:
   cat /sys/fs/bpf/pprint_test_percpu_hash

You may get:
   ...
   43602: {
cpu0: {43602,0,-43602,0x3,0xaa52,0x3,{43602|[82,170,0,0,0,0,0,0]},ENUM_TWO}
cpu1: {43602,0,-43602,0x3,0xaa52,0x3,{43602|[82,170,0,0,0,0,0,0]},ENUM_TWO}
cpu2: {43602,0,-43602,0x3,0xaa52,0x3,{43602|[82,170,0,0,0,0,0,0]},ENUM_TWO}
cpu3: {43602,0,-43602,0x3,0xaa52,0x3,{43602|[82,170,0,0,0,0,0,0]},ENUM_TWO}
   }
   72847: {
cpu0: {72847,0,-72847,0x3,0x11c8f,0x3,{72847|[143,28,1,0,0,0,0,0]},ENUM_THREE}
cpu1: {72847,0,-72847,0x3,0x11c8f,0x3,{72847|[143,28,1,0,0,0,0,0]},ENUM_THREE}
cpu2: {72847,0,-72847,0x3,0x11c8f,0x3,{72847|[143,28,1,0,0,0,0,0]},ENUM_THREE}
cpu3: {72847,0,-72847,0x3,0x11c8f,0x3,{72847|[143,28,1,0,0,0,0,0]},ENUM_THREE}
   }
   ...

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agoMerge tag 'mac80211-next-for-davem-2018-08-29' of git://git.kernel.org/pub/scm/linux...
David S. Miller [Thu, 30 Aug 2018 05:13:47 +0000 (22:13 -0700)]
Merge tag 'mac80211-next-for-davem-2018-08-29' of git://git./linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
Only a few changes at this point:
 * new channels in 60 GHz
 * clarify (average) ACK signal reporting API
 * expose ieee80211_send_layer2_update() for all drivers
 * start/stop mac80211's TXQs properly when required
 * avoid regulatory restore with IE ignoring
 * spelling: contidion -> condition
 * fully implement WFA Multi-AP backhaul
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agovxlan: reduce dirty cache line in vxlan_find_mac
Li RongQing [Wed, 29 Aug 2018 03:52:10 +0000 (11:52 +0800)]
vxlan: reduce dirty cache line in vxlan_find_mac

vxlan_find_mac() unconditionally set f->used for every packet,
this causes a cache miss for every packet, since remote, hlist
and used of vxlan_fdb share the same cache line, which are
accessed when send every packets.

so f->used is set only if not equal to jiffies, to reduce dirty
cache line times, this gives 3% speed-up with small packets.

Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'liquidio-improve-soft-command-response-handling'
David S. Miller [Thu, 30 Aug 2018 03:07:42 +0000 (20:07 -0700)]
Merge branch 'liquidio-improve-soft-command-response-handling'

Weilin Chang says:

====================
liquidio: improve soft command/response handling

Change soft command handling to fix the possible race condition when the
process handles a response of a soft command that was already freed by an
application which got timeout for this request.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoliquidio: remove obsolete functions and data structures
Felix Manlunas [Wed, 29 Aug 2018 01:51:44 +0000 (18:51 -0700)]
liquidio: remove obsolete functions and data structures

1. Remove unused functions and data structures.
2. Change the sending of the remaining soft commands to synchronous.

Signed-off-by: Weilin Chang <weilin.chang@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoliquidio: change octnic_ctrl_pkt to do synchronous soft commands
Felix Manlunas [Wed, 29 Aug 2018 01:51:40 +0000 (18:51 -0700)]
liquidio: change octnic_ctrl_pkt to do synchronous soft commands

1. Change struct octnic_ctrl_pkt to support synchronous operation.
2. Change code which use structure octnic_ctrl_pkt to send sc's
   synchronously.

Signed-off-by: Weilin Chang <weilin.chang@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoliquidio: make soft command calls synchronous
Felix Manlunas [Wed, 29 Aug 2018 01:51:35 +0000 (18:51 -0700)]
liquidio: make soft command calls synchronous

1. Add wait_for_sc_completion_timeout() for waiting the response and
   handling common response errors
2. Send sc's synchronously: remove unused callback function,
   and context structure; use wait_for_sc_completion_timeout() to wait
   its response.

Signed-off-by: Weilin Chang <weilin.chang@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoliquidio: improve soft command handling
Felix Manlunas [Wed, 29 Aug 2018 01:51:30 +0000 (18:51 -0700)]
liquidio: improve soft command handling

1. Set LIO_SC_MAX_TMO_MS as the maximum timeout value for a soft command
   (sc).  All sc's use this value as a hard timeout value. Add expiry_time
   in struct octeon_soft_command to keep the hard timeout value. The field
   wait_time and timeout in struct octeon_soft_command will be obsoleted in
   the last patch of this patch series.
2. Add processing a synchronous sc in sc response thread
   lio_process_ordered_list. The memory allocated for a synchronous sc will
   be freed by lio_process_ordered_list() to the sc pool.
3. Add two response lists for lio_process_ordered_list to process the
   storage allocated for sc's:
   OCTEON_DONE_SC_LIST response list keeps all sc's which will be freed to
   the pool after their requestors have finished processing the responses.
   OCTEON_ZOMBIE_SC_LIST response list keeps all sc's which have got
   LIO_SC_MAX_TMO_MS timeout.
   When an sc gets a hard timeout, lio_process_order_list() will recheck
   its status 1 ms later. If the status has not updated by the firmware at
   that time, the sc will be removed from OCTEON_DONE_SC_LIST response list
   to OCTEON_ZOMBIE_SC_LIST response list. The sc's in the
   OCTEON_ZOMBIE_SC_LIST response list will be freed when the driver is
   unloaded.

Signed-off-by: Weilin Chang <weilin.chang@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/tls: Calculate nsg for zerocopy path without skb_cow_data.
Doron Roberts-Kedes [Tue, 28 Aug 2018 23:33:57 +0000 (16:33 -0700)]
net/tls: Calculate nsg for zerocopy path without skb_cow_data.

decrypt_skb fails if the number of sg elements required to map it
is greater than MAX_SKB_FRAGS. nsg must always be calculated, but
skb_cow_data adds unnecessary memcpy's for the zerocopy case.

The new function skb_nsg calculates the number of scatterlist elements
required to map the skb without the extra overhead of skb_cow_data.
This patch reduces memcpy by 50% on my encrypted NBD benchmarks.

Reported-by: Vakul Garg <Vakul.garg@nxp.com>
Reviewed-by: Vakul Garg <Vakul.garg@nxp.com>
Tested-by: Vakul Garg <Vakul.garg@nxp.com>
Signed-off-by: Doron Roberts-Kedes <doronrk@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: nixge: Add support for 64-bit platforms
Moritz Fischer [Tue, 28 Aug 2018 22:16:31 +0000 (15:16 -0700)]
net: nixge: Add support for 64-bit platforms

Add support for 64-bit platforms to driver.

The hardware only supports 32-bit register accesses
so the accesses need to be split up into two writes
when setting the current and tail descriptor values.

Signed-off-by: Moritz Fischer <mdf@kernel.org>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests/net: add ip_defrag selftest
Peter Oskolkov [Tue, 28 Aug 2018 18:36:20 +0000 (11:36 -0700)]
selftests/net: add ip_defrag selftest

This test creates a raw IPv4 socket, fragments a largish UDP
datagram and sends the fragments out of order.

Then repeats in a loop with different message and fragment lengths.

Then does the same with overlapping fragments (with overlapping
fragments the expectation is that the recv times out).

Tested:

root@<host># time ./ip_defrag.sh
ipv4 defrag
PASS
ipv4 defrag with overlaps
PASS

real    1m7.679s
user    0m0.628s
sys     0m2.242s

A similar test for IPv6 is to follow.

Signed-off-by: Peter Oskolkov <posk@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoip: fail fast on IP defrag errors
Peter Oskolkov [Tue, 28 Aug 2018 18:36:19 +0000 (11:36 -0700)]
ip: fail fast on IP defrag errors

The current behavior of IP defragmentation is inconsistent:
- some overlapping/wrong length fragments are dropped without
  affecting the queue;
- most overlapping fragments cause the whole frag queue to be dropped.

This patch brings consistency: if a bad fragment is detected,
the whole frag queue is dropped. Two major benefits:
- fail fast: corrupted frag queues are cleared immediately, instead of
  by timeout;
- testing of overlapping fragments is now much easier: any kind of
  random fragment length mutation now leads to the frag queue being
  discarded (IP packet dropped); before this patch, some overlaps were
  "corrected", with tests not seeing expected packet drops.

Note that in one case (see "if (end&7)" conditional) the current
behavior is preserved as there are concerns that this could be
legitimate padding.

Signed-off-by: Peter Oskolkov <posk@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoliquidio: fix race condition in instruction completion processing
Rick Farrington [Tue, 28 Aug 2018 18:32:55 +0000 (11:32 -0700)]
liquidio: fix race condition in instruction completion processing

In lio_enable_irq, the pkt_in_done count register was being cleared to
zero.  However, there could be some completed instructions which were not
yet processed due to budget and limit constraints.
So, only write this register with the number of actual completions
that were processed.

Signed-off-by: Rick Farrington <ricardo.farrington@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoliquidio: remove unnecessary delay when processing IQ responses
Rick Farrington [Tue, 28 Aug 2018 18:19:54 +0000 (11:19 -0700)]
liquidio: remove unnecessary delay when processing IQ responses

Signed-off-by: Rick Farrington <ricardo.farrington@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'ethtool-drop-get_settings-and-set_settings-ops'
David S. Miller [Thu, 30 Aug 2018 02:46:11 +0000 (19:46 -0700)]
Merge branch 'ethtool-drop-get_settings-and-set_settings-ops'

Michal Kubecek says:

====================
ethtool: drop get_settings and set_settings ops

As Andrew Lunn pointed out in recent discussion, there is only one in tree
driver left which still defines deprecated callbacks get_settings() and
set_settings() in ethtool_ops. First patch converts this driver to
get_link_ksettings() and set_link_ksettings(). Second patch then removes
the deprecated callbacks from struct ethtool_ops and ethtool code which
falls back to them.

This doesn't break old versions of ethtool or any other userspace code
using ETHTOOL_{G,S}SET. We still implement both (old) ETHTOOL_{G,S}SET and
(new) ETHTOOL_{G,S}LINKSETTINGS ioctl commands but after this series both
will be implemented only using {g,s}et_link_ksettings(). The only affected
code would be out of tree NIC drivers which have not been converted yet.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoethtool: drop get_settings and set_settings callbacks
Michal Kubecek [Tue, 28 Aug 2018 17:56:58 +0000 (19:56 +0200)]
ethtool: drop get_settings and set_settings callbacks

Since [gs]et_settings ethtool_ops callbacks have been deprecated in
February 2016, all in tree NIC drivers have been converted to provide
[gs]et_link_ksettings() and out of tree drivers have had enough time to do
the same.

Drop get_settings() and set_settings() and implement both ETHTOOL_[GS]SET
and ETHTOOL_[GS]LINKSETTINGS only using [gs]et_link_ksettings().

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago8390/etherh: convert to ethtool_{get, set}_link_ksettings
Michal Kubecek [Tue, 28 Aug 2018 17:56:53 +0000 (19:56 +0200)]
8390/etherh: convert to ethtool_{get, set}_link_ksettings

This is the last in-tree driver using the old {get,set}_settings API.

Note: this is only build tested. I don't have the hardware at hand; as it's
10Mb/s half duplex device and driver can be built only for one subplatform
of 32-bit ARM (Acorn RiscPC), it may be difficult to find someone who does.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: thunderbolt: Convert to use SPDX identifier
Mika Westerberg [Tue, 28 Aug 2018 16:58:43 +0000 (19:58 +0300)]
net: thunderbolt: Convert to use SPDX identifier

This gets rid of the licence boilerblate in favor of SPDX identifier
which only takes a single line comment.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agogenetlink: constify genl_err_attr() argument
Michal Kubecek [Tue, 28 Aug 2018 16:51:58 +0000 (18:51 +0200)]
genetlink: constify genl_err_attr() argument

genl_err_attr() sets netlink_ext_ack::bad_attr which is a pointer to const
struct nlattr so make the attr argument also const.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ethernet: Convert to using %pOFn instead of device_node.name
Rob Herring [Tue, 28 Aug 2018 15:44:30 +0000 (10:44 -0500)]
net: ethernet: Convert to using %pOFn instead of device_node.name

In preparation to remove the node name pointer from struct device_node,
convert printf users to use the %pOFn format specifier.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Yisen Zhuang <yisen.zhuang@huawei.com>
Cc: Salil Mehta <salil.mehta@huawei.com>
Cc: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Cc: Felix Fietkau <nbd@openwrt.org>
Cc: John Crispin <john@phrozen.org>
Cc: Sean Wang <sean.wang@mediatek.com>
Cc: Nelson Chang <nelson.chang@mediatek.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Wingman Kwok <w-kwok2@ti.com>
Cc: Murali Karicheri <m-karicheri2@ti.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Acked-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/ncsi: remove duplicated include from ncsi-netlink.c
YueHaibing [Thu, 30 Aug 2018 01:29:39 +0000 (01:29 +0000)]
net/ncsi: remove duplicated include from ncsi-netlink.c

Remove duplicated include.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'verifier-liveness-simplification'
Alexei Starovoitov [Thu, 30 Aug 2018 01:52:13 +0000 (18:52 -0700)]
Merge branch 'verifier-liveness-simplification'

Edward Cree says:

====================
The first patch is a simplification of register liveness tracking by using
 a separate parentage chain for each register and stack slot, thus avoiding
 the need for logic to handle callee-saved registers when applying read
 marks.  In the future this idea may be extended to form use-def chains.
The second patch adds information about misc/zero data on the stack to the
 state dumps emitted to the log at various points; this information was
 found essential in debugging the first patch, and may be useful elsewhere.
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 years agobpf/verifier: display non-spill stack slot types in print_verifier_state
Edward Cree [Wed, 22 Aug 2018 19:02:44 +0000 (20:02 +0100)]
bpf/verifier: display non-spill stack slot types in print_verifier_state

If a stack slot does not hold a spilled register (STACK_SPILL), then each
 of its eight bytes could potentially have a different slot_type.  This
 information can be important for debugging, and previously we either did
 not print anything for the stack slot, or just printed fp-X=0 in the case
 where its first byte was STACK_ZERO.
Instead, print eight characters with either 0 (STACK_ZERO), m (STACK_MISC)
 or ? (STACK_INVALID) for any stack slot which is neither STACK_SPILL nor
 entirely STACK_INVALID.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 years agobpf/verifier: per-register parent pointers
Edward Cree [Wed, 22 Aug 2018 19:02:19 +0000 (20:02 +0100)]
bpf/verifier: per-register parent pointers

By giving each register its own liveness chain, we elide the skip_callee()
 logic.  Instead, each register's parent is the state it inherits from;
 both check_func_call() and prepare_func_exit() automatically connect
 reg states to the correct chain since when they copy the reg state across
 (r1-r5 into the callee as args, and r0 out as the return value) they also
 copy the parent pointer.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 years agoMerge branch 'AF_XDP-zerocopy-for-i40e'
Alexei Starovoitov [Wed, 29 Aug 2018 19:26:39 +0000 (12:26 -0700)]
Merge branch 'AF_XDP-zerocopy-for-i40e'

Björn Töpel says:

====================
This patch set introduces zero-copy AF_XDP support for Intel's i40e
driver. In the first preparatory patch we also add support for
XDP_REDIRECT for zero-copy allocated frames so that XDP programs can
redirect them. This was a ToDo from the first AF_XDP zero-copy patch
set from early June. Special thanks to Alex Duyck and Jesper Dangaard
Brouer for reviewing earlier versions of this patch set.

The i40e zero-copy code is located in its own file i40e_xsk.[ch]. Note
that in the interest of time, to get an AF_XDP zero-copy implementation
out there for people to try, some code paths have been copied from the
XDP path to the zero-copy path. It is out goal to merge the two paths
in later patch sets.

In contrast to the implementation from beginning of June, this patch
set does not require any extra HW queues for AF_XDP zero-copy
TX. Instead, the XDP TX HW queue is used for both XDP_REDIRECT and
AF_XDP zero-copy TX.

Jeff, given that most of changes are in i40e, it is up to you how you
would like to route these patches. The set is tagged bpf-next, but
if taking it via the Intel driver tree is easier, let us know.

We have run some benchmarks on a dual socket system with two Broadwell
E5 2660 @ 2.0 GHz with hyperthreading turned off. Each socket has 14
cores which gives a total of 28, but only two cores are used in these
experiments. One for TR/RX and one for the user space application. The
memory is DDR4 @ 2133 MT/s (1067 MHz) and the size of each DIMM is
8192MB and with 8 of those DIMMs in the system we have 64 GB of total
memory. The compiler used is gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0. The
NIC is Intel I40E 40Gbit/s using the i40e driver.

Below are the results in Mpps of the I40E NIC benchmark runs for 64
and 1500 byte packets, generated by a commercial packet generator HW
outputing packets at full 40 Gbit/s line rate. The results are with
retpoline and all other spectre and meltdown fixes, so these results
are not comparable to the ones from the zero-copy patch set in June.

AF_XDP performance 64 byte packets.
Benchmark   XDP_SKB    XDP_DRV    XDP_DRV with zerocopy
rxdrop       2.6        8.2         15.0
txpush       2.2        -           21.9
l2fwd        1.7        2.3         11.3

AF_XDP performance 1500 byte packets:
Benchmark   XDP_SKB   XDP_DRV     XDP_DRV with zerocopy
rxdrop       2.0        3.3         3.3
l2fwd        1.3        1.7         3.1

XDP performance on our system as a base line:

64 byte packets:
XDP stats       CPU     pps         issue-pps
XDP-RX CPU      16      18.4M  0

1500 byte packets:
XDP stats       CPU     pps         issue-pps
XDP-RX CPU      16      3.3M    0

The structure of the patch set is as follows:

Patch 1: Add support for XDP_REDIRECT of zero-copy allocated frames
Patches 2-4: Preparatory patches to common xsk and net code
Patches 5-7: Preparatory patches to i40e driver code for RX
Patch 8: i40e zero-copy support for RX
Patch 9: Preparatory patch to i40e driver code for TX
Patch 10: i40e zero-copy support for TX
Patch 11: Add flags to sample application to force zero-copy/copy mode
====================

Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 years agosamples/bpf: add -c/--copy -z/--zero-copy flags to xdpsock
Björn Töpel [Tue, 28 Aug 2018 12:44:35 +0000 (14:44 +0200)]
samples/bpf: add -c/--copy -z/--zero-copy flags to xdpsock

The -c/--copy -z/--zero-copy flags enforces either copy or zero-copy
mode.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 years agoi40e: add AF_XDP zero-copy Tx support
Magnus Karlsson [Tue, 28 Aug 2018 12:44:34 +0000 (14:44 +0200)]
i40e: add AF_XDP zero-copy Tx support

This patch adds zero-copy Tx support for AF_XDP sockets. It implements
the ndo_xsk_async_xmit netdev ndo and performs all the Tx logic from a
NAPI context. This means pulling egress packets from the Tx ring,
placing the frames on the NIC HW descriptor ring and completing sent
frames back to the application via the completion ring.

The regular XDP Tx ring is used for AF_XDP as well. This rationale for
this is as follows: XDP_REDIRECT guarantees mutual exclusion between
different NAPI contexts based on CPU id. In other words, a netdev can
XDP_REDIRECT to another netdev with a different NAPI context, since
the operation is bound to a specific core and each core has its own
hardware ring.

As the AF_XDP Tx action is running in the same NAPI context and using
the same ring, it will also be protected from XDP_REDIRECT actions
with the exact same mechanism.

As with AF_XDP Rx, all AF_XDP Tx specific functions are added to
i40e_xsk.c.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 years agoi40e: move common Tx functions to i40e_txrx_common.h
Magnus Karlsson [Tue, 28 Aug 2018 12:44:33 +0000 (14:44 +0200)]
i40e: move common Tx functions to i40e_txrx_common.h

This patch prepares for the upcoming zero-copy Tx functionality, by
moving common functions and refactor chunks of code into re-usable
functions, used both by the regular path and zero-copy path.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 years agoi40e: add AF_XDP zero-copy Rx support
Björn Töpel [Tue, 28 Aug 2018 12:44:32 +0000 (14:44 +0200)]
i40e: add AF_XDP zero-copy Rx support

This patch adds zero-copy Rx support for AF_XDP sockets. Instead of
allocating buffers of type MEM_TYPE_PAGE_SHARED, the Rx frames are
allocated as MEM_TYPE_ZERO_COPY when AF_XDP is enabled for a certain
queue.

All AF_XDP specific functions are added to a new file, i40e_xsk.c.

Note that when AF_XDP zero-copy is enabled, the XDP action XDP_PASS
will allocate a new buffer and copy the zero-copy frame prior passing
it to the kernel stack.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 years agoi40e: move common Rx functions to i40e_txrx_common.h
Björn Töpel [Tue, 28 Aug 2018 12:44:31 +0000 (14:44 +0200)]
i40e: move common Rx functions to i40e_txrx_common.h

This patch prepares for the upcoming zero-copy Rx functionality, by
moving/changing linkage of common functions, used both by the regular
path and zero-copy path.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 years agoi40e: refactor Rx path for re-use
Björn Töpel [Tue, 28 Aug 2018 12:44:30 +0000 (14:44 +0200)]
i40e: refactor Rx path for re-use

In this commit, the Rx path is refactored some, as a step torwards the
introduction AF_XDP Rx zero-copy.

The page re-use counter is moved into the i40e_reuse_rx_page, instead
of bumping the counter in many places. The Rx buffer page clearing is
moved for better readability. Lastely, functions to update statistics
and bump the XDP Tx ring are introduced.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 years agoi40e: added queue pair disable/enable functions
Björn Töpel [Tue, 28 Aug 2018 12:44:29 +0000 (14:44 +0200)]
i40e: added queue pair disable/enable functions

Add functions for queue pair enable/disable. Instead of resetting the
whole device, only the affected queue pair is disabled or enabled.

This plumbing is used in a later commit, when zero-copy AF_XDP support
is introduced.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 years agonet: add napi_if_scheduled_mark_missed
Magnus Karlsson [Tue, 28 Aug 2018 12:44:28 +0000 (14:44 +0200)]
net: add napi_if_scheduled_mark_missed

The function napi_if_scheduled_mark_missed is used to check if the
NAPI context is scheduled, if so set NAPIF_STATE_MISSED and return
true. Used by the AF_XDP zero-copy i40e Tx code implementation in
order to make sure that irq affinity is honored by the napi context.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 years agoxsk: expose xdp_umem_get_{data,dma} to drivers
Björn Töpel [Tue, 28 Aug 2018 12:44:27 +0000 (14:44 +0200)]
xsk: expose xdp_umem_get_{data,dma} to drivers

Move the xdp_umem_get_{data,dma} functions to include/net/xdp_sock.h,
so that the upcoming zero-copy implementation in the Ethernet drivers
can utilize them.

Also, supply some dummy function implementations for
CONFIG_XDP_SOCKETS=n configs.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 years agoxdp: export xdp_rxq_info_unreg_mem_model
Björn Töpel [Tue, 28 Aug 2018 12:44:26 +0000 (14:44 +0200)]
xdp: export xdp_rxq_info_unreg_mem_model

Export __xdp_rxq_info_unreg_mem_model as xdp_rxq_info_unreg_mem_model,
so it can be used from netdev drivers. Also, add additional checks for
the memory type.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 years agoxdp: implement convert_to_xdp_frame for MEM_TYPE_ZERO_COPY
Björn Töpel [Tue, 28 Aug 2018 12:44:25 +0000 (14:44 +0200)]
xdp: implement convert_to_xdp_frame for MEM_TYPE_ZERO_COPY

This commit adds proper MEM_TYPE_ZERO_COPY support for
convert_to_xdp_frame. Converting a MEM_TYPE_ZERO_COPY xdp_buff to an
xdp_frame is done by transforming the MEM_TYPE_ZERO_COPY buffer into a
MEM_TYPE_PAGE_ORDER0 frame. This is costly, and in the future it might
make sense to implement a more sophisticated thread-safe alloc/free
scheme for MEM_TYPE_ZERO_COPY, so that no allocation and copy is
required in the fast-path.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 years agobpf: use --cgroup in test_suite if supplied
John Fastabend [Tue, 28 Aug 2018 16:10:50 +0000 (09:10 -0700)]
bpf: use --cgroup in test_suite if supplied

If the user supplies a --cgroup value in the arguments when running
the test_suite go ahaead and run the self tests there. I use this
to test with multiple cgroup users.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agobpf: sockmap test remove shutdown() calls
John Fastabend [Tue, 28 Aug 2018 16:10:45 +0000 (09:10 -0700)]
bpf: sockmap test remove shutdown() calls

Currently, we do a shutdown(sk, SHUT_RDWR) on both peer sockets and
a shutdown on the sender as well. However, this is incorrect and can
occasionally cause issues if you happen to have bad timing. First
peer1 or peer2 may still be in use depending on the test and timing.
Second we really should only be closing the read side and/or write
side depending on if the test is receiving or sending.

But, really none of this is needed just remove the shutdown calls.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agobpf: remove duplicated include from syscall.c
YueHaibing [Tue, 28 Aug 2018 07:42:32 +0000 (07:42 +0000)]
bpf: remove duplicated include from syscall.c

Remove duplicated include.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agocfg80211: clarify frames covered by average ACK signal report
Balaji Pothunoori [Thu, 19 Jul 2018 13:26:27 +0000 (18:56 +0530)]
cfg80211: clarify frames covered by average ACK signal report

Modify the API to include all ACK frames in average ACK
signal strength reporting, not just ACKs for data frames.
Make exposing the data conditional on implementing the
extended feature flag.

This is how it was really implemented in mac80211, update
the code there to use the new defines and clean up some of
the setting code.

Keep nl80211.h source compatibility by keeping the old names.

Signed-off-by: Balaji Pothunoori <bpothuno@codeaurora.org>
[rewrite commit log, change compatibility to be old=new
 instead of the other way around, update kernel-doc,
 roll in mac80211 changes, make mac80211 depend on valid
 bit instead of HW flag]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
5 years agomac80211: add missing WFA Multi-AP backhaul STA Rx requirement
Sathishkumar Muruganandam [Tue, 28 Aug 2018 12:21:38 +0000 (17:51 +0530)]
mac80211: add missing WFA Multi-AP backhaul STA Rx requirement

The current mac80211 WDS (4-address mode) can be used to cover most of the
Multi-AP requirements for Data frames per the WFA Multi-AP Specification v1.0.
When configuring AP/STA interfaces in 4-address mode, they are able to function
as fronthaul AP/backhaul STA of Multi-AP device complying below
Tx, Rx requirements except one missing STA Rx requirement added by this patch.

Multi-AP specification section 14.1 describes the following requirements:

Transmitter requirements
------------------------
1. Fronthaul AP
        i) When DA!=RA of backhaul STA, must use 4-address format
        ii) When DA==RA of backhaul STA, shall use either 3-address
            or 4-address format with RA updated with STA MAC

            (mac80211 support 4-address format via AP/VLAN interface)

2. Backhaul STA
        i) When SA!=TA of backhaul STA, must use 4-address format
        ii) When SA==TA of backhaul STA, shall use either 3-address
            or 4-address format with RA updated with AP MAC

            (mac80211 support 4-address format via use_4addr)

Receiver requirements
---------------------
1. Fronthaul AP
        i) When SA!=TA of backhaul STA, must support receiving 4-address
           format frames
        ii) When SA==TA of backhaul STA, must support receiving both
            3-address and 4-address format frames

            (mac80211 support both 3-addr & 4-addr via AP/VLAN interface)

2. Backhaul STA
        i) When DA!=RA of backhaul STA, must support receiving 4-address
           format frames
        ii) When DA==RA of backhaul STA,  must support receiving both
            3-address and 4-address format frames

            (mac80211 support only receiving 4-address format via
             use_4addr)

This patch addresses the above Rx requirement (ii) for backhaul STA to receive
unicast (DA==RA) 3-address frames in addition to 4-address frames.

The current design doesn't accept 3-address frames when configured in 4-address
mode (use_4addr). Hence add a check to allow 3-address frames when DA==RA of
backhaul STA (adhering to Table 9-26 of IEEE Std 802.11™-2016).

This case was tested with a bridged station interface when associated with
a non-mac80211 based vendor AP implementation using 3-address frames for WDS.

STA was able to support the Multi-AP Rx requirement when DA==RA. No issues,
no loops seen when tested with mac80211 based AP as well.

Verified and confirmed all other Tx and Rx requirements of AP and STA for
Multi-AP respectively. They all work using the current mac80211-WDS design.

Signed-off-by: Sathishkumar Muruganandam <murugana@codeaurora.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
5 years agoMerge branch 'nfp-add-NFP5000-support'
David S. Miller [Tue, 28 Aug 2018 23:01:48 +0000 (16:01 -0700)]
Merge branch 'nfp-add-NFP5000-support'

Jakub Kicinski says:

====================
nfp: add NFP5000 support

This series broadly speaking adds support for NFP5000 and
related products.

First we add support for loading FW from flash.  We need to allow
for the management processor to provide extended log messages when
FW is loaded.  This is needed when FW selection policy is to compare
the FW on the disk and in the flash, and load the newer.  User should
be told what FW was selected.

We use this opportunity to add extended errors for normal FW loading
as well.

Next we add support for requesting HW information from the management
processor.  Up until now the driver read the HWinfo as it appears in
card memory, but there can be cases when management processor has
additional information or generates the entries dynamically so
occasionally we will have to consult it.  We use this to look up MAC
addresses for PCIe netdevs.

Next the actual patch with NFP5000 support and a small dose of
refactoring of PCIe init.

The remaining patches add support for reading RTsymbol types we
didn't need before.  Ones explicitly placed in external memory unit's
cache and absolute ones.

This part begins with a patch moving the logic which figures out
the correct bit offsets to device probe, to avoid redoing the
calculation for each access.  Second patch adds error messages
for easier troubleshooting.  Next patch adds helpers which will
take care of address conversions to reach into EMU cache.
Subsequently users are migrated from the raw CPP API to the new RTsym
helpers.  Finally we add support for reading absolute symbols.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonfp: make RTsym users handle absolute symbols correctly
Jakub Kicinski [Tue, 28 Aug 2018 20:20:47 +0000 (13:20 -0700)]
nfp: make RTsym users handle absolute symbols correctly

Make the RTsym users access the size via the helper, which
takes care of special handling of absolute symbols.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Francois H. Theron <francois.theron@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonfp: support access to absolute RTsyms
Jakub Kicinski [Tue, 28 Aug 2018 20:20:46 +0000 (13:20 -0700)]
nfp: support access to absolute RTsyms

Add support in nfpcore for reading the absolute RTsyms.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Francois H. Theron <francois.theron@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonfp: convert all RTsym users to use new read/write helpers
Jakub Kicinski [Tue, 28 Aug 2018 20:20:45 +0000 (13:20 -0700)]
nfp: convert all RTsym users to use new read/write helpers

Convert all users of RTsym to the new set of helpers which
handle all targets correctly.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Francois H. Theron <francois.theron@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonfp: convert existing RTsym helpers to full target decoding
Jakub Kicinski [Tue, 28 Aug 2018 20:20:44 +0000 (13:20 -0700)]
nfp: convert existing RTsym helpers to full target decoding

Make nfp_rtsym_{read,write}_le() and nfp_rtsym_map() use the new
target resolution helpers to allow accessing in-cache symbols.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Francois H. Theron <francois.theron@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonfp: pass cpp_id to nfp_cpp_map_area()
Jakub Kicinski [Tue, 28 Aug 2018 20:20:43 +0000 (13:20 -0700)]
nfp: pass cpp_id to nfp_cpp_map_area()

Align nfp_cpp_map_area() with other CPP-level APIs and pass
encoded cpp_id/dest rather than target, action, domain tuple.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Francois H. Theron <francois.theron@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonfp: add RTsym access helpers
Jakub Kicinski [Tue, 28 Aug 2018 20:20:42 +0000 (13:20 -0700)]
nfp: add RTsym access helpers

RTsyms may have special encodings for more complex symbol types.
For example symbols which are placed in external memory unit's
cache directly, constants or local memory.  Add set of helpers
which will check for those special encodings and handle them
correctly.

For now only add direct cache accesses, we don't have a need to
access the other ones in foreseeable future.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Francois H. Theron <francois.theron@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonfp: add basic errors messages to target logic
Jakub Kicinski [Tue, 28 Aug 2018 20:20:41 +0000 (13:20 -0700)]
nfp: add basic errors messages to target logic

Add error prints to CPP target encoding/decoding logic, otherwise
it's quite hard to pin point the reasons why read or write
operations fail.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Francois H. Theron <francois.theron@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonfp: save the MU locality field offset
Jakub Kicinski [Tue, 28 Aug 2018 20:20:40 +0000 (13:20 -0700)]
nfp: save the MU locality field offset

We will soon need the MU locality field offset much more
often than just for decoding MIP address.  Save it in nfp_cpp
for quick access.  Note that we can already reuse the target
config from nfp_cpp, no need to do the XPB read.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Francois H. Theron <francois.theron@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonfp: refactor the per-chip PCIe config
Jakub Kicinski [Tue, 28 Aug 2018 20:20:39 +0000 (13:20 -0700)]
nfp: refactor the per-chip PCIe config

Use a switch statement instead of ifs for code dependent
on chip version.  While at it make sure we fail for unknown
chip revisions.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonfp: add support for NFP5000
Jakub Kicinski [Tue, 28 Aug 2018 20:20:38 +0000 (13:20 -0700)]
nfp: add support for NFP5000

Add NFP5000 to supported chips, the chip is backward compatible
with NFP4000 and NFP6000, so core PCIe code needs to handle it
the same way as 4k and 6k.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonfp: abm: look up MAC addresses via management FW
Jakub Kicinski [Tue, 28 Aug 2018 20:20:37 +0000 (13:20 -0700)]
nfp: abm: look up MAC addresses via management FW

In multi-host scenarios Management FW may allocate MAC addresses
at runtime, we have to use the indirect lookup to find them.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonfp: add support for indirect HWinfo lookup
Jakub Kicinski [Tue, 28 Aug 2018 20:20:36 +0000 (13:20 -0700)]
nfp: add support for indirect HWinfo lookup

Management FW can adjust some of the information in the HWinfo table
at runtime.  In some cases reading the table directly will not yield
correct results.  Add a NSP command for looking up information.
Up until now we weren't making use of any of the values which may
get adjusted.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonfp: interpret extended FW load result codes
Jakub Kicinski [Tue, 28 Aug 2018 20:20:35 +0000 (13:20 -0700)]
nfp: interpret extended FW load result codes

To enable easier FW distribution NFP can now automatically
select between FW stored on the flash and loaded from the
kernel.

If FW loading policy is set to auto it will compare the
versions of FW from the host and from the flash and load
the newer one.  If FW type doesn't match (e.g. one advanced
application vs another) the FW from the host takes precedence,
unless one of them is the basic NIC firmware, in which case
the non-basic-NIC FW is selected.

This automatic selection mechanism requires we inform user
what the verdict was.  Print a message to the logs explaining
the decision and the reason.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonfp: attempt FW load from flash
Jakub Kicinski [Tue, 28 Aug 2018 20:20:34 +0000 (13:20 -0700)]
nfp: attempt FW load from flash

Flash may contain a default NFP application FW.  This application
can either be put there by the user (with ethtool -f) or shipped
with the card.  If file system FW is not found, attempt to load
this flash stored app FW.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonfp: encapsulate NSP command arguments into structs
Jakub Kicinski [Tue, 28 Aug 2018 20:20:33 +0000 (13:20 -0700)]
nfp: encapsulate NSP command arguments into structs

There is already a fair number of arguments to nfp_nsp_command()
family of functions.  Encapsulate them into structures to make
adding new ones easier.  No functional changes.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next...
David S. Miller [Tue, 28 Aug 2018 22:57:25 +0000 (15:57 -0700)]
Merge branch '100GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
100GbE Intel Wired LAN Driver Updates 2018-08-28

This series contains new features and implementation updates for the
ice driver.

Anirudh reworks the current flex programming logic to add support for
a second flex descriptor profile.  Updated the transmit scheduler
code to handle changes to the spec, specifically the firmware expects
a 4KB buffer at all times so fix the default scheduler topology buffer
size.  Also the maximum children per node per layer is replaced by
maximum sibling group size.  Adds a check to ensure a reset is not in
progress before exercising a control queue operation.  Refactored the
switch rule management functions and structures to simply the logic and
to add a common function to search for a rule entry and add a new rule
entry.  Refactored the VSI allocation, deletion and rebuild flow so that
on reset we can restore all the filters that were previously added.  Did
some spring cleaning of define names and macros.

Dan updates the admin queue command for requesting resource ownership
to the latest specification by adding new enum's and change the locks.

Zhenning optimizes the driver by using the existing buffer in a
structure directly versus a local array.

Chinh implements handlers for ethtool for get and set link settings.

Sudheer implements transmit hang/timeout detection and malicious driver
detection in the driver.

Md Fahad Iqbal implements the get and set bridge mode operations.

Hieu adds the ability for firmware logging during initialization.

Brett updates the driver to only enable VSI transmit and receive pruning
when VLAN 0 is active, and when VLAN 0 is removed/not active, pruning is
disabled.

Akeem adds a flag to use for stopping the service task.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next...
David S. Miller [Tue, 28 Aug 2018 22:56:37 +0000 (15:56 -0700)]
Merge branch '10GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
10GbE Intel Wired LAN Driver Updates 2018-08-28

This series contains updates to ixgbe and ixgbevf only.

Sebastian adds support for firmware NVM recovery mode, which logs a
message when errors are detected and un-registers the device.  Also
fixed RSS type recognition with VF to VF communication.

Shannon Nelson implements IPsec hardware offload for VF devices in
Intel's 10GbE x540 family of Ethernet devices.

The IPsec HW offload feature has been in the x540/Niantic family of
network devices since their release in 2009, but there was no Linux
kernel support for the offload until 2017.  After the XFRM code added
support for the offload last year, the HW offload was added to the ixgbe
PF driver.

Since the related x540 VF device uses same setup as the PF for implementing
the offload, adding the feature to the ixgbevf seemed like a good idea.
In this case, the PF owns the device registers, so the VF simply packages
up the request information into a VF<->PF message and the PF does the
device configuration.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoixgbe: fix the return value for unsupported VF offload
Shannon Nelson [Wed, 22 Aug 2018 23:47:15 +0000 (16:47 -0700)]
ixgbe: fix the return value for unsupported VF offload

When failing the request because we can't support that offload,
reporting EOPNOTSUPP makes much more sense than ENXIO.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoixgbe: disallow IPsec Tx offload when in SR-IOV mode
Shannon Nelson [Wed, 22 Aug 2018 23:47:14 +0000 (16:47 -0700)]
ixgbe: disallow IPsec Tx offload when in SR-IOV mode

There seems to be a problem in the x540's internal switch wherein if SR-IOV
mode is enabled and an offloaded IPsec packet is sent to a local VF,
the packet is silently dropped.  This might never be a problem as it is
somewhat a corner case, but if someone happens to be using IPsec offload
from the PF to a VF that just happens to get migrated to the local box,
communication will mysteriously fail.

Not good.

A simple way to protect from this is to simply not allow any IPsec offloads
for outgoing packets when num_vfs != 0.  This doesn't help any offloads that
were created before SR-IOV was enabled, but we'll get to that later.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoixgbevf: enable VF IPsec offload operations
Shannon Nelson [Mon, 13 Aug 2018 18:43:45 +0000 (11:43 -0700)]
ixgbevf: enable VF IPsec offload operations

Add the IPsec initialization into the driver startup and
add the Rx and Tx processing hooks.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoixgbevf: add VF IPsec offload code
Shannon Nelson [Mon, 13 Aug 2018 18:43:44 +0000 (11:43 -0700)]
ixgbevf: add VF IPsec offload code

Add the IPsec offload support code.  This is based off of the similar
code in ixgbe, but instead of writing the SA registers, the VF asks
the PF to setup the offload by sending the offload information to the
PF via the standard mailbox.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>