OSDN Git Service

uclinux-h8/linux.git
4 years agoMerge tag 'mlx5-updates-2020-05-22' of git://git.kernel.org/pub/scm/linux/kernel...
David S. Miller [Sat, 23 May 2020 23:37:00 +0000 (16:37 -0700)]
Merge tag 'mlx5-updates-2020-05-22' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2020-05-22

This series includes two updates and one cleanup patch

1) Tang Bim, clean-up with IS_ERR() usage

2) Vlad introduces a new mlx5 kconfig flag for TC support

   This is required due to the high volume of current and upcoming
   development in the eswitch and representors areas where some of the
   feature are TC based such as the downstream patches of MPLSoUDP and
   the following representor bonding support for VF live migration and
   uplink representor dynamic loading.
   For this Vlad kept TC specific code in tc.c and rep/tc.c and
   organized non TC code in representors specific files.

3) Eli Cohen adds support for MPLS over UPD encap and decap TC offloads.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: psample: fix build error when CONFIG_INET is not enabled
Randy Dunlap [Fri, 22 May 2020 20:05:26 +0000 (13:05 -0700)]
net: psample: fix build error when CONFIG_INET is not enabled

Fix psample build error when CONFIG_INET is not set/enabled by
bracketing the tunnel code in #ifdef CONFIG_NET / #endif.

../net/psample/psample.c: In function ‘__psample_ip_tun_to_nlattr’:
../net/psample/psample.c:216:25: error: implicit declaration of function ‘ip_tunnel_info_opts’; did you mean ‘ip_tunnel_info_opts_set’? [-Werror=implicit-function-declaration]

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Yotam Gigi <yotam.gi@gmail.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: phy: at803x: fix PHY ID masks
Michael Walle [Fri, 22 May 2020 09:53:31 +0000 (11:53 +0200)]
net: phy: at803x: fix PHY ID masks

Ever since its first commit 0ca7111a38f05 ("phy: add AT803x driver") the
PHY ID mask was set to 0xffffffef. It is unclear to me why this mask was
chosen in the first place. Both the AR8031/AR8033 and the AR8035
datasheets mention it is always the given value:
 - for AR8031/AR8033 its 0x004d/0xd074
 - for AR8035 its 0x004d/0xd072

Unfortunately, I don't have a datasheet for the AR8030. Therefore, we
leave its PHY ID mask untouched. For the PHYs mentioned before use the
handy PHY_ID_MATCH_EXACT() macro.

I've tried to contact the author of the initial commit, but received no
answer so far.

Cc: Matus Ujhelyi <ujhelyi.m@gmail.com>
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
David S. Miller [Sat, 23 May 2020 01:30:34 +0000 (18:30 -0700)]
Merge git://git./linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2020-05-23

The following pull-request contains BPF updates for your *net-next* tree.

We've added 50 non-merge commits during the last 8 day(s) which contain
a total of 109 files changed, 2776 insertions(+), 2887 deletions(-).

The main changes are:

1) Add a new AF_XDP buffer allocation API to the core in order to help
   lowering the bar for drivers adopting AF_XDP support. i40e, ice, ixgbe
   as well as mlx5 have been moved over to the new API and also gained a
   small improvement in performance, from Björn Töpel and Magnus Karlsson.

2) Add getpeername()/getsockname() attach types for BPF sock_addr programs
   in order to allow for e.g. reverse translation of load-balancer backend
   to service address/port tuple from a connected peer, from Daniel Borkmann.

3) Improve the BPF verifier is_branch_taken() logic to evaluate pointers
   being non-NULL, e.g. if after an initial test another non-NULL test on
   that pointer follows in a given path, then it can be pruned right away,
   from John Fastabend.

4) Larger rework of BPF sockmap selftests to make output easier to understand
   and to reduce overall runtime as well as adding new BPF kTLS selftests
   that run in combination with sockmap, also from John Fastabend.

5) Batch of misc updates to BPF selftests including fixing up test_align
   to match verifier output again and moving it under test_progs, allowing
   bpf_iter selftest to compile on machines with older vmlinux.h, and
   updating config options for lirc and v6 segment routing helpers, from
   Stanislav Fomichev, Andrii Nakryiko and Alan Maguire.

6) Conversion of BPF tracing samples outdated internal BPF loader to use
   libbpf API instead, from Daniel T. Lee.

7) Follow-up to BPF kernel test infrastructure in order to fix a flake in
   the XDP selftests, from Jesper Dangaard Brouer.

8) Minor improvements to libbpf's internal hashmap implementation, from
   Ian Rogers.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/mlx5e: Support pedit on mpls over UDP decap
Eli Cohen [Wed, 8 Apr 2020 06:01:33 +0000 (09:01 +0300)]
net/mlx5e: Support pedit on mpls over UDP decap

Allow to modify ethernet headers while decapsulating mpls over UDP
packets. This is implemented using the same reformat object used for
decapsulation.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agonet/mlx5e: Add support for hw decapsulation of MPLS over UDP
Eli Cohen [Mon, 3 Feb 2020 11:44:14 +0000 (13:44 +0200)]
net/mlx5e: Add support for hw decapsulation of MPLS over UDP

MPLS over UDP is supported in hardware by using a packet reformat object
with reformat type equal L3_TUNNEL_TO_L2 which both decapsulates the
outer L3, L4 and MPLS headers, and allows for setting the L2 headers of
the resulting decapsulated packet. For the hardware to operate
correctly, the configuration of the firmware must have
FLEX_PARSER_PROFILE_ENABLE = 1.

Example tc rule:
  tc filter add dev bareudp0 protocol all prio 1 root flower enc_dst_port \
      6635 enc_src_ip 8.8.8.23 action mpls pop protocol ip pipe \
      action pedit ex munge eth dst set 00:11:22:33:44:21 pipe action \
      mirred egress redirect dev enp59s0f0_0

We use pedit to set the correct destination MAC.

For MPLS over UDP decapsulation to take place, the driver logic requires
the following:

1. flower filter added on bareudp device.
2. action mpls pop
3. zero or more pedit munge actions
4. one redirect action

Current implementation supports only IPv4 and no VLAN.

tc filter show output looks like this:
   filter protocol all pref 1 flower chain 0
   filter protocol all pref 1 flower chain 0 handle 0x1
     enc_src_ip 8.8.8.24
     enc_dst_port 6635
     in_hw in_hw_count 1
            action order 1: mpls  pop protocol ip pipe
             index 2 ref 1 bind 1

            action order 2:  pedit action pipe keys 2
             index 1 ref 1 bind 1
             key #0  at eth+0: val 00112233 mask 00000000
             key #1  at eth+4: val 44210000 mask 0000ffff

            action order 3: mirred (Egress Redirect to device enp59s0f0_0) stolen
            index 2 ref 1 bind 1

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agonet/mlx5e: Allow to match on mpls parameters
Eli Cohen [Wed, 29 Jan 2020 14:21:16 +0000 (16:21 +0200)]
net/mlx5e: Allow to match on mpls parameters

Support matching on MPLS over UDP parameters using misc2 section of
match parameters.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agonet/mlx5e: Add support for hw encapsulation of MPLS over UDP
Eli Cohen [Sun, 17 Nov 2019 13:32:24 +0000 (15:32 +0200)]
net/mlx5e: Add support for hw encapsulation of MPLS over UDP

MPLS over UDP is supported by adding a rule on a representor net device
which does tunnel_key set, push mpls and forward to a baredup device. At
the hardware level we use a packet_reformat_context object to do the
encapsulation of the packet.

The resulting packet looks as follows (left side transmitted first):
outer L2 | outer IP | UDP | MPLS | inner L3 and data |

Example usage:
  tc filter add dev $rep0 protocol ip prio 1 root flower skip_sw  \
     action tunnel_key set src_ip 8.8.8.21 dst_ip 8.8.8.24 id 555 \
     dst_port 6635 tos 4 ttl 6 csum action mpls push protocol 0x8847 \
     label 555 tc 3 action mirred egress redirect dev bareudp0

This is how the filter is shown with tc filter show:
tc filter show dev enp59s0f0_0 ingress
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  skip_sw
  in_hw in_hw_count 1
        action order 1: tunnel_key  set
        src_ip 8.8.8.21
        dst_ip 8.8.8.24
        key_id 555
        dst_port 6635
        csum
        tos 0x4
        ttl 6 pipe
         index 1 ref 1 bind 1

        action order 2: mpls  push protocol mpls_uc label 555 tc 3 ttl 255 pipe
         index 1 ref 1 bind 1

        action order 3: mirred (Egress Redirect to device bareudp0) stolen
        index 1 ref 1 bind 1

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agonet: Add netif_is_bareudp() API to identify bareudp devices
Eli Cohen [Sun, 17 Nov 2019 12:35:42 +0000 (14:35 +0200)]
net: Add netif_is_bareudp() API to identify bareudp devices

Add netif_is_bareudp() so the device can be identified as a bareudp one.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agonet/mlx5e: Introduce kconfig var for TC support
Vlad Buslov [Tue, 12 May 2020 15:24:11 +0000 (18:24 +0300)]
net/mlx5e: Introduce kconfig var for TC support

In order to improve code maintainability and readability, introduce new
CONFIG_MLX5_CLS_ACT kconfig variable to control compilation of TC hardware
offloads implementation. This allows distinguishing between features that
require TC support (MPLSoUDP, etc.) and features that just rely on
representor functionality (rep_bond for live migration, etc.).

Modify rep_tc.h, rep_neigh.h, en_tc.h and chains.h files to provide stubs
for functions that are called from generic code.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agonet/mlx5e: Move TC-specific code from en_main.c to en_tc.c
Vlad Buslov [Tue, 12 May 2020 15:08:41 +0000 (18:08 +0300)]
net/mlx5e: Move TC-specific code from en_main.c to en_tc.c

As a preparation for introducing new kconfig option that controls
compilation of all TC offloads code in mlx5, extract TC-specific code from
en_main.c to en_tc.c. This allows easily compiling out the code by
only including new source in make file when corresponding kconfig is
enabled instead of adding multiple ifdef blocks to en_main.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agonet/mlx5e: Extract neigh-specific code from en_rep.c to rep/neigh.c
Vlad Buslov [Tue, 12 May 2020 14:29:22 +0000 (17:29 +0300)]
net/mlx5e: Extract neigh-specific code from en_rep.c to rep/neigh.c

As a preparation for introducing new kconfig option that controls
compilation of all TC offloads code in mlx5, extract neigh-specific code
from en_rep.c to standalone file. This allows easily compiling out the code
by only including new source in make file when corresponding kconfig is
enabled instead of adding multiple ifdef blocks to en_rep.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agonet/mlx5e: Extract TC-specific code from en_rep.c to rep/tc.c
Vlad Buslov [Tue, 12 May 2020 13:41:41 +0000 (16:41 +0300)]
net/mlx5e: Extract TC-specific code from en_rep.c to rep/tc.c

As a preparation for introducing new kconfig option that controls
compilation of all TC offloads code in mlx5, extract TC-specific code from
en_rep.c to standalone file. This allows easily compiling out the code by
only including new source in make file when corresponding kconfig is
enabled instead of adding multiple ifdef blocks to en_rep.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agonet/mlx5e: Use IS_ERR() to check and simplify code
Tang Bin [Fri, 15 May 2020 23:06:33 +0000 (07:06 +0800)]
net/mlx5e: Use IS_ERR() to check and simplify code

Use IS_ERR() and PTR_ERR() instead of PTR_ERR_OR_ZERO() to
simplify code, avoid redundant judgements.

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Signed-off-by: Tang Bin <tangbin@cmss.chinamobile.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agoMerge branch 'bridge-mrp-Add-br_mrp_unique_ifindex-function'
David S. Miller [Fri, 22 May 2020 23:17:15 +0000 (16:17 -0700)]
Merge branch 'bridge-mrp-Add-br_mrp_unique_ifindex-function'

Horatiu Vultur says:

====================
bridge: mrp: Add br_mrp_unique_ifindex function

This patch series adds small fixes to MRP implementation.
The following are fixed in this patch series:
- now is not allow to add the same port to multiple MRP rings
- remove unused variable
- restore the port state according to the bridge state when the MRP instance
  is deleted

v2:
 - use rtnl_dereference instead of rcu_dereference in the first patch
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agobridge: mrp: Restore port state when deleting MRP instance
Horatiu Vultur [Thu, 21 May 2020 23:19:07 +0000 (23:19 +0000)]
bridge: mrp: Restore port state when deleting MRP instance

When a MRP instance is deleted, then restore the port according to the
bridge state. If the bridge is up then the ports will be in forwarding
state otherwise will be in disabled state.

Fixes: 9a9f26e8f7ea ("bridge: mrp: Connect MRP API with the switchdev API")
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoswitchdev: mrp: Remove the variable mrp_ring_state
Horatiu Vultur [Thu, 21 May 2020 23:19:06 +0000 (23:19 +0000)]
switchdev: mrp: Remove the variable mrp_ring_state

Remove the variable mrp_ring_state from switchdev_attr because is not
used anywhere.
The ring state is set using SWITCHDEV_OBJ_ID_RING_STATE_MRP.

Fixes: c284b5459008 ("switchdev: mrp: Extend switchdev API to offload MRP")
Acked-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agobridge: mrp: Add br_mrp_unique_ifindex function
Horatiu Vultur [Thu, 21 May 2020 23:19:05 +0000 (23:19 +0000)]
bridge: mrp: Add br_mrp_unique_ifindex function

It is not allow to have the same net bridge port part of multiple MRP
rings. Therefore add a check if the port is used already in a different
MRP. In that case return failure.

Fixes: 9a9f26e8f7ea ("bridge: mrp: Connect MRP API with the switchdev API")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'DP83869-Enhancements'
David S. Miller [Fri, 22 May 2020 23:13:10 +0000 (16:13 -0700)]
Merge branch 'DP83869-Enhancements'

Dan Murphy says:

====================
DP83869 Enhancements

These are improvements to the DP83869 Ethernet PHY driver.  OP-mode and port
mirroring may be strapped on the device but the software only retrives these
settings from the device tree.  Reading the straps and initializing the
associated stored variables so when setting the PHY up and down the PHY's
configuration values will be retained.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: phy: dp83869: Set opmode from straps
Dan Murphy [Thu, 21 May 2020 17:47:38 +0000 (12:47 -0500)]
net: phy: dp83869: Set opmode from straps

If the op-mode for the device is not set in the device tree then set
the strapped op-mode and store it for later configuration.

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Dan Murphy <dmurphy@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: phy: dp83869: Update port-mirroring to read straps
Dan Murphy [Thu, 21 May 2020 17:47:37 +0000 (12:47 -0500)]
net: phy: dp83869: Update port-mirroring to read straps

The device tree may not have the property set for port mirroring
because the hardware may have it strapped. If the property is not in the
DT then check the straps and set the port mirroring bit appropriately.

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Dan Murphy <dmurphy@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoselftests/bpf: CONFIG_LIRC required for test_lirc_mode2.sh
Alan Maguire [Fri, 22 May 2020 11:36:29 +0000 (12:36 +0100)]
selftests/bpf: CONFIG_LIRC required for test_lirc_mode2.sh

test_lirc_mode2.sh assumes presence of /sys/class/rc/rc0/lirc*/uevent
which will not be present unless CONFIG_LIRC=y

Fixes: 6bdd533cee9a ("bpf: add selftest for lirc_mode2 type program")
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/1590147389-26482-3-git-send-email-alan.maguire@oracle.com
4 years agoselftests/bpf: CONFIG_IPV6_SEG6_BPF required for test_seg6_loop.o
Alan Maguire [Fri, 22 May 2020 11:36:28 +0000 (12:36 +0100)]
selftests/bpf: CONFIG_IPV6_SEG6_BPF required for test_seg6_loop.o

test_seg6_loop.o uses the helper bpf_lwt_seg6_adjust_srh();
it will not be present if CONFIG_IPV6_SEG6_BPF is not specified.

Fixes: b061017f8b4d ("selftests/bpf: add realistic loop tests")
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/1590147389-26482-2-git-send-email-alan.maguire@oracle.com
4 years agoselftests/bpf: Add general instructions for test execution
Alan Maguire [Fri, 22 May 2020 11:24:34 +0000 (12:24 +0100)]
selftests/bpf: Add general instructions for test execution

Getting a clean BPF selftests run involves ensuring latest trunk LLVM/clang
are used, pahole is recent (>=1.16) and config matches the specified
config file as closely as possible.  Add to bpf_devel_QA.rst and point
tools/testing/selftests/bpf/README.rst to it.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/1590146674-25485-1-git-send-email-alan.maguire@oracle.com
4 years agoRevert "net: mvneta: speed down the PHY, if WoL used, to save energy"
David S. Miller [Fri, 22 May 2020 23:09:42 +0000 (16:09 -0700)]
Revert "net: mvneta: speed down the PHY, if WoL used, to save energy"

This reverts commit 5e3768a436bb70c9c3e27aaba6b73f8ef8f5dcf3.

On request from Russell King, this is a layering violation.

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agocxgb4: add adapter hotplug support for ULDs
Potnuri Bharat Teja [Thu, 21 May 2020 10:34:29 +0000 (16:04 +0530)]
cxgb4: add adapter hotplug support for ULDs

Upon adapter hotplug, cxgb4 registers ULD devices for all the ULDs that
are already loaded, ensuring that ULD's can enumerate the hotplugged
adapter without reloading the ULD.

Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: flow_offload: simplify hw stats check handling
Edward Cree [Wed, 20 May 2020 18:18:10 +0000 (19:18 +0100)]
net: flow_offload: simplify hw stats check handling

Make FLOW_ACTION_HW_STATS_DONT_CARE be all bits, rather than none, so that
 drivers and __flow_action_hw_stats_check can use simple bitwise checks.

Pre-fill all actions with DONT_CARE in flow_rule_alloc(), rather than
 relying on implicit semantics of zero from kzalloc, so that callers which
 don't configure action stats themselves (i.e. netfilter) get the correct
 behaviour by default.

Only the kernel's internal API semantics change; the TC uAPI is unaffected.

v4: move DONT_CARE setting to flow_rule_alloc() for robustness and simplicity.

v3: set DONT_CARE in nft and ct offload.

v2: rebased on net-next, removed RFC tags.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'ip6_tunnel-add-MPLS-support'
David S. Miller [Fri, 22 May 2020 22:49:31 +0000 (15:49 -0700)]
Merge branch 'ip6_tunnel-add-MPLS-support'

Vadim Fedorenko says:

====================
ip6_tunnel: add MPLS support

The support for MPLS-in-IPv4 was added earlier. This patchset adds
support for MPLS-in-IPv6.

Changes in v2:
- Eliminate ifdefs IS_ENABLE(CONFIG_MPLS)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agompls: Add support for IPv6 tunnels
Vadim Fedorenko [Wed, 20 May 2020 15:21:39 +0000 (18:21 +0300)]
mpls: Add support for IPv6 tunnels

Add support for IPv6 tunnel devices in AF_MPLS.

Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoip6_tunnel: add generic MPLS receive support
Vadim Fedorenko [Wed, 20 May 2020 15:21:38 +0000 (18:21 +0300)]
ip6_tunnel: add generic MPLS receive support

Add support for MPLS in receive side.

Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agotunnel6: support for IPPROTO_MPLS
Vadim Fedorenko [Wed, 20 May 2020 15:21:37 +0000 (18:21 +0300)]
tunnel6: support for IPPROTO_MPLS

This patch is just preparation for MPLS support in ip6_tunnel

Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoip6_tunnel: add MPLS transmit support
Vadim Fedorenko [Wed, 20 May 2020 15:21:36 +0000 (18:21 +0300)]
ip6_tunnel: add MPLS transmit support

Add ETH_P_MPLS_UC as supported protocol.

Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoip6_tunnel: simplify transmit path
Vadim Fedorenko [Wed, 20 May 2020 15:21:35 +0000 (18:21 +0300)]
ip6_tunnel: simplify transmit path

Merge ip{4,6}ip6_tnl_xmit functions into one universal
ipxip6_tnl_xmit in preparation for adding MPLS support.

Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'mediatek-add-support-for-MediaTek-Ethernet-MAC'
David S. Miller [Fri, 22 May 2020 21:20:12 +0000 (14:20 -0700)]
Merge branch 'mediatek-add-support-for-MediaTek-Ethernet-MAC'

Bartosz Golaszewski says:

====================
mediatek: add support for MediaTek Ethernet MAC

This series adds support for the STAR Ethernet Controller present on MediaTeK
SoCs from the MT8* family.

First we convert the existing DT bindings for the PERICFG controller to YAML
and add a new compatible string for mt8516 variant of it. Then we add the DT
bindings for the MAC.

Next we do some cleanup of the mediatek ethernet drivers directory.

The largest patch in the series adds the actual new driver.

The rest of the patches add DT fixups for the boards already supported
upstream.

v1 -> v2:
- add a generic helper for retrieving the net_device associated with given
  private data
- fix several typos in commit messages
- remove MTK_MAC_VERSION and don't set the driver version
- use NET_IP_ALIGN instead of a magic number (2) but redefine it as it defaults
  to 0 on arm64
- don't manually turn the carrier off in mtk_mac_enable()
- process TX cleanup in napi poll callback
- configure pause in the adjust_link callback
- use regmap_read_poll_timeout() instead of handcoding the polling
- use devres_find() to verify that struct net_device is managed by devres in
  devm_register_netdev()
- add a patch moving all networking devres helpers into net/devres.c
- tweak the dma barriers: remove where unnecessary and add comments to the
  remaining barriers
- don't reset internal counters when enabling the NIC
- set the net_device's mtu size instead of checking the framesize in
  ndo_start_xmit() callback
- fix a race condition in waking up the netif queue
- don't emit log messages on OOM errors
- use dma_set_mask_and_coherent()
- use eth_hw_addr_random()
- rework the receive callback so that we reuse the previous skb if unmapping
  fails, like we already do if skb allocation fails
- rework hash table operations: add proper timeout handling and clear bits when
  appropriate

v2 -> v3:
- drop the patch adding priv_to_netdev() and store the netdev pointer in the
  driver private data
- add an additional dma_wmb() after reseting the descriptor in
  mtk_mac_ring_pop_tail()
- check the return value of dma_set_mask_and_coherent()
- improve the DT bindings for mtk-eth-mac: make the reg property in the example
  use single-cell address and size, extend the description of the PERICFG
  phandle and document the mdio sub-node
- add a patch converting the old .txt bindings for PERICFG to yaml
- limit reading the DMA memory by storing the mapped addresses in the driver
  private structure
- add a patch documenting the existing networking devres helpers

v3 -> v4:
- drop the devres patches: they will be sent separately
- call netdev_sent_queue() & netdev_completed_queue() where appropriate
- don't redefine NET_IP_ALIGN: define a private constant in the driver
- fix a couple typos
- only disabe/enable the MAC in suspend/resume if netif is running
- drop the count field from the ring structure and instead calculate the number
  of used descriptors from the tail and head indicies
- rework the locking used to protect the ring structures from concurrent
  access: use cheaper spin_lock_bh() and completely disable the internal
  spinlock used by regmap
- rework the interrupt handling to make it more fine-grained: onle re-enable
  TX and RX interrupts while they're needed, process the stats updates in a
  workqueue, not in napi context
- shrink the code responsible for unmapping and freeing skb memory
- rework the barriers as advised by Arnd

v4 -> v5:
- rename the driver to make it less confusing with the existing mtk_eth_soc
  ethernet driver
- unregister the mdiobus at device's detachment
- open-code spin lock calls to avoid calling the _bh variants where unnecessary
- limit read-modify-write operations where possible when accessing descriptor
  memory
- use READ_ONCE/WRITE_ONCE when modifying the status and data_ptr descriptor
  fields
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoARM64: dts: mediatek: enable ethernet on pumpkin boards
Bartosz Golaszewski [Fri, 22 May 2020 12:07:00 +0000 (14:07 +0200)]
ARM64: dts: mediatek: enable ethernet on pumpkin boards

Add remaining properties to the ethernet node and enable it.

Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoARM64: dts: mediatek: add ethernet pins for pumpkin boards
Bartosz Golaszewski [Fri, 22 May 2020 12:06:59 +0000 (14:06 +0200)]
ARM64: dts: mediatek: add ethernet pins for pumpkin boards

Setup the pin control for the Ethernet MAC.

Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoARM64: dts: mediatek: add an alias for ethernet0 for pumpkin boards
Bartosz Golaszewski [Fri, 22 May 2020 12:06:58 +0000 (14:06 +0200)]
ARM64: dts: mediatek: add an alias for ethernet0 for pumpkin boards

Add the ethernet0 alias for ethernet so that u-boot can find this node
and fill in the MAC address.

Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoARM64: dts: mediatek: add the ethernet node to mt8516.dtsi
Bartosz Golaszewski [Fri, 22 May 2020 12:06:57 +0000 (14:06 +0200)]
ARM64: dts: mediatek: add the ethernet node to mt8516.dtsi

Add the Ethernet MAC node to mt8516.dtsi. This defines parameters common
to all the boards based on this SoC.

Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoARM64: dts: mediatek: add pericfg syscon to mt8516.dtsi
Bartosz Golaszewski [Fri, 22 May 2020 12:06:56 +0000 (14:06 +0200)]
ARM64: dts: mediatek: add pericfg syscon to mt8516.dtsi

This adds support for the PERICFG register range as a syscon. This will
soon be used by the MediaTek Ethernet MAC driver for NIC configuration.

Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: ethernet: mtk-star-emac: new driver
Bartosz Golaszewski [Fri, 22 May 2020 12:06:55 +0000 (14:06 +0200)]
net: ethernet: mtk-star-emac: new driver

This adds the driver for the MediaTek STAR Ethernet MAC currently used
on the MT8* SoC family. For now we only support full-duplex.

Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: ethernet: mediatek: remove unnecessary spaces from Makefile
Bartosz Golaszewski [Fri, 22 May 2020 12:06:54 +0000 (14:06 +0200)]
net: ethernet: mediatek: remove unnecessary spaces from Makefile

The Makefile formatting in the kernel tree usually doesn't use tabs,
so remove them before we add a second driver.

Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: ethernet: mediatek: rename Kconfig prompt
Bartosz Golaszewski [Fri, 22 May 2020 12:06:53 +0000 (14:06 +0200)]
net: ethernet: mediatek: rename Kconfig prompt

We'll soon by adding a second MediaTek Ethernet driver so modify the
Kconfig prompt.

Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agodt-bindings: net: add a binding document for MediaTek STAR Ethernet MAC
Bartosz Golaszewski [Fri, 22 May 2020 12:06:52 +0000 (14:06 +0200)]
dt-bindings: net: add a binding document for MediaTek STAR Ethernet MAC

This adds yaml DT bindings for the MediaTek STAR Ethernet MAC present
on the mt8* family of SoCs.

Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agodt-bindings: add new compatible to mediatek,pericfg
Bartosz Golaszewski [Fri, 22 May 2020 12:06:51 +0000 (14:06 +0200)]
dt-bindings: add new compatible to mediatek,pericfg

The PERICFG controller is present on the MT8516 SoC. Add an appropriate
compatible variant.

Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agodt-bindings: convert the binding document for mediatek PERICFG to yaml
Bartosz Golaszewski [Fri, 22 May 2020 12:06:50 +0000 (14:06 +0200)]
dt-bindings: convert the binding document for mediatek PERICFG to yaml

Convert the DT binding .txt file for MediaTek's peripheral configuration
controller to YAML. There's one special case where the compatible has
three positions. Otherwise, it's a pretty normal syscon.

Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'ENA-features-and-cosmetic-changes'
David S. Miller [Fri, 22 May 2020 21:12:49 +0000 (14:12 -0700)]
Merge branch 'ENA-features-and-cosmetic-changes'

Arthur Kiyanovski says:

====================
ENA features and cosmetic changes

Diff from V1 of this patchset:
Removed error prints patch

This patchset includes:
1. new rx offset feature
2. reduction of the driver load time
3. multiple cosmetic changes to the code
====================

Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: ena: reduce driver load time
Arthur Kiyanovski [Fri, 22 May 2020 09:09:05 +0000 (12:09 +0300)]
net: ena: reduce driver load time

This commit reduces the driver load time by using usec resolution
instead of msec when polling for hardware state change.

Also add back-off mechanism to handle cases where minimal sleep
time is not enough.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: ena: cosmetic: minor code changes
Arthur Kiyanovski [Fri, 22 May 2020 09:09:04 +0000 (12:09 +0300)]
net: ena: cosmetic: minor code changes

1. Use BIT macro instead of shift operator for code clarity
2. Replace multiple flag assignments to a single assignment of multiple
   flags in ena_com_add_single_rx_desc()
3. Move ENA_HASH_KEY_SIZE from ena_netdev.h to ena_com.h

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: ena: cosmetic: fix spacing issues
Arthur Kiyanovski [Fri, 22 May 2020 09:09:03 +0000 (12:09 +0300)]
net: ena: cosmetic: fix spacing issues

1. Add leading and trailing spaces to several comments for better
   readability
2. Make tabs and spaces uniform in enum defines in ena_admin_defs.h

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: ena: cosmetic: code reorderings
Arthur Kiyanovski [Fri, 22 May 2020 09:09:02 +0000 (12:09 +0300)]
net: ena: cosmetic: code reorderings

1. Reorder sanity checks in get_comp_ctxt() to make more sense
2. Reorder variables in ena_com_fill_hash_function() and
   ena_calc_io_queue_size() in reverse christmas tree.
3. Move around member initializations.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: ena: cosmetic: remove unnecessary code
Arthur Kiyanovski [Fri, 22 May 2020 09:09:01 +0000 (12:09 +0300)]
net: ena: cosmetic: remove unnecessary code

1. Remove unused definition of DRV_MODULE_VERSION
2. Remove {} from single line-of-code ifs
3. Remove unnecessary comments from ena_get/set_coalesce()
4. Remove unnecessary extra spaces and newlines

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: ena: cosmetic: fix line break issues
Arthur Kiyanovski [Fri, 22 May 2020 09:09:00 +0000 (12:09 +0300)]
net: ena: cosmetic: fix line break issues

1. Join unnecessarily broken short lines in ena_com.c ena_netdev.c
2. Fix Indentations of broken lines

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: ena: cosmetic: fix spelling and grammar mistakes in comments
Arthur Kiyanovski [Fri, 22 May 2020 09:08:59 +0000 (12:08 +0300)]
net: ena: cosmetic: fix spelling and grammar mistakes in comments

fix spelling and grammar mistakes in comments in ena_com.h,
ena_com.c and ena_netdev.c

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: ena: cosmetic: set queue sizes to u32 for consistency
Arthur Kiyanovski [Fri, 22 May 2020 09:08:58 +0000 (12:08 +0300)]
net: ena: cosmetic: set queue sizes to u32 for consistency

Make all types of variables that convey the number and sizeof queues to
be u32, for consistency with the API between the driver and device via
ena_admin_defs.h:ena_admin_get_feat_resp.max_queue_ext fields. Current
code sometimes uses int and there are multiple assignments between these
variables with different types.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: ena: cosmetic: rename ena_update_tx/rx_rings_intr_moderation()
Arthur Kiyanovski [Fri, 22 May 2020 09:08:57 +0000 (12:08 +0300)]
net: ena: cosmetic: rename ena_update_tx/rx_rings_intr_moderation()

Rename ena_update_tx/rx_rings_intr_moderation() to
ena_update_tx/rx_rings_nonadaptive_intr_moderation()
to distinguish between adaptive and non adaptive interrupt moderaion.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: ena: simplify ena_com_update_intr_delay_resolution()
Arthur Kiyanovski [Fri, 22 May 2020 09:08:56 +0000 (12:08 +0300)]
net: ena: simplify ena_com_update_intr_delay_resolution()

Initialize prev_intr_delay_resolution with ena_dev->intr_delay_resolution
unconditionally, since it is initialized with
ENA_DEFAULT_INTR_DELAY_RESOLUTION in ena_probe(). This approach makes much
more sense than handling errors of not initializing it.

Also added unlikely to if condition.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: ena: fix ena_com_comp_status_to_errno() return value
Arthur Kiyanovski [Fri, 22 May 2020 09:08:55 +0000 (12:08 +0300)]
net: ena: fix ena_com_comp_status_to_errno() return value

Default return value should be -EINVAL since the input
in this case was unexpected.
Also remove the now redundant check in the beginning
of the function.

Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: ena: use explicit variable size for clarity
Arthur Kiyanovski [Fri, 22 May 2020 09:08:54 +0000 (12:08 +0300)]
net: ena: use explicit variable size for clarity

Use u64 instead of unsigned long long for clarity

Signed-off-by: Shai Brandes <shaibran@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: ena: rename ena_com_free_desc to make API more uniform
Arthur Kiyanovski [Fri, 22 May 2020 09:08:53 +0000 (12:08 +0300)]
net: ena: rename ena_com_free_desc to make API more uniform

Rename ena_com_free_desc to ena_com_free_q_entries to match
the LLQ mode.

In non-LLQ mode, an entry in an IO ring corresponds to a
a descriptor. In LLQ mode an entry may correspond to several
descriptors (per LLQ definition).

Signed-off-by: Igor Chauskin <igorch@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: ena: add support for the rx offset feature
Arthur Kiyanovski [Fri, 22 May 2020 09:08:52 +0000 (12:08 +0300)]
net: ena: add support for the rx offset feature

Newer ENA devices can write data to rx buffers with an offset
from the beginning of the buffer.

This commit adds support for this feature in the driver.

Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'net-atlantic-QoS-implementation'
David S. Miller [Fri, 22 May 2020 21:08:29 +0000 (14:08 -0700)]
Merge branch 'net-atlantic-QoS-implementation'

Igor Russkikh says:

====================
net: atlantic: QoS implementation

This patch series adds support for mqprio rate limiters and multi-TC:
 * max_rate is supported on both A1 and A2;
 * min_rate is supported on A2 only;

This is a joint work of Mark and Dmitry.

To implement this feature, a couple of rearrangements and code
improvements were done, in areas of TC/ring management, allocation
control, etc.

One of the problems we faced is conflicting ptp functionality, which
consumes a whole traffic class due to hardware limitations.
Patches below have a more detailed description on how PTP and multi-TC
co-exist right now.

v2:
 * accommodated review comments (-Wmissing-prototypes and
   -Wunused-but-set-variable findings);
 * added user notification in case of conflicting multi-TC<->PTP
   configuration;
 * added automatic PTP disabling, if a conflicting configuration is
   detected;
 * removed module param, which was used for PTP disabling in v1;

v1: https://patchwork.ozlabs.org/cover/1294380/
====================

Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: atlantic: proper rss_ctrl1 (54c0) initialization
Mark Starovoytov [Fri, 22 May 2020 08:19:48 +0000 (11:19 +0300)]
net: atlantic: proper rss_ctrl1 (54c0) initialization

This patch fixes an inconsistency between code and spec, which
was found while working on the QoS implementation.

When 8TCs are used, 2 is the maximum supported number of index bits.
In a 4TC mode, we do support 3, but we shouldn't really use the bytes,
which are intended for the 8TC mode.

Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: atlantic: QoS implementation: min_rate
Mark Starovoytov [Fri, 22 May 2020 08:19:47 +0000 (11:19 +0300)]
net: atlantic: QoS implementation: min_rate

This patch adds support for mqprio min_rate limiters.

A2 HW supports Weighted Strict Priority (WSP) arbitration for Tx Descriptor
Queue scheduling among TCs, which can be used for min_rate shaping.

Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: atlantic: change the order of arguments for TC weight/credit setters
Mark Starovoytov [Fri, 22 May 2020 08:19:46 +0000 (11:19 +0300)]
net: atlantic: change the order of arguments for TC weight/credit setters

This patch changes the order of arguments for TC weight/credit setter
functions.
Having the "value to be set" on the right is slightly more robust in
a sense that it's more natural for the humans, so it's a bit more
error-proof this way.

Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: atlantic: always use random TC-queue mapping for TX on A2.
Mark Starovoytov [Fri, 22 May 2020 08:19:45 +0000 (11:19 +0300)]
net: atlantic: always use random TC-queue mapping for TX on A2.

This patch changes the TC-queue mapping mechanism used on A2.
Configure the A2 HW in such a way that we can keep queue index mapping
exactly as it was on A1.

Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: atlantic: automatically downgrade the number of queues if necessary
Mark Starovoytov [Fri, 22 May 2020 08:19:44 +0000 (11:19 +0300)]
net: atlantic: automatically downgrade the number of queues if necessary

This patch adds support for automatic queue number downgrade.

On A2: this is a must have, because only TC0/TC1 support more than 4Q.
Other TCs support 4Qs maximum.
Thus, on A2 we must downgrade the number of queues per TC to 4, if more
than 2 TCs are requested.

On A1: this allows using 8TCs even on systems with cpu count >= 8, when
we have 8 queues by default.
We will just automatically switch to 8TCx4Q mode in this case.

Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: atlantic: QoS implementation: max_rate
Mark Starovoytov [Fri, 22 May 2020 08:19:43 +0000 (11:19 +0300)]
net: atlantic: QoS implementation: max_rate

This patch adds initial support for mqprio rate limiters (max_rate only).

Atlantic HW supports Rate-Shaping for time-sensitive traffic at per
Traffic Class (TC) granularity.
Target rate is defined by:
* nominal link rate (always 10G);
* rate factor (ratio between nominal rate and max allowed).

Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: atlantic: make TCVEC2RING accept nic_cfg
Mark Starovoytov [Fri, 22 May 2020 08:19:42 +0000 (11:19 +0300)]
net: atlantic: make TCVEC2RING accept nic_cfg

This patch updates TCVEC2RING to accept nic_cfg, which is needed to be able
to use it from hw_atl.
The name is updated to reflect the changes.

Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: atlantic: per-TC queue statistics
Mark Starovoytov [Fri, 22 May 2020 08:19:41 +0000 (11:19 +0300)]
net: atlantic: per-TC queue statistics

This patch adds support for per-TC queue statistics.

By default (single TC), the output is the same as it used to be, e.g.:
     Queue[0] InPackets: 2
     Queue[0] OutPackets: 8
     Queue[0] Restarts: 0
     Queue[0] InJumboPackets: 0
     Queue[0] InLroPackets: 0
     Queue[0] InErrors: 0

If several TCs are enabled, then each queue statistics line is prefixed
with TC number, e.g.:
     TC0 Queue[0] InPackets: 6
     TC0 Queue[0] OutPackets: 11
Queue numbering is end-to-end, so:
     TC1 Queue[4] InPackets: 0
     TC1 Queue[4] OutPackets: 22

Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: atlantic: QoS implementation: multi-TC support
Dmitry Bezrukov [Fri, 22 May 2020 08:19:40 +0000 (11:19 +0300)]
net: atlantic: QoS implementation: multi-TC support

This patch adds multi-TC support.

PTP is automatically disabled when the user enables more than 2 TCs,
otherwise traffic on TC2 won't quite work, because it's reserved for PTP.

Signed-off-by: Dmitry Bezrukov <dbezrukov@marvell.com>
Co-developed-by: Dmitry Bogdanov <dbogdanov@marvell.com>
Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com>
Co-developed-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: atlantic: changes for multi-TC support
Dmitry Bezrukov [Fri, 22 May 2020 08:19:39 +0000 (11:19 +0300)]
net: atlantic: changes for multi-TC support

This patch contains the following changes:
* add cfg->is_ptp (used for PTP enable/disable switch, which
  is described in more details below);
* add cfg->tc_mode (A1 supports 2 HW modes only);
* setup queue to TC mapping based on TC mode on A2;
* remove hw_tx_tc_mode_get / hw_rx_tc_mode_get hw_ops.

In the first generation of our hardware (A1), a whole traffic class is
consumed for PTP handling in FW (FW uses it to send the ptp data and to
send back timestamps).
The 'is_ptp' flag introduced in this patch will be used in to automatically
disable PTP when a conflicting configuration is detected, e.g. when
multiple TCs are enabled.

Signed-off-by: Dmitry Bezrukov <dbezrukov@marvell.com>
Co-developed-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: atlantic: move PTP TC initialization to a separate function
Dmitry Bezrukov [Fri, 22 May 2020 08:19:38 +0000 (11:19 +0300)]
net: atlantic: move PTP TC initialization to a separate function

This patch moves the PTP TC initialization into a separate function.

Signed-off-by: Dmitry Bezrukov <dbezrukov@marvell.com>
Co-developed-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: atlantic: changes for multi-TC support
Dmitry Bezrukov [Fri, 22 May 2020 08:19:37 +0000 (11:19 +0300)]
net: atlantic: changes for multi-TC support

This patch contains the following changes:
* access cfg via aq_nic_get_cfg() in aq_nic_start() and aq_nic_map_skb();
* call aq_nic_get_dev() just once in aq_nic_map_skb();
* move ring allocation/deallocation out of aq_vec_alloc()/aq_vec_free();
* add the missing aq_nic_deinit() in atl_resume_common();
* rename 'tcs' field to 'tcs_max' in aq_hw_caps_s to differentiate it from
  the 'tcs' field in aq_nic_cfg_s, which is used for the current number of
  TCs;
* update _TC_MAX defines to the actual number of supported TCs;
* move tx_tc_mode register defines slightly higher (just to keep the order
  of definitions);
* separate variables for TX/RX buff_size in hw_atl*_hw_qos_set();
* use AQ_HW_*_TC instead of hardcoded magic numbers;
* actually use the 'ret' value in aq_mdo_add_secy();

Signed-off-by: Dmitry Bezrukov <dbezrukov@marvell.com>
Co-developed-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next...
David S. Miller [Fri, 22 May 2020 21:05:05 +0000 (14:05 -0700)]
Merge branch '100GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
100GbE Intel Wired LAN Driver Updates 2020-05-21

This series contains updates to ice driver only.  Several of the changes
are fixes, which could be backported to stable, of which, only one was
marked for stable because of the memory leak potential.

Jake exposes the information in the flash memory used for link
management, which is called the netlist module.

Henry and Tony add support for tunnel offloads.

Brett adds promiscuous support in VF's which is based on VF trust and
the new vf-true-promisc flag.

Avinash fixes an issue where a transmit timeout for a queue that belongs
to a PFC enabled TC is not a true transmit timeout, but because the PFC
is in action.

Dave fixes the check for contiguous TCs to allow for various UP2TC
mapping configurations.  Also fixed an issue when changing the pause
parameters would could multiple link drop/down's in succession, which in
turn caused the firmware to not generate a link interrupt for the driver
to respond to.

Anirudh (Ani) fixed a potential race condition in probe/open due to a
bit being cleared too early.

Lihong updates an error message to make it more meaningful instead of
just printing out the numerical value of the status/error code.  Also
fixed an incorrect return value if deleting a filter does not find a
match to delete or when adding a filter that already exists.

Karol fixes casting issues and precision loss in the driver.

Jesse make the sign usage more consistent in the driver by making sure
all instances of vf_id are unsigned, since it can never be negative.

Eric fixes a potential memory leak in ice_add_prof_id_vsig() where was
not cleaning up resources properly when an error occurs.

Michal to help organize the filtering code in the driver, refactor the
code into a separate file and add functions to prepare the filter
information.

Bruce cleaned up a conditional statement that always resulted in true
and provided a comment to make it more obvious.  Also cleaned up
redundant code checks.

Tony helps with potential namespace issues by renaming a 'ice' specific
function with the driver name prepended.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'Support-for-fdb-ECMP-nexthop-groups'
David S. Miller [Fri, 22 May 2020 21:00:39 +0000 (14:00 -0700)]
Merge branch 'Support-for-fdb-ECMP-nexthop-groups'

Roopa Prabhu says:

====================
Support for fdb ECMP nexthop groups

This series introduces ecmp nexthops and nexthop groups
for mac fdb entries. In subsequent patches this is used
by the vxlan driver fdb entries. The use case is
E-VPN multihoming [1,2,3] which requires bridged vxlan traffic
to be load balanced to remote switches (vteps) belonging to
the same multi-homed ethernet segment (This is analogous to
a multi-homed LAG but over vxlan).

Changes include new nexthop flag NHA_FDB for nexthops
referenced by fdb entries. These nexthops only have ip.
The patches make sure that routes dont reference such nexthops.

example:
$ip nexthop add id 12 via 172.16.1.2 fdb
$ip nexthop add id 13 via 172.16.1.3 fdb
$ip nexthop add id 102 group 12/13 fdb

$bridge fdb add 02:02:00:00:00:13 dev vxlan1000 nhid 101 self

[1] E-VPN https://tools.ietf.org/html/rfc7432
[2] E-VPN VxLAN: https://tools.ietf.org/html/rfc8365
[3] LPC talk with mention of nexthop groups for L2 ecmp
http://vger.kernel.org/lpc_net2018_talks/scaling_bridge_fdb_database_slidesV3.pdf

v4 -
    - fix error path free_skb in vxlan_xmit_nh
    - fix atomic notifier initialization issue
      (Reported-by: kernel test robot <rong.a.chen@intel.com>)
      The reported error was easy to locate and fix, but i was not
      able to re-test with the robot reproducer script due to some
      other issues with running the script on my test system.

v3 - fix wording in selftest print as pointed out by davidA

v2 -
- dropped nikolays fixes for nexthop multipath null pointer deref
  (he will send those separately)
- added negative tests for route add with fdb nexthop + a few more
- Fixes for a few  fdb replace conditions found during more testing
- Moved to rcu_dereference_rtnl in vxlan_fdb_info and consolidate rcu
  dereferences
- Fixes to build failures Reported-by: kbuild test robot <lkp@intel.com>
- DavidA, I am going to send a separate patch for the neighbor code validation
  for NDA_NH_ID if thats ok.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoselftests: net: add fdb nexthop tests
Roopa Prabhu [Fri, 22 May 2020 05:26:17 +0000 (22:26 -0700)]
selftests: net: add fdb nexthop tests

This commit adds ipv4 and ipv6 fdb nexthop api tests to fib_nexthops.sh.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agovxlan: support for nexthop notifiers
Roopa Prabhu [Fri, 22 May 2020 05:26:16 +0000 (22:26 -0700)]
vxlan: support for nexthop notifiers

vxlan driver registers for nexthop add/del notifiers to
cleanup fdb entries pointing to such nexthops.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonexthop: add support for notifiers
Roopa Prabhu [Fri, 22 May 2020 05:26:15 +0000 (22:26 -0700)]
nexthop: add support for notifiers

This patch adds nexthop add/del notifiers. To be used by
vxlan driver in a later patch. Could possibly be used by
switchdev drivers in the future.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agovxlan: ecmp support for mac fdb entries
Roopa Prabhu [Fri, 22 May 2020 05:26:14 +0000 (22:26 -0700)]
vxlan: ecmp support for mac fdb entries

Todays vxlan mac fdb entries can point to multiple remote
ips (rdsts) with the sole purpose of replicating
broadcast-multicast and unknown unicast packets to those remote ips.

E-VPN multihoming [1,2,3] requires bridged vxlan traffic to be
load balanced to remote switches (vteps) belonging to the
same multi-homed ethernet segment (E-VPN multihoming is analogous
to multi-homed LAG implementations, but with the inter-switch
peerlink replaced with a vxlan tunnel). In other words it needs
support for mac ecmp. Furthermore, for faster convergence, E-VPN
multihoming needs the ability to update fdb ecmp nexthops independent
of the fdb entries.

New route nexthop API is perfect for this usecase.
This patch extends the vxlan fdb code to take a nexthop id
pointing to an ecmp nexthop group.

Changes include:
- New NDA_NH_ID attribute for fdbs
- Use the newly added fdb nexthop groups
- makes vxlan rdsts and nexthop handling code mutually
  exclusive
- since this is a new use-case and the requirement is for ecmp
nexthop groups, the fdb add and update path checks that the
nexthop is really an ecmp nexthop group. This check can be relaxed
in the future, if we want to introduce replication fdb nexthop groups
and allow its use in lieu of current rdst lists.
- fdb update requests with nexthop id's only allowed for existing
fdb's that have nexthop id's
- learning will not override an existing fdb entry with nexthop
group
- I have wrapped the switchdev offload code around the presence of
rdst

[1] E-VPN RFC https://tools.ietf.org/html/rfc7432
[2] E-VPN with vxlan https://tools.ietf.org/html/rfc8365
[3] http://vger.kernel.org/lpc_net2018_talks/scaling_bridge_fdb_database_slidesV3.pdf

Includes a null check fix in vxlan_xmit from Nikolay

v2 - Fixed build issue:
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonexthop: support for fdb ecmp nexthops
Roopa Prabhu [Fri, 22 May 2020 05:26:13 +0000 (22:26 -0700)]
nexthop: support for fdb ecmp nexthops

This patch introduces ecmp nexthops and nexthop groups
for mac fdb entries. In subsequent patches this is used
by the vxlan driver fdb entries. The use case is
E-VPN multihoming [1,2,3] which requires bridged vxlan traffic
to be load balanced to remote switches (vteps) belonging to
the same multi-homed ethernet segment (This is analogous to
a multi-homed LAG but over vxlan).

Changes include new nexthop flag NHA_FDB for nexthops
referenced by fdb entries. These nexthops only have ip.
This patch includes appropriate checks to avoid routes
referencing such nexthops.

example:
$ip nexthop add id 12 via 172.16.1.2 fdb
$ip nexthop add id 13 via 172.16.1.3 fdb
$ip nexthop add id 102 group 12/13 fdb

$bridge fdb add 02:02:00:00:00:13 dev vxlan1000 nhid 101 self

[1] E-VPN https://tools.ietf.org/html/rfc7432
[2] E-VPN VxLAN: https://tools.ietf.org/html/rfc8365
[3] LPC talk with mention of nexthop groups for L2 ecmp
http://vger.kernel.org/lpc_net2018_talks/scaling_bridge_fdb_database_slidesV3.pdf

v4 - fixed uninitialized variable reported by kernel test robot
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next...
David S. Miller [Fri, 22 May 2020 20:48:24 +0000 (13:48 -0700)]
Merge branch '1GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
1GbE Intel Wired LAN Driver Updates 2020-05-21

This series contains updates to igc and e1000.

Andre cleans up code that was left over from the igb driver that handled
MAC address filters based on the source address, which is not currently
supported.  Simplifies the MAC address filtering code and prepare the
igc driver for future source address support.  Updated the MAC address
filter internal APIs to support filters based on source address.  Added
support for Network Flow Classification (NFC) rules based on source MAC
address.  Cleaned up the 'cookie' field which is not used anywhere in
the code and cleaned up a wrapper function that was not needed.
Simplified the filtering code for readability and aligned the ethtool
functions, so that function names were consistent.

Alex provides a fix for e1000 to resolve a deadlock issue when NAPI is
being disabled.

Sasha does additional cleanup of the igc driver of dead code that is not
used or needed.

v2: Fix the function header comment in patch 3 of the series, based on
    the feedback from Jakub Kicinski.
====================

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoice: Rename build_ctob to ice_build_ctob
Tony Nguyen [Fri, 8 May 2020 00:41:13 +0000 (17:41 -0700)]
ice: Rename build_ctob to ice_build_ctob

To make the function easier to identify as being part of the ice driver,
prepend ice to the function name.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
4 years agoice: remove unnecessary backslash
Bruce Allan [Fri, 8 May 2020 00:41:12 +0000 (17:41 -0700)]
ice: remove unnecessary backslash

Self-explanatory.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
4 years agoice: remove unnecessary check
Bruce Allan [Fri, 8 May 2020 00:41:11 +0000 (17:41 -0700)]
ice: remove unnecessary check

The variable status cannot be zero due to a prior check of it; remove this
check.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
4 years agoice: remove unnecessary expression that is always true
Bruce Allan [Fri, 8 May 2020 00:41:10 +0000 (17:41 -0700)]
ice: remove unnecessary expression that is always true

The else conditional expression is always true due to the if conditional
expression; remove it and add a comment to make it obvious still.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
4 years agoice: Fix check for removing/adding mac filters
Lihong Yang [Fri, 8 May 2020 00:41:09 +0000 (17:41 -0700)]
ice: Fix check for removing/adding mac filters

In function ice_set_mac_address, we will remove old dev_addr before
adding the new MAC. In the removing and adding process of the MAC,
there is no need to return error if the check finds the to-be-removed
dev_addr does not exist in the MAC filter list or the to-be-added mac
already exists, keep going or return success accordingly.

Signed-off-by: Lihong Yang <lihong.yang@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
4 years agoice: refactor filter functions
Michal Swiatkowski [Fri, 8 May 2020 00:41:08 +0000 (17:41 -0700)]
ice: refactor filter functions

Move filter functions to separate file.

Add functions that prepare suitable ice_fltr_info struct
depending on the filter type and add this struct to earlier created
list:
- ice_fltr_add_mac_to_list
- ice_fltr_add_vlan_to_list
- ice_fltr_add_eth_to_list
This functions are used in adding and removing filters.

Create wrappers for functions mentioned above that alloc list,
add suitable ice_fltr_info to it and call add or remove function.
- ice_fltr_prepare_mac
- ice_fltr_prepare_mac_and_broadcast
- ice_fltr_prepare_vlan
- ice_fltr_prepare_eth

Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
4 years agoice: Fix resource leak on early exit from function
Eric Joyner [Fri, 8 May 2020 00:41:07 +0000 (17:41 -0700)]
ice: Fix resource leak on early exit from function

Memory allocated in the ice_add_prof_id_vsig() function wasn't being
properly freed if an error occurred inside the for-loop in the function.

In particular, 'p' wasn't being freed if an error occurred before it was
added to the resource list at the end of the for-loop.

Signed-off-by: Eric Joyner <eric.joyner@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
4 years agoice: cleanup vf_id signedness
Jesse Brandeburg [Fri, 8 May 2020 00:41:06 +0000 (17:41 -0700)]
ice: cleanup vf_id signedness

The vf_id variable is dealt with in the code in inconsistent
ways of sign usage, preventing compilation with -Werror=sign-compare.
Fix this problem in the code by always treating vf_id as unsigned, since
there are no valid values of vf_id that are negative.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
4 years agoice: Fix casting issues
Karol Kolacinski [Fri, 8 May 2020 00:41:05 +0000 (17:41 -0700)]
ice: Fix casting issues

Change min() macros to min_t() which has compare type specified and it
helps avoid precision loss.

In some cases there was precision loss during calls or assignments.
Some fields in structs were unnecessarily large and gave multiple
warnings.

There were also some minor type differences which are now fixed as well as
some cases where a simple cast was needed.

Callers were were passing data that is a u16 to
ice_sched_cfg_node_bw_alloc() but the function was truncating that to a u8.
Fix that by changing the function to take a u16.

Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
4 years agoice: Provide more meaningful error message
Lihong Yang [Fri, 8 May 2020 00:41:04 +0000 (17:41 -0700)]
ice: Provide more meaningful error message

When printing the ice status or AQ error codes, instead of printing out the
numerical value, provide the description of the error code. This provides
more info about the issue than a number.

Signed-off-by: Lihong Yang <lihong.yang@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
4 years agoice: Fix probe/open race condition
Anirudh Venkataramanan [Fri, 8 May 2020 00:41:03 +0000 (17:41 -0700)]
ice: Fix probe/open race condition

As soon as the driver registers the PF netdev, userspace utilities
like NetworkManager try to bring up the associated interface. When
this happens, the driver may not have finished initializing fully,
resulting in a bunch of errors in the interface up flow.

The driver already has a mechanism to indicate if it's not up yet;
by setting the __ICE_DOWN bit in pf->state, but this bit gets
cleared too early in the current flow. So clear this bit only when
the driver is fully up. Also check for the same bit in the ice_open
flow, and return -EBUSY if the bit is set.

Also in ice_open, replace references of vsi->back with a local
variable.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
4 years agoice: only drop link once when setting pauseparams
Dave Ertman [Fri, 8 May 2020 00:41:02 +0000 (17:41 -0700)]
ice: only drop link once when setting pauseparams

Currently, the ice driver is setting a PHY configuration,
which causes a link drop, and then additionally it calls
for a nway_reset, which restarts auto-negotiation on the
link, which also causes a link drop.  These two link
events in such close timing is causing the FW to not be
able to generate a link interrupt for the driver to
respond to.

Remove the unnecessary auto-negotiation restart from the
set pauseparams flow.  Also remove error path that
would have performed an ice_down/ice_up as that is
also unnecessary.

Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
4 years agoice: Fix check for contiguous TCs
Dave Ertman [Fri, 8 May 2020 00:41:01 +0000 (17:41 -0700)]
ice: Fix check for contiguous TCs

The current implementation for contiguous TC check
is assuming that the UPs will be mapped to TCs in
a linear progressing fashion.  This is obviously
not always true.

Change the check to allow for various UP2TC mapping
configurations.

Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
4 years agoice: Don't reset and rebuild for Tx timeout on PFC enabled queue
Avinash JD [Fri, 8 May 2020 00:41:00 +0000 (17:41 -0700)]
ice: Don't reset and rebuild for Tx timeout on PFC enabled queue

When there's a Tx timeout for a queue which belongs to a PFC enabled TC,
then it's not because the queue is hung but because PFC is in action.

In PFC, peer sends a pause frame for a specified period of time when its
buffer threshold is exceeded (due to congestion). Netdev on the other
hand checks if ACK is received within a specified time for a TX packet, if
not, it'll invoke the tx_timeout routine.

Signed-off-by: Avinash JD <avinash.dayanand@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
4 years agoice: Add VF promiscuous support
Brett Creeley [Fri, 8 May 2020 00:40:59 +0000 (17:40 -0700)]
ice: Add VF promiscuous support

Implement promiscuous support for VF VSIs. Behaviour of promiscuous support
is based on VF trust as well as the, introduced, vf-true-promisc flag.

A trusted VF with vf-true-promisc disabled will be the default VSI, which
means that all traffic without a matching destination MAC address in the
device's internal switch will be forwarded to this VF VSI.

A trusted VF with vf-true-promisc enabled will go into "true promiscuous
mode". This amounts to the VF receiving all ingress and egress traffic
that hits the device's internal switch.

An untrusted VF will only receive traffic destined for that VF.

The vf-true-promisc-support flag cannot be toggled while any VF is in
promiscuous mode. This flag should be set prior to loading the iavf driver
or spawning VF(s).

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
4 years agoice: Add support for tunnel offloads
Tony Nguyen [Wed, 6 May 2020 16:32:30 +0000 (09:32 -0700)]
ice: Add support for tunnel offloads

Create a boost TCAM entry for each tunnel port in order to get a tunnel
PTYPE. Update netdev feature flags and implement the appropriate logic to
get and set values for hardware offloads.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Henry Tieman <henry.w.tieman@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
4 years agoice: report netlist version in .info_get
Jacob Keller [Tue, 5 May 2020 22:55:37 +0000 (15:55 -0700)]
ice: report netlist version in .info_get

The flash memory for the ice hardware contains a block of information
used for link management called the Netlist module.

As this essentially represents another section of firmware, add its
version information to the output of the driver's .info_get handler.

This includes both a version and the first few bytes of a hash of the
module contents.

  fw.netlist -> the version information extracted from the netlist module
  fw.netlist.build-> first 4 bytes of the hash of the contents, similar
                     to fw.mgmt.build

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
4 years agoMerge branch 'improve-branch_taken'
Alexei Starovoitov [Fri, 22 May 2020 00:44:25 +0000 (17:44 -0700)]
Merge branch 'improve-branch_taken'

John Fastabend says:

====================
This series adds logic to the verifier to handle the case where a
pointer is known to be non-null but then the verifier encountesr a
instruction, such as 'if ptr == 0 goto X' or 'if ptr != 0 goto X',
where the pointer is compared against 0. Because the verifier tracks
if a pointer may be null in many cases we can improve the branch
tracking by following the case known to be true.

The first patch adds the verifier logic and patches 2-4 add the
test cases.

v1->v2: fix verifier logic to return -1 indicating both paths must
still be walked if value is not zero. Move mark_precision skip for
this case into caller of mark_precision to ensure mark_precision can
still catch other misuses. And add PTR_TYPE_BTF_ID to our list of
supported types. Finally add one more test to catch the value not
equal zero case. Thanks to Andrii for original review.

Also fixed up commit messages hopefully its better now.
====================

Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
4 years agobpf: Selftests, add printk to test_sk_lookup_kern to encode null ptr check
John Fastabend [Thu, 21 May 2020 20:08:26 +0000 (13:08 -0700)]
bpf: Selftests, add printk to test_sk_lookup_kern to encode null ptr check

Adding a printk to test_sk_lookup_kern created the reported failure
where a pointer type is checked twice for NULL. Lets add it to the
progs test test_sk_lookup_kern.c so we test the case from C all the
way into the verifier.

We already have printk's in selftests so seems OK to add another one.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/159009170603.6313.1715279795045285176.stgit@john-Precision-5820-Tower