git.osdn.net Git - android-x86/kernel.git/log

cxgb4: remove dead code when allocating filter

Error code is already returned earlier if filter exists
at specified location. So, remove dead code trying to
free existing filter.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'tcp_bbr-more-GSO-work'

Eric Dumazet says:

====================
tcp_bbr: more GSO work

Playing with r8152 USB 1Gbit NIC, on both USB2 and USB3 slots, I found
that BBR was performing poorly, because of TSO being limited to 16KB

This patch series makes sure BBR is not under estimating number of
packets that are needed to fill the pipe when a device has suboptimal
TSO limits.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tcp_bbr: remove bbr->tso_segs_goal

Its value is computed then immediately used,
there is no need to store it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp_bbr: better deal with suboptimal GSO (II)

This is second part of dealing with suboptimal device gso parameters.
In first patch (350c9f484bde "tcp_bbr: better deal with suboptimal GSO")
we dealt with devices having low gso_max_segs

Some devices lower gso_max_size from 64KB to 16 KB (r8152 is an example)

In order to probe an optimal cwnd, we want BBR being not sensitive
to whatever GSO constraint a device can have.

This patch removes tso_segs_goal() CC callback in favor of
min_tso_segs() for CC wanting to override sysctl_tcp_min_tso_segs

Next patch will remove bbr->tso_segs_goal since it does not have
to be persistent.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

socket: skip checking sk_err for recvmmsg(MSG_ERRQUEUE)

recvmmsg does not call ___sys_recvmsg when sk_err is set.
That is fine for normal reads but, for MSG_ERRQUEUE, recvmmsg
should always call ___sys_recvmsg regardless of sk->sk_err to
be able to clear error queue. Otherwise, users are not able to
drain the error queue using recvmmsg.

Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-phy-Reduce-duplication'

Florian Fainelli says:

====================
net: phy: Reduce duplication

This patch series reduces the duplication among 10G PHY drivers that just
essentially stub most functions, but do that while replicating what the existing
generic functions do.

Changes in v3:

- removed unused "reg" variable in teranetics.c
- fixed subject for patch 5 since we actually use gen10g_no_soft_reset()

Changes in v2:

- rename gen10g_soft_reset() to gen10g_no_soft_reset() to better illustrate
what it does (or does not)
- removed stray comment in marvell10g.c
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: marvell10g: Utilize gen10g_no_soft_reset()

We do the same thing as the generic function: nothing, so utilize it.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

net: phy: cortina: Utilize generic functions

cortina_soft_reset() does the same thing as gen10g_soft_reset(), and
cortina_config_aneg() is actually doing what gen10g_config_init() does
for 10G capable PHYs.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

net: phy: teranetics: Utilize generic functions

Update teranetics_aneg_done() to use genphy_c45_aneg_done() instead of
duplicating that code, and switch to gen10g_* functions where
appropriate instead of maintaining identical copies doing nothing.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

net: phy: Export gen10g_* functions

In order to remove a fair amount of duplication in the different 10G PHY
drivers, export all gen10g_* functions to be able to make use of those.
While we are at it, rename gen10g_soft_reset() to gen10g_no_soft_reset()
to illustrate what it does.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

net: phy: aquantia: Utilize genphy_c45_aneg_done()

The driver duplicates what the generic function does, so use the generic
function intead.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

Merge branch 'mac89x0-fixes-and-cleanups'

Finn Thain says:

====================
Fixes, cleanup and modernization for mac89x0 driver

Changes since v4 of combined patch series:
- Removed redundant and non-portable MACH_IS_MAC tests.
- Added acked-by tags from Geert Uytterhoeven.
- Omitted patches unrelated to mac89x0 driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/mac89x0: Replace custom debug logging with netif_* calls

Adopt the conventional style of debug logging because it is both
shorter and more flexible.
Remove the 'version_printed' flag as the version will be printed
only once anyway (when the module loads).

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mac89x0: Fix and modernize log messages

Fix log message fragments that no longer produce the desired output
since the behaviour of printk() was changed.
Add missing printk severity levels.
Drop deprecated "out of memory" message as per checkpatch advice.

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mac89x0: Convert to platform_driver

Apparently these Dayna cards don't have a pseudoslot declaration ROM
which means they can't be probed like NuBus cards.

Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mac89x0: Remove redundant code

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'forwarding-selftest-fixes'

David Ahern says:

====================
selftests: forwarding: misc bug fixes and enhancements

Bug fixes and an enhancement for the recent forwarding tests:
- only check tc version on tc tests
- handle multipath tests failing with 0 packet count
- fix ping command for IPv6 on Debian jessie
- improve summary of multipath tests

v2
- add CHECK_TC to bridge_vlan_aware.sh (Ido)
- dropped patch 2; always check for mz given its use
- fixed commit message for the last patch (Multipath: was dropped)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: Add description to the multipath tests

Add a better description to the summary for multipath tests. e.g.,

INFO: Running IPv6 multipath tests
TEST: ECMP                                               [PASS]
INFO: Expected ratio 1.00 Measured ratio 1.02
TEST: Weighted MP 2:1                                    [PASS]
INFO: Expected ratio 2.00 Measured ratio 2.02
TEST: Weighted MP 11:45                                  [PASS]
INFO: Expected ratio 4.09 Measured ratio 4.03

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: Use PING6 instead of ping for ipv6 multipath test

On Debian jessie ping can not handle IPv6 addresses so the command
fails. Use PING6 which is set to ping6.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: Handle 0 for packet difference in multipath tests

If the packet stats have a difference of 0, the test output shows:
INFO: Expected ratio 2.00 Measured ratio
Runtime error (func=(main), adr=9): Divide by zero
(standard_in) 2: syntax error
(standard_in) 1: syntax error
./router_multipath.sh: line 187: test: : integer expression expected
TEST: Multipath [FAIL]
Too large discrepancy between expected and measured ratios

Handle the 0 and display a cleaner message:
INFO: Running IPv6 multipath tests
TEST: Multipath [FAIL]
Packet difference is 0

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: Only check tc version for tc tests

Capabilities of tc command are irrelevant for router tests:
$ ./router.sh
SKIP: iproute2 too old, missing shared block support

Add a CHECK_TC flag and only check tc capabilities if set. Add flag to
tc_common.sh and have it sourced before lib.sh

Also, if the command lacks some feature the test should exit non-0.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fib_rules: FRA_GENERIC_POLICY updates for ip proto, sport and dport attrs

Fixes: bfff4862653b ("net: fib_rules: support for match on ip_proto, sport and dport")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: allow userspace to add IFA_F_OPTIMISTIC addresses

According to RFC 4429 (section 3.1), adding new IPv6 addresses as
optimistic addresses is acceptable, as long as the implementation
follows some rules:

   * Optimistic DAD SHOULD only be used when the implementation is aware
        that the address is based on a most likely unique interface
        identifier (such as in [RFC2464]), generated randomly [RFC3041],
        or by a well-distributed hash function [RFC3972] or assigned by
        Dynamic Host Configuration Protocol for IPv6 (DHCPv6) [RFC3315].
        Optimistic DAD SHOULD NOT be used for manually entered
        addresses.

Thus, it seems reasonable to allow userspace to set the optimistic flag
when adding new addresses.

We must not let userspace set NODAD + OPTIMISTIC, since if the kernel is
not performing DAD we would never clear the optimistic flag. We must
also ignore userspace's request to add OPTIMISTIC flag to addresses that
have already completed DAD (addresses that don't have the TENTATIVE
flag, or that have the DADFAILED flag).

Then we also need to clear the OPTIMISTIC flag on permanent addresses
when DAD fails. Otherwise, IFA_F_OPTIMISTIC addresses added by userspace
can still be used after DAD has failed, because in
ipv6_chk_addr_and_flags(), IFA_F_OPTIMISTIC overrides IFA_F_TENTATIVE.

Setting IFA_F_OPTIMISTIC from userspace is conditional on
CONFIG_IPV6_OPTIMISTIC_DAD and the optimistic_dad sysctl.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Fix spelling mistake "greater then" -> "greater than"

Fix trivial spelling mistake "greater then" -> "greater than".

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phylink: Remove redundant netdev.phydev assignment

As a part of working on MII time stamping infrastructure, I was trying
to figure out how netdev->phydev gets assigned, and I stumbled across
this. Ever since the new phylink code came in, the field is assigned
twice.

The function, phylink_connect_phy(), calls

phy_attach_direct()
phylink_bringup_phy()

and phy_attach_direct() sets

dev->phydev = phydev;

but phylink_bringup_phy() then sets the same field again:

pl->netdev->phydev = phy;

Similarly, the function, phylink_of_phy_connect(), calls

of_phy_attach()
phy_attach_direct()
phylink_bringup_phy()

The removal code is also duplicated:

phylink_disconnect_phy()
pl->netdev->phydev = NULL;
phy_disconnect()
phy_detach()
phydev->attached_dev->phydev = NULL;

This patch removes the redundant assignments, restricting manipulation
of the netdev.phydev field to phy_attach_direct() and phy_detach().

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'smc-link-layer-control-enhancements'

Ursula Braun says:

====================
net/smc: Link Layer Control enhancements

here is a series of smc patches enabling SMC communication with peers
supporting more than one link per link group.

The first three patches are preparing code cleanups.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: prevent new connections on link group

When the processing of a DELETE LINK message has started,
new connections should not be added to the link group that
is about to terminate.

Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: process add/delete link messages

Add initial support for the LLC messages ADD LINK and DELETE LINK.
Introduce a link state field. Extend the initial LLC handshake with
ADD LINK processing.

Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: do not allow eyecatchers in rmbe

SMC does not support eyecatchers in RMB elements,
decline peers requesting this support.

Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: process confirm/delete rkey messages

Process and respond to CONFIRM RKEY and DELETE RKEY messages.

Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: respond to test link messages

Add TEST LINK message responses, which also serves as preparation for
support of sockopt TCP_KEEPALIVE.

Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: remove unused fields from smc structures

The daddr field holds the destination IPv4 address. The field was set but
never used and can be removed. The addr field was a left-over from an
earlier version of non-blocking connects and can be removed.
The result of the call to kernel_getpeername is not used, the call can be
removed. Non-blocking connects are working, so remove restriction comment.

Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: move netinfo function to file smc_clc.c

The function smc_netinfo_by_tcpsk() belongs to CLC handling.
Move it to smc_clc.c and rename to smc_clc_netinfo_by_tcpsk.

Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: cleanup smc_llc.h and smc_clc.h headers

Remove structures used internal only from headers.
And remove an extra function parameter.

Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ipv4-ipv6-mcast-align'

Yuval Mintz says:

====================
ipmr, ip6mr: Align multicast routing for IPv4 & IPv6

Historically ip6mr was based [cut-n-paste] on ipmr and the two have not
diverged too much. Apparently as ipv4 multicast routing is more common
than its ipv6 brethren modifications since then are mostly one-way,
affecting ipmr while leaving ip6mr unchanged.

This series is meant to re-factor both ipmr and ip6mr into having common
structures [and some functionality], adding 2 new common files -
mroute_base.h and ipmr_base.c.

The series begins by bringing ip6mr up to speed to some of the changes
applied in the past to ipmr [#2, #3].
It is then possible to re-factor a lot of the common structures -
vif devices [#1], mr_table [#4] mfc_cache [#6], and use the common
structures in both ipmr and ip6mr.

The rest of the patches re-factor some choice flows used by both ipmr
and ip6mr and eliminates duplicity.

This series would later allow for easy extension of ipmr offloading
to support ip6mr offloading as well, as almost all structures
related to the offloading would be shared between the two protocols.

Changes from previous versions
------------------------------
v2:
  - #6 Corrected reporting logic when hitting an unresolved cache
  - #7 Addressed kernel doc style [Thanks Nikolay]

RFC -> v1:
  - Corrected support for CONFIG_IP{,V6}_MROUTE_MULTIPLE_TABLES
  - Addressed a couple of kbuild test robot issues
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ipmr, ip6mr: Unite dumproute flows

The various MFC entries are being held in the same kind of mr_tables
for both ipmr and ip6mr, and their traversal logic is identical.
Also, with the exception of the addresses [and other small tidbits]
the major bulk of the nla setting is identical.

Unite as much of the dumping as possible between the two.
Notice this requires creating an mr_table iterator for each, as the
for-each preprocessor macro can't be used by the common logic.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip6mr: Remove MFC_NOTIFY and refactor flags

MFC_NOTIFY exists in ip6mr, probably as some legacy code
[was already removed for ipmr in commit
06bd6c0370bb ("net: ipmr: remove unused MFC_NOTIFY flag and make the flags enum").
Remove it from ip6mr as well, and move the enum into a common file;
Notice MFC_OFFLOAD is currently only used by ipmr.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipmr, ip6mr: Unite vif seq functions

Same as previously done with the mfc seq, the logic for the vif seq is
refactored to be shared between ipmr and ip6mr.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipmr, ip6mr: Unite mfc seq logic

With the exception of the final dump, ipmr and ip6mr have the exact same
seq logic for traversing a given mr_table. Refactor that code and make
it common.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipmr, ip6mr: Unite logic for searching in MFC cache

ipmr and ip6mr utilize the exact same methods for searching the
hashed resolved connections, difference being only in the construction
of the hash comparison key.

In order to unite the flow, introduce an mr_table operation set that
would contain the protocol specific information required for common
flows, in this case - the hash parameters and a comparison key
representing a (*,*) route.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipmr, ip6mr: Make mfc_cache a common structure

mfc_cache and mfc6_cache are almost identical - the main difference is
in the origin/group addresses and comparison-key. Make a common
structure encapsulating most of the multicast routing logic - mr_mfc
and convert both ipmr and ip6mr into using it.

For easy conversion [casting, in this case] mr_mfc has to be the first
field inside every multicast routing abstraction utilizing it.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipmr, ip6mr: Unite creation of new mr_table

Now that both ipmr and ip6mr are using the same mr_table structure,
we can have a common function to allocate & initialize a new instance.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mroute*: Make mr_table a common struct

Following previous changes to ip6mr, mr_table and mr6_table are
basically the same [up to mr6_table having additional '6' suffixes to
its variable names].
Move the common structure definition into a common header; This
requires renaming all references in ip6mr to variables that had the
distinct suffix.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip6mr: Align hash implementation to ipmr

Since commit 8fb472c09b9d ("ipmr: improve hash scalability") ipmr has
been using rhashtable as a basis for its mfc routes, but ip6mr is
currently still using the old private MFC hash implementation.

Align ip6mr to the current ipmr implementation.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip6mr: Make mroute_sk rcu-based

In ipmr the mr_table socket is handled under RCU. Introduce the same
for ip6mr.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipmr,ipmr6: Define a uniform vif_device

The two implementations have almost identical structures - vif_device and
mif_device. As a step toward uniforming the mr_tables, eliminate the
mif_device and relocate the vif_device definition into a new common
header file.

Also, introduce a common initializing function for setting most of the
vif_device fields in a new common source file. This requires modifying
the ipv{4,6] Kconfig and ipv4 makefile as we're introducing a new common
config option - CONFIG_IP_MROUTE_COMMON.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'fib_rules-support-sport-dport-and-proto-match'

Roopa Prabhu says:

====================
fib_rules: support sport, dport and proto match

This series extends fib rule match support to include sport, dport
and ip proto match (to complete the 5-tuple match support).
Common use-cases of Policy based routing in the data center require
5-tuple match. The last 2 patches in the series add a call to flow dissect
in the fwd path if required by the installed fib rules (controlled by a flag).

v1:
  - Fix errors reported by kbuild and feedback on RFC
  - extend port match uapi to accomodate port ranges

v2:
  - address comments from Nikolay, David Ahern and Paolo (Thanks!)

Pending things I will submit separate patches for:
  - extack for fib rules
  - fib rules test (as requested by david ahern)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: route: dissect flow in input path if fib rules need it

Dissect flow in fwd path if fib rules require it. Controlled by
a flag to avoid penatly for the common case. Flag is set when fib
rules with sport, dport and proto match that require flow dissect
are installed. Also passes the dissected hash keys to the multipath
hash function when applicable to avoid dissecting the flow again.
icmp packets will continue to use inner header for hash
calculations.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: route: dissect flow in input path if fib rules need it

Dissect flow in fwd path if fib rules require it. Controlled by
a flag to avoid penatly for the common case. Flag is set when fib
rules with sport, dport and proto match that require flow dissect
are installed. Also passes the dissected hash keys to the multipath
hash function when applicable to avoid dissecting the flow again.
icmp packets will continue to use inner header for hash
calculations (Thanks to Nikolay Aleksandrov for some review here).

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: fib6_rules: support for match on sport, dport and ip proto

support to match on src port, dst port and ip protocol.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: fib_rules: support match on sport, dport and ip proto

support to match on src port, dst port and ip protocol.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fib_rules: support for match on ip_proto, sport and dport

uapi for ip_proto, sport and dport range match
in fib rules.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: fix interrupt number after adding support for MSI-X interrupts

In case of MSI-X the interrupt number may differ from pcidev->irq.
Fix this by using pci_irq_vector().

Fixes: 6c6aa15fdea5 ("r8169: improve interrupt handling")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '100GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
100GbE Intel Wired LAN Driver Updates 2018-02-28

This series contains updates to fm10k only.

Jake provides all the changes in this series, starting with making the
function header comments consistent and to align with how the kernel
documentation expects it. Also cleaned up code comment as well as bump
the driver version.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'selftests-forwarding-Add-VRF-based-tests'

Ido Schimmel says:

====================
selftests: forwarding: Add VRF-based tests

One of the nice things about network namespaces is that they allow one
to easily create and test complex environments.

Unfortunately, these namespaces can not be used with actual switching
ASICs, as their ports can not be migrated to other network namespaces
(NETIF_F_NETNS_LOCAL) and most of them probably do not support the
L1-separation provided by namespaces.

However, a similar kind of flexibility can be achieved by using VRFs and
by looping the switch ports together. For example:

                             br0
                              +
               vrf-h1         |           vrf-h2
                 +        +---+----+        +
                 |        |        |        |
    192.0.2.1/24 +        +        +        + 192.0.2.2/24
               swp1     swp2     swp3     swp4
                 +        +        +        +
                 |        |        |        |
                 +--------+        +--------+

The VRFs act as lightweight namespaces representing hosts connected to
the switch.

This approach for testing switch ASICs has several advantages over the
traditional method that requires multiple physical machines, to name a
few:

1. Only the device under test (DUT) is being tested without noise from
other system.

2. Ability to easily provision complex topologies. Testing bridging
between 4-ports LAGs or 8-way ECMP requires many physical links that are
not always available. With the VRF-based approach one merely needs to
loopback more ports.

These tests are written with switch ASICs in mind, but they can be run
on any Linux box using veth pairs to emulate physical loopbacks.

v2:
* Order local variables declaration according to function arguments
  order (Petr)

v1:
* Change location to net/forwarding instead of forwarding/
* Add ability to pause on failure
* Add ability to pause on cleanup
* Make configuration file optional
* Make ping/ping6/mz configurable
* Add more tc tests
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: Introduce basic shared blocks tests

Test shared block infrastructure. This is a basic test that shares TC
block in between 2 clsact qdiscs.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: Introduce basic tc chains tests

Tests chains matching and goto chain action.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: Introduce tc actions tests

Add first part of actions tests. This patch only contains tests of gact
ok/drop/trap and mirred redirect egress.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: Introduce tc flower matching tests

Add first part of flower tests. This patch only contains dst/src ip/mac
matching.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: Allow to get netdev interfaces names from commandline

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: Add MAC get helper

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: Add tc offload check helper

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: Test IPv6 weighted nexthops

Have one host generate 16K IPv6 echo requests with a random flow label
and check that they are distributed between both multipath links
according to the provided weights.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: Test IPv4 weighted nexthops

Use different weights for the multipath route configured on the first
router and check that the different flows generated by the first host
are distributed according to the provided weights.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: Create test topology for multipath routing

Create a topology with two hosts, each directly connected to a different
router. Both routers are connected using two links, enabling multipath
routing.

Test IPv4 and IPv6 ping using default MTU and large MTU.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: Add a test for basic IPv4 and IPv6 routing

Configure two hosts which are directly connected to the same router and
test IPv4 and IPv6 ping. Use a large MTU and check that ping is
unaffected.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: Add a test for flooded traffic

Add test cases for unknown unicast and unregistered multicast flooding.

For each traffic type, turn off flooding on one bridged port and inject
a packet of the specified type through the second bridged port. Make
sure the packet was not received by checking the ACL counters on the
other end. Later, turn on flooding and make sure the packet was
received.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: Add a test for FDB learning

Send a packet with a specific destination MAC, make sure it was learned
on the ingress port and then aged-out.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: Add initial testing framework

Add initial framework to test packet forwarding functionality. The tests
can run on actual devices using loop-backed cables or using veth pairs.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipvlan: use per device spinlock to protect addrs list updates

This changeset moves ipvlan address under RCU protection, using
a per ipvlan device spinlock to protect list mutation and RCU
read access to protect list traversal.

Also explicitly use RCU read lock to traverse the per port
ipvlans list, so that we can now perform a full address lookup
without asserting the RTNL lock.

Overall this allows the ipvlan driver to check fully for duplicate
addresses - before this commit ipv6 addresses assigned by autoconf
via prefix delegation where accepted without any check - and avoid
the following rntl assertion failure still in the same code path:

RTNL: assertion failed at drivers/net/ipvlan/ipvlan_core.c (124)
WARNING: CPU: 15 PID: 0 at drivers/net/ipvlan/ipvlan_core.c:124 ipvlan_addr_busy+0x97/0xa0 [ipvlan]
Modules linked in: ipvlan(E) ixgbe
CPU: 15 PID: 0 Comm: swapper/15 Tainted: G            E    4.16.0-rc2.ipvlan+ #1782
Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.1.7 06/16/2016
RIP: 0010:ipvlan_addr_busy+0x97/0xa0 [ipvlan]
RSP: 0018:ffff881ff9e03768 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff881fdf2a9000 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 00000000000000f6 RDI: 0000000000000300
RBP: ffff881fdf2a8000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: ffff881ff9e034c0 R12: ffff881fe07bcc00
R13: 0000000000000001 R14: ffffffffa02002b0 R15: 0000000000000001
FS:  0000000000000000(0000) GS:ffff881ff9e00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fc5c1a4f248 CR3: 000000207e012005 CR4: 00000000001606e0
Call Trace:
  <IRQ>
  ipvlan_addr6_event+0x6c/0xd0 [ipvlan]
  notifier_call_chain+0x49/0x90
  atomic_notifier_call_chain+0x6a/0x100
  ipv6_add_addr+0x5f9/0x720
  addrconf_prefix_rcv_add_addr+0x244/0x3c0
  addrconf_prefix_rcv+0x2f3/0x790
  ndisc_router_discovery+0x633/0xb70
  ndisc_rcv+0x155/0x180
  icmpv6_rcv+0x4ac/0x5f0
  ip6_input_finish+0x138/0x6a0
  ip6_input+0x41/0x1f0
  ipv6_rcv+0x4db/0x8d0
  __netif_receive_skb_core+0x3d5/0xe40
  netif_receive_skb_internal+0x89/0x370
  napi_gro_receive+0x14f/0x1e0
  ixgbe_clean_rx_irq+0x4ce/0x1020 [ixgbe]
  ixgbe_poll+0x31a/0x7a0 [ixgbe]
  net_rx_action+0x296/0x4f0
  __do_softirq+0xcf/0x4f5
  irq_exit+0xf5/0x110
  do_IRQ+0x62/0x110
  common_interrupt+0x91/0x91
  </IRQ>

v1 -> v2: drop unneeded in_softirq check in ipvlan_addr6_validator_event()

Fixes: e9997c2938b2 ("ipvlan: fix check for IP addresses in control path")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipvlan: egress mcast packets are not exceptional

Currently, if IPv6 is enabled on top of an ipvlan device in l3
mode, the following warning message:

Dropped {multi|broad}cast of type= [86dd]

is emitted every time that a RS is generated and dmseg is soon
filled with irrelevant messages. Replace pr_warn with pr_debug,
to preserve debuggability, without scaring the sysadmin.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlxsw-mq-red-offload'

Jiri Pirko says:

====================
mlxsw: Offload multi-queue RED support

Nogah says:

Support a two level hierarchy of offloaded qdiscs in mlxsw, with sch_prio
being the root qdisc and sch_red as the children.

                +----------+
                | sch_prio |
                +----+-----+
                     |
                     |
    +----------------------------------+
    |                |                 |
    |                |                 |
    |                |                 |
+---v---+       +----v---+       +-----v--+
|sch_red|       |sch_red |       |sch_red |
+-------+       +--------+       +--------+

When setting sch_prio as the root qdisc on a physical port, mlxsw will
offload it. When adding it with sch_red as a child qdisc, it will offload
it as well.
Relocating child qdisc or connecting them to more then one child will
result in unoffloading them. Relocating child qdisc more then once is
highly unrecommended and might cause a miss match between the kernel
configuration and the offloaded one. The offloaded configuration will be
aligned with the one shown in the show command.
Changing the priomap parameter of sch_prio might cause a band that its
configuration was changed and it has offloaded sch_red set on it, to lose
some stats data as if sch_red was unoffloaded and offloaded again. However,
it won't affect the data on this band that will have sch_red continuously.

Patch 1 adds support for setting RED as the child of root qdisc.
Patches 2-4 add support for RED bstasts for offloaded child qdiscs.
Patches 5-6 handle backlog related changes for offloaded child qdiscs.
Patches 7-8 update PRIO in mlxsw to be able to have RED as child on its
bands.
Patch 9 adds offload handles for PRIO graft operations. In mlxsw it will
cause the driver to stop offloading the child in question.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: qdiscs: prio: Handle graft command

Handle graft command for an offloaded sch_prio.
Grafting a qdisc to any place other than under its original parent is not
supported by mlxsw and will cause the grafted qdisc to stop being
offloaded.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sch: prio: Add offload ability for grafting a child

Offload sch_prio graft command for capable drivers.
Warn in case of a failure, unless the graft was done as part of a destroy
operation (the new qdisc is a noop) or if all the qdiscs (the parent, the
old child, and the new one) are not offloaded.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: qdiscs: prio: Delete child qdiscs when removing bands

When the number the bands of sch_prio is decreased, child qdiscs on the
deleted bands would get deleted as well.
This change and deletions are being done under sch_tree_lock of the
sch_prio qdisc. Part of the destruction of qdisc is unoffloading it, if
it is offloaded. Un-offloading can't be done inside this lock.
Move the offload command to be done before reducing the number of bands,
so unoffloading of the qdiscs that are about to be deleted could be done
outside of the lock.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: Update sch_prio stats to include sch_red related drops

sch_prio as root qdisc should count all the drops its children have. Since
it is possible for it to have sch_red children, it needs to count RED early
drops.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sch: Don't warn on missmatching qlen and backlog for offloaded qdiscs

Offloaded qdiscs are allowed to expose only parts of their statistics.
It means that if backlog is being exposed and qlen is not, it might trigger
a warning in qdisc_tree_reduce_backlog.
Do not warn in case the qdisc that was removed was an offloaded one.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: qdiscs: Update backlog handling of a child qdiscs

When removing a child qdisc its backlog will be decreased from the parent
backlog. The driver backlog count should do the same.
When the parent changes its configuration, the child might need to clean
its stats. However, the backlog can't be cleaned with the rest of the
stats, because it reflects a momentary value that needs to be synced with
the core, not the history of the qdisc.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: qdiscs: Collect stats for sch_red based on priomap

Priority counters count packets according to their packet priority.
Collect the stats for sch_red based on these counters, so the qdisc bstats
will be the sum of counters matching the priorities marked in the qdisc
priomap.
Changing the mapping of the priorities to bands while traffic is running
can result in losing the stats of the bands qdiscs from their last dump
call to this change, as if the qdisc was unoffloaded and re-offloaded. It
will not affect the traffic behaviour according to sch_red.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: qdiscs: Add priority map per qdisc

Add priority map per qdisc, to indicate which priorities are being
directed through this qdisc.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: Add priority counters

Add TX packets and bytes counters per switch priority per port.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: qdiscs: Support qdisc per tclass

Add the option to set a qdisc per tclass. Match the qdisc to the tclass by
parent ID. Supported currently for sch_red only.
It allows offloading sch_prio as root qdisc and sch_red as its child.
(However, doing so might corrupt the stats for both parent and child.)

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: Add hardware offloading for VLAN filtering

Marvell PPv2 controller allows for generic packet filtering. This commit
adds entries to implement VLAN filtering. The approach taken is :

- Filter entries that would match on the presence of the VLAN tag
   (existing VLAN detection, DSA / EDSA detection) will set the next
   lookup ID to be for the VID.

- For each VLAN existing on a given port, we add an entry that matches
   this specific VID. If the incoming packet matches the VID entry, it is
   set for the next lookup in the chain (LU_L2).

- A Guard entry is added for each port, that will match if the incoming
   packet didn't match any of the above VID entries. This entry tags the
   packet to be dropped.

Due to this design, and the fact that the total 256 filter entries are
also used for other purposes, we have a limit of 10 VLANs per port. To
accommodate the case where we would need more VLANS on one port, this
patch implements the ndo_set_features to allow for disabling of VLAN
filtering using ethtool.

The default config has VLAN filtering disabled.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: convert remaining feature flag and remove enum features

Now that only one feature flag is left we can convert it and remove
enum features.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'macmace-cleanups'

Finn Thain says:

====================
Fixes, cleanup and modernization for macmace driver

Changes since v4 of combined patch series:
- Removed redundant and non-portable MACH_IS_MAC tests.
- Omitted patches unrelated to macmace driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/macmace: Drop redundant MACH_IS_MAC test

The MACH_IS_MAC test is redundant here because the platform device
won't get registered unless MACH_IS_MAC.
Adopt module_platform_driver() convention.

Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/macmace: Fix and clean up log messages

Don't log the unexpanded "eth%d" format string.
Log the chip revision in the probe message (consistent with mace.c).
Drop redundant debug messages for FIFO events recorded in the
interface statistics (also consistent with mace.c).

Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

inet: whitespace cleanup

Ran simple script to find/remove trailing whitespace and blank lines
at EOF because that kind of stuff git whines about and editors leave
behind.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

emulex/benet: Constify *be_misconfig_evt_port_state[]

Note: This is compile only tested as I have no access to the hw.
No benefit gained except for some self-documenting.

add/remove: 0/0 grow/shrink: 0/0 up/down: 0/0 (0)
Function old new delta
Total: Before=2757703, After=2757703, chg +0.00%

Signed-off-by: Hernán Gonzalez <hernan@vanguardiasur.com.ar>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlogic/qed: Constify *pkt_type_str[]

Note: This is compile only tested as I have no access to the hw.
Constifying and declaring as static saves 24 bytes.

add/remove: 0/1 grow/shrink: 0/0 up/down: 0/-24 (-24)
Function old new delta
pkt_type_str 24 - -24
Total: Before=3599256, After=3599232, chg -0.00%

Signed-off-by: Hernán Gonzalez <hernan@vanguardiasur.com.ar>
Acked-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'SFP-updates'

Russell King says:

====================
SFP updates

Included in this series are a further few updates for SFP support:

- Adding support for Fiberstore's non-standard BiDi modules operating
  at 1310nm/1550nm wavelengths rather than the 1000BASE-BX standard of
  1310nm/1490nm.
- Adding support for negotiating the PHY interface mode with the MAC,
  so that modules supporting faster speeds and Gigabit ethernet work
  with Gigabit-only MACs.
- Adding support for high power (>1W) SFP modules.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

sfp: add high power module support

This patch is the result of work by both Jon Nettleton and Russell King.
Jon wrote the original patch, adding support for SFP modules which
require a power level greater than '1'.

Russell's changes:
- Fix the power levels for big-endian, and make the code flow better.
- Convert to use device_property_read_u8()
- Warn for power levels exceeding host level
  SFF-8431 says:

  "To avoid exceeding system power supply limits and cooling capacity,
   all modules at power up by default shall operate with up to 1.0 W.
   Hosts supporting Power Level II or III operation may enable a Power
   Level II or III module through the 2-wire interface. Power Level II
   or III modules shall assert the power level declaration bit of
   SFF-8472."

  Print a warning for modules that exceed the host power level, and
  leave them operating in power level 1.

- Fix i2c write
  The first byte of any write after the bus address is always the
  device address.  In order to write a value to device D, address I,
  value V, we need to generate on the bus:

    S DDDDDDDD A IIIIIIII A VVVVVVVV A P

  where S = start, R = restart, A = ack, P = stop.  Splitting this
  as two:

    S DDDDDDDD A IIIIIIII A R DDDDDDDD A VVVVVVVV A P

  results in the device's address register being written first by I
  and then by V - the addressed register within the device is not
  written.

- Avoid power mode switching if 0xa2 is not implemented
  Some modules indicate that they support power level II or power level
  III, but do not implement address 0xa2, meaning that the bit to set
  them to high power mode is not accessible.

  These modules appear to have the sff8472_compliance field set to zero,
  and also do not implement diagnostics.  Detect this, but also ensure
  that the module does not require the address switching mode, which we
  do not implement.

- Use mW for power level rather than power level number.

- Fix high power mode transition
  We must not switch to SFP_MOD_PRESENT state until we have finished
  initialising, because the remaining state machines check for that
  state.  Add SFP_MOD_HPOWER as an intermediate state.

- Use definition for I2C register address rather than constant.

Signed-off-by: Jon Nettleton <jon@solid-run.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: add maximum power level to SFP binding

Add the new maximum power level property to the SFP binding.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

phylink,sfp: negotiate interface format with MAC

Negotiate the interface format with the MAC rather than requiring it to
be a fixed type specified solely by the SFP module. This allows modules
that can work with several different interface signalling formats to
select a format compatible with the MAC - for example, a Fiber module
supporing Gigabit ethernet and faster connected to a Gigabit only MAC
needs to select the 1000BASE-X mode.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfp: support 1G BiDi (eg, FiberStore SFP-GE-BX) modules

Some BiDi modules (eg, FiberStore SFP-GE-BX) are not compliant with
1000BASE-BX as they use different wavelengths from the 1000BASE-BX
standard (eg, 1310nm/1550nm rather than 1310nm/1490nm). These modules
support 1000BASE-X ethernet, so detect them by a failure to find any
other support, the 8B10B encoding and a bit rate that falls within the
1Gbps window.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

team: Use extack to report enslavement failures

Use extack inside team's enslavement function and also propagate it to
the netdevice notifier to allow enslaved ports to report the failure
reason. Example:

$ teamd -t team0 -d -c '{"runner": {"name": "lacp"}}'
$ ip link set dev lo master team0
Error: Loopback device can't be added as a team port.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fm10k: bump version number

We're aligned with latest version released on SourceForge, so update the
version number to match.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

fm10k: fix incorrect warning for function prototype

Recent kernels now complain about incorrect function prototype comments,
in order to ensure comments are accurate to the function. However, it
incorrectly associates the comment above the fm10k_pci_tbl[] as
a function header comment. Fix this by removing the extra "*" in the
comment. This normally indicates that the function is a doxygen style
function header comment.

Once removed, the logic no longer kicks in and the following warning is
fixed:

warning: cannot understand function prototype: 'const struct pci_device_id fm10k_pci_tbl[] = '

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

fm10k: fix function doxygen comments

Several function header comments had incorrect function parameter
definitions. Recent versions of the upstream kernel have started to warn
about these issues. Fix up the comments which do not match in order to
resolve these new warnings.

While fixing these, update the copyright year also.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Merge tag 'mlx5-updates-2018-02-23' of git://git./linux/kernel/git/mellanox/linux

Saeed Mahameed says:

mlx5-update-2018-02-23 (IB representors)

From: Mark Bloch <markb@mellanox.com>
=========
Add IB representor when in switchdev mode

The following series adds support for an IB (RAW Ethernet only) device
representor which is created when the user switches to switchdev mode.

Today when switching to switchdev mode the only representors which are
created are net devices. Each netdev is a representor of a virtual
function and any data sent via the representor is received on the virtual
function, and any data sent via the virtual function is received by the
representor.

For the mlx5 driver the main use of this functionality is to be able to
use Open vSwitch on the hypervisor in order to manage/control traffic
from/to the virtual functions. Open vSwitch can also work with DPDK
devices and not just net devices, this series exposes an IB device, which
Mellanox PMD driver uses, which then can be used by Open vSwitch DPDK.

An IB device representor exposes only RAW Ethernet QP capabilities and
the ability to create flow rules to direct traffic to its RX queues. The
state of the IB device (ACTIVE/DOWN etc..) is based on the state of the
corresponding net device representor. No other RDMA/RoCE functionality is
currently supported and no GID table is exposed.
=========

Signed-off-by: David S. Miller <davem@davemloft.net>