OSDN Git Service

uclinux-h8/linux.git
5 years agosky2: use __vlan_hwaccel helpers
Michał Mirosław [Thu, 8 Nov 2018 23:18:06 +0000 (00:18 +0100)]
sky2: use __vlan_hwaccel helpers

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlx4: use __vlan_hwaccel helpers
Michał Mirosław [Thu, 8 Nov 2018 23:18:05 +0000 (00:18 +0100)]
mlx4: use __vlan_hwaccel helpers

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agobenet: use __vlan_hwaccel helpers
Michał Mirosław [Thu, 8 Nov 2018 23:18:04 +0000 (00:18 +0100)]
benet: use __vlan_hwaccel helpers

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoipv4/tunnel: use __vlan_hwaccel helpers
Michał Mirosław [Thu, 8 Nov 2018 23:18:04 +0000 (00:18 +0100)]
ipv4/tunnel: use __vlan_hwaccel helpers

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agobridge: use __vlan_hwaccel helpers
Michał Mirosław [Thu, 8 Nov 2018 23:18:03 +0000 (00:18 +0100)]
bridge: use __vlan_hwaccel helpers

This removes assumption than vlan_tci != 0 when tag is present.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago8021q: use __vlan_hwaccel helpers
Michał Mirosław [Thu, 8 Nov 2018 23:18:03 +0000 (00:18 +0100)]
8021q: use __vlan_hwaccel helpers

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonfnetlink/queue: use __vlan_hwaccel helpers
Michał Mirosław [Thu, 8 Nov 2018 23:18:02 +0000 (00:18 +0100)]
nfnetlink/queue: use __vlan_hwaccel helpers

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/core: use __vlan_hwaccel helpers
Michał Mirosław [Thu, 8 Nov 2018 23:18:02 +0000 (00:18 +0100)]
net/core: use __vlan_hwaccel helpers

This removes assumptions about VLAN_TAG_PRESENT bit.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agocxgb4: use __vlan_hwaccel helpers
Michał Mirosław [Thu, 8 Nov 2018 23:18:01 +0000 (00:18 +0100)]
cxgb4: use __vlan_hwaccel helpers

Use __vlan_hwaccel_put_tag() to set vlan tag and proto fields.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: move __skb_checksum_complete*() to skbuff.c
Cong Wang [Thu, 8 Nov 2018 22:05:42 +0000 (14:05 -0800)]
net: move __skb_checksum_complete*() to skbuff.c

__skb_checksum_complete_head() and __skb_checksum_complete()
are both declared in skbuff.h, they fit better in skbuff.c
than datagram.c.

Cc: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'net-ethernet-ti-cpsw-fix-vlan-mcast'
David S. Miller [Fri, 9 Nov 2018 04:30:58 +0000 (20:30 -0800)]
Merge branch 'net-ethernet-ti-cpsw-fix-vlan-mcast'

Ivan Khoronzhuk says:

====================
net: ethernet: ti: cpsw: fix vlan mcast

The cpsw holds separate mcast entires for vlan entries. At this moment
driver adds only not vlan mcast addresses, omitting vlan/mcast entries.
As result mcast for vlans doesn't work. It can be fixed by adding same
mcast entries for every created vlan, but this patchseries uses more
sophisticated way and allows to create mcast entries only for vlans
that really require it. Generic functions from this series can be
reused for fixing vlan and macvlan unicast.

Simple example of ALE table before and after this series, having same
mcast entries as for vlan 100 as for real device (reserved vlan 2),
and one mcast address only for vlan 100 - 01:1b:19:00:00:00.

<---- Before this patchset ---->
vlan , vid = 2, untag_force = 0x5, reg_mcast = 0x5, mem_list = 0x5
mcast, vid = 2, addr = ff:ff:ff:ff:ff:ff, port_mask = 0x1
ucast, vid = 2, addr = 74:da:ea:47:7d:9d, persistant, port_num = 0x0
vlan , vid = 0, untag_force = 0x7, reg_mcast = 0x0, mem_list = 0x7
mcast, vid = 2, addr = 33:33:00:00:00:01, port_mask = 0x1
mcast, vid = 2, addr = 01:00:5e:00:00:01, port_mask = 0x1
vlan , vid = 1, untag_force = 0x3, reg_mcast = 0x3, mem_list = 0x3
mcast, vid = 1, addr = ff:ff:ff:ff:ff:ff, port_mask = 0x1
ucast, vid = 1, addr = 74:da:ea:47:7d:9c, persistant, port_num = 0x0
mcast, vid = 1, addr = 33:33:00:00:00:01, port_mask = 0x1
mcast, vid = 1, addr = 01:00:5e:00:00:01, port_mask = 0x1
mcast, vid = 2, addr = 01:80:c2:00:00:00, port_mask = 0x1
mcast, vid = 2, addr = 01:80:c2:00:00:03, port_mask = 0x1
mcast, vid = 2, addr = 01:80:c2:00:00:0e, port_mask = 0x1
mcast, vid = 1, addr = 01:80:c2:00:00:00, port_mask = 0x1
mcast, vid = 1, addr = 01:80:c2:00:00:03, port_mask = 0x1
mcast, vid = 1, addr = 01:80:c2:00:00:0e, port_mask = 0x1
mcast, vid = 2, addr = 33:33:ff:47:7d:9d, port_mask = 0x1
mcast, vid = 2, addr = 33:33:00:00:00:fb, port_mask = 0x1
mcast, vid = 2, addr = 33:33:00:01:00:03, port_mask = 0x1
mcast, vid = 1, addr = 33:33:ff:47:7d:9c, port_mask = 0x1
mcast, vid = 1, addr = 33:33:00:00:00:fb, port_mask = 0x1
mcast, vid = 1, addr = 33:33:00:01:00:03, port_mask = 0x1
mcast, vid = 1, addr = 01:00:5e:00:00:fb, port_mask = 0x1
mcast, vid = 1, addr = 01:00:5e:00:00:fc, port_mask = 0x1
vlan , vid = 100, untag_force = 0x0, reg_mcast = 0x5, mem_list = 0x5
ucast, vid = 100, addr = 74:da:ea:47:7d:9d, persistant, port_num = 0x0
mcast, vid = 100, addr = ff:ff:ff:ff:ff:ff, port_mask = 0x1
mcast, vid = 2, addr = 01:1b:19:00:00:00, port_mask = 0x1
 ^^^
 Here mcast entry (ptpl2), has to be added only for vlan 100
 but added for reserved vlan 2...that's not enough.

<---- After this patchset ---->
vlan , vid = 2, untag_force = 0x5, reg_mcast = 0x5, mem_list = 0x5
mcast, vid = 2, addr = ff:ff:ff:ff:ff:ff, port_mask = 0x1
ucast, vid = 2, addr = 74:da:ea:47:7d:9d, persistant, port_num = 0x0
vlan , vid = 0, untag_force = 0x7, reg_mcast = 0x0, mem_list = 0x7
mcast, vid = 2, addr = 33:33:00:00:00:01, port_mask = 0x1
mcast, vid = 2, addr = 01:00:5e:00:00:01, port_mask = 0x1
vlan , vid = 1, untag_force = 0x3, reg_mcast = 0x3, mem_list = 0x3
mcast, vid = 1, addr = ff:ff:ff:ff:ff:ff, port_mask = 0x1
ucast, vid = 1, addr = 74:da:ea:47:7d:9c, persistant, port_num = 0x0
mcast, vid = 1, addr = 33:33:00:00:00:01, port_mask = 0x1
mcast, vid = 1, addr = 01:00:5e:00:00:01, port_mask = 0x1
mcast, vid = 2, addr = 01:80:c2:00:00:00, port_mask = 0x1
mcast, vid = 2, addr = 01:80:c2:00:00:03, port_mask = 0x1
mcast, vid = 2, addr = 01:80:c2:00:00:0e, port_mask = 0x1
mcast, vid = 1, addr = 01:80:c2:00:00:00, port_mask = 0x1
mcast, vid = 1, addr = 01:80:c2:00:00:03, port_mask = 0x1
mcast, vid = 1, addr = 01:80:c2:00:00:0e, port_mask = 0x1
mcast, vid = 2, addr = 33:33:ff:47:7d:9d, port_mask = 0x1
mcast, vid = 1, addr = 33:33:ff:47:7d:9c, port_mask = 0x1
mcast, vid = 2, addr = 33:33:00:00:00:fb, port_mask = 0x1
mcast, vid = 2, addr = 33:33:00:01:00:03, port_mask = 0x1
mcast, vid = 1, addr = 33:33:00:00:00:fb, port_mask = 0x1
mcast, vid = 1, addr = 33:33:00:01:00:03, port_mask = 0x1
vlan , vid = 100, untag_force = 0x0, reg_mcast = 0x5, mem_list = 0x5
ucast, vid = 100, addr = 74:da:ea:47:7d:9d, persistant, port_num = 0x0
mcast, vid = 100, addr = ff:ff:ff:ff:ff:ff, port_mask = 0x1
mcast, vid = 100, addr = 33:33:00:00:00:01, port_mask = 0x1
mcast, vid = 100, addr = 01:00:5e:00:00:01, port_mask = 0x1
mcast, vid = 100, addr = 33:33:ff:47:7d:9d, port_mask = 0x1
mcast, vid = 100, addr = 01:80:c2:00:00:00, port_mask = 0x1
mcast, vid = 100, addr = 01:80:c2:00:00:03, port_mask = 0x1
mcast, vid = 100, addr = 01:80:c2:00:00:0e, port_mask = 0x1
mcast, vid = 100, addr = 33:33:00:00:00:fb, port_mask = 0x1
mcast, vid = 100, addr = 33:33:00:01:00:03, port_mask = 0x1
mcast, vid = 100, addr = 01:1b:19:00:00:00, port_mask = 0x1
 ^^^
    Here mcast entry (ptpl2), is added only for vlan 100
    as it should be.

Based on net-next/master

v2..v1:
  net: ethernet: ti: cpsw: fix vlan mcast
- removed limit for legacy switch cpsw mode
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ethernet: ti: cpsw: fix vlan configuration while down/up
Ivan Khoronzhuk [Thu, 8 Nov 2018 20:27:57 +0000 (22:27 +0200)]
net: ethernet: ti: cpsw: fix vlan configuration while down/up

The vlan configuration is not restored after interface donw/up sequence
(if dual-emac - both interfaces). Tested on am572x EVM.

Steps to check:
~# ip link add link eth1 name eth1.100 type vlan id 100
~# ifconfig eth0 down
~# ifconfig eth1 down

Try to remove vid and observe warning:
~# ip link del eth1.100
[  739.526757] net eth1: removing vlanid 100 from vlan filter
[  739.533322] failed to kill vid 0081/100 for device eth1

This patch fixes it, restoring only vlan ALE entries and all other
unicast/multicast entries are restored by system calling rx_mode ndo.

Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ethernet: ti: cpsw: fix vlan mcast
Ivan Khoronzhuk [Thu, 8 Nov 2018 20:27:56 +0000 (22:27 +0200)]
net: ethernet: ti: cpsw: fix vlan mcast

At this moment, mcast addresses are added for real device only
(reserved vlans for dual-emac mode), even if a mcast address was added
for some vlan only, thus ALE doesn't have corresponding vlan mcast
entries after vlan socket joined multicast group. So ALE drops vlan
frames with mcast addresses intended for vlans and potentially can
receive mcast frames for base ndev. That's not correct. So, fix it by
creating only vlan/mcast entries as requested. Patch doesn't use any
additional lists and is based on device mc address list and cpsw ALE
table entries.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: 8021q: vlan_core: allow use list of vlans for real device
Ivan Khoronzhuk [Thu, 8 Nov 2018 20:27:55 +0000 (22:27 +0200)]
net: 8021q: vlan_core: allow use list of vlans for real device

It's redundancy for the drivers to hold the list of vlans when
absolutely the same list exists in vlan core. In most cases it's
needed only to traverse the vlan devices, their vids and sync some
settings with h/w, so add API to simplify this.

At least some of these drivers also can benefit:
grep "for_each.*vid" -r drivers/net/ethernet/

drivers/net/ethernet/hisilicon/hns3/hns3_enet.c:
drivers/net/ethernet/synopsys/dwc-xlgmac-hw.c:
drivers/net/ethernet/qlogic/qlge/qlge_main.c:
drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c:
drivers/net/ethernet/via/via-rhine.c:
drivers/net/ethernet/via/via-velocity.c:
drivers/net/ethernet/intel/igb/igb_main.c:
drivers/net/ethernet/intel/ice/ice_main.c:
drivers/net/ethernet/intel/e1000/e1000_main.c:
drivers/net/ethernet/intel/i40e/i40e_main.c:
drivers/net/ethernet/intel/e1000e/netdev.c:
drivers/net/ethernet/intel/igbvf/netdev.c:
drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c:
drivers/net/ethernet/intel/ixgb/ixgb_main.c:
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:
drivers/net/ethernet/amd/xgbe/xgbe-dev.c:
drivers/net/ethernet/emulex/benet/be_main.c:
drivers/net/ethernet/neterion/vxge/vxge-main.c:
drivers/net/ethernet/adaptec/starfire.c:
drivers/net/ethernet/brocade/bna/bnad.c:

Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: core: dev_addr_lists: add auxiliary func to handle reference address updates
Ivan Khoronzhuk [Thu, 8 Nov 2018 20:27:54 +0000 (22:27 +0200)]
net: core: dev_addr_lists: add auxiliary func to handle reference address updates

In order to avoid all table update, and only remove or add new
address, the auxiliary function exists, named __hw_addr_sync_dev().
It allows end driver do nothing when nothing changed and add/rm when
concrete address is firstly added or lastly removed. But it doesn't
include cases when an address of real device or vlan was reused by
other vlans or vlan/macval devices.

For handaling events when address was reused/unreused the patch adds
new auxiliary routine - __hw_addr_ref_sync_dev(). It allows to do
nothing when nothing was changed and do updates only for an address
being added/reused/deleted/unreused. Thus, clone address changes for
vlans can be mirrored in the table. The function is exclusive with
__hw_addr_sync_dev(). It's responsibility of the end driver to
identify address vlan device, if it needs so.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosfc: use the new __netdev_tx_sent_queue BQL optimisation
Edward Cree [Thu, 8 Nov 2018 19:47:19 +0000 (19:47 +0000)]
sfc: use the new __netdev_tx_sent_queue BQL optimisation

As added in 3e59020abf0f ("net: bql: add __netdev_tx_sent_queue()"), which
 see for performance rationale.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'net-Remove-VLAN_TAG_PRESENT-from-drivers'
David S. Miller [Fri, 9 Nov 2018 03:49:32 +0000 (19:49 -0800)]
Merge branch 'net-Remove-VLAN_TAG_PRESENT-from-drivers'

Michał Mirosław says:

====================
net: Remove VLAN_TAG_PRESENT from drivers

This series removes VLAN_TAG_PRESENT use from network drivers in
preparation to removing its special meaning.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agogianfar: remove use of VLAN_TAG_PRESENT
Michał Mirosław [Thu, 8 Nov 2018 17:44:50 +0000 (18:44 +0100)]
gianfar: remove use of VLAN_TAG_PRESENT

Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoOVS: remove use of VLAN_TAG_PRESENT
Michał Mirosław [Thu, 8 Nov 2018 17:44:50 +0000 (18:44 +0100)]
OVS: remove use of VLAN_TAG_PRESENT

This is a minimal change to allow removing of VLAN_TAG_PRESENT.
It leaves OVS unable to use CFI bit, as fixing this would need
a deeper surgery involving userspace interface.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agocnic: remove use of VLAN_TAG_PRESENT
Michał Mirosław [Thu, 8 Nov 2018 17:44:50 +0000 (18:44 +0100)]
cnic: remove use of VLAN_TAG_PRESENT

This just removes VLAN_TAG_PRESENT use.  VLAN TCI=0 special meaning is
deeply embedded in the driver code and so is left as is.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoi40iw: remove use of VLAN_TAG_PRESENT
Michał Mirosław [Thu, 8 Nov 2018 17:44:49 +0000 (18:44 +0100)]
i40iw: remove use of VLAN_TAG_PRESENT

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: socionext: refactor netsec_alloc_dring()
Ilias Apalodimas [Thu, 8 Nov 2018 15:19:55 +0000 (17:19 +0200)]
net: socionext: refactor netsec_alloc_dring()

return -ENOMEM directly instead of assigning it in a variable

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: socionext: different approach on DMA
Ilias Apalodimas [Thu, 8 Nov 2018 15:19:54 +0000 (17:19 +0200)]
net: socionext: different approach on DMA

Current driver dynamically allocates an skb and maps it as DMA Rx
buffer. In order to prepare for upcoming XDP changes, let's introduce a
different allocation scheme.
Buffers are allocated dynamically and mapped into hardware.
During the Rx operation the driver uses build_skb() to produce the
necessary buffers for the network stack.
This change increases performance ~15% on 64b packets with smmu disabled
and ~5% with smmu enabled

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: qca_spi: Add available buffer space verification
Stefan Wahren [Thu, 8 Nov 2018 13:38:21 +0000 (14:38 +0100)]
net: qca_spi: Add available buffer space verification

Interferences on the SPI line could distort the response of
available buffer space. So at least we should check that the
response doesn't exceed the maximum available buffer space.
In error case increase a new error counter and retry it later.
This behavior avoids buffer errors in the QCA7000, which
results in an unnecessary chip reset including packet loss.

Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosock: Reset dst when changing sk_mark via setsockopt
David Barmann [Thu, 8 Nov 2018 14:13:35 +0000 (08:13 -0600)]
sock: Reset dst when changing sk_mark via setsockopt

When setting the SO_MARK socket option, if the mark changes, the dst
needs to be reset so that a new route lookup is performed.

This fixes the case where an application wants to change routing by
setting a new sk_mark.  If this is done after some packets have already
been sent, the dst is cached and has no effect.

Signed-off-by: David Barmann <david.barmann@stackpath.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 's390-qeth-next'
David S. Miller [Fri, 9 Nov 2018 01:22:24 +0000 (17:22 -0800)]
Merge branch 's390-qeth-next'

Julian Wiedmann says:

====================
s390/qeth: updates 2018-11-08

please apply the following qeth patches to net-next.

The first patch allows one more device type to query the FW for a MAC address,
the others are all basically just removal of duplicated or unused code.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agos390/qeth: don't process hsuid in qeth_l3_setup_netdev()
Julian Wiedmann [Thu, 8 Nov 2018 14:06:22 +0000 (15:06 +0100)]
s390/qeth: don't process hsuid in qeth_l3_setup_netdev()

qeth_l3_setup_netdev() checks if the hsuid attribute is set on the qeth
device, and propagates it to the net_device. In the past this was needed
to pick up any hsuid that was set before allocation of the net_device.

With commit d3d1b205e89f ("s390/qeth: allocate netdevice early") this
is no longer necessary, qeth_l3_dev_hsuid_store() always stores the
hsuid straight into dev->perm_addr.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agos390/qeth: remove unused fallback in Layer3's MAC code
Julian Wiedmann [Thu, 8 Nov 2018 14:06:21 +0000 (15:06 +0100)]
s390/qeth: remove unused fallback in Layer3's MAC code

If the CREATE ADDR sent by qeth_l3_iqd_read_initial_mac() fails, its
callback sets a random MAC address on the net_device. The error then
propagates back, and qeth_l3_setup_netdev() bails out without
registering the net_device.

Any subsequent call to qeth_l3_setup_netdev() will then attempt a fresh
CREATE ADDR which either 1) also fails, or 2) sets a proper MAC address
on the net_device. Consequently, the net_device will never be registered
with a random MAC and we can drop the fallback code.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agos390/qeth: remove two IPA command helpers
Julian Wiedmann [Thu, 8 Nov 2018 14:06:20 +0000 (15:06 +0100)]
s390/qeth: remove two IPA command helpers

qeth_l3_send_ipa_arp_cmd() is merely a wrapper around
qeth_send_control_data() now. So push the length adjustment into
QETH_SETASS_BASE_LEN, and remove the wrapper. While at it, also remove
some redundant 0-initializations.

qeth_send_setassparms() requires that callers prepare their command
parameters, so that they can be copied into the parameter area in one
go. Skip the indirection, and just let callers set up the command
themselves.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agos390/qeth: replace open-coded cmd setup
Julian Wiedmann [Thu, 8 Nov 2018 14:06:19 +0000 (15:06 +0100)]
s390/qeth: replace open-coded cmd setup

Call qeth_prepare_ipa_cmd() during setup of a new IPA cmd buffer, so
that it is used for all commands. Thus ARP and SNMP requests don't have
to do their own initialization.

This will now also set the proper MPC protocol version for SNMP requests
on L2 devices.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agos390/qeth: remove card list
Julian Wiedmann [Thu, 8 Nov 2018 14:06:18 +0000 (15:06 +0100)]
s390/qeth: remove card list

Re-implement the card-by-RDEV lookup by using device model concepts, and
remove the now redundant list of all qeth card instances in the system.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agos390/qeth: unify transmit code
Julian Wiedmann [Thu, 8 Nov 2018 14:06:17 +0000 (15:06 +0100)]
s390/qeth: unify transmit code

Since commit 82bf5c0867f6 ("s390/qeth: add support for IPv6 TSO"),
qeth_xmit() also knows how to build TSO packets and is practically
identical to qeth_l3_xmit().
Convert qeth_l3_xmit() into a thin wrapper that merely strips the
L2 header off a packet, and calls qeth_xmit() for the actual
TX processing.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agos390/qeth: handle af_iucv skbs in qeth_l3_fill_header()
Julian Wiedmann [Thu, 8 Nov 2018 14:06:16 +0000 (15:06 +0100)]
s390/qeth: handle af_iucv skbs in qeth_l3_fill_header()

Filling the HW header from one single function will make it easier to
rip out all the duplicated transmit code in qeth_l3_xmit(). On top, this
saves one conditional branch in the TSO path.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agos390/qeth: utilize virtual MAC for Layer2 OSD devices
Julian Wiedmann [Thu, 8 Nov 2018 14:06:15 +0000 (15:06 +0100)]
s390/qeth: utilize virtual MAC for Layer2 OSD devices

By default, READ MAC on a Layer2 OSD device returns the adapter's
burnt-in MAC address. Given the default scenario of many virtual devices
on the same adapter, qeth can't make any use of this address and
therefore skips the READ MAC call for this device type.

But in some configurations, the READ MAC command for a Layer2 OSD device
actually returns a pre-provisioned, virtual MAC address. So enable the
READ MAC code to detect this situation, and let the L2 subdriver
call READ MAC for OSD devices.

This also removes the QETH_LAYER2_MAC_READ flag, which protects L2
devices against calling READ MAC multiple times. Instead protect the
whole call to qeth_l2_request_initial_mac().

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoopenvswitch: remove BUG_ON from get_dpdev
Li RongQing [Thu, 8 Nov 2018 12:40:20 +0000 (20:40 +0800)]
openvswitch: remove BUG_ON from get_dpdev

if local is NULL pointer, and the following access of local's
dev will trigger panic, which is same as BUG_ON

Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'ICMP-error-handling-for-UDP-tunnels'
David S. Miller [Fri, 9 Nov 2018 01:13:09 +0000 (17:13 -0800)]
Merge branch 'ICMP-error-handling-for-UDP-tunnels'

Stefano Brivio says:

====================
ICMP error handling for UDP tunnels

This series introduces ICMP error handling for UDP tunnels and
encapsulations and related selftests. We need to handle ICMP errors to
support PMTU discovery and route redirection -- this support is entirely
missing right now:

- patch 1/11 adds a socket lookup for UDP tunnels that use, by design,
  the same destination port on both endpoints -- i.e. VXLAN and GENEVE
- patches 2/11 to 7/11 are specific to VxLAN and GENEVE
- patches 8/11 and 9/11 add infrastructure for lookup of encapsulations
  where sent packets cannot be matched via receiving socket lookup, i.e.
  FoU and GUE
- patches 10/11 and 11/11 are specific to FoU and GUE

v2: changes are listed in the single patches
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests: pmtu: Introduce FoU and GUE PMTU exceptions tests
Stefano Brivio [Thu, 8 Nov 2018 11:19:24 +0000 (12:19 +0100)]
selftests: pmtu: Introduce FoU and GUE PMTU exceptions tests

Introduce eight tests, for FoU and GUE, with IPv4 and IPv6 payload,
on IPv4 and IPv6 transport, that check that PMTU exceptions are created
with the right value when exceeding the MTU on a link of the path.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agofou, fou6: ICMP error handlers for FoU and GUE
Stefano Brivio [Thu, 8 Nov 2018 11:19:23 +0000 (12:19 +0100)]
fou, fou6: ICMP error handlers for FoU and GUE

As the destination port in FoU and GUE receiving sockets doesn't
necessarily match the remote destination port, we can't associate errors
to the encapsulating tunnels with a socket lookup -- we need to blindly
try them instead. This means we don't even know if we are handling errors
for FoU or GUE without digging into the packets.

Hence, implement a single handler for both, one for IPv4 and one for IPv6,
that will check whether the packet that generated the ICMP error used a
direct IP encapsulation or if it had a GUE header, and send the error to
the matching protocol handler, if any.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoudp: Support for error handlers of tunnels with arbitrary destination port
Stefano Brivio [Thu, 8 Nov 2018 11:19:22 +0000 (12:19 +0100)]
udp: Support for error handlers of tunnels with arbitrary destination port

ICMP error handling is currently not possible for UDP tunnels not
employing a receiving socket with local destination port matching the
remote one, because we have no way to look them up.

Add an err_handler tunnel encapsulation operation that can be exported by
tunnels in order to pass the error to the protocol implementing the
encapsulation. We can't easily use a lookup function as we did for VXLAN
and GENEVE, as protocol error handlers, which would be in turn called by
implementations of this new operation, handle the errors themselves,
together with the tunnel lookup.

Without a socket, we can't be sure which encapsulation error handler is
the appropriate one: encapsulation handlers (the ones for FoU and GUE
introduced in the next patch, e.g.) will need to check the new error codes
returned by protocol handlers to figure out if errors match the given
encapsulation, and, in turn, report this error back, so that we can try
all of them in __udp{4,6}_lib_err_encap_no_sk() until we have a match.

v2:
- Name all arguments in err_handler prototypes (David Miller)

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: Convert protocol error handlers from void to int
Stefano Brivio [Thu, 8 Nov 2018 11:19:21 +0000 (12:19 +0100)]
net: Convert protocol error handlers from void to int

We'll need this to handle ICMP errors for tunnels without a sending socket
(i.e. FoU and GUE). There, we might have to look up different types of IP
tunnels, registered as network protocols, before we get a match, so we
want this for the error handlers of IPPROTO_IPIP and IPPROTO_IPV6 in both
inet_protos and inet6_protos. These error codes will be used in the next
patch.

For consistency, return sensible error codes in protocol error handlers
whenever handlers can't handle errors because, even if valid, they don't
match a protocol or any of its states.

This has no effect on existing error handling paths.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests: pmtu: Introduce tests for IPv4/IPv6 over GENEVE over IPv4/IPv6
Stefano Brivio [Thu, 8 Nov 2018 11:19:20 +0000 (12:19 +0100)]
selftests: pmtu: Introduce tests for IPv4/IPv6 over GENEVE over IPv4/IPv6

Use a router between endpoints, implemented via namespaces, set a low MTU
between router and destination endpoint, exceed it and check PMTU value in
route exceptions.

v2:
- Introduce IPv4 tests right away, if iproute2 doesn't support the 'df'
  link option they will be skipped (David Ahern)

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agogeneve: Allow configuration of DF behaviour
Stefano Brivio [Thu, 8 Nov 2018 11:19:19 +0000 (12:19 +0100)]
geneve: Allow configuration of DF behaviour

draft-ietf-nvo3-geneve-08 says:

   It is strongly RECOMMENDED that Path MTU Discovery ([RFC1191],
   [RFC1981]) be used by setting the DF bit in the IP header when Geneve
   packets are transmitted over IPv4 (this is the default with IPv6).

Now that ICMP error handling is working for GENEVE, we can comply with
this recommendation.

Make this configurable, though, to avoid breaking existing setups. By
default, DF won't be set. It can be set or inherited from inner IPv4
packets. If it's configured to be inherited and we are encapsulating IPv6,
it will be set.

This only applies to non-lwt tunnels: if an external control plane is
used, tunnel key will still control the DF flag.

v2:
- DF behaviour configuration only applies for non-lwt tunnels, apply DF
  setting only if (!geneve->collect_md) in geneve_xmit_skb()
  (Stephen Hemminger)

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agogeneve: ICMP error lookup handler
Stefano Brivio [Thu, 8 Nov 2018 11:19:18 +0000 (12:19 +0100)]
geneve: ICMP error lookup handler

Export an encap_err_lookup() operation to match an ICMP error against a
valid VNI.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests: pmtu: Introduce tests for IPv4/IPv6 over VXLAN over IPv4/IPv6
Stefano Brivio [Thu, 8 Nov 2018 11:19:17 +0000 (12:19 +0100)]
selftests: pmtu: Introduce tests for IPv4/IPv6 over VXLAN over IPv4/IPv6

Use a router between endpoints, implemented via namespaces, set a low MTU
between router and destination endpoint, exceed it and check PMTU value in
route exceptions.

v2:
- Change all occurrences of VxLAN to VXLAN (Jiri Benc)
- Introduce IPv4 tests right away, if iproute2 doesn't support the 'df'
  link option they will be skipped (David Ahern)

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agovxlan: Allow configuration of DF behaviour
Stefano Brivio [Thu, 8 Nov 2018 11:19:16 +0000 (12:19 +0100)]
vxlan: Allow configuration of DF behaviour

Allow users to set the IPv4 DF bit in outgoing packets, or to inherit its
value from the IPv4 inner header. If the encapsulated protocol is IPv6 and
DF is configured to be inherited, always set it.

For IPv4, inheriting DF from the inner header was probably intended from
the very beginning judging by the comment to vxlan_xmit(), but it wasn't
actually implemented -- also because it would have done more harm than
good, without handling for ICMP Fragmentation Needed messages.

According to RFC 7348, "Path MTU discovery MAY be used". An expired RFC
draft, draft-saum-nvo3-pmtud-over-vxlan-05, whose purpose was to describe
PMTUD implementation, says that "is a MUST that Vxlan gateways [...]
SHOULD set the DF-bit [...]", whatever that means.

Given this background, the only sane option is probably to let the user
decide, and keep the current behaviour as default.

This only applies to non-lwt tunnels: if an external control plane is
used, tunnel key will still control the DF flag.

v2:
- DF behaviour configuration only applies for non-lwt tunnels, move DF
  setting to if (!info) block in vxlan_xmit_one() (Stephen Hemminger)

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agovxlan: ICMP error lookup handler
Stefano Brivio [Thu, 8 Nov 2018 11:19:15 +0000 (12:19 +0100)]
vxlan: ICMP error lookup handler

Export an encap_err_lookup() operation to match an ICMP error against a
valid VNI.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoudp: Handle ICMP errors for tunnels with same destination port on both endpoints
Stefano Brivio [Thu, 8 Nov 2018 11:19:14 +0000 (12:19 +0100)]
udp: Handle ICMP errors for tunnels with same destination port on both endpoints

For both IPv4 and IPv6, if we can't match errors to a socket, try
tunnels before ignoring them. Look up a socket with the original source
and destination ports as found in the UDP packet inside the ICMP payload,
this will work for tunnels that force the same destination port for both
endpoints, i.e. VXLAN and GENEVE.

Actually, lwtunnels could break this assumption if they are configured by
an external control plane to have different destination ports on the
endpoints: in this case, we won't be able to trace ICMP messages back to
them.

For IPv6 redirect messages, call ip6_redirect() directly with the output
interface argument set to the interface we received the packet from (as
it's the very interface we should build the exception on), otherwise the
new nexthop will be rejected. There's no such need for IPv4.

Tunnels can now export an encap_err_lookup() operation that indicates a
match. Pass the packet to the lookup function, and if the tunnel driver
reports a matching association, continue with regular ICMP error handling.

v2:
- Added newline between network and transport header sets in
  __udp{4,6}_lib_err_encap() (David Miller)
- Removed redundant skb_reset_network_header(skb); in
  __udp4_lib_err_encap()
- Removed redundant reassignment of iph in __udp4_lib_err_encap()
  (Sabrina Dubroca)
- Edited comment to __udp{4,6}_lib_err_encap() to reflect the fact this
  won't work with lwtunnels configured to use asymmetric ports. By the way,
  it's VXLAN, not VxLAN (Jiri Benc)

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: fix spelling mistake, "assertting" -> "asserting"
Colin Ian King [Thu, 8 Nov 2018 10:32:02 +0000 (10:32 +0000)]
net: hns3: fix spelling mistake, "assertting" -> "asserting"

Trivial fix to spelling mistake in dev_err error message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agocxgb4: Add new T6 PCI device ids 0x608a
Ganesh Goudar [Thu, 8 Nov 2018 08:51:07 +0000 (14:21 +0530)]
cxgb4: Add new T6 PCI device ids 0x608a

Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/ipv6: compute anycast address hash only if dev is null
Li RongQing [Thu, 8 Nov 2018 06:58:07 +0000 (14:58 +0800)]
net/ipv6: compute anycast address hash only if dev is null

avoid to compute the hash value if dev is not null, since
hash value is not used

Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: bcmgenet: return correct value 'ret' from bcmgenet_power_down
YueHaibing [Thu, 8 Nov 2018 02:08:43 +0000 (02:08 +0000)]
net: bcmgenet: return correct value 'ret' from bcmgenet_power_down

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/net/ethernet/broadcom/genet/bcmgenet.c: In function 'bcmgenet_power_down':
drivers/net/ethernet/broadcom/genet/bcmgenet.c:1136:6: warning:
 variable 'ret' set but not used [-Wunused-but-set-variable]

bcmgenet_power_down should return 'ret' instead of 0.

Fixes: ca8cf341903f ("net: bcmgenet: propagate errors from bcmgenet_power_down")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'net-sched-prepare-for-more-Qdisc-offloads'
David S. Miller [Fri, 9 Nov 2018 00:19:48 +0000 (16:19 -0800)]
Merge branch 'net-sched-prepare-for-more-Qdisc-offloads'

Jakub Kicinski says:

====================
net: sched: prepare for more Qdisc offloads

This series refactors the "switchdev" Qdisc offloads a little.  We have
a few Qdiscs which can be fully offloaded today to the forwarding plane
of switching devices.

First patch adds a helper for handing statistic dumps, the code seems
to be copy pasted between PRIO and RED.  Second patch removes unnecessary
parameter from RED offload function.  Third patch makes the MQ offload
use the dump helper which helps it behave much like PRIO and RED when
it comes to the TCQ_F_OFFLOADED flag.  Patch 4 adds a graft helper,
similar to the dump helper.

Patch 5 is unrelated to offloads, qdisc_graft() code seemed ripe for a
small refactor - no functional changes there.

Last two patches move the qdisc_put() call outside of the sch_tree_lock
section for RED and PRIO.  The child Qdiscs will get removed from the
hierarchy under the lock, but having the put (and potentially destroy)
called outside of the lock helps offload which may choose to sleep,
and it should generally lower the Qdisc change impact.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: sched: prio: delay destroying child qdiscs on change
Jakub Kicinski [Thu, 8 Nov 2018 01:33:40 +0000 (17:33 -0800)]
net: sched: prio: delay destroying child qdiscs on change

Move destroying of the old child qdiscs outside of the sch_tree_lock()
section.  This should improve the software qdisc replace but is even
more important for offloads.  Calling offloads under a spin lock is
best avoided, and child's destroy would be called under sch_tree_lock().

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: sched: red: delay destroying child qdisc on replace
Jakub Kicinski [Thu, 8 Nov 2018 01:33:39 +0000 (17:33 -0800)]
net: sched: red: delay destroying child qdisc on replace

Move destroying of the old child qdisc outside of the sch_tree_lock()
section.  This should improve the software qdisc replace but is even
more important for offloads.  Firstly calling offloads under a spin
lock is best avoided.  Secondly the destroy event of existing child
would have been sent to the offload device before the replace, causing
confusion.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: sched: refactor grafting Qdiscs with a parent
Jakub Kicinski [Thu, 8 Nov 2018 01:33:38 +0000 (17:33 -0800)]
net: sched: refactor grafting Qdiscs with a parent

The code for grafting Qdiscs when there is a parent has two needless
indentation levels, and breaks the "keep the success path unindented"
guideline.  Refactor.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: sched: add an offload graft helper
Jakub Kicinski [Thu, 8 Nov 2018 01:33:37 +0000 (17:33 -0800)]
net: sched: add an offload graft helper

Qdisc graft operation of offload-capable qdiscs performs a few
extra steps which are identical among all the qdiscs.  Add
a helper to share this code.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: sched: set TCQ_F_OFFLOADED flag for MQ
Jakub Kicinski [Thu, 8 Nov 2018 01:33:36 +0000 (17:33 -0800)]
net: sched: set TCQ_F_OFFLOADED flag for MQ

PRIO and RED mark the qdisc with TCQ_F_OFFLOADED upon successful offload,
make MQ do the same.  The consistency will help with consistent
graft callback behaviour.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: sched: red: remove unnecessary red_dump_offload_stats parameter
Jakub Kicinski [Thu, 8 Nov 2018 01:33:35 +0000 (17:33 -0800)]
net: sched: red: remove unnecessary red_dump_offload_stats parameter

Offload dump helper does not use opt parameter, remove it.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: sched: add an offload dump helper
Jakub Kicinski [Thu, 8 Nov 2018 01:33:34 +0000 (17:33 -0800)]
net: sched: add an offload dump helper

Qdisc dump operation of offload-capable qdiscs performs a few
extra steps which are identical among all the qdiscs.  Add
a helper to share this code.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'net-phy-improve-and-simplify-phylib-state-machine'
David S. Miller [Thu, 8 Nov 2018 23:02:06 +0000 (15:02 -0800)]
Merge branch 'net-phy-improve-and-simplify-phylib-state-machine'

Heiner Kallweit says:

====================
net: phy: improve and simplify phylib state machine

This patch series is based on two axioms:

- During autoneg a PHY always reports the link being down

- Info in clause 22/45 registers doesn't allow to differentiate between
  these two states:
  1. Link is physically down
  2. A link partner is connected and PHY is autonegotiating
  In both cases "link up" and "aneg finished" bits aren't set.
  One consequence is that having separate states PHY_NOLINK and PHY_AN
  isn't needed.

By using these two axioms the state machine can be significantly
simplified.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: use phy_check_link_status in more places in the state machine
Heiner Kallweit [Wed, 7 Nov 2018 19:47:53 +0000 (20:47 +0100)]
net: phy: use phy_check_link_status in more places in the state machine

Use phy_check_link_status in more places in the state machine.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: remove state PHY_AN
Heiner Kallweit [Wed, 7 Nov 2018 19:46:51 +0000 (20:46 +0100)]
net: phy: remove state PHY_AN

After the recent changes in the state machine state PHY_AN isn't used
any longer and can be removed.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: add phy_check_link_status
Heiner Kallweit [Wed, 7 Nov 2018 19:45:58 +0000 (20:45 +0100)]
net: phy: add phy_check_link_status

In few places in the state machine the state is set to PHY_RUNNING or
PHY_NOLINK after doing a phy_read_status(). So factor this out to
phy_check_link_status().

First use it in phy_start_aneg(): By setting the state to PHY_RUNNING
or PHY_NOLINK directly we can remove the code to handle the case that
we're using interrupts and aneg was finished already.

Definition of phy_link_up and phy_link_down needs to be moved because
they are called in the new function.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: remove useless check in state machine case PHY_RESUMING
Heiner Kallweit [Wed, 7 Nov 2018 19:44:56 +0000 (20:44 +0100)]
net: phy: remove useless check in state machine case PHY_RESUMING

If aneg isn't finished yet then the PHY reports the link as down.
There's no benefit in setting the state to PHY_AN because the next
state machine run would set the status to PHY_NOLINK anyway (except
in the meantime aneg has been finished and link is up). Therefore
we can set the state to PHY_RUNNING or PHY_NOLINK directly.

In addition change the do_carrier parameter in phy_link_down() to true.
If carrier was marked as up before (what should never be the case because
PHY was in state PHY_HALTED before) then we should mark it as down now.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: remove useless check in state machine case PHY_NOLINK
Heiner Kallweit [Wed, 7 Nov 2018 19:43:20 +0000 (20:43 +0100)]
net: phy: remove useless check in state machine case PHY_NOLINK

If aneg is enabled and the PHY reports the link as up then definitely
aneg finished successfully. Therefore this check is useless and
can be removed.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next...
David S. Miller [Thu, 8 Nov 2018 07:07:04 +0000 (23:07 -0800)]
Merge branch '1GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2018-11-07

This series contains updates to almost all of the Intel wired LAN
drivers.

Lance Roy replaces a spin lock with lockdep_assert_held() for igbvf
driver in move toward trying to remove spin_is_locked().

Colin Ian King fixes a potential null pointer dereference by adding a
check in ixgbe.  Also fixed the igc driver by properly assigning the
return error code of a function call, so that we can properly check it.

Shannon Nelson updates the ixgbe driver to not block IPsec offload when
in VEPA mode, in VEB mode, IPsec offload is still blocked because the
device drops packets into a black hole.

Jake adds support for software timestamping for packets sent over
ixgbevf.  Also modifies i40e, iavf, igb, igc, and ixgbe to delay calling
skb_tx_timestamp() to the latest point possible, which is just prior to
notifying the hardware of the new Tx packet.

Todd adds the new WoL filter flag so that we properly report that we do
not support this new feature.

YueHaibing from Huawei fixes the igc driver by cleaning up variables
that are not "really" used.

Dan Carpenter cleans up igc whitespace issues.

Miroslav Lichvar fixes e1000e for potential underflow issue in the
timecounter, so modify the driver to use timecounter_cyc2time() to allow
non-monotonic SYSTIM readings.

Sasha provides additional igc cleanups based on community feedback.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'nfp-add-and-use-tunnel-netdev-helpers'
David S. Miller [Thu, 8 Nov 2018 07:00:24 +0000 (23:00 -0800)]
Merge branch 'nfp-add-and-use-tunnel-netdev-helpers'

John Hurley says:

====================
nfp: add and use tunnel netdev helpers

A recent patch introduced the function netif_is_vxlan() to verify the
tunnel type of a given netdev as vxlan.

Add a similar function to detect geneve netdevs and make use of this
function in the NFP driver. Also make use of the vxlan helper where
applicable.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonfp: flower: include geneve as supported offload tunnel type
John Hurley [Wed, 7 Nov 2018 18:32:50 +0000 (18:32 +0000)]
nfp: flower: include geneve as supported offload tunnel type

Offload of geneve decap rules is supported in NFP. Include geneve in the
check for supported types.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonfp: flower: use geneve and vxlan helpers
John Hurley [Wed, 7 Nov 2018 18:32:49 +0000 (18:32 +0000)]
nfp: flower: use geneve and vxlan helpers

Make use of the recently added VXLAN and geneve helper functions to
determine the type of the netdev from its rtnl_link_ops.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: add netif_is_geneve()
John Hurley [Wed, 7 Nov 2018 18:32:48 +0000 (18:32 +0000)]
net: add netif_is_geneve()

Add a helper function to determine if the type of a netdev is geneve based
on its rtnl_link_ops. This allows drivers that may wish to offload tunnels
to check the underlying type of the device.

A recent patch added a similar helper to vxlan.h

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosfc: add missing NVRAM partition types for EF10
Edward Cree [Wed, 7 Nov 2018 18:12:42 +0000 (18:12 +0000)]
sfc: add missing NVRAM partition types for EF10

Expose the MUM/SUC Firmware, UEFI Expansion ROM and MC Status partitions
 of the NIC's NVRAM as MTDs if found on the NIC.  The first two are needed
 in order to properly update them when performing firmware updates; the MC
 Status partition is used to determine whether a signed firmware image was
 accepted or rejected by a Secure NIC.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'vlan-prepare-for-removal-of-VLAN_TAG_PRESENT'
David S. Miller [Thu, 8 Nov 2018 06:41:20 +0000 (22:41 -0800)]
Merge branch 'vlan-prepare-for-removal-of-VLAN_TAG_PRESENT'

Michał Mirosław says:

====================
net/vlan: prepare for removal of VLAN_TAG_PRESENT

This is a preparatory patchset before removing the use of VLAN_TAG_PRESENT
bit in skb->vlan_tci as indication of VLAN offload. This set includes
only cleanups that allow abstracting of code testing VLAN tag presence
in drivers and networking code.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/vlan: remove unused #define HAVE_VLAN_GET_TAG
Michał Mirosław [Wed, 7 Nov 2018 17:07:03 +0000 (18:07 +0100)]
net/vlan: remove unused #define HAVE_VLAN_GET_TAG

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/vlan: include the shift in skb_vlan_tag_get_prio()
Michał Mirosław [Wed, 7 Nov 2018 17:07:03 +0000 (18:07 +0100)]
net/vlan: include the shift in skb_vlan_tag_get_prio()

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/vlan: introduce __vlan_hwaccel_copy_tag() helper
Michał Mirosław [Wed, 7 Nov 2018 17:07:02 +0000 (18:07 +0100)]
net/vlan: introduce __vlan_hwaccel_copy_tag() helper

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/vlan: introduce __vlan_hwaccel_clear_tag() helper
Michał Mirosław [Wed, 7 Nov 2018 17:07:02 +0000 (18:07 +0100)]
net/vlan: introduce __vlan_hwaccel_clear_tag() helper

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoinet: minor optimization for backlog setting in listen(2)
Yafang Shao [Wed, 7 Nov 2018 11:20:16 +0000 (19:20 +0800)]
inet: minor optimization for backlog setting in listen(2)

Set the backlog earlier in inet_dccp_listen() and inet_listen(),
then we can avoid the redundant setting.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: vlan: add support for tunnel offload
Davide Caratti [Wed, 7 Nov 2018 10:28:18 +0000 (11:28 +0100)]
net: vlan: add support for tunnel offload

GSO tunneled packets are always segmented in software before they are
transmitted by a VLAN, even when the lower device can offload tunnel
encapsulation and VLAN together (i.e., some bits in NETIF_F_GSO_ENCAP_ALL
mask are set in the lower device 'vlan_features'). If we let VLANs have
the same tunnel offload capabilities as their lower device, throughput
can improve significantly when CPU is limited on the transmitter side.

 - set NETIF_F_GSO_ENCAP_ALL bits in the VLAN 'hw_features', to ensure
 that 'features' will have those bits zeroed only when the lower device
 has no hardware support for tunnel encapsulation.
 - for the same reason, copy GSO-related bits of 'hw_enc_features' from
 lower device to VLAN, and ensure to update that value when the lower
 device changes its features.
 - set NETIF_F_HW_CSUM bit in the VLAN 'hw_enc_features' if 'real_dev'
 is able to compute checksums at least for a kind of packets, like done
 with commit 8403debeead8 ("vlan: Keep NETIF_F_HW_CSUM similar to other
 software devices"). This avoids software segmentation due to mismatching
 checksum capabilities between VLAN's 'features' and 'hw_enc_features'.

Reported-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotun: compute the RFS hash only if needed.
Paolo Abeni [Wed, 7 Nov 2018 09:34:36 +0000 (10:34 +0100)]
tun: compute the RFS hash only if needed.

The tun XDP sendmsg code path, unconditionally computes the symmetric
hash of each packet for RFS's sake, even when we could skip it. e.g.
when the device has a single queue.

This change adds the check already in-place for the skb sendmsg path
to avoid unneeded hashing.

The above gives small, but measurable, performance gain for VM xmit
path when zerocopy is not enabled.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/wan/fsl_ucc_hdlc: add BQL support
Mathias Thore [Wed, 7 Nov 2018 08:09:45 +0000 (09:09 +0100)]
net/wan/fsl_ucc_hdlc: add BQL support

Add byte queue limits support in the fsl_ucc_hdlc driver.

Signed-off-by: Mathias Thore <mathias.thore@infinera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: realtek: load driver for all PHYs with a Realtek OUI
Heiner Kallweit [Wed, 7 Nov 2018 07:52:46 +0000 (08:52 +0100)]
net: phy: realtek: load driver for all PHYs with a Realtek OUI

Instead of listing every single PHYID, load the driver for every PHYID
with a Realtek OUI, independent of model number and revision.

This patch also improves two further aspects:
- constify realtek_tbl[]
- the mask should have been 0xffffffff instead of 0x001fffff so far,
  by masking out some bits a PHY from another vendor could have been
  matched

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: make phy_trigger_machine static
Heiner Kallweit [Wed, 7 Nov 2018 07:15:58 +0000 (08:15 +0100)]
net: phy: make phy_trigger_machine static

phy_trigger_machine() is used in phy.c only, so we can make it static.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: bcm_sf2: fix semicolon.cocci warnings
kbuild test robot [Wed, 7 Nov 2018 00:50:34 +0000 (08:50 +0800)]
net: dsa: bcm_sf2: fix semicolon.cocci warnings

drivers/net/dsa/bcm_sf2_cfp.c:1168:2-3: Unneeded semicolon
drivers/net/dsa/bcm_sf2_cfp.c:532:2-3: Unneeded semicolon

 Remove unneeded semicolon.

Generated by: scripts/coccinelle/misc/semicolon.cocci

Fixes: ae7a5aff783c ("net: dsa: bcm_sf2: Keep copy of inserted rules")
CC: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: bcm7xxx: Add entry for BCM7255
Justin Chen [Wed, 7 Nov 2018 00:37:44 +0000 (16:37 -0800)]
net: phy: bcm7xxx: Add entry for BCM7255

Add support for BCM7255 EPHY.

Signed-off-by: Justin Chen <justinpopo6@gmail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'udp-gro'
David S. Miller [Thu, 8 Nov 2018 00:23:05 +0000 (16:23 -0800)]
Merge branch 'udp-gro'

Paolo Abeni says:

====================
udp: implement GRO support

This series implements GRO support for UDP sockets, as the RX counterpart
of commit bec1f6f69736 ("udp: generate gso with UDP_SEGMENT").
The core functionality is implemented by the second patch, introducing a new
sockopt to enable UDP_GRO, while patch 3 implements support for passing the
segment size to the user space via a new cmsg.
UDP GRO performs a socket lookup for each ingress packets and aggregate datagram
directed to UDP GRO enabled sockets with constant l4 tuple.

UDP GRO packets can land on non GRO-enabled sockets, e.g. due to iptables NAT
rules, and that could potentially confuse existing applications.

The solution adopted here is to de-segment the GRO packet before enqueuing
as needed. Since we must cope with packet reinsertion after de-segmentation,
the relevant code is factored-out in ipv4 and ipv6 specific helpers and exposed
to UDP usage.

While the current code can probably be improved, this safeguard ,implemented in
the patches 4-7, allows future enachements to enable UDP GSO offload on more
virtual devices eventually even on forwarded packets.

The last 4 for patches implement some performance and functional self-tests,
re-using the existing udpgso infrastructure. The problematic scenario described
above is explicitly tested.

This revision of the series try to address the feedback provided by Willem and
Subash on previous iteration.
====================

Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests: add functionals test for UDP GRO
Paolo Abeni [Wed, 7 Nov 2018 11:38:37 +0000 (12:38 +0100)]
selftests: add functionals test for UDP GRO

Extends the existing udp programs to allow checking for proper
GRO aggregation/GSO size, and run the tests via a shell script, using
a veth pair with XDP program attached to trigger the GRO code path.

rfc v3 -> v1:
 - use ip route to attach the xdp helper to the veth

rfc v2 -> rfc v3:
 - add missing test program options documentation
 - fix sporatic test failures (receiver faster than sender)

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests: add some benchmark for UDP GRO
Paolo Abeni [Wed, 7 Nov 2018 11:38:36 +0000 (12:38 +0100)]
selftests: add some benchmark for UDP GRO

Run on top of veth pair, using a dummy XDP program to enable the GRO.

 rfc v3 -> v1:
  - use ip route to attach the xdp helper to the veth

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests: add dummy xdp test helper
Paolo Abeni [Wed, 7 Nov 2018 11:38:35 +0000 (12:38 +0100)]
selftests: add dummy xdp test helper

This trivial XDP program does nothing, but will be used by the
next patch to test the GRO path in a net namespace, leveraging
the veth XDP implementation.

It's added here, despite its 'net' usage, to avoid the duplication
of the llc-related makefile boilerplate.

rfc v3 -> v1:
 - move the helper implementation into the bpf directory, don't
   touch udpgso_bench_rx

rfc v2 -> rfc v3:
 - move 'x' option handling here

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests: add GRO support to udp bench rx program
Paolo Abeni [Wed, 7 Nov 2018 11:38:34 +0000 (12:38 +0100)]
selftests: add GRO support to udp bench rx program

And fix a couple of buglets (port option processing,
clean termination on SIGINT). This is preparatory work
for GRO tests.

rfc v2 -> rfc v3:
 - use ETH_MAX_MTU

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoudp: cope with UDP GRO packet misdirection
Paolo Abeni [Wed, 7 Nov 2018 11:38:33 +0000 (12:38 +0100)]
udp: cope with UDP GRO packet misdirection

In some scenarios, the GRO engine can assemble an UDP GRO packet
that ultimately lands on a non GRO-enabled socket.
This patch tries to address the issue explicitly checking for the UDP
socket features before enqueuing the packet, and eventually segmenting
the unexpected GRO packet, as needed.

We must also cope with re-insertion requests: after segmentation the
UDP code calls the helper introduced by the previous patches, as needed.

Segmentation is performed by a common helper, which takes care of
updating socket and protocol stats is case of failure.

rfc v3 -> v1
 - fix compile issues with rxrpc
 - when gso_segment returns NULL, treat is as an error
 - added 'ipv4' argument to udp_rcv_segment()

rfc v2 -> rfc v3
 - moved udp_rcv_segment() into net/udp.h, account errors to socket
   and ns, always return NULL or segs list

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoipv6: factor out protocol delivery helper
Paolo Abeni [Wed, 7 Nov 2018 11:38:32 +0000 (12:38 +0100)]
ipv6: factor out protocol delivery helper

So that we can re-use it at the UDP level in the next patch

rfc v3 -> v1:
 - add the helper declaration into the ipv6 header

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoip: factor out protocol delivery helper
Paolo Abeni [Wed, 7 Nov 2018 11:38:31 +0000 (12:38 +0100)]
ip: factor out protocol delivery helper

So that we can re-use it at the UDP level in a later patch

rfc v3 -> v1
 - add the helper declaration into the ip header

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoudp: add support for UDP_GRO cmsg
Paolo Abeni [Wed, 7 Nov 2018 11:38:30 +0000 (12:38 +0100)]
udp: add support for UDP_GRO cmsg

When UDP GRO is enabled, the UDP_GRO cmsg will carry the ingress
datagram size. User-space can use such info to compute the original
packets layout.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoudp: implement GRO for plain UDP sockets.
Paolo Abeni [Wed, 7 Nov 2018 11:38:29 +0000 (12:38 +0100)]
udp: implement GRO for plain UDP sockets.

This is the RX counterpart of commit bec1f6f69736 ("udp: generate gso
with UDP_SEGMENT"). When UDP_GRO is enabled, such socket is also
eligible for GRO in the rx path: UDP segments directed to such socket
are assembled into a larger GSO_UDP_L4 packet.

The core UDP GRO support is enabled with setsockopt(UDP_GRO).

Initial benchmark numbers:

Before:
udp rx:   1079 MB/s   769065 calls/s

After:
udp rx:   1466 MB/s    24877 calls/s

This change introduces a side effect in respect to UDP tunnels:
after a UDP tunnel creation, now the kernel performs a lookup per ingress
UDP packet, while before such lookup happened only if the ingress packet
carried a valid internal header csum.

rfc v2 -> rfc v3:
 - fixed typos in macro name and comments
 - really enforce UDP_GRO_CNT_MAX, instead of UDP_GRO_CNT_MAX + 1
 - acquire socket lock in UDP_GRO setsockopt

rfc v1 -> rfc v2:
 - use a new option to enable UDP GRO
 - use static keys to protect the UDP GRO socket lookup

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoudp: implement complete book-keeping for encap_needed
Paolo Abeni [Wed, 7 Nov 2018 11:38:28 +0000 (12:38 +0100)]
udp: implement complete book-keeping for encap_needed

The *encap_needed static keys are enabled by UDP tunnels
and several UDP encapsulations type, but they are never
turned off. This can cause unneeded overall performance
degradation for systems where such features are used
transiently.

This patch introduces complete book-keeping for such keys,
decreasing the usage at socket destruction time, if needed,
and avoiding that the same socket could increase the key
usage multiple times.

rfc v3 -> v1:
 - add socket lock around udp_tunnel_encap_enable()

rfc v2 -> rfc v3:
 - use udp_tunnel_encap_enable() in setsockopt()

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'vrf-allow-simultaneous-service-instances-in-default-and-other-VRFs'
David S. Miller [Thu, 8 Nov 2018 00:12:39 +0000 (16:12 -0800)]
Merge branch 'vrf-allow-simultaneous-service-instances-in-default-and-other-VRFs'

Mike Manning says:

====================
vrf: allow simultaneous service instances in default and other VRFs

Services currently have to be VRF-aware if they are using an unbound
socket. One cannot have multiple service instances running in the
default and other VRFs for services that are not VRF-aware and listen
on an unbound socket. This is because there is no easy way of isolating
packets received in the default VRF from those arriving in other VRFs.

This series provides this isolation for stream sockets subject to the
existing kernel parameter net.ipv4.tcp_l3mdev_accept not being set,
given that this is documented as allowing a single service instance to
work across all VRF domains. Similarly, net.ipv4.udp_l3mdev_accept is
checked for datagram sockets, and net.ipv4.raw_l3mdev_accept is
introduced for raw sockets. The functionality applies to UDP & TCP
services as well as those using raw sockets, and is for IPv4 and IPv6.

Example of running ssh instances in default and blue VRF:

$ /usr/sbin/sshd -D
$ ip vrf exec vrf-blue /usr/sbin/sshd
$ ss -ta | egrep 'State|ssh'
State   Recv-Q   Send-Q           Local Address:Port       Peer Address:Port
LISTEN  0        128           0.0.0.0%vrf-blue:ssh             0.0.0.0:*
LISTEN  0        128                    0.0.0.0:ssh             0.0.0.0:*
ESTAB   0        0              192.168.122.220:ssh       192.168.122.1:50282
LISTEN  0        128              [::]%vrf-blue:ssh                [::]:*
LISTEN  0        128                       [::]:ssh                [::]:*
ESTAB   0        0           [3000::2]%vrf-blue:ssh           [3000::9]:45896
ESTAB   0        0                    [2000::2]:ssh           [2000::9]:46398

v1:
   - Address Paolo Abeni's comments (patch 4/5)
   - Fix build when CONFIG_NET_L3_MASTER_DEV not defined (patch 1/5)
v2:
   - Address David Aherns' comments (patches 4/5 and 5/5)
   - Remove patches 3/5 and 5/5 from series for individual submissions
   - Include a sysctl for raw sockets as recommended by David Ahern
   - Expand series into 10 patches and provide improved descriptions
v3:
   - Update description for patch 1/10 and remove patch 6/10
v4:
   - Set default to enabled for raw socket sysctl as recommended by David Ahern
v5:
   - Address review comments from David Ahern in patches 2-5
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoipv6: do not drop vrf udp multicast packets
Dewi Morgan [Wed, 7 Nov 2018 15:36:10 +0000 (15:36 +0000)]
ipv6: do not drop vrf udp multicast packets

For bound udp sockets in a vrf, also check the sdif to get the index
for ingress devices enslaved to an l3mdev.

Signed-off-by: Dewi Morgan <morgand@vyatta.att-mail.com>
Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Tested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoipv6: handling of multicast packets received in VRF
Mike Manning [Wed, 7 Nov 2018 15:36:09 +0000 (15:36 +0000)]
ipv6: handling of multicast packets received in VRF

If the skb for multicast packets marked as enslaved to a VRF are
received, then the secondary device index should be used to obtain
the real device. And verify the multicast address against the
enslaved rather than the l3mdev device.

Signed-off-by: Dewi Morgan <morgand@vyatta.att-mail.com>
Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Tested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoipv6: allow ping to link-local address in VRF
Mike Manning [Wed, 7 Nov 2018 15:36:08 +0000 (15:36 +0000)]
ipv6: allow ping to link-local address in VRF

If link-local packets are marked as enslaved to a VRF, then to allow
ping to the link-local from a vrf, the error handling for IPV6_PKTINFO
needs to be relaxed to also allow the pkt ipi6_ifindex to be that of a
slave device to the vrf.

Note that the real device also needs to be retrieved in icmp6_iif()
to set the ipv6 flow oif to this for icmp echo reply handling. The
recent commit 24b711edfc34 ("net/ipv6: Fix linklocal to global address
with VRF") takes care of this, so the sdif does not need checking here.

This fix makes ping to link-local consistent with that to global
addresses, in that this can now be done from within the same VRF that
the address is in.

Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Tested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agovrf: mark skb for multicast or link-local as enslaved to VRF
Mike Manning [Wed, 7 Nov 2018 15:36:07 +0000 (15:36 +0000)]
vrf: mark skb for multicast or link-local as enslaved to VRF

The skb for packets that are multicast or to a link-local address are
not marked as being enslaved to a VRF, if they are received on a socket
bound to the VRF. This is needed for ND and it is preferable for the
kernel not to have to deal with the additional use-cases if ll or mcast
packets are handled as enslaved. However, this does not allow service
instances listening on unbound and bound to VRF sockets to distinguish
the VRF used, if packets are sent as multicast or to a link-local
address. The fix is for the VRF driver to also mark these skb as being
enslaved to the VRF.

Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Tested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>