OSDN Git Service

uclinux-h8/linux.git
5 years agotest_rhashtable: remove semaphore usage
Arnd Bergmann [Sun, 16 Dec 2018 19:48:21 +0000 (20:48 +0100)]
test_rhashtable: remove semaphore usage

This is one of only two files that initialize a semaphore to a negative
value. We don't really need the two semaphores here at all, but can do
the same thing in a more conventional and more efficient way, by using a
single waitqueue and an atomic thread counter.

This gets us a little bit closer to eliminating classic semaphores from
the kernel. It also fixes a corner case where we fail to continue after
one of the threads fails to start up.

An alternative would be to use a split kthread_create()+wake_up_process()
and completely eliminate the separate synchronization.
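
As a rough illustration of the "single waitqueue plus atomic thread counter"
idea, here is a user-space sketch in Python. It is not the kernel code: the
names StartGate, arrive_and_wait(), abandon() and run_workers() are
hypothetical, and a threading.Condition stands in for the kernel waitqueue.
The abandon() path models the corner case where a thread fails to start.

```python
import threading


class StartGate:
    """One counter and one wait queue instead of two semaphores (sketch)."""

    def __init__(self, nthreads):
        self._pending = nthreads            # threads that have not checked in yet
        self._cond = threading.Condition()  # stands in for the waitqueue

    def _checkin(self):
        # Caller holds the condition lock.
        self._pending -= 1
        if self._pending == 0:
            self._cond.notify_all()         # last arrival releases everybody

    def arrive_and_wait(self):
        with self._cond:
            self._checkin()
            self._cond.wait_for(lambda: self._pending == 0)

    def abandon(self):
        # A thread failed to start: account for it so the others can continue.
        with self._cond:
            self._checkin()


def run_workers(n, fail_index=None):
    gate = StartGate(n)
    results, lock, threads = [], threading.Lock(), []

    def worker(i):
        gate.arrive_and_wait()              # start work only once all are ready
        with lock:
            results.append(i)

    for i in range(n):
        if i == fail_index:                 # simulated startup failure
            gate.abandon()
            continue
        t = threading.Thread(target=worker, args=(i,), daemon=True)
        t.start()
        threads.append(t)
    for t in threads:
        t.join(timeout=5)
    return sorted(results)
```

Calling run_workers(4, fail_index=2) still lets the three remaining workers
proceed, which is the behaviour the corner-case fix is after.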

Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: print stack trace in phy_error
Heiner Kallweit [Sun, 16 Dec 2018 18:18:26 +0000 (19:18 +0100)]
net: phy: print stack trace in phy_error

So far phy_error() silently stops the PHY state machine. If the network
driver doesn't report an MDIO error, the user may wonder why the
network is down. Print the stack trace to make it easier to find the
root cause of the error.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: improve phy state checking
Heiner Kallweit [Sun, 16 Dec 2018 17:30:14 +0000 (18:30 +0100)]
net: phy: improve phy state checking

Add helpers phy_is_started() and __phy_is_started() to avoid open-coded
checks whether PHY has been started. To make the check easier move
PHY_HALTED before PHY_UP in enum phy_state. Further improvements:

phy_start_aneg():
Return -EBUSY and print warning if function is called from a non-started
state (DOWN, READY, HALTED). Better check because function is exported
and drivers may use it incorrectly.

phy_interrupt():
Return IRQ_NONE also if state is DOWN or READY. We should never receive
an interrupt in one of these states, but better play safe.

phy_stop():
Just return and print a warning if PHY is in a non-started state.
This warning should help to identify drivers with unbalanced calls to
phy_start() / phy_stop().

phy_state_machine():
Schedule state machine run only if PHY is in a started state.
E.g. if state is READY we don't need the state machine, it will be
started by phy_start().
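
The effect of moving PHY_HALTED before PHY_UP can be sketched as follows.
This is a Python model, not the driver code: the PhyState class and the
numeric values are assumptions, and the states listed after UP are only
illustrative. The point is that "started" becomes a single comparison.

```python
import errno
from enum import IntEnum


class PhyState(IntEnum):
    # Non-started states come first (HALTED moved in front of UP) ...
    DOWN = 0
    READY = 1
    HALTED = 2
    # ... so every state from UP onwards counts as "started".
    UP = 3
    RUNNING = 4
    NOLINK = 5


def phy_is_started(state):
    """Model of the helper: one comparison instead of open-coded checks."""
    return state >= PhyState.UP


def phy_start_aneg(state):
    """Model of the new guard: refuse to run from a non-started state."""
    if not phy_is_started(state):
        return -errno.EBUSY
    return 0
```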

v2:
- don't use __func__ within phy_warn_state
v3:
- use WARN() instead of printing error message to facilitate debugging

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: fec: remove workaround to restart phylib state machine on MDIO timeout
Heiner Kallweit [Sun, 16 Dec 2018 14:00:40 +0000 (15:00 +0100)]
net: fec: remove workaround to restart phylib state machine on MDIO timeout

There's a workaround to restart the phylib state machine in case of a
MDIO access timeout. Seems it was introduced to deal with the
consequences of a too small MDIO timeout. See also commit message of
c3b084c24c8a ("net: fec: Adjust ENET MDIO timeouts") which increased
the timeout value later. Due to the later timeout value fix it seems
to be safe to remove the workaround.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agobonding: fix indentation issues, remove extra spaces
Colin Ian King [Sun, 16 Dec 2018 13:33:15 +0000 (13:33 +0000)]
bonding: fix indentation issues, remove extra spaces

There are two statements that are indented too much by one space each,
fix these.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'hns3-next'
David S. Miller [Tue, 18 Dec 2018 20:01:02 +0000 (12:01 -0800)]
Merge branch 'hns3-next'

Peng Li says:

====================
net: hns3: code optimizations & bugfixes for HNS3 driver

This patchset includes bugfixes and code optimizations for the HNS3
ethernet controller driver
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: fix a SSU buffer checking bug
Yunsheng Lin [Tue, 18 Dec 2018 11:37:59 +0000 (19:37 +0800)]
net: hns3: fix a SSU buffer checking bug

When calculating the SSU buffer, the driver first allocates the tx and
rx private buffers; the remaining buffer is used for the rx
shared buffer. The remaining buffer size should be
greater than or equal to shared_std, which is the
minimum shared buffer size required by the driver, but
currently if the remaining buffer size is equal to
shared_std, the check returns failure, which causes SSU buffer
allocation to fail.

This patch fixes this problem by rounding up shared_std before
checking whether the remaining buffer size is greater than or equal to
shared_std.
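
A minimal sketch of the corrected check, in Python rather than driver C.
The function shared_buffer_fits(), its parameters, and the 256-byte unit
used for rounding are assumptions made for illustration; only the
"round up shared_std, then compare with >=" logic comes from the text above.

```python
BUF_SIZE_UNIT = 256  # assumed rounding unit, in bytes


def roundup(value, unit):
    """Round value up to the next multiple of unit."""
    return (value + unit - 1) // unit * unit


def shared_buffer_fits(total, tx_priv, rx_priv, shared_std):
    """Return True if what is left after the private buffers can hold
    the minimum shared buffer (hypothetical helper)."""
    remaining = total - tx_priv - rx_priv
    # Equality must pass: a remainder of exactly shared_std is enough.
    return remaining >= roundup(shared_std, BUF_SIZE_UNIT)
```

With total=4096, tx_priv=1024, rx_priv=2048 and shared_std=1024 the
remainder is exactly 1024 bytes, the boundary case that the old check
rejected.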

Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: aligning buffer size in SSU to 256 bytes
Yunsheng Lin [Tue, 18 Dec 2018 11:37:58 +0000 (19:37 +0800)]
net: hns3: aligning buffer size in SSU to 256 bytes

The hardware expects the buffer size set in the SSU to be aligned to
256 bytes. This patch aligns the buffer size to 256 bytes using the
roundup or rounddown function.
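
For reference, the two alignment directions behave as below. This is a
Python sketch of the arithmetic only; which direction the driver picks
for which buffer is not stated here, so the rule of thumb in the comments
(round requirements up, round leftovers down) is an assumption.

```python
ALIGN = 256  # bytes, the alignment the hardware expects


def roundup(value, unit=ALIGN):
    # Typically used for a size that must be at least `value`.
    return -(-value // unit) * unit


def rounddown(value, unit=ALIGN):
    # Typically used for a size that must not exceed `value`.
    return value // unit * unit
```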

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: getting tx and dv buffer size through firmware
Yunsheng Lin [Tue, 18 Dec 2018 11:37:57 +0000 (19:37 +0800)]
net: hns3: getting tx and dv buffer size through firmware

This patch adds support for getting the tx and dv buffer sizes through
firmware, because different versions of the hardware require different
tx and dv buffer sizes.

This patch also adds dv_buf_size to the tc's private buffer size even if
pfc is not enabled for the tc.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: synchronize speed and duplex from phy when phy link up
Peng Li [Tue, 18 Dec 2018 11:37:56 +0000 (19:37 +0800)]
net: hns3: synchronize speed and duplex from phy when phy link up

Driver calls phy_connect_direct and registers hclge_mac_adjust_link
to synchronize mac speed and duplex from phy. It is better to
synchronize mac speed and duplex from phy when phy link up.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: remove 1000M/half support of phy
Fuyun Liang [Tue, 18 Dec 2018 11:37:55 +0000 (19:37 +0800)]
net: hns3: remove 1000M/half support of phy

Our phy does not support 1000M/half, this patch removes 1000M/half from
PHY_SUPPORTED_FEATURES.

Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: update coalesce param per second
Peng Li [Tue, 18 Dec 2018 11:37:54 +0000 (19:37 +0800)]
net: hns3: update coalesce param per second

The coalesce parameters are updated every 100 NAPI polls. The update may
come a little late when a ping test follows a high-rate flow: because the
ping test sends packets only once per second, it can take a long time
before NAPI poll has been called 100 times.

This patch updates the coalesce parameters every second instead of every
100 NAPI polls. The parameters are still not always updated exactly on
time, but the lag is very short.
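
The change from a poll-count trigger to a time-based trigger can be
sketched like this. It is a Python model with hypothetical names
(CoalesceUpdater, poll()); timestamps are passed in explicitly so the
behaviour is easy to follow, whereas a driver would compare against a
kernel time source.

```python
class CoalesceUpdater:
    """Update coalesce parameters at most once per interval (sketch)."""

    def __init__(self, start=0.0, interval=1.0):
        self.interval = interval      # seconds between updates
        self.last_update = start      # time of the previous update
        self.updates = 0

    def poll(self, now):
        """Called on every NAPI poll; returns True if an update ran."""
        if now - self.last_update < self.interval:
            return False              # too soon, regardless of poll count
        self.last_update = now
        self.updates += 1             # the real driver would retune here
        return True
```

With this scheme a one-packet-per-second ping still triggers an update
about once per second, instead of waiting for 100 polls to accumulate.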

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: fix incomplete uninitialization of IRQ in the hns3_nic_uninit_vector_data()
Huazhong Tan [Tue, 18 Dec 2018 11:37:53 +0000 (19:37 +0800)]
net: hns3: fix incomplete uninitialization of IRQ in the hns3_nic_uninit_vector_data()

In hns3_nic_uninit_vector_data(), the procedure that uninitializes
the tqp_vector's IRQ neither sets affinity_notify to NULL nor changes
its init flag. This patch fixes it. For simplification, the local
variable tqp_vector is used instead of priv->tqp_vector[i].

Fixes: 424eb834a9be ("net: hns3: Unified HNS3 {VF|PF} Ethernet Driver for hip08 SoC")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: remove unnecessary configuration recapture while resetting
Huazhong Tan [Tue, 18 Dec 2018 11:37:52 +0000 (19:37 +0800)]
net: hns3: remove unnecessary configuration recapture while resetting

When doing a reset, it is unnecessary to get the hardware's default
configuration again; doing so overwrites the user's
configuration.

Fixes: 4ed340ab8f49 ("net: hns3: Add reset process in hclge_main")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: update some variables while hclge_reset()/hclgevf_reset() done
Huazhong Tan [Tue, 18 Dec 2018 11:37:51 +0000 (19:37 +0800)]
net: hns3: update some variables while hclge_reset()/hclgevf_reset() done

When hclge_reset() completes successfully, it should update the
last_reset_time, set reset_fail_cnt to 0, and set reset_type of
hnae3_ae_dev to HNAE3_NONE_RESET.

Also when hclgevf_reset() completes successfully, it should update
the last_reset_time, and set reset_type of hnae3_ae_dev to
HNAE3_NONE_RESET.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: fix napi_disable not return problem
Huazhong Tan [Tue, 18 Dec 2018 11:37:50 +0000 (19:37 +0800)]
net: hns3: fix napi_disable not return problem

While doing DOWN, the call to napi_disable() may never return, since the
napi_complete() in hns3_nic_common_poll() will never be called when
HNS3_NIC_STATE_DOWN is set. So we need to call napi_complete() before
checking HNS3_NIC_STATE_DOWN.

Fixes: ff0699e04b97 ("net: hns3: stop napi polling when HNS3_NIC_STATE_DOWN is set")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: uninitialize pci in the hclgevf_uninit
Huazhong Tan [Tue, 18 Dec 2018 11:37:49 +0000 (19:37 +0800)]
net: hns3: uninitialize pci in the hclgevf_uninit

hclgevf_pci_reset() only uninitializes and reinitializes
the msi, so if the initialization fails, hclgevf_uninit_hdev()
does not need to uninitialize the msi, but it does need to uninitialize
the pci; otherwise the pci resources are never freed.

Fixes: 862d969a3a4d ("net: hns3: do VF's pci re-initialization while PF doing FLR")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: fix error handling int the hns3_get_vector_ring_chain
Huazhong Tan [Tue, 18 Dec 2018 11:37:48 +0000 (19:37 +0800)]
net: hns3: fix error handling int the hns3_get_vector_ring_chain

When hns3_get_vector_ring_chain() fails in
hns3_nic_init_vector_data(), the error should be handled instead
of returning directly.

Also, cur_chain should be freed instead of chain and head->next should
be set to NULL in error handling of hns3_get_vector_ring_chain.

This patch fixes them.

Fixes: 73b907a083b8 ("net: hns3: bugfix for buffer not free problem during resetting")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec...
David S. Miller [Tue, 18 Dec 2018 16:49:48 +0000 (08:49 -0800)]
Merge branch 'master' of git://git./linux/kernel/git/klassert/ipsec-next

Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2018-12-18

1) Add xfrm policy selftest scripts.
   From Florian Westphal.

2) Split inexact policies into four different search list
   classes and use the rbtree infrastructure to store/lookup
   the policies. This is to improve the policy lookup
   performance after the flowcache removal.
   Patches from Florian Westphal.

3) Various coding style fixes, from Colin Ian King.

4) Fix policy lookup logic after adding the inexact policy
   search tree infrastructure. From Florian Westphal.

5) Remove a useless BUG_ON from xfrm6_dst_ifdown.
   From Li RongQing.

6) Use the correct policy direction for lookups on hash
   rebuilding. From Florian Westphal.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonfp: flower: fix cb_ident duplicate in indirect block register
John Hurley [Tue, 18 Dec 2018 03:18:39 +0000 (19:18 -0800)]
nfp: flower: fix cb_ident duplicate in indirect block register

Previously the identifier used for indirect block callback registry and
for block rule cb registry (when done via indirect blocks) was the pointer
to the netdev we were interested in receiving updates on. This worked fine
if a single app existed that registered one callback per netdev of
interest. However, if multiple cards are in place and, in turn, multiple
apps, then each app may register the same callback with the same
identifier to both the netdev's indirect block cb list and to a block's cb
list. This can lead to EEXIST errors and/or incorrect cb deletions.

Prevent this conflict by using the app pointer as the identifier for
netdev indirect block cb registry, allowing each app to register a unique
callback per netdev. For block cb registry, the same app may register
multiple cbs to the same block if using TC shared blocks. Instead of the
app, use the pointer to the allocated cb_priv data as the identifier here.
This means that there can be a unique block callback for each app/netdev
combo.
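
The identifier clash can be modelled with a small registry keyed by
(callback, identifier). This is a Python sketch with hypothetical names
(CallbackList, offload_cb, the app and netdev objects); it only shows why
the netdev pointer is not unique enough once two apps share one callback
function, and why the app pointer is.

```python
import errno


class CallbackList:
    """Per-netdev callback list; entries must be unique per (cb, ident)."""

    def __init__(self):
        self._entries = set()

    def register(self, cb, ident):
        key = (cb, ident)
        if key in self._entries:
            return -errno.EEXIST      # duplicate registration
        self._entries.add(key)
        return 0


def offload_cb():
    """The one callback function shared by every app instance."""


netdev = object()                     # stands in for the netdev pointer
app_a, app_b = object(), object()     # one app per card

# Old scheme: the identifier is the netdev, so the second app collides.
old_list = CallbackList()
old_results = [old_list.register(offload_cb, netdev),
               old_list.register(offload_cb, netdev)]

# New scheme: the identifier is the app, so both registrations succeed.
new_list = CallbackList()
new_results = [new_list.register(offload_cb, app_a),
               new_list.register(offload_cb, app_b)]
```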

Fixes: 3166dd07a9cb ("nfp: flower: offload tunnel decap rules via indirect TC blocks")
Reported-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum: Update the supported firmware to version 13.1910.622
Shalom Toledo [Tue, 18 Dec 2018 07:31:31 +0000 (07:31 +0000)]
mlxsw: spectrum: Update the supported firmware to version 13.1910.622

This new firmware contains:
 * New packet traps for discarded packets
 * Secure firmware flash bug fix
 * Fence mechanism bug fix
 * TCAM RMA bug fix

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoip6mr: Drop mfc6_cache argument to ip6mr_forward2
David Ahern [Mon, 17 Dec 2018 23:36:11 +0000 (15:36 -0800)]
ip6mr: Drop mfc6_cache argument to ip6mr_forward2

mfc6_cache is not needed by ip6mr_forward2 so drop it from the input
argument list.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoipmr: Drop mfc_cache argument to ipmr_queue_xmit
David Ahern [Mon, 17 Dec 2018 23:34:48 +0000 (15:34 -0800)]
ipmr: Drop mfc_cache argument to ipmr_queue_xmit

mfc_cache is not needed by ipmr_queue_xmit so drop it from the input
argument list.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dccp: initialize (addr,port) listening hashtable
Peter Oskolkov [Sun, 16 Dec 2018 23:42:48 +0000 (15:42 -0800)]
net: dccp: initialize (addr,port) listening hashtable

Commit d9fbc7f6431f "net: tcp: prefer listeners bound to an address"
removes port-only listener lookups. This caused segfaults in DCCP
lookups because DCCP did not initialize the (addr,port) hashtable.

This patch adds said initialization.

The only non-trivial issue here is the size of the new hashtable.
It seemed reasonable to make it match the size of the port-only
hashtable (= INET_LHTABLE_SIZE) that was used previously. Other
parameters to inet_hashinfo2_init() match those used in TCP.

V2 changes: marked inet_hashinfo2_init as an exported symbol
so that DCCP compiles when configured as a module.

Tested: syzkaller issues fixed; the second patch in the patchset
        tests that DCCP lookups work correctly.

Fixes: d9fbc7f6431f "net: tcp: prefer listeners bound to an address"
Reported-by: syzcaller <syzkaller@googlegroups.com>
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'bnxt_en-next'
David S. Miller [Tue, 18 Dec 2018 07:08:54 +0000 (23:08 -0800)]
Merge branch 'bnxt_en-next'

Michael Chan says:

====================
bnxt_en: Updates for net-next.

Two main changes in this series plus some miscellaneous changes.

1. Improvements and fixes for resource accounting which are required
for enabling SR-IOV and RDMA on the new 57500 chips.  Only SR-IOV
for 57500 chips is enabled in this series.

2. New statistics counters and improvements to keep the basic
counters and port counters during IFDOWN.

3. Misc. small changes for ETS, returning proper error codes
when flashing NVRAM, and a link speed related fix for ethtool
loopback selftest.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agobnxt_en: query force speeds before disabling autoneg mode.
Vasundhara Volam [Sun, 16 Dec 2018 23:46:31 +0000 (18:46 -0500)]
bnxt_en: query force speeds before disabling autoneg mode.

With autoneg enabled, PHY loopback test fails. To disable autoneg,
driver needs to send a valid forced speed to FW. FW is not sending
async event for invalid speeds. To fix this, query forced speeds
and send the correct speed when disabling autoneg mode.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agobnxt_en: Do not free port statistics buffer when device is down.
Michael Chan [Sun, 16 Dec 2018 23:46:30 +0000 (18:46 -0500)]
bnxt_en: Do not free port statistics buffer when device is down.

Port statistics which include RDMA counters are useful even when the
netdevice is down.  Do not free the port statistics DMA buffers
when the netdevice is down.  This keeps the snapshot of the port
statistics, and the counters will just continue counting when the
netdevice goes back up.

Split the bnxt_free_stats() function into 2 functions.  The port
statistics buffers will only be freed when the netdevice is
removed.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agobnxt_en: Save ring statistics before reset.
Michael Chan [Sun, 16 Dec 2018 23:46:29 +0000 (18:46 -0500)]
bnxt_en: Save ring statistics before reset.

With the current driver, the statistics reported by .ndo_get_stats64()
are reset when the device goes down.  Store a snapshot of the
rtnl_link_stats64 before shutdown.  This snapshot is added to the
current counters in .ndo_get_stats64() so that the counters will not
get reset when the device is down.
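
The snapshot scheme amounts to "saved totals plus live counters". Below
is a Python sketch with hypothetical names (Stats, Device, close(),
get_stats64()) and only two counters; the real rtnl_link_stats64 has many
more fields.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    rx_packets: int = 0
    tx_packets: int = 0


class Device:
    def __init__(self):
        self.ring = Stats()    # live ring counters, lost on shutdown
        self.saved = Stats()   # snapshot that survives down/up cycles

    def close(self):
        # Fold the live counters into the snapshot before they vanish.
        self.saved.rx_packets += self.ring.rx_packets
        self.saved.tx_packets += self.ring.tx_packets
        self.ring = Stats()

    def get_stats64(self):
        # Reported value = snapshot + whatever the rings counted since.
        return Stats(self.saved.rx_packets + self.ring.rx_packets,
                     self.saved.tx_packets + self.ring.tx_packets)
```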

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agobnxt_en: Return linux standard errors in bnxt_ethtool.c
Vasundhara Volam [Sun, 16 Dec 2018 23:46:28 +0000 (18:46 -0500)]
bnxt_en: Return linux standard errors in bnxt_ethtool.c

Currently firmware specific errors are returned directly in flash_device
and reset ethtool hooks. Modify it to return linux standard errors
to userspace when flashing operations fail.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agobnxt_en: Don't set ETS on unused TCs.
Michael Chan [Sun, 16 Dec 2018 23:46:27 +0000 (18:46 -0500)]
bnxt_en: Don't set ETS on unused TCs.

Currently, the code allows ETS bandwidth weight 0 to be set on unused TCs.
We should not set any DCB parameters on unused TCs at all.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agobnxt_en: Add ethtool -S priority counters.
Michael Chan [Sun, 16 Dec 2018 23:46:26 +0000 (18:46 -0500)]
bnxt_en: Add ethtool -S priority counters.

Display the CoS counters as additional priority counters by looking up
the priority to CoS queue mapping.  If the TX extended port statistics
block size returned by firmware is big enough to cover the CoS counters,
then we will display the new priority counters.  We call firmware to get
the up-to-date pri2cos mapping to convert the CoS counters to
priority counters.
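
The conversion from CoS counters to priority counters is a table lookup.
A Python sketch, assuming a pri2cos list indexed by priority and a list of
per-queue counters; the function name and the sample values are made up
for illustration.

```python
def priority_counters(cos_counters, pri2cos):
    """Report, for each priority, the counter of the CoS queue it maps to.

    cos_counters: one counter per CoS queue.
    pri2cos:      pri2cos[p] is the CoS queue serving priority p.
    """
    return [cos_counters[cos] for cos in pri2cos]


# Example: 8 priorities spread over 3 CoS queues.
example = priority_counters([100, 200, 300], [0, 0, 1, 1, 2, 2, 2, 2])
```

Priorities that share a CoS queue necessarily show the same value, since
the hardware counts per queue rather than per priority.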

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agobnxt_en: Add SR-IOV support for 57500 chips.
Michael Chan [Sun, 16 Dec 2018 23:46:25 +0000 (18:46 -0500)]
bnxt_en: Add SR-IOV support for 57500 chips.

There are some minor differences when assigning VF resources on the
new chips.  The MSIX (NQ) resource has to be assigned and ring group
is not needed on the new chips.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agobnxt_en: Disable MSIX before re-reserving NQs/CMPL rings.
Michael Chan [Sun, 16 Dec 2018 23:46:24 +0000 (18:46 -0500)]
bnxt_en: Disable MSIX before re-reserving NQs/CMPL rings.

When bringing up a device, the code checks to see if the number of
MSIX has changed.  pci_disable_msix() should be called first before
changing the number of reserved NQs/CMPL rings.  This ensures that
the MSIX vectors associated with the NQs/CMPL rings are still
properly mapped when pci_disable_msix() masks the vectors.

This patch will prevent errors when RDMA support is added for the new
57500 chips.  When the RDMA driver shuts down, the number of NQs is
decreased and we must use the new sequence to prevent MSIX errors.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agobnxt_en: Reserve 1 stat_ctx for RDMA driver.
Vasundhara Volam [Sun, 16 Dec 2018 23:46:23 +0000 (18:46 -0500)]
bnxt_en: Reserve 1 stat_ctx for RDMA driver.

bnxt_en requires same number of stat_ctxs as CP rings but RDMA
requires only 1 stat_ctx.  Also add a new parameter resv_stat_ctxs
to better keep track of stat_ctxs reserved including resources used
by RDMA.  Add a stat_ctxs parameter to all the relevant resource
reservation functions so we can reserve the correct number of
stat_ctxs.

Prior to this patch, we were not reserving the extra stat_ctx for
RDMA and RDMA would not work on the new 57500 chips.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agobnxt_en: Do not modify max_stat_ctxs after RDMA driver requests/frees stat_ctxs
Vasundhara Volam [Sun, 16 Dec 2018 23:46:22 +0000 (18:46 -0500)]
bnxt_en: Do not modify max_stat_ctxs after RDMA driver requests/frees stat_ctxs

Calling bnxt_set_max_func_stat_ctxs() to modify max stat_ctxs requested
or freed by the RDMA driver is wrong. After introducing reservation of
resources recently, the driver has to keep track of all stat_ctxs
including the ones used by the RDMA driver.  This will provide a better
foundation for accurate accounting of the stat_ctxs.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agobnxt_en: get rid of num_stat_ctxs variable
Vasundhara Volam [Sun, 16 Dec 2018 23:46:21 +0000 (18:46 -0500)]
bnxt_en: get rid of num_stat_ctxs variable

For bnxt_en driver, stat_ctxs created will always be same as
cp_nr_rings. Remove extra variable that duplicates the value.
Also introduce bnxt_get_avail_stat_ctxs_for_en() helper to get
available stat_ctxs and bnxt_get_ulp_stat_ctxs() helper to return
number of stat_ctxs used by RDMA.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agobnxt_en: Add bnxt_get_avail_cp_rings_for_en() helper function.
Michael Chan [Sun, 16 Dec 2018 23:46:20 +0000 (18:46 -0500)]
bnxt_en: Add bnxt_get_avail_cp_rings_for_en() helper function.

The available CP rings are calculated differently on the new 57500
chips, so add this helper to do this calculation correctly.  The
VFs will be assigned these available CP rings.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agobnxt_en: Store the maximum NQs available on the PF.
Michael Chan [Sun, 16 Dec 2018 23:46:19 +0000 (18:46 -0500)]
bnxt_en: Store the maximum NQs available on the PF.

The PF has a pool of NQs and MSIX vectors assigned to it based on
NVRAM configurations.  The number of usable MSIX vectors on the PF
is the minimum of the NQs and MSIX vectors.  Any excess NQs without
associated MSIX may be used for the VFs, so we need to store this
max_nqs value.  max_nqs minus the NQs used by the PF will be the
available NQs for the VFs.
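
The arithmetic described above, as a small Python sketch. The function
names and the sample numbers are hypothetical; only the min() and the
subtraction come from the description.

```python
def pf_usable_msix(max_nqs, max_msix):
    """A vector is usable on the PF only if it has both an NQ and an MSIX."""
    return min(max_nqs, max_msix)


def vf_available_nqs(max_nqs, pf_nqs_used):
    """NQs left over for the VFs once the PF has taken its share."""
    return max_nqs - pf_nqs_used


# Example: 64 NQs but only 16 MSIX vectors assigned to the PF.
pf_vectors = pf_usable_msix(max_nqs=64, max_msix=16)
vf_nqs = vf_available_nqs(max_nqs=64, pf_nqs_used=pf_vectors)
```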

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agofou: Prevent unbounded recursion in GUE error handler
Stefano Brivio [Mon, 17 Dec 2018 23:13:17 +0000 (00:13 +0100)]
fou: Prevent unbounded recursion in GUE error handler

Handling exceptions for direct UDP encapsulation in GUE (that is,
UDP-in-UDP) leads to unbounded recursion in the GUE exception handler,
syzbot reported.

While draft-ietf-intarea-gue-06 doesn't explicitly forbid direct
encapsulation of UDP in GUE, it probably doesn't make sense to set up GUE
this way, and it's currently not even possible to configure this.

Skip exception handling if the GUE proto/ctype field is set to the UDP
protocol number. Should we need to handle exceptions for UDP-in-GUE one
day, we might need to either explicitly set a bound for recursion, or
implement a special iterative handling for these cases.
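
Why the guard stops the recursion can be seen in a toy model. This is a
Python sketch, not the kernel handler: the dispatch table, the function
names, and the returned error codes are assumptions. In the model, the
UDP error handler hands errors back to the GUE handler, so a GUE header
naming UDP as its inner protocol would loop forever without the check.

```python
import errno
import socket


def tcp_err(_proto):
    return 0                          # pretend the inner handler succeeded


def udp_err(proto):
    # The UDP handler passes the error on to the encapsulation handler.
    return gue_err(proto)


HANDLERS = {socket.IPPROTO_TCP: tcp_err, socket.IPPROTO_UDP: udp_err}


def gue_err(proto_ctype):
    """Dispatch an error to the handler of the encapsulated protocol."""
    if proto_ctype == socket.IPPROTO_UDP:
        # UDP-in-UDP: skip handling rather than recurse without bound.
        return -errno.EOPNOTSUPP
    handler = HANDLERS.get(proto_ctype)
    if handler is None:
        return -errno.ENOENT
    return handler(proto_ctype)
```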

Reported-and-tested-by: syzbot+43f6755d1c2e62743468@syzkaller.appspotmail.com
Fixes: b8a51b38e4d4 ("fou, fou6: ICMP error handlers for FoU and GUE")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoucc_geth: Add change_carrier() for Fixed PHYs
Joakim Tjernlund [Fri, 14 Dec 2018 14:17:08 +0000 (15:17 +0100)]
ucc_geth: Add change_carrier() for Fixed PHYs

This allows controlling the carrier from /sys/class/net/ethX/carrier
for Fixed PHYs.

Signed-off-by: Joakim Tjernlund <joakim.tjernlund@infinera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agogianfar: Add change_carrier() for Fixed PHYs
Joakim Tjernlund [Fri, 14 Dec 2018 14:17:07 +0000 (15:17 +0100)]
gianfar: Add change_carrier() for Fixed PHYs

This allows controlling the carrier from /sys/class/net/ethX/carrier
for Fixed PHYs.

Signed-off-by: Joakim Tjernlund <joakim.tjernlund@infinera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodpaa_eth: Add change_carrier() for Fixed PHYs
Joakim Tjernlund [Fri, 14 Dec 2018 14:17:06 +0000 (15:17 +0100)]
dpaa_eth: Add change_carrier() for Fixed PHYs

This allows controlling the carrier from /sys/class/net/ethX/carrier
for Fixed PHYs.

Signed-off-by: Joakim Tjernlund <joakim.tjernlund@infinera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoFixed PHY: Add fixed_phy_change_carrier()
Joakim Tjernlund [Fri, 14 Dec 2018 14:17:05 +0000 (15:17 +0100)]
Fixed PHY: Add fixed_phy_change_carrier()

Drivers can use this as .ndo_change_carrier() to change carrier
via /sys/class/net/ethX/carrier.

Signed-off-by: Joakim Tjernlund <joakim.tjernlund@infinera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/mlx4_en: remove fallback after kzalloc_node()
Eric Dumazet [Thu, 13 Dec 2018 11:03:37 +0000 (03:03 -0800)]
net/mlx4_en: remove fallback after kzalloc_node()

kzalloc_node(..., GFP_KERNEL, node) will attempt to allocate
memory as close as possible to the node.

There is no need to fall back to kzalloc() if this has failed.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: unbreak CONFIG_RETPOLINE=n builds
Paolo Abeni [Mon, 17 Dec 2018 11:39:02 +0000 (12:39 +0100)]
net: unbreak CONFIG_RETPOLINE=n builds

The kbuild bot reported a build breakage with CONFIG_RETPOLINE=n
due to commit aaa5d90b395a ("net: use indirect call wrappers at
GRO network layer").
I got the wrapper implementation wrong for this config.
Fix the issue by properly ignoring the builtin symbol arguments
when retpoline is not enabled.

Reported-by: kbuild test robot <lkp@intel.com>
Fixes: aaa5d90b395a ("net: use indirect call wrappers at GRO network layer")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'mlxsw-spectrum_acl-Add-Bloom-filter-support'
David S. Miller [Sun, 16 Dec 2018 23:20:35 +0000 (15:20 -0800)]
Merge branch 'mlxsw-spectrum_acl-Add-Bloom-filter-support'

Ido Schimmel says:

====================
mlxsw: spectrum_acl: Add Bloom filter support

Nir says:

Spectrum-2 uses Bloom filter to reduce the number of lookups in the
algorithmic TCAM (A-TCAM). HW performs multiple exact match lookups in a
given region using a key composed of { packet & mask, mask ID, region ID }.
The masks which are used in a region are called rule patterns or RP.
When such multiple masks are used, the A-TCAM region uses an eRP
(extended RP) table that describes which rule patterns are in use and
defines the order of the lookup. When eRP table is used in a region, one
way to reduce the number of the lookups is to consult a Bloom filter
before doing the lookup.

A Bloom filter is a space-efficient probabilistic data structure, on
which a query returns either "possibly in set" or "definitely not in
set". HW can skip a lookup if a query on the Bloom filter results in a
"definitely not in set" response. The mlxsw driver implements a "counting
filter" and when either a new entry is marked or the last entry is
removed it will update the HW. Update of this counting filter occurs
when a rule is configured in or deleted from a region.
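
The counting-filter behaviour (touch the HW only on the 0 -> 1 and
1 -> 0 transitions of an entry) can be sketched as follows. This is a
Python model, not the mlxsw code: the class name, the hw_write callback,
and the use of zlib.crc32 as the index hash are stand-ins chosen for
illustration.

```python
import zlib


class CountingBloom:
    """Software reference counts in front of a one-bit-per-entry HW filter."""

    def __init__(self, size, hw_write):
        self.refcount = [0] * size
        self.hw_write = hw_write          # hypothetical callback(index, bit)

    def _index(self, key):
        return zlib.crc32(key) % len(self.refcount)

    def add(self, key):
        i = self._index(key)
        self.refcount[i] += 1
        if self.refcount[i] == 1:         # first user: set the HW bit
            self.hw_write(i, 1)

    def remove(self, key):
        i = self._index(key)
        self.refcount[i] -= 1
        if self.refcount[i] == 0:         # last user gone: clear the HW bit
            self.hw_write(i, 0)

    def may_contain(self, key):
        # False means "definitely not in set"; True means "possibly in set".
        return self.refcount[self._index(key)] > 0
```

Keeping counts in software is what makes deletion safe: a plain bit
vector could not tell whether another rule still relies on the same bit.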

Patch #1 adds PEABFE register which is used for setting Bloom filter
entries.

Patch #2 adds Bloom filter resources.

Patch #3 and patch #4 provide Bloom filter handling within mlxsw, by
adding initialization and logic for updating the Bloom bit vector in HW.

Patch #5 and patch #6 add required calls for Bloom filter update as part
of rule configuration flow.

Patch #7 handles transitions to and from eRP table. It uses a list to
keep A-TCAM rules in order to update rules in Bloom filter, in cases of
transitions from master mask based A-TCAM region to an eRP table based
region and vice versa.

Patch #8 removes a trick done on master RP index to a remaining RP,
since Bloom filter is updated on eRP transitions.

Finally, patch #9 activates Bloom filter mechanism in HW, by cancelling
the bypass that was configured before and the remaining three patches
are selftests that exercise the new code.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests: mlxsw: Add Bloom delta test
Nir Dotan [Sun, 16 Dec 2018 08:49:37 +0000 (08:49 +0000)]
selftests: mlxsw: Add Bloom delta test

The eRP table is active when there is more than a single rule
pattern. It may be that the patterns are close enough and use delta
mechanism. Bloom filter index computation is based on the values of
{rule & mask, mask ID, region ID} where the rule delta bits must be
cleared.

Add a test that exercises Bloom filter with delta mechanism.
Configure rules within delta range and pass a packet which is
supposed to hit the correct rule.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests: mlxsw: Add Bloom filter complex test
Nir Dotan [Sun, 16 Dec 2018 08:49:36 +0000 (08:49 +0000)]
selftests: mlxsw: Add Bloom filter complex test

Bloom filter index computation is based on the values of
{rule & mask, mask ID, region ID} and the computation also varies
according to the region key size.

Add a test that exercises the possible combinations by creating
multiple chains using different key sizes and then passing a frame that
is supposed to produce a hit on all of the regions.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests: mlxsw: Add Bloom filter simple test
Nir Dotan [Sun, 16 Dec 2018 08:49:35 +0000 (08:49 +0000)]
selftests: mlxsw: Add Bloom filter simple test

Add a test that exercises Bloom filter code.
Activate eRP table in the region by adding multiple rule patterns which
with very high probability use different entries in the Bloom filter.
Then send packets in order to check lookup hits on all relevant rules.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: reg: Activate Bloom filter
Nir Dotan [Sun, 16 Dec 2018 08:49:34 +0000 (08:49 +0000)]
mlxsw: reg: Activate Bloom filter

Now that mlxsw driver handles all aspects of updating
the Bloom filter mechanism, set bf_bypass value to false
and allow HW to use Bloom filter.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum_acl: Set master RP index on transition to eRP
Nir Dotan [Sun, 16 Dec 2018 08:49:33 +0000 (08:49 +0000)]
mlxsw: spectrum_acl: Set master RP index on transition to eRP

Bloom filter is updated on transitions from a single rule pattern,
also called master RP, to eRP table and vice versa. Since rules are
being written to or deleted from the Bloom filter on such transitions,
it is not required to keep the same eRP bank ID for the master RP.

Change master RP index assignment so it will be assigned with zero.
This is consistent with the assignment of the first available spot
that is used for allocating eRP's indices.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum_acl: Update Bloom filter on eRP transitions
Nir Dotan [Sun, 16 Dec 2018 08:49:32 +0000 (08:49 +0000)]
mlxsw: spectrum_acl: Update Bloom filter on eRP transitions

Bloom filter update is required only for rules which reside on an
eRP. When the region has only a single rule pattern then eRP table
is not used, however insertion of another pattern would trigger a
move to an active eRP table so it is imperative to update the Bloom
filter with all previously configured rules.

Add a method that updates Bloom filter entries for all rules
currently configured in the region, on the event of a transition
from master mask to eRP, or vice versa. For that purpose, maintain
a list of all A-TCAM rules within mlxsw_sp_acl_atcam_region.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum_acl: Set A-TCAM rules in Bloom filter
Nir Dotan [Sun, 16 Dec 2018 08:49:30 +0000 (08:49 +0000)]
mlxsw: spectrum_acl: Set A-TCAM rules in Bloom filter

Add calls to eRP module for updating Bloom filter when a rule is
added or removed from the A-TCAM. eRP module will update the Bloom
filter only for cases in which the region has an active eRP table.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum_acl: Add Bloom filter update
Nir Dotan [Sun, 16 Dec 2018 08:49:29 +0000 (08:49 +0000)]
mlxsw: spectrum_acl: Add Bloom filter update

Add Bloom filter update for rule insertion and rule removal scenarios.
This is done within eRP module in order to assure that Bloom filter
updates are done only for rules which are part of an eRP, as HW does not
consult Bloom filter for entries when there is a single (master) mask in
the region.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum_acl: Add Bloom filter handling
Nir Dotan [Sun, 16 Dec 2018 08:49:28 +0000 (08:49 +0000)]
mlxsw: spectrum_acl: Add Bloom filter handling

Spectrum-2 HW uses Bloom filter in order to skip lookups on specific
eRPs. It uses crc-16-Msbit-first calculation over a specific layout
of a rule's key fields combined with eRP ID as well as region ID.
Per potential lookup, iff the Bloom filter entry of the calculated
index is empty, then the lookup can be skipped. Hence, the mlxsw
driver should update the Bloom filter entry per each rule insertion
or deletion when rules are part of an eRP.

Add functions for adding and deleting entries in the Bloom filter.
In order to do so also add crc-16 computation based on the specific
Spectrum-2 polynomial and a function for encoding the crc-16 input
in the manner dictated by HW implementation.
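
A rough sketch of an MSbit-first crc-16 and of deriving a Bloom filter
index from {rule & mask, eRP ID, region ID}. Everything specific here is
an assumption for illustration: the polynomial 0x1021 is a placeholder
and not the Spectrum-2 polynomial, and the byte layout of the input and
the bank size are invented, since the real encoding is dictated by HW.

```python
def crc16_msb_first(data: bytes, poly: int = 0x1021, init: int = 0) -> int:
    """Bitwise crc-16, processing the most significant bit first."""
    crc = init
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def bloom_index(key: bytes, mask: bytes, erp_id: int, region_id: int,
                bank_size_log2: int = 10) -> int:
    """Hypothetical index: hash the masked key with the eRP and region IDs."""
    masked = bytes(k & m for k, m in zip(key, mask))
    blob = region_id.to_bytes(2, "big") + bytes([erp_id]) + masked
    return crc16_msb_first(blob) & ((1 << bank_size_log2) - 1)
```

Because the key is masked before hashing, two keys that differ only in
masked-out bits map to the same entry, which is what allows a packet
lookup to find the entry set by the rule.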

Signed-off-by: Nir Dotan <nird@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum_acl: Introduce Bloom filter
Nir Dotan [Sun, 16 Dec 2018 08:49:26 +0000 (08:49 +0000)]
mlxsw: spectrum_acl: Introduce Bloom filter

Lay the foundations for Bloom filter handling. Introduce a new file for
Bloom filter actions.

Add struct mlxsw_sp_acl_bf to struct mlxsw_sp_acl_erp_core and initialize
the Bloom filter data structure. Also take care of proper destruction when
terminating.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: resources: Add Spectrum-2 Bloom filter resource
Nir Dotan [Sun, 16 Dec 2018 08:49:25 +0000 (08:49 +0000)]
mlxsw: resources: Add Spectrum-2 Bloom filter resource

Add the maximum Bloom filter logarithmic size per eRP table bank.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: reg: Add Policy Engine Algorithmic Bloom Filter Entries Register
Nir Dotan [Sun, 16 Dec 2018 08:49:24 +0000 (08:49 +0000)]
mlxsw: reg: Add Policy Engine Algorithmic Bloom Filter Entries Register

Bloom filter is a bit vector which allows the HW a fast lookup on a
small size bit vector, that may reduce the number of lookups on the
A-TCAM memory. PEABFE register allows setting values to the bits of
the bit vector mentioned above.
Add the register to be later used in A-TCAM optimizations.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'rtnl-fdb-get'
David S. Miller [Sun, 16 Dec 2018 22:42:35 +0000 (14:42 -0800)]
Merge branch 'rtnl-fdb-get'

Roopa Prabhu says:

====================
rtnl fdb get

This series adds support for rtnl fdb get similar to
route get.

v2: add nda_policy, fixes to exact msgs, strict nlmsg parsing

v3: remove unnecessary attribute length checks + simplify code
as pointed out by david
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests: net: rtnetlink.sh: add fdb get test
Roopa Prabhu [Sun, 16 Dec 2018 06:35:11 +0000 (22:35 -0800)]
selftests: net: rtnetlink.sh: add fdb get test

tests the below three cases of bridge fdb get:
[bridge, mac, vlan]
[bridge_port, mac, vlan, flags=[NTF_MASTER]]
[vxlandev, mac, flags=NTF_SELF]

depends on iproute2 support for bridge fdb get.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agovxlan: support for ndo_fdb_get
Roopa Prabhu [Sun, 16 Dec 2018 06:35:10 +0000 (22:35 -0800)]
vxlan: support for ndo_fdb_get

This patch implements ndo_fdb_get for a vxlan device.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Reviewed-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agobridge: support for ndo_fdb_get
Roopa Prabhu [Sun, 16 Dec 2018 06:35:09 +0000 (22:35 -0800)]
bridge: support for ndo_fdb_get

This patch implements ndo_fdb_get for the bridge
fdb.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: rtnetlink: support for fdb get
Roopa Prabhu [Sun, 16 Dec 2018 06:35:08 +0000 (22:35 -0800)]
net: rtnetlink: support for fdb get

This patch adds support for fdb get similar to
route get. arguments can be any of the following (similar to fdb add/del/dump):
[bridge, mac, vlan] or
[bridge_port, mac, vlan, flags=[NTF_MASTER]] or
[dev, mac, [vni|vlan], flags=[NTF_SELF]]

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Reviewed-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'dsa-tag-cleanups'
David S. Miller [Sun, 16 Dec 2018 22:23:33 +0000 (14:23 -0800)]
Merge branch 'dsa-tag-cleanups'

Marek Vasut says:

====================
net: dsa: ksz: Clean up the tag code in prep for more switches

Clean up the KSZ DSA tag code in preparation for adding more switches.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: ksz: Add STP multicast handling
Marek Vasut [Sat, 15 Dec 2018 00:58:06 +0000 (01:58 +0100)]
net: dsa: ksz: Add STP multicast handling

In case the destination address is link local, add override bit into the
switch tag to let such a packet through the switch even if the port is
blocked.
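
The check can be sketched as follows. The link-local test mirrors the
well-known 01:80:c2:00:00:0X range reserved for protocols such as STP;
the tag layout (one bit per port plus an override bit at position 9) is
a hypothetical value chosen for the example, not the actual KSZ tag
format.

```python
TAG_OVERRIDE = 1 << 9  # hypothetical override bit position


def is_link_local_ether_addr(addr: bytes) -> bool:
    """True for destination MACs in the range 01:80:c2:00:00:00..0f."""
    if len(addr) != 6:
        raise ValueError("expected a 6-byte MAC address")
    return addr[:5] == bytes.fromhex("0180c20000") and (addr[5] & 0xF0) == 0


def build_tag(port: int, dest: bytes) -> int:
    """Build a simplified egress tag for a frame sent to `port`."""
    tag = 1 << port
    if is_link_local_ether_addr(dest):
        # Let the frame through even if the port is blocked.
        tag |= TAG_OVERRIDE
    return tag
```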

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Tristram Ha <Tristram.Ha@microchip.com>
Cc: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Cc: Woojung Huh <woojung.huh@microchip.com>
Cc: David S. Miller <davem@davemloft.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: ksz: Factor out common tag code
Tristram Ha [Sat, 15 Dec 2018 00:58:05 +0000 (01:58 +0100)]
net: dsa: ksz: Factor out common tag code

Factor out common code from tag_ksz, so that the code can be used
with other KSZ family switches which use differently sized tags.

Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com>
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Cc: Woojung Huh <woojung.huh@microchip.com>
Cc: David S. Miller <davem@davemloft.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: ksz: Rename NET_DSA_TAG_KSZ to _KSZ9477
Tristram Ha [Sat, 15 Dec 2018 00:58:04 +0000 (01:58 +0100)]
net: dsa: ksz: Rename NET_DSA_TAG_KSZ to _KSZ9477

Rename the tag Kconfig option and related macros in preparation for
addition of new KSZ family switches with different tag formats.

Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com>
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Cc: Woojung Huh <woojung.huh@microchip.com>
Cc: David S. Miller <davem@davemloft.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonfp: abm: allow to opt-out of RED offload
Jakub Kicinski [Fri, 14 Dec 2018 23:01:54 +0000 (15:01 -0800)]
nfp: abm: allow to opt-out of RED offload

FW team asks to be able to not support RED even if NIC is capable
of buffering for testing and experimentation.  Add an opt-out flag.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoRevert "net: dccp: initialize (addr,port) listening hashtable"
David S. Miller [Sun, 16 Dec 2018 20:36:41 +0000 (12:36 -0800)]
Revert "net: dccp: initialize (addr,port) listening hashtable"

This reverts commit ec49d83f245453515a9b6e88324e27bbcb69fbae.

Causes build failures when DCCP is modular.

ERROR: "inet_hashinfo2_init" [net/dccp/dccp.ko] undefined!

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoneighbor: Add protocol attribute
David Ahern [Sat, 15 Dec 2018 22:09:06 +0000 (14:09 -0800)]
neighbor: Add protocol attribute

Similar to routes and rules, add protocol attribute to neighbor entries
for easier tracking of how each was created.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests: net: reuseport_addr_any: add DCCP
Peter Oskolkov [Sat, 15 Dec 2018 22:27:24 +0000 (14:27 -0800)]
selftests: net: reuseport_addr_any: add DCCP

This patch adds coverage of DCCP to reuseport_addr_any selftest.

Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dccp: initialize (addr,port) listening hashtable
Peter Oskolkov [Sat, 15 Dec 2018 22:27:23 +0000 (14:27 -0800)]
net: dccp: initialize (addr,port) listening hashtable

Commit d9fbc7f6431f "net: tcp: prefer listeners bound to an address"
removes port-only listener lookups. This caused segfaults in DCCP
lookups because DCCP did not initialize the (addr,port) hashtable.

This patch adds said initialization.

The only non-trivial issue here is the size of the new hashtable.
It seemed reasonable to make it match the size of the port-only
hashtable (= INET_LHTABLE_SIZE) that was used previously. Other
parameters to inet_hashinfo2_init() match those used in TCP.

Tested: syzkaller issues fixed; the second patch in the patchset
        tests that DCCP lookups work correctly.

Fixes: d9fbc7f6431f "net: tcp: prefer listeners bound to an address"
Reported-by: syzcaller <syzkaller@googlegroups.com>
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agol2tp: Add protocol field decompression
Sam Protsenko [Fri, 14 Dec 2018 17:59:21 +0000 (19:59 +0200)]
l2tp: Add protocol field decompression

When Protocol Field Compression (PFC) is enabled, the "Protocol" field
in PPP packet will be received without leading 0x00. See section 6.5 in
RFC 1661 for details. So let's decompress protocol field if needed, the
same way it's done in drivers/net/ppp/pptp.c.

When the "nopcomp" pppd option is not enabled, PFC (pcomp) can be
negotiated during the LCP handshake, and the L2TP driver in the kernel
will receive PPP packets with a compressed Protocol field, which in turn
leads to the following error:

    Protocol Rejected (unsupported protocol 0x2145)

because instead of Protocol=0x0021 in PPP packet there will be
Protocol=0x21. This patch unwraps it back to 0x0021, which fixes the
issue.

Sending the compressed Protocol field will be implemented in subsequent
patch, this one is self-sufficient.
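
A minimal sketch of the decompression step, assuming the payload starts
at the PPP Protocol field. Per RFC 1661, the least significant octet of
a Protocol value is odd and the most significant octet is even, so an
odd first octet indicates that the leading 0x00 was dropped. The
function name is made up for the example.

```python
def decompress_protocol(payload: bytes) -> bytes:
    """Restore the leading 0x00 of a compressed PPP Protocol field."""
    if payload and payload[0] & 0x01:
        # Odd first octet: the field was compressed to a single octet.
        return b"\x00" + payload
    return payload
```

With compressed input 0x21 followed by an IPv4 header starting with
0x45, reading two octets as the protocol without this step yields
0x2145, which matches the error quoted above.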

Signed-off-by: Sam Protsenko <semen.protsenko@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge tag 'mlx5e-updates-2018-12-14' of git://git.kernel.org/pub/scm/linux/kernel...
David S. Miller [Sat, 15 Dec 2018 21:29:56 +0000 (13:29 -0800)]
Merge tag 'mlx5e-updates-2018-12-14' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5e-updates-2018-12-14 (VF Lag)

From Aviv Heller,

Subsequent patches introduce VF LAG, which provides load-balancing and
high-availability capabilities for VFs associated with different
physical ports of the same Connect-X card.

This series consists of the following:
 - mlx5 devcom, driver infrastructure that facilitates operations that involve
   both core devices (physical functions) of the same card, to synchronize and
   communicate between two driver instances of the same card.
 - Infrastructure for TC rule duplication.
 - Changes to LAG logic to enable its use when SR-IOV is enabled
 - PFs in switchdev mode is the only mode currently supported.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'net-mitigate-retpoline-overhead'
David S. Miller [Sat, 15 Dec 2018 21:23:03 +0000 (13:23 -0800)]
Merge branch 'net-mitigate-retpoline-overhead'

Paolo Abeni says:

====================
net: mitigate retpoline overhead

The spectre v2 counter-measures, aka retpolines, are a source of measurable
overhead[1]. We can partially address that when the function pointer refers to
a builtin symbol, by resorting to a list of tests against well-known builtin
functions followed by direct calls.

Experimental results show that replacing a single indirect call via
retpoline with several branches and a direct call gives performance gains
even when multiple branches are added - 5 or more, as reported in [2].

This may lead to some uglification around the indirect calls. In netconf 2018
Eric Dumazet described a technique to hide the most relevant part of the needed
boilerplate with some macro help.

This series is a [re-]implementation of such idea, exposing the introduced
helpers in a new header file. They are later leveraged to avoid the indirect
call overhead in the GRO path, when possible.

Overall this gives > 10% performance improvement for UDP GRO benchmark and
smaller but measurable for TCP syn flood.

The added infra can be used in follow-up patches to cope with retpoline overhead
in other points of the networking stack (e.g. at the qdisc layer) and possibly
even in other subsystems.

v2  -> v3:
 - fix build error with CONFIG_IPV6=m

v1  -> v2:
 - list explicitly the builtin function names in INDIRECT_CALL_*(),
   as suggested by Ed Cree
 - expand the recipients list

rfc -> v1:
 - use branch prediction hints, as suggested by Eric

[1] http://vger.kernel.org/netconf2018_files/PaoloAbeni_netconf2018.pdf
[2] https://linuxplumbersconf.org/event/2/contributions/99/attachments/98/117/lpc18_paper_af_xdp_perf-v2.pdf
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoudp: use indirect call wrappers for GRO socket lookup
Paolo Abeni [Fri, 14 Dec 2018 10:52:00 +0000 (11:52 +0100)]
udp: use indirect call wrappers for GRO socket lookup

This avoids another indirect call for UDP GRO. Again, the test
for the IPv6 variant is performed first.

v1 -> v2:
 - adapted to INDIRECT_CALL_ changes

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: use indirect call wrappers at GRO transport layer
Paolo Abeni [Fri, 14 Dec 2018 10:51:59 +0000 (11:51 +0100)]
net: use indirect call wrappers at GRO transport layer

This avoids an indirect call in the receive path for TCP and UDP
packets. TCP takes precedence on UDP, so that we have a single
additional conditional in the common case.

When IPv6 is built as a module, all GRO symbols except UDPv6 are
builtin, while the latter belongs to the ipv6 module, so we
need some special care.

v1 -> v2:
 - adapted to INDIRECT_CALL_ changes
v2 -> v3:
 - fix build issue with CONFIG_IPV6=m

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: use indirect call wrappers at GRO network layer
Paolo Abeni [Fri, 14 Dec 2018 10:51:58 +0000 (11:51 +0100)]
net: use indirect call wrappers at GRO network layer

This avoids an indirect call in the L3 GRO receive path, both
for ipv4 and ipv6, if the latter is not compiled as a module.

Note that when IPv6 is compiled as builtin, it will be checked first,
so we have a single additional compare for the more common path.

v1 -> v2:
 - adapted to INDIRECT_CALL_ changes

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoindirect call wrappers: helpers to speed-up indirect calls of builtin
Paolo Abeni [Fri, 14 Dec 2018 10:51:57 +0000 (11:51 +0100)]
indirect call wrappers: helpers to speed-up indirect calls of builtin

This header defines a bunch of helpers that allow avoiding the
retpoline overhead when calling builtin functions via function pointers.
It boils down to explicitly comparing the function pointers to
known builtin functions and, on a match, invoking the latter directly.

The macros defined here implement the boilerplate for the above schema
and will be used by the next patches.
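
The control flow of such a helper can be illustrated as below. This is
only a sketch of the compare-then-call-directly schema: Python has no
retpolines, so there is no performance effect here, and the function
names are invented for the example.

```python
def indirect_call_2(f, f2, f1, *args):
    """Call f, preferring a direct call when f is a known function."""
    if f is f2:          # most likely candidate is tested first
        return f2(*args)
    if f is f1:
        return f1(*args)
    return f(*args)      # fallback: plain indirect call


def ipv6_receive(pkt):
    return "v6:" + pkt


def ipv4_receive(pkt):
    return "v4:" + pkt


def other_receive(pkt):
    return "other:" + pkt
```

A caller would pass the pointer it holds together with the candidates,
for example indirect_call_2(handler, ipv6_receive, ipv4_receive, pkt).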

rfc -> v1:
 - use branch prediction hint, as suggested by Eric
v1  -> v2:
 - list explicitly the builtin function names in INDIRECT_CALL_*(),
   as suggested by Ed Cree

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: socionext: remove mmio reads on Tx
Ilias Apalodimas [Fri, 14 Dec 2018 08:59:01 +0000 (10:59 +0200)]
net: socionext: remove mmio reads on Tx

Currently the driver issues 2 mmio reads to figure out the number of
transmitted packets and clean them. We can get rid of the expensive
reads since BIT 31 of the Tx descriptor can be used for that.
We can also remove the budget counting of Tx completions since all of
the descriptors are not deliberately processed.

Performance numbers using pktgen are:
size  pre-patch(pps)  post-patch(pps)
64       362483           427916
128      358315           411686
256      352725           389683
512      215675           216464
1024     113812           114442

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: socionext: correctly recover txq after being full
Ilias Apalodimas [Fri, 14 Dec 2018 08:59:00 +0000 (10:59 +0200)]
net: socionext: correctly recover txq after being full

Running pktgen with packet sizes > 512b ends up in the interface Txq
getting stuck.
"netsec 522d0000.ethernet eth0: netsec_netdev_start_xmit: TxQFull!"
appears on dmesg but the interface never recovers. It requires an
ifconfig down/up to make the interface usable again.

The reason that triggers this, is a race condition between
.ndo_start_xmit and the napi completion. The available budget is
calculated first and indicates the queue is full. Due to a costly
netif_err() the queue is not stopped in time while the napi completion
runs, clears the irq and frees up descriptors, thus the queue never wakes
up again.

Fix this by moving the print after stopping the queue, making the print
ratelimited, adding barriers and checking for cleaned descriptors.

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodt-bindings: net: ravb: Add support for r8a774c0 SoC
Fabrizio Castro [Thu, 13 Dec 2018 20:18:34 +0000 (20:18 +0000)]
dt-bindings: net: ravb: Add support for r8a774c0 SoC

Document RZ/G2E (R8A774C0) SoC bindings.

Signed-off-by: Fabrizio Castro <fabrizio.castro@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoneighbor: Improve neighbour struct layout
David Ahern [Thu, 13 Dec 2018 16:16:50 +0000 (08:16 -0800)]
neighbor: Improve neighbour struct layout

Move arp_queue_len_bytes ahead of arp_queue to remove two 4-byte holes.
Ensure ha element is always 8-byte aligned.
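
The effect of field ordering on padding holes can be reproduced with
ctypes. The two structures below are hypothetical and only loosely
modelled on the description above; they are not the real struct
neighbour, and the exact sizes depend on the platform ABI.

```python
import ctypes


class Before(ctypes.Structure):
    # 4-byte fields interleaved with 8-byte fields leave holes
    # wherever the ABI aligns 8-byte fields to 8 bytes.
    _fields_ = [
        ("flags", ctypes.c_uint32),
        ("queue", ctypes.c_uint64),
        ("queue_len_bytes", ctypes.c_uint32),
        ("timer", ctypes.c_uint64),
    ]


class After(ctypes.Structure):
    # Grouping the two 4-byte fields fills one 8-byte slot.
    _fields_ = [
        ("flags", ctypes.c_uint32),
        ("queue_len_bytes", ctypes.c_uint32),
        ("queue", ctypes.c_uint64),
        ("timer", ctypes.c_uint64),
    ]


def padding(struct_cls) -> int:
    """Bytes of padding: total size minus the sum of the field sizes."""
    return ctypes.sizeof(struct_cls) - sum(
        ctypes.sizeof(ftype) for _, ftype in struct_cls._fields_)
```

On a typical 64-bit ABI, padding(Before) reports the two 4-byte holes
while padding(After) reports none.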

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: sched: simplify the qdisc_leaf code
Tonghao Zhang [Thu, 13 Dec 2018 08:43:23 +0000 (00:43 -0800)]
net: sched: simplify the qdisc_leaf code

Except for returning, the var leaf is not
used in the qdisc_leaf(). For simplicity, remove it.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoipv6: Fix handling of LLA with VRF and sockets bound to VRF
David Ahern [Wed, 12 Dec 2018 23:27:38 +0000 (15:27 -0800)]
ipv6: Fix handling of LLA with VRF and sockets bound to VRF

A recent commit allows sockets bound to a VRF to receive ipv6 link local
packets. However, it only works for UDP and, worse, TCP connection
attempts to the LLA with the only listener bound to the VRF just hang,
whereas before the client got a reset and connection refused. Fix by
adjusting ir_iif for LL addresses and packets received through a device
enslaved to a VRF.

Fixes: 6f12fa775530 ("vrf: mark skb for multicast or link-local as enslaved to VRF")
Reported-by: Donald Sharp <sharpd@cumulusnetworks.com>
Cc: Mike Manning <mmanning@vyatta.att-mail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agor8169: improve spurious interrupt detection
Heiner Kallweit [Sat, 15 Dec 2018 15:25:05 +0000 (16:25 +0100)]
r8169: improve spurious interrupt detection

Improve detection of spurious interrupts by checking against the
interrupt mask as currently set in the chip.
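
The idea can be sketched as a simple predicate. The function name and
the treatment of an all-ones status word as "no valid status" are
assumptions for the example; the register values are arbitrary.

```python
def is_spurious_interrupt(status: int, irq_mask: int) -> bool:
    """True if none of the currently enabled interrupt sources fired."""
    if status == 0xFFFF:
        # Assumption: an all-ones read means the status is not valid.
        return True
    return (status & irq_mask) == 0
```

Checking against the mask as currently programmed, instead of a fixed
set of bits, means sources that are temporarily disabled do not count
as a reason to handle the interrupt.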

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agocxgb4: remove DEFINE_SIMPLE_DEBUGFS_FILE()
Yangtao Li [Sat, 15 Dec 2018 07:59:30 +0000 (02:59 -0500)]
cxgb4: remove DEFINE_SIMPLE_DEBUGFS_FILE()

We already have the DEFINE_SHOW_ATTRIBUTE. There is no need to define
such a macro, so remove DEFINE_SIMPLE_DEBUGFS_FILE. Also use the
DEFINE_SHOW_ATTRIBUTE macro to simplify some code.

Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoipconfig: convert to DEFINE_SHOW_ATTRIBUTE
Yangtao Li [Sat, 15 Dec 2018 07:19:53 +0000 (02:19 -0500)]
ipconfig: convert to DEFINE_SHOW_ATTRIBUTE

Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.

Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'hns3-Add-more-commands-to-Debugfs-in-HNS3-driver'
David S. Miller [Sat, 15 Dec 2018 18:54:18 +0000 (10:54 -0800)]
Merge branch 'hns3-Add-more-commands-to-Debugfs-in-HNS3-driver'

Salil Mehta says:

====================
net: hns3: Add more commands to Debugfs in HNS3 driver

This patch-set adds a few more debugfs commands to the HNS3 Ethernet
Driver. Support has been added to query info related to the items
below:
1. Packet buffer descriptor ("echo bd info [queue no] [bd index] > cmd")
2. Manager table("echo dump mng tbl > cmd")
3. Dfx status register("echo dump reg ssu [prt id] > cmd")
4. Dcb status register("echo dump reg dcb [port id] > cmd")
5. Queue map ("echo queue map [queue no] > cmd")
6. Tm map ("echo tm map [queue no] > cmd")

NOTE: Above commands are *read-only* and are only intended to
query the information from the SoC(and dump inside the kernel,
for now) and in no way tries to perform write operations for
the purpose of configuration etc.

Change Log:
V1-->V2:
1. Addressed the GCC-8.2 compiler issue reported by David S. Miller.
Link: https://lkml.org/lkml/2018/12/14/1298
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: Add "tm map" status information query function
liuzhongzhu [Sat, 15 Dec 2018 15:31:58 +0000 (15:31 +0000)]
net: hns3: Add "tm map" status information query function

This patch prints dcb register status information by module.

debugfs command:
root@(none)# echo dump tm map 100 > cmd
queue_id | qset_id | pri_id | tc_id
0100     | 0065    | 08     | 00
root@(none)#

Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: Add "queue map" information query function
liuzhongzhu [Sat, 15 Dec 2018 15:31:57 +0000 (15:31 +0000)]
net: hns3: Add "queue map" information query function

This patch prints queue map information.

debugfs command:
echo dump queue map > cmd

Sample Command:
root@(none)# echo queue map > cmd
 local queue id | global queue id | vector id
          0              32             769
          1              33             770
          2              34             771
          3              35             772
          4              36             773
          5              37             774
          6              38             775
          7              39             776
          8              40             777
          9              41             778
         10              42             779
         11              43             780
         12              44             781
         13              45             782
         14              46             783
         15              47             784
root@(none)#

Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: Add "dcb register" status information query function
liuzhongzhu [Sat, 15 Dec 2018 15:31:56 +0000 (15:31 +0000)]
net: hns3: Add "dcb register" status information query function

This patch prints dcb register status information by module.

debugfs command:
root@(none)# echo dump reg dcb > cmd
 roce_qset_mask: 0x0
 nic_qs_mask: 0x0
 qs_shaping_pass: 0x0
 qs_bp_sts: 0x0
 pri_mask: 0x0
 pri_cshaping_pass: 0x0
 pri_pshaping_pass: 0x0
root@(none)#

Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: Add "status register" information query function
liuzhongzhu [Sat, 15 Dec 2018 15:31:55 +0000 (15:31 +0000)]
net: hns3: Add "status register" information query function

This patch prints status register information by module.

debugfs command:
echo dump reg [mode name] > cmd

Sample Command:
root@(none)# echo dump reg bios common > cmd
 BP_CPU_STATE: 0x0
 DFX_MSIX_INFO_NIC_0: 0xc000
 DFX_MSIX_INFO_NIC_1: 0xf
 DFX_MSIX_INFO_NIC_2: 0x2
 DFX_MSIX_INFO_NIC_3: 0x2
 DFX_MSIX_INFO_ROC_0: 0xc000
 DFX_MSIX_INFO_ROC_1: 0x0
 DFX_MSIX_INFO_ROC_2: 0x0
 DFX_MSIX_INFO_ROC_3: 0x0
root@(none)#

Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: Add "manager table" information query function
liuzhongzhu [Sat, 15 Dec 2018 15:31:54 +0000 (15:31 +0000)]
net: hns3: Add "manager table" information query function

This patch prints manager table information.

debugfs command:
echo dump mng tbl > cmd

Sample Command:
root@(none)# echo dump mng tbl > cmd
 entry|mac_addr         |mask|ether|mask|vlan|mask|i_map|i_dir|e_type
 00   |01:00:5e:00:00:01|0   |00000|0   |0000|0   |00   |00   |0
 01   |c2:f1:c5:82:68:17|0   |00000|0   |0000|0   |00   |00   |0
root@(none)#

Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: Add "bd info" query function
liuzhongzhu [Sat, 15 Dec 2018 15:31:53 +0000 (15:31 +0000)]
net: hns3: Add "bd info" query function

This patch prints Sending and receiving
package descriptor information.

debugfs command:
echo dump bd info 1 > cmd

Sample Command:
root@(none)# echo bd info 1 > cmd
hns3 0000:7d:00.0: TX Queue Num: 0, BD Index: 0
hns3 0000:7d:00.0: (TX) addr: 0x0
hns3 0000:7d:00.0: (TX)vlan_tag: 0
hns3 0000:7d:00.0: (TX)send_size: 0
hns3 0000:7d:00.0: (TX)vlan_tso: 0
hns3 0000:7d:00.0: (TX)l2_len: 0
hns3 0000:7d:00.0: (TX)l3_len: 0
hns3 0000:7d:00.0: (TX)l4_len: 0
hns3 0000:7d:00.0: (TX)vlan_tag: 0
hns3 0000:7d:00.0: (TX)tv: 0
hns3 0000:7d:00.0: (TX)vlan_msec: 0
hns3 0000:7d:00.0: (TX)ol2_len: 0
hns3 0000:7d:00.0: (TX)ol3_len: 0
hns3 0000:7d:00.0: (TX)ol4_len: 0
hns3 0000:7d:00.0: (TX)paylen: 0
hns3 0000:7d:00.0: (TX)vld_ra_ri: 0
hns3 0000:7d:00.0: (TX)mss: 0
hns3 0000:7d:00.0: RX Queue Num: 0, BD Index: 120
hns3 0000:7d:00.0: (RX)addr: 0xffee7000
hns3 0000:7d:00.0: (RX)pkt_len: 0
hns3 0000:7d:00.0: (RX)size: 0
hns3 0000:7d:00.0: (RX)rss_hash: 0
hns3 0000:7d:00.0: (RX)fd_id: 0
hns3 0000:7d:00.0: (RX)vlan_tag: 0
hns3 0000:7d:00.0: (RX)o_dm_vlan_id_fb: 0
hns3 0000:7d:00.0: (RX)ot_vlan_tag: 0
hns3 0000:7d:00.0: (RX)bd_base_info: 0

Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago Merge branch 'net-prefer-listeners-bound-to-an-address'
David S. Miller [Fri, 14 Dec 2018 23:55:21 +0000 (15:55 -0800)]
Merge branch 'net-prefer-listeners-bound-to-an-address'

Peter Oskolkov says:

====================
net: prefer listeners bound to an address

A relatively common use case is to have several IPs configured
on a host, and have different listeners for each of them. We would
like to add a "catch all" listener on addr_any, to match incoming
connections not served by any of the listeners bound to a specific
address.

However, port-only lookups can match addr_any sockets even when sockets
listening on specific addresses are present, if the SO_REUSEPORT flag
is set. This patchset eliminates lookups into the port-only hashtable,
as lookups by (addr,port) tuple are easily available.

In a future patchset I plan to explore whether it is possible
to remove port-only hashtables completely: additional refactoring
will be required, as some non-lookup code uses the hashtables.
====================

Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago selftests: net: test that listening sockets match on address properly
Peter Oskolkov [Wed, 12 Dec 2018 21:15:37 +0000 (13:15 -0800)]
selftests: net: test that listening sockets match on address properly

This patch adds a selftest that verifies that a socket listening
on a specific address is chosen in preference over sockets
that listen on any address. The test covers UDP/UDP6/TCP/TCP6.

It is based on, and similar to, reuseport_dualstack.c selftest.

Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: tcp6: prefer listeners bound to an address
Peter Oskolkov [Wed, 12 Dec 2018 21:15:36 +0000 (13:15 -0800)]
net: tcp6: prefer listeners bound to an address

A relatively common use case is to have several IPs configured
on a host, and have different listeners for each of them. We would
like to add a "catch all" listener on addr_any, to match incoming
connections not served by any of the listeners bound to a specific
address.

However, port-only lookups can match addr_any sockets even when sockets
listening on specific addresses are present, if the SO_REUSEPORT flag
is set. This patch eliminates lookups into the port-only hashtable,
as lookups by (addr,port) tuple are easily available.

In addition, compute_score() is tweaked to _not_ match
addr_any sockets to specific addresses, as hash collisions
could result in the unwanted behavior described above.

Tested: the patch compiles; full test in the last patch in this
patchset. Existing reuseport_* selftests also pass.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: tcp: prefer listeners bound to an address
Peter Oskolkov [Wed, 12 Dec 2018 21:15:35 +0000 (13:15 -0800)]
net: tcp: prefer listeners bound to an address

A relatively common use case is to have several IPs configured
on a host, and have different listeners for each of them. We would
like to add a "catch all" listener on addr_any, to match incoming
connections not served by any of the listeners bound to a specific
address.

However, port-only lookups can match addr_any sockets even when sockets
listening on specific addresses are present, if the SO_REUSEPORT flag
is set. This patch eliminates lookups into the port-only hashtable,
as lookups by (addr,port) tuple are easily available.

In addition, compute_score() is tweaked to _not_ match
addr_any sockets to specific addresses, as hash collisions
could result in the unwanted behavior described above.
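
The intended matching rule can be modelled in user space roughly as follows.
This is a simplified sketch, not the kernel's compute_score(): the listener
struct, the linear array standing in for the (addr,port) hashtable, and the
names score(), lookup_exact() and lookup_listener() are all assumptions made
for illustration. The fallback pass with the wildcard address is how this
sketch lets a catch-all listener still be found.

```c
#include <stddef.h>
#include <stdint.h>

#define ADDR_ANY 0u	/* stands in for INADDR_ANY in this model */

struct listener {
	uint32_t addr;	/* bound address, or ADDR_ANY */
	uint16_t port;	/* bound port */
};

/*
 * Simplified stand-in for compute_score(): the listener matches only if both
 * port and address are equal, so an ADDR_ANY listener does not match a
 * lookup for a specific address.
 */
static int score(const struct listener *l, uint32_t addr, uint16_t port)
{
	if (l->port != port || l->addr != addr)
		return -1;
	return 1;
}

/* Return the first listener that matches the exact (addr, port) tuple. */
static const struct listener *lookup_exact(const struct listener *tbl,
					   size_t n, uint32_t addr,
					   uint16_t port)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (score(&tbl[i], addr, port) >= 0)
			return &tbl[i];
	return NULL;
}

/* Try the (daddr, port) tuple first, then fall back to (ADDR_ANY, port). */
const struct listener *lookup_listener(const struct listener *tbl, size_t n,
				       uint32_t daddr, uint16_t port)
{
	const struct listener *l = lookup_exact(tbl, n, daddr, port);

	return l ? l : lookup_exact(tbl, n, ADDR_ANY, port);
}
```

Because the wildcard pass runs only after the exact pass has failed, a
catch-all listener can never shadow a listener bound to the destination
address, regardless of the order of entries in the table.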

Tested: the patch compiles; full test in the last patch in this
patchset. Existing reuseport_* selftests also pass.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: udp6: prefer listeners bound to an address
Peter Oskolkov [Wed, 12 Dec 2018 21:15:34 +0000 (13:15 -0800)]
net: udp6: prefer listeners bound to an address

A relatively common use case is to have several IPs configured
on a host, and have different listeners for each of them. We would
like to add a "catch all" listener on addr_any, to match incoming
connections not served by any of the listeners bound to a specific
address.

However, port-only lookups can match addr_any sockets even when sockets
listening on specific addresses are present, if the SO_REUSEPORT flag
is set. This patch eliminates lookups into the port-only hashtable,
as lookups by (addr,port) tuple are easily available.

In addition, compute_score() is tweaked to _not_ match
addr_any sockets to specific addresses, as hash collisions
could result in the unwanted behavior described above.

Tested: the patch compiles; full test in the last patch in this
patchset. Existing reuseport_* selftests also pass.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>