git.osdn.net Git - uclinux-h8/linux.git/log

Merge git://git./linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for net-next:

1) Add .release_ops to properly unroll .select_ops, use it from nft_compat.
   After this change, we can remove list of extensions too to simplify this
   codebase.

2) Update amanda conntrack helper to support v3.4, from Florian Tham.

3) Get rid of the obsolete BUGPRINT macro in ebtables, from
   Florian Westphal.

4) Merge IPv4 and IPv6 masquerading infrastructure into one single module.
   From Florian Westphal.

5) Patchset to remove nf_nat_l3proto structure to get rid of
   indirections, from Florian Westphal.

6) Skip unnecessary conntrack timeout updates in case the value is
   still the same, also from Florian Westphal.

7) Remove unnecessary 'fall through' comments in empty switch cases,
   from Li RongQing.

8) Fix lookup to fixed size hashtable sets on big endian with 32-bit keys.

9) Incorrect logic to deactivate path of fixed size hashtable sets,
   element was being tested to self.

10) Remove nft_hash_key(), the bitmap set is always selected for 16-bit
    keys.

11) Use boolean whenever possible in IPVS codebase, from Andrea Claudi.

12) Enter close state in conntrack if RST matches exact sequence number,
    from Florian Westphal.

13) Initialize dst_cache in tunnel extension, from wenxu.

14) Pass protocol as u16 to xt_check_match and xt_check_target, from
    Li RongQing.

15) SCTP header is granted to be in a linear area from IPVS NAT handler,
    from Xin Long.

16) Don't steal packets coming from slave VRF device from the
    ip_sabotage_in() path, from David Ahern.

17) Fix unsafe update of basechain stats, from Li RongQing.

18) Make sure CONNTRACK_LOCKS is power of 2 to let compiler optimize
    modulo operation as bitwise AND, from Li RongQing.

19) Use device_attribute instead of internal definition in the IDLETIMER
    target, from Sami Tolvanen.

20) Merge redir, masq and IPv4/IPv6 NAT chain types, from Florian Westphal.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-upstream' of git://git./linux/kernel/git/bluetooth/bluetooth-next

Johan Hedberg says:

====================
pull request: bluetooth-next 2019-03-02

Here's one more bluetooth-next pull request for the 5.1 kernel:

- Added support for MediaTek MT7663U and MT7668U UART devices
- Cleanups & fixes to the hci_qca driver
- Fixed wakeup pin behavior for QCA6174A controller

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/davem/net

Bluetooth: mediatek: add support for MediaTek MT7663U and MT7668U UART devices

This adds the support of enabling MT7663U and MT7668U Bluetooth function
running on the top of btmtkuart driver.

There are a few differences between MT766[3,8]U and MT7622 where
MT766[3,8]U are standalone devices based on UART transport while MT7622
bluetooth is a built-in device on MediaTek SoC communicating with the host
through BTIF serial transport. Thus, extra setup sequence is necessary
for these standalone devices such as remote regulator and reset control via
GPIO, baud rate changing handshake between the host and device and so on.

Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

dt-bindings: net: bluetooth: add support for MediaTek MT7663U and MT7668U UART devices

Update binding document with adding support of MT7663U and MT7668U UART
devices to mediatek-bluetooth.

Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

Bluetooth: hci_qca: Reduce delay after sending baudrate request for WCN3990

The current 300ms delay after a baudrate change is extremely long.
For WCN3990 it is sufficient to wait 10ms after the baudrate change
request has been sent over the wire.

Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

Merge tag 'wireless-drivers-next-for-davem-2019-03-01' of git://git./linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
wireless-drivers-next patches for 5.1

Last set of patches. A new hardware support for mt76 otherwise quite
normal.

Major changes:

mt76

* add driver for MT7603E/MT7628

ath10k

* more preparation for SDIO support

wil6210

* support up to 20 stations in AP mode
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: sit: fix memory leak in sit_init_net()

If register_netdev() is failed to register sitn->fb_tunnel_dev,
it will go to err_reg_dev and forget to free netdev(sitn->fb_tunnel_dev).

BUG: memory leak
unreferenced object 0xffff888378daad00 (size 512):
  comm "syz-executor.1", pid 4006, jiffies 4295121142 (age 16.115s)
  hex dump (first 32 bytes):
    00 e6 ed c0 83 88 ff ff 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
backtrace:
    [<00000000d6dcb63e>] kvmalloc include/linux/mm.h:577 [inline]
    [<00000000d6dcb63e>] kvzalloc include/linux/mm.h:585 [inline]
    [<00000000d6dcb63e>] netif_alloc_netdev_queues net/core/dev.c:8380 [inline]
    [<00000000d6dcb63e>] alloc_netdev_mqs+0x600/0xcc0 net/core/dev.c:8970
    [<00000000867e172f>] sit_init_net+0x295/0xa40 net/ipv6/sit.c:1848
    [<00000000871019fa>] ops_init+0xad/0x3e0 net/core/net_namespace.c:129
    [<00000000319507f6>] setup_net+0x2ba/0x690 net/core/net_namespace.c:314
    [<0000000087db4f96>] copy_net_ns+0x1dc/0x330 net/core/net_namespace.c:437
    [<0000000057efc651>] create_new_namespaces+0x382/0x730 kernel/nsproxy.c:107
    [<00000000676f83de>] copy_namespaces+0x2ed/0x3d0 kernel/nsproxy.c:165
    [<0000000030b74bac>] copy_process.part.27+0x231e/0x6db0 kernel/fork.c:1919
    [<00000000fff78746>] copy_process kernel/fork.c:1713 [inline]
    [<00000000fff78746>] _do_fork+0x1bc/0xe90 kernel/fork.c:2224
    [<000000001c2e0d1c>] do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290
    [<00000000ec48bd44>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [<0000000039acff8a>] 0xffffffffffffffff

Signed-off-by: Mao Wenan <maowenan@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: Fix statistics on mv88e6161

Despite what the datesheet says, the silicon implements the older way
of snapshoting the statistics. Change the op.

Reported-by: Chris.Healy@zii.aero
Tested-by: Chris.Healy@zii.aero
Fixes: 0ac64c394900 ("net: dsa: mv88e6xxx: mv88e6161 uses mv88e6320 stats snapshot")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipv4: Fix NULL pointer dereference in route lookup

When calculating the multipath hash for input routes the flow info is
not available and therefore should not be used.

Fixes: 24ba14406c5c ("route: Add multipath_hash in flowi_common to make user-define hash")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Cc: wenxu <wenxu@ucloud.cn>
Acked-by: wenxu <wenxu@ucloud.cn>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-mvpp2-fixes-and-improvements'

Antoine Tenart says:

====================
net: mvpp2: fixes and improvements

This series aims to improve the Marvell PPv2 driver and to fix various
issues we encountered while testing the ports in many different
configurations. The series is based on top of Russell PPv2 phylink
rework and improvement.

I'm not sending a v2 of the previous fixes series as half the patches
are not the same and lots of development happened in between.

While this series contains fixes, it's sent to net-next as it is based
on top of Russell patches that were merged into net-next. I'm also
aiming at net-next as the series reworks critical paths of the PPv2
driver, such as the reset handling of various blocks, to let more weeks
for users to tests and for possible fixes to be sent before it lands
into a stable kernel version.

The series is divided into three parts:

- Patches 1 to 3 are cosmetic changes, sent alongside the series, as I
  saw these small issues while working on this.

- Patches 5 to 8 are fixing (or improving) individual issues that we
  found while testing PPv2.1 and PPv2.2 ports while using various
  interfaces.

  Notable fixes are we support back RGMII interfaces (on both PPv2.1 and
  PPv2.2), as their support was broken by previous patches. We also
  reworked the RXQ computation as the RXQ assignment was not checking
  the maximum number of RXQ available, and was broken for PPv2.1.

- As discussed in a previous fixes series, patches 9 to 15 rework the
  way blocks are set in reset in the PPv2 engine (plus related changes).

  There are four blocks we want to control the reset status: two MAC
  (GMAC and XLG MAC) and two PCS (MPCS and XPCS). The XLG MAC is used
  for 10G connexions and uses the MPCS or the XPCS depending on the mode
  used (10GKR / XAUI / RXAUI) and the GMAC is used for the other modes.

  The idea is to set all blocks in reset by default, and when not used,
  and to de-assert the reset only when a block is used. There are four
  cases to take in account:

  1. Boot time: all four blocks should be put in reset, as we do not
     know their initial state (configured by the firmware/bootloader).

  2. Link up: only the blocks used by a given mode should be put out of
     reset (eg. 10GKR uses the XLG MAC and the MPCS).

  3. Mode reconfiguration: some ports may support mode reconfiguration,
     and switching between the GMAC and the XLG MAC (or between the two
     PCS). All blocks should be put in reset, and only the one used
     should be put out of reset.

  4. Link down: all four blocks are put in reset.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: set the GMAC, XLG MAC, XPCS and MPCS in reset when a port is down

This patch adds calls in the stop() helper to ensure both MACs and
both PCS blocks are set in reset when the user manually sets a port
down. This is done so that we have the exact same block reset states at
boot time and when a port is set down.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: set the XPCS and MPCS in reset when not used

This patch sets both the XPCS and MPCS blocks in reset when they aren't
used. This is done both at boot time and when reconfiguring a port mode.
The advantage now is that only the PCS used is set out of reset when the
port is configured (10GKR uses the MCPS while RXAUI uses the XPCS).

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: reset the MACs when reconfiguring a port

This patch makes sure both PPv2 MACs (GMAC + XLG MAC) are set in reset
while a port is reconfigured. This is done so that we make sure a MAC is
in a reset state when not used, as only one of the two will be set out
of reset after the port is configured properly.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: rework the XLG MAC reset handling

This patch reworks the way the XLG MAC is set in reset: the XLG MAC is
set in reset at probe time and taken out of this state only when used.
The idea is to move forward a situation where only the blocks used are
taken out of reset. This also has the effect to handle the GMAC and the
XLG MAC in a similar way (the GMAC already is set in reset at boot
time).

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: force the XLG MAC link up or down when not using in-band

This patch force the XLG MAC link state in the phylink link_up() and
link_down() helpers when not using in-band auto-negotiation. This mimics
what's already done for the GMAC and follows what's advised in the
phylink documentation.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: only update the XLG configuration when needed

This patch improves the XLG configuration function, to only update the
XLG configuration register when a change is needed. This helps not
writing over and over the same XLG configuration each time phylink
request the MAC to be configured. This mimics the GMAC configuration
function.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: always disable both MACs when disabling a port

This patch modifies the port_disable() helper to always disable both the
GMAC and the XLG MAC when called. At boot time we do not know of a port
was enabled in the firmware/bootloader, and if so what mode was used
(hence which of the two MACs was used).

This also help in implementing a logic where all blocks are disabled
when not used, and only enabled regarding the current mode used on a
given port.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: some AN fields require the link to be down when updated

The GMAC configuration helper modifies values in the auto-negotiation
register. Some of its values require the port to be forced down when
modifying their values. This patches fixes the check made on the bit to
be updated in this register, so that the port is forced down when
needed. This fix cases where some of those parameters were updated, but
not taken into account, such as when using RGMII interfaces.

Fixes: d14e078f23cc ("net: marvell: mvpp2: only reprogram what is necessary on mac_config")
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: fix the computation of the RXQs

The patch fixes the computation of RXQs being used by the PPv2 driver,
which is set depending on the PPv2 engine version and the queue mode
used. There are three cases:

- PPv2.1: 1 RXQ per CPU.
- PPV2.2 with MVPP2_QDIST_MULTI_MODE: 1 RXQ per CPU.
- PPv2.2 with MVPP2_QDIST_SINGLE_MODE: 1 RXQ is shared between the CPUs.

The PPv2 engine supports a maximum of 32 queues per port. This patch
adds a check so that we do not overstep this maximum.

It appeared the calculation was broken for PPv2.1 engines since
f8c6ba8424b0, as PPv2.1 ports ended up with a single RXQ while they
needed 4. This patch fixes it.

Fixes: f8c6ba8424b0 ("net: mvpp2: use only one rx queue per port per CPU")
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: fix validate for PPv2.1

The Phylink validate function is the Marvell PPv2 driver makes a check
on the GoP id. This is valid an has to be done when using PPv2.2 engines
but makes no sense when using PPv2.1. The check done when using an RGMII
interface makes sure the GoP id is not 0, but this breaks PPv2.1. Fixes
it.

Fixes: 0fb628f0f250 ("net: mvpp2: fix phylink handling of invalid PHY modes")
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: reconfiguring the port interface is PPv2.2 specific

This patch adds a check on the PPv2 version in-use not to reconfigure
the port mode when an interface is updated when using PPv2.1 as the
functions called are PPv2.2 specific.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: a port can be disabled even if we use the link IRQ

We had a check in the mvpp2_mac_link_down() function (called by phylink)
to avoid disabling the port when link interrupts are used. It turned out
the interrupt can still be used with the port disabled. We can thus
remove this check.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: fix alignment of MVPP2_GMAC_CONFIG_MII_SPEED definition

Cosmetic patch fix the alignment of the MVPP2_GMAC_CONFIG_MII_SPEED
macro definition.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: update the port documentation regarding the GoP

The Marvell PPv2 port structure stores the GoP id of a given port. This
information is specific to PPv2.2, but cannot be used by PPv2.1. Update
its comment to denote this specificity.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: fix a typo in the header

This cosmetic patch fixes a typo made in a comment in the Marvell PPv2
Ethernet driver header.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4vf: Call netif_carrier_off properly in pci_probe

netif_carrier_off() should be called only after register_netdev().

Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: Vishal Kulkarni <vishal@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'cxgb4-vf-link-state'

Arjun Vynipadath says:

====================
cxgb4/cxgb4vf: VF link state support

This series of patches adds support for ndo_set_vf_link_state in
cxgb4 driver.

Patch 1 implements ndo_set_vf_link_state
Patch 2 reverts the existing force_link_up behaviour for cxgb4vf driver.

v2:
- Using reverse christmas tree for variable declaration in Patch 1
- Patch 2 has no change
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4vf: Revert force link up behaviour

Reverting force link up changes since this behaviour can be
achieved using VF link state feature.

Reverts:
commit 0913667ab3ad ("cxgb4vf: Forcefully link up virtual interfaces")

Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: Vishal Kulkarni <vishal@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Add VF Link state support

Use ndo_set_vf_link_state to control the link states associated
with the virtual interfaces.

Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: Vishal Kulkarni <vishal@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4vf: Prefix adapter flags with CXGB4VF

Some of these macros were conflicting with global namespace,
hence prefixing them with CXGB4VF.

Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: Vishal Kulkarni <vishal@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers: net: Remove unnecessary semicolon

drivers/net/dsa/mt7530.c:649:3-4: Unneeded semicolon
drivers/net/ethernet/cisco/enic/enic_clsf.c:35:2-3: Unneeded semicolon
drivers/net/ethernet/faraday/ftgmac100.c:1640:2-3: Unneeded semicolon
drivers/net/ethernet/mediatek/mtk_eth_soc.c:229:2-3: Unneeded semicolon
drivers/net/usb/sr9700.c:437:2-3: Unneeded semicolon

Remove unneeded semicolon.

Generated by: scripts/coccinelle/misc/semicolon.cocci

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Sean Wang <sean.wang@mediatek.com> for mt7530 and mtk_eth_soc
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'SO_MAX_PACING_RATE-64-bit'

Eric Dumazet says:

====================
net: 64bit support for SO_MAX_PACING_RATE

64bit kernels adopted 64bit type for sk_max_pacing_rate in linux-4.20

We can change how we implement SO_MAX_PACING_RATE socket option
to support 64bit values to/from user space as well.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: support 64bit rates for getsockopt(SO_MAX_PACING_RATE)

For legacy applications using 32bit variable, SO_MAX_PACING_RATE
has to cap the returned value to 0xFFFFFFFF, meaning that
rates above 34.35 Gbit are capped.

This patch allows applications to read socket pacing rate
at full resolution, if they provide a 64bit variable to store it,
and the kernel is 64bit.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: support 64bit values for setsockopt(SO_MAX_PACING_RATE)

64bit kernels now support 64bit pacing rates.

This commit changes setsockopt() to accept 64bit
values provided by applications.

Old applications providing 32bit value are still supported,
but limited to the old 34Gbit limitation.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tc-testing: Allow test cases to be skipped

By adding a check for an optional key/value pair to the test case
data, individual test cases may be skipped to prevent tdc from
aborting a test run due to setup or teardown failure.

If a test case is skipped, it will still appear in the results
output to allow for a consistent number of executed tests in each
run. However, the test will be marked as skipped.

This support for skipping extends to any plugins that may generate
additional results for each executed test.

Signed-off-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

geneve: correctly handle ipv6.disable module parameter

When IPv6 is compiled but disabled at runtime, geneve_sock_add returns
-EAFNOSUPPORT. For metadata based tunnels, this causes failure of the whole
operation of bringing up the tunnel.

Ignore failure of IPv6 socket creation for metadata based tunnels caused by
IPv6 not being available.

This is the same fix as what commit d074bf960044 ("vxlan: correctly handle
ipv6.disable module parameter") is doing for vxlan.

Note there's also commit c0a47e44c098 ("geneve: should not call rt6_lookup()
when ipv6 was disabled") which fixes a similar issue but for regular
tunnels, while this patch is needed for metadata based tunnels.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./pub/scm/linux/kernel/git/bpf/bpf

Alexei Starovoitov says:

====================
pull-request: bpf 2019-03-01

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) fix sanitation rewrite, from Daniel.

2) fix error path on map_new_fd, from Peng.

3) fix icache flush address, from Paul.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlxsw-rehash-split'

Ido Schimmel says:

====================
mlxsw: spectrum_acl: Split rehash work into chunks

Jiri says:

When rehash happens on a vregion with many rules and they are being
migrated, it might take significant time to finish the job. During that
time vregion->lock is taken which prevents rules from being
added/deleted from the vregion.

Aim of this patchset is to allow to interrupt migration of rules during
rehash, reschedule and give chance for rules to be added/deleted. Then
continue migration in another execution of scheduled work.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_acl: Make mlxsw_sp_acl_tcam_vregion_rehash() return void

The return value is ignored anyway, so just return void.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_acl: Remember where to continue rehash migration

Store pointer to vchunk where the migration was interrupted, as well as
ventry pointer to start from and to stop at (during rollback). This
saved pointers need to be forgotten in case of ventries list or vchunk
list changes, which is done by couple of "changed" helpers.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_acl: Allow to interrupt/continue rehash work

Currently, migration of vregions with many entries may take long time
during which insertions and removals of the rules are blocked
due to wait to acquire vregion->lock.

To overcome this, allow to interrupt and continue rehash work according
to the set credits - number of rules to migrate.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_acl: Do rollback as another call to mlxsw_sp_acl_tcam_vchunk_migrate_all()

In order to simplify the code and to prepare it for
interrupted/continued migration process, do the rollback in case of
migration error as another call to mlxsw_sp_acl_tcam_vchunk_migrate_all().
It can be understood as "migrate all back".

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_acl: Put vchunk migrate start/end code into separate functions

In preparations of interrupt/continue of rehash work, put the code that
is done at the beginning/end of vchunk migrate function into separate
functions.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_acl: Put this_is_rollback to rehash context struct

Put the this_is_rollback flag into rehash context struct in preparations
for interrupt/continue of rehash work.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_acl: Rename variables in mlxsw_sp_acl_tcam_ventry_migrate()

Remove some of variables in function mlxsw_sp_acl_tcam_ventry_migrate()
so the names are aligned with the rest of the code.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_acl: assign vchunk->chunk by the newly created chunk

Make the vchunk->chunk contain pointer of a new chunk we migrate to.
In case of a rollback, it contains the original chunk.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_acl: assign vregion->region by the newly created region

Make the vregion->region contain pointer of a new region we migrate to.
In case of a rollback, it contains the original region.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_acl: Push code start/end from mlxsw_sp_acl_tcam_vregion_migrate()

Push code from the beginning and end of function
mlxsw_sp_acl_tcam_vregion_migrate() into rehash_start()/end() functions.
Then all the things needed to be done before and after the actual
migration process will be grouped together.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_acl: Push rehash start/end code into separate functions

In preparations for interrupt/continue of rehash work, put the code at
the beginning/end of the rehash function into separate functions.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_acl: Introduce new rehash context struct and save hint_priv there

Prepare for continued migration. Introduce a new structure to track
rehash context and save hint_priv into it.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_acl: Don't migrate already migrated entry

Check if the entry is already in a chunk where we want it to be. In that
case, skip migration. This is preparation for "per parts" migration
where this situation may occur.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_acl: Push rehash dw struct into rehash sub-struct

More rehash related fields are going to come. Push "dw" into sub-struct
that will accommodate the others as well.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: prevent interrupt storm caused by mv88e6390x_port_set_cmode

When debugging another issue I faced an interrupt storm in this
driver (88E6390, port 9 in SGMII mode), consisting of alternating
link-up / link-down interrupts. Analysis showed that the driver
wanted to set a cmode that was set already. But so far
mv88e6390x_port_set_cmode() doesn't check this and powers down
SERDES, what causes the link to break, and eventually results in
the described interrupt storm.

Fix this by checking whether the cmode actually changes. We want
that the very first call to mv88e6390x_port_set_cmode() always
configures the registers, therefore initialize port.cmode with
a value that is different from any supported cmode value.
We have to take care that we only init the ports cmode once
chip->info->num_ports is set.

v2:
- add small helper and init the number of actual ports only

Fixes: 364e9d7776a3 ("net: dsa: mv88e6xxx: Power on/off SERDES on cmode change")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

switchdev: Remove unused transaction item queue

There are no more in tree users of the
switchdev_trans_item_{dequeue,enqueue} or switchdev_trans_item structure
in the kernel since commit 00fc0c51e35b ("rocker: Change world_ops API
and implementation to be switchdev independant").

Remove this unused code and update the documentation accordingly since.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: fix sanitation rewrite in case of non-pointers

Marek reported that he saw an issue with the below snippet in that
timing measurements where off when loaded as unpriv while results
were reasonable when loaded as privileged:

    [...]
    uint64_t a = bpf_ktime_get_ns();
    uint64_t b = bpf_ktime_get_ns();
    uint64_t delta = b - a;
    if ((int64_t)delta > 0) {
    [...]

Turns out there is a bug where a corner case is missing in the fix
d3bd7413e0ca ("bpf: fix sanitation of alu op with pointer / scalar
type from different paths"), namely fixup_bpf_calls() only checks
whether aux has a non-zero alu_state, but it also needs to test for
the case of BPF_ALU_NON_POINTER since in both occasions we need to
skip the masking rewrite (as there is nothing to mask).

Fixes: d3bd7413e0ca ("bpf: fix sanitation of alu op with pointer / scalar type from different paths")
Reported-by: Marek Majkowski <marek@cloudflare.com>
Reported-by: Arthur Fabre <afabre@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/netdev/CAJPywTJqP34cK20iLM5YmUMz9KXQOdu1-+BZrGMAGgLuBWz7fg@mail.gmail.com/T/
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Merge branch 'doc-net-ieee802154-move-from-plain-text-to-rst'

Stefan Schmidt says:

====================
doc: net: ieee802154: move from plain text to rst

The ieee802154 subsystem doc was still in plain text. With the networking book
taking shape I thought it was time to do the first step and move it over to rst.
This really is only the minimal conversion. I need to take some time to update
and extend the docs.

The patches are based on net-next, but they only touch the networking book so I
would not expect and trouble. From what I have seen they would go through
Jonathan's tree after being acked by Dave? If you want this patches against a
different tree let me know.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

doc: net: ieee802154: remove old plain text docs after switching to rst

The plain text docs are converted to rst now, which allows us to remove
the old text file from the tree.

Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

doc: net: ieee802154: introduce IEEE 802.15.4 subsystem doc in rst style

Moving the ieee802154 docs from a plain text file into the new rst
style. This commit only does the minimal needed change to bring the
documentation over. Follow up patches will improve and extend on this.

Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: fix kdoc

devlink suffers from a few kdoc warnings:

net/core/devlink.c:5292: warning: Function parameter or member 'dev' not described in 'devlink_register'
net/core/devlink.c:5351: warning: Function parameter or member 'port_index' not described in 'devlink_port_register'
net/core/devlink.c:5753: warning: Function parameter or member 'parent_resource_id' not described in 'devlink_resource_register'
net/core/devlink.c:5753: warning: Function parameter or member 'size_params' not described in 'devlink_resource_register'
net/core/devlink.c:5753: warning: Excess function parameter 'top_hierarchy' description in 'devlink_resource_register'
net/core/devlink.c:5753: warning: Excess function parameter 'reload_required' description in 'devlink_resource_register'
net/core/devlink.c:5753: warning: Excess function parameter 'parent_reosurce_id' description in 'devlink_resource_register'
net/core/devlink.c:6451: warning: Function parameter or member 'region' not described in 'devlink_region_snapshot_create'
net/core/devlink.c:6451: warning: Excess function parameter 'devlink_region' description in 'devlink_region_snapshot_create'

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-aquantia-minor-bug-fixes-after-static-analysis'

Igor Russkikh says:

====================
net: aquantia: minor bug fixes after static analysis

This patchset fixes minor errors and warnings found by smatch and kasan.

Extra patch is to replace AQ_HW_WAIT_FOR with readx_poll_timeout
to improve readability.

V2:
use readx_poll
resubmitted to net-next since the changeset became quite big.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: aquantia: use better wrappers for state registers

Replace some direct registers reads with better
online functions.

Signed-off-by: Nikita Danilov <nikita.danilov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: aquantia: replace AQ_HW_WAIT_FOR with readx_poll_timeout_atomic

David noticed the original define was hiding 'err' variable
reference. Thats confusing and counterintuitive.

Andrew noted the whole macro could be replaced with standard readx_poll
kernel macro. This makes code more readable.

Signed-off-by: Nikita Danilov <nikita.danilov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: aquantia: fixed instack structure overflow

This is a real stack undercorruption found by kasan build.

The issue did no harm normally because it only overflowed
2 bytes after `bitary` array which on most architectures
were mapped into `err` local.

Fixes: bab6de8fd180 ("net: ethernet: aquantia: Atlantic A0 and B0 specific functions.")
Signed-off-by: Nikita Danilov <nikita.danilov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: aquantia: fixed buffer overflow

The overflow is detected by smatch:

drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c: 175
aq_pci_func_free_irqs() error: buffer overflow 'self->aq_vec' 8 <= 31

In reality msix_entry_mask always restricts number of iterations.
Adding extra condition to make logic clear and smatch happy.

Signed-off-by: Nikita Danilov <nikita.danilov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: aquantia: added newline at end of file

drivers/net/ethernet/aquantia/atlantic/aq_nic.c: 991:1:
warning: no newline at end of file

Signed-off-by: Nikita Danilov <nikita.danilov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: aquantia: fixed memcpy size

Not careful array dereference caused analysis tools
to think there could be memory overflow.

There was actually no corruption because the array is
two dimensional.

drivers/net/ethernet/aquantia/atlantic/aq_ethtool.c: 140
aq_ethtool_get_strings() error:
memcpy() '*aq_ethtool_stat_names' too small (32 vs 704)

Signed-off-by: Nikita Danilov <nikita.danilov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Add ICMPv6 support when parse route ipproto

For ip rules, we need to use 'ipproto ipv6-icmp' to match ICMPv6 headers.
But for ip -6 route, currently we only support tcp, udp and icmp.

Add ICMPv6 support so we can match ipv6-icmp rules for route lookup.

v2: As David Ahern and Sabrina Dubroca suggested, Add an argument to
rtm_getroute_parse_ip_proto() to handle ICMP/ICMPv6 with different family.

Reported-by: Jianlin Shi <jishi@redhat.com>
Fixes: eacb9384a3fe ("ipv6: support sport, dport and ip_proto in RTM_GETROUTE")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

MIPS: eBPF: Fix icache flush end address

The MIPS eBPF JIT calls flush_icache_range() in order to ensure the
icache observes the code that we just wrote. Unfortunately it gets the
end address calculation wrong due to some bad pointer arithmetic.

The struct jit_ctx target field is of type pointer to u32, and as such
adding one to it will increment the address being pointed to by 4 bytes.
Therefore in order to find the address of the end of the code we simply
need to add the number of 4 byte instructions emitted, but we mistakenly
add the number of instructions multiplied by 4. This results in the call
to flush_icache_range() operating on a memory region 4x larger than
intended, which is always wasteful and can cause crashes if we overrun
into an unmapped page.

Fix this by correcting the pointer arithmetic to remove the bogus
multiplication, and use braces to remove the need for a set of brackets
whilst also making it obvious that the target field is a pointer.

Signed-off-by: Paul Burton <paul.burton@mips.com>
Fixes: b6bd53f9c4e8 ("MIPS: Add missing file for eBPF JIT.")
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org
Cc: linux-mips@vger.kernel.org
Cc: stable@vger.kernel.org # v4.13+
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

Merge branch 'nfp-control-processor-DMA-support-and-RJ45'

Jakub Kicinski says:

====================
nfp: control processor DMA support and RJ45

This series starts with adding support for reporting twisted pair
media type in ethtool.

Remaining patches add support for using DMA with the control/service
processor.  Currently we always copy the command data into card's
memory.  DMA support allows us to have the NSP read the data from
host memory by itself.  Unfortunately, the FW loading and flashing
cannot directly map the buffers for DMA because (a) the firmware
ABI returns const buffers, and (b) the buffers may be vmalloc()ed
in many mysterious/unmappable way.  So just bite the bullet -
allocate new host buffer for the command and copy.

As Dirk explains, the NSP now supports updating all FWs at once
which means the max flashing time grew significantly.  He bumps
the max wait to avoid timeouts.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: nsp: set higher timeout for flash bundle

The management firmware now supports being passed a bundle with
multiple components to be stored in flash at once. This makes it
easier to update all components to a known state with a single
user command, however, this also has the potential to increase
the time required to perform the update significantly.

The management firmware only updates the components out of a bundle
which are outdated, however, we need to make sure we can handle
the absolute worst case where a CPLD update can take a long time
to perform.

We set a very conservative total timeout of 900s which already
adds a contingency.

Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: nsp: allow the use of DMA buffer

Newer versions of NSP can access host memory. Simplest access
type requires all data to be in one contiguous area. Since we
don't have the guarantee on where callers of the NSP ABI will
allocate their buffers we allocate a bounce buffer and copy
the data in and out.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: nsp: move default buffer handling into its own function

DMA version of NSP communication is coming, move the code which
copies data into the NFP buffer into a separate function.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: nsp: use fractional size of the buffer

NSP expresses the buffer size in MB and 4 kB blocks. For small
buffers the kB part may make a difference, so count it in.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: report RJ45 connector in ethtool

Add support for reporting twisted pair port type.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

lan743x: Fix TX Stall Issue

It has been observed that tx queue stalls while downloading
from certain web sites (example www.speedtest.net)

The cause has been tracked down to a corner case where
dma descriptors where not setup properly. And there for a tx
completion interrupt was not signaled.

This fix corrects the problem by properly marking the end of
a multi descriptor transmission.

Fixes: 23f0703c125b ("lan743x: Add main source files for new lan743x driver")
Signed-off-by: Bryan Whitehead <Bryan.Whitehead@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: phylink: fix uninitialized variable in phylink_get_mac_state

When debugging an issue I found implausible values in state->pause.
Reason in that state->pause isn't initialized and later only single
bits are changed. Also the struct itself isn't initialized in
phylink_resolve(). So better initialize state->pause and other
not yet initialized fields.

v2:
- use right function name in subject
v3:
- initialize additional fields

Fixes: 9525ae83959b ("phylink: add phylink infrastructure")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: aquantia: regression on cpus with high cores: set mode with 8 queues

Recently the maximum number of queues was increased up to 8, but
NIC was not fully configured for 8 queues. In setups with more than 4 CPU
cores parts of TX traffic gets lost if the kernel routes it to queues 4th-8th.

This patch sets a tx hw traffic mode with 8 queues.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202651

Fixes: 71a963cfc50b ("net: aquantia: increase max number of hw queues")
Reported-by: Nicholas Johnson <nicholas.johnson@outlook.com.au>
Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: fixes for UDP GRO

The current implementation for UDP GRO tests is racy: the receiver
may flush the RX queue while the sending is still transmitting and
incorrectly report RX errors, with a wrong number of packet received.

Add explicit timeouts to the receiver for both connection activation
(first packet received for UDP) and reception completion, so that
in the above critical scenario the receiver will wait for the
transfer completion.

Fixes: 3327a9c46352 ("selftests: add functionals test for UDP GRO")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: marvell: neta: disable comphy when setting mode

The comphy driver for Armada 3700 by Miquèl Raynal (which is currently
in linux-next) does not actually set comphy mode when phy_set_mode_ext
is called. The mode is set at next call of phy_power_on.

Update the driver to semantics similar to mvpp2: helper
mvneta_comphy_init sets comphy mode and powers it on.
When mode is to be changed in mvneta_mac_config, first power the comphy
off, then call mvneta_comphy_init (which sets the mode to new one).

Only do this when new mode is different from old mode.

This should also work for Armada 38x, since in that comphy driver
methods power_on and power_off are unimplemented.

Signed-off-by: Marek Behún <marek.behun@nic.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'enetc-Add-mdio-support-and-device-tree-nodes'

Claudiu Manoil says:

====================
enetc: Add mdio support and device tree nodes

This is the missing part to enable PCI probing of the ENETC ethernet
ports on the LS1028A SoC and external traffic on the LS1028A RDB board.
It's one of the first items on the TODO list for the recently merged
ENETC ethernet driver.

v3: Add DT bindings doc for ENETC connections
v4: none
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: net: freescale: enetc: Add connection bindings for ENETC ethernet nodes

Define connection bindings (external PHY connections and internal links)
for the ENETC on-chip ethernet controllers.

Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

enetc: Add ENETC PF level external MDIO support

Each ENETC PF has its own MDIO interface, the corresponding
MDIO registers are mapped in the ENETC's Port register block.
The current patch adds a driver for these PF level MDIO buses,
so that each PF can manage directly its own external link.

Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

arm64: dts: fsl: ls1028a-rdb: Add ENETC external eth ports for the LS1028A RDB board

The LS1028A RDB board features an Atheros PHY connected over
SGMII to the ENETC PF0 (or Port0). ENETC Port1 (PF1) has no
external connection on this board, so it can be disabled for now.

Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

arm64: dts: fsl: ls1028a: Add PCI IERC node and ENETC endpoints

The LS1028A SoC features a PCI Integrated Endpoint Root Complex
(IERC) defining several integrated PCI devices, including the ENETC
ethernet controller integrated endpoints (IEPs). The IERC implements
ECAM (Enhanced Configuration Access Mechanism) to provide access
to the PCIe config space of the IEPs. This means the the IEPs
(including ENETC) do not support the standard PCIe BARs, instead
the Enhanced Allocation (EA) capability structures in the ECAM space
are used to fix the base addresses in the system, and the PCI
subsystem uses these structures for device enumeration and discovery.
The "ranges" entries contain basic information from these EA capabily
structures required by the kernel for device enumeration.

The current patch also enables the first 2 ENETC PFs (Physiscal
Functions) and the associated VFs (Virtual Functions), 2 VFs for
each PF. Each of these ENETC PFs has an external ethernet port
on the LS1028A SoC.

Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: drop refcount if bpf_map_new_fd() fails in map_create()

In bpf/syscall.c, map_create() first set map->usercnt to 1, a file
descriptor is supposed to return to userspace. When bpf_map_new_fd()
fails, drop the refcount.

Fixes: bd5f5f4ecb78 ("bpf: Add BPF_MAP_GET_FD_BY_ID")
Signed-off-by: Peng Sun <sironhide0null@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

netfilter: nf_tables: merge ipv4 and ipv6 nat chain types

Merge the ipv4 and ipv6 nat chain type. This is the last
missing piece which allows to provide inet family support
for nat in a follow patch.

The kconfig knobs for ipv4/ipv6 nat chain are removed, the
nat chain type will be built unconditionally if NFT_NAT
expression is enabled.

Before:
   text    data     bss     dec     hex filename
   1576     896       0    2472     9a8 nft_chain_nat_ipv4.ko
   1697     896       0    2593     a21 nft_chain_nat_ipv6.ko

After:
   text    data     bss     dec     hex filename
   1832     896       0    2728     aa8 nft_chain_nat.ko

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: nat: merge nft_masq protocol specific modules

The family specific masq modules are way too small to warrant
an extra module, just place all of them in nft_masq.

before:
  text    data     bss     dec     hex filename
   1001     832       0    1833     729 nft_masq.ko
    766     896       0    1662     67e nft_masq_ipv4.ko
    764     896       0    1660     67c nft_masq_ipv6.ko

after:
   2010     960       0    2970     b9a nft_masq.ko

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: nat: merge nft_redir protocol specific modules

before:
text    data     bss     dec     hex filename
990     832       0    1822     71e nft_redir.ko
697     896       0    1593     639 nft_redir_ipv4.ko
713     896       0    1609     649 nft_redir_ipv6.ko

after:
text    data     bss     dec     hex filename
1910     960       0    2870     b36 nft_redir.ko

size is reduced, all helpers from nft_redir.ko can be made static.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: xt_IDLETIMER: fix sysfs callback function type

Use struct device_attribute instead of struct idletimer_tg_attr, and
the correct callback function type to avoid indirect call mismatches
with Control Flow Integrity checking.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_conntrack: ensure that CONNTRACK_LOCKS is power of 2

CONNTRACK_LOCKS is divisor when computer array index, if it is power of
2, compiler will optimize modulo operation as bitwise AND, or else
modulo will lower performance.

Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: check the result of dereferencing base_chain->stats

Check the result of dereferencing base_chain->stats, instead of result
of this_cpu_ptr with NULL.

base_chain->stats maybe be changed to NULL when a chain is updated and a
new NULL counter can be attached.

And we do not need to check returning of this_cpu_ptr since
base_chain->stats is from percpu allocator if it is non-NULL,
this_cpu_ptr returns a valid value.

And fix two sparse error by replacing rcu_access_pointer and
rcu_dereference with READ_ONCE under rcu_read_lock.

Thanks for Eric's help to finish this patch.

Fixes: 009240940e84c1 ("netfilter: nf_tables: don't assume chain stats are set when jumplabel is set")
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: bridge: Don't sabotage nf_hook calls for an l3mdev slave

Followup to a173f066c7cf ("netfilter: bridge: Don't sabotage nf_hook
calls from an l3mdev"). Some packets (e.g., ndisc) do not have the skb
device flipped to the l3mdev (e.g., VRF) device. Update ip_sabotage_in
to not drop packets for slave devices too. Currently, neighbor
solicitation packets for 'dev -> bridge (addr) -> vrf' setups are getting
dropped. This patch enables IPv6 communications for bridges with an
address that are enslaved to a VRF.

Fixes: 73e20b761acf ("net: vrf: Add support for PREROUTING rules on vrf device")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

ipvs: get sctphdr by sctphoff in sctp_csum_check

sctp_csum_check() is called by sctp_s/dnat_handler() where it calls
skb_make_writable() to ensure sctphdr to be linearized.

So there's no need to get sctphdr by calling skb_header_pointer()
in sctp_csum_check().

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: convert the proto argument from u8 to u16

The proto in struct xt_match and struct xt_target is u16, when
calling xt_check_target/match, their proto argument is u8,
and will cause truncation, it is harmless to ip packet, since
ip proto is u8

if a etable's match/target has proto that is u16, will cause
the check failure.

and convert be16 to short in bridge/netfilter/ebtables.c

Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nft_tunnel: Add dst_cache support

The metadata_dst does not initialize the dst_cache field, this causes
problems to ip_md_tunnel_xmit() since it cannot use this cache, hence,
Triggering a route lookup for every packet.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: conntrack: tcp: only close if RST matches exact sequence

TCP resets cause instant transition from established to closed state
provided the reset is in-window.  Endpoints that implement RFC 5961
require resets to match the next expected sequence number.
RST segments that are in-window (but that do not match RCV.NXT) are
ignored, and a "challenge ACK" is sent back.

Main problem for conntrack is that its a middlebox, i.e.  whereas an end
host might have ACK'd SEQ (and would thus accept an RST with this
sequence number), conntrack might not have seen this ACK (yet).

Therefore we can't simply flag RSTs with non-exact match as invalid.

This updates RST processing as follows:

1. If the connection is in a state other than ESTABLISHED, nothing is
   changed, RST is subject to normal in-window check.

2. If the RSTs sequence number either matches exactly RCV.NXT,
   connection state moves to CLOSE.

3. The same applies if the RST sequence number aligns with a previous
   packet in the same direction.

In all other cases, the connection remains in ESTABLISHED state.
If the normal-in-window check passes, the timeout will be lowered
to that of CLOSE.

If the peer sends a challenge ack, connection timeout will be reset.

If the challenge ACK triggers another RST (RST was valid after all),
this 2nd RST will match expected sequence and conntrack state changes to
CLOSE.

If no challenge ACK is received, the connection will time out after
CLOSE seconds (10 seconds by default), just like without this patch.

Packetdrill test case:

0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
0.000 bind(3, ..., ...) = 0
0.000 listen(3, 1) = 0

0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7>
0.100 > S. 0:0(0) ack 1 win 64240 <mss 1460,nop,nop,sackOK,nop,wscale 7>
0.200 < . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4

// Receive a segment.
0.210 < P. 1:1001(1000) ack 1 win 46
0.210 > . 1:1(0) ack 1001

// Application writes 1000 bytes.
0.250 write(4, ..., 1000) = 1000
0.250 > P. 1:1001(1000) ack 1001

// First reset, old sequence. Conntrack (correctly) considers this
// invalid due to failed window validation (regardless of this patch).
0.260 < R  2:2(0) ack 1001 win 260

// 2nd reset, but too far ahead sequence.  Same: correctly handled
// as invalid.
0.270 < R 99990001:99990001(0) ack 1001 win 260

// in-window, but not exact sequence.
// Current Linux kernels might reply with a challenge ack, and do not
// remove connection.
// Without this patch, conntrack state moves to CLOSE.
// With patch, timeout is lowered like CLOSE, but connection stays
// in ESTABLISHED state.
0.280 < R 1010:1010(0) ack 1001 win 260

// Expect challenge ACK
0.281 > . 1001:1001(0) ack 1001 win 501

// With or without this patch, RST will cause connection
// to move to CLOSE (sequence number matches)
// 0.282 < R 1001:1001(0) ack 1001 win 260

// ACK
0.300 < . 1001:1001(0) ack 1001 win 257

// more data could be exchanged here, connection
// is still established

// Client closes the connection.
0.610 < F. 1001:1001(0) ack 1001 win 260
0.650 > . 1001:1001(0) ack 1002

// Close the connection without reading outstanding data
0.700 close(4) = 0

// so one more reset.  Will be deemed acceptable with patch as well:
// connection is already closing.
0.701 > R. 1001:1001(0) ack 1002 win 501
// End packetdrill test case.

With patch, this generates following conntrack events:
   [NEW] 120 SYN_SENT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [UNREPLIED]
[UPDATE] 60 SYN_RECV src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80
[UPDATE] 432000 ESTABLISHED src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
[UPDATE] 120 FIN_WAIT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
[UPDATE] 60 CLOSE_WAIT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
[UPDATE] 10 CLOSE src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]

Without patch, first RST moves connection to close, whereas socket state
does not change until FIN is received.
   [NEW] 120 SYN_SENT src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [UNREPLIED]
[UPDATE] 60 SYN_RECV src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80
[UPDATE] 432000 ESTABLISHED src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [ASSURED]
[UPDATE] 10 CLOSE src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [ASSURED]

Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

ipvs: change some data types from int to bool

Change the data type of the following variables from int to bool
across ipvs code:

  - found
  - loop
  - need_full_dest
  - need_full_svc
  - payload_csum

Also change the following functions to use bool full_entry param
instead of int:

  - ip_vs_genl_parse_dest()
  - ip_vs_genl_parse_service()

This patch does not change any functionality but makes the source
code slightly easier to read.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

net: dsa: mv88e6xxx: power serdes on/off for 10G interfaces on 6390X

Upon setting the cmode on 6390 and 6390X, the associated serdes
interfaces must be powered off/on.

Both 6390X and 6390 share code to do so, but it currently uses the 6390
specific helper mv88e6390_serdes_power() to disable and enable the
serdes interface.

This call will fail silently on 6390X when trying so set a 10G interface
such as XAUI or RXAUI, since mv88e6390_serdes_power() internally grabs
the lane number based on modes supported by the 6390, and returns 0 when
getting -ENODEV as a lane number.

Using mv88e6390x_serdes_power() should be safe here, since we explicitly
rule-out all ports but the 9 and 10, and because modes supported by 6390
ports 9 and 10 are a subset of those supported on 6390X.

This was tested on 6390X using RXAUI mode.

Fixes: 364e9d7776a3 ("net: dsa: mv88e6xxx: Power on/off SERDES on cmode change")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: rtnetlink: use internal netns switch for ip commands

'ip' can switch network namespaces internally and then run a given
command relative to that namespace without the need to fork and exec
another ip instance. Update all references of the form:
ip netns exec "$testns" ip ...
to
ip -netns "$testns" ...

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>