OSDN Git Service

tomoyo/tomoyo-test1.git
4 years agor8169: load firmware for RTL8168fp/RTL8117
Heiner Kallweit [Fri, 15 Nov 2019 21:38:25 +0000 (22:38 +0100)]
r8169: load firmware for RTL8168fp/RTL8117

Load Realtek-provided firmware for RTL8168fp/RTL8117. Unlike the
firmware for other chip versions which is for the PHY, firmware for
RTL8168fp/RTL8117 is for the MAC.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agor8169: improve conditional firmware loading for RTL8168d
Heiner Kallweit [Fri, 15 Nov 2019 20:35:22 +0000 (21:35 +0100)]
r8169: improve conditional firmware loading for RTL8168d

Using constant MII_EXPANSION is misleading here because register 0x06
has a different meaning on page 0x0005. Here a proprietary PHY
parameter is read by writing the parameter id to register 0x05 on page
0x0005, followed by reading the parameter value from register 0x06.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: phylink: update to use phy_support_asym_pause()
Russell King [Fri, 15 Nov 2019 20:05:45 +0000 (20:05 +0000)]
net: phylink: update to use phy_support_asym_pause()

Use phy_support_asym_pause() rather than open-coding it.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge tag 'wireless-drivers-next-2019-11-15' of git://git.kernel.org/pub/scm/linux...
David S. Miller [Sat, 16 Nov 2019 21:06:07 +0000 (13:06 -0800)]
Merge tag 'wireless-drivers-next-2019-11-15' of git://git./linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
wireless-drivers-next patches for v5.5

Second set of patches for v5.5. Nothing special this time, smaller
features to various drivers and of course fixes all over.

Major changes:

iwlwifi

* update scan FW API

* bump the supported FW API version

* add debug dump collection on assert in WoWLAN

* enable adaptive dwell on P2P interfaces

ath10k

* request for PM_QOS_CPU_DMA_LATENCY to improve firmware initialisation time

qtnfmac

* add support for getting/setting transmit power

* handle MIC failure event from firmware

rtl8xxxu

* add support for Edimax EW-7611ULB

wil6210

* add SPDX license identifiers
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agobonding: symmetric ICMP transmit
Matteo Croce [Fri, 15 Nov 2019 11:10:37 +0000 (12:10 +0100)]
bonding: symmetric ICMP transmit

A bonding with layer2+3 or layer3+4 hashing uses the IP addresses and the ports
to balance packets between slaves. With some network errors, we receive an ICMP
error packet by the remote host or a router. If sent by a router, the source IP
can differ from the remote host one. Additionally the ICMP protocol has no port
numbers, so a layer3+4 bonding will get a different hash than the previous one.
These two conditions could let the packet go through a different interface than
the other packets of the same flow:

    # tcpdump -qltnni veth0 |sed 's/^/0: /' &
    # tcpdump -qltnni veth1 |sed 's/^/1: /' &
    # hping3 -2 192.168.0.2 -p 9
    0: IP 192.168.0.1.2251 > 192.168.0.2.9: UDP, length 0
    1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
    1: IP 192.168.0.1.2252 > 192.168.0.2.9: UDP, length 0
    1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
    1: IP 192.168.0.1.2253 > 192.168.0.2.9: UDP, length 0
    1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
    0: IP 192.168.0.1.2254 > 192.168.0.2.9: UDP, length 0
    1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36

An ICMP error packet contains the header of the packet which caused the network
error, so inspect it and match the flow against it, so we can send the ICMP via
the same interface of the previous packet in the flow.
Move the IP and port dissect code into a generic function bond_flow_ip() and if
we are dissecting an ICMP error packet, call it again with the adjusted offset.

    # hping3 -2 192.168.0.2 -p 9
    1: IP 192.168.0.1.1224 > 192.168.0.2.9: UDP, length 0
    1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
    1: IP 192.168.0.1.1225 > 192.168.0.2.9: UDP, length 0
    1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
    0: IP 192.168.0.1.1226 > 192.168.0.2.9: UDP, length 0
    0: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
    0: IP 192.168.0.1.1227 > 192.168.0.2.9: UDP, length 0
    0: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: mscc: ocelot: omit error check from of_get_phy_mode
Horatiu Vultur [Fri, 15 Nov 2019 10:11:15 +0000 (11:11 +0100)]
net: mscc: ocelot: omit error check from of_get_phy_mode

The commit 0c65b2b90d13c ("net: of_get_phy_mode: Change API to solve
int/unit warnings") updated the function of_get_phy_mode declaration.
Now it returns an error code and in case the node doesn't contain the
property 'phy-mode' or 'phy-connection-type' it returns -EINVAL and would
set the phy_interface_t to PHY_INTERFACE_MODE_NA.

Ocelot VSC7514 has 4 internal phys which have the phy interface
PHY_INTERFACE_MODE_NA. So because of_get_phy_mode would assign
PHY_INTERFACE_MODE_NA to phy_mode when there is an error, there is no need
to add the error check.

Updates for v2:
 - drop error check because of_get_phy_mode already assigns phy_interface
   to PHY_INTERFACE_MODE in case of error.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: core: allow fast GRO for skbs with Ethernet header in head
Alexander Lobakin [Fri, 15 Nov 2019 09:11:35 +0000 (12:11 +0300)]
net: core: allow fast GRO for skbs with Ethernet header in head

Commit 78d3fd0b7de8 ("gro: Only use skb_gro_header for completely
non-linear packets") back in May'09 (v2.6.31-rc1) has changed the
original condition '!skb_headlen(skb)' to
'skb->mac_header == skb->tail' in gro_reset_offset() saying: "Since
the drivers that need this optimisation all provide completely
non-linear packets" (note that this condition has become the current
'skb_mac_header(skb) == skb_tail_pointer(skb)' later with commmit
ced14f6804a9 ("net: Correct comparisons and calculations using
skb->tail and skb-transport_header") without any functional changes).

For now, we have the following rough statistics for v5.4-rc7:
1) napi_gro_frags: 14
2) napi_gro_receive with skb->head containing (most of) payload: 83
3) napi_gro_receive with skb->head containing all the headers: 20
4) napi_gro_receive with skb->head containing only Ethernet header: 2

With the current condition, fast GRO with the usage of
NAPI_GRO_CB(skb)->frag0 is available only in the [1] case.
Packets pushed by [2] and [3] go through the 'slow' path, but
it's not a problem for them as they already contain all the needed
headers in skb->head, so pskb_may_pull() only moves skb->data.

The layout of skbs in the fourth [4] case at the moment of
dev_gro_receive() is identical to skbs that have come through [1],
as napi_frags_skb() pulls Ethernet header to skb->head. The only
difference is that the mentioned condition is always false for them,
because skb_put() and friends irreversibly alter the tail pointer.
They also go through the 'slow' path, but now every single
pskb_may_pull() in every single .gro_receive() will call the *really*
slow __pskb_pull_tail() to pull headers to head. This significantly
decreases the overall performance for no visible reasons.

The only two users of method [4] is:
* drivers/staging/qlge
* drivers/net/wireless/iwlwifi (all three variants: dvm, mvm, mvm-mq)

Note that in case with wireless drivers we can't use [1]
(napi_gro_frags()) at least for now and mac80211 stack always
performs pushes and pulls anyways, so performance hit is inavoidable.

At the moment of v2.6.31 the mentioned change was necessary (that's
why I don't add the "Fixes:" tag), but it became obsolete since
skb_gro_mac_header() has gone in commit a50e233c50db ("net-gro:
restore frag0 optimization"), so we can simply revert the condition
in gro_reset_offset() to allow skbs from [4] go through the 'fast'
path just like in case [1].

This was tested on a 600 MHz MIPS CPU and a custom driver and this
patch gave boosts up to 40 Mbps to method [4] in both directions
comparing to net-next, which made overall performance relatively
close to [1] (without it, [4] is the slowest).

v2:
- Add more references and explanations to commit message
- Fix some typos ibid
- No functional changes

Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'bnx2x-Remove-function-casts'
David S. Miller [Sat, 16 Nov 2019 20:50:57 +0000 (12:50 -0800)]
Merge branch 'bnx2x-Remove-function-casts'

Kees Cook says:

====================
bnx2x: Remove function casts

In order to make the entire kernel usable under Clang's Control Flow
Integrity protections, function prototype casts need to be avoided
because this will trip CFI checks at runtime (i.e. a mismatch between
the caller's expected function prototype and the destination function's
prototype). Many of these cases can be found with -Wcast-function-type,
which found that bnx2x had a bunch of needless (or at least confusing)
function casts. This series removes them all.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agobnx2x: Remove hw_reset_t function casts
Kees Cook [Fri, 15 Nov 2019 05:07:15 +0000 (21:07 -0800)]
bnx2x: Remove hw_reset_t function casts

All .rw_reset callbacks except bnx2x_84833_hw_reset_phy() use a
void return type. No callers of .hw_reset check a return value and
bnx2x_84833_hw_reset_phy() unconditionally returns 0. Remove all
hw_reset_t casts and fix the return type to void.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agobnx2x: Remove format_fw_ver_t function casts
Kees Cook [Fri, 15 Nov 2019 05:07:14 +0000 (21:07 -0800)]
bnx2x: Remove format_fw_ver_t function casts

The return values for format_fw_ver_t callbacks are supposed to be
"int", not "u8". Ultimately, the top-level caller doesn't actually check
the return value at all, but just clean this all up anyway and fix the
prototypes so that casts are no longer needed.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agobnx2x: Remove config_init_t function casts
Kees Cook [Fri, 15 Nov 2019 05:07:13 +0000 (21:07 -0800)]
bnx2x: Remove config_init_t function casts

No callers of .config_init check return values. Remove the casting and
change all callbacks to have the correct function prototype.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agobnx2x: Remove read_status_t function casts
Kees Cook [Fri, 15 Nov 2019 05:07:12 +0000 (21:07 -0800)]
bnx2x: Remove read_status_t function casts

The function casts for .read_status callbacks end up casting some int
return values to u8. This seems to be bug-prone (-EINVAL being returned
into something that appears to be true/false), but fixing the function
prototypes doesn't change the existing behavior. Fix the return values
to remove the casts.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agobnx2x: Drop redundant callback function casts
Kees Cook [Fri, 15 Nov 2019 05:07:11 +0000 (21:07 -0800)]
bnx2x: Drop redundant callback function casts

NULL is already "void *" so it will auto-cast in assignments and
initializers. Additionally, all the callbacks for .link_reset,
.config_loopback, .set_link_led, and .phy_specific_func are already
correct. No casting is needed for these, so remove them.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoenetc: update TSN Qbv PSPEED set according to adjust link speed
Po Liu [Fri, 15 Nov 2019 03:33:41 +0000 (03:33 +0000)]
enetc: update TSN Qbv PSPEED set according to adjust link speed

ENETC has a register PSPEED to indicate the link speed of hardware.
It is need to update accordingly. PSPEED field needs to be updated
with the port speed for QBV scheduling purposes. Or else there is
chance for gate slot not free by frame taking the MAC if PSPEED and
phy speed not match. So update PSPEED when link adjust. This is
implement by the adjust_link.

Signed-off-by: Po Liu <Po.Liu@nxp.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoenetc: Configure the Time-Aware Scheduler via tc-taprio offload
Po Liu [Fri, 15 Nov 2019 03:33:33 +0000 (03:33 +0000)]
enetc: Configure the Time-Aware Scheduler via tc-taprio offload

ENETC supports in hardware for time-based egress shaping according
to IEEE 802.1Qbv. This patch implement the Qbv enablement by the
hardware offload method qdisc tc-taprio method.
Also update cbdr writeback to up level since control bd ring may
writeback data to control bd ring.

Signed-off-by: Po Liu <Po.Liu@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agopage_pool: do not release pool until inflight == 0.
Jonathan Lemon [Thu, 14 Nov 2019 22:13:00 +0000 (14:13 -0800)]
page_pool: do not release pool until inflight == 0.

The page pool keeps track of the number of pages in flight, and
it isn't safe to remove the pool until all pages are returned.

Disallow removing the pool until all pages are back, so the pool
is always available for page producers.

Make the page pool responsible for its own delayed destruction
instead of relying on XDP, so the page pool can be used without
the xdp memory model.

When all pages are returned, free the pool and notify xdp if the
pool is registered with the xdp memory system.  Have the callback
perform a table walk since some drivers (cpsw) may share the pool
among multiple xdp_rxq_info.

Note that the increment of pages_state_release_cnt may result in
inflight == 0, resulting in the pool being released.

Fixes: d956a048cd3f ("xdp: force mem allocator removal and periodic warning")
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'smc-last-part-of-termination-improvements'
David S. Miller [Sat, 16 Nov 2019 20:26:49 +0000 (12:26 -0800)]
Merge branch 'smc-last-part-of-termination-improvements'

Karsten Graul says:

====================
last part of termination improvements

Patches 1 and 2 finish the set of termination patches, introducing
a reboot handler that terminates all link groups. Patch 3 adds an
rcu_barrier before the module is unloaded, and patch 4 is cleanup.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/smc: remove unused constant
Ursula Braun [Sat, 16 Nov 2019 16:47:32 +0000 (17:47 +0100)]
net/smc: remove unused constant

Constant SMC_CLOSE_WAIT_LISTEN_CLCSOCK_TIME is defined, but since
commit 3d502067599f ("net/smc: simplify wait when closing listen socket")
no longer used. Remove it.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/smc: use rcu_barrier() on module unload
Ursula Braun [Sat, 16 Nov 2019 16:47:31 +0000 (17:47 +0100)]
net/smc: use rcu_barrier() on module unload

Add rcu_barrier() to make sure no RCU readers or callbacks are
pending when the module is unloaded.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/smc: guarantee removal of link groups in reboot
Ursula Braun [Sat, 16 Nov 2019 16:47:30 +0000 (17:47 +0100)]
net/smc: guarantee removal of link groups in reboot

When rebooting it should be guaranteed all link groups are cleaned
up and freed.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/smc: introduce bookkeeping of SMCR link groups
Ursula Braun [Sat, 16 Nov 2019 16:47:29 +0000 (17:47 +0100)]
net/smc: introduce bookkeeping of SMCR link groups

If the smc module is unloaded return control from exit routine only,
if all link groups are freed.
If an IB device is thrown away return control from device removal only,
if all link groups belonging to this device are freed.
Counters for the total number of SMCR link groups and for the total
number of SMCR links per IB device are introduced. smc module unloading
continues only if the total number of SMCR link groups is zero. IB device
removal continues only it the total number of SMCR links per IB device
has decreased to zero.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agotun: fix data-race in gro_normal_list()
Petar Penkov [Thu, 14 Nov 2019 17:52:09 +0000 (09:52 -0800)]
tun: fix data-race in gro_normal_list()

There is a race in the TUN driver between napi_busy_loop and
napi_gro_frags. This commit resolves the race by adding the NAPI struct
via netif_tx_napi_add, instead of netif_napi_add, which disables polling
for the NAPI struct.

KCSAN reported:
BUG: KCSAN: data-race in gro_normal_list.part.0 / napi_busy_loop

write to 0xffff8880b5d474b0 of 4 bytes by task 11205 on cpu 0:
 gro_normal_list.part.0+0x77/0xb0 net/core/dev.c:5682
 gro_normal_list net/core/dev.c:5678 [inline]
 gro_normal_one net/core/dev.c:5692 [inline]
 napi_frags_finish net/core/dev.c:5705 [inline]
 napi_gro_frags+0x625/0x770 net/core/dev.c:5778
 tun_get_user+0x2150/0x26a0 drivers/net/tun.c:1976
 tun_chr_write_iter+0x79/0xd0 drivers/net/tun.c:2022
 call_write_iter include/linux/fs.h:1895 [inline]
 do_iter_readv_writev+0x487/0x5b0 fs/read_write.c:693
 do_iter_write fs/read_write.c:970 [inline]
 do_iter_write+0x13b/0x3c0 fs/read_write.c:951
 vfs_writev+0x118/0x1c0 fs/read_write.c:1015
 do_writev+0xe3/0x250 fs/read_write.c:1058
 __do_sys_writev fs/read_write.c:1131 [inline]
 __se_sys_writev fs/read_write.c:1128 [inline]
 __x64_sys_writev+0x4e/0x60 fs/read_write.c:1128
 do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

read to 0xffff8880b5d474b0 of 4 bytes by task 11168 on cpu 1:
 gro_normal_list net/core/dev.c:5678 [inline]
 napi_busy_loop+0xda/0x4f0 net/core/dev.c:6126
 sk_busy_loop include/net/busy_poll.h:108 [inline]
 __skb_recv_udp+0x4ad/0x560 net/ipv4/udp.c:1689
 udpv6_recvmsg+0x29e/0xe90 net/ipv6/udp.c:288
 inet6_recvmsg+0xbb/0x240 net/ipv6/af_inet6.c:592
 sock_recvmsg_nosec net/socket.c:871 [inline]
 sock_recvmsg net/socket.c:889 [inline]
 sock_recvmsg+0x92/0xb0 net/socket.c:885
 sock_read_iter+0x15f/0x1e0 net/socket.c:967
 call_read_iter include/linux/fs.h:1889 [inline]
 new_sync_read+0x389/0x4f0 fs/read_write.c:414
 __vfs_read+0xb1/0xc0 fs/read_write.c:427
 vfs_read fs/read_write.c:461 [inline]
 vfs_read+0x143/0x2c0 fs/read_write.c:446
 ksys_read+0xd5/0x1b0 fs/read_write.c:587
 __do_sys_read fs/read_write.c:597 [inline]
 __se_sys_read fs/read_write.c:595 [inline]
 __x64_sys_read+0x4c/0x60 fs/read_write.c:595
 do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 11168 Comm: syz-executor.0 Not tainted 5.4.0-rc6+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Fixes: 943170998b20 ("tun: enable NAPI for TUN/TAP driver")
Signed-off-by: Petar Penkov <ppenkov@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoselftests: net: tcp_mmap should create detached threads
Eric Dumazet [Thu, 14 Nov 2019 16:43:27 +0000 (08:43 -0800)]
selftests: net: tcp_mmap should create detached threads

Since we do not plan using pthread_join() in the server do_accept()
loop, we better create detached threads, or risk increasing memory
footprint over time.

Fixes: 192dc405f308 ("selftests: net: add tcp_mmap program")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: openvswitch: don't call pad_packet if not necessary
Tonghao Zhang [Thu, 14 Nov 2019 15:51:08 +0000 (23:51 +0800)]
net: openvswitch: don't call pad_packet if not necessary

The nla_put_u16/nla_put_u32 makes sure that
*attrlen is align. The call tree is that:

nla_put_u16/nla_put_u32
  -> nla_put attrlen = sizeof(u16) or sizeof(u32)
  -> __nla_put attrlen
  -> __nla_reserve attrlen
  -> skb_put(skb, nla_total_size(attrlen))

nla_total_size returns the total length of attribute
including padding.

Cc: Joe Stringer <joe@ovn.org>
Cc: William Tu <u9012063@gmail.com>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'DSA-driver-for-Vitesse-Felix-switch'
David S. Miller [Fri, 15 Nov 2019 20:32:17 +0000 (12:32 -0800)]
Merge branch 'DSA-driver-for-Vitesse-Felix-switch'

Vladimir Oltean says:

====================
DSA driver for Vitesse Felix switch

This series builds upon the previous "Accomodate DSA front-end into
Ocelot" topic and does the following:

- Reworks the Ocelot (VSC7514) driver to support one more switching core
  (VSC9959), used in NPI mode. Some code which was thought to be
  SoC-specific (ocelot_board.c) wasn't, and vice versa, so it is being
  accordingly moved.
- Exports ocelot driver structures and functions to include/soc/mscc.
- Adds a DSA ocelot front-end for VSC9959, which is a PCI device and
  uses the exported ocelot functionality for hardware configuration.
- Adds a tagger driver for the Vitesse injection/extraction DSA headers.
  This is known to be compatible with at least Ocelot and Felix.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: dsa: ocelot: add driver for Felix switch family
Vladimir Oltean [Thu, 14 Nov 2019 15:03:30 +0000 (17:03 +0200)]
net: dsa: ocelot: add driver for Felix switch family

This supports an Ethernet switching core from Vitesse / Microsemi /
Microchip (VSC9959) which is part of the Ocelot family (a brand name),
and whose code name is Felix. The switch can be (and is) integrated on
different SoCs as a PCIe endpoint device.

The functionality is provided by the core of the Ocelot switch driver
(drivers/net/ethernet/mscc). In this regard, the current driver is an
instance of Microsemi's Ocelot core driver, with a DSA front-end. It
inherits its name from VSC9959's code name, to distinguish itself from
the switchdev ocelot driver.

The patch adds the logic for probing a PCI device and defines the
register map for the VSC9959 switch core, since it has some differences
in register addresses and bitfield mappings compared to the other Ocelot
switches (VSC7511, VSC7512, VSC7513, VSC7514).

The Felix driver declares the register map as part of the "instance
table". Currently the VSC9959 inside NXP LS1028A is the only instance,
but presumably it can support other switches in the Ocelot family, when
used in DSA mode (Linux running on the external CPU, and not on the
embedded MIPS).

In a few cases, some h/w operations have to be done differently on
VSC9959 due to missing bitfields.  This is the case for the switch core
reset and init.  Because for this operation Ocelot uses some bits that
are not present on Felix, the latter has to use a register from the
global registers block (GCB) instead.

Although it is a PCI driver, it relies on DT bindings for compatibility
with DSA (CPU port link, PHY library). It does not have any custom
device tree bindings, since we would like to minimize its dependency on
device tree though.

Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: dsa: ocelot: add tagger for Ocelot/Felix switches
Vladimir Oltean [Thu, 14 Nov 2019 15:03:29 +0000 (17:03 +0200)]
net: dsa: ocelot: add tagger for Ocelot/Felix switches

While it is entirely possible that this tagger format is in fact more
generic than just these 2 switch families, I don't have that knowledge.
The Seville switch in NXP T1040 has a similar frame format, but there
are enough differences (e.g. DEST field starts at bit 57 instead of 56)
that calling this file tag_vitesse.c is a bit of a stretch at the
moment. The frame format has been listed in a comment so that people who
add support for further Vitesse switches can rework this tagger while
keeping compatibility with Felix.

The "ocelot" name was chosen instead of "felix" because even the Ocelot
switch can act as a DSA device when it is used in NPI mode, and the Felix
tagger format is almost identical. Currently it is only used for the
Felix switch embedded in the NXP LS1028A chip.

The ABI for this tagger should be considered "not stable" at the moment.
The DSA tag is always placed before the Ethernet header and therefore,
we are using the long prefix for RX tags to avoid putting the DSA master
port in promiscuous mode. Once there will be an API in DSA for drivers
to request DSA masters to be in promiscuous mode unconditionally, we
will switch to the "no prefix" extraction frame header, which will save
16 padding bytes for each RX frame.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: mscc: ocelot: publish ocelot_sys.h to include/soc/mscc
Vladimir Oltean [Thu, 14 Nov 2019 15:03:28 +0000 (17:03 +0200)]
net: mscc: ocelot: publish ocelot_sys.h to include/soc/mscc

The Felix DSA driver needs to write to SYS_RAM_INIT_RAM_INIT for its own
chip initialization process.

Also update the MAINTAINERS file such that the headers exported by the
ocelot driver are under the same maintainers' umbrella as the driver
itself.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: mscc: ocelot: publish structure definitions to include/soc/mscc/ocelot.h
Vladimir Oltean [Thu, 14 Nov 2019 15:03:27 +0000 (17:03 +0200)]
net: mscc: ocelot: publish structure definitions to include/soc/mscc/ocelot.h

We will be registering another switch driver based on ocelot, which
lives under drivers/net/dsa.

Make sure the Felix DSA front-end has the necessary abstractions to
implement a new Ocelot driver instantiation. This includes the function
prototypes for implementing DSA callbacks.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: mscc: ocelot: separate the implementation of switch reset
Vladimir Oltean [Thu, 14 Nov 2019 15:03:26 +0000 (17:03 +0200)]
net: mscc: ocelot: separate the implementation of switch reset

The Felix switch has a different reset procedure, so a function pointer
needs to be created and added to the ocelot_ops structure.

The reset procedure has been moved into ocelot_init.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: mscc: ocelot: adjust MTU on the CPU port in NPI mode
Vladimir Oltean [Thu, 14 Nov 2019 15:03:25 +0000 (17:03 +0200)]
net: mscc: ocelot: adjust MTU on the CPU port in NPI mode

When using the NPI port, the DSA tag is passed through Ethernet, so the
switch's MAC needs to accept it as it comes from the DSA master. Increase
the MTU on the external CPU port to account for the length of the
injection header.

Without this patch, MTU-sized frames are dropped by the switch's CPU
port on xmit, which is especially obvious in TCP sessions.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: mscc: ocelot: export a constant for the tag length in bytes
Vladimir Oltean [Thu, 14 Nov 2019 15:03:24 +0000 (17:03 +0200)]
net: mscc: ocelot: export a constant for the tag length in bytes

This constant will be used in a future patch to increase the MTU on NPI
ports, and will also be used in the tagger driver for Felix.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: mscc: ocelot: create a helper for changing the port MTU
Vladimir Oltean [Thu, 14 Nov 2019 15:03:23 +0000 (17:03 +0200)]
net: mscc: ocelot: create a helper for changing the port MTU

Since in an NPI/DSA setup, not all ports will have the same MTU, we need
to make sure the watermarks for pause frames and/or tail dropping logic
that existed in the driver is still coherent for the new MTU values.

We need to do this because the NPI (aka external CPU) port needs an
increased MTU for the DSA tag. This will be done in a future patch.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: mscc: ocelot: move invariant configs out of adjust_link
Vladimir Oltean [Thu, 14 Nov 2019 15:03:22 +0000 (17:03 +0200)]
net: mscc: ocelot: move invariant configs out of adjust_link

It doesn't make sense to rewrite all these registers every time the PHY
library notifies us about a link state change.

In a future patch we will customize the MTU for the CPU port, and since
the MTU was previously configured from adjust_link, if we don't make
this change, its value would have got overridden.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: mscc: ocelot: filter out ocelot SoC specific PCS config from common path
Claudiu Manoil [Thu, 14 Nov 2019 15:03:21 +0000 (17:03 +0200)]
net: mscc: ocelot: filter out ocelot SoC specific PCS config from common path

The adjust_link routine should be generic enough to be (re)used by
any SoC that integrates a switch core compatible with the Ocelot
core switch driver.  Currently all configurations are generic except
for the PCS settings that are SoC specific.  Move these out to the
Ocelot SoC/board instance.

Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: mscc: ocelot: move resource ioremap and regmap init to common code
Claudiu Manoil [Thu, 14 Nov 2019 15:03:20 +0000 (17:03 +0200)]
net: mscc: ocelot: move resource ioremap and regmap init to common code

Let's make this ioremap and regmap init code common.  It should not
be platform dependent as it should be usable by PCI devices too.
Use better names where necessary to avoid clashes.

Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'net-smc-improve-termination-handling-part-3'
David S. Miller [Fri, 15 Nov 2019 20:28:28 +0000 (12:28 -0800)]
Merge branch 'net-smc-improve-termination-handling-part-3'

Karsten Graul says:

====================
net/smc: improve termination handling (part 3)

Part 3 of the SMC termination patches improves the link group
termination processing and introduces the ability to immediately
terminate a link group.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/smc: immediate termination for SMCR link groups
Ursula Braun [Thu, 14 Nov 2019 12:02:47 +0000 (13:02 +0100)]
net/smc: immediate termination for SMCR link groups

If the SMC module is unloaded or an IB device is thrown away, the
immediate link group freeing introduced for SMCD is exploited for SMCR
as well. That means SMCR-specifics are added to smc_conn_kill().

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/smc: wait for tx completions before link freeing
Ursula Braun [Thu, 14 Nov 2019 12:02:46 +0000 (13:02 +0100)]
net/smc: wait for tx completions before link freeing

Make sure all pending work requests are completed before freeing
a link.
Dismiss tx pending slots already when terminating a link group to
exploit termination shortcut in tx completion queue handler.

And kill the completion queue tasklets after destroy of the
completion queues, otherwise there is a time window for another
tasklet schedule of an already killed tasklet.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/smc: abnormal termination without orderly flag
Ursula Braun [Thu, 14 Nov 2019 12:02:45 +0000 (13:02 +0100)]
net/smc: abnormal termination without orderly flag

For abnormal termination issue an LLC DELETE_LINK without the
orderly flag.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/smc: no WR buffer wait for terminating link group
Ursula Braun [Thu, 14 Nov 2019 12:02:44 +0000 (13:02 +0100)]
net/smc: no WR buffer wait for terminating link group

Avoid waiting for a free work request buffer, if the link group
is already terminating.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/smc: introduce bookkeeping of SMCD link groups
Ursula Braun [Thu, 14 Nov 2019 12:02:43 +0000 (13:02 +0100)]
net/smc: introduce bookkeeping of SMCD link groups

If the ism module is unloaded return control from exit routine only,
if all link groups are freed.
If an IB device is thrown away return control from device removal only,
if all link groups belonging to this device are freed.
A counters for the total number of SMCD link groups per ISM device is
introduced. ism module unloading continues only if the total number of
SMCD link groups for all ISM devices is zero. ISM device
removal continues only it the total number of SMCD link groups per ISM
device has decreased to zero.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/smc: abnormal termination of SMCD link groups
Ursula Braun [Thu, 14 Nov 2019 12:02:42 +0000 (13:02 +0100)]
net/smc: abnormal termination of SMCD link groups

A final cleanup due to SMCD device removal means immediate freeing
of all link groups belonging to this device in interrupt context.

This patch introduces a separate SMCD link group termination routine,
which terminates all link groups of an SMCD device.

This new routine smcd_terminate_all ()is reused if the smc module is
unloaded.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/smc: immediate termination for SMCD link groups
Ursula Braun [Thu, 14 Nov 2019 12:02:41 +0000 (13:02 +0100)]
net/smc: immediate termination for SMCD link groups

SMCD link group termination is called when peer signals its shutdown
of its corresponding link group. For regular shutdowns no connections
exist anymore. For abnormal shutdowns connections must be killed and
their DMBs must be unregistered immediately. That means the SMCR method
to delay the link group freeing several seconds does not fit.

This patch adds immediate termination of a link group and its SMCD
connections and makes sure all SMCD link group related cleanup steps
are finished.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/smc: fix final cleanup sequence for SMCD devices
Ursula Braun [Thu, 14 Nov 2019 12:02:40 +0000 (13:02 +0100)]
net/smc: fix final cleanup sequence for SMCD devices

If peer announces shutdown, use the link group terminate worker for
local cleanup of link groups and connections to terminate link group
in proper context.

Make sure link groups are cleaned up first before destroying the
event queue of the SMCD device, because link group cleanup may
raise events.

Send signal shutdown only if peer has not done it already.

Send socket abort or close only, if peer has not already announced
shutdown.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'net-stmmac-CPU-Performance-Improvements'
David S. Miller [Fri, 15 Nov 2019 20:25:42 +0000 (12:25 -0800)]
Merge branch 'net-stmmac-CPU-Performance-Improvements'

Jose Abreu says:

====================
net: stmmac: CPU Performance Improvements

CPU Performance improvements for stmmac. Please check bellow for results
before and after the series.

Patch 1/7, allows RX Interrupt on Completion to be disabled and only use the
RX HW Watchdog.

Patch 2/7, setups the default RX coalesce settings instead of using the
minimum value.

Patch 3/7 and 4/7, removes the uneeded computations for RX Flow Control
activation/de-activation, on some cases.

Patch 5/7, tunes-up the default coalesce settings.

Patch 6/7, re-works the TX coalesce timer activation logic.

Patch 7/7, removes the now uneeded TBU interrupt.

NetPerf UDP Results:
--------------------

Socket  Message  Elapsed      Messages                   CPU      Service
Size    Size     Time         Okay Errors   Throughput   Util     Demand
bytes   bytes    secs            #      #   10^6bits/sec % SS     us/KB
--- XGMAC@2.5G: Before
212992    1400   10.00     2100620      0     2351.7     36.69    5.112
212992           10.00     2100539            2351.6     26.18    3.648
--- XGMAC@2.5G: After
212992    1400   10.00     2108972      0     2361.5     21.73    3.015
212992           10.00     2097038            2348.1     19.21    2.666

--- GMAC5@1G: Before
212992    1400   10.00      786000      0      880.2     34.71    12.923
212992           10.00      786000             880.2     23.42    8.719
--- GMAC5@1G: After
212992    1400   10.00      842648      0      943.7     14.12    4.903
212992           10.00      842648             943.7     12.73    4.418

Perf TCP Results on RX Path:
----------------------------
--- XGMAC@2.5G: Before
22.51%  swapper          [stmmac]           [k] dwxgmac2_dma_interrupt
10.82%  swapper          [stmmac]           [k] dwxgmac2_host_mtl_irq_status
 5.21%  swapper          [stmmac]           [k] dwxgmac2_host_irq_status
 4.67%  swapper          [stmmac]           [k] dwxgmac3_safety_feat_irq_status
 3.63%  swapper          [kernel.kallsyms]  [k] stack_trace_consume_entry
 2.74%  iperf3           [kernel.kallsyms]  [k] copy_user_enhanced_fast_string
 2.52%  swapper          [kernel.kallsyms]  [k] update_stack_state
 1.94%  ksoftirqd/0      [stmmac]           [k] dwxgmac2_dma_interrupt
 1.45%  iperf3           [kernel.kallsyms]  [k] queued_spin_lock_slowpath
 1.26%  swapper          [kernel.kallsyms]  [k] create_object
--- XGMAC@2.5G: After
 7.43%  swapper          [kernel.kallsyms]   [k] stack_trace_consume_entry
 5.86%  swapper          [stmmac]            [k] dwxgmac2_dma_interrupt
 5.68%  swapper          [kernel.kallsyms]   [k] update_stack_state
 4.71%  iperf3           [kernel.kallsyms]   [k] copy_user_enhanced_fast_string
 2.88%  swapper          [kernel.kallsyms]   [k] create_object
 2.69%  swapper          [stmmac]            [k] dwxgmac2_host_mtl_irq_status
 2.61%  swapper          [stmmac]            [k] stmmac_napi_poll_rx
 2.52%  swapper          [kernel.kallsyms]   [k] unwind_next_frame.part.4
 1.48%  swapper          [kernel.kallsyms]   [k] unwind_get_return_address
 1.38%  swapper          [kernel.kallsyms]   [k] arch_stack_walk

--- GMAC5@1G: Before
31.29%  swapper          [stmmac]           [k] dwmac4_dma_interrupt
14.57%  swapper          [stmmac]           [k] dwmac4_irq_mtl_status
10.66%  swapper          [stmmac]           [k] dwmac4_irq_status
 1.97%  swapper          [kernel.kallsyms]  [k] stack_trace_consume_entry
 1.73%  iperf3           [kernel.kallsyms]  [k] copy_user_enhanced_fast_string
 1.59%  swapper          [kernel.kallsyms]  [k] update_stack_state
 1.15%  iperf3           [kernel.kallsyms]  [k] do_syscall_64
 1.01%  ksoftirqd/0      [stmmac]           [k] dwmac4_dma_interrupt
 0.89%  swapper          [kernel.kallsyms]  [k] __default_send_IPI_dest_field
 0.75%  swapper          [stmmac]           [k] stmmac_napi_poll_rx
--- GMAC5@1G: After
 6.70%  swapper          [kernel.kallsyms]   [k] stack_trace_consume_entry
 5.79%  swapper          [stmmac]            [k] dwmac4_dma_interrupt
 5.29%  swapper          [kernel.kallsyms]   [k] update_stack_state
 3.52%  iperf3           [kernel.kallsyms]   [k] copy_user_enhanced_fast_string
 2.83%  swapper          [stmmac]            [k] dwmac4_irq_mtl_status
 2.62%  swapper          [kernel.kallsyms]   [k] create_object
 2.46%  swapper          [stmmac]            [k] stmmac_napi_poll_rx
 2.32%  swapper          [kernel.kallsyms]   [k] unwind_next_frame.part.4
 2.19%  swapper          [stmmac]            [k] dwmac4_irq_status
 1.39%  swapper          [kernel.kallsyms]   [k] unwind_get_return_address
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: stmmac: xgmac: Do not enable TBU interrupt
Jose Abreu [Thu, 14 Nov 2019 11:42:51 +0000 (12:42 +0100)]
net: stmmac: xgmac: Do not enable TBU interrupt

Now that TX Coalesce has been rewritten we no longer need this
additional interrupt enabled. This reduces CPU usage.

Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: stmmac: Rework TX Coalesce logic
Jose Abreu [Thu, 14 Nov 2019 11:42:50 +0000 (12:42 +0100)]
net: stmmac: Rework TX Coalesce logic

Coalesce logic currently increments the number of packets and sets the
IC bit when the coalesced packets have passed a given limit. This does
not reflect very well what coalesce was meant for as we can have a large
number of packets that are coalesced and then a single one, sent later
on that has the IC bit.

Rework the logic so that it coalesces only upon a limit of packets and
sets the IC bit for large number of packets.

Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: stmmac: Tune-up default coalesce settings
Jose Abreu [Thu, 14 Nov 2019 11:42:49 +0000 (12:42 +0100)]
net: stmmac: Tune-up default coalesce settings

Tune-up the defalt coalesce settings for optimal values. This gives the
best performance in most of the use-cases.

Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: stmmac: xgmac: Remove uneeded computation for RFA/RFD
Jose Abreu [Thu, 14 Nov 2019 11:42:48 +0000 (12:42 +0100)]
net: stmmac: xgmac: Remove uneeded computation for RFA/RFD

RFA and RFD should not be dependent on FIFO size. In fact, the more FIFO
space we have, the later we can activate Flow Control. Let's use
hard-coded values for RFA and RFD for all FIFO sizes with the exception
of 4k, which is a special case.

Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: stmmac: gmac4+: Remove uneeded computation for RFA/RFD
Jose Abreu [Thu, 14 Nov 2019 11:42:47 +0000 (12:42 +0100)]
net: stmmac: gmac4+: Remove uneeded computation for RFA/RFD

RFA and RFD should not be dependent on FIFO size. In fact, the more FIFO
space we have, the later we can activate Flow Control. Let's use
hard-coded values for RFA and RFD for all FIFO sizes with the exception
of 4k, which is a special case.

Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: stmmac: Setup a default RX Coalesce value instead of the minimum
Jose Abreu [Thu, 14 Nov 2019 11:42:46 +0000 (12:42 +0100)]
net: stmmac: Setup a default RX Coalesce value instead of the minimum

For performance reasons, sometimes using the minimum RX Coalesce value
is not optimal. Lets setup a default value that is optimal in most of
the use cases.

Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: stmmac: Do not set RX IC bit if RX Coalesce is zero
Jose Abreu [Thu, 14 Nov 2019 11:42:45 +0000 (12:42 +0100)]
net: stmmac: Do not set RX IC bit if RX Coalesce is zero

We may only want to use the RX Watchdog so lets check if RX Coalesce
settings are non-zero and only set the RX Interrupt on Completion bit if
its not.

Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agomlxsw: spectrum_router: Allocate discard adjacency entry when needed
Ido Schimmel [Thu, 14 Nov 2019 09:54:19 +0000 (11:54 +0200)]
mlxsw: spectrum_router: Allocate discard adjacency entry when needed

Commit 0c3cbbf96def ("mlxsw: Add specific trap for packets routed via
invalid nexthops") allocated an adjacency entry during driver
initialization whose purpose is to discard packets hitting the route
pointing to it.

These adjacency entries are allocated from a resource called KVD linear
(KVDL). There are situations in which the user can decide to set the
size of this resource (via devlink-resource) to 0, in which case the
driver will not be able to load.

Therefore, instead of pre-allocating this adjacency entry, simply
allocate it only when needed. A variable indicating the validity of the
entry is added and is used to ensure it is only allocated and written
once and that it is freed after all the routes were flushed.

Fixes: 0c3cbbf96def ("mlxsw: Add specific trap for packets routed via invalid nexthops")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/tls: Fix unused function warning
YueHaibing [Thu, 14 Nov 2019 07:39:46 +0000 (15:39 +0800)]
net/tls: Fix unused function warning

If PROC_FS is not set, gcc warning this:

net/tls/tls_proc.c:23:12: warning:
 'tls_statistics_seq_show' defined but not used [-Wunused-function]

Use #ifdef to guard this.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agortw88: remove duplicated include from ps.c
YueHaibing [Mon, 11 Nov 2019 03:34:27 +0000 (03:34 +0000)]
rtw88: remove duplicated include from ps.c

Remove duplicated include.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
4 years agortl8xxxu: Remove set but not used variable 'rsr'
Zheng Yongjun [Sun, 10 Nov 2019 10:49:55 +0000 (18:49 +0800)]
rtl8xxxu: Remove set but not used variable 'rsr'

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c: In function rtl8xxxu_gen2_config_channel:
drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c:1266:13: warning: variable rsr set but not used [-Wunused-but-set-variable]

rsr is never used, so remove it.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Reviewed-by: Chris Chiu <chiu@endlessm.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
4 years agobrcmsmac: remove unnecessary return
Eduardo Abinader [Thu, 7 Nov 2019 16:07:46 +0000 (17:07 +0100)]
brcmsmac: remove unnecessary return

Signed-off-by: Eduardo Abinader <eduardoabinader@gmail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
4 years agoMerge ath-next from git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git
Kalle Valo [Fri, 15 Nov 2019 07:54:25 +0000 (09:54 +0200)]
Merge ath-next from git://git./linux/kernel/git/kvalo/ath.git

ath.git patches for v5.5. Major changes:

ath10k

* request for PM_QOS_CPU_DMA_LATENCY to improve firmware initialisation time

4 years agoiwlwifi: mvm: fix non-ACPI function
Johannes Berg [Fri, 15 Nov 2019 07:28:31 +0000 (09:28 +0200)]
iwlwifi: mvm: fix non-ACPI function

The code now compiles without ACPI, but there's a warning since
iwl_mvm_get_ppag_table() isn't used, and iwl_mvm_ppag_init() must
not unconditionally fail but return success instead.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
4 years agoiwlwifi: 22000: fix some indentation
Johannes Berg [Fri, 15 Nov 2019 07:28:28 +0000 (09:28 +0200)]
iwlwifi: 22000: fix some indentation

Somehow two tabs snuck into this file where just one should be
used, fix that.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
4 years agoiwlwifi: remove IWL_DEVICE_22560/IWL_DEVICE_FAMILY_22560
Johannes Berg [Fri, 15 Nov 2019 07:28:25 +0000 (09:28 +0200)]
iwlwifi: remove IWL_DEVICE_22560/IWL_DEVICE_FAMILY_22560

This is dead code, nothing uses the IWL_DEVICE_22560 macro and
thus nothing every uses IWL_DEVICE_FAMILY_22560. Remove it all.

While at it, remove some code and definitions used only in this
case, and clean up some comments/names that still refer to it.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
4 years agoiwlwifi: mvm: sync the iwl_mvm_session_prot_notif layout
Emmanuel Grumbach [Fri, 15 Nov 2019 07:28:22 +0000 (09:28 +0200)]
iwlwifi: mvm: sync the iwl_mvm_session_prot_notif layout

The firmware API has changed a little bit but this change
has no impact on the flow and is backward compatible.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
4 years agoiwlwifi: mvm: start CTDP budget from 2400mA
Mordechay Goodstein [Fri, 15 Nov 2019 07:28:20 +0000 (09:28 +0200)]
iwlwifi: mvm: start CTDP budget from 2400mA

The current budget of 2000mA is preventing us from reaching maximum
throughput.  According to our system engineers, we can increase the
maximum budget to 2400mA to solve this problem.

Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
4 years agoiwlwifi: mvm: don't skip mgmt tid when flushing all tids
Haim Dreyfuss [Fri, 15 Nov 2019 07:28:17 +0000 (09:28 +0200)]
iwlwifi: mvm: don't skip mgmt tid when flushing all tids

There are various of flows which require tids flushing
(disconnection, suspend, etc...).
Currently, when the driver instructs the FW to flush
he masks all the data tids(0-7).
However, the driver doesn't set the management tid (#15)
which cause the FW not to flush it.
When the FW tries to remove the mgmt queue he throws an assert
since it is not an empty queue.
instead of just set only the data tids set everything and let
the FW ignore the invalid tids.

Signed-off-by: Haim Dreyfuss <haim.dreyfuss@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
4 years agoiwlwifi: mvm: scan: enable adaptive dwell in p2p
Shahar S Matityahu [Fri, 15 Nov 2019 07:28:14 +0000 (09:28 +0200)]
iwlwifi: mvm: scan: enable adaptive dwell in p2p

Align to the requirement update and support adaptive dwell in p2p scan.

Signed-off-by: Shahar S Matityahu <shahar.s.matityahu@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
4 years agoiwlwifi: refactor the SAR tables from mvm to acpi
Ihab Zhaika [Fri, 15 Nov 2019 07:28:11 +0000 (09:28 +0200)]
iwlwifi: refactor the SAR tables from mvm to acpi

Refactored the SAR related functions from iwlmvm to acpi
in order to make it shared between different opmodes
in addition to removing unused variable ppag_rev.

Signed-off-by: Ihab Zhaika <ihab.zhaika@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
4 years agoiwlwifi: scan: support scan req cmd ver 12
Shahar S Matityahu [Fri, 15 Nov 2019 07:28:08 +0000 (09:28 +0200)]
iwlwifi: scan: support scan req cmd ver 12

Implement scan request command version 12.

Signed-off-by: Shahar S Matityahu <shahar.s.matityahu@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
4 years agoiwlwifi: scan: make new scan req versioning flow
Shahar S Matityahu [Fri, 15 Nov 2019 07:28:05 +0000 (09:28 +0200)]
iwlwifi: scan: make new scan req versioning flow

Implement a new versioning handling flow supported from version 11
onwards.

Signed-off-by: Shahar S Matityahu <shahar.s.matityahu@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
4 years agoiwlwifi: mvm: fix unaligned read of rx_pkt_status
Wang Xuerui [Fri, 15 Nov 2019 07:28:02 +0000 (09:28 +0200)]
iwlwifi: mvm: fix unaligned read of rx_pkt_status

This is present since the introduction of iwlmvm.
Example stack trace on MIPS:

[<ffffffffc0789328>] iwl_mvm_rx_rx_mpdu+0xa8/0xb88 [iwlmvm]
[<ffffffffc0632b40>] iwl_pcie_rx_handle+0x420/0xc48 [iwlwifi]

Tested with a Wireless AC 7265 for ~6 months, confirmed to fix the
problem. No other unaligned accesses are spotted yet.

Signed-off-by: Wang Xuerui <wangxuerui@qiniu.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
4 years agoiwlwifi: remove redundant assignment to variable bufsz
Colin Ian King [Fri, 15 Nov 2019 07:27:59 +0000 (09:27 +0200)]
iwlwifi: remove redundant assignment to variable bufsz

The variable bufsz is being initialized with a value that is never
read and it is being updated later with a new value. The
initialization is redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
4 years agoiwlwifi: bump FW API to 51 for 22000 series
Luca Coelho [Fri, 15 Nov 2019 07:27:56 +0000 (09:27 +0200)]
iwlwifi: bump FW API to 51 for 22000 series

Start supporting API version 51 for 22000 series.

Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
4 years agoiwlwifi: FW API: reference enum in docs of modify_mask
Johannes Berg [Fri, 15 Nov 2019 07:27:54 +0000 (09:27 +0200)]
iwlwifi: FW API: reference enum in docs of modify_mask

Add a reference to the correct enum rather than showing
the pattern of the actual constants.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
4 years agoiwlwifi: mvm: print rate_n_flags in a pretty format
Mordechay Goodstein [Fri, 15 Nov 2019 07:27:51 +0000 (09:27 +0200)]
iwlwifi: mvm: print rate_n_flags in a pretty format

Use the rs_pretty_print_rate() function to print the rate_n_flags in
more human-readable format.

Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
4 years agoiwlwifi: scan: adapt the code to use api ver 11
Tova Mussai [Fri, 15 Nov 2019 07:27:48 +0000 (09:27 +0200)]
iwlwifi: scan: adapt the code to use api ver 11

FW scan api ver 11 adds support for some new features,
in this version the fw did also some cleanup in the api,
which causes the driver not to be able to use the
current scan req struct.

Therefore, in this patch the driver has new version for the scan command
code

Signed-off-by: Tova Mussai <tova.mussai@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
4 years agoiwlwifi: scan: Create function to build scan cmd
Tova Mussai [Fri, 15 Nov 2019 07:27:45 +0000 (09:27 +0200)]
iwlwifi: scan: Create function to build scan cmd

Currently, the code to build scan cmd is duplicated in
iwl_mvm_reg_scan_start and iwl_mvm_sched_scan_start.

Create a function to build this command, and call the function instead.

Signed-off-by: Tova Mussai <tova.mussai@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
4 years agoiwlwifi: scan: create function for scan scheduling params
Tova Mussai [Fri, 15 Nov 2019 07:27:42 +0000 (09:27 +0200)]
iwlwifi: scan: create function for scan scheduling params

In the next patch, this code will be used from different places.
As preparation export this code into function.

Signed-off-by: Tova Mussai <tova.mussai@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
4 years agoiwlwifi: dbg_ini: support dump collection upon assert during D3
Shahar S Matityahu [Fri, 15 Nov 2019 07:27:40 +0000 (09:27 +0200)]
iwlwifi: dbg_ini: support dump collection upon assert during D3

add assert time point in the D3 resume flow in case there was an assert
during D3.

Signed-off-by: Shahar S Matityahu <shahar.s.matityahu@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
4 years agoiwlwifi: pcie: make iwl_pcie_gen2_update_byte_tbl static
Emmanuel Grumbach [Fri, 15 Nov 2019 07:27:37 +0000 (09:27 +0200)]
iwlwifi: pcie: make iwl_pcie_gen2_update_byte_tbl static

It is called within tx-gen2.c only.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
4 years agoiwlwifi: mvm: in VHT connection use only VHT capabilities
Mordechay Goodstein [Fri, 15 Nov 2019 07:27:34 +0000 (09:27 +0200)]
iwlwifi: mvm: in VHT connection use only VHT capabilities

mac80211 limits amsdu size to the minimum of HT and VHT capabilities
but since in a VHT connection we don't transmit HT frames we can discard
HT limits.

Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
4 years agoiwlwifi: nvm: update iwl_uhb_nvm_channels
Tova Mussai [Fri, 15 Nov 2019 07:27:31 +0000 (09:27 +0200)]
iwlwifi: nvm: update iwl_uhb_nvm_channels

Change the UHB channels to start from 1 to match the specs (11ax Draft 5.0).

Signed-off-by: Tova Mussai <tova.mussai@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
4 years agoiwlwifi: mvm: remove else-if in iwl_send_phy_cfg_cmd()
Luca Coelho [Fri, 15 Nov 2019 07:27:28 +0000 (09:27 +0200)]
iwlwifi: mvm: remove else-if in iwl_send_phy_cfg_cmd()

We return in the if block, so it's unnecessary to have an else
statement.  Remove it.

Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
4 years agoiwlwifi: mvm: fix support for single antenna diversity
Luca Coelho [Fri, 15 Nov 2019 07:27:25 +0000 (09:27 +0200)]
iwlwifi: mvm: fix support for single antenna diversity

When the single antenna diversity support was sent upstream, only some
definitions were sent, due to a bad revert.

Fix this by adding the actual code.

Fixes: 5952e0ec3f05 ("iwlwifi: mvm: add support for single antenna diversity")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
4 years agowcn36xx: fix typo
Eduardo Abinader [Fri, 8 Nov 2019 09:10:46 +0000 (11:10 +0200)]
wcn36xx: fix typo

Signed-off-by: Eduardo Abinader <eduardoabinader@gmail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
4 years agoath10k: qmi: Sleep for a while before assigning MSA memory
Bjorn Andersson [Wed, 13 Nov 2019 23:35:58 +0000 (15:35 -0800)]
ath10k: qmi: Sleep for a while before assigning MSA memory

Unless we sleep for a while before transitioning the MSA memory to WLAN
the MPSS.AT.4.0.c2-01184-SDM845_GEN_PACK-1 firmware triggers a security
violation fairly reliably. Unforutnately recovering from this failure
always results in the entire system freezing.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
4 years agoath10k: Revert "ath10k: add cleanup in ath10k_sta_state()"
Bjorn Andersson [Wed, 13 Nov 2019 20:26:44 +0000 (12:26 -0800)]
ath10k: Revert "ath10k: add cleanup in ath10k_sta_state()"

This reverts commit 334f5b61a6f29834e881923b98d1e27e5ce9620d.

This caused ath10k_snoc on Qualcomm MSM8998, SDM845 and QCS404 platforms to
trigger an assert in the firmware:

err_qdi.c:456:EF:wlan_process:1:cmnos_thread.c:3900:Asserted in wlan_vdev.c:_wlan_vdev_up:3219

Revert the offending commit for now.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
4 years agoMerge branch 's390-next'
David S. Miller [Fri, 15 Nov 2019 02:16:51 +0000 (18:16 -0800)]
Merge branch 's390-next'

Julian Wiedmann says:

====================
s390/qeth: updates 2019-11-14

please apply the following qeth patches to net-next.
Along with the usual cleanups, this
(1) reduces collateral packet loss in the RX path when dealing with
    bad packets and/or allocation errors, and
(2) simplifies how the L3 driver deals with mcast IP addresses.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agos390/qeth: don't check drvdata in sysfs code
Julian Wiedmann [Thu, 14 Nov 2019 10:19:24 +0000 (11:19 +0100)]
s390/qeth: don't check drvdata in sysfs code

Given the way how the sysfs attributes are registered / unregistered,
the show/store helpers will never be called with a NULL drvdata.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agos390/qeth: replace qeth_l3_get_addr_buffer()
Julian Wiedmann [Thu, 14 Nov 2019 10:19:23 +0000 (11:19 +0100)]
s390/qeth: replace qeth_l3_get_addr_buffer()

The remaining usage effectively is a kmemdup() of the query object.
By not wrapping it, some of the callers can now use GFP_KERNEL for the
allocation.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agos390/qeth: remove VLAN tracking for L3 devices
Julian Wiedmann [Thu, 14 Nov 2019 10:19:22 +0000 (11:19 +0100)]
s390/qeth: remove VLAN tracking for L3 devices

Use vlan_for_each() instead of tracking each registered VID internally.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agos390/qeth: consolidate L3 mcast registration code
Julian Wiedmann [Thu, 14 Nov 2019 10:19:21 +0000 (11:19 +0100)]
s390/qeth: consolidate L3 mcast registration code

Current code processes each (VLAN) device twice - once to inspect the
IPv4 mcast addresses, and then a second time to walk the IPv6 mcast
addresses. Unify all this into a single helper, thus removing some
checks and a duplicated VLAN lookup.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agos390/qeth: remove gratuitious RX modeset
Julian Wiedmann [Thu, 14 Nov 2019 10:19:20 +0000 (11:19 +0100)]
s390/qeth: remove gratuitious RX modeset

Trust the IPv4/IPv6 code to properly remove its mcast addresses when a
VLAN device is unregistered, and then also trigger an RX modeset
whenever it's needed.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agos390/qeth: fine-tune L3 mcast locking
Julian Wiedmann [Thu, 14 Nov 2019 10:19:19 +0000 (11:19 +0100)]
s390/qeth: fine-tune L3 mcast locking

Push the inet6_dev locking down into the helper that actually needs it
for walking the mc_list.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agos390/qeth: clean up error path in qeth_core_probe_device()
Julian Wiedmann [Thu, 14 Nov 2019 10:19:18 +0000 (11:19 +0100)]
s390/qeth: clean up error path in qeth_core_probe_device()

qeth_core_free_card() is meant to be the counterpart of
qeth_alloc_card() - but unfortunately was also picked as the place
to free the QDIO queues.

This gets messy when qeth_core_probe_device() fails during
qeth_add_dbf_entry(). At this point the card->qdio.state is not initialized
yet, so qeth_free_qdio_queues() ends up operating on uninitialized data.

Luckily for now, the whole qeth_card struct is zero-allocated and the value
of the QETH_QDIO_UNINITIALIZED enum is 0 as well. So there's no real impact
from this bug at the moment, it's just really fragile.

Clean this up by moving the qeth_free_qdio_queues() call up one level in
the hierarchy. This way it doesn't get called from the error path.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agos390/qeth: handle skb allocation error gracefully
Julian Wiedmann [Thu, 14 Nov 2019 10:19:17 +0000 (11:19 +0100)]
s390/qeth: handle skb allocation error gracefully

When current code fails to allocate an skb in the RX path, it drops the
whole RX buffer. Considering the large number of packets that a single
RX buffer might contain, this is quite drastic.

Skip over the packet instead, and try to extract the next packet from
the RX buffer.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agos390/qeth: drop unwanted packets earlier in RX path
Julian Wiedmann [Thu, 14 Nov 2019 10:19:16 +0000 (11:19 +0100)]
s390/qeth: drop unwanted packets earlier in RX path

Packets with an unexpected HW format are currently first extracted from
the RX buffer, passed upwards to the layer-specific driver and only then
finally dropped.

Enhance the RX path so that we can drop such packets before even
allocating an skb. For this, add some additional logic so that when a
packet is meant to be dropped, we can still walk along the packet's data
chunks in the RX buffer. This allows us to extract the following
packet(s) from the buffer.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agos390/qeth: support per-frame invalidation
Julian Wiedmann [Thu, 14 Nov 2019 10:19:15 +0000 (11:19 +0100)]
s390/qeth: support per-frame invalidation

Each RX buffer may contain up to 64KB worth of data. In case the device
needs to discard a packet _after_ already having reserved space for it
in the buffer, the whole buffer gets set to ERROR state. As the buffer
might contain any number of good packets, this can result in collateral
packet loss.

qeth can provide relief by enabling per-frame invalidation. The RX
buffer is then presented as usual, we just need to spot & drop any
individual packet that was flagged as invalid.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agos390/qeth: gather more detailed RX dropped/error statistics
Julian Wiedmann [Thu, 14 Nov 2019 10:19:14 +0000 (11:19 +0100)]
s390/qeth: gather more detailed RX dropped/error statistics

Where available, use the fine-grained counters in rtnl_link_stats64 to
indicate different RX error causes. For drop reasons, use driver-private
ethtool counters.

In particular this patch allows us to keep track of driver-side drops due
to unknown/unsupported HW descriptor format.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'vsock-add-multi-transports-support'
David S. Miller [Fri, 15 Nov 2019 02:12:19 +0000 (18:12 -0800)]
Merge branch 'vsock-add-multi-transports-support'

Stefano Garzarella says:

====================
vsock: add multi-transports support

Most of the patches are reviewed by Dexuan, Stefan, and Jorgen.
The following patches need reviews:
- [11/15] vsock: add multi-transports support
- [12/15] vsock/vmci: register vmci_transport only when VMCI guest/host
          are active
- [15/15] vhost/vsock: refuse CID assigned to the guest->host transport

RFC: https://patchwork.ozlabs.org/cover/1168442/
v1: https://patchwork.ozlabs.org/cover/1181986/

v1 -> v2:
- Patch 11:
    + vmci_transport: sent reset when vsock_assign_transport() fails
      [Jorgen]
    + fixed loopback in the guests, checking if the remote_addr is the
      same of transport_g2h->get_local_cid()
    + virtio_transport_common: updated space available while creating
      the new child socket during a connection request
- Patch 12:
    + removed 'features' variable in vmci_transport_init() [Stefan]
    + added a flag to register only once the host [Jorgen]
- Added patch 15 to refuse CID assigned to the guest->host transport in
  the vhost_transport

This series adds the multi-transports support to vsock, following
this proposal: https://www.spinics.net/lists/netdev/msg575792.html

With the multi-transports support, we can use VSOCK with nested VMs
(using also different hypervisors) loading both guest->host and
host->guest transports at the same time.
Before this series, vmci_transport supported this behavior but only
using VMware hypervisor on L0, L1, etc.

The first 9 patches are cleanups and preparations, maybe some of
these can go regardless of this series.

Patch 10 changes the hvs_remote_addr_init(). setting the
VMADDR_CID_HOST as remote CID instead of VMADDR_CID_ANY to make
the choice of transport to be used work properly.

Patch 11 adds multi-transports support.

Patch 12 changes a little bit the vmci_transport and the vmci driver
to register the vmci_transport only when there are active host/guest.

Patch 13 prevents the transport modules unloading while sockets are
assigned to them.

Patch 14 fixes an issue in the bind() logic discoverable only with
the new multi-transport support.

Patch 15 refuses CID assigned to the guest->host transport in the
vhost_transport.

I've tested this series with nested KVM (vsock-transport [L0,L1],
virtio-transport[L1,L2]) and with VMware (L0) + KVM (L1)
(vmci-transport [L0,L1], vhost-transport [L1], virtio-transport[L2]).

Dexuan successfully tested the RFC series on HyperV with a Linux guest.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agovhost/vsock: refuse CID assigned to the guest->host transport
Stefano Garzarella [Thu, 14 Nov 2019 09:57:50 +0000 (10:57 +0100)]
vhost/vsock: refuse CID assigned to the guest->host transport

In a nested VM environment, we have to refuse to assign to a nested
guest the same CID assigned to our guest->host transport.
In this way, the user can use the local CID for loopback.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>