git.osdn.net Git - tomoyo/tomoyo-test1.git/log

packet: fix data-race in fanout_flow_is_huge()

KCSAN reported the following data-race [1]

Adding a couple of READ_ONCE()/WRITE_ONCE() should silence it.

Since the report hinted about multiple cpus using the history
concurrently, I added a test avoiding writing on it if the
victim slot already contains the desired value.

[1]

BUG: KCSAN: data-race in fanout_demux_rollover / fanout_demux_rollover

read to 0xffff8880b01786cc of 4 bytes by task 18921 on cpu 1:
fanout_flow_is_huge net/packet/af_packet.c:1303 [inline]
fanout_demux_rollover+0x33e/0x3f0 net/packet/af_packet.c:1353
packet_rcv_fanout+0x34e/0x490 net/packet/af_packet.c:1453
deliver_skb net/core/dev.c:1888 [inline]
dev_queue_xmit_nit+0x15b/0x540 net/core/dev.c:1958
xmit_one net/core/dev.c:3195 [inline]
dev_hard_start_xmit+0x3f5/0x430 net/core/dev.c:3215
__dev_queue_xmit+0x14ab/0x1b40 net/core/dev.c:3792
dev_queue_xmit+0x21/0x30 net/core/dev.c:3825
neigh_direct_output+0x1f/0x30 net/core/neighbour.c:1530
neigh_output include/net/neighbour.h:511 [inline]
ip6_finish_output2+0x7a2/0xec0 net/ipv6/ip6_output.c:116
__ip6_finish_output net/ipv6/ip6_output.c:142 [inline]
__ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127
ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152
NF_HOOK_COND include/linux/netfilter.h:294 [inline]
ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175
dst_output include/net/dst.h:436 [inline]
ip6_local_out+0x74/0x90 net/ipv6/output_core.c:179
ip6_send_skb+0x53/0x110 net/ipv6/ip6_output.c:1795
udp_v6_send_skb.isra.0+0x3ec/0xa70 net/ipv6/udp.c:1173
udpv6_sendmsg+0x1906/0x1c20 net/ipv6/udp.c:1471
inet6_sendmsg+0x6d/0x90 net/ipv6/af_inet6.c:576
sock_sendmsg_nosec net/socket.c:637 [inline]
sock_sendmsg+0x9f/0xc0 net/socket.c:657
___sys_sendmsg+0x2b7/0x5d0 net/socket.c:2311
__sys_sendmmsg+0x123/0x350 net/socket.c:2413
__do_sys_sendmmsg net/socket.c:2442 [inline]
__se_sys_sendmmsg net/socket.c:2439 [inline]
__x64_sys_sendmmsg+0x64/0x80 net/socket.c:2439
do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x44/0xa9

write to 0xffff8880b01786cc of 4 bytes by task 18922 on cpu 0:
fanout_flow_is_huge net/packet/af_packet.c:1306 [inline]
fanout_demux_rollover+0x3a4/0x3f0 net/packet/af_packet.c:1353
packet_rcv_fanout+0x34e/0x490 net/packet/af_packet.c:1453
deliver_skb net/core/dev.c:1888 [inline]
dev_queue_xmit_nit+0x15b/0x540 net/core/dev.c:1958
xmit_one net/core/dev.c:3195 [inline]
dev_hard_start_xmit+0x3f5/0x430 net/core/dev.c:3215
__dev_queue_xmit+0x14ab/0x1b40 net/core/dev.c:3792
dev_queue_xmit+0x21/0x30 net/core/dev.c:3825
neigh_direct_output+0x1f/0x30 net/core/neighbour.c:1530
neigh_output include/net/neighbour.h:511 [inline]
ip6_finish_output2+0x7a2/0xec0 net/ipv6/ip6_output.c:116
__ip6_finish_output net/ipv6/ip6_output.c:142 [inline]
__ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127
ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152
NF_HOOK_COND include/linux/netfilter.h:294 [inline]
ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175
dst_output include/net/dst.h:436 [inline]
ip6_local_out+0x74/0x90 net/ipv6/output_core.c:179
ip6_send_skb+0x53/0x110 net/ipv6/ip6_output.c:1795
udp_v6_send_skb.isra.0+0x3ec/0xa70 net/ipv6/udp.c:1173
udpv6_sendmsg+0x1906/0x1c20 net/ipv6/udp.c:1471
inet6_sendmsg+0x6d/0x90 net/ipv6/af_inet6.c:576
sock_sendmsg_nosec net/socket.c:637 [inline]
sock_sendmsg+0x9f/0xc0 net/socket.c:657
___sys_sendmsg+0x2b7/0x5d0 net/socket.c:2311
__sys_sendmmsg+0x123/0x350 net/socket.c:2413
__do_sys_sendmmsg net/socket.c:2442 [inline]
__se_sys_sendmmsg net/socket.c:2439 [inline]
__x64_sys_sendmmsg+0x64/0x80 net/socket.c:2439
do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x44/0xa9

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 18922 Comm: syz-executor.3 Not tainted 5.4.0-rc6+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Fixes: 3b3a5b0aab5b ("packet: rollover huge flows before small flows")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'TIPC-Encryption'

Tuong Lien says:

====================
TIPC Encryption

This series provides TIPC encryption feature, kernel part. There will be
another one in the 'iproute2/tipc' for user space to set key.

v2: add select crypto 'aes(gcm)' for TIPC_CRYPTO in Kconfig
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: add support for AEAD key setting via netlink

This commit adds two netlink commands to TIPC in order for user to be
able to set or remove AEAD keys:
- TIPC_NL_KEY_SET
- TIPC_NL_KEY_FLUSH

When the 'KEY_SET' is given along with the key data, the key will be
initiated and attached to TIPC crypto. On the other hand, the
'KEY_FLUSH' command will remove all existing keys if any.

Acked-by: Ying Xue <ying.xue@windreiver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: introduce TIPC encryption & authentication

This commit offers an option to encrypt and authenticate all messaging,
including the neighbor discovery messages. The currently most advanced
algorithm supported is the AEAD AES-GCM (like IPSec or TLS). All
encryption/decryption is done at the bearer layer, just before leaving
or after entering TIPC.

Supported features:
- Encryption & authentication of all TIPC messages (header + data);
- Two symmetric-key modes: Cluster and Per-node;
- Automatic key switching;
- Key-expired revoking (sequence number wrapped);
- Lock-free encryption/decryption (RCU);
- Asynchronous crypto, Intel AES-NI supported;
- Multiple cipher transforms;
- Logs & statistics;

Two key modes:
- Cluster key mode: One single key is used for both TX & RX in all
nodes in the cluster.
- Per-node key mode: Each nodes in the cluster has one specific TX key.
For RX, a node requires its peers' TX key to be able to decrypt the
messages from those peers.

Key setting from user-space is performed via netlink by a user program
(e.g. the iproute2 'tipc' tool).

Internal key state machine:

                                 Attach    Align(RX)
                                     +-+   +-+
                                     | V   | V
        +---------+      Attach     +---------+
        |  IDLE   |---------------->| PENDING |(user = 0)
        +---------+                 +---------+
           A   A                   Switch|  A
           |   |                         |  |
           |   | Free(switch/revoked)    |  |
     (Free)|   +----------------------+  |  |Timeout
           |              (TX)        |  |  |(RX)
           |                          |  |  |
           |                          |  v  |
        +---------+      Switch     +---------+
        | PASSIVE |<----------------| ACTIVE  |
        +---------+       (RX)      +---------+
        (user = 1)                  (user >= 1)

The number of TFMs is 10 by default and can be changed via the procfs
'net/tipc/max_tfms'. At this moment, as for simplicity, this file is
also used to print the crypto statistics at runtime:

echo 0xfff1 > /proc/sys/net/tipc/max_tfms

The patch defines a new TIPC version (v7) for the encryption message (-
backward compatibility as well). The message is basically encapsulated
as follows:

   +----------------------------------------------------------+
   | TIPCv7 encryption  | Original TIPCv2    | Authentication |
   | header             | packet (encrypted) | Tag            |
   +----------------------------------------------------------+

The throughput is about ~40% for small messages (compared with non-
encryption) and ~9% for large messages. With the support from hardware
crypto i.e. the Intel AES-NI CPU instructions, the throughput increases
upto ~85% for small messages and ~55% for large messages.

By default, the new feature is inactive (i.e. no encryption) until user
sets a key for TIPC. There is however also a new option - "TIPC_CRYPTO"
in the kernel configuration to enable/disable the new code when needed.

MAINTAINERS | add two new files 'crypto.h' & 'crypto.c' in tipc

Acked-by: Ying Xue <ying.xue@windreiver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: add new AEAD key structure for user API

The new structure 'tipc_aead_key' is added to the 'tipc.h' for user to
be able to transfer a key to TIPC in kernel. Netlink will be used for
this purpose in the later commits.

Acked-by: Ying Xue <ying.xue@windreiver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: enable creating a "preliminary" node

When user sets RX key for a peer not existing on the own node, a new
node entry is needed to which the RX key will be attached. However,
since the peer node address (& capabilities) is unknown at that moment,
only the node-ID is provided, this commit allows the creation of a node
with only the data that we call as “preliminary”.

A preliminary node is not the object of the “tipc_node_find()” but the
“tipc_node_find_by_id()”. Once the first message i.e. LINK_CONFIG comes
from that peer, and is successfully decrypted by the own node, the
actual peer node data will be properly updated and the node will
function as usual.

In addition, the node timer always starts when a node object is created
so if a preliminary node is not used, it will be cleaned up.

The later encryption functions will also use the node timer and be able
to create a preliminary node automatically when needed.

Acked-by: Ying Xue <ying.xue@windreiver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: add reference counter to bearer

As a need to support the crypto asynchronous operations in the later
commits, apart from the current RCU mechanism for bearer pointer, we
add a 'refcnt' to the bearer object as well.

So, a bearer can be hold via 'tipc_bearer_hold()' without being freed
even though the bearer or interface can be disabled in the meanwhile.
If that happens, the bearer will be released then when the crypto
operation is completed and 'tipc_bearer_put()' is called.

Acked-by: Ying Xue <ying.xue@windreiver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '100GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
100GbE Intel Wired LAN Driver Updates 2019-11-08

Another series that contains updates to the ice driver only.

Anirudh cleans up the code of kernel config of ifdef wrappers by moving
code that is needed by DCB to disable and enable the PF VSI for
configuration.  Implements ice_vsi_type_str() to convert an VSI type
enum value to its string equivalent to help identify VSI types from
module print statements.

Usha and Tarun add support for setting the maximum per-queue bit rate
for transmit queues.

Dave implements dcb_nl set functions and supporting software DCB
functions to support the callbacks defined in the dcbnl_rtnl_ops
structure.

Henry adds a check to ensure we are not resetting the device when trying
to configure it, and to return -EBUSY during a reset.

Usha fixes a call trace caused by the receive/transmit descriptor size
change request via ethtool when DCB is configured by using the number of
enabled queues and not the total number of allocated queues.

Paul cleans up and refactors the software LLDP configuration to handle
when firmware DCBX is disabled.

Akeem adds checks to ensure the VF or PF is not disabled before honoring
mailbox messages to configure the VF.

Brett corrects the check to make sure the vector_id passed down from
iavf is less than the max allowed interrupts per VF.  Updates a flag bit
to align with the current specification.

Bruce updates a switch statement to use the correct status of the
Download Package AQ command.  Does some housekeeping by cleaning up a
conditional check that is not needed.

Mitch shortens up the delay for SQ responses to resolve issues with VF
resets failing.

Jake cleans up the code reducing namespace pollution and to simplify
ice_debug_cq() since it always uses the same mask, not need to pass it
in.  Improve debugging by adding the command opcode in the debug
messages that print an error code.

v2: fixed reverse christmas tree issue in patch 3 of the series.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: icmp: fix data-race in cmp_global_allow()

This code reads two global variables without protection
of a lock. We need READ_ONCE()/WRITE_ONCE() pairs to
avoid load/store-tearing and better document the intent.

KCSAN reported :
BUG: KCSAN: data-race in icmp_global_allow / icmp_global_allow

read to 0xffffffff861a8014 of 4 bytes by task 11201 on cpu 0:
icmp_global_allow+0x36/0x1b0 net/ipv4/icmp.c:254
icmpv6_global_allow net/ipv6/icmp.c:184 [inline]
icmpv6_global_allow net/ipv6/icmp.c:179 [inline]
icmp6_send+0x493/0x1140 net/ipv6/icmp.c:514
icmpv6_send+0x71/0xb0 net/ipv6/ip6_icmp.c:43
ip6_link_failure+0x43/0x180 net/ipv6/route.c:2640
dst_link_failure include/net/dst.h:419 [inline]
vti_xmit net/ipv4/ip_vti.c:243 [inline]
vti_tunnel_xmit+0x27f/0xa50 net/ipv4/ip_vti.c:279
__netdev_start_xmit include/linux/netdevice.h:4420 [inline]
netdev_start_xmit include/linux/netdevice.h:4434 [inline]
xmit_one net/core/dev.c:3280 [inline]
dev_hard_start_xmit+0xef/0x430 net/core/dev.c:3296
__dev_queue_xmit+0x14c9/0x1b60 net/core/dev.c:3873
dev_queue_xmit+0x21/0x30 net/core/dev.c:3906
neigh_direct_output+0x1f/0x30 net/core/neighbour.c:1530
neigh_output include/net/neighbour.h:511 [inline]
ip6_finish_output2+0x7a6/0xec0 net/ipv6/ip6_output.c:116
__ip6_finish_output net/ipv6/ip6_output.c:142 [inline]
__ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127
ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152
NF_HOOK_COND include/linux/netfilter.h:294 [inline]
ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175
dst_output include/net/dst.h:436 [inline]
ip6_local_out+0x74/0x90 net/ipv6/output_core.c:179

write to 0xffffffff861a8014 of 4 bytes by task 11183 on cpu 1:
icmp_global_allow+0x174/0x1b0 net/ipv4/icmp.c:272
icmpv6_global_allow net/ipv6/icmp.c:184 [inline]
icmpv6_global_allow net/ipv6/icmp.c:179 [inline]
icmp6_send+0x493/0x1140 net/ipv6/icmp.c:514
icmpv6_send+0x71/0xb0 net/ipv6/ip6_icmp.c:43
ip6_link_failure+0x43/0x180 net/ipv6/route.c:2640
dst_link_failure include/net/dst.h:419 [inline]
vti_xmit net/ipv4/ip_vti.c:243 [inline]
vti_tunnel_xmit+0x27f/0xa50 net/ipv4/ip_vti.c:279
__netdev_start_xmit include/linux/netdevice.h:4420 [inline]
netdev_start_xmit include/linux/netdevice.h:4434 [inline]
xmit_one net/core/dev.c:3280 [inline]
dev_hard_start_xmit+0xef/0x430 net/core/dev.c:3296
__dev_queue_xmit+0x14c9/0x1b60 net/core/dev.c:3873
dev_queue_xmit+0x21/0x30 net/core/dev.c:3906
neigh_direct_output+0x1f/0x30 net/core/neighbour.c:1530
neigh_output include/net/neighbour.h:511 [inline]
ip6_finish_output2+0x7a6/0xec0 net/ipv6/ip6_output.c:116
__ip6_finish_output net/ipv6/ip6_output.c:142 [inline]
__ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127
ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152
NF_HOOK_COND include/linux/netfilter.h:294 [inline]
ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 11183 Comm: syz-executor.2 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Fixes: 4cdf507d5452 ("icmp: add a global rate limitation")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: annotate lockless accesses to qdisc->empty

KCSAN reported the following race [1]

BUG: KCSAN: data-race in __dev_queue_xmit / net_tx_action

read to 0xffff8880ba403508 of 1 bytes by task 21814 on cpu 1:
__dev_xmit_skb net/core/dev.c:3389 [inline]
__dev_queue_xmit+0x9db/0x1b40 net/core/dev.c:3761
dev_queue_xmit+0x21/0x30 net/core/dev.c:3825
neigh_hh_output include/net/neighbour.h:500 [inline]
neigh_output include/net/neighbour.h:509 [inline]
ip6_finish_output2+0x873/0xec0 net/ipv6/ip6_output.c:116
__ip6_finish_output net/ipv6/ip6_output.c:142 [inline]
__ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127
ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152
NF_HOOK_COND include/linux/netfilter.h:294 [inline]
ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175
dst_output include/net/dst.h:436 [inline]
ip6_local_out+0x74/0x90 net/ipv6/output_core.c:179
ip6_send_skb+0x53/0x110 net/ipv6/ip6_output.c:1795
udp_v6_send_skb.isra.0+0x3ec/0xa70 net/ipv6/udp.c:1173
udpv6_sendmsg+0x1906/0x1c20 net/ipv6/udp.c:1471
inet6_sendmsg+0x6d/0x90 net/ipv6/af_inet6.c:576
sock_sendmsg_nosec net/socket.c:637 [inline]
sock_sendmsg+0x9f/0xc0 net/socket.c:657
___sys_sendmsg+0x2b7/0x5d0 net/socket.c:2311
__sys_sendmmsg+0x123/0x350 net/socket.c:2413
__do_sys_sendmmsg net/socket.c:2442 [inline]
__se_sys_sendmmsg net/socket.c:2439 [inline]
__x64_sys_sendmmsg+0x64/0x80 net/socket.c:2439
do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x44/0xa9

write to 0xffff8880ba403508 of 1 bytes by interrupt on cpu 0:
qdisc_run_begin include/net/sch_generic.h:160 [inline]
qdisc_run include/net/pkt_sched.h:120 [inline]
net_tx_action+0x2b1/0x6c0 net/core/dev.c:4551
__do_softirq+0x115/0x33f kernel/softirq.c:292
do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1082
do_softirq.part.0+0x6b/0x80 kernel/softirq.c:337
do_softirq kernel/softirq.c:329 [inline]
__local_bh_enable_ip+0x76/0x80 kernel/softirq.c:189
local_bh_enable include/linux/bottom_half.h:32 [inline]
rcu_read_unlock_bh include/linux/rcupdate.h:688 [inline]
ip6_finish_output2+0x7bb/0xec0 net/ipv6/ip6_output.c:117
__ip6_finish_output net/ipv6/ip6_output.c:142 [inline]
__ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127
ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152
NF_HOOK_COND include/linux/netfilter.h:294 [inline]
ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175
dst_output include/net/dst.h:436 [inline]
ip6_local_out+0x74/0x90 net/ipv6/output_core.c:179
ip6_send_skb+0x53/0x110 net/ipv6/ip6_output.c:1795
udp_v6_send_skb.isra.0+0x3ec/0xa70 net/ipv6/udp.c:1173
udpv6_sendmsg+0x1906/0x1c20 net/ipv6/udp.c:1471
inet6_sendmsg+0x6d/0x90 net/ipv6/af_inet6.c:576
sock_sendmsg_nosec net/socket.c:637 [inline]
sock_sendmsg+0x9f/0xc0 net/socket.c:657
___sys_sendmsg+0x2b7/0x5d0 net/socket.c:2311
__sys_sendmmsg+0x123/0x350 net/socket.c:2413
__do_sys_sendmmsg net/socket.c:2442 [inline]
__se_sys_sendmmsg net/socket.c:2439 [inline]
__x64_sys_sendmmsg+0x64/0x80 net/socket.c:2439
do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x44/0xa9

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 21817 Comm: syz-executor.2 Not tainted 5.4.0-rc6+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Fixes: d518d2ed8640 ("net/sched: fix race between deactivation and dequeue for NOLOCK qdisc")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ice: print opcode when printing controlq errors

To help aid in debugging, display the command opcode in debug messages
that print an error code. This makes it easier to see what command
failed if only ICE_DBG_AQ_MSG is enabled.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ice: use more accurate ICE_DBG mask types

ice_debug_cq is passed a mask which is always ICE_DBG_AQ_CMD. Modify this
function, removing the mask parameter entirely, and directly use the more
appropriate ICE_DBG_AQ_DESC and ICE_DBG_AQ_DESC_BUF.

The function is only called from ice_controlq.c, and has no
other callers outside of that file. Move it and mark it static to avoid
namespace pollution.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ice: Introduce and use ice_vsi_type_str

ice_vsi_type_str converts an ice_vsi_type enum value to its string
equivalent. This is expected to help easily identify VSI types from
module print statements.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ice: remove unnecessary conditional check

There is no reason to do this conditional check before the assignment so
simply remove it.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ice: Update enum ice_flg64_bits to current specification

Currently the VLAN ice_flg64_bits are off by 1. Fix this by
setting the ICE_FLG_EVLAN_x8100 flag to 14, which also updates
ICE_FLG_EVLAN_x9100 to 15 and ICE_FLG_VLAN_x8100 to 16.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ice: delay less

Shorten the delay for SQ responses, but increase the number of loops.
Max delay time is unchanged, but some operations complete much more
quickly.

In the process, add a new define to make the delay count and delay time
more explicit. Add comments to make things more explicit.

This fixes a problem with VF resets failing on with many VFs.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ice: use pkg_dwnld_status instead of sq_last_status

Since the return value from the Download Package AQ command is stored in
hw->pkg_dwnld_status, use that instead of sq_last_status since that may
have the return value from some other AQ command leading to unexpected
results.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ice: Change max MSI-x vector_id check in cfg_irq_map

Currently we check to make sure the vector_id passed down from iavf
is less than or equal to pf->hw.func_caps.common_caps.num_msix_vectors.
This is incorrect because the vector_id is always 0-based and never
greater than or equal to the ICE_MAX_INTR_PER_VF. Fix this by checking
to make sure the vector_id is less than the max allowed interrupts per
VF (ICE_MAX_INTR_PER_VF).

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ice: Check if VF is disabled for Opcode and other operations

This patch adds code to check if PF or VF is disabled before honoring
mailbox message to configure VF - If it is disabled, and opcode is for
resetting VF, the PF driver simply tell VF that all is set. In addition,
if reset is ongoing, and Admin intend to configure VF on the host, we can
poll the VF enabling bit to make sure it is ready before continue - If
after ~250 milliseconds, VF is not in active state, we can bail out with
invalid error.

Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ice: configure software LLDP in ice_init_pf_dcb

Move software LLDP configuration when FW DCBX is disabled to
ice_init_pf_dcb, since that is where the FW DCBX state is determined.
Remove this software LLDP configuration from ice_vsi_setup and
ice_set_priv_flags. Software configuration includes redirecting Rx LLDP
packets up the stack, when FW DCBX is not running.

Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ice: Fix to change Rx/Tx ring descriptor size via ethtool with DCBx

This patch fixes the call trace caused by the kernel when the Rx/Tx
descriptor size change request is initiated via ethtool when DCB is
configured. ice_set_ringparam() should use vsi->num_txq instead of
vsi->alloc_txq as it represents the queues that are enabled in the
driver when DCB is enabled/disabled. Otherwise, queue index being
used can go out of range.

For example, when vsi->alloc_txq has 104 queues and with 3 TCS enabled
via DCB, each TC gets 34 queues, vsi->num_txq will be 102 and only 102
queues will be enabled.

Signed-off-by: Usha Ketineni <usha.k.ketineni@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ice: avoid setting features during reset

Certain subsystems behave very badly when called during reset (core
dump). This patch returns -EBUSY when reconfiguring some subsystems
during reset. With this patch some ethtool functions will not core
dump during reset.

Signed-off-by: Henry Tieman <henry.w.tieman@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ice: Implement DCBNL support

Implement interface layer for the DCBNL subsystem. These are the functions
to support the callbacks defined in the dcbnl_rtnl_ops struct. These
callbacks are going to be used to interface with the DCB settings of the
device. Implementation of dcb_nl set functions and supporting SW DCB
functions.

Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ice: Add NDO callback to set the maximum per-queue bitrate

Allow for rate limiting Tx queues. Bitrate is set in
Mbps(megabits per second).

Mbps max-rate is set for the queue via sysfs:
/sys/class/net/<iface>/queues/tx-<queue>/tx_maxrate
ex: echo 100 >/sys/class/net/ens7/queues/tx-0/tx_maxrate
echo 200 >/sys/class/net/ens7/queues/tx-1/tx_maxrate
Note: A value of zero for tx_maxrate means disabled,
default is disabled.

Signed-off-by: Usha Ketineni <usha.k.ketineni@intel.com>
Co-developed-by: Tarun Singh <tarun.k.singh@intel.com>
Signed-off-by: Tarun Singh <tarun.k.singh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ice: Use ice_ena_vsi and ice_dis_vsi in DCB configuration flow

DCB configuration flow needs to disable and enable only the PF (main)
VSI, so use ice_ena_vsi and ice_dis_vsi. To avoid the use of ifdef to
control the staticness of these functions, move them to ice_lib.c.

Also replace the allocate and copy of old_cfg to kmemdup() in
ice_pf_dcb_cfg().

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

cxgb4: fix 64-bit division on i386

Fix following compile error on i386 architecture.

ERROR: "__udivdi3" [drivers/net/ethernet/chelsio/cxgb4/cxgb4.ko] undefined!

Fixes: 0e395b3cb1fb ("cxgb4: add FLOWC based QoS offload")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'mac80211-next-for-net-next-2019-11-08' of git://git./linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
Some relatively small changes:
* typo fixes in docs
* APIs for station separation using VLAN tags rather
than separate wifi netdevs
* some preparations for upcoming features (802.3 offload
and airtime queue limits (AQL)
* stack reduction in ieee80211_assoc_success()
* use DEFINE_DEBUGFS_ATTRIBUTE in hwsim
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Use match_string() helper to simplify the code

match_string() returns the array index of a matching string.
Use it instead of the open-coded implementation.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: stmmac: Add support for syscfg clock

Add optional support for syscfg clock in dwmac-stm32.c
Now Syscfg clock is activated automatically when syscfg
registers are used

Signed-off-by: Christophe Roullier <christophe.roullier@st.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

cfg80211: VLAN offload support for set_key and set_sta_vlan

This provides an alternative mechanism for AP VLAN support where a
single netdev is used with VLAN tagged frames instead of separate
netdevs for each VLAN without tagged frames from the WLAN driver.

By setting NL80211_EXT_FEATURE_VLAN_OFFLOAD flag the driver indicates
support for a single netdev with VLAN tagged frames. Separate
VLAN-specific netdevs can be added using RTM_NEWLINK/IFLA_VLAN_ID
similarly to Ethernet. NL80211_CMD_NEW_KEY (for group keys),
NL80211_CMD_NEW_STATION, and NL80211_CMD_SET_STATION will optionally
specify vlan_id using NL80211_ATTR_VLAN_ID.

Signed-off-by: Gurumoorthi Gnanasambandhan <gguru@codeaurora.org>
Signed-off-by: Jouni Malinen <jouni@codeaurora.org>
Link: https://lore.kernel.org/r/20191031214640.5012-1-jouni@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: Shrink the size of ack_frame_id to make room for tx_time_est

To implement airtime queue limiting, we need to keep a running account of
the estimated airtime of all skbs queued into the device. Do to this
correctly, we need to store the airtime estimate into the skb so we can
decrease the outstanding balance when the skb is freed. This means that the
time estimate must be stored somewhere that will survive for the lifetime
of the skb.

To get this, decrease the size of the ack_frame_id field to 6 bits, and
lower the size of the ID space accordingly. This leaves 10 bits for use for
tx_time_est, which is enough to store a maximum of 4096 us, if we shift the
values so they become units of 4us.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/157182474063.150713.16132669599100802716.stgit@toke.dk
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: don't re-parse elems in ieee80211_assoc_success()

We've already parsed the same data in the caller, so we can
pass it. The only thing is that we might fill in more details
in ieee80211_assoc_success(), but that doesn't bother the
caller, so it's fine to do even when we share the parsed data.

This reduces the stack space usage of the call stack here,
Arnd reported it had grown above the 1024 byte warning limit.

Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://lore.kernel.org/r/20191028125240.cb7661671bd2.I757c8752bf4f2f35e54f5e0a2c0a9cd9216c3d8b@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: move store skb ack code to its own function

This patch moves the code handling SKBTX_WIFI_STATUS inside the TX path
into an extra function. This allows us to reuse it inside the 802.11 encap
offloading datapath.

Signed-off-by: John Crispin <john@phrozen.org>
Link: https://lore.kernel.org/r/20191029091304.7330-2-john@phrozen.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211_hwsim: use DEFINE_DEBUGFS_ATTRIBUTE to define debugfs fops

It is more clear to use DEFINE_DEBUGFS_ATTRIBUTE to define debugfs file
operation rather than DEFINE_SIMPLE_ATTRIBUTE.

It is detected with the help of coccinelle.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Link: https://lore.kernel.org/r/1572404462-45462-1-git-send-email-zhongjiang@huawei.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

tipc: eliminate checking netns if node established

Currently, we scan over all network namespaces at each received
discovery message in order to check if the sending peer might be
present in a host local namespaces.

This is unnecessary since we can assume that a peer will not change its
location during an established session.

We now improve the condition for this testing so that we don't perform
any redundant scans.

Fixes: f73b12812a3d ("tipc: improve throughput between nodes in netns")
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add a READ_ONCE() in skb_peek_tail()

skb_peek_tail() can be used without protection of a lock,
as spotted by KCSAN [1]

In order to avoid load-stearing, add a READ_ONCE()

Note that the corresponding WRITE_ONCE() are already there.

[1]
BUG: KCSAN: data-race in sk_wait_data / skb_queue_tail

read to 0xffff8880b36a4118 of 8 bytes by task 20426 on cpu 1:
skb_peek_tail include/linux/skbuff.h:1784 [inline]
sk_wait_data+0x15b/0x250 net/core/sock.c:2477
kcm_wait_data+0x112/0x1f0 net/kcm/kcmsock.c:1103
kcm_recvmsg+0xac/0x320 net/kcm/kcmsock.c:1130
sock_recvmsg_nosec net/socket.c:871 [inline]
sock_recvmsg net/socket.c:889 [inline]
sock_recvmsg+0x92/0xb0 net/socket.c:885
___sys_recvmsg+0x1a0/0x3e0 net/socket.c:2480
do_recvmmsg+0x19a/0x5c0 net/socket.c:2601
__sys_recvmmsg+0x1ef/0x200 net/socket.c:2680
__do_sys_recvmmsg net/socket.c:2703 [inline]
__se_sys_recvmmsg net/socket.c:2696 [inline]
__x64_sys_recvmmsg+0x89/0xb0 net/socket.c:2696
do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x44/0xa9

write to 0xffff8880b36a4118 of 8 bytes by task 451 on cpu 0:
__skb_insert include/linux/skbuff.h:1852 [inline]
__skb_queue_before include/linux/skbuff.h:1958 [inline]
__skb_queue_tail include/linux/skbuff.h:1991 [inline]
skb_queue_tail+0x7e/0xc0 net/core/skbuff.c:3145
kcm_queue_rcv_skb+0x202/0x310 net/kcm/kcmsock.c:206
kcm_rcv_strparser+0x74/0x4b0 net/kcm/kcmsock.c:370
__strp_recv+0x348/0xf50 net/strparser/strparser.c:309
strp_recv+0x84/0xa0 net/strparser/strparser.c:343
tcp_read_sock+0x174/0x5c0 net/ipv4/tcp.c:1639
strp_read_sock+0xd4/0x140 net/strparser/strparser.c:366
do_strp_work net/strparser/strparser.c:414 [inline]
strp_work+0x9a/0xe0 net/strparser/strparser.c:423
process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
worker_thread+0xa0/0x800 kernel/workqueue.c:2415
kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 451 Comm: kworker/u4:3 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: kstrp strp_work

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add annotations on hh->hh_len lockless accesses

KCSAN reported a data-race [1]

While we can use READ_ONCE() on the read sides,
we need to make sure hh->hh_len is written last.

[1]

BUG: KCSAN: data-race in eth_header_cache / neigh_resolve_output

write to 0xffff8880b9dedcb8 of 4 bytes by task 29760 on cpu 0:
eth_header_cache+0xa9/0xd0 net/ethernet/eth.c:247
neigh_hh_init net/core/neighbour.c:1463 [inline]
neigh_resolve_output net/core/neighbour.c:1480 [inline]
neigh_resolve_output+0x415/0x470 net/core/neighbour.c:1470
neigh_output include/net/neighbour.h:511 [inline]
ip6_finish_output2+0x7a2/0xec0 net/ipv6/ip6_output.c:116
__ip6_finish_output net/ipv6/ip6_output.c:142 [inline]
__ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127
ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152
NF_HOOK_COND include/linux/netfilter.h:294 [inline]
ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175
dst_output include/net/dst.h:436 [inline]
NF_HOOK include/linux/netfilter.h:305 [inline]
ndisc_send_skb+0x459/0x5f0 net/ipv6/ndisc.c:505
ndisc_send_ns+0x207/0x430 net/ipv6/ndisc.c:647
rt6_probe_deferred+0x98/0xf0 net/ipv6/route.c:615
process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
worker_thread+0xa0/0x800 kernel/workqueue.c:2415
kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352

read to 0xffff8880b9dedcb8 of 4 bytes by task 29572 on cpu 1:
neigh_resolve_output net/core/neighbour.c:1479 [inline]
neigh_resolve_output+0x113/0x470 net/core/neighbour.c:1470
neigh_output include/net/neighbour.h:511 [inline]
ip6_finish_output2+0x7a2/0xec0 net/ipv6/ip6_output.c:116
__ip6_finish_output net/ipv6/ip6_output.c:142 [inline]
__ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127
ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152
NF_HOOK_COND include/linux/netfilter.h:294 [inline]
ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175
dst_output include/net/dst.h:436 [inline]
NF_HOOK include/linux/netfilter.h:305 [inline]
ndisc_send_skb+0x459/0x5f0 net/ipv6/ndisc.c:505
ndisc_send_ns+0x207/0x430 net/ipv6/ndisc.c:647
rt6_probe_deferred+0x98/0xf0 net/ipv6/route.c:615
process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
worker_thread+0xa0/0x800 kernel/workqueue.c:2415
kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 29572 Comm: kworker/1:4 Not tainted 5.4.0-rc6+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events rt6_probe_deferred

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'u64_stats_t'

Eric Dumazet says:

====================
net: introduce u64_stats_t

KCSAN found a data-race in per-cpu u64 stats accounting.

(The stack traces are included in the 8th patch :
tun: switch to u64_stats_t)

This patch series first consolidate code in five patches.
Then the last three patches address the data-race resolution.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: use u64_stats_t in struct pcpu_lstats

In order to fix the data-race found by KCSAN, we
can use the new u64_stats_t type and its accessors instead
of plain u64 fields. This will still generate optimal code
for both 32 and 64 bit platforms.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tun: switch to u64_stats_t

In order to fix this data-race found by KCSAN [1],
switch to u64_stats_t helpers. They provide all
the needed annotations, without adding extra cost.

[1]
BUG: KCSAN: data-race in tun_get_user / tun_net_get_stats64

read to 0xffffe8ffffd8aca8 of 8 bytes by task 4882 on cpu 0:
tun_net_get_stats64+0x9b/0x230 drivers/net/tun.c:1171
dev_get_stats+0x89/0x1e0 net/core/dev.c:9103
rtnl_fill_stats+0x56/0x370 net/core/rtnetlink.c:1177
rtnl_fill_ifinfo+0xd3b/0x2100 net/core/rtnetlink.c:1667
rtmsg_ifinfo_build_skb+0xb0/0x150 net/core/rtnetlink.c:3472
rtmsg_ifinfo_event.part.0+0x4e/0xb0 net/core/rtnetlink.c:3504
rtmsg_ifinfo_event net/core/rtnetlink.c:3515 [inline]
rtmsg_ifinfo+0x85/0x90 net/core/rtnetlink.c:3513
__dev_notify_flags+0x18b/0x200 net/core/dev.c:7649
dev_change_flags+0xb8/0xe0 net/core/dev.c:7691
dev_ifsioc+0x201/0x6a0 net/core/dev_ioctl.c:237
dev_ioctl+0x149/0x660 net/core/dev_ioctl.c:489
sock_do_ioctl+0xdb/0x230 net/socket.c:1061
sock_ioctl+0x3a3/0x5e0 net/socket.c:1189
vfs_ioctl fs/ioctl.c:46 [inline]
file_ioctl fs/ioctl.c:509 [inline]
do_vfs_ioctl+0x991/0xc60 fs/ioctl.c:696

write to 0xffffe8ffffd8aca8 of 8 bytes by task 4883 on cpu 1:
tun_get_user+0x1d94/0x2ba0 drivers/net/tun.c:2002
tun_chr_write_iter+0x79/0xd0 drivers/net/tun.c:2022
call_write_iter include/linux/fs.h:1895 [inline]
new_sync_write+0x388/0x4a0 fs/read_write.c:483
__vfs_write+0xb1/0xc0 fs/read_write.c:496
__kernel_write+0xb8/0x240 fs/read_write.c:515
write_pipe_buf+0xb6/0xf0 fs/splice.c:794
splice_from_pipe_feed fs/splice.c:500 [inline]
__splice_from_pipe+0x248/0x480 fs/splice.c:624
splice_from_pipe+0xbb/0x100 fs/splice.c:659
default_file_splice_write+0x45/0x90 fs/splice.c:806
do_splice_from fs/splice.c:848 [inline]
direct_splice_actor+0xa0/0xc0 fs/splice.c:1020
splice_direct_to_actor+0x215/0x510 fs/splice.c:975
do_splice_direct+0x161/0x1e0 fs/splice.c:1063
do_sendfile+0x384/0x7f0 fs/read_write.c:1464

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 4883 Comm: syz-executor.1 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

u64_stats: provide u64_stats_t type

On 64bit arches, struct u64_stats_sync is empty and provides
no help against load/store tearing.

Using READ_ONCE()/WRITE_ONCE() would be needed.

But the update side would be slightly more expensive.

local64_t was defined so that we could use regular adds
in a manner which is atomic wrt IRQs.

However the u64_stats infra means we do not have to use
local64_t on 32bit arches since the syncp provides the needed
protection.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dummy: use standard dev_lstats_add() and dev_lstats_read()

This driver can simply use the common infrastructure instead
of duplicating it.

This cleanup will ease u64_stats_t adoption in a single location.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vsockmon: use standard dev_lstats_add() and dev_lstats_read()

This cleanup will ease u64_stats_t adoption in a single location.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

veth: use standard dev_lstats_add() and dev_lstats_read()

This cleanup will ease u64_stats_t adoption in a single location.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: nlmon: use standard dev_lstats_add() and dev_lstats_read()

No need to hand-code the exact same functions.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: provide dev_lstats_add() helper

Many network drivers need it and hand-coded the same function.

In order to ease u64_stats_t adoption, it is time to factorize.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: provide dev_lstats_read() helper

Many network drivers use hand-coded implementation of the same thing,
let's factorize things so that u64_stats_t adoption is done once.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-Demote-MTU-change-prints-to-debug'

Florian Fainelli says:

====================
net: Demote MTU change prints to debug

This patch series demotes several drivers that printed MTU change and
could therefore spam the kernel console if one has a test that it's all
about testing the values. Intel drivers were not also particularly
consistent in how they printed the same message, so now they are.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: qcom/emac: Demote MTU change print to debug

Changing the MTU can be a frequent operation and it is already clear
when (or not) a MTU change is successful, demote prints to debug prints.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Timur Tabi <timur@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: intel: Demote MTU change prints to debug

Changing a network device MTU can be a fairly frequent operation, and
failure to change the MTU is reflected to user-space properly, both by
an appropriate message as well as by looking at whether the device's MTU
matches the configuration.

Demote the prints to debug prints by using netdev_dbg(), making all
Intel wired LAN drivers consistent, since they used a mixture of PCI
device and network device prints before.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: ti: cpts: use ktime_get_real_ns helper

Update on more short variant for getting real clock in ns.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'aquantia-next'

Igor Russkikh says:

====================
Aquantia Marvell atlantic driver updates 11-2019

Here is a bunch of atlantic driver new features and updates.

Shortlist:
- Me adding ethtool private flags for various loopback test modes,
- Nikita is doing some work here on power management, implementing new PM API,
He also did some checkpatch style cleanup of older driver parts.
- I'm also adding a new UDP GSO offload support and flags for loopback activation
- We are now Marvell, so I am changing email addresses on maintainers list.

v2: styling, ip6 correct handling in udpgso
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: atlantic: change email domains to Marvell

Aquantia is now part of Marvell, eventually we'll cease standalone
aquantia.com domain. Thus, change the maintainers file and some other
references to @marvell.com domain

Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: atlantic: implement UDP GSO offload

atlantic hardware does support UDP hardware segmentation offload.
This allows user to specify one large contiguous buffer with data
which then will be split automagically into multiple UDP packets
of specified size.

Bulk sending of large UDP streams lowers CPU usage and increases
bandwidth.

We did estimations both with udpgso_bench_tx test tool and with modified
iperf3 measurement tool (4 streams, multithread, 200b packet size)
over AQC<->AQC 10G link. Flow control is disabled to prevent RX side
impact on measurements.

No UDP GSO:
iperf3 -c 10.0.1.2 -u -b0 -l 200 -P4 --multithread
UDP GSO:
iperf3 -c 10.0.1.2 -u -b0 -l 12600 --udp-lso 200 -P4 --multithread

Mode          CPU   iperf speed    Line speed   Packets per second
-------------------------------------------------------------
NO UDP GSO    350%   3.07 Gbps      3.8 Gbps     1,919,419
SW UDP GSO    200%   5.55 Gbps      6.4 Gbps     3,286,144
HW UDP GSO    90%    6.80 Gbps      8.4 Gbps     4,273,117

Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: atlantic: update flow control logic

We now differentiate requested and negotiated flow control
modes. Therefore `ethtool -A` now operates on local requested
FC values, and regular link settings shows the negotiated FC
settings.

Signed-off-by: Nikita Danilov <ndanilov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: atlantic: stylistic renames

We are trying to follow the naming of the chip (atlantic), not
company. So replace some old namings.

Signed-off-by: Nikita Danilov <ndanilov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: atlantic: code style cleanup

Thats a pure checkpatck walkthrough the code with no functional
changes. Reverse christmas tree, spacing, etc.

Signed-off-by: Nikita Danilov <ndanilov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: atlantic: loopback tests via private flags

Here we add a number of ethtool private flags
to allow enabling various loopbacks on HW.

Thats useful for verification and bringup works.

Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: atlantic: add fw configuration memory area

Device FW has a separate memory area where various
config fields are stored and could be used by the
driver.

Here we modify download/upload infrastructure to
allow accessing this area.

Lateron this will be used to configure various behaviours

Signed-off-by: Nikita Danilov <ndanilov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: atlantic: adding ethtool physical identification

`ethtool -p eth0` will blink leds helping identify
physical port.

Signed-off-by: Nikita Danilov <ndanilov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: atlantic: add msglevel configuration

We add ethtool msglevel configuration and change some
printouts to use netdev_info set of functions.

Signed-off-by: Nikita Danilov <ndanilov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: atlantic: refactoring pm logic

We now implement .driver.pm callbacks, these
allows driver to work correctly in hibernate
usecases, especially when used in conjunction with
WOL feature.

Before that driver only reacted to legacy .suspend/.resume
callbacks, that was a limitation in some cases.

Signed-off-by: Nikita Danilov <ndanilov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: atlantic: implement wake_phy feature

Wake on PHY allows to configure device to wakeup host
as soon as PHY link status is changed to active.

Signed-off-by: Nikita Danilov <ndanilov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: atlantic: update firmware interface

Here we improve FW interface structures layout
and prepare these for the wake phy feature implementation.

Signed-off-by: Nikita Danilov <ndanilov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlxsw-Add-layer-3-devlink-trap-support'

Ido Schimmel says:

====================
mlxsw: Add layer 3 devlink-trap support

This patch set from Amit adds support in mlxsw for layer 3 traps that
can report drops and exceptions via devlink-trap.

In a similar fashion to the existing layer 2 traps, these traps can send
packets to the CPU that were not routed as intended by the underlying
device.

The traps are divided between the two types detailed in devlink-trap
documentation: drops and exceptions. Unlike drops, packets received via
exception traps are also injected to the kernel's receive path, as they
are required for the correct functioning of the control plane. For
example, packets trapped due to TTL error must be injected to kernel's
receive path for traceroute to work properly.

Patch set overview:

Patch #1 adds the layer 3 drop traps to devlink along with their
documentation.

Patch #2 adds support for layer 3 drop traps in mlxsw.

Patches #3-#5 add selftests for layer 3 drop traps.

Patch #6 adds the layer 3 exception traps to devlink along with their
documentation.

Patches #7-#9 gradually add support for layer 3 exception traps in
mlxsw.

Patches #10-#12 add selftests for layer 3 exception traps.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: mlxsw: Add test cases for devlink-trap layer 3 exceptions

Test that each supported packet trap exception is triggered under the
right conditions.

Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: tc_common: Add hitting check

Add an option to check that packets hit the tc filter without providing
the exact number of packets that should hit it.

It is useful while sending many packets in background and checking that
at least one of them hit the tc filter.

Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: devlink: Add functionality for trap exceptions test

Add common part of all the tests - check devlink status to ensure that
packets were trapped.

Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: Add layer 3 devlink-trap exceptions support

Add the trap IDs used to report layer 3 exceptions.

Trapped packets are first reported to devlink and then injected to the
kernel's receive path. All the packets have 'offload_fwd_mark' set in
order to prevent them from potentially being forwarded by the bridge
again.

Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: Add specific trap for packets routed via invalid nexthops

Currently, mlxsw does not differentiate between these two cases of
routes with invalid nexthops:

1. Nexthops whose nexthop device is a mlxsw upper (has a RIF), but whose
neighbour could not be resolved

2. Nexthops whose nexthop device is not a mlxsw upper (e.g., management
interface)

Up until now this did not matter and mlxsw trapped packets for both
cases using the same trap ID. However, packets that should have been
routed in hardware (case 1), but incurred a problem are considered
exceptions and should be reported to the user. The two cases should
therefore be split between two different trap IDs.

Allocate a new adjacency entry during initialization and upon the
insertion of the first route with an invalid mlxsw nexthop, program this
entry to discard packets. Packets hitting this entry will be reported
using new trap ID - "DISCARD_ROUTER3".

In the future, the entry could be written during initialization, but
currently firmware requires a valid RIF, which is not available at this
stage.

Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: Add new FIB entry type for reject routes

Currently, packets that cannot be routed in hardware (e.g., nexthop
device is not upper of mlxsw), are trapped to the kernel for forwarding.
Such packets are trapped using "RTR_INGRESS0" trap. This trap also traps
packets that hit reject routes (e.g., "unreachable") so that the kernel
will generate the appropriate ICMP error message for them.

Subsequent patch will need to only report to devlink packets that hit a
reject route, which is impossible as long as "RTR_INGRESS0" is
overloaded like that.

Solve this by using "RTR_INGRESS1" trap for packets that hit reject
routes.

Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: Add layer 3 generic packet exception traps

Add layer 3 generic packet exception traps that can report trapped
packets and documentation of the traps.

Unlike drop traps, these exception traps also need to inject the packet
to the kernel's receive path. For example, a packet that was trapped due
to unreachable neighbour need to be injected into the kernel so that it
will trigger an ARP request or a neighbour solicitation message.

Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: mlxsw: Add test cases for devlink-trap layer 3 drops

Test that each supported packet trap is triggered under the right
conditions and that packets are indeed dropped and not forwarded.

Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: devlink: Make devlink_trap_cleanup() more generic

Add proto parameter in order to enable the use of devlink_trap_cleanup()
in tests that use IPv6 protocol.

Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: devlink: Export functions to devlink library

l2_drops_test() is used to check that drop traps are functioning as
intended. Currently it is only used in the layer 2 test, but it is also
useful for the layer 3 test introduced in the subsequent patch.

l2_drops_cleanup() is used to clean configurations and kill mausezahn
proccess.

Export the functions to the common devlink library to allow it to be
re-used by future tests.

Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: Add layer 3 devlink-trap support

Add the trap IDs and trap group used to report layer 3 drops. Register
layer 3 packet traps and associated layer 3 trap group with devlink
during driver initialization.

Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: Add layer 3 generic packet traps

Add packet traps that can report packets that were dropped during layer
3 forwarding.

Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

enetc: fix return value for enetc_ioctl()

Return -EOPNOTSUPP instead of -EINVAL if the requested ioctl is not
implemented.

Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Remove one extra ktime_get_ns() from cookie_init_timestamp

tcp_make_synack() already uses tcp_clock_ns(), and can pass
the value to cookie_init_timestamp() to avoid another call
to ktime_get_ns() helper.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: Add source route tests to fib_tests

Add tests to verify routes with source address set are deleted when
source address is deleted.

Signed-off-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

inetpeer: fix data-race in inet_putpeer / inet_putpeer

We need to explicitely forbid read/store tearing in inet_peer_gc()
and inet_putpeer().

The following syzbot report reminds us about inet_putpeer()
running without a lock held.

BUG: KCSAN: data-race in inet_putpeer / inet_putpeer

write to 0xffff888121fb2ed0 of 4 bytes by interrupt on cpu 0:
inet_putpeer+0x37/0xa0 net/ipv4/inetpeer.c:240
ip4_frag_free+0x3d/0x50 net/ipv4/ip_fragment.c:102
inet_frag_destroy_rcu+0x58/0x80 net/ipv4/inet_fragment.c:228
__rcu_reclaim kernel/rcu/rcu.h:222 [inline]
rcu_do_batch+0x256/0x5b0 kernel/rcu/tree.c:2157
rcu_core+0x369/0x4d0 kernel/rcu/tree.c:2377
rcu_core_si+0x12/0x20 kernel/rcu/tree.c:2386
__do_softirq+0x115/0x33f kernel/softirq.c:292
invoke_softirq kernel/softirq.c:373 [inline]
irq_exit+0xbb/0xe0 kernel/softirq.c:413
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0xe6/0x280 arch/x86/kernel/apic/apic.c:1137
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:830
native_safe_halt+0xe/0x10 arch/x86/kernel/paravirt.c:71
arch_cpu_idle+0x1f/0x30 arch/x86/kernel/process.c:571
default_idle_call+0x1e/0x40 kernel/sched/idle.c:94
cpuidle_idle_call kernel/sched/idle.c:154 [inline]
do_idle+0x1af/0x280 kernel/sched/idle.c:263

write to 0xffff888121fb2ed0 of 4 bytes by interrupt on cpu 1:
inet_putpeer+0x37/0xa0 net/ipv4/inetpeer.c:240
ip4_frag_free+0x3d/0x50 net/ipv4/ip_fragment.c:102
inet_frag_destroy_rcu+0x58/0x80 net/ipv4/inet_fragment.c:228
__rcu_reclaim kernel/rcu/rcu.h:222 [inline]
rcu_do_batch+0x256/0x5b0 kernel/rcu/tree.c:2157
rcu_core+0x369/0x4d0 kernel/rcu/tree.c:2377
rcu_core_si+0x12/0x20 kernel/rcu/tree.c:2386
__do_softirq+0x115/0x33f kernel/softirq.c:292
run_ksoftirqd+0x46/0x60 kernel/softirq.c:603
smpboot_thread_fn+0x37d/0x4a0 kernel/smpboot.c:165
kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Fixes: 4b9d9be839fd ("inetpeer: remove unused list")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: at803x: add missing dependency on CONFIG_REGULATOR

Compilation fails on PPC targets as CONFIG_REGULATOR is not set and
drivers/regulator/devres.c is not compiled in while functions exported
there are used by drivers/net/phy/at803x.c. Here's the error log:

LD .tmp_vmlinux1
drivers/net/phy/at803x.o: In function `at803x_rgmii_reg_set_voltage_sel':
drivers/net/phy/at803x.c:294: undefined reference to `.rdev_get_drvdata'
drivers/net/phy/at803x.o: In function `at803x_rgmii_reg_get_voltage_sel':
drivers/net/phy/at803x.c:306: undefined reference to `.rdev_get_drvdata'
drivers/net/phy/at803x.o: In function `at8031_register_regulators':
drivers/net/phy/at803x.c:359: undefined reference to `.devm_regulator_register'
drivers/net/phy/at803x.c:365: undefined reference to `.devm_regulator_register'
drivers/net/phy/at803x.o:(.data.rel+0x0): undefined reference to `regulator_list_voltage_table'
linux/Makefile:1074: recipe for target 'vmlinux' failed
make[1]: *** [vmlinux] Error 1

Fixes: 2f664823a470 ("net: phy: at803x: add device tree binding")
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dpaa2-eth: add ethtool MAC counters

When a DPNI is connected to a MAC, export its associated counters.
Ethtool related functions are added in dpaa2_mac for returning the
number of counters, their strings and also their values.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

enetc: ethtool: add wake-on-lan callbacks

If there is an external PHY, pass the wake-on-lan request to the PHY.

Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

enetc: add ioctl() support for PHY-related ops

If there is an attached PHY try to handle the requested ioctl with its
handler, which allows the userspace to access PHY registers, for
example. This will make mii-diag and similar tools work.

Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: Fix error return code in mlxsw_sp_port_module_info_init()

Fix to return negative error code -ENOMEM from the error handling
case instead of 0, as done elsewhere in this function.

Fixes: 4a7f970f1240 ("mlxsw: spectrum: Replace port_to_module array with array of structs")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'cxgb4-add-support-for-TC-MQPRIO-Qdisc-Offload'

Rahul Lakkireddy says:

====================
cxgb4: add support for TC-MQPRIO Qdisc Offload

This series of patches add support for offloading TC-MQPRIO Qdisc
to Chelsio T5/T6 NICs. Offloading QoS traffic shaping and pacing
requires using Ethernet Offload (ETHOFLD) resources available on
Chelsio NICs. The ETHOFLD resources are configured by firmware
and taken from the resource pool shared with other Chelsio Upper
Layer Drivers. Traffic flowing through ETHOFLD region requires a
software netdev Tx queue (EOSW_TXQ) exposed to networking stack,
and an underlying hardware Tx queue (EOHW_TXQ) used for sending
packets through hardware.

ETHOFLD region is addressed using EOTIDs, which are per-connection
resource. Hence, EOTIDs are capable of storing only a very small
number of packets in flight. To allow more connections to share
the the QoS rate limiting configuration, multiple EOTIDs must be
allocated to reduce packet drops. EOTIDs are 1-to-1 mapped with
software EOSW_TXQ. Several software EOSW_TXQs can post packets to
a single hardware EOHW_TXQ.

The series is broken down as follows:

Patch 1 queries firmware for maximum available traffic classes,
as well as, start and maximum available indices (EOTID) into ETHOFLD
region, supported by the underlying device.

Patch 2 reworks queue configuration and simplifies MSI-X allocation
logic in preparation for ETHOFLD queues support.

Patch 3 adds skeleton for validating and configuring TC-MQPRIO Qdisc
offload. Also, adds support for software EOSW_TXQs and exposes them
to network stack. Updates Tx queue selection to use fallback NIC Tx
path for unsupported traffic that can't go through ETHOFLD queues.

Patch 4 adds support for managing hardware queues to rate limit
traffic flowing through them. The queues are allocated/removed based
on enabling/disabling TC-MQPRIO Qdisc offload, respectively.

Patch 5 adds Tx path for traffic flowing through software EOSW_TXQ
and EOHW_TXQ. Also, adds Rx path to handle Tx completions.

Patch 6 updates exisiting SCHED API to configure FLOWC based QoS
offload. In the existing QUEUE based rate limiting, multiple queues
sharing a traffic class get the aggreagated max rate limit value.
On the other hand, in FLOWC based rate limiting, multiple queues
sharing a traffic class get their own individual max rate limit
value. For example, if 2 queues are bound to class 0, which is rate
limited to 1 Gbps, then in QUEUE based rate limiting, both the
queues get the aggregate max output of 1 Gbps only. In FLOWC based
rate limiting, each queue gets its own output of max 1 Gbps each;
i.e. 2 queues * 1 Gbps rate limit = 2 Gbps max output.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: add FLOWC based QoS offload

Rework SCHED API to allow offloading TC-MQPRIO QoS configuration.
The existing QUEUE based rate limiting throttles all queues sharing
a traffic class, to the specified max rate limit value. So, if
multiple queues share a traffic class, then all the queues get
the aggregate specified max rate limit.

So, introduce the new FLOWC based rate limiting, where multiple
queues can share a traffic class with each queue getting its own
individual specified max rate limit.

For example, if 2 queues are bound to class 0, which is rate limited
to 1 Gbps, then 2 queues using QUEUE based rate limiting, get the
aggregate output of 1 Gbps only. In FLOWC based rate limiting, each
queue gets its own output of max 1 Gbps each; i.e. 2 queues * 1 Gbps
rate limit = 2 Gbps.

Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: add Tx and Rx path for ETHOFLD traffic

Implement Tx path for traffic flowing through software EOSW_TXQ
and EOHW_TXQ. Since multiple EOSW_TXQ can post packets to a single
EOHW_TXQ, protect the hardware queue with necessary spinlock. Also,
move common code used to generate TSO work request to a common
function.

Implement Rx path to handle Tx completions for successfully
transmitted packets.

Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: add ETHOFLD hardware queue support

Add support for configuring and managing ETHOFLD hardware queues.
Keep the queue count and MSI-X allocation scheme same as NIC queues.
ETHOFLD hardware queues are dynamically allocated/destroyed as
TC-MQPRIO Qdisc offload is enabled/disabled on the corresponding
interface, respectively.

Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: parse and configure TC-MQPRIO offload

Add logic for validation and configuration of TC-MQPRIO Qdisc
offload. Also, add support to manage EOSW_TXQ, which have 1-to-1
mapping with EOTIDs, and expose them to network stack.

Move common skb validation in Tx path to a separate function and
add minimal Tx path for ETHOFLD. Update Tx queue selection to return
normal NIC Txq to send traffic pattern that can't go through ETHOFLD
Tx path.

Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: rework queue config and MSI-X allocation

Simplify queue configuration and MSI-X allocation logic. Use a single
MSI-X information table for both NIC and ULDs. Remove hard-coded
MSI-X indices for firmware event queue and non data interrupts.
Instead, use the MSI-X bitmap to obtain a free MSI-X index
dynamically. Save each Rxq's index into the MSI-X information table,
within the Rxq structures themselves, for easier cleanup.

Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: query firmware for QoS offload resources

QoS offload needs Ethernet Offload (ETHOFLD) resources present in the
NIC. These resources are shared with other ULDs. So, query firmware
for the available number of traffic classes, as well as, start and
end indices (EOTID) of the ETHOFLD region.

Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net_sched: gen_estimator: extend packet counter to 64bit

I forgot to change last_packets field in struct net_rate_estimator.

Without this fix, rate estimators would misbehave after more
than 2^32 packets have been sent.

Another solution would be to be careful and only use the
32 least significant bits of packets counters, but we have
a hole in net_rate_estimator structure and this looks
easier to read/maintain.

Fixes: d0083d98f685 ("net_sched: extend packet counter to 64bit")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dpaa2-ptp: fix compile error

phylink_set_port_modes will be compiled if CONFIG_PHYLINK enabled,
dpaa2_mac_validate will be compiled if CONFIG_FSL_DPAA2_ETH enabled,
it should select CONFIG_PHYLINK when dpaa2_mac_validate call
phylink_set_port_modes

drivers/net/ethernet/freescale/dpaa2/dpaa2-mac.o: In function `dpaa2_mac_validate':
dpaa2-mac.c:(.text+0x3a1): undefined reference to `phylink_set_port_modes'
drivers/net/ethernet/freescale/dpaa2/dpaa2-mac.o: In function `dpaa2_mac_connect':
dpaa2-mac.c:(.text+0x91a): undefined reference to `phylink_create'
dpaa2-mac.c:(.text+0x94e): undefined reference to `phylink_of_phy_connect'
dpaa2-mac.c:(.text+0x97f): undefined reference to `phylink_destroy'
drivers/net/ethernet/freescale/dpaa2/dpaa2-mac.o: In function `dpaa2_mac_disconnect':
dpaa2-mac.c:(.text+0xa9f): undefined reference to `phylink_disconnect_phy'
dpaa2-mac.c:(.text+0xab0): undefined reference to `phylink_destroy'
make: *** [vmlinux] Error 1

Fixes: 719479230893 ("dpaa2-eth: add MAC/PHY support through phylink")
Signed-off-by: Chenwandun <chenwandun@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '100GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
100GbE Intel Wired LAN Driver Updates 2019-11-06

This series contains updates to ice driver only.

Scott adds ethtool -m support so that we can read eeprom data on SFP/OSFP
modules.

Anirudh updates the return value to properly reflect when SRIOV is not
supported.

Md Fahad updates the driver to handle a change in the NVM, where the
boot configuration section was moved to the Preserved Field Area (PFA)
of the NVM.

Paul resolves an issue when DCBx requests non-contiguous TCs, transmit
hangs could occur, so configure a default traffic class (TC0) in these
cases to prevent traffic hangs.  Adds a print statement to notify the
user when unsupported modules are inserted.

Bruce fixes up the driver unload code flow to ensure we do not clear the
interrupt scheme until the reset is complete, otherwise a hardware error
may occur.

Dave updates the DCB initialization to set is_sw_lldp boolean when the
firmware has been detected to be in an untenable state.  This will
ensure that the firmware is in a known state.

Michal saves off the PCI state and I/O BARs address after PCI bus reset
so that after the reset, device registers can be read.  Also adds a NULL
pointer check to prevent a potential kernel panic.

Mitch resolves an issue where VF's on PF's other than 0 were not seeing
resets by using the per-PF VF ID instead of the absolute VF ID.

Krzysztof does some code cleanup to remove a unneeded wrapper and
reduces the code complexity.

Brett reduces confusion by changing the name of ice_vc_dis_vf() to
ice_vc_reset_vf() to better describe what the function is actually
doing.

v2: dropped patch 3 "ice: Add support for FW recovery mode detection"
    from the origin al series, while Ani makes changes based on
    community feedback to implement devlink into the changes.
v3: dropped patch 1 "ice: implement set_eeprom functionality" due to a
    bug found and additional changes will be needed when Ani implements
    devlink in the driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv8e6xxx: Fix stub function parameters

mv88e6xxx_g2_atu_stats_get() takes two parameters. Make the stub
function also take two, otherwise we get compile errors.

Fixes: c5f299d59261 ("net: dsa: mv88e6xxx: global1_atu: Add helper for get next")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-phy-at803x-device-tree-binding'

Michael Walle says:

====================
net: phy: at803x device tree binding

Adds a device tree binding to configure the clock and the RGMII voltage.

Changes since v1:
- rebased to latest net-next
- renamed "Atheros" to "Qualcomm Atheros"
- add a new patch to remove config_init() from AR9331

Changes since the RFC:
- renamed the Kconfig entry to "Qualcomm Atheros.." and reordered the
   item
- renamed the prefix from atheros to qca
- use the correct name AR803x (instead of AT803x) in new files and
   dt-bindings.
- listed the PHY maintainers in the new schema. Hopefully, thats ok.
- fixed a typo in the bindings schema
- run dtb_checks and dt_binding_check and fixed the schema
- dropped the rgmii-io-1v8 property; instead provide two regulators vddh
   and vddio, add one consumer vddio-supply
- fix the clock settings for the AR8030/AR8035
- only the AR8031 supports chaning the LDO and the PLL mode in software.
   Check if we have the correct PHY.
- new patch to mention the AR8033 which is the same as the AR8031 just
   without PTP support
- new patch which corrects any displayed PHY names and comments. Be
   consistent.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: at803x: remove config_init for AR9331

According to its datasheet, the internal PHY doesn't have debug
registers nor MMDs. Since config_init() only configures delays and
clocks and so on in these registers it won't be needed on this PHY.
Remove it.

Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: at803x: fix the PHY names

Fix at least the displayed strings. The actual name of the chip is
AR803x.

Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>