OSDN Git Service

uclinux-h8/linux.git
5 years ago  net: stmmac: drop the reset delays from struct stmmac_mdio_bus_data
Martin Blumenstingl [Sat, 15 Jun 2019 10:09:31 +0000 (12:09 +0200)]
net: stmmac: drop the reset delays from struct stmmac_mdio_bus_data

Only OF platforms use the reset delays and these delays are only read in
stmmac_mdio_reset(). Move them from struct stmmac_mdio_bus_data to a
stack variable inside stmmac_mdio_reset() because that's the only usage
of these delays.

Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  net: stmmac: drop the reset GPIO from struct stmmac_mdio_bus_data
Martin Blumenstingl [Sat, 15 Jun 2019 10:09:30 +0000 (12:09 +0200)]
net: stmmac: drop the reset GPIO from struct stmmac_mdio_bus_data

No platform uses the "reset_gpio" field from stmmac_mdio_bus_data
anymore. Drop it so we don't get any new consumers either.

Plain GPIO numbers are being deprecated in favor of GPIO descriptors. If
needed, any new non-OF platform can add a GPIO descriptor lookup table.
devm_gpiod_get_optional() will find the GPIO in that case.

Suggested-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  net: stmmac: use device_property_read_u32_array to read the reset delays
Martin Blumenstingl [Sat, 15 Jun 2019 10:09:29 +0000 (12:09 +0200)]
net: stmmac: use device_property_read_u32_array to read the reset delays

Change stmmac_mdio_reset() to use device_property_read_u32_array()
instead of of_property_read_u32_array().

This is meant as a cleanup because we can drop the struct device_node
variable. Also it will make it easier to get rid of struct
stmmac_mdio_bus_data (or at least make it private) in the future because
non-OF platforms can now pass the reset delays as device properties.

No functional changes (neither for OF platforms nor for ones that are
not using OF, because the modified code is still contained in an "if
(priv->device->of_node)").

Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  net: stmmac: drop redundant check in stmmac_mdio_reset
Martin Blumenstingl [Sat, 15 Jun 2019 10:09:28 +0000 (12:09 +0200)]
net: stmmac: drop redundant check in stmmac_mdio_reset

A simplified version of the existing code looks like this:
  if (priv->device->of_node) {
      struct device_node *np = priv->device->of_node;
      if (!np)
          return 0;

The second "if" never evaluates to true because the first "if" checks
for exactly the opposite.
Drop the redundant check and early return to make the code easier to
understand.
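
The equivalence can be sketched in plain C. This is a minimal userspace illustration, not the driver code: the struct definitions, reset_before(), reset_after() and the read_delays flag are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types, not the real definitions. */
struct device_node { int unused; };
struct device { struct device_node *of_node; };

/* Shape of the code before the patch. */
int reset_before(const struct device *dev, int *read_delays)
{
	assert(dev != NULL && read_delays != NULL);
	if (dev->of_node) {
		const struct device_node *np = dev->of_node;

		if (!np)	/* unreachable: the outer test already saw a non-NULL node */
			return 0;
		*read_delays = 1;
	}
	return 0;
}

/* Shape of the code after the patch: same behavior, one branch fewer. */
int reset_after(const struct device *dev, int *read_delays)
{
	assert(dev != NULL && read_delays != NULL);
	if (dev->of_node)
		*read_delays = 1;
	return 0;
}
```

Both functions set the flag only when of_node is non-NULL, which is why dropping the inner check is not expected to change behavior.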

No functional changes intended.

Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  net: sched: remove NET_CLS_IND config option
Jiri Pirko [Sat, 15 Jun 2019 09:03:49 +0000 (11:03 +0200)]
net: sched: remove NET_CLS_IND config option

This config option makes only a couple of lines optional:
two small helpers and an int in a couple of cls structs.

Remove the config option and always compile this in.
This saves the user from surprises when adding
a filter with an ingress device match, which is silently ignored
if the config option is not set.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  r8169: improve handling of Abit Fatal1ty F-190HD
Heiner Kallweit [Sat, 15 Jun 2019 07:58:21 +0000 (09:58 +0200)]
r8169: improve handling of Abit Fatal1ty F-190HD

The Abit Fatal1ty F-190HD has a PCI ID quirk and the entry marks this
board as not GBit-capable, which is wrong. According to [0] the board
has an RTL8111B that is GBit-capable, therefore remove the
RTL_CFG_NO_GBIT flag.

[0] https://www.centos.org/forums/viewtopic.php?t=23390

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  net: stmmac: Fix wrapper drivers not detecting PHY
Jose Abreu [Fri, 14 Jun 2019 15:06:57 +0000 (17:06 +0200)]
net: stmmac: Fix wrapper drivers not detecting PHY

Because of the PHYLINK conversion we stopped parsing the phy-handle property
from DT. Unfortunately, some wrapper drivers still rely on this phy
node to configure the PHY.

Let's restore the parsing of the PHY handle until these wrapper drivers are
fully converted to PHYLINK.

Fixes: 74371272f97f ("net: stmmac: Convert to phylink and remove phylib logic")
Reported-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Tested-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  Merge branch 'Reuse-ptp_qoriq-driver-for-dpaa2-ptp'
David S. Miller [Sat, 15 Jun 2019 20:43:07 +0000 (13:43 -0700)]
Merge branch 'Reuse-ptp_qoriq-driver-for-dpaa2-ptp'

Yangbo Lu says:

====================
Reuse ptp_qoriq driver for dpaa2-ptp

Although the dpaa2-ptp.c driver is an fsl_mc_driver which
uses MC APIs for register access, it is the same IP
block as the eTSEC/DPAA/ENETC 1588 timer.
This patch-set converts it to reuse the ptp_qoriq driver by
using register ioremap and dropping the related MC APIs.
However, the interrupts can only be handled by the MC, which
fires MSIs to the ARM cores, so interrupt enabling and
handling still rely on MC APIs. MC APIs for interrupt
and PPS event support are also added by this patch-set.

---
Changes for v2:
- Allowed to compile with COMPILE_TEST.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  MAINTAINERS: maintain DPAA2 PTP driver in QorIQ PTP entry
Yangbo Lu [Fri, 14 Jun 2019 10:40:55 +0000 (18:40 +0800)]
MAINTAINERS: maintain DPAA2 PTP driver in QorIQ PTP entry

Maintain DPAA2 PTP driver in QorIQ PTP entry.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  dpaa2-ptp: add interrupt support
Yangbo Lu [Fri, 14 Jun 2019 10:40:54 +0000 (18:40 +0800)]
dpaa2-ptp: add interrupt support

This patch adds interrupt support for the dpaa2 ptp clock,
including MC APIs and PPS interrupt support. Other events
are not yet supported by the MC.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  arm64: dts: fsl: add ptp timer node for dpaa2 platforms
Yangbo Lu [Fri, 14 Jun 2019 10:40:53 +0000 (18:40 +0800)]
arm64: dts: fsl: add ptp timer node for dpaa2 platforms

This patch adds a ptp timer device tree node for dpaa2
platforms (ls1088a/ls208xa/lx2160a).

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  dt-binding: ptp_qoriq: support DPAA2 PTP compatible
Yangbo Lu [Fri, 14 Jun 2019 10:40:52 +0000 (18:40 +0800)]
dt-binding: ptp_qoriq: support DPAA2 PTP compatible

Add a new compatible for DPAA2 PTP.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  dpaa2-ptp: reuse ptp_qoriq driver
Yangbo Lu [Fri, 14 Jun 2019 10:40:51 +0000 (18:40 +0800)]
dpaa2-ptp: reuse ptp_qoriq driver

Although the dpaa2-ptp.c driver is an fsl_mc_driver which
uses MC APIs for register access, it is the same IP
block as the eTSEC/DPAA/ENETC 1588 timer.
This patch converts it to reuse the ptp_qoriq driver by
using register ioremap and dropping the related MC APIs.
However, the interrupts can only be handled by the MC, which
fires MSIs to the ARM cores, so interrupt enabling and
handling still rely on MC APIs.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  ptp: add QorIQ PTP support for DPAA2
Yangbo Lu [Fri, 14 Jun 2019 10:40:50 +0000 (18:40 +0800)]
ptp: add QorIQ PTP support for DPAA2

This patch adds QorIQ PTP support for DPAA2.
Although the dpaa2-ptp.c driver is an fsl_mc_driver which
uses MC APIs for register access, it is the same
IP block as the eTSEC/DPAA/ENETC 1588 timer. We will
convert it to reuse the ptp_qoriq driver by using register
ioremap and dropping the related MC APIs.
Also allow ptp_qoriq to be compiled with COMPILE_TEST.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  hinic: Use devm_kasprintf instead of hard coding it
Christophe JAILLET [Thu, 13 Jun 2019 19:54:12 +0000 (21:54 +0200)]
hinic: Use devm_kasprintf instead of hard coding it

'devm_kasprintf' is less verbose than:
   snprintf(NULL, 0, ...);
   devm_kzalloc(...);
   sprintf(...);
so use it instead.
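
As a rough userspace analogy (not the hinic code), the open-coded sequence and a kasprintf-style helper can be compared as below. The names name_open_coded() and xasprintf() and the "%s-%d" format are hypothetical; malloc/calloc stand in for the kernel allocators, and the device-managed lifetime that devm_kasprintf() provides is not modelled.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Open-coded variant: size the string, allocate, then format. */
char *name_open_coded(const char *prefix, int idx)
{
	int len = snprintf(NULL, 0, "%s-%d", prefix, idx);	/* length without NUL */
	char *buf;

	if (len < 0)
		return NULL;
	buf = calloc((size_t)len + 1, 1);
	if (!buf)
		return NULL;
	sprintf(buf, "%s-%d", prefix, idx);
	return buf;
}

/* Hypothetical userspace helper in the spirit of kasprintf(). */
char *xasprintf(const char *fmt, ...)
{
	va_list ap, ap2;
	int len;
	char *buf;

	assert(fmt != NULL);
	va_start(ap, fmt);
	va_copy(ap2, ap);		/* the arguments are walked twice */
	len = vsnprintf(NULL, 0, fmt, ap);
	va_end(ap);
	if (len < 0) {
		va_end(ap2);
		return NULL;
	}
	buf = malloc((size_t)len + 1);
	if (buf)
		vsnprintf(buf, (size_t)len + 1, fmt, ap2);
	va_end(ap2);
	return buf;
}
```

With such a helper the caller writes the format string once, for example xasprintf("%s-%d", "hinic", 3), instead of repeating it in the sizing and formatting steps.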

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Zhao Chen <zhaochen6@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  Revert "net: dsa: mv88e6xxx: do not flood CPU with unknown multicast"
David S. Miller [Sat, 15 Jun 2019 20:35:29 +0000 (13:35 -0700)]
Revert "net: dsa: mv88e6xxx: do not flood CPU with unknown multicast"

This reverts commit 422efd032775757c41e9579facd9656a87bf4f00.

It breaks ipv6.

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  net: dsa: mv88e6xxx: do not flood CPU with unknown multicast
Vivien Didelot [Wed, 12 Jun 2019 22:33:44 +0000 (18:33 -0400)]
net: dsa: mv88e6xxx: do not flood CPU with unknown multicast

The DSA ports must flood unknown unicast and multicast, but the switch
must not flood the CPU ports with unknown multicast, as this results
in a lot of undesirable traffic that the network stack needs to filter
in software.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  Merge branch 'net-dsa-use-switchdev-attr-and-obj-handlers'
David S. Miller [Sat, 15 Jun 2019 03:20:07 +0000 (20:20 -0700)]
Merge branch 'net-dsa-use-switchdev-attr-and-obj-handlers'

Vivien Didelot says:

====================
net: dsa: use switchdev attr and obj handlers

This series reduces boilerplate in the handling of switchdev attribute and
object operations by using the switchdev_handle_* helpers, which check the
targeted devices and recurse into their lower devices.

This also brings back the ability to inspect operations targeting the bridge
device itself (where .orig_dev and .dev were originally the bridge device),
even though that is of no use yet and skipped by this series.

Changes in v2: Only VLAN and (non-host) MDB objects not directly targeting
the slave device are unsupported at the moment, so only skip these cases.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  net: dsa: use switchdev handle helpers
Vivien Didelot [Fri, 14 Jun 2019 17:49:22 +0000 (13:49 -0400)]
net: dsa: use switchdev handle helpers

Get rid of the dsa_slave_switchdev_port_{attr_set,obj}_event functions
in favor of the switchdev_handle_port_{attr_set,obj_add,obj_del}
helpers which recurse into the lower devices of the target interface.

This has the benefit of being aware of the operations made on the
bridge device itself, where orig_dev is the bridge, and dev is the
slave. This can be used later to configure the hardware switches.

Only VLAN and (port) MDB objects not directly targeting the slave
device are unsupported at the moment, so skip this case in their
respective case statements.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  net: dsa: make dsa_slave_dev_check use const
Vivien Didelot [Fri, 14 Jun 2019 17:49:21 +0000 (13:49 -0400)]
net: dsa: make dsa_slave_dev_check use const

The switchdev handle helpers make use of a device checking helper
requiring a const net_device. Make dsa_slave_dev_check compliant
to this.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  net: dsa: make cpu_dp non const
Vivien Didelot [Fri, 14 Jun 2019 17:49:20 +0000 (13:49 -0400)]
net: dsa: make cpu_dp non const

A port may trigger operations on its dedicated CPU port, so using
cpu_dp as const will raise warnings. Make cpu_dp non const.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  net: dsa: do not check orig_dev in vlan del
Vivien Didelot [Fri, 14 Jun 2019 17:49:19 +0000 (13:49 -0400)]
net: dsa: do not check orig_dev in vlan del

The current DSA code handling switchdev objects does not recurse into
the lower devices and thus is never called with an orig_dev member being
a bridge device, hence remove this useless check.

At the same time, remove the comments about the callers, which are
unlikely to be updated if the code changes and thus will be confusing.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next...
David S. Miller [Sat, 15 Jun 2019 02:53:10 +0000 (19:53 -0700)]
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2019-06-14

This series contains updates to i40e only.

Aleksandr adds stub functions for Energy Efficient Ethernet (EEE) that
currently report it as not supported in i40e.  He also fixes up the Link
Layer Discovery Protocol (LLDP) code so that the LLDP flag is not set
until a successful start has been confirmed.  This
also prevents needless restarting of the device if LLDP did not
change its state after an unsuccessful start.

Piotr bumps up the number of VLANs that an untrusted VF can implement,
from 8 VLANs to 16.  He adds checks to the Virtual Ethernet Bridge (VEB)
and channel arrays so that accesses do not exceed the boundary and the
index stays below the maximum.  He also fixes an issue in the driver where we were
not checking the response from the LLDP flag and were returning success
no matter what the value of the response was.

Mitch fixes a counter variable, which can be negative in value, by making
it an integer instead of an unsigned integer.

Doug improves the admin queue log granularity by making it possible to
log only the admin queue descriptors without the entire admin queue
message buffers.

Sergey fixes up the virtchnl code by removing duplicate checks, ensuring
the variable type is correct when comparing integers, and enhancing error and
warning messages to include useful information.

Adam fixes a potential kernel panic when the i40e driver was being bound
to a non-i40e port by adding a check on the BAR size to ensure it is
large enough by reading the highest register.

Jake fixes a statistics error in the "transmit errors" stat, which was
being calculated twice.

Gustavo A. R. Silva adds a fall-through code comment to help with
compiler checks.

v2: Fixed the return values wrapped in parentheses in patch 8 and
    cleaned up the commit message in patch 12 so that Gustavo does
    not repeat himself.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  udp: Remove unused variable/function (exact_dif)
Tim Beale [Fri, 14 Jun 2019 04:41:27 +0000 (16:41 +1200)]
udp: Remove unused variable/function (exact_dif)

This was originally passed through to the VRF logic in compute_score().
But that logic has now been replaced by udp_sk_bound_dev_eq() and so
this code is no longer used or needed.

Signed-off-by: Tim Beale <timbeale@catalyst.net.nz>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  udp: Remove unused parameter (exact_dif)
Tim Beale [Fri, 14 Jun 2019 04:41:26 +0000 (16:41 +1200)]
udp: Remove unused parameter (exact_dif)

Originally this was used by the VRF logic in compute_score(), but that
was later replaced by udp_sk_bound_dev_eq() and the parameter became
unused.

Note this change adds an 'unused variable' compiler warning that will be
removed in the next patch (I've split the removal in two to make review
slightly easier).

Signed-off-by: Tim Beale <timbeale@catalyst.net.nz>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  ipv4: tcp: fix ACK/RST sent with a transmit delay
Eric Dumazet [Fri, 14 Jun 2019 04:22:35 +0000 (21:22 -0700)]
ipv4: tcp: fix ACK/RST sent with a transmit delay

If we want to set an EDT time for the skb we want to send
via ip_send_unicast_reply(), we have to pass a new parameter
and initialize ipc.sockc.transmit_time with it.

This fixes the EDT time for ACK/RST packets sent on behalf of
a TIME_WAIT socket.

Fixes: a842fe1425cb ("tcp: add optional per socket transmit delay")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  net: remove empty netlink_tap_exit_net
Li RongQing [Fri, 14 Jun 2019 01:29:09 +0000 (09:29 +0800)]
net: remove empty netlink_tap_exit_net

Pointer members of an object with static storage duration, if not
explicitly initialized, will be initialized to a NULL pointer. The
net namespace API checks that this pointer is not NULL before using it,
so it is safe to remove the function.
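
The C rule this relies on can be shown with a small standalone sketch. The ns_ops struct, tap_init(), tap_ops and run_exit() are hypothetical names modelled loosely on a per-namespace ops table; they are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

struct net;	/* opaque stand-in, never dereferenced here */

/* Hypothetical ops table with optional callbacks. */
struct ns_ops {
	int  (*init)(struct net *net);
	void (*exit)(struct net *net);
};

static int tap_init(struct net *net)
{
	(void)net;
	return 0;
}

/* .exit is not named in the initializer, so it starts out as a null pointer. */
struct ns_ops tap_ops = { .init = tap_init };

/* Caller that skips a missing callback; returns 1 if exit was invoked. */
int run_exit(const struct ns_ops *ops, struct net *net)
{
	assert(ops != NULL);
	if (ops->exit) {
		ops->exit(net);
		return 1;
	}
	return 0;
}
```

Because the caller tests the pointer, providing an empty exit function and providing none at all lead to the same observable behavior.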

Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  Merge branch 'nfp-flower-loosen-L4-checks-and-add-extack-to-flower-offload'
David S. Miller [Sat, 15 Jun 2019 02:48:58 +0000 (19:48 -0700)]
Merge branch 'nfp-flower-loosen-L4-checks-and-add-extack-to-flower-offload'

Jakub Kicinski says:

====================
nfp: flower: loosen L4 checks and add extack to flower offload

Pieter says:

This set allows the offload of filters that make use of an unknown
ip protocol, given that layer 4 is being wildcarded. The set then
aims to make use of extack messaging for flower offloads. It adds
about 70 extack messages to the driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  nfp: flower: extend extack messaging for flower match and actions
Pieter Jansen van Vuuren [Thu, 13 Jun 2019 21:17:11 +0000 (14:17 -0700)]
nfp: flower: extend extack messaging for flower match and actions

Use extack messages in flower offload when compiling match and actions
messages that will configure hardware.

Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  nfp: flower: use extack messages in flower offload
Pieter Jansen van Vuuren [Thu, 13 Jun 2019 21:17:10 +0000 (14:17 -0700)]
nfp: flower: use extack messages in flower offload

Use extack messages in flower offload, specifically focusing on
the extack use in add offload, remove offload and get stats paths.

Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  nfp: flower: check L4 matches on unknown IP protocols
Pieter Jansen van Vuuren [Thu, 13 Jun 2019 21:17:09 +0000 (14:17 -0700)]
nfp: flower: check L4 matches on unknown IP protocols

Matching on fields with a protocol that is unknown to hardware
is not strictly unsupported. Determine if hardware can offload
a filter with an unknown protocol by checking if any L4 fields
are being matched as well.

Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  Merge tag 'mlx5-updates-2019-06-13' of git://git.kernel.org/pub/scm/linux/kernel...
David S. Miller [Sat, 15 Jun 2019 02:44:29 +0000 (19:44 -0700)]
Merge tag 'mlx5-updates-2019-06-13' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2019-06-13

Mlx5 devlink health fw reporters and sw reset support

This series provides mlx5 firmware reset support and firmware devlink health
reporters.

1) Add initial mlx5 kernel documentation and include devlink health reporters

2) Add CR-Space access and FW Crdump snapshot support via devlink region_snapshot

3) Issue software reset upon FW asserts

4) Add fw and fw_fatal devlink health reporters to follow fw error indications with
dump and recover procedures, and enable the user to trigger this functionality.

4.1) fw reporter:
The fw reporter implements diagnose and dump callbacks.
It follows symptoms of fw error such as fw syndrome by triggering
fw core dump and storing it and any other fw trace into the dump buffer.
The fw reporter diagnose command can be triggered any time by the user to check
current fw status.

4.2) fw_fatal reporter:
The fw_fatal reporter implements dump and recover callbacks.
It follows fatal error indications with a CR-space dump and a recover flow.
The CR-space dump uses the vsc interface, which is valid even if the FW command
interface is not functional, which is the case in most FW fatal errors. The
CR-space dump is stored as a memory region snapshot to ease reading by address.
The recover function runs the recover flow, which reloads the driver and triggers fw
reset if needed.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  ipv4: Support multipath hashing on inner IP pkts for GRE tunnel
Stephen Suryaputra [Thu, 13 Jun 2019 18:38:58 +0000 (14:38 -0400)]
ipv4: Support multipath hashing on inner IP pkts for GRE tunnel

A multipath hash policy value of 0 doesn't distribute tunneled traffic, since
the outer IP dest and src don't vary even though the inner ones do. Since the flow
is defined by the inner ones in the case of tunneled traffic, hashing on them is
desired.

This is done mainly for IP over GRE, hence only tested for that. But
anything else supported by flow dissection should work.

v2: Use skb_flow_dissect_flow_keys() directly so that other tunneling
    can be supported through flow dissection (per Nikolay Aleksandrov).
v3: Remove accidental inclusion of ports in the hash keys and clarify
    the documentation (Nikolay Aleksandrov).
Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  virtio_net: enable napi_tx by default
Willem de Bruijn [Thu, 13 Jun 2019 16:24:57 +0000 (12:24 -0400)]
virtio_net: enable napi_tx by default

NAPI tx mode improves TCP behavior by enabling TCP small queues (TSQ).
TSQ reduces queuing ("bufferbloat") and burstiness.

Previous measurements have shown significant improvement for
TCP_STREAM style workloads, such as those in commit 86a5df1495cc
("Merge branch 'virtio-net-tx-napi'").

There has been uncertainty about smaller possible regressions in
latency due to increased reliance on tx interrupts.

The above results did not show that, nor did I observe this when
rerunning TCP_RR on Linux 5.1 this week on a pair of guests in the
same rack. This may be subject to other settings, notably interrupt
coalescing.

In the unlikely case of regression, we have landed a credible runtime
solution. Ethtool can configure it with -C tx-frames [0|1] as of
commit 0c465be183c7 ("virtio_net: ethtool tx napi configuration").

NAPI tx mode has been the default in Google Container-Optimized OS
(COS) for over half a year, as of release M70 in October 2018,
without any negative reports.

Link: https://marc.info/?l=linux-netdev&m=149305618416472
Link: https://lwn.net/Articles/507065/
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  net: sched: ingress: set 'unlocked' flag for clsact Qdisc ops
Vlad Buslov [Thu, 13 Jun 2019 16:12:05 +0000 (19:12 +0300)]
net: sched: ingress: set 'unlocked' flag for clsact Qdisc ops

To remove rtnl lock dependency in tc filter update API when using clsact
Qdisc, set QDISC_CLASS_OPS_DOIT_UNLOCKED flag in clsact Qdisc_class_ops.

Clsact Qdisc ops don't require any modifications to be used without the rtnl
lock on the tc filter update path. The implementation never changes its q->block
and only releases it when the Qdisc is being destroyed. This means it is enough
for RTM_{NEWTFILTER|DELTFILTER|GETTFILTER} message handlers to hold a clsact
Qdisc reference while using it, without relying on rtnl lock protection.
Unlocked Qdisc ops support is already implemented in filter update path by
unlocked cls API patch set.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  Merge branch 'enable-and-use-static_branch_deferred_inc'
David S. Miller [Sat, 15 Jun 2019 02:31:48 +0000 (19:31 -0700)]
Merge branch 'enable-and-use-static_branch_deferred_inc'

Willem de Bruijn says:

====================
enable and use static_branch_deferred_inc

1. make static_branch_deferred_inc available if !CONFIG_JUMP_LABEL
2. convert the existing STATIC_KEY_DEFERRED_FALSE user to this api
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  tcp: use static_branch_deferred_inc for clean_acked_data_enabled
Willem de Bruijn [Thu, 13 Jun 2019 15:08:16 +0000 (11:08 -0400)]
tcp: use static_branch_deferred_inc for clean_acked_data_enabled

Deferred static key clean_acked_data_enabled uses the deferred
variants of dec and flush. Do the same for inc.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  locking/static_key: always define static_branch_deferred_inc
Willem de Bruijn [Thu, 13 Jun 2019 15:08:15 +0000 (11:08 -0400)]
locking/static_key: always define static_branch_deferred_inc

This interface is currently only defined if CONFIG_JUMP_LABEL. Make it
available also when jump labels are off.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  Merge branch 'hns3-next'
David S. Miller [Sat, 15 Jun 2019 02:26:16 +0000 (19:26 -0700)]
Merge branch 'hns3-next'

Huazhong Tan says:

====================
net: hns3: some code optimizations & cleanups & bugfixes

This patch-set includes code optimizations, cleanups and bugfixes for
the HNS3 ethernet controller driver.

[patch 1/12 - 6/12] add some code optimizations and bugfixes for RAS
and MSI-X HW errors.

[patch 7/12] fixes a loading issue.

[patch 8/12 - 11/12] add some bugfixes.

[patch 12/12] adds some cleanups, which do not change the logic of the code.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  net: hns3: some variable modification
Weihang Li [Thu, 13 Jun 2019 09:12:32 +0000 (17:12 +0800)]
net: hns3: some variable modification

This patch does the following things:
1. adds the keyword const before some variables which won't be modified
   in functions.
2. changes some variables from signed to unsigned to avoid bitwise
   operations on signed variables.
3. adds or removes initialization of some variables.
4. defines a new structure to help parse mailbox messages instead of
   using an array, in which the meaning of each element is harder to see.

Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  net: hns3: delay ring buffer clearing during reset
Yunsheng Lin [Thu, 13 Jun 2019 09:12:31 +0000 (17:12 +0800)]
net: hns3: delay ring buffer clearing during reset

The driver may not be able to disable the ring through firmware
when downing the netdev during the reset process, which may cause
the hardware to access a freed buffer.

This patch delays the ring buffer clearing to the reset uninit
process, because hardware will not access the ring buffer after
hardware reset is completed.

Fixes: bb6b94a896d4 ("net: hns3: Add reset interface implementation in client")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  net: hns3: fix for skb leak when doing selftest
Yunsheng Lin [Thu, 13 Jun 2019 09:12:30 +0000 (17:12 +0800)]
net: hns3: fix for skb leak when doing selftest

If hns3_nic_net_xmit does not return NETDEV_TX_BUSY when doing
a loopback selftest, the skb is not freed in hns3_clean_tx_ring
or hns3_nic_net_xmit, which causes an skb leak.

This patch fixes it by freeing the skb when hns3_nic_net_xmit does
not return NETDEV_TX_OK.

Fixes: c39c4d98dc65 ("net: hns3: Add mac loopback selftest support in hns3 driver")

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  net: hns3: fix for dereferencing before null checking
Yunsheng Lin [Thu, 13 Jun 2019 09:12:29 +0000 (17:12 +0800)]
net: hns3: fix for dereferencing before null checking

The netdev is dereferenced before the null check in the function
hns3_setup_tc.

This patch moves the dereference after the null check.
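
A minimal sketch of the corrected ordering, in userspace C: the handle and netdev structs and setup_tc() are hypothetical stand-ins, not the hns3 definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's types. */
struct handle { int num_tc; };
struct netdev { struct handle *h; };

int setup_tc(struct netdev *dev, int tc)
{
	struct handle *h;	/* declared, but not loaded from dev yet */

	if (!dev)
		return -EINVAL;

	assert(tc >= 0);	/* this sketch assumes a non-negative class count */
	h = dev->h;		/* safe: dev was checked above */
	if (!h)
		return -EINVAL;
	h->num_tc = tc;
	return 0;
}
```

Initializing h from dev in its declaration, before the check, would dereference a NULL dev; deferring the load until after the check avoids that.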

Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  net: hns3: free irq when exit from abnormal branch
Yonglong Liu [Thu, 13 Jun 2019 09:12:28 +0000 (17:12 +0800)]
net: hns3: free irq when exit from abnormal branch

In hns3_nic_init_irq(), if requesting an irq fails at index i,
the function returns directly without releasing the irq resources
that were already requested, and nothing else releases them.
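
The usual remedy is to unwind on the error path. The following is a mock illustration of that pattern, not the driver's code: mock_request_irq(), mock_free_irq(), init_irqs(), irqs_held() and NUM_VECTORS are all hypothetical.

```c
#include <assert.h>

#define NUM_VECTORS 4

/* Mock bookkeeping: held[i] is 1 while vector i is "requested". */
static int held[NUM_VECTORS];

/* Mock of a request call that fails for one chosen index. */
static int mock_request_irq(int i, int fail_at)
{
	if (i == fail_at)
		return -1;
	held[i] = 1;
	return 0;
}

static void mock_free_irq(int i)
{
	assert(held[i]);	/* only free what was requested */
	held[i] = 0;
}

/* Request every vector; on failure release the ones obtained so far. */
int init_irqs(int fail_at)
{
	int i, ret = 0;

	for (i = 0; i < NUM_VECTORS; i++) {
		ret = mock_request_irq(i, fail_at);
		if (ret)
			goto err_free;
	}
	return 0;

err_free:
	while (--i >= 0)	/* walk back over indices 0..i-1 */
		mock_free_irq(i);
	return ret;
}

/* Number of vectors currently held, for checking the unwind. */
int irqs_held(void)
{
	int i, n = 0;

	for (i = 0; i < NUM_VECTORS; i++)
		n += held[i];
	return n;
}
```

After a failing call such as init_irqs(2), irqs_held() reports zero, which is the property the missing cleanup violated.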

Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  net: hns3: clear restting state when initializing HW device
Peng Li [Thu, 13 Jun 2019 09:12:27 +0000 (17:12 +0800)]
net: hns3: clear restting state when initializing HW device

The IMP sets the resetting state for all functions on a PF FLR. The driver
only clears the resetting state during the reset process, not
during initialization. As the FLR is not triggered by the driver,
it is necessary to clear the resetting state when initializing the HW device.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  net: hns3: extract handling of mpf/pf msi-x errors into functions
Weihang Li [Thu, 13 Jun 2019 09:12:26 +0000 (17:12 +0800)]
net: hns3: extract handling of mpf/pf msi-x errors into functions

Function hclge_handle_all_hw_msix_error() contains four parts:
1. Query buffer descriptors for MSI-X errors.
2. Query and clear all main PF MSI-X errors.
3. Query and clear all PF MSI-X errors.
4. Handle mac tunnel interrupts.
Parts 2 and 3 handle errors of different modules.
This patch extracts them into individual functions, which makes the logic
clearer.

Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: some changes of MSI-X bits in PPU(RCB)
Weihang Li [Thu, 13 Jun 2019 09:12:25 +0000 (17:12 +0800)]
net: hns3: some changes of MSI-X bits in PPU(RCB)

This patch changes the print message of rx_q_search_miss from error to dfx
to avoid misleading users, because this interrupt may occur if we receive
packets during initialization of the HNS3 driver.
In addition, this patch masks the 28th bit of PPU_MPF_ABNORMAL_SRC2, which
is now meaningless.

Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: add recovery for the H/W errors occurred before the HNS dev initialization
Shiju Jose [Thu, 13 Jun 2019 09:12:24 +0000 (17:12 +0800)]
net: hns3: add recovery for the H/W errors occurred before the HNS dev initialization

This patch adds the recovery for the HNS H/W errors which occurred
before the driver initialization.

Reported-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: process H/W errors occurred before HNS dev initialization
Shiju Jose [Thu, 13 Jun 2019 09:12:23 +0000 (17:12 +0800)]
net: hns3: process H/W errors occurred before HNS dev initialization

Presently the HNS driver enables the HNS H/W error interrupts after
the dev initialization is completed. However, some exceptions such as
NCSI errors can occur when the network port driver is not loaded,
and those errors need to be reported to the BMC.
Therefore the firmware enables all the HNS RAS error interrupts
before the driver is loaded. In some cases, some H/W errors may also
remain uncleared before reboot. Thus the HNS driver needs
to process and recover from the hw errors that occurred before the
HNS driver was initialized.

This patch adds processing of the HNS hw errors (RAS and MSI-X)
which occurred before the driver initialization. For RAS, because
they are enabled by firmware, we can detect the specific bits, then
log and clear them. But for MSI-X, which cannot be enabled before
the vector0 irq is opened, we can't detect the specific error bits, so we
just write 1 to all interrupt source registers to clear them.

Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: fix avoid unnecessary resetting for the H/W errors which do not require...
Shiju Jose [Thu, 13 Jun 2019 09:12:22 +0000 (17:12 +0800)]
net: hns3: fix avoid unnecessary resetting for the H/W errors which do not require reset

HNS does not need to be reset when errors occur in some bits.
However presently the HNAE3_FUNC_RESET is set in this case and
as a result the default_reset is done when these errors are reported.
This patch fixes this issue. It also optimizes the setting of
the reset level for the error recovery.

Reported-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: delay setting of reset level for hw errors until slot_reset is called
Shiju Jose [Thu, 13 Jun 2019 09:12:21 +0000 (17:12 +0800)]
net: hns3: delay setting of reset level for hw errors until slot_reset is called

Presently the error handling code sets the reset level required
for the recovery of the hw errors to the reset framework in the
error_detected AER callback. However, the reset_event would be
called later from the slot_reset callback. This can cause the
wrong reset_level to be used if a high priority reset request
occurs before slot_reset is called.

This patch delays setting of the reset level, required
for the hw errors, to the reset framework until the
slot_reset is called.

Reported-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'qed-iWARP-fixes'
David S. Miller [Sat, 15 Jun 2019 02:23:30 +0000 (19:23 -0700)]
Merge branch 'qed-iWARP-fixes'

Michal Kalderon says:

====================
qed: iWARP fixes

This series contains a few small fixes related to iWARP.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoqed: iWARP - Fix default window size to be based on chip
Michal Kalderon [Thu, 13 Jun 2019 08:29:43 +0000 (11:29 +0300)]
qed: iWARP - Fix default window size to be based on chip

The default window size is calculated for best performance based
on internal hw buffer sizes. The size differs between the
different chips and modes.

Fixes: 67b40dccc45f ("qed: Implement iWARP initialization, teardown and qp operations")
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoqed: iWARP - Fix tc for MPA ll2 connection
Michal Kalderon [Thu, 13 Jun 2019 08:29:42 +0000 (11:29 +0300)]
qed: iWARP - Fix tc for MPA ll2 connection

The driver needs to assign a lossless traffic class for the MPA ll2
connection to ensure no packets are dropped when returning from the
driver as they will never be re-transmitted by the peer.

Fixes: ae3488ff37dc ("qed: Add ll2 connection for processing unaligned MPA packets")
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoqed: iWARP - fix uninitialized callback
Michal Kalderon [Thu, 13 Jun 2019 08:29:41 +0000 (11:29 +0300)]
qed: iWARP - fix uninitialized callback

Fix uninitialized variable warning by static checker.

Fixes: ae3488ff37dc ("qed: Add ll2 connection for processing unaligned MPA packets")
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoqed: iWARP - Use READ_ONCE and smp_store_release to access ep->state
Michal Kalderon [Thu, 13 Jun 2019 08:29:40 +0000 (11:29 +0300)]
qed: iWARP - Use READ_ONCE and smp_store_release to access ep->state

Destroy QP waits for its ep object state to be set to CLOSED
before proceeding. ep->state can be updated from a different
context. Add smp_store_release/READ_ONCE to synchronize.
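
The publish/observe pattern can be approximated in userspace with C11
atomics. This is a hedged sketch, not the qed code: struct endpoint,
ep_close() and ep_is_closed() are made-up names, and the reader uses an
acquire load as the nearest portable stand-in for the kernel's READ_ONCE().

```c
#include <stdatomic.h>
#include <stdbool.h>

enum ep_state { EP_CONNECTED, EP_CLOSED };

struct endpoint {
	int close_reason;	/* plain data written before the state */
	_Atomic int state;	/* published with release semantics */
};

/* Writer side: fill in the data, then publish the new state. */
static void ep_close(struct endpoint *ep, int reason)
{
	ep->close_reason = reason;
	atomic_store_explicit(&ep->state, EP_CLOSED, memory_order_release);
}

/* Reader side: one poll step of the wait loop in the destroy path. */
static bool ep_is_closed(struct endpoint *ep)
{
	return atomic_load_explicit(&ep->state, memory_order_acquire) ==
	       EP_CLOSED;
}
```

The release store keeps the earlier write to close_reason from being
reordered after the state update, so a reader that sees EP_CLOSED also sees
the data written before it.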

Fixes: fc4c6065e661 ("qed: iWARP implement disconnect flows")
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: sfp: clean up a condition
Dan Carpenter [Thu, 13 Jun 2019 06:51:02 +0000 (09:51 +0300)]
net: phy: sfp: clean up a condition

The acpi_node_get_property_reference() doesn't return ACPI error codes,
it just returns regular negative kernel error codes.  This patch doesn't
affect run time, it's just a clean up.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Ruslan Babayev <ruslan@babayev.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agovsock: correct removal of socket from the list
Sunil Muthuswamy [Thu, 13 Jun 2019 03:52:27 +0000 (03:52 +0000)]
vsock: correct removal of socket from the list

The current vsock code for removal of socket from the list is both
subject to race and inefficient. It takes the lock, checks whether
the socket is in the list, drops the lock and if the socket was on the
list, deletes it from the list. This is subject to race because as soon
as the lock is dropped once it is checked for presence, that condition
cannot be relied upon for any decision. It is also inefficient because
if the socket is present in the list, it takes the lock twice.
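
A race-free variant checks and unlinks under a single lock acquisition. The
following userspace sketch uses a pthread mutex and a hand-rolled list in
place of the vsock table lock and list helpers; every name in it is
hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct node {
	struct node *prev, *next;
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct node list_head = { &list_head, &list_head };

/* A node that points at itself is not on any list. */
static void node_init(struct node *n)
{
	n->prev = n->next = n;
}

static bool node_on_list(const struct node *n)
{
	return n->next != n;
}

static void list_insert(struct node *n)
{
	pthread_mutex_lock(&list_lock);
	n->next = list_head.next;
	n->prev = &list_head;
	list_head.next->prev = n;
	list_head.next = n;
	pthread_mutex_unlock(&list_lock);
}

/* Check and unlink under one lock acquisition; returns true if removed. */
static bool list_remove(struct node *n)
{
	bool removed = false;

	pthread_mutex_lock(&list_lock);
	if (node_on_list(n)) {
		n->prev->next = n->next;
		n->next->prev = n->prev;
		node_init(n);
		removed = true;
	}
	pthread_mutex_unlock(&list_lock);
	return removed;
}
```

Because the membership test and the unlink share one critical section, the
result of the test cannot go stale, and the lock is taken once per call.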

Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'nfp-add-two-user-friendly-errors'
David S. Miller [Sat, 15 Jun 2019 02:18:27 +0000 (19:18 -0700)]
Merge branch 'nfp-add-two-user-friendly-errors'

Jakub Kicinski says:

====================
nfp: add two user friendly errors

This small series adds two error messages based on recent
bug reports which turned out not to be bugs.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonfp: print a warning when binding VFs to PF driver
Jakub Kicinski [Wed, 12 Jun 2019 23:59:03 +0000 (16:59 -0700)]
nfp: print a warning when binding VFs to PF driver

Users sometimes mistakenly try to manually bind the PF driver
to the VFs. Print a warning message in that case.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonfp: update the old flash error message
Jakub Kicinski [Wed, 12 Jun 2019 23:59:02 +0000 (16:59 -0700)]
nfp: update the old flash error message

Apparently there are still cards in the wild with a very old
management FW.  Let's make the error message in that case
indicate more clearly that management firmware has to be
updated.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'Microchip-KSZ-driver-enhancements'
David S. Miller [Sat, 15 Jun 2019 02:11:54 +0000 (19:11 -0700)]
Merge branch 'Microchip-KSZ-driver-enhancements'

Robert Hancock says:

====================
Microchip KSZ driver enhancements

A couple of enhancements to the Microchip KSZ switch driver: one to add
PHY register settings for errata workarounds for more stable operation, and
another to add a device tree option to change the output clock rate as
required by some board designs.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: microchip: Support optional 125MHz SYNCLKO output
Robert Hancock [Wed, 12 Jun 2019 20:49:06 +0000 (14:49 -0600)]
net: dsa: microchip: Support optional 125MHz SYNCLKO output

The KSZ9477 series chips have a SYNCLKO pin which by default outputs a
25MHz clock, but some board setups require a 125MHz clock instead. Added
a microchip,synclko-125 device tree property to allow indicating a
125MHz clock output is required.

Signed-off-by: Robert Hancock <hancock@sedsystems.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: microchip: Add PHY errata workarounds
Robert Hancock [Wed, 12 Jun 2019 20:49:05 +0000 (14:49 -0600)]
net: dsa: microchip: Add PHY errata workarounds

The Silicon Errata and Data Sheet Clarification documents for the
KSZ9477 series of chips describe a number of otherwise undocumented PHY
register settings which are required to work around various chip errata.
Apply these settings when initializing the PHY ports on these chips.

Signed-off-by: Robert Hancock <hancock@sedsystems.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: stmmac: use GPIO descriptors in stmmac_mdio_reset
Martin Blumenstingl [Wed, 12 Jun 2019 19:31:15 +0000 (21:31 +0200)]
net: stmmac: use GPIO descriptors in stmmac_mdio_reset

Switch stmmac_mdio_reset to use GPIO descriptors. GPIO core handles the
"snps,reset-gpio" for GPIO descriptors so we don't need to take care of
it inside the driver anymore.

The advantage of this is that we now preserve the GPIO flags which are
passed via devicetree. This is required on some newer Amlogic boards
which use an Open Drain pin for the reset GPIO. This pin can only output
a LOW signal or switch to input mode but it cannot output a HIGH signal.
There are already devicetree bindings for these special cases and GPIO
core already takes care of them but only if we use GPIO descriptors
instead of GPIO numbers.

Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'packet-DDOS'
David S. Miller [Sat, 15 Jun 2019 01:52:14 +0000 (18:52 -0700)]
Merge branch 'packet-DDOS'

Eric Dumazet says:

====================
net/packet: better behavior under DDOS

Using tcpdump (or other af_packet user) on a busy host can lead to
catastrophic consequences, because suddenly, potentially all cpus
are spinning on a contended spinlock.

Both packet_rcv() and tpacket_rcv() grab the spinlock
to eventually find there is no room for an additional packet.

This patch series aligns packet_rcv() and tpacket_rcv() so that both
check if the queue is full before grabbing the spinlock.

If the queue is full, they both increment a new atomic counter
placed on a separate cache line to let readers drain the queue faster.

There is still false sharing on this new atomic counter,
we might in the future make it per cpu if there is interest.
====================

Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/packet: introduce packet_rcv_try_clear_pressure() helper
Eric Dumazet [Wed, 12 Jun 2019 16:52:33 +0000 (09:52 -0700)]
net/packet: introduce packet_rcv_try_clear_pressure() helper

There are two places where we want to clear the pressure
if possible. Add a helper to make it more obvious.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Willem de Bruijn <willemb@google.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/packet: remove locking from packet_rcv_has_room()
Eric Dumazet [Wed, 12 Jun 2019 16:52:32 +0000 (09:52 -0700)]
net/packet: remove locking from packet_rcv_has_room()

__packet_rcv_has_room() can now be run without lock being held.

po->pressure is only a non persistent hint, we can mark
all read/write accesses with READ_ONCE()/WRITE_ONCE()
to document the fact that the field could be written
without any synchronization.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/packet: implement shortcut in tpacket_rcv()
Eric Dumazet [Wed, 12 Jun 2019 16:52:31 +0000 (09:52 -0700)]
net/packet: implement shortcut in tpacket_rcv()

tpacket_rcv() can be hit under DDOS quite hard, since
it will always grab a socket spinlock, to eventually find
there is no room for an additional packet.

Using tcpdump [1] on a busy host can lead to catastrophic consequences,
because of all cpus spinning on a contended spinlock.

This replicates a similar strategy used in packet_rcv()

[1] Also some applications mistakenly use af_packet socket
bound to ETH_P_ALL only to send packets.
Receive queue is never drained and immediately full.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/packet: make tp_drops atomic
Eric Dumazet [Wed, 12 Jun 2019 16:52:30 +0000 (09:52 -0700)]
net/packet: make tp_drops atomic

Under DDOS, we want to be able to increment tp_drops without
touching the spinlock. This will help readers to drain
the receive queue slightly faster :/
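
As a rough userspace illustration of the idea (not the af_packet code), the
sketch below checks for room before taking the lock and counts drops with an
atomic increment. struct rx_queue, rx_enqueue() and QUEUE_CAP are invented
for the example, and an atomic_flag spin loop stands in for the socket
spinlock.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define QUEUE_CAP 2

struct rx_queue {
	atomic_int len;		/* queue length, readable without the lock */
	atomic_uint drops;	/* drop counter, bumped without the lock */
	atomic_flag lock;	/* stand-in for the socket spinlock */
};

static void rx_queue_init(struct rx_queue *q)
{
	atomic_init(&q->len, 0);
	atomic_init(&q->drops, 0);
	atomic_flag_clear(&q->lock);
}

static bool rx_enqueue(struct rx_queue *q)
{
	bool ok;

	/* Fast path: queue looks full, count the drop, skip the lock. */
	if (atomic_load_explicit(&q->len, memory_order_relaxed) >= QUEUE_CAP) {
		atomic_fetch_add_explicit(&q->drops, 1, memory_order_relaxed);
		return false;
	}

	while (atomic_flag_test_and_set_explicit(&q->lock,
						 memory_order_acquire))
		;	/* spin until the lock is free */

	/* Re-check under the lock; the unlocked read was only a hint. */
	ok = atomic_load_explicit(&q->len, memory_order_relaxed) < QUEUE_CAP;
	if (ok)
		atomic_fetch_add_explicit(&q->len, 1, memory_order_relaxed);
	atomic_flag_clear_explicit(&q->lock, memory_order_release);

	if (!ok)
		atomic_fetch_add_explicit(&q->drops, 1, memory_order_relaxed);
	return ok;
}
```

Once the queue is full, producers only touch the drop counter, so the reader
draining the queue no longer competes with them for the lock.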

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/packet: constify __packet_rcv_has_room()
Eric Dumazet [Wed, 12 Jun 2019 16:52:29 +0000 (09:52 -0700)]
net/packet: constify __packet_rcv_has_room()

Goal is to use the helper without the lock being held.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/packet: constify prb_lookup_block() and __tpacket_v3_has_room()
Eric Dumazet [Wed, 12 Jun 2019 16:52:28 +0000 (09:52 -0700)]
net/packet: constify prb_lookup_block() and __tpacket_v3_has_room()

Goal is to be able to use __tpacket_v3_has_room() without holding
a lock.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/packet: constify packet_lookup_frame() and __tpacket_has_room()
Eric Dumazet [Wed, 12 Jun 2019 16:52:27 +0000 (09:52 -0700)]
net/packet: constify packet_lookup_frame() and __tpacket_has_room()

Goal is to be able to use __tpacket_has_room() without holding a lock.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/packet: constify __packet_get_status() argument
Eric Dumazet [Wed, 12 Jun 2019 16:52:26 +0000 (09:52 -0700)]
net/packet: constify __packet_get_status() argument

struct packet_sock is only read.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: Add more 1000BaseX support detection
Robert Hancock [Tue, 11 Jun 2019 22:06:09 +0000 (16:06 -0600)]
net: phy: Add more 1000BaseX support detection

Commit "net: phy: Add detection of 1000BaseX link mode support" added
support for not filtering out 1000BaseX mode from the PHY's supported
modes in genphy_config_init, but we have to make a similar change in
genphy_read_abilities in order to actually detect it as a supported mode
in the first place. Add this in.

Signed-off-by: Robert Hancock <hancock@sedsystems.ca>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ethernet: ti: cpsw_ethtool: simplify slave loops
Ivan Khoronzhuk [Tue, 11 Jun 2019 21:59:40 +0000 (00:59 +0300)]
net: ethernet: ti: cpsw_ethtool: simplify slave loops

Only for consistency reasons, do it like in main cpsw.c module
and use ndev reference but not by means of slave.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ethernet: ti: cpsw: use cpsw as drv data
Ivan Khoronzhuk [Tue, 11 Jun 2019 21:49:03 +0000 (00:49 +0300)]
net: ethernet: ti: cpsw: use cpsw as drv data

No need to set ndev for drvdata when mainly cpsw reference is needed,
so correct this legacy decision.

Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'net-mlx5-use-indirect-call-wrappers'
David S. Miller [Fri, 14 Jun 2019 22:35:18 +0000 (15:35 -0700)]
Merge branch 'net-mlx5-use-indirect-call-wrappers'

Paolo Abeni says:

====================
net/mlx5: use indirect call wrappers

The mlx5_core driver uses several indirect calls in fast-path, some of them
are invoked on each ingress packet, even for the XDP-only traffic.

This series leverages the indirect call wrappers infrastructure to avoid
the expensive RETPOLINE overhead for 2 indirect calls in fast-path.

Each call is addressed on a different patch, plus we need to introduce a couple
of additional helpers to cope with the higher number of possible direct-call
alternatives.

v2 -> v3:
 - do not add more INDIRECT_CALL_* macros
 - use only the direct calls always available regardless of
   the mlx5 build options in the last patch

v1 -> v2:
 - update the direct call list and use a macro to define it,
   as per Saeed's suggestion. An intermediate additional
   macro is needed to allow arg list expansion
 - patch 2/3 is unchanged, as the generated code looks better this way than
   with possible alternative (dropping BP hits)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/mlx5e: use indirect calls wrapper for the rx packet handler
Paolo Abeni [Wed, 12 Jun 2019 10:18:36 +0000 (12:18 +0200)]
net/mlx5e: use indirect calls wrapper for the rx packet handler

We can avoid another indirect call per packet wrapping the rx
handler call with the proper helper.

To ensure that even the last listed direct call experiences
measurable gain, despite the additional conditionals we must
traverse before reaching it, I tested reversing the order of the
listed options, with performance differences below noise level.

Together with the previous indirect call patch, this gives
~6% performance improvement in raw UDP tput.

v2 -> v3:
 - use only the direct calls always available regardless of
   the mlx5 build options
 - drop the direct call list macro, to keep the code as simple
   as possible for future rework

v1 -> v2:
 - update the direct call list and use a macro to define it,
   as per Saeed's suggestion. An intermediate additional
   macro is needed to allow arg list expansion

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/mlx5e: use indirect calls wrapper for skb allocation
Paolo Abeni [Wed, 12 Jun 2019 10:18:35 +0000 (12:18 +0200)]
net/mlx5e: use indirect calls wrapper for skb allocation

We can avoid an indirect call per packet wrapping the skb creation
with the appropriate helper.
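
The wrapper idea can be imitated in userspace: compare the function pointer
against the likely targets and call the match directly, falling back to the
indirect call otherwise. The macros below are modeled loosely on the
kernel's INDIRECT_CALL_1/INDIRECT_CALL_2 but are not the kernel definitions,
and the build_* functions are invented placeholders, not mlx5e functions.

```c
/* Userspace imitation of the idea; all names here are illustrative. */
#define CALL_DIRECT_1(f, f1, ...) \
	((f) == (f1) ? f1(__VA_ARGS__) : (f)(__VA_ARGS__))
#define CALL_DIRECT_2(f, f2, f1, ...) \
	((f) == (f2) ? f2(__VA_ARGS__) : CALL_DIRECT_1(f, f1, __VA_ARGS__))

typedef int (*build_fn)(int len);

static int build_linear(int len)    { return len; }
static int build_nonlinear(int len) { return len * 2; }
static int build_other(int len)     { return len * 3; }

/* Try the two expected targets as direct calls before going indirect. */
static int build_skb(build_fn fn, int len)
{
	return CALL_DIRECT_2(fn, build_linear, build_nonlinear, len);
}
```

A pointer that matches neither listed target, such as build_other, still
works through the plain indirect call at the end of the chain.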

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoi40e: mark expected switch fall-through
Gustavo A. R. Silva [Wed, 1 May 2019 20:55:41 +0000 (15:55 -0500)]
i40e: mark expected switch fall-through

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

This patch fixes the following warning:

drivers/net/ethernet/intel/i40e/i40e_xsk.c: In function ‘i40e_run_xdp_zc’:
drivers/net/ethernet/intel/i40e/i40e_xsk.c:217:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
   bpf_warn_invalid_xdp_action(act);
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/intel/i40e/i40e_xsk.c:218:2: note: here
  case XDP_ABORTED:
  ^~~~
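
For illustration, a switch with intentional fall-through can be annotated
with a comment that the compiler recognizes. The enum and counters below are
made up and only mirror the shape of the warning above; this is not the
i40e_run_xdp_zc() code.

```c
enum action { ACT_PASS, ACT_UNKNOWN, ACT_ABORTED, ACT_DROP };

static int warned, aborted;

/* Returns 1 if the packet is dropped, 0 otherwise. */
static int run_action(enum action act)
{
	int dropped = 0;

	switch (act) {
	case ACT_PASS:
		break;
	default:
		warned++;	/* unknown action: warn, then treat as aborted */
		/* fall through */
	case ACT_ABORTED:
		aborted++;	/* aborted: record it, then drop */
		/* fall through */
	case ACT_DROP:
		dropped = 1;
		break;
	}
	return dropped;
}
```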

Signed-off-by: "Gustavo A. R. Silva" <gustavo@embeddedor.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoi40e: Missing response checks in driver when starting/stopping FW LLDP
Aleksandr Loktionov [Wed, 24 Apr 2019 12:20:55 +0000 (05:20 -0700)]
i40e: Missing response checks in driver when starting/stopping FW LLDP

The driver updated pf->flags before calling i40e_aq_start_lldp().
This patch moves the pf->flags update down so the flags are
updated only after a successful i40e_aq_start_lldp() call.
It also introduces a local is_reset_needed flag to avoid an unnecessary
h/w reset in case i40e_aq_start_lldp() didn't change the lldp state.

Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoi40e: remove duplicate stat calculation for tx_errors
Jacob Keller [Wed, 24 Apr 2019 12:20:54 +0000 (05:20 -0700)]
i40e: remove duplicate stat calculation for tx_errors

The tx_errors statistic was being calculated twice in
i40e_update_eth_stats.

This appears to be as of commit 201db2898f2c ("i40e: add missing VSI
statistics", 2014-03-25).

Remove the extra i40e_stat_update32 call for GLV_TEPC.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoi40e: Check if the BAR size is large enough before writing to registers
Adam Ludkiewicz [Wed, 24 Apr 2019 12:20:53 +0000 (05:20 -0700)]
i40e: Check if the BAR size is large enough before writing to registers

This patch fixes the problem with a kernel panic occurring when trying
to bind the i40e driver to a non-i40e port. The problem is fixed by
checking if the BAR size in the device is large enough by reading the
highest register.

Signed-off-by: Adam Ludkiewicz <adam.ludkiewicz@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoi40e: Missing response checks in driver when starting/stopping FW LLDP
Piotr Marczak [Wed, 24 Apr 2019 12:20:52 +0000 (05:20 -0700)]
i40e: Missing response checks in driver when starting/stopping FW LLDP

Driver did not check response on LLDP flag change and always returned
SUCCESS.

This patch now checks for an error and returns an error code and has
additional information in the log.

Signed-off-by: Piotr Marczak <piotr.marczak@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoi40e: add input validation for virtchnl handlers
Sergey Nemov [Wed, 24 Apr 2019 12:20:51 +0000 (05:20 -0700)]
i40e: add input validation for virtchnl handlers

Change some data to unsigned int instead of integer when we compare.

Check LUT values in VIRTCHNL_OP_CONFIG_RSS_LUT handler.

Also enhance error/warning messages to print the real values of
I40E_MAX_VF_QUEUES, I40E_MAX_VF_VSI and I40E_DEFAULT_QUEUES_PER_VF
instead of plain text.

Refactor code to comply with 'check first then assign' policy.

Remove duplicate checks for VIRTCHNL_OP_CONFIG_RSS_KEY and
VIRTCHNL_OP_CONFIG_RSS_LUT opcodes in i40e_vc_process_vf_msg(). We have
the very same checks inside the handlers already.

Signed-off-by: Sergey Nemov <sergey.nemov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoi40e: Improve AQ log granularity
Doug Dziggel [Wed, 24 Apr 2019 12:20:50 +0000 (05:20 -0700)]
i40e: Improve AQ log granularity

This patch makes it possible to log only AQ descriptors, without the
entire AQ message buffers being dumped too. It should greatly reduce
kernel log size in cases where a full AQ dump is not needed.
Selection is made by setting flags in hw->debug_mask.

Additionally, some debug messages that preceded an AQ dump have been
moved to I40E_DEBUG_AQ_COMMAND class, which seems more appropriate.

Signed-off-by: Doug Dziggel <douglas.a.dziggel@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoi40e: Add bounds check for ch[] array
Piotr Kwapulinski [Wed, 24 Apr 2019 12:20:49 +0000 (05:20 -0700)]
i40e: Add bounds check for ch[] array

Add bounds check for ch[] array.
Use ARRAY_SIZE() to ensure that idx is within the range.
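
A minimal sketch of such a check, assuming a hypothetical four-entry ch[]
array and a get_channel() helper that are not taken from the driver:

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

struct channel {
	int num_queues;
};

static struct channel ch[4];

/* Return the channel at idx, or NULL if idx is outside ch[]. */
static struct channel *get_channel(size_t idx)
{
	if (idx >= ARRAY_SIZE(ch))
		return NULL;
	return &ch[idx];
}
```

Deriving the limit from the array itself keeps the check correct if the
array is later resized.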

Signed-off-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoi40e: Use signed variable
Mitch Williams [Wed, 24 Apr 2019 12:20:48 +0000 (05:20 -0700)]
i40e: Use signed variable

The counter variable in i40e_clean_tx_irq starts out negative and climbs
to 0. So it should not be defined as a u16. This was working by accident
due to the fact the u16 overflows and underflows predictably.

Replace the u16 with int, which is signed and can handle the negativity.
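
The negative-biased index can be sketched as follows. walk_ring() is a
hypothetical helper, not i40e_clean_tx_irq(); it only shows why a signed int
expresses the "count up to zero" idiom directly.

```c
/*
 * Advance `budget` entries from `start` in a ring of `count` entries.
 * The index is biased by -count so that wrap-around is a test against 0.
 */
static unsigned int walk_ring(unsigned int start, unsigned int count,
			      unsigned int budget)
{
	int i = (int)start - (int)count;	/* negative: -count .. -1 */

	while (budget--) {
		i++;
		if (i == 0)			/* ran off the end of the ring */
			i -= (int)count;
	}
	return (unsigned int)(i + (int)count);	/* back to 0 .. count-1 */
}
```

Starting at entry 6 of an 8-entry ring and advancing 3 entries ends at
entry 1, the same result as (6 + 3) % 8.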

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoi40e: add constraints for accessing veb array
Piotr Kwapulinski [Wed, 24 Apr 2019 12:20:47 +0000 (05:20 -0700)]
i40e: add constraints for accessing veb array

Add veb array access boundary checks.
Ensure veb array index is smaller than I40E_MAX_VEB.

Signed-off-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoi40e: let untrusted VF to create up to 16 VLANs
Piotr Kwapulinski [Wed, 24 Apr 2019 12:20:46 +0000 (05:20 -0700)]
i40e: let untrusted VF to create up to 16 VLANs

This patch lets an untrusted VF create up to 16 VLANs.
It is implemented by increasing I40E_VC_MAX_VLAN_PER_VF to 16.
Without this patch an untrusted VF could create only up to 8 VLANs.

Signed-off-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoi40e: add functions stubs to support EEE
Aleksandr Loktionov [Sat, 30 Mar 2019 00:04:56 +0000 (17:04 -0700)]
i40e: add functions stubs to support EEE

This patch adds function stubs to support EEE on/off.

Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoMerge tag 'mac80211-next-for-davem-2019-06-14' of git://git.kernel.org/pub/scm/linux...
David S. Miller [Fri, 14 Jun 2019 18:27:26 +0000 (11:27 -0700)]
Merge tag 'mac80211-next-for-davem-2019-06-14' of git://git./linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
Many changes all over:
 * HE (802.11ax) work continues
 * WPA3 offloads
 * work on extended key ID handling continues
 * fixes to honour AP supported rates with auth/assoc frames
 * nl80211 netlink policy improvements to fix some issues
   with strict validation on new commands with old attrs
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosched: act_ctinfo: use extack error reporting
Kevin Darbyshire-Bryant [Fri, 14 Jun 2019 09:09:44 +0000 (10:09 +0100)]
sched: act_ctinfo: use extack error reporting

Use extack error reporting mechanism in addition to returning -EINVAL

NL_SET_ERR_* code shamelessly copy/paste/adjusted from act_pedit &
sch_cake and used as reference as to what I should have done in the
first place.

Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agol2tp: no need to check return value of debugfs_create functions
Greg Kroah-Hartman [Fri, 14 Jun 2019 07:04:38 +0000 (09:04 +0200)]
l2tp: no need to check return value of debugfs_create functions

When calling debugfs functions, there is no need to ever check the
return value.  The function can work or not, but the code logic should
never do something different based on this.

Also, there is no need to store the individual debugfs file name, just
remove the whole directory all at once, saving a local variable.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guillaume Nault <g.nault@alphalink.fr>
Cc: netdev@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'r8169-add-and-use-helper-rtl_is_8168evl_up'
David S. Miller [Fri, 14 Jun 2019 15:38:27 +0000 (08:38 -0700)]
Merge branch 'r8169-add-and-use-helper-rtl_is_8168evl_up'

Heiner Kallweit says:

====================
r8169: add and use helper rtl_is_8168evl_up

A few registers have been added or have changed their purpose with version
RTL8168e-vl, so create a helper for identifying chip versions from
RTL8168e-vl onwards.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agor8169: use helper rtl_is_8168evl_up for setting register MaxTxPacketSize
Heiner Kallweit [Fri, 14 Jun 2019 05:55:21 +0000 (07:55 +0200)]
r8169: use helper rtl_is_8168evl_up for setting register MaxTxPacketSize

From RTL8168e-vl the value in register MaxTxPacketSize is interpreted
differently, therefore use new helper rtl_is_8168evl_up to set this
register.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agor8169: add helper rtl_is_8168evl_up
Heiner Kallweit [Fri, 14 Jun 2019 05:54:07 +0000 (07:54 +0200)]
r8169: add helper rtl_is_8168evl_up

Add helper rtl_is_8168evl_up to make the code more readable and to
simplify it.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomac80211: notify offchannel expire on mgmt_tx
James Prestwood [Wed, 12 Jun 2019 19:35:10 +0000 (12:35 -0700)]
mac80211: notify offchannel expire on mgmt_tx

When the offchannel TX wait time expires, send the appropriate event.

Signed-off-by: James Prestwood <james.prestwood@linux.intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
5 years agonl80211: send event when CMD_FRAME duration expires
James Prestwood [Wed, 12 Jun 2019 19:35:09 +0000 (12:35 -0700)]
nl80211: send event when CMD_FRAME duration expires

cfg80211_remain_on_channel_expired is used to notify userspace when
the remain on channel duration expired by sending an event. There is
no such equivalent to CMD_FRAME, where if offchannel and a duration
is provided, the card will go offchannel for that duration. Currently
there is no way for userspace to tell when that duration expired
apart from setting an independent timeout. This timeout is quite
error-prone as the kernel may not immediately send out the frame
because of scheduling or work queue delays. In testing, it was found
this timeout had to be quite large to accommodate any potential delays.

A better solution is to have the kernel send an event when this
duration has expired. There is already NL80211_CMD_FRAME_WAIT_CANCEL
which can be used to cancel a NL80211_CMD_FRAME offchannel. Using this
command matches perfectly how NL80211_CMD_CANCEL_REMAIN_ON_CHANNEL
works, where it is used both to cancel and to notify that the duration
has expired.

Signed-off-by: James Prestwood <james.prestwood@linux.intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>