OSDN Git Service

uclinux-h8/linux.git
2 years agoMerge tag 'mlx5-updates-2021-09-30' of git://git.kernel.org/pub/scm/linux/kernel...
Jakub Kicinski [Fri, 1 Oct 2021 21:41:38 +0000 (14:41 -0700)]
Merge tag 'mlx5-updates-2021-09-30' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2021-09-30

1) From Yevgeny Kliteynik:

This patch series deals with vport handling in SW steering.

For every vport, SW steering queries FW for this vport's properties,
such as RX/TX ICM addresses to be able to add this vport as dest action.
The following patches rework vport capabilities managements and add support
for Scalable Functions (SFs).

 - Patch 1 fixes the vport number data type all over the DR code to 16 bits
   in accordance with HW spec.
 - Patch 2 replaces local SW steering WIRE_PORT macro with the existing
   mlx5 define.
 - Patch 3 adds missing query for vport 0 and and handles eswitch manager
   capabilities for ECPF (BlueField in embedded CPU mode).
 - Patch 4 fixes error messages for failure to obtain vport caps from
   different locations in the code to have the same verbosity level and
   similar wording.
 - Patch 5 adds support for csum recalculation flow tables on SFs: it
   implements these FTs management in XArray instead of the fixed size array,
   thus adding support for csum recalculation table for any valid vport.
 - Patch 6 is the main patch of this whole series: it refactors vports
   capabilities handling and adds SFs support.

2) Minor and trivial updates and cleanups

* tag 'mlx5-updates-2021-09-30' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5e: Use array_size() helper
  net/mlx5: Use struct_size() helper in kvzalloc()
  net/mlx5: Use kvcalloc() instead of kvzalloc()
  net/mlx5: Tolerate failures in debug features while driver load
  net/mlx5: Warn for devlink reload when there are VFs alive
  net/mlx5: DR, Add missing string for action type SAMPLER
  net/mlx5: DR, init_next_match only if needed
  net/mlx5: DR, Fix typo 'offeset' to 'offset'
  net/mlx5: DR, Increase supported num of actions to 32
  net/mlx5: DR, Add support for SF vports
  net/mlx5: DR, Support csum recalculation flow table on SFs
  net/mlx5: DR, Align error messages for failure to obtain vport caps
  net/mlx5: DR, Add missing query for vport 0
  net/mlx5: DR, Replace local WIRE_PORT macro with the existing MLX5_VPORT_UPLINK
  net/mlx5: DR, Fix vport number data type to u16
====================

Link: https://lore.kernel.org/r/20210930232050.41779-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agofix up for "net: add new socket option SO_RESERVE_MEM"
Stephen Rothwell [Fri, 1 Oct 2021 13:43:30 +0000 (14:43 +0100)]
fix up for "net: add new socket option SO_RESERVE_MEM"

Some architectures do not include uapi/asm/socket.h

Fixes: 2bb2f5fb21b0 ("net: add new socket option SO_RESERVE_MEM")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoRevert "Merge branch 'mctp-kunit-tests'"
David S. Miller [Fri, 1 Oct 2021 13:56:52 +0000 (14:56 +0100)]
Revert "Merge branch 'mctp-kunit-tests'"

This reverts commit 4f42ad2011d2fcbd89f5cdf56121271a8cd5ee5d, reversing
changes made to ea2dd331bfaaeba74ba31facf437c29044f7d4cb.

These chanfges break the build when mctp is modular.

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agosparc: add SO_RESERVE_MEM definition.
David S. Miller [Fri, 1 Oct 2021 13:38:10 +0000 (14:38 +0100)]
sparc: add SO_RESERVE_MEM definition.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agodevlink: report maximum number of snapshots with regions
Jacob Keller [Thu, 30 Sep 2021 21:21:04 +0000 (14:21 -0700)]
devlink: report maximum number of snapshots with regions

Each region has an independently configurable number of maximum
snapshots. This information is not reported to userspace, making it not
very discoverable. Fix this by adding a new
DEVLINK_ATTR_REGION_MAX_SNAPSHOST attribute which is used to report this
maximum.

Ex:

  $devlink region
  pci/0000:af:00.0/nvm-flash: size 10485760 snapshot [] max 1
  pci/0000:af:00.0/device-caps: size 4096 snapshot [] max 10
  pci/0000:af:00.1/nvm-flash: size 10485760 snapshot [] max 1
  pci/0000:af:00.1/device-caps: size 4096 snapshot [] max 10

This information enables users to understand why a new region command
may fail due to having too many existing snapshots.

Reported-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'mctp-kunit-tests'
David S. Miller [Fri, 1 Oct 2021 13:19:01 +0000 (14:19 +0100)]
Merge branch 'mctp-kunit-tests'

Jeremy Kerr says:

====================
MCTP kunit tests

This change adds some initial kunit tests for the MCTP core. We'll
expand the coverage in a future series, and augment with a few
selftests, but this establishes a baseline set of tests for now.

Thanks to the kunit folks for the framework!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomctp: Add input reassembly tests
Jeremy Kerr [Fri, 1 Oct 2021 08:18:44 +0000 (16:18 +0800)]
mctp: Add input reassembly tests

Add multi-packet route input tests, for message reassembly. These will
feed packets to be received by a bound socket, or dropped.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomctp: Add route input to socket tests
Jeremy Kerr [Fri, 1 Oct 2021 08:18:43 +0000 (16:18 +0800)]
mctp: Add route input to socket tests

Add a few tests for single-packet route inputs, testing the
mctp_route_input function.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomctp: Add packet rx tests
Jeremy Kerr [Fri, 1 Oct 2021 08:18:42 +0000 (16:18 +0800)]
mctp: Add packet rx tests

Add a few tests for the initial packet ingress through
mctp_pkttype_receive function; mainly packet header sanity checks. Full
input routing checks will be added as a separate change.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomctp: Add test utils
Jeremy Kerr [Fri, 1 Oct 2021 08:18:41 +0000 (16:18 +0800)]
mctp: Add test utils

Add a new object for shared test utilities

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomctp: Add initial test structure and fragmentation test
Jeremy Kerr [Fri, 1 Oct 2021 08:18:40 +0000 (16:18 +0800)]
mctp: Add initial test structure and fragmentation test

This change adds the first kunit test for the mctp subsystem, and an
initial test for the fragmentation path.

We're adding tests under a new net/mctp/test/ directory.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge tag 'mlx5-fixes-2021-09-30' of git://git.kernel.org/pub/scm/linux/kernel/git...
David S. Miller [Fri, 1 Oct 2021 10:19:40 +0000 (11:19 +0100)]
Merge tag 'mlx5-fixes-2021-09-30' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5 fixes 2021-09-30

This series introduces some fixes to mlx5 driver.
Please pull and let me know if there is any problem.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: sched: Use struct_size() helper in kvmalloc()
Gustavo A. R. Silva [Wed, 29 Sep 2021 20:17:18 +0000 (15:17 -0500)]
net: sched: Use struct_size() helper in kvmalloc()

Make use of the struct_size() helper instead of an open-coded version,
in order to avoid any potential type mistakes or integer overflows
that, in the worst scenario, could lead to heap overflows.

Link: https://github.com/KSPP/linux/issues/160
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20210929201718.GA342296@embeddedor
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet/mlx5e: Use array_size() helper
Gustavo A. R. Silva [Tue, 28 Sep 2021 21:54:10 +0000 (16:54 -0500)]
net/mlx5e: Use array_size() helper

Use array_size() helper to aid in 2-factor allocation instances.

Link: https://github.com/KSPP/linux/issues/160
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5: Use struct_size() helper in kvzalloc()
Gustavo A. R. Silva [Tue, 28 Sep 2021 22:11:57 +0000 (17:11 -0500)]
net/mlx5: Use struct_size() helper in kvzalloc()

Make use of the struct_size() helper instead of an open-coded version,
in order to avoid any potential type mistakes or integer overflows that,
in the worse scenario, could lead to heap overflows.

Link: https://github.com/KSPP/linux/issues/160
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5: Use kvcalloc() instead of kvzalloc()
Gustavo A. R. Silva [Tue, 28 Sep 2021 22:04:47 +0000 (17:04 -0500)]
net/mlx5: Use kvcalloc() instead of kvzalloc()

Use 2-factor argument form kvcalloc() instead of kvzalloc().

Link: https://github.com/KSPP/linux/issues/162
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5: Tolerate failures in debug features while driver load
Aya Levin [Mon, 2 Aug 2021 10:33:03 +0000 (13:33 +0300)]
net/mlx5: Tolerate failures in debug features while driver load

FW tracer and resource dump are debug features. Although failing to
initialize them may indicate an error, don't let this stop device
loading.

Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5: Warn for devlink reload when there are VFs alive
Lama Kayal [Sun, 1 Aug 2021 11:57:16 +0000 (14:57 +0300)]
net/mlx5: Warn for devlink reload when there are VFs alive

When performing PF reload, VF can't communicate with FW until
it recovers and reloads as well.

Add a warning message when performing devlink reload while
VFs are still present. Thus, giving a notice of an unfavorable
behavior that might occur as a result of a consequential reloads
and cause interruption of VF recovery.

Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5: DR, Add missing string for action type SAMPLER
Yevgeny Kliteynik [Mon, 20 Sep 2021 00:11:19 +0000 (03:11 +0300)]
net/mlx5: DR, Add missing string for action type SAMPLER

Add missing string value for DR_ACTION_TYP_SAMPLER action type

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5: DR, init_next_match only if needed
Yevgeny Kliteynik [Sun, 19 Sep 2021 15:48:07 +0000 (18:48 +0300)]
net/mlx5: DR, init_next_match only if needed

Allocate next steering table entry only if the remaining space requires to.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5: DR, Fix typo 'offeset' to 'offset'
Yevgeny Kliteynik [Thu, 12 Aug 2021 00:00:52 +0000 (03:00 +0300)]
net/mlx5: DR, Fix typo 'offeset' to 'offset'

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5: DR, Increase supported num of actions to 32
Yevgeny Kliteynik [Sat, 11 Sep 2021 08:44:01 +0000 (11:44 +0300)]
net/mlx5: DR, Increase supported num of actions to 32

Increase max supported number of actions in the same rule.

Signed-off-by: Hamdan Igbaria <hamdani@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5: DR, Add support for SF vports
Yevgeny Kliteynik [Thu, 9 Sep 2021 14:32:46 +0000 (17:32 +0300)]
net/mlx5: DR, Add support for SF vports

Move all the vport capabilities to a separate struct and store vport caps
in XArray: SFs vport numbers will not come in the same range as VF vports,
so the existing implementation of vport capabilities as a fixed size array
is not suitable here.

XArray is a perfect fit: it is efficient when the indices used are densely
clustered. In addition to being a perfect fit as a dynamic data structure,
XArray also provides locking - it uses RCU and an internal spinlock to
synchronise access, so no additional protection needed.

Now except for the eswitch manager vport, all other vports (including the
uplink vport) are handled in the same way: when a new go-to-vport action
is added, this vport's caps are loaded from the xarray. If it is the first
time for this particular vport number, then its capabilities are queried
from FW and filled in into the appropriate entry.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5: DR, Support csum recalculation flow table on SFs
Yevgeny Kliteynik [Wed, 8 Sep 2021 16:44:11 +0000 (19:44 +0300)]
net/mlx5: DR, Support csum recalculation flow table on SFs

Implement csum recalculation flow tables in XAarray instead of a fixed
array, thus adding support for csum recalc table on any valid vport
number, which enables this support for SFs.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5: DR, Align error messages for failure to obtain vport caps
Yevgeny Kliteynik [Tue, 17 Aug 2021 08:16:39 +0000 (11:16 +0300)]
net/mlx5: DR, Align error messages for failure to obtain vport caps

Print similar error messages when an invalid vport number is
provided during action creation and during STEv0/1 creation.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5: DR, Add missing query for vport 0
Yevgeny Kliteynik [Tue, 10 Aug 2021 19:34:58 +0000 (22:34 +0300)]
net/mlx5: DR, Add missing query for vport 0

Currently, vport 0 capabilities are not set.
To fix this, we now querying both eswitch manager and vport 0.
Eswitch manager has an access to all the vports - for eswitch manager PF, all
vports can be referred as other vports. The exception is embedded CPU mode,
where there is vport 0 of ECPF and the PF vport 0.

Here is how vport are queried:

For Connect-X5/6:
    PF vport (0) and vports 1..n: vport number, other = true
    esw_manager is vport 0 (PF)
For BlueField (in embedded CPU mode):
    ECPF vport: vport = 0, other = false
    PF vport (0) and 1..n: vport number, other = true
    esw_manager = vport 0 (ECPF)

Also, note that there's no need for other_vport function parameter
in dr_domain_query_vport - this value is now deduced locally in the
function.

Signed-off-by: Yuval Avnery <yuvalav@mellanox.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5: DR, Replace local WIRE_PORT macro with the existing MLX5_VPORT_UPLINK
Yevgeny Kliteynik [Wed, 22 Sep 2021 23:23:23 +0000 (02:23 +0300)]
net/mlx5: DR, Replace local WIRE_PORT macro with the existing MLX5_VPORT_UPLINK

SW steering defines its own macro for uplink vport number.
Replace this macro with an already existing mlx5 macro.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5: DR, Fix vport number data type to u16
Yevgeny Kliteynik [Thu, 12 Aug 2021 00:15:11 +0000 (03:15 +0300)]
net/mlx5: DR, Fix vport number data type to u16

According to the HW spec, vport number is a 16-bit value.
Fix vport usage all over the code to u16 data type.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Thu, 30 Sep 2021 21:49:21 +0000 (14:49 -0700)]
Merge git://git./linux/kernel/git/netdev/net

drivers/net/phy/bcm7xxx.c
  d88fd1b546ff ("net: phy: bcm7xxx: Fixed indirect MMD operations")
  f68d08c437f9 ("net: phy: bcm7xxx: Add EPHY entry for 72165")

net/sched/sch_api.c
  b193e15ac69d ("net: prevent user from passing illegal stab size")
  69508d43334e ("net_sched: Use struct_size() and flex_array_size() helpers")

Both cases trivial - adjacent code additions.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge tag 'net-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 30 Sep 2021 21:28:05 +0000 (14:28 -0700)]
Merge tag 'net-5.15-rc4' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Networking fixes, including fixes from mac80211, netfilter and bpf.

  Current release - regressions:

   - bpf, cgroup: assign cgroup in cgroup_sk_alloc when called from
     interrupt

   - mdio: revert mechanical patches which broke handling of optional
     resources

   - dev_addr_list: prevent address duplication

  Previous releases - regressions:

   - sctp: break out if skb_header_pointer returns NULL in sctp_rcv_ootb
     (NULL deref)

   - Revert "mac80211: do not use low data rates for data frames with no
     ack flag", fixing broadcast transmissions

   - mac80211: fix use-after-free in CCMP/GCMP RX

   - netfilter: include zone id in tuple hash again, minimize collisions

   - netfilter: nf_tables: unlink table before deleting it (race -> UAF)

   - netfilter: log: work around missing softdep backend module

   - mptcp: don't return sockets in foreign netns

   - sched: flower: protect fl_walk() with rcu (race -> UAF)

   - ixgbe: fix NULL pointer dereference in ixgbe_xdp_setup

   - smsc95xx: fix stalled rx after link change

   - enetc: fix the incorrect clearing of IF_MODE bits

   - ipv4: fix rtnexthop len when RTA_FLOW is present

   - dsa: mv88e6xxx: 6161: use correct MAX MTU config method for this
     SKU

   - e100: fix length calculation & buffer overrun in ethtool::get_regs

  Previous releases - always broken:

   - mac80211: fix using stale frag_tail skb pointer in A-MSDU tx

   - mac80211: drop frames from invalid MAC address in ad-hoc mode

   - af_unix: fix races in sk_peer_pid and sk_peer_cred accesses (race
     -> UAF)

   - bpf, x86: Fix bpf mapping of atomic fetch implementation

   - bpf: handle return value of BPF_PROG_TYPE_STRUCT_OPS prog

   - netfilter: ip6_tables: zero-initialize fragment offset

   - mhi: fix error path in mhi_net_newlink

   - af_unix: return errno instead of NULL in unix_create1() when over
     the fs.file-max limit

  Misc:

   - bpf: exempt CAP_BPF from checks against bpf_jit_limit

   - netfilter: conntrack: make max chain length random, prevent
     guessing buckets by attackers

   - netfilter: nf_nat_masquerade: make async masq_inet6_event handling
     generic, defer conntrack walk to work queue (prevent hogging RTNL
     lock)"

* tag 'net-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (77 commits)
  af_unix: fix races in sk_peer_pid and sk_peer_cred accesses
  net: stmmac: fix EEE init issue when paired with EEE capable PHYs
  net: dev_addr_list: handle first address in __hw_addr_add_ex
  net: sched: flower: protect fl_walk() with rcu
  net: introduce and use lock_sock_fast_nested()
  net: phy: bcm7xxx: Fixed indirect MMD operations
  net: hns3: disable firmware compatible features when uninstall PF
  net: hns3: fix always enable rx vlan filter problem after selftest
  net: hns3: PF enable promisc for VF when mac table is overflow
  net: hns3: fix show wrong state when add existing uc mac address
  net: hns3: fix mixed flag HCLGE_FLAG_MQPRIO_ENABLE and HCLGE_FLAG_DCB_ENABLE
  net: hns3: don't rollback when destroy mqprio fail
  net: hns3: remove tc enable checking
  net: hns3: do not allow call hns3_nic_net_open repeatedly
  ixgbe: Fix NULL pointer dereference in ixgbe_xdp_setup
  net: bridge: mcast: Associate the seqcount with its protecting lock.
  net: mdio-ipq4019: Fix the error for an optional regs resource
  net: hns3: fix hclge_dbg_dump_tm_pg() stack usage
  net: mdio: mscc-miim: Fix the mdio controller
  af_unix: Return errno instead of NULL in unix_create1().
  ...

2 years agonet/mlx5e: Mutually exclude setting of TX-port-TS and MQPRIO in channel mode
Aya Levin [Mon, 13 Sep 2021 13:49:47 +0000 (16:49 +0300)]
net/mlx5e: Mutually exclude setting of TX-port-TS and MQPRIO in channel mode

TX-port-TS hijacks the PTP traffic to a specific HW TX-queue. This
conflicts with MQPRIO in channel mode, which specifies explicitly which
TC accepts the packet. This patch mutually excludes the above
configuration.

Fixes: ec60c4581bd9 ("net/mlx5e: Support MQPRIO channel mode")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5e: Fix the presented RQ index in PTP stats
Lama Kayal [Sun, 29 Aug 2021 08:26:03 +0000 (11:26 +0300)]
net/mlx5e: Fix the presented RQ index in PTP stats

PTP-RQ counters title format contains PTP-RQ identifier, which is
mistakenly not passed to sprinft().
This leads to unexpected garbage values instead.
This patch fixes it.

Before applying the patch:
ethtool -S eth3 | grep ptp_rq
     ptp_rq15_packets: 0
     ptp_rq8_bytes: 0
     ptp_rq6_csum_complete: 0
     ptp_rq14_csum_complete_tail: 0
     ptp_rq3_csum_complete_tail_slow : 0
     ptp_rq9_csum_unnecessary: 0
     ptp_rq1_csum_unnecessary_inner: 0
     ptp_rq7_csum_none: 0
     ptp_rq10_xdp_drop: 0
     ptp_rq9_xdp_redirect: 0
     ptp_rq13_lro_packets: 0
     ptp_rq12_lro_bytes: 0
     ptp_rq10_ecn_mark: 0
     ptp_rq9_removed_vlan_packets: 0
     ptp_rq5_wqe_err: 0
     ptp_rq8_mpwqe_filler_cqes: 0
     ptp_rq2_mpwqe_filler_strides: 0
     ptp_rq5_oversize_pkts_sw_drop: 0
     ptp_rq6_buff_alloc_err: 0
     ptp_rq15_cqe_compress_blks: 0
     ptp_rq2_cqe_compress_pkts: 0
     ptp_rq2_cache_reuse: 0
     ptp_rq12_cache_full: 0
     ptp_rq11_cache_empty: 256
     ptp_rq12_cache_busy: 0
     ptp_rq11_cache_waive: 0
     ptp_rq12_congst_umr: 0
     ptp_rq11_arfs_err: 0
     ptp_rq9_recover: 0

After applying the patch:
ethtool -S eth3 | grep ptp_rq
     ptp_rq0_packets: 0
     ptp_rq0_bytes: 0
     ptp_rq0_csum_complete: 0
     ptp_rq0_csum_complete_tail: 0
     ptp_rq0_csum_complete_tail_slow : 0
     ptp_rq0_csum_unnecessary: 0
     ptp_rq0_csum_unnecessary_inner: 0
     ptp_rq0_csum_none: 0
     ptp_rq0_xdp_drop: 0
     ptp_rq0_xdp_redirect: 0
     ptp_rq0_lro_packets: 0
     ptp_rq0_lro_bytes: 0
     ptp_rq0_ecn_mark: 0
     ptp_rq0_removed_vlan_packets: 0
     ptp_rq0_wqe_err: 0
     ptp_rq0_mpwqe_filler_cqes: 0
     ptp_rq0_mpwqe_filler_strides: 0
     ptp_rq0_oversize_pkts_sw_drop: 0
     ptp_rq0_buff_alloc_err: 0
     ptp_rq0_cqe_compress_blks: 0
     ptp_rq0_cqe_compress_pkts: 0
     ptp_rq0_cache_reuse: 0
     ptp_rq0_cache_full: 0
     ptp_rq0_cache_empty: 256
     ptp_rq0_cache_busy: 0
     ptp_rq0_cache_waive: 0
     ptp_rq0_congst_umr: 0
     ptp_rq0_arfs_err: 0
     ptp_rq0_recover: 0

Fixes: a28359e922c6 ("net/mlx5e: Add PTP-RX statistics")
Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5: Fix setting number of EQs of SFs
Shay Drory [Tue, 14 Sep 2021 07:13:02 +0000 (10:13 +0300)]
net/mlx5: Fix setting number of EQs of SFs

When setting number of completion EQs of the SF, consider number of
online CPUs.
Without this consideration, when number of online cpus are less than 8,
unnecessary 8 completion EQs are allocated.

Fixes: c36326d38d93 ("net/mlx5: Round-Robin EQs over IRQs")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5: Fix length of irq_index in chars
Shay Drory [Thu, 19 Aug 2021 13:01:28 +0000 (16:01 +0300)]
net/mlx5: Fix length of irq_index in chars

The maximum irq_index can be 2047, This means irq_name should have 4
characters reserve for the irq_index. Hence, increase it to 4.

Fixes: 3af26495a247 ("net/mlx5: Enlarge interrupt field in CREATE_EQ")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5: Avoid generating event after PPS out in Real time mode
Aya Levin [Thu, 23 Sep 2021 12:30:01 +0000 (15:30 +0300)]
net/mlx5: Avoid generating event after PPS out in Real time mode

When in Real-time mode, HW clock is synced with the PTP daemon. Hence
driver should not re-calibrate the next pulse (via MTPPSE repetitive
events mechanism).

This patch arms repetitive events only in free-running mode.

Fixes: 432119de33d9 ("net/mlx5: Add cyc2time HW translation mode support")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5: Force round second at 1PPS out start time
Aya Levin [Thu, 23 Sep 2021 13:56:09 +0000 (16:56 +0300)]
net/mlx5: Force round second at 1PPS out start time

Allow configuration of 1PPS start time only with time-stamp representing
a round second. Prior to this patch driver allowed setting of a
non-round-second which is not supported by the device. Avoid unexpected
behavior by restricting start-time configuration to a round-second.

Fixes: 4272f9b88db9 ("net/mlx5e: Change 1PPS out scheme")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5: E-Switch, Fix double allocation of acl flow counter
Moshe Shemesh [Thu, 23 Sep 2021 14:57:47 +0000 (17:57 +0300)]
net/mlx5: E-Switch, Fix double allocation of acl flow counter

Flow counter is allocated in eswitch legacy acl setting functions
without checking if already allocated by previous setting. Add a check
to avoid such double allocation.

Fixes: 07bab9502641 ("net/mlx5: E-Switch, Refactor eswitch ingress acl codes")
Fixes: ea651a86d468 ("net/mlx5: E-Switch, Refactor eswitch egress acl codes")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5e: Improve MQPRIO resiliency
Tariq Toukan [Wed, 29 Sep 2021 12:51:26 +0000 (15:51 +0300)]
net/mlx5e: Improve MQPRIO resiliency

* Add netdev->tc_to_txq rollback in case of failure in
  mlx5e_update_netdev_queues().
* Fix broken transition between the two modes:
  MQPRIO DCB mode with tc==8, and MQPRIO channel mode.
* Disable MQPRIO channel mode if re-attaching with a different number
  of channels.
* Improve code sharing.

Fixes: ec60c4581bd9 ("net/mlx5e: Support MQPRIO channel mode")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5e: Keep the value for maximum number of channels in-sync
Tariq Toukan [Thu, 2 Sep 2021 07:33:32 +0000 (10:33 +0300)]
net/mlx5e: Keep the value for maximum number of channels in-sync

The value for maximum number of channels is first calculated based
on the netdev's profile and current function resources (specifically,
number of MSIX vectors, which depends among other things on the number
of online cores in the system).
This value is then used to calculate the netdev's number of rxqs/txqs.
Once created (by alloc_etherdev_mqs), the number of netdev's rxqs/txqs
is constant and we must not exceed it.

To achieve this, keep the maximum number of channels in sync upon any
netdevice re-attach.

Use mlx5e_get_max_num_channels() for calculating the number of netdev's
rxqs/txqs. After netdev is created, use mlx5e_calc_max_nch() (which
coinsiders core device resources, profile, and netdev) to init or
update priv->max_nch.

Before this patch, the value of priv->max_nch might get out of sync,
mistakenly allowing accesses to out-of-bounds objects, which would
crash the system.

Track the number of channels stats structures used in a separate
field, as they are persistent to suspend/resume operations. All the
collected stats of every channel index that ever existed should be
preserved. They are reset only when struct mlx5e_priv is,
in mlx5e_priv_cleanup(), which is part of the profile changing flow.

There is no point anymore in blocking a profile change due to max_nch
mismatch in mlx5e_netdev_change_profile(). Remove the limitation.

Fixes: a1f240f18017 ("net/mlx5e: Adjust to max number of channles when re-attaching")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5e: IPSEC RX, enable checksum complete
Raed Salem [Thu, 26 Aug 2021 14:07:17 +0000 (17:07 +0300)]
net/mlx5e: IPSEC RX, enable checksum complete

Currently in Rx data path IPsec crypto offloaded packets uses
csum_none flag, so checksum is handled by the stack, this naturally
have some performance/cpu utilization impact on such flows. As Nvidia
NIC starting from ConnectX6DX provides checksum complete value out of
the box also for such flows there is no sense in taking csum_none path,
furthermore the stack (xfrm) have the method to handle checksum complete
corrections for such flows i.e. IPsec trailer removal and consequently
checksum value adjustment.

Because of the above and in addition the ConnectX6DX is the first HW
which supports IPsec crypto offload then it is safe to report csum
complete for IPsec offloaded traffic.

Fixes: b2ac7541e377 ("net/mlx5e: IPsec: Add Connect-X IPsec Rx data path offload")
Signed-off-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agoMerge tag 'gpio-fixes-for-v5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 30 Sep 2021 19:11:35 +0000 (12:11 -0700)]
Merge tag 'gpio-fixes-for-v5.15-rc4' of git://git./linux/kernel/git/brgl/linux

Pull gpio fixes from Bartosz Golaszewski:
 "A single fix for the gpio-pca953x driver and two commits updating the
  MAINTAINERS entries for Mun Yew Tham (GPIO specific) and myself
  (treewide after a change in professional situation).

  Summary:

   - don't ignore I2C errors in gpio-pca953x

   - update MAINTAINERS entries for Mun Yew Tham and myself"

* tag 'gpio-fixes-for-v5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  MAINTAINERS: Update Mun Yew Tham as Altera Pio Driver maintainer
  MAINTAINERS: update my email address
  gpio: pca953x: do not ignore i2c errors

2 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Linus Torvalds [Thu, 30 Sep 2021 19:00:46 +0000 (12:00 -0700)]
Merge tag 'for-linus' of git://git./linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "Not much too exciting here, although two syzkaller bugs that seem to
  have 9 lives may have finally been squashed.

  Several core bugs and a batch of driver bug fixes:

   - Fix compilation problems in qib and hfi1

   - Do not corrupt the joined multicast group state when using
     SEND_ONLY

   - Several CMA bugs, a reference leak for listening and two syzkaller
     crashers

   - Various bug fixes for irdma

   - Fix a Sleeping while atomic bug in usnic

   - Properly sanitize kernel pointers in dmesg

   - Two bugs in the 64b CQE support for hns"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/hns: Add the check of the CQE size of the user space
  RDMA/hns: Fix the size setting error when copying CQE in clean_cq()
  RDMA/hfi1: Fix kernel pointer leak
  RDMA/usnic: Lock VF with mutex instead of spinlock
  RDMA/hns: Work around broken constant propagation in gcc 8
  RDMA/cma: Ensure rdma_addr_cancel() happens before issuing more requests
  RDMA/cma: Do not change route.addr.src_addr.ss_family
  RDMA/irdma: Report correct WC error when there are MW bind errors
  RDMA/irdma: Report correct WC error when transport retry counter is exceeded
  RDMA/irdma: Validate number of CQ entries on create CQ
  RDMA/irdma: Skip CQP ring during a reset
  MAINTAINERS: Update Broadcom RDMA maintainers
  RDMA/cma: Fix listener leak in rdma_cma_listen_on_all() failure
  IB/cma: Do not send IGMP leaves for sendonly Multicast groups
  IB/qib: Fix clang confusion of NULL pointer comparison

2 years agoaf_unix: fix races in sk_peer_pid and sk_peer_cred accesses
Eric Dumazet [Wed, 29 Sep 2021 22:57:50 +0000 (15:57 -0700)]
af_unix: fix races in sk_peer_pid and sk_peer_cred accesses

Jann Horn reported that SO_PEERCRED and SO_PEERGROUPS implementations
are racy, as af_unix can concurrently change sk_peer_pid and sk_peer_cred.

In order to fix this issue, this patch adds a new spinlock that needs
to be used whenever these fields are read or written.

Jann also pointed out that l2cap_sock_get_peer_pid_cb() is currently
reading sk->sk_peer_pid which makes no sense, as this field
is only possibly set by AF_UNIX sockets.
We will have to clean this in a separate patch.
This could be done by reverting b48596d1dc25 "Bluetooth: L2CAP: Add get_peer_pid callback"
or implementing what was truly expected.

Fixes: 109f6e39fa07 ("af_unix: Allow SO_PEERCRED to work across namespaces.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jann Horn <jannh@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'snmp-optimizations'
David S. Miller [Thu, 30 Sep 2021 13:17:10 +0000 (14:17 +0100)]
Merge branch 'snmp-optimizations'

Eric Dumazet says:

====================
net: snmp: minor optimizations

Fetching many SNMP counters on hosts with large number of cpus
takes a lot of time. mptcp still uses the old non-batched
fashion which is not cache friendly.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomptcp: use batch snmp operations in mptcp_seq_show()
Eric Dumazet [Thu, 30 Sep 2021 01:03:33 +0000 (18:03 -0700)]
mptcp: use batch snmp operations in mptcp_seq_show()

Using snmp_get_cpu_field_batch() allows for better cpu cache
utilization, especially on hosts with large number of cpus.

Also remove special handling when mptcp mibs where not yet
allocated.

I chose to use temporary storage on the stack to keep this patch simple.
We might in the future use the storage allocated in netstat_seq_show().

Combined with prior patch (inlining snmp_get_cpu_field)
time to fetch and output mptcp counters on a 256 cpu host [1]
goes from 75 usec to 16 usec.

[1] L1 cache size is 32KB, it is not big enough to hold all dataset.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: snmp: inline snmp_get_cpu_field()
Eric Dumazet [Thu, 30 Sep 2021 01:03:32 +0000 (18:03 -0700)]
net: snmp: inline snmp_get_cpu_field()

This trivial function is called ~90,000 times on 256 cpus hosts,
when reading /proc/net/netstat. And this number keeps inflating.

Inlining it saves many cycles.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet/mlx4_en: Add XDP_REDIRECT statistics
Joshua Roys [Thu, 30 Sep 2021 02:30:23 +0000 (22:30 -0400)]
net/mlx4_en: Add XDP_REDIRECT statistics

Add counters for XDP REDIRECT success and failure. This brings the
redirect path in line with metrics gathered via the other XDP paths.

Signed-off-by: Joshua Roys <roysjosh@gmail.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: stmmac: fix EEE init issue when paired with EEE capable PHYs
Wong Vee Khee [Thu, 30 Sep 2021 06:44:36 +0000 (14:44 +0800)]
net: stmmac: fix EEE init issue when paired with EEE capable PHYs

When STMMAC is paired with Energy-Efficient Ethernet(EEE) capable PHY,
and the PHY is advertising EEE by default, we need to enable EEE on the
xPCS side too, instead of having user to manually trigger the enabling
config via ethtool.

Fixed this by adding xpcs_config_eee() call in stmmac_eee_init().

Fixes: 7617af3d1a5e ("net: pcs: Introducing support for DWC xpcs Energy Efficient Ethernet")
Cc: Michael Sit Wei Hong <michael.wei.hong.sit@intel.com>
Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoixgbe: let the xdpdrv work with more than 64 cpus
Jason Xing [Wed, 29 Sep 2021 17:56:05 +0000 (10:56 -0700)]
ixgbe: let the xdpdrv work with more than 64 cpus

Originally, ixgbe driver doesn't allow the mounting of xdpdrv if the
server is equipped with more than 64 cpus online. So it turns out that
the loading of xdpdrv causes the "NOMEM" failure.

Actually, we can adjust the algorithm and then make it work through
mapping the current cpu to some xdp ring with the protect of @tx_lock.

Here are some numbers before/after applying this patch with xdp-example
loaded on the eth0X:

As client (tx path):
                     Before    After
TCP_STREAM send-64   734.14    714.20
TCP_STREAM send-128  1401.91   1395.05
TCP_STREAM send-512  5311.67   5292.84
TCP_STREAM send-1k   9277.40   9356.22 (not stable)
TCP_RR     send-1    22559.75  21844.22
TCP_RR     send-128  23169.54  22725.13
TCP_RR     send-512  21670.91  21412.56

As server (rx path):
                     Before    After
TCP_STREAM send-64   1416.49   1383.12
TCP_STREAM send-128  3141.49   3055.50
TCP_STREAM send-512  9488.73   9487.44
TCP_STREAM send-1k   9491.17   9356.22 (not stable)
TCP_RR     send-1    23617.74  23601.60
...

Notice: the TCP_RR mode is unstable as the official document explains.

I tested many times with different parameters combined through netperf.
Though the result is not that accurate, I cannot see much influence on
this patch. The static key is places on the hot path, but it actually
shouldn't cause a huge regression theoretically.

Co-developed-by: Shujin Li <lishujin@kuaishou.com>
Signed-off-by: Shujin Li <lishujin@kuaishou.com>
Signed-off-by: Jason Xing <xingwanli@kuaishou.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'SO_RESEVED_MEM'
David S. Miller [Thu, 30 Sep 2021 12:36:46 +0000 (13:36 +0100)]
Merge branch 'SO_RESEVED_MEM'

Wei Wang says:

====================
net: add new socket option SO_RESERVE_MEM

This patch series introduces a new socket option SO_RESERVE_MEM.
This socket option provides a mechanism for users to reserve a certain
amount of memory for the socket to use. When this option is set, kernel
charges the user specified amount of memory to memcg, as well as
sk_forward_alloc. This amount of memory is not reclaimable and is
available in sk_forward_alloc for this socket.
With this socket option set, the networking stack spends less cycles
doing forward alloc and reclaim, which should lead to better system
performance, with the cost of an amount of pre-allocated and
unreclaimable memory, even under memory pressure.
With a tcp_stream test with 10 flows running on a simulated 100ms RTT
link, I can see the cycles spent in __sk_mem_raise_allocated() dropping
by ~0.02%. Not a whole lot, since we already have logic in
sk_mem_uncharge() to only reclaim 1MB when sk_forward_alloc has more
than 2MB free space. But on a system suffering memory pressure
constently, the savings should be more.

The first patch is the implementation of this socket option. The
following 2 patches change the tcp stack to make use of this reserved
memory when under memory pressure. This makes the tcp stack behavior
more flexible when under memory pressure, and provides a way for user to
control the distribution of the memory among its sockets.
With a TCP connection on a simulated 100ms RTT link, the default
throughput under memory pressure is ~500Kbps. With SO_RESERVE_MEM set to
100KB, the throughput under memory pressure goes up to ~3.5Mbps.

Change since v2:
- Added description for new field added in struct sock in patch 1
Change since v1:
- Added performance stats in cover letter and rebased
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agotcp: adjust rcv_ssthresh according to sk_reserved_mem
Wei Wang [Wed, 29 Sep 2021 17:25:13 +0000 (10:25 -0700)]
tcp: adjust rcv_ssthresh according to sk_reserved_mem

When user sets SO_RESERVE_MEM socket option, in order to utilize the
reserved memory when in memory pressure state, we adjust rcv_ssthresh
according to the available reserved memory for the socket, instead of
using 4 * advmss always.

Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agotcp: adjust sndbuf according to sk_reserved_mem
Wei Wang [Wed, 29 Sep 2021 17:25:12 +0000 (10:25 -0700)]
tcp: adjust sndbuf according to sk_reserved_mem

If user sets SO_RESERVE_MEM socket option, in order to fully utilize the
reserved memory in memory pressure state on the tx path, we modify the
logic in sk_stream_moderate_sndbuf() to set sk_sndbuf according to
available reserved memory, instead of MIN_SOCK_SNDBUF, and adjust it
when new data is acked.

Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: add new socket option SO_RESERVE_MEM
Wei Wang [Wed, 29 Sep 2021 17:25:11 +0000 (10:25 -0700)]
net: add new socket option SO_RESERVE_MEM

This socket option provides a mechanism for users to reserve a certain
amount of memory for the socket to use. When this option is set, kernel
charges the user specified amount of memory to memcg, as well as
sk_forward_alloc. This amount of memory is not reclaimable and is
available in sk_forward_alloc for this socket.
With this socket option set, the networking stack spends less cycles
doing forward alloc and reclaim, which should lead to better system
performance, with the cost of an amount of pre-allocated and
unreclaimable memory, even under memory pressure.

Note:
This socket option is only available when memory cgroup is enabled and we
require this reserved memory to be charged to the user's memcg. We hope
this could avoid mis-behaving users to abused this feature to reserve a
large amount on certain sockets and cause unfairness for others.

Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: dev_addr_list: handle first address in __hw_addr_add_ex
Jakub Kicinski [Wed, 29 Sep 2021 15:32:24 +0000 (08:32 -0700)]
net: dev_addr_list: handle first address in __hw_addr_add_ex

struct dev_addr_list is used for device addresses, unicast addresses
and multicast addresses. The first of those needs special handling
of the main address - netdev->dev_addr points directly the data
of the entry and drivers write to it freely, so we can't maintain
it in the rbtree (for now, at least, to be fixed in net-next).

Current work around sprinkles special handling of the first
address on the list throughout the code but it missed the case
where address is being added. First address will not be visible
during subsequent adds.

Syzbot found a warning where unicast addresses are modified
without holding the rtnl lock, tl;dr is that team generates
the same modification multiple times, not necessarily when
right locks are held.

In the repro we have:

  macvlan -> team -> veth

macvlan adds a unicast address to the team. Team then pushes
that address down to its memebers (veths). Next something unrelated
makes team sync member addrs again, and because of the bug
the addr entries get duplicated in the veths. macvlan gets
removed, removes its addr from team which removes only one
of the duplicated addresses from veths. This removal is done
under rtnl. Next syzbot uses iptables to add a multicast addr
to team (which does not hold rtnl lock). Team syncs veth addrs,
but because veths' unicast list still has the duplicate it will
also get sync, even though this update is intended for mc addresses.
Again, uc address updates need rtnl lock, boom.

Reported-by: syzbot+7a2ab2cdc14d134de553@syzkaller.appspotmail.com
Fixes: 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs with IPv6 addresses, performance of changing link state, attaching a VRF, changing an IPv6 address, etc. go down dramtically.")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: phy: marvell10g: add downshift tunable support
Russell King [Wed, 29 Sep 2021 15:28:53 +0000 (16:28 +0100)]
net: phy: marvell10g: add downshift tunable support

Add support for the downshift tunable for the Marvell 88x3310 PHY.
Downshift is only usable with firmware 0.3.5.0 and later.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: sched: flower: protect fl_walk() with rcu
Vlad Buslov [Wed, 29 Sep 2021 15:08:49 +0000 (18:08 +0300)]
net: sched: flower: protect fl_walk() with rcu

Patch that refactored fl_walk() to use idr_for_each_entry_continue_ul()
also removed rcu protection of individual filters which causes following
use-after-free when filter is deleted concurrently. Fix fl_walk() to obtain
rcu read lock while iterating and taking the filter reference and temporary
release the lock while calling arg->fn() callback that can sleep.

KASAN trace:

[  352.773640] ==================================================================
[  352.775041] BUG: KASAN: use-after-free in fl_walk+0x159/0x240 [cls_flower]
[  352.776304] Read of size 4 at addr ffff8881c8251480 by task tc/2987

[  352.777862] CPU: 3 PID: 2987 Comm: tc Not tainted 5.15.0-rc2+ #2
[  352.778980] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[  352.781022] Call Trace:
[  352.781573]  dump_stack_lvl+0x46/0x5a
[  352.782332]  print_address_description.constprop.0+0x1f/0x140
[  352.783400]  ? fl_walk+0x159/0x240 [cls_flower]
[  352.784292]  ? fl_walk+0x159/0x240 [cls_flower]
[  352.785138]  kasan_report.cold+0x83/0xdf
[  352.785851]  ? fl_walk+0x159/0x240 [cls_flower]
[  352.786587]  kasan_check_range+0x145/0x1a0
[  352.787337]  fl_walk+0x159/0x240 [cls_flower]
[  352.788163]  ? fl_put+0x10/0x10 [cls_flower]
[  352.789007]  ? __mutex_unlock_slowpath.constprop.0+0x220/0x220
[  352.790102]  tcf_chain_dump+0x231/0x450
[  352.790878]  ? tcf_chain_tp_delete_empty+0x170/0x170
[  352.791833]  ? __might_sleep+0x2e/0xc0
[  352.792594]  ? tfilter_notify+0x170/0x170
[  352.793400]  ? __mutex_unlock_slowpath.constprop.0+0x220/0x220
[  352.794477]  tc_dump_tfilter+0x385/0x4b0
[  352.795262]  ? tc_new_tfilter+0x1180/0x1180
[  352.796103]  ? __mod_node_page_state+0x1f/0xc0
[  352.796974]  ? __build_skb_around+0x10e/0x130
[  352.797826]  netlink_dump+0x2c0/0x560
[  352.798563]  ? netlink_getsockopt+0x430/0x430
[  352.799433]  ? __mutex_unlock_slowpath.constprop.0+0x220/0x220
[  352.800542]  __netlink_dump_start+0x356/0x440
[  352.801397]  rtnetlink_rcv_msg+0x3ff/0x550
[  352.802190]  ? tc_new_tfilter+0x1180/0x1180
[  352.802872]  ? rtnl_calcit.isra.0+0x1f0/0x1f0
[  352.803668]  ? tc_new_tfilter+0x1180/0x1180
[  352.804344]  ? _copy_from_iter_nocache+0x800/0x800
[  352.805202]  ? kasan_set_track+0x1c/0x30
[  352.805900]  netlink_rcv_skb+0xc6/0x1f0
[  352.806587]  ? rht_deferred_worker+0x6b0/0x6b0
[  352.807455]  ? rtnl_calcit.isra.0+0x1f0/0x1f0
[  352.808324]  ? netlink_ack+0x4d0/0x4d0
[  352.809086]  ? netlink_deliver_tap+0x62/0x3d0
[  352.809951]  netlink_unicast+0x353/0x480
[  352.810744]  ? netlink_attachskb+0x430/0x430
[  352.811586]  ? __alloc_skb+0xd7/0x200
[  352.812349]  netlink_sendmsg+0x396/0x680
[  352.813132]  ? netlink_unicast+0x480/0x480
[  352.813952]  ? __import_iovec+0x192/0x210
[  352.814759]  ? netlink_unicast+0x480/0x480
[  352.815580]  sock_sendmsg+0x6c/0x80
[  352.816299]  ____sys_sendmsg+0x3a5/0x3c0
[  352.817096]  ? kernel_sendmsg+0x30/0x30
[  352.817873]  ? __ia32_sys_recvmmsg+0x150/0x150
[  352.818753]  ___sys_sendmsg+0xd8/0x140
[  352.819518]  ? sendmsg_copy_msghdr+0x110/0x110
[  352.820402]  ? ___sys_recvmsg+0xf4/0x1a0
[  352.821110]  ? __copy_msghdr_from_user+0x260/0x260
[  352.821934]  ? _raw_spin_lock+0x81/0xd0
[  352.822680]  ? __handle_mm_fault+0xef3/0x1b20
[  352.823549]  ? rb_insert_color+0x2a/0x270
[  352.824373]  ? copy_page_range+0x16b0/0x16b0
[  352.825209]  ? perf_event_update_userpage+0x2d0/0x2d0
[  352.826190]  ? __fget_light+0xd9/0xf0
[  352.826941]  __sys_sendmsg+0xb3/0x130
[  352.827613]  ? __sys_sendmsg_sock+0x20/0x20
[  352.828377]  ? do_user_addr_fault+0x2c5/0x8a0
[  352.829184]  ? fpregs_assert_state_consistent+0x52/0x60
[  352.830001]  ? exit_to_user_mode_prepare+0x32/0x160
[  352.830845]  do_syscall_64+0x35/0x80
[  352.831445]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  352.832331] RIP: 0033:0x7f7bee973c17
[  352.833078] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
[  352.836202] RSP: 002b:00007ffcbb368e28 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[  352.837524] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7bee973c17
[  352.838715] RDX: 0000000000000000 RSI: 00007ffcbb368e50 RDI: 0000000000000003
[  352.839838] RBP: 00007ffcbb36d090 R08: 00000000cea96d79 R09: 00007f7beea34a40
[  352.841021] R10: 00000000004059bb R11: 0000000000000246 R12: 000000000046563f
[  352.842208] R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffcbb36d088

[  352.843784] Allocated by task 2960:
[  352.844451]  kasan_save_stack+0x1b/0x40
[  352.845173]  __kasan_kmalloc+0x7c/0x90
[  352.845873]  fl_change+0x282/0x22db [cls_flower]
[  352.846696]  tc_new_tfilter+0x6cf/0x1180
[  352.847493]  rtnetlink_rcv_msg+0x471/0x550
[  352.848323]  netlink_rcv_skb+0xc6/0x1f0
[  352.849097]  netlink_unicast+0x353/0x480
[  352.849886]  netlink_sendmsg+0x396/0x680
[  352.850678]  sock_sendmsg+0x6c/0x80
[  352.851398]  ____sys_sendmsg+0x3a5/0x3c0
[  352.852202]  ___sys_sendmsg+0xd8/0x140
[  352.852967]  __sys_sendmsg+0xb3/0x130
[  352.853718]  do_syscall_64+0x35/0x80
[  352.854457]  entry_SYSCALL_64_after_hwframe+0x44/0xae

[  352.855830] Freed by task 7:
[  352.856421]  kasan_save_stack+0x1b/0x40
[  352.857139]  kasan_set_track+0x1c/0x30
[  352.857854]  kasan_set_free_info+0x20/0x30
[  352.858609]  __kasan_slab_free+0xed/0x130
[  352.859348]  kfree+0xa7/0x3c0
[  352.859951]  process_one_work+0x44d/0x780
[  352.860685]  worker_thread+0x2e2/0x7e0
[  352.861390]  kthread+0x1f4/0x220
[  352.862022]  ret_from_fork+0x1f/0x30

[  352.862955] Last potentially related work creation:
[  352.863758]  kasan_save_stack+0x1b/0x40
[  352.864378]  kasan_record_aux_stack+0xab/0xc0
[  352.865028]  insert_work+0x30/0x160
[  352.865617]  __queue_work+0x351/0x670
[  352.866261]  rcu_work_rcufn+0x30/0x40
[  352.866917]  rcu_core+0x3b2/0xdb0
[  352.867561]  __do_softirq+0xf6/0x386

[  352.868708] Second to last potentially related work creation:
[  352.869779]  kasan_save_stack+0x1b/0x40
[  352.870560]  kasan_record_aux_stack+0xab/0xc0
[  352.871426]  call_rcu+0x5f/0x5c0
[  352.872108]  queue_rcu_work+0x44/0x50
[  352.872855]  __fl_put+0x17c/0x240 [cls_flower]
[  352.873733]  fl_delete+0xc7/0x100 [cls_flower]
[  352.874607]  tc_del_tfilter+0x510/0xb30
[  352.886085]  rtnetlink_rcv_msg+0x471/0x550
[  352.886875]  netlink_rcv_skb+0xc6/0x1f0
[  352.887636]  netlink_unicast+0x353/0x480
[  352.888285]  netlink_sendmsg+0x396/0x680
[  352.888942]  sock_sendmsg+0x6c/0x80
[  352.889583]  ____sys_sendmsg+0x3a5/0x3c0
[  352.890311]  ___sys_sendmsg+0xd8/0x140
[  352.891019]  __sys_sendmsg+0xb3/0x130
[  352.891716]  do_syscall_64+0x35/0x80
[  352.892395]  entry_SYSCALL_64_after_hwframe+0x44/0xae

[  352.893666] The buggy address belongs to the object at ffff8881c8251000
                which belongs to the cache kmalloc-2k of size 2048
[  352.895696] The buggy address is located 1152 bytes inside of
                2048-byte region [ffff8881c8251000ffff8881c8251800)
[  352.897640] The buggy address belongs to the page:
[  352.898492] page:00000000213bac35 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1c8250
[  352.900110] head:00000000213bac35 order:3 compound_mapcount:0 compound_pincount:0
[  352.901541] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[  352.902908] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042f00
[  352.904391] raw: 0000000000000000 0000000000080008 00000001ffffffff 0000000000000000
[  352.905861] page dumped because: kasan: bad access detected

[  352.907323] Memory state around the buggy address:
[  352.908218]  ffff8881c8251380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  352.909471]  ffff8881c8251400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  352.910735] >ffff8881c8251480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  352.912012]                    ^
[  352.912642]  ffff8881c8251500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  352.913919]  ffff8881c8251580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  352.915185] ==================================================================

Fixes: d39d714969cd ("idr: introduce idr_for_each_entry_continue_ul()")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoocteontx2-af: Remove redundant initialization of variable pin
Colin Ian King [Wed, 29 Sep 2021 13:27:53 +0000 (14:27 +0100)]
octeontx2-af: Remove redundant initialization of variable pin

The variable pin is being initialized with a value that is never
read, it is being updated later on in only one case of a switch
statement.  The assignment is redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: macb: ptp: Switch to gettimex64() interface
Lars-Peter Clausen [Wed, 29 Sep 2021 12:07:39 +0000 (14:07 +0200)]
net: macb: ptp: Switch to gettimex64() interface

The macb PTP support currently implements the `gettime64` callback to allow
to retrieve the hardware clock time. Update the implementation to provide
the `gettimex64` callback instead.

The difference between the two is that with `gettime64` a snapshot of the
system clock is taken before and after invoking the callback. Whereas
`gettimex64` expects the callback itself to take the snapshots.

To get the time from the macb Ethernet core multiple register accesses have
to be done. Only one of which will happen at the time reported by the
function. This leads to a non-symmetric delay and adds a slight offset
between the hardware and system clock time when using the `gettime64`
method. This offset can be a few 100 nanoseconds. Switching to the
`gettimex64` method allows for a more precise correlation of the hardware
and system clocks and results in a lower offset between the two.

On a Xilinx ZynqMP system `phc2sys` reports a delay of 1120 ns before and
300 ns after the patch. With the latter being mostly symmetric.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agodissector: do not set invalid PPP protocol
Boris Sukholitko [Wed, 29 Sep 2021 11:32:23 +0000 (14:32 +0300)]
dissector: do not set invalid PPP protocol

The following flower filter fails to match non-PPP_IP{V6} packets
wrapped in PPP_SES protocol:

tc filter add dev eth0 ingress protocol ppp_ses flower \
        action simple sdata hi64

The reason is that proto local variable is being set even when
FLOW_DISSECT_RET_OUT_BAD status is returned.

The fix is to avoid setting proto variable if the PPP protocol is unknown.

Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: dsa: rtl8366rb: Use core filtering tracking
Linus Walleij [Wed, 29 Sep 2021 11:23:22 +0000 (13:23 +0200)]
net: dsa: rtl8366rb: Use core filtering tracking

We added a state variable to track whether a certain port
was VLAN filtering or not, but we can just inquire the DSA
core about this.

Cc: Vladimir Oltean <olteanv@gmail.com>
Cc: Mauri Sandberg <sandberg@mailfence.com>
Cc: DENG Qingfang <dqfext@gmail.com>
Cc: Alvin Å ipraga <alsi@bang-olufsen.dk>
Cc: Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: introduce and use lock_sock_fast_nested()
Paolo Abeni [Wed, 29 Sep 2021 09:59:17 +0000 (11:59 +0200)]
net: introduce and use lock_sock_fast_nested()

Syzkaller reported a false positive deadlock involving
the nl socket lock and the subflow socket lock:

MPTCP: kernel_bind error, err=-98
============================================
WARNING: possible recursive locking detected
5.15.0-rc1-syzkaller #0 Not tainted
--------------------------------------------
syz-executor998/6520 is trying to acquire lock:
ffff8880795718a0 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_close+0x267/0x7b0 net/mptcp/protocol.c:2738

but task is already holding lock:
ffff8880787c8c60 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1612 [inline]
ffff8880787c8c60 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_close+0x23/0x7b0 net/mptcp/protocol.c:2720

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(k-sk_lock-AF_INET);
  lock(k-sk_lock-AF_INET);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

3 locks held by syz-executor998/6520:
 #0: ffffffff8d176c50 (cb_lock){++++}-{3:3}, at: genl_rcv+0x15/0x40 net/netlink/genetlink.c:802
 #1: ffffffff8d176d08 (genl_mutex){+.+.}-{3:3}, at: genl_lock net/netlink/genetlink.c:33 [inline]
 #1: ffffffff8d176d08 (genl_mutex){+.+.}-{3:3}, at: genl_rcv_msg+0x3e0/0x580 net/netlink/genetlink.c:790
 #2: ffff8880787c8c60 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1612 [inline]
 #2: ffff8880787c8c60 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_close+0x23/0x7b0 net/mptcp/protocol.c:2720

stack backtrace:
CPU: 1 PID: 6520 Comm: syz-executor998 Not tainted 5.15.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
 print_deadlock_bug kernel/locking/lockdep.c:2944 [inline]
 check_deadlock kernel/locking/lockdep.c:2987 [inline]
 validate_chain kernel/locking/lockdep.c:3776 [inline]
 __lock_acquire.cold+0x149/0x3ab kernel/locking/lockdep.c:5015
 lock_acquire kernel/locking/lockdep.c:5625 [inline]
 lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5590
 lock_sock_fast+0x36/0x100 net/core/sock.c:3229
 mptcp_close+0x267/0x7b0 net/mptcp/protocol.c:2738
 inet_release+0x12e/0x280 net/ipv4/af_inet.c:431
 __sock_release net/socket.c:649 [inline]
 sock_release+0x87/0x1b0 net/socket.c:677
 mptcp_pm_nl_create_listen_socket+0x238/0x2c0 net/mptcp/pm_netlink.c:900
 mptcp_nl_cmd_add_addr+0x359/0x930 net/mptcp/pm_netlink.c:1170
 genl_family_rcv_msg_doit+0x228/0x320 net/netlink/genetlink.c:731
 genl_family_rcv_msg net/netlink/genetlink.c:775 [inline]
 genl_rcv_msg+0x328/0x580 net/netlink/genetlink.c:792
 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504
 genl_rcv+0x24/0x40 net/netlink/genetlink.c:803
 netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
 netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340
 netlink_sendmsg+0x86d/0xdb0 net/netlink/af_netlink.c:1929
 sock_sendmsg_nosec net/socket.c:704 [inline]
 sock_sendmsg+0xcf/0x120 net/socket.c:724
 sock_no_sendpage+0x101/0x150 net/core/sock.c:2980
 kernel_sendpage.part.0+0x1a0/0x340 net/socket.c:3504
 kernel_sendpage net/socket.c:3501 [inline]
 sock_sendpage+0xe5/0x140 net/socket.c:1003
 pipe_to_sendpage+0x2ad/0x380 fs/splice.c:364
 splice_from_pipe_feed fs/splice.c:418 [inline]
 __splice_from_pipe+0x43e/0x8a0 fs/splice.c:562
 splice_from_pipe fs/splice.c:597 [inline]
 generic_splice_sendpage+0xd4/0x140 fs/splice.c:746
 do_splice_from fs/splice.c:767 [inline]
 direct_splice_actor+0x110/0x180 fs/splice.c:936
 splice_direct_to_actor+0x34b/0x8c0 fs/splice.c:891
 do_splice_direct+0x1b3/0x280 fs/splice.c:979
 do_sendfile+0xae9/0x1240 fs/read_write.c:1249
 __do_sys_sendfile64 fs/read_write.c:1314 [inline]
 __se_sys_sendfile64 fs/read_write.c:1300 [inline]
 __x64_sys_sendfile64+0x1cc/0x210 fs/read_write.c:1300
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f215cb69969
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 14 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffc96bb3868 EFLAGS: 00000246 ORIG_RAX: 0000000000000028
RAX: ffffffffffffffda RBX: 00007f215cbad072 RCX: 00007f215cb69969
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000005
RBP: 0000000000000000 R08: 00007ffc96bb3a08 R09: 00007ffc96bb3a08
R10: 0000000100000002 R11: 0000000000000246 R12: 00007ffc96bb387c
R13: 431bde82d7b634db R14: 0000000000000000 R15: 0000000000000000

the problem originates from uncorrect lock annotation in the mptcp
code and is only visible since commit 2dcb96bacce3 ("net: core: Correct
the sock::sk_lock.owned lockdep annotations"), but is present since
the port-based endpoint support initial implementation.

This patch addresses the issue introducing a nested variant of
lock_sock_fast() and using it in the relevant code path.

Fixes: 1729cf186d8a ("mptcp: create the listening socket for new port")
Fixes: 2dcb96bacce3 ("net: core: Correct the sock::sk_lock.owned lockdep annotations")
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Reported-and-tested-by: syzbot+1dd53f7a89b299d59eaf@syzkaller.appspotmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoocteontx2-pf: Add XDP support to netdev PF
Geetha sowjanya [Wed, 29 Sep 2021 09:54:56 +0000 (15:24 +0530)]
octeontx2-pf: Add XDP support to netdev PF

Adds XDP_PASS, XDP_TX, XDP_DROP and XDP_REDIRECT support
for netdev PF.

Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoocteontx2-af: Adjust LA pointer for cpt parse header
Kiran Kumar K [Wed, 29 Sep 2021 05:58:31 +0000 (11:28 +0530)]
octeontx2-af: Adjust LA pointer for cpt parse header

In case of ltype NPC_LT_LA_CPT_HDR, LA pointer is pointing to the
start of cpt parse header. Since cpt parse header has veriable
length padding, this will be a problem for DMAC extraction. Adding
KPU profile changes to adjust the LA pointer to start at ether header
in case of cpt parse header by
   - Adding ptr advance in pkind 58 to a fixed value 40
   - Adding variable length offset 7 and mask 7 (pad len in
     CPT_PARSE_HDR).
Also added the missing static declaration for npc_set_var_len_offset_pkind
function.

Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet_sched: Use struct_size() and flex_array_size() helpers
Gustavo A. R. Silva [Tue, 28 Sep 2021 19:31:07 +0000 (14:31 -0500)]
net_sched: Use struct_size() and flex_array_size() helpers

Make use of the struct_size() and flex_array_size() helpers instead of
an open-coded version, in order to avoid any potential type mistakes
or integer overflows that, in the worse scenario, could lead to heap
overflows.

Link: https://github.com/KSPP/linux/issues/160
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20210928193107.GA262595@embeddedor
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMAINTAINERS: Update Mun Yew Tham as Altera Pio Driver maintainer
Mun Yew Tham [Wed, 29 Sep 2021 00:49:11 +0000 (08:49 +0800)]
MAINTAINERS: Update Mun Yew Tham as Altera Pio Driver maintainer

Update Altera Pio Driver maintainer's email from <joyce.ooi@intel.com> to <mun.yew.tham@intel.com>

Signed-off-by: Mun Yew Tham <mun.yew.tham@intel.com>
Acked-by: Joyce Ooi <joyce.ooi@intel.com>
Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
2 years agoMAINTAINERS: update my email address
Bartosz Golaszewski [Mon, 20 Sep 2021 07:18:37 +0000 (09:18 +0200)]
MAINTAINERS: update my email address

My professional situation changes soon. Update my email address.

Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
2 years agogpio: pca953x: do not ignore i2c errors
Andrey Gusakov [Thu, 23 Sep 2021 17:22:16 +0000 (20:22 +0300)]
gpio: pca953x: do not ignore i2c errors

Per gpio_chip interface, error shall be proparated to the caller.

Attempt to silent diagnostics by returning zero (as written in the
comment) is plain wrong, because the zero return can be interpreted by
the caller as the gpio value.

Cc: stable@vger.kernel.org
Signed-off-by: Andrey Gusakov <andrey.gusakov@cogentembedded.com>
Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
2 years agodevlink: Add missed notifications iterators
Leon Romanovsky [Wed, 29 Sep 2021 14:18:20 +0000 (17:18 +0300)]
devlink: Add missed notifications iterators

The commit mentioned in Fixes line missed a couple of notifications that
were registered before devlink_register() and should be delayed too.

As such, the too early placed WARN_ON() check spotted it.

WARNING: CPU: 1 PID: 6540 at net/core/devlink.c:5158 devlink_nl_region_notify+0x184/0x1e0 net/core/devlink.c:5158
Modules linked in:
CPU: 1 PID: 6540 Comm: syz-executor.0 Not tainted 5.15.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:devlink_nl_region_notify+0x184/0x1e0 net/core/devlink.c:5158
Code: 38 41 b8 c0 0c 00 00 31 d2 48 89 ee 4c 89 e7 e8 72 1a 26 00 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e e9 01 bd 41 fa
e8 fc bc 41 fa <0f> 0b e9 f7 fe ff ff e8 f0 bc 41 fa 0f 0b eb da 4c 89 e7 e8 c4 18
RSP: 0018:ffffc90002d6f658 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff88801f08d580 RSI: ffffffff87344e94 RDI: 0000000000000003
RBP: ffff88801ee42100 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff87344d8a R11: 0000000000000000 R12: ffff88801c1dc000
R13: 0000000000000000 R14: 000000000000002c R15: ffff88801c1dc070
FS:  0000555555e8e400(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055dd7c590310 CR3: 0000000069a09000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 devlink_region_create+0x39f/0x4c0 net/core/devlink.c:10327
 nsim_dev_dummy_region_init drivers/net/netdevsim/dev.c:481 [inline]
 nsim_dev_probe+0x5f6/0x1150 drivers/net/netdevsim/dev.c:1479
 call_driver_probe drivers/base/dd.c:517 [inline]
 really_probe+0x245/0xcc0 drivers/base/dd.c:596
 __driver_probe_device+0x338/0x4d0 drivers/base/dd.c:751
 driver_probe_device+0x4c/0x1a0 drivers/base/dd.c:781
 __device_attach_driver+0x20b/0x2f0 drivers/base/dd.c:898
 bus_for_each_drv+0x15f/0x1e0 drivers/base/bus.c:427
 __device_attach+0x228/0x4a0 drivers/base/dd.c:969
 bus_probe_device+0x1e4/0x290 drivers/base/bus.c:487
 device_add+0xc35/0x21b0 drivers/base/core.c:3359
 nsim_bus_dev_new drivers/net/netdevsim/bus.c:435 [inline]
 new_device_store+0x48b/0x770 drivers/net/netdevsim/bus.c:302
 bus_attr_store+0x72/0xa0 drivers/base/bus.c:122
 sysfs_kf_write+0x110/0x160 fs/sysfs/file.c:139
 kernfs_fop_write_iter+0x342/0x500 fs/kernfs/file.c:296
 call_write_iter include/linux/fs.h:2163 [inline]
 new_sync_write+0x429/0x660 fs/read_write.c:507
 vfs_write+0x7cf/0xae0 fs/read_write.c:594
 ksys_write+0x12d/0x250 fs/read_write.c:647
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f328409d3ef
Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 99 fd ff ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01
00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 cc fd ff ff 48
RSP: 002b:00007ffdc6851140 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f328409d3ef
RDX: 0000000000000003 RSI: 00007ffdc6851190 RDI: 0000000000000004
RBP: 0000000000000004 R08: 0000000000000000 R09: 00007ffdc68510e0
R10: 0000000000000000 R11: 0000000000000293 R12: 00007f3284144971
R13: 00007ffdc6851190 R14: 0000000000000000 R15: 00007ffdc6851860

Fixes: cf530217408e ("devlink: Notify users when objects are accessible")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/2ed1159291f2a589b013914f2b60d8172fc525c1.1632925030.git.leonro@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge tag 'sound-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Linus Torvalds [Wed, 29 Sep 2021 14:48:00 +0000 (07:48 -0700)]
Merge tag 'sound-5.15-rc4' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "This became a slightly large collection of changes, partly because
  I've been off in the last weeks. Most of changes are small and
  scattered while a bit big change is found in HD-audio Realtek codec
  driver; it's a very device-specific fix that has been long wanted, so
  I decided to pick up although it's in the middle RC.

  Some highlights:

   - A new guard ioctl for ALSA rawmidi API to avoid the misuse of the
     new timestamp framing mode; it's for a regression fix

   - HD-audio: a revert of the 5.15 change that might work badly, new
     quirks for Lenovo Legion & co, a follow-up fix for CS8409

   - ASoC: lots of SOF-related fixes, fsl component fixes, corrections
     of mediatek drivers

   - USB-audio: fix for the PM resume

   - FireWire: oxfw and motu fixes"

* tag 'sound-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (25 commits)
  ALSA: pcsp: Make hrtimer forwarding more robust
  ALSA: rawmidi: introduce SNDRV_RAWMIDI_IOCTL_USER_PVERSION
  ALSA: firewire-motu: fix truncated bytes in message tracepoints
  ASoC: SOF: trace: Omit error print when waking up trace sleepers
  ASoC: mediatek: mt8195: remove wrong fixup assignment on HDMITX
  ASoC: SOF: loader: Re-phrase the missing firmware error to avoid duplication
  ASoC: SOF: loader: release_firmware() on load failure to avoid batching
  ALSA: hda/cs8409: Setup Dolphin Headset Mic as Phantom Jack
  ALSA: pcxhr: "fix" PCXHR_REG_TO_PORT definition
  ASoC: SOF: imx: imx8m: Bar index is only valid for IRAM and SRAM types
  ASoC: SOF: imx: imx8: Bar index is only valid for IRAM and SRAM types
  ASoC: SOF: Fix DSP oops stack dump output contents
  ALSA: hda/realtek: Quirks to enable speaker output for Lenovo Legion 7i 15IMHG05, Yoga 7i 14ITL5/15ITL5, and 13s Gen2 laptops.
  ALSA: usb-audio: Unify mixer resume and reset_resume procedure
  Revert "ALSA: hda: Drop workaround for a hang at shutdown again"
  ALSA: oxfw: fix transmission method for Loud models based on OXFW971
  ASoC: mediatek: common: handle NULL case in suspend/resume function
  ASoC: fsl_xcvr: register platform component before registering cpu dai
  ASoC: fsl_spdif: register platform component before registering cpu dai
  ASoC: fsl_micfil: register platform component before registering cpu dai
  ...

2 years agoMerge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Linus Torvalds [Wed, 29 Sep 2021 14:37:46 +0000 (07:37 -0700)]
Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6

Pull crypto fixes from Herbert Xu:
 "This contains fixes for a resource leak in ccp as well as stack
  corruption in x86/sm4"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: x86/sm4 - Fix frame pointer stack corruption
  crypto: ccp - fix resource leaks in ccp_run_aes_gcm_cmd()

2 years agogve: Use kvcalloc() instead of kvzalloc()
Gustavo A. R. Silva [Tue, 28 Sep 2021 21:38:05 +0000 (16:38 -0500)]
gve: Use kvcalloc() instead of kvzalloc()

Use 2-factor argument form kvcalloc() instead of kvzalloc().

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet/ipv4/datagram.c: remove superfluous header files from datagram.c
Mianhan Liu [Wed, 29 Sep 2021 05:31:09 +0000 (13:31 +0800)]
net/ipv4/datagram.c: remove superfluous header files from datagram.c

datagram.c hasn't use any macro or function declared in linux/ip.h.
Thus, these files can be removed from datagram.c safely without
affecting the compilation of the net/ipv4 module

Signed-off-by: Mianhan Liu <liumh1@shanghaitech.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet/dsa/tag_ksz.c: remove superfluous headers
Mianhan Liu [Wed, 29 Sep 2021 06:41:06 +0000 (14:41 +0800)]
net/dsa/tag_ksz.c: remove superfluous headers

tag_ksz.c hasn't use any macro or function declared in linux/slab.h.
Thus, these files can be removed from tag_ksz.c safely without
affecting the compilation of the ./net/dsa module

Signed-off-by: Mianhan Liu <liumh1@shanghaitech.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet/dsa/tag_8021q.c: remove superfluous headers
Mianhan Liu [Wed, 29 Sep 2021 06:36:11 +0000 (14:36 +0800)]
net/dsa/tag_8021q.c: remove superfluous headers

tag_8021q.c hasn't use any macro or function declared in linux/if_bridge.h.
Thus, these files can be removed from tag_8021q.c safely without
affecting the compilation of the ./net/dsa module

Signed-off-by: Mianhan Liu <liumh1@shanghaitech.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: phy: bcm7xxx: Fixed indirect MMD operations
Florian Fainelli [Tue, 28 Sep 2021 20:32:33 +0000 (13:32 -0700)]
net: phy: bcm7xxx: Fixed indirect MMD operations

When EEE support was added to the 28nm EPHY it was assumed that it would
be able to support the standard clause 45 over clause 22 register access
method. It turns out that the PHY does not support that, which is the
very reason for using the indirect shadow mode 2 bank 3 access method.

Implement {read,write}_mmd to allow the standard PHY library routines
pertaining to EEE querying and configuration to work correctly on these
PHYs. This forces us to implement a __phy_set_clr_bits() function that
does not grab the MDIO bus lock since the PHY driver's {read,write}_mmd
functions are always called with that lock held.

Fixes: 83ee102a6998 ("net: phy: bcm7xxx: add support for 28nm EPHY")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet/mlx4: Use array_size() helper in copy_to_user()
Gustavo A. R. Silva [Tue, 28 Sep 2021 20:17:33 +0000 (15:17 -0500)]
net/mlx4: Use array_size() helper in copy_to_user()

Use array_size() helper instead of the open-coded version in
copy_to_user(). These sorts of multiplication factors need
to be wrapped in array_size().

Link: https://github.com/KSPP/linux/issues/160
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: bridge: Use array_size() helper in copy_to_user()
Gustavo A. R. Silva [Tue, 28 Sep 2021 20:12:39 +0000 (15:12 -0500)]
net: bridge: Use array_size() helper in copy_to_user()

Use array_size() helper instead of the open-coded version in
copy_to_user(). These sorts of multiplication factors need
to be wrapped in array_size().

Link: https://github.com/KSPP/linux/issues/160
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoethtool: ioctl: Use array_size() helper in copy_{from,to}_user()
Gustavo A. R. Silva [Tue, 28 Sep 2021 19:57:35 +0000 (14:57 -0500)]
ethtool: ioctl: Use array_size() helper in copy_{from,to}_user()

Use array_size() helper instead of the open-coded version in
copy_{from,to}_user().  These sorts of multiplication factors
need to be wrapped in array_size().

Link: https://github.com/KSPP/linux/issues/160
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'hns3-fixes'
David S. Miller [Wed, 29 Sep 2021 10:03:54 +0000 (11:03 +0100)]
Merge branch 'hns3-fixes'

Guangbin Huang says:

====================
net: hns3: add some fixes for -net

This series adds some fixes for the HNS3 ethernet driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: hns3: disable firmware compatible features when uninstall PF
Guangbin Huang [Wed, 29 Sep 2021 09:35:56 +0000 (17:35 +0800)]
net: hns3: disable firmware compatible features when uninstall PF

Currently, the firmware compatible features are enabled in PF driver
initialization process, but they are not disabled in PF driver
deinitialization process and firmware keeps these features in enabled
status.

In this case, if load an old PF driver (for example, in VM) which not
support the firmware compatible features, firmware will still send mailbox
message to PF when link status changed and PF will print
"un-supported mailbox message, code = 201".

To fix this problem, disable these firmware compatible features in PF
driver deinitialization process.

Fixes: ed8fb4b262ae ("net: hns3: add link change event report")
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: hns3: fix always enable rx vlan filter problem after selftest
Guangbin Huang [Wed, 29 Sep 2021 09:35:55 +0000 (17:35 +0800)]
net: hns3: fix always enable rx vlan filter problem after selftest

Currently, the rx vlan filter will always be disabled before selftest and
be enabled after selftest as the rx vlan filter feature is fixed on in
old device earlier than V3.

However, this feature is not fixed in some new devices and it can be
disabled by user. In this case, it is wrong if rx vlan filter is enabled
after selftest. So fix it.

Fixes: bcc26e8dc432 ("net: hns3: remove unused code in hns3_self_test()")
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: hns3: PF enable promisc for VF when mac table is overflow
Guangbin Huang [Wed, 29 Sep 2021 09:35:54 +0000 (17:35 +0800)]
net: hns3: PF enable promisc for VF when mac table is overflow

If unicast mac address table is full, and user add a new mac address, the
unicast promisc needs to be enabled for the new unicast mac address can be
used. So does the multicast promisc.

Now this feature has been implemented for PF, and VF should be implemented
too. When the mac table of VF is overflow, PF will enable promisc for this
VF.

Fixes: 1e6e76101fd9 ("net: hns3: configure promisc mode for VF asynchronously")
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: hns3: fix show wrong state when add existing uc mac address
Jian Shen [Wed, 29 Sep 2021 09:35:53 +0000 (17:35 +0800)]
net: hns3: fix show wrong state when add existing uc mac address

Currently, if function adds an existing unicast mac address, eventhough
driver will not add this address into hardware, but it will return 0 in
function hclge_add_uc_addr_common(). It will cause the state of this
unicast mac address is ACTIVE in driver, but it should be in TO-ADD state.

To fix this problem, function hclge_add_uc_addr_common() returns -EEXIST
if mac address is existing, and delete two error log to avoid printing
them all the time after this modification.

Fixes: 72110b567479 ("net: hns3: return 0 and print warning when hit duplicate MAC")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: hns3: fix mixed flag HCLGE_FLAG_MQPRIO_ENABLE and HCLGE_FLAG_DCB_ENABLE
Jian Shen [Wed, 29 Sep 2021 09:35:52 +0000 (17:35 +0800)]
net: hns3: fix mixed flag HCLGE_FLAG_MQPRIO_ENABLE and HCLGE_FLAG_DCB_ENABLE

HCLGE_FLAG_MQPRIO_ENABLE is supposed to set when enable
multiple TCs with tc mqprio, and HCLGE_FLAG_DCB_ENABLE is
supposed to set when enable multiple TCs with ets. But
the driver mixed the flags when updating the tm configuration.

Furtherly, PFC should be available when HCLGE_FLAG_MQPRIO_ENABLE
too, so remove the unnecessary limitation.

Fixes: 5a5c90917467 ("net: hns3: add support for tc mqprio offload")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: hns3: don't rollback when destroy mqprio fail
Jian Shen [Wed, 29 Sep 2021 09:35:51 +0000 (17:35 +0800)]
net: hns3: don't rollback when destroy mqprio fail

For destroy mqprio is irreversible in stack, so it's unnecessary
to rollback the tc configuration when destroy mqprio failed.
Otherwise, it may cause the configuration being inconsistent
between driver and netstack.

As the failure is usually caused by reset, and the driver will
restore the configuration after reset, so it can keep the
configuration being consistent between driver and hardware.

Fixes: 5a5c90917467 ("net: hns3: add support for tc mqprio offload")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: hns3: remove tc enable checking
Jian Shen [Wed, 29 Sep 2021 09:35:50 +0000 (17:35 +0800)]
net: hns3: remove tc enable checking

Currently, in function hns3_nic_set_real_num_queue(), the
driver doesn't report the queue count and offset for disabled
tc. If user enables multiple TCs, but only maps user
priorities to partial of them, it may cause the queue range
of the unmapped TC being displayed abnormally.

Fix it by removing the tc enable checking, ensure the queue
count is not zero.

With this change, the tc_en is useless now, so remove it.

Fixes: a75a8efa00c5 ("net: hns3: Fix tc setup when netdev is first up")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: hns3: do not allow call hns3_nic_net_open repeatedly
Jian Shen [Wed, 29 Sep 2021 09:35:49 +0000 (17:35 +0800)]
net: hns3: do not allow call hns3_nic_net_open repeatedly

hns3_nic_net_open() is not allowed to called repeatly, but there
is no checking for this. When doing device reset and setup tc
concurrently, there is a small oppotunity to call hns3_nic_net_open
repeatedly, and cause kernel bug by calling napi_enable twice.

The calltrace information is like below:
[ 3078.222780] ------------[ cut here ]------------
[ 3078.230255] kernel BUG at net/core/dev.c:6991!
[ 3078.236224] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[ 3078.243431] Modules linked in: hns3 hclgevf hclge hnae3 vfio_iommu_type1 vfio_pci vfio_virqfd vfio pv680_mii(O)
[ 3078.258880] CPU: 0 PID: 295 Comm: kworker/u8:5 Tainted: G           O      5.14.0-rc4+ #1
[ 3078.269102] Hardware name:  , BIOS KpxxxFPGA 1P B600 V181 08/12/2021
[ 3078.276801] Workqueue: hclge hclge_service_task [hclge]
[ 3078.288774] pstate: 60400009 (nZCv daif +PAN -UAO -TCO BTYPE=--)
[ 3078.296168] pc : napi_enable+0x80/0x84
tc qdisc sho[w  3d0e7v8 .e3t0h218 79] lr : hns3_nic_net_open+0x138/0x510 [hns3]

[ 3078.314771] sp : ffff8000108abb20
[ 3078.319099] x29: ffff8000108abb20 x28: 0000000000000000 x27: ffff0820a8490300
[ 3078.329121] x26: 0000000000000001 x25: ffff08209cfc6200 x24: 0000000000000000
[ 3078.339044] x23: ffff0820a8490300 x22: ffff08209cd76000 x21: ffff0820abfe3880
[ 3078.349018] x20: 0000000000000000 x19: ffff08209cd76900 x18: 0000000000000000
[ 3078.358620] x17: 0000000000000000 x16: ffffc816e1727a50 x15: 0000ffff8f4ff930
[ 3078.368895] x14: 0000000000000000 x13: 0000000000000000 x12: 0000259e9dbeb6b4
[ 3078.377987] x11: 0096a8f7e764eb40 x10: 634615ad28d3eab5 x9 : ffffc816ad8885b8
[ 3078.387091] x8 : ffff08209cfc6fb8 x7 : ffff0820ac0da058 x6 : ffff0820a8490344
[ 3078.396356] x5 : 0000000000000140 x4 : 0000000000000003 x3 : ffff08209cd76938
[ 3078.405365] x2 : 0000000000000000 x1 : 0000000000000010 x0 : ffff0820abfe38a0
[ 3078.414657] Call trace:
[ 3078.418517]  napi_enable+0x80/0x84
[ 3078.424626]  hns3_reset_notify_up_enet+0x78/0xd0 [hns3]
[ 3078.433469]  hns3_reset_notify+0x64/0x80 [hns3]
[ 3078.441430]  hclge_notify_client+0x68/0xb0 [hclge]
[ 3078.450511]  hclge_reset_rebuild+0x524/0x884 [hclge]
[ 3078.458879]  hclge_reset_service_task+0x3c4/0x680 [hclge]
[ 3078.467470]  hclge_service_task+0xb0/0xb54 [hclge]
[ 3078.475675]  process_one_work+0x1dc/0x48c
[ 3078.481888]  worker_thread+0x15c/0x464
[ 3078.487104]  kthread+0x160/0x170
[ 3078.492479]  ret_from_fork+0x10/0x18
[ 3078.498785] Code: c8027c81 35ffffa2 d50323bf d65f03c0 (d4210000)
[ 3078.506889] ---[ end trace 8ebe0340a1b0fb44 ]---

Once hns3_nic_net_open() is excute success, the flag
HNS3_NIC_STATE_DOWN will be cleared. So add checking for this
flag, directly return when HNS3_NIC_STATE_DOWN is no set.

Fixes: e888402789b9 ("net: hns3: call hns3_nic_net_open() while doing HNAE3_UP_CLIENT")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'mctp-core-updates'
David S. Miller [Wed, 29 Sep 2021 10:00:12 +0000 (11:00 +0100)]
Merge branch 'mctp-core-updates'

Matt Johnston says:

====================
Updates to MCTP core

This series adds timeouts for MCTP tags (a limited resource), and a few
other improvements to the MCTP core.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomctp: Warn if pointer is set for a wrong dev type
Matt Johnston [Wed, 29 Sep 2021 07:26:14 +0000 (15:26 +0800)]
mctp: Warn if pointer is set for a wrong dev type

Should not occur but is a sanity check.

May help tracking down Trinity reported issue
https://lore.kernel.org/lkml/20210913030701.GA5926@xsang-OptiPlex-9020/

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomctp: Set route MTU via netlink
Matt Johnston [Wed, 29 Sep 2021 07:26:13 +0000 (15:26 +0800)]
mctp: Set route MTU via netlink

A route's RTAX_MTU can be set in nested RTAX_METRICS

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agodoc/mctp: Add a little detail about kernel internals
Jeremy Kerr [Wed, 29 Sep 2021 07:26:12 +0000 (15:26 +0800)]
doc/mctp: Add a little detail about kernel internals

Describe common flows and refcounting behaviour.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomctp: Do inits as a subsys_initcall
Jeremy Kerr [Wed, 29 Sep 2021 07:26:11 +0000 (15:26 +0800)]
mctp: Do inits as a subsys_initcall

In a future change, we'll want to provide a registration call for
mctp-specific devices. This requires us to have the networks established
before device driver inits, so run the core init as a subsys_initcall.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomctp: Add tracepoints for tag/key handling
Jeremy Kerr [Wed, 29 Sep 2021 07:26:10 +0000 (15:26 +0800)]
mctp: Add tracepoints for tag/key handling

The tag allocation, release and bind events are somewhat opaque outside
the kernel; this change adds a few tracepoints to assist in
instrumentation and debugging.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomctp: Implement a timeout for tags
Jeremy Kerr [Wed, 29 Sep 2021 07:26:09 +0000 (15:26 +0800)]
mctp: Implement a timeout for tags

Currently, a MCTP (local-eid,remote-eid,tag) tuple is allocated to a
socket on send, and only expires when the socket is closed.

This change introduces a tag timeout, freeing the tuple after a fixed
expiry - currently six seconds. This is greater than (but close to) the
max response timeout in upper-layer bindings.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomctp: Add refcounts to mctp_dev
Jeremy Kerr [Wed, 29 Sep 2021 07:26:08 +0000 (15:26 +0800)]
mctp: Add refcounts to mctp_dev

Currently, we tie the struct mctp_dev lifetime to the underlying struct
net_device, and hold/put that device as a proxy for a separate mctp_dev
refcount. This works because we're not holding any references to the
mctp_dev that are different from the netdev lifetime.

In a future change we'll break that assumption though, as we'll need to
hold mctp_dev references in a workqueue, which might live past the
netdev unregister notification.

In order to support that, this change introduces a refcount on the
mctp_dev, currently taken by the net_device->mctp_ptr reference, and
released on netdev unregister events. We can then use this for future
references that might outlast the net device.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomctp: locking, lifetime and validity changes for sk_keys
Jeremy Kerr [Wed, 29 Sep 2021 07:26:07 +0000 (15:26 +0800)]
mctp: locking, lifetime and validity changes for sk_keys

We will want to invalidate sk_keys in a future change, which will
require a boolean flag to mark invalidated items in the socket & net
namespace lists. We'll also need to take a reference to keys, held over
non-atomic contexts, so we need a refcount on keys also.

This change adds a validity flag (currently always true) and refcount to
struct mctp_sk_key.  With a refcount on the keys, using RCU no longer
makes much sense; we have exact indications on the lifetime of keys. So,
we also change the RCU list traversal to a locked implementation.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomctp: Allow local delivery to the null EID
Jeremy Kerr [Wed, 29 Sep 2021 07:26:06 +0000 (15:26 +0800)]
mctp: Allow local delivery to the null EID

We may need to receive packets addressed to the null EID (==0), but
addressed to us at the physical layer.

This change adds a lookup for local routes when we see a packet
addressed to EID 0, and a local phys address.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomctp: Allow MCTP on tun devices
Matt Johnston [Wed, 29 Sep 2021 07:26:05 +0000 (15:26 +0800)]
mctp: Allow MCTP on tun devices

Allowing TUN is useful for testing, to route packets to userspace or to
tunnel between machines.

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: phy: micrel: Add support for LAN8804 PHY
Horatiu Vultur [Tue, 28 Sep 2021 18:45:19 +0000 (20:45 +0200)]
net: phy: micrel: Add support for LAN8804 PHY

The LAN8804 PHY has same features as that of LAN8814 PHY except that it
doesn't support 1588, SyncE or Q-USGMII.

This PHY is found inside the LAN966X switches.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoixgbe: Fix NULL pointer dereference in ixgbe_xdp_setup
Feng Zhou [Tue, 28 Sep 2021 22:23:59 +0000 (15:23 -0700)]
ixgbe: Fix NULL pointer dereference in ixgbe_xdp_setup

The ixgbe driver currently generates a NULL pointer dereference with
some machine (online cpus < 63). This is due to the fact that the
maximum value of num_xdp_queues is nr_cpu_ids. Code is in
"ixgbe_set_rss_queues"".

Here's how the problem repeats itself:
Some machine (online cpus < 63), And user set num_queues to 63 through
ethtool. Code is in the "ixgbe_set_channels",
adapter->ring_feature[RING_F_FDIR].limit = count;

It becomes 63.

When user use xdp, "ixgbe_set_rss_queues" will set queues num.
adapter->num_rx_queues = rss_i;
adapter->num_tx_queues = rss_i;
adapter->num_xdp_queues = ixgbe_xdp_queues(adapter);

And rss_i's value is from
f = &adapter->ring_feature[RING_F_FDIR];
rss_i = f->indices = f->limit;

So "num_rx_queues" > "num_xdp_queues", when run to "ixgbe_xdp_setup",
for (i = 0; i < adapter->num_rx_queues; i++)
if (adapter->xdp_ring[i]->xsk_umem)

It leads to panic.

Call trace:
[exception RIP: ixgbe_xdp+368]
RIP: ffffffffc02a76a0  RSP: ffff9fe16202f8d0  RFLAGS: 00010297
RAX: 0000000000000000  RBX: 0000000000000020  RCX: 0000000000000000
RDX: 0000000000000000  RSI: 000000000000001c  RDI: ffffffffa94ead90
RBP: ffff92f8f24c0c18   R8: 0000000000000000   R9: 0000000000000000
R10: ffff9fe16202f830  R11: 0000000000000000  R12: ffff92f8f24c0000
R13: ffff9fe16202fc01  R14: 000000000000000a  R15: ffffffffc02a7530
ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 7 [ffff9fe16202f8f0] dev_xdp_install at ffffffffa89fbbcc
 8 [ffff9fe16202f920] dev_change_xdp_fd at ffffffffa8a08808
 9 [ffff9fe16202f960] do_setlink at ffffffffa8a20235
10 [ffff9fe16202fa88] rtnl_setlink at ffffffffa8a20384
11 [ffff9fe16202fc78] rtnetlink_rcv_msg at ffffffffa8a1a8dd
12 [ffff9fe16202fcf0] netlink_rcv_skb at ffffffffa8a717eb
13 [ffff9fe16202fd40] netlink_unicast at ffffffffa8a70f88
14 [ffff9fe16202fd80] netlink_sendmsg at ffffffffa8a71319
15 [ffff9fe16202fdf0] sock_sendmsg at ffffffffa89df290
16 [ffff9fe16202fe08] __sys_sendto at ffffffffa89e19c8
17 [ffff9fe16202ff30] __x64_sys_sendto at ffffffffa89e1a64
18 [ffff9fe16202ff38] do_syscall_64 at ffffffffa84042b9
19 [ffff9fe16202ff50] entry_SYSCALL_64_after_hwframe at ffffffffa8c0008c

So I fix ixgbe_max_channels so that it will not allow a setting of queues
to be higher than the num_online_cpus(). And when run to ixgbe_xdp_setup,
take the smaller value of num_rx_queues and num_xdp_queues.

Fixes: 4a9b32f30f80 ("ixgbe: fix potential RX buffer starvation for AF_XDP")
Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>