OSDN Git Service

tomoyo/tomoyo-test1.git
23 months agonet: prestera: add support for egress traffic mirroring
Serhiy Boiko [Tue, 23 Aug 2022 11:39:57 +0000 (14:39 +0300)]
net: prestera: add support for egress traffic mirroring

This enables adding matchall rules for egress:

  tc filter add .. egress .. matchall skip_sw \
    action mirred egress mirror dev ..

Signed-off-by: Serhiy Boiko <serhiy.boiko@plvision.eu>
Signed-off-by: Maksym Glubokiy <maksym.glubokiy@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
23 months agonet: prestera: acl: extract matchall logic into a separate file
Serhiy Boiko [Tue, 23 Aug 2022 11:39:56 +0000 (14:39 +0300)]
net: prestera: acl: extract matchall logic into a separate file

This commit adds more clarity to handling of TC_CLSMATCHALL_REPLACE and
TC_CLSMATCHALL_DESTROY events by calling newly added *_mall_*() handlers
instead of directly calling SPAN API.

This also extracts matchall rules management out of SPAN API since SPAN
is a hardware module which is used to implement 'matchall egress mirred'
action only.

Signed-off-by: Taras Chornyi <tchornyi@marvell.com>
Signed-off-by: Serhiy Boiko <serhiy.boiko@plvision.eu>
Signed-off-by: Maksym Glubokiy <maksym.glubokiy@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
23 months agonet: asix: ax88772: add ethtool pause configuration
Oleksij Rempel [Mon, 22 Aug 2022 12:39:43 +0000 (14:39 +0200)]
net: asix: ax88772: add ethtool pause configuration

Add phylink based ethtool pause configuration

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
23 months agonet: asix: ax88772: migrate to phylink
Oleksij Rempel [Mon, 22 Aug 2022 12:39:42 +0000 (14:39 +0200)]
net: asix: ax88772: migrate to phylink

There are some exotic ax88772 based devices which may require
functionality provide by the phylink framework. For example:
- US100A20SFP, USB 2.0 auf LWL Converter with SFP Cage
- AX88772B USB to 100Base-TX Ethernet (with RMII) demo board, where it
  is possible to switch between internal PHY and external RMII based
  connection.

So, convert this driver to phylink as soon as possible.

Tested with:
- AX88772A + internal PHY
- AX88772B + external DP83TD510E T1L PHY

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
23 months agodt-bindings: net: Add missing (unevaluated|additional)Properties on child nodes
Rob Herring [Thu, 25 Aug 2022 19:26:07 +0000 (14:26 -0500)]
dt-bindings: net: Add missing (unevaluated|additional)Properties on child nodes

In order to ensure only documented properties are present, node schemas
must have unevaluatedProperties or additionalProperties set to false
(typically). Add missing properties/$refs as exposed by this addition.

Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20220825192609.1538463-1-robh@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
23 months agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Thu, 25 Aug 2022 23:07:42 +0000 (16:07 -0700)]
Merge git://git./linux/kernel/git/netdev/net

drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
  21234e3a84c7 ("net/mlx5e: Fix use after free in mlx5e_fs_init()")
  c7eafc5ed068 ("net/mlx5e: Convert ethtool_steering member of flow_steering struct to pointer")
https://lore.kernel.org/all/20220825104410.67d4709c@canb.auug.org.au/
https://lore.kernel.org/all/20220823055533.334471-1-saeed@kernel.org/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
23 months agonetdev: Use try_cmpxchg in napi_if_scheduled_mark_missed
Uros Bizjak [Mon, 22 Aug 2022 14:32:43 +0000 (16:32 +0200)]
netdev: Use try_cmpxchg in napi_if_scheduled_mark_missed

Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in
napi_if_scheduled_mark_missed. x86 CMPXCHG instruction returns
success in ZF flag, so this change saves a compare after cmpxchg
(and related move instruction in front of cmpxchg).

Also, try_cmpxchg implicitly assigns old *ptr value to "old" when cmpxchg
fails, enabling further code simplifications.

Cc: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Link: https://lore.kernel.org/r/20220822143243.2798-1-ubizjak@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
23 months agoMerge tag 'net-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 25 Aug 2022 21:03:58 +0000 (14:03 -0700)]
Merge tag 'net-6.0-rc3' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from ipsec and netfilter (with one broken Fixes tag).

  Current release - new code bugs:

   - dsa: don't dereference NULL extack in dsa_slave_changeupper()

   - dpaa: fix <1G ethernet on LS1046ARDB

   - neigh: don't call kfree_skb() under spin_lock_irqsave()

  Previous releases - regressions:

   - r8152: fix the RX FIFO settings when suspending

   - dsa: microchip: keep compatibility with device tree blobs with no
     phy-mode

   - Revert "net: macsec: update SCI upon MAC address change."

   - Revert "xfrm: update SA curlft.use_time", comply with RFC 2367

  Previous releases - always broken:

   - netfilter: conntrack: work around exceeded TCP receive window

   - ipsec: fix a null pointer dereference of dst->dev on a metadata dst
     in xfrm_lookup_with_ifid

   - moxa: get rid of asymmetry in DMA mapping/unmapping

   - dsa: microchip: make learning configurable and keep it off while
     standalone

   - ice: xsk: prohibit usage of non-balanced queue id

   - rxrpc: fix locking in rxrpc's sendmsg

  Misc:

   - another chunk of sysctl data race silencing"

* tag 'net-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits)
  net: lantiq_xrx200: restore buffer if memory allocation failed
  net: lantiq_xrx200: fix lock under memory pressure
  net: lantiq_xrx200: confirm skb is allocated before using
  net: stmmac: work around sporadic tx issue on link-up
  ionic: VF initial random MAC address if no assigned mac
  ionic: fix up issues with handling EAGAIN on FW cmds
  ionic: clear broken state on generation change
  rxrpc: Fix locking in rxrpc's sendmsg
  net: ethernet: mtk_eth_soc: fix hw hash reporting for MTK_NETSYS_V2
  MAINTAINERS: rectify file entry in BONDING DRIVER
  i40e: Fix incorrect address type for IPv6 flow rules
  ixgbe: stop resetting SYSTIME in ixgbe_ptp_start_cyclecounter
  net: Fix a data-race around sysctl_somaxconn.
  net: Fix a data-race around netdev_unregister_timeout_secs.
  net: Fix a data-race around gro_normal_batch.
  net: Fix data-races around sysctl_devconf_inherit_init_net.
  net: Fix data-races around sysctl_fb_tunnels_only_for_init_net.
  net: Fix a data-race around netdev_budget_usecs.
  net: Fix data-races around sysctl_max_skb_frags.
  net: Fix a data-race around netdev_budget.
  ...

23 months agoMerge branch 'mlxsw-remove-some-unused-code'
Jakub Kicinski [Thu, 25 Aug 2022 20:24:47 +0000 (13:24 -0700)]
Merge branch 'mlxsw-remove-some-unused-code'

Petr Machata says:

====================
mlxsw: Remove some unused code

This patchset removes code that is not used anymore after the following two
commits removed all users:

- commit b0d80c013b04 ("mlxsw: Remove Mellanox SwitchX-2 ASIC support")
- commit 9b43fbb8ce24 ("mlxsw: Remove Mellanox SwitchIB ASIC support")
====================

Link: https://lore.kernel.org/r/cover.1661350629.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
23 months agomlxsw: Remove unused mlxsw_core_port_type_get()
Jiri Pirko [Wed, 24 Aug 2022 15:18:51 +0000 (17:18 +0200)]
mlxsw: Remove unused mlxsw_core_port_type_get()

Function mlxsw_core_port_type_get() is no longer used. So remove it.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
23 months agomlxsw: Remove unused port_type_set devlink op
Jiri Pirko [Wed, 24 Aug 2022 15:18:50 +0000 (17:18 +0200)]
mlxsw: Remove unused port_type_set devlink op

port_type_set devlink op is no longer used by any mlxsw driver,
so remove it.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
23 months agomlxsw: Remove unused IB stuff
Jiri Pirko [Wed, 24 Aug 2022 15:18:49 +0000 (17:18 +0200)]
mlxsw: Remove unused IB stuff

There are some IB leftovers that are no longer used in the code.
So remove them.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
23 months agoMerge branch 'net-devlink-sync-flash-and-dev-info-commands'
Jakub Kicinski [Thu, 25 Aug 2022 20:22:58 +0000 (13:22 -0700)]
Merge branch 'net-devlink-sync-flash-and-dev-info-commands'

Jiri Pirko says:

====================
net: devlink: sync flash and dev info commands

Purpose of this patchset is to introduce consistency between two devlink
commands:
  devlink dev info
    Shows versions of running default flash target and components.
  devlink dev flash
    Flashes default flash target or component name (if specified
    on cmdline).

Currently it is up to the driver what versions to expose and what flash
update component names to accept. This is inconsistent. Thankfully, only
netdevsim currently using components so it is still time
to sanitize this.

This patchset makes sure, that devlink.c calls into driver for
component flash update only in case the driver exposes the same version
name.

Example:
$ devlink dev info
netdevsim/netdevsim10:
  driver netdevsim
  versions:
      running:
        fw.mgmt 10.20.30
      stored:
        fw.mgmt 10.20.30
$ devlink dev flash netdevsim/netdevsim10 file somefile.bin
[fw.mgmt] Preparing to flash
[fw.mgmt] Flashing 100%
[fw.mgmt] Flash select
[fw.mgmt] Flashing done
$ devlink dev flash netdevsim/netdevsim10 file somefile.bin component fw.mgmt
[fw.mgmt] Preparing to flash
[fw.mgmt] Flashing 100%
[fw.mgmt] Flash select
[fw.mgmt] Flashing done

$ devlink dev flash netdevsim/netdevsim10 file somefile.bin component dummy
Error: selected component is not supported by this device.
====================

Link: https://lore.kernel.org/r/20220824122011.1204330-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
23 months agonet: devlink: limit flash component name to match version returned by info_get()
Jiri Pirko [Wed, 24 Aug 2022 12:20:11 +0000 (14:20 +0200)]
net: devlink: limit flash component name to match version returned by info_get()

Limit the acceptance of component name passed to cmd_flash_update() to
match one of the versions returned by info_get(), marked by version type.
This makes things clearer and enforces 1:1 mapping between exposed
version and accepted flash component.

Check VERSION_TYPE_COMPONENT version type during cmd_flash_update()
execution by calling info_get() with different "req" context.
That causes info_get() to lookup the component name instead of
filling-up the netlink message.

Remove "UPDATE_COMPONENT" flag which becomes used.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
23 months agonetdevsim: add version fw.mgmt info info_get() and mark as a component
Jiri Pirko [Wed, 24 Aug 2022 12:20:10 +0000 (14:20 +0200)]
netdevsim: add version fw.mgmt info info_get() and mark as a component

Fix the only component user which is netdevsim. It uses component named
"fw.mgmt" in selftests. So add this version to info_get() output with
version type component.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
23 months agonet: devlink: extend info_get() version put to indicate a flash component
Jiri Pirko [Wed, 24 Aug 2022 12:20:09 +0000 (14:20 +0200)]
net: devlink: extend info_get() version put to indicate a flash component

Whenever the driver is called by his info_get() op, it may put multiple
version names and values to the netlink message. Extend by additional
helper devlink_info_version_running/stored_put_ext() that allows to
specify a version type that indicates when particular version name
represents a flash component.

This is going to be used in follow-up patch calling info_get() during
flash update command checking if version with this the version type
exists.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
23 months agoselftests/net: fix reinitialization of TEST_PROGS in net self tests.
Adel Abouchaev [Wed, 24 Aug 2022 18:43:51 +0000 (11:43 -0700)]
selftests/net: fix reinitialization of TEST_PROGS in net self tests.

Assinging will drop all previous tests.

Fixes: b690842d12fd ("selftests/net: test l2 tunnel TOS/TTL inheriting")
Signed-off-by: Adel Abouchaev <adel.abushaev@gmail.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/20220824184351.3759862-1-adel.abushaev@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
23 months agoMerge branch 'net-lantiq_xrx200-fix-errors-under-memory-pressure'
Jakub Kicinski [Thu, 25 Aug 2022 19:41:41 +0000 (12:41 -0700)]
Merge branch 'net-lantiq_xrx200-fix-errors-under-memory-pressure'

Aleksander Jan Bajkowski says:

====================
net: lantiq_xrx200: fix errors under memory pressure

This series fixes issues that can occur in the driver under memory pressure.
Situations when the system cannot allocate memory are rare, so the mentioned
bugs have been fixed recently. The patches have been tested on a BT Home
router with the Lantiq xRX200 chipset.

Changelog:
  v3: - removed netdev_err() log from the first patch
  v2:
   - the second patch has been changed, so that under memory pressure situation
     the driver will not receive packets indefinitely regardless of the NAPI budget,
   - the third patch has been added.
====================

Link: https://lore.kernel.org/r/20220824215408.4695-1-olek2@wp.pl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
23 months agonet: lantiq_xrx200: restore buffer if memory allocation failed
Aleksander Jan Bajkowski [Wed, 24 Aug 2022 21:54:08 +0000 (23:54 +0200)]
net: lantiq_xrx200: restore buffer if memory allocation failed

In a situation where memory allocation fails, an invalid buffer address
is stored. When this descriptor is used again, the system panics in the
build_skb() function when accessing memory.

Fixes: 7ea6cd16f159 ("lantiq: net: fix duplicated skb in rx descriptor ring")
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
23 months agonet: lantiq_xrx200: fix lock under memory pressure
Aleksander Jan Bajkowski [Wed, 24 Aug 2022 21:54:07 +0000 (23:54 +0200)]
net: lantiq_xrx200: fix lock under memory pressure

When the xrx200_hw_receive() function returns -ENOMEM, the NAPI poll
function immediately returns an error.
This is incorrect for two reasons:
* the function terminates without enabling interrupts or scheduling NAPI,
* the error code (-ENOMEM) is returned instead of the number of received
packets.

After the first memory allocation failure occurs, packet reception is
locked due to disabled interrupts from DMA..

Fixes: fe1a56420cf2 ("net: lantiq: Add Lantiq / Intel VRX200 Ethernet driver")
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
23 months agonet: lantiq_xrx200: confirm skb is allocated before using
Aleksander Jan Bajkowski [Wed, 24 Aug 2022 21:54:06 +0000 (23:54 +0200)]
net: lantiq_xrx200: confirm skb is allocated before using

xrx200_hw_receive() assumes build_skb() always works and goes straight
to skb_reserve(). However, build_skb() can fail under memory pressure.

Add a check in case build_skb() failed to allocate and return NULL.

Fixes: e015593573b3 ("net: lantiq_xrx200: convert to build_skb")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
23 months agonet: stmmac: work around sporadic tx issue on link-up
Heiner Kallweit [Wed, 24 Aug 2022 20:34:49 +0000 (22:34 +0200)]
net: stmmac: work around sporadic tx issue on link-up

This is a follow-up to the discussion in [0]. It seems to me that
at least the IP version used on Amlogic SoC's sometimes has a problem
if register MAC_CTRL_REG is written whilst the chip is still processing
a previous write. But that's just a guess.
Adding a delay between two writes to this register helps, but we can
also simply omit the offending second write. This patch uses the second
approach and is based on a suggestion from Qi Duan.
Benefit of this approach is that we can save few register writes, also
on not affected chip versions.

[0] https://www.spinics.net/lists/netdev/msg831526.html

Fixes: bfab27a146ed ("stmmac: add the experimental PCI support")
Suggested-by: Qi Duan <qi.duan@amlogic.com>
Suggested-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/e99857ce-bd90-5093-ca8c-8cd480b5a0a2@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
23 months agoMerge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Jakub Kicinski [Thu, 25 Aug 2022 19:40:29 +0000 (12:40 -0700)]
Merge branch '10GbE' of git://git./linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2022-08-24 (ixgbe, i40e)

This series contains updates to ixgbe and i40e drivers.

Jake stops incorrect resetting of SYSTIME registers when starting
cyclecounter for ixgbe.

Sylwester corrects a check on source IP address when validating destination
for i40e.

* '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  i40e: Fix incorrect address type for IPv6 flow rules
  ixgbe: stop resetting SYSTIME in ixgbe_ptp_start_cyclecounter
====================

Link: https://lore.kernel.org/r/20220824193748.874343-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
23 months agoMerge branch 'ionic-bug-fixes'
Jakub Kicinski [Thu, 25 Aug 2022 19:40:17 +0000 (12:40 -0700)]
Merge branch 'ionic-bug-fixes'

Shannon Nelson says:

====================
ionic: bug fixes

These are a couple of maintenance bug fixes for the Pensando ionic
networking driver.

Mohamed takes care of a "plays well with others" issue where the
VF spec is a bit vague on VF mac addresses, but certain customers
have come to expect behavior based on other vendor drivers.

Shannon addresses a couple of corner cases seen in internal
stress testing.
====================

Link: https://lore.kernel.org/r/20220824165051.6185-1-snelson@pensando.io
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
23 months agoionic: VF initial random MAC address if no assigned mac
R Mohamed Shah [Wed, 24 Aug 2022 16:50:51 +0000 (09:50 -0700)]
ionic: VF initial random MAC address if no assigned mac

Assign a random mac address to the VF interface station
address if it boots with a zero mac address in order to match
similar behavior seen in other VF drivers.  Handle the errors
where the older firmware does not allow the VF to set its own
station address.

Newer firmware will allow the VF to set the station mac address
if it hasn't already been set administratively through the PF.
Setting it will also be allowed if the VF has trust.

Fixes: fbb39807e9ae ("ionic: support sr-iov operations")
Signed-off-by: R Mohamed Shah <mohamed@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
23 months agoionic: fix up issues with handling EAGAIN on FW cmds
Shannon Nelson [Wed, 24 Aug 2022 16:50:50 +0000 (09:50 -0700)]
ionic: fix up issues with handling EAGAIN on FW cmds

In looping on FW update tests we occasionally see the
FW_ACTIVATE_STATUS command fail while it is in its EAGAIN loop
waiting for the FW activate step to finsh inside the FW.  The
firmware is complaining that the done bit is set when a new
dev_cmd is going to be processed.

Doing a clean on the cmd registers and doorbell before exiting
the wait-for-done and cleaning the done bit before the sleep
prevents this from occurring.

Fixes: fbfb8031533c ("ionic: Add hardware init and device commands")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
23 months agoionic: clear broken state on generation change
Shannon Nelson [Wed, 24 Aug 2022 16:50:49 +0000 (09:50 -0700)]
ionic: clear broken state on generation change

There is a case found in heavy testing where a link flap happens just
before a firmware Recovery event and the driver gets stuck in the
BROKEN state.  This comes from the driver getting interrupted by a FW
generation change when coming back up from the link flap, and the call
to ionic_start_queues() in ionic_link_status_check() fails.  This can be
addressed by having the fw_up code clear the BROKEN bit if seen, rather
than waiting for a user to manually force the interface down and then
back up.

Fixes: 9e8eaf8427b6 ("ionic: stop watchdog when in broken state")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
23 months agorxrpc: Fix locking in rxrpc's sendmsg
David Howells [Wed, 24 Aug 2022 16:35:45 +0000 (17:35 +0100)]
rxrpc: Fix locking in rxrpc's sendmsg

Fix three bugs in the rxrpc's sendmsg implementation:

 (1) rxrpc_new_client_call() should release the socket lock when returning
     an error from rxrpc_get_call_slot().

 (2) rxrpc_wait_for_tx_window_intr() will return without the call mutex
     held in the event that we're interrupted by a signal whilst waiting
     for tx space on the socket or relocking the call mutex afterwards.

     Fix this by: (a) moving the unlock/lock of the call mutex up to
     rxrpc_send_data() such that the lock is not held around all of
     rxrpc_wait_for_tx_window*() and (b) indicating to higher callers
     whether we're return with the lock dropped.  Note that this means
     recvmsg() will not block on this call whilst we're waiting.

 (3) After dropping and regaining the call mutex, rxrpc_send_data() needs
     to go and recheck the state of the tx_pending buffer and the
     tx_total_len check in case we raced with another sendmsg() on the same
     call.

Thinking on this some more, it might make sense to have different locks for
sendmsg() and recvmsg().  There's probably no need to make recvmsg() wait
for sendmsg().  It does mean that recvmsg() can return MSG_EOR indicating
that a call is dead before a sendmsg() to that call returns - but that can
currently happen anyway.

Without fix (2), something like the following can be induced:

WARNING: bad unlock balance detected!
5.16.0-rc6-syzkaller #0 Not tainted
-------------------------------------
syz-executor011/3597 is trying to release lock (&call->user_mutex) at:
[<ffffffff885163a3>] rxrpc_do_sendmsg+0xc13/0x1350 net/rxrpc/sendmsg.c:748
but there are no more locks to release!

other info that might help us debug this:
no locks held by syz-executor011/3597.
...
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
 print_unlock_imbalance_bug include/trace/events/lock.h:58 [inline]
 __lock_release kernel/locking/lockdep.c:5306 [inline]
 lock_release.cold+0x49/0x4e kernel/locking/lockdep.c:5657
 __mutex_unlock_slowpath+0x99/0x5e0 kernel/locking/mutex.c:900
 rxrpc_do_sendmsg+0xc13/0x1350 net/rxrpc/sendmsg.c:748
 rxrpc_sendmsg+0x420/0x630 net/rxrpc/af_rxrpc.c:561
 sock_sendmsg_nosec net/socket.c:704 [inline]
 sock_sendmsg+0xcf/0x120 net/socket.c:724
 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2409
 ___sys_sendmsg+0xf3/0x170 net/socket.c:2463
 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2492
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

[Thanks to Hawkins Jiawei and Khalid Masum for their attempts to fix this]

Fixes: bc5e3a546d55 ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals")
Reported-by: syzbot+7f0483225d0c94cb3441@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
Tested-by: syzbot+7f0483225d0c94cb3441@syzkaller.appspotmail.com
cc: Hawkins Jiawei <yin31149@gmail.com>
cc: Khalid Masum <khalid.masum.92@gmail.com>
cc: Dan Carpenter <dan.carpenter@oracle.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/166135894583.600315.7170979436768124075.stgit@warthog.procyon.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
23 months agoMerge tag 'cgroup-for-6.0-rc2-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 25 Aug 2022 17:52:16 +0000 (10:52 -0700)]
Merge tag 'cgroup-for-6.0-rc2-fixes-2' of git://git./linux/kernel/git/tj/cgroup

Pull another cgroup fix from Tejun Heo:
 "Commit 4f7e7236435c ("cgroup: Fix threadgroup_rwsem <->
  cpus_read_lock() deadlock") required the cgroup
  core to grab cpus_read_lock() before invoking ->attach().

  Unfortunately, it missed adding cpus_read_lock() in
  cgroup_attach_task_all(). Fix it"

* tag 'cgroup-for-6.0-rc2-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: Add missing cpus_read_lock() to cgroup_attach_task_all()

23 months agocgroup: Add missing cpus_read_lock() to cgroup_attach_task_all()
Tetsuo Handa [Thu, 25 Aug 2022 08:38:38 +0000 (17:38 +0900)]
cgroup: Add missing cpus_read_lock() to cgroup_attach_task_all()

syzbot is hitting percpu_rwsem_assert_held(&cpu_hotplug_lock) warning at
cpuset_attach() [1], for commit 4f7e7236435ca0ab ("cgroup: Fix
threadgroup_rwsem <-> cpus_read_lock() deadlock") missed that
cpuset_attach() is also called from cgroup_attach_task_all().
Add cpus_read_lock() like what cgroup_procs_write_start() does.

Link: https://syzkaller.appspot.com/bug?extid=29d3a3b4d86c8136ad9e
Reported-by: syzbot <syzbot+29d3a3b4d86c8136ad9e@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: 4f7e7236435ca0ab ("cgroup: Fix threadgroup_rwsem <-> cpus_read_lock() deadlock")
Signed-off-by: Tejun Heo <tj@kernel.org>
23 months agonet: sched: delete duplicate cleanup of backlog and qlen
Zhengchao Shao [Wed, 24 Aug 2022 00:52:31 +0000 (08:52 +0800)]
net: sched: delete duplicate cleanup of backlog and qlen

qdisc_reset() is clearing qdisc->q.qlen and qdisc->qstats.backlog
_after_ calling qdisc->ops->reset. There is no need to clear them
again in the specific reset function.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Link: https://lore.kernel.org/r/20220824005231.345727-1-shaozhengchao@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
23 months agonet: ethernet: mtk_eth_soc: fix hw hash reporting for MTK_NETSYS_V2
Lorenzo Bianconi [Tue, 23 Aug 2022 12:24:07 +0000 (14:24 +0200)]
net: ethernet: mtk_eth_soc: fix hw hash reporting for MTK_NETSYS_V2

Properly report hw rx hash for mt7986 chipset accroding to the new dma
descriptor layout.

Fixes: 197c9e9b17b11 ("net: ethernet: mtk_eth_soc: introduce support for mt7986 chipset")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/091394ea4e705fbb35f828011d98d0ba33808f69.1661257293.git.lorenzo@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonfp: flower: support case of match on ct_state(0/0x3f)
Wenjuan Geng [Tue, 23 Aug 2022 09:01:22 +0000 (11:01 +0200)]
nfp: flower: support case of match on ct_state(0/0x3f)

is_post_ct_flow() function will process only ct_state ESTABLISHED,
then offload_pre_check() function will check FLOW_DISSECTOR_KEY_CT flag.
When config tc filter match ct_state(0/0x3f), dissector->used_keys
with FLOW_DISSECTOR_KEY_CT bit, function offload_pre_check() will
return false, so not offload. This is a special case that can be handled
safely.

Therefore, modify to let initial packet which won't go through conntrack
can be offloaded, as long as the cared ct fields are all zero.

Signed-off-by: Wenjuan Geng <wenjuan.geng@corigine.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20220823090122.403631-1-simon.horman@corigine.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet: gro: skb_gro_header helper function
Richard Gobert [Tue, 23 Aug 2022 07:10:49 +0000 (09:10 +0200)]
net: gro: skb_gro_header helper function

Introduce a simple helper function to replace a common pattern.
When accessing the GRO header, we fetch the pointer from frag0,
then test its validity and fetch it from the skb when necessary.

This leads to the pattern
skb_gro_header_fast -> skb_gro_header_hard -> skb_gro_header_slow
recurring many times throughout GRO code.

This patch replaces these patterns with a single inlined function
call, improving code readability.

Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220823071034.GA56142@debian
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agoDocumentation: devlink: fix the locking section
Jiri Pirko [Tue, 23 Aug 2022 07:02:13 +0000 (09:02 +0200)]
Documentation: devlink: fix the locking section

As all callbacks are converted now, fix the text reflecting that change.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20220823070213.1008956-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch 'add-a-second-bind-table-hashed-by-port-and-address'
Jakub Kicinski [Thu, 25 Aug 2022 02:30:19 +0000 (19:30 -0700)]
Merge branch 'add-a-second-bind-table-hashed-by-port-and-address'

Joanne Koong says:

====================
Add a second bind table hashed by port and address

Currently, there is one bind hashtable (bhash) that hashes by port only.
This patchset adds a second bind table (bhash2) that hashes by port and
address.

The motivation for adding bhash2 is to expedite bind requests in situations
where the port has many sockets in its bhash table entry (eg a large number
of sockets bound to different addresses on the same port), which makes checking
bind conflicts costly especially given that we acquire the table entry spinlock
while doing so, which can cause softirq cpu lockups and can prevent new tcp
connections.

We ran into this problem at Meta where the traffic team binds a large number
of IPs to port 443 and the bind() call took a significant amount of time
which led to cpu softirq lockups, which caused packet drops and other failures
on the machine.

When experimentally testing this on a local server for ~24k sockets bound to
the port, the results seen were:

ipv4:
before - 0.002317 seconds
with bhash2 - 0.000020 seconds

ipv6:
before - 0.002431 seconds
with bhash2 - 0.000021 seconds

The additions to the initial bhash2 submission [0] are:
* Updating bhash2 in the cases where a socket's rcv saddr changes after it has
* been bound
* Adding locks for bhash2 hashbuckets

[0] https://lore.kernel.org/netdev/20220520001834.2247810-1-kuba@kernel.org/

v3: https://lore.kernel.org/netdev/20220722195406.1304948-2-joannelkoong@gmail.com/
v2: https://lore.kernel.org/netdev/20220712235310.1935121-1-joannelkoong@gmail.com/
v1: https://lore.kernel.org/netdev/20220623234242.2083895-2-joannelkoong@gmail.com/
====================

Link: https://lore.kernel.org/r/20220822181023.3979645-1-joannelkoong@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoselftests/net: Add sk_bind_sendto_listen and sk_connect_zero_addr
Joanne Koong [Mon, 22 Aug 2022 18:10:23 +0000 (11:10 -0700)]
selftests/net: Add sk_bind_sendto_listen and sk_connect_zero_addr

This patch adds 2 new tests: sk_bind_sendto_listen and
sk_connect_zero_addr.

The sk_bind_sendto_listen test exercises the path where a socket's
rcv saddr changes after it has been added to the binding tables,
and then a listen() on the socket is invoked. The listen() should
succeed.

The sk_bind_sendto_listen test is copied over from one of syzbot's
tests: https://syzkaller.appspot.com/x/repro.c?x=1673a38df00000

The sk_connect_zero_addr test exercises the path where the socket was
never previously added to the binding tables and it gets assigned a
saddr upon a connect() to address 0.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoselftests/net: Add test for timing a bind request to a port with a populated bhash...
Joanne Koong [Mon, 22 Aug 2022 18:10:22 +0000 (11:10 -0700)]
selftests/net: Add test for timing a bind request to a port with a populated bhash entry

This test populates the bhash table for a given port with
MAX_THREADS * MAX_CONNECTIONS sockets, and then times how long
a bind request on the port takes.

When populating the bhash table, we create the sockets and then bind
the sockets to the same address and port (SO_REUSEADDR and SO_REUSEPORT
are set). When timing how long a bind on the port takes, we bind on a
different address without SO_REUSEPORT set. We do not set SO_REUSEPORT
because we are interested in the case where the bind request does not
go through the tb->fastreuseport path, which is fragile (eg
tb->fastreuseport path does not work if binding with a different uid).

To run the script:
    Usage: ./bind_bhash.sh [-6 | -4] [-p port] [-a address]
    6: use ipv6
    4: use ipv4
    port: Port number
    address: ip address

Without any arguments, ./bind_bhash.sh defaults to ipv6 using ip address
"2001:0db8:0:f101::1" on port 443.

On my local machine, I see:
ipv4:
before - 0.002317 seconds
with bhash2 - 0.000020 seconds

ipv6:
before - 0.002431 seconds
with bhash2 - 0.000021 seconds

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: Add a bhash2 table hashed by port and address
Joanne Koong [Mon, 22 Aug 2022 18:10:21 +0000 (11:10 -0700)]
net: Add a bhash2 table hashed by port and address

The current bind hashtable (bhash) is hashed by port only.
In the socket bind path, we have to check for bind conflicts by
traversing the specified port's inet_bind_bucket while holding the
hashbucket's spinlock (see inet_csk_get_port() and
inet_csk_bind_conflict()). In instances where there are tons of
sockets hashed to the same port at different addresses, the bind
conflict check is time-intensive and can cause softirq cpu lockups,
as well as stops new tcp connections since __inet_inherit_port()
also contests for the spinlock.

This patch adds a second bind table, bhash2, that hashes by
port and sk->sk_rcv_saddr (ipv4) and sk->sk_v6_rcv_saddr (ipv6).
Searching the bhash2 table leads to significantly faster conflict
resolution and less time holding the hashbucket spinlock.

Please note a few things:
* There can be the case where the a socket's address changes after it
has been bound. There are two cases where this happens:

  1) The case where there is a bind() call on INADDR_ANY (ipv4) or
  IPV6_ADDR_ANY (ipv6) and then a connect() call. The kernel will
  assign the socket an address when it handles the connect()

  2) In inet_sk_reselect_saddr(), which is called when rebuilding the
  sk header and a few pre-conditions are met (eg rerouting fails).

In these two cases, we need to update the bhash2 table by removing the
entry for the old address, and add a new entry reflecting the updated
address.

* The bhash2 table must have its own lock, even though concurrent
accesses on the same port are protected by the bhash lock. Bhash2 must
have its own lock to protect against cases where sockets on different
ports hash to different bhash hashbuckets but to the same bhash2
hashbucket.

This brings up a few stipulations:
  1) When acquiring both the bhash and the bhash2 lock, the bhash2 lock
  will always be acquired after the bhash lock and released before the
  bhash lock is released.

  2) There are no nested bhash2 hashbucket locks. A bhash2 lock is always
  acquired+released before another bhash2 lock is acquired+released.

* The bhash table cannot be superseded by the bhash2 table because for
bind requests on INADDR_ANY (ipv4) or IPV6_ADDR_ANY (ipv6), every socket
bound to that port must be checked for a potential conflict. The bhash
table is the only source of port->socket associations.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Jakub Kicinski [Thu, 25 Aug 2022 02:18:09 +0000 (19:18 -0700)]
Merge git://git./linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

1) Fix crash with malformed ebtables blob which do not provide all
   entry points, from Florian Westphal.

2) Fix possible TCP connection clogging up with default 5-days
   timeout in conntrack, from Florian.

3) Fix crash in nf_tables tproxy with unsupported chains, also from Florian.

4) Do not allow to update implicit chains.

5) Make table handle allocation per-netns to fix data race.

6) Do not truncated payload length and offset, and checksum offset.
   Instead report EINVAl.

7) Enable chain stats update via static key iff no error occurs.

8) Restrict osf expression to ip, ip6 and inet families.

9) Restrict tunnel expression to netdev family.

10) Fix crash when trying to bind again an already bound chain.

11) Flowtable garbage collector might leave behind pending work to
    delete entries. This patch comes with a previous preparation patch
    as dependency.

12) Allow net.netfilter.nf_conntrack_frag6_high_thresh to be lowered,
    from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_defrag_ipv6: allow nf_conntrack_frag6_high_thresh increases
  netfilter: flowtable: fix stuck flows on cleanup due to pending work
  netfilter: flowtable: add function to invoke garbage collection immediately
  netfilter: nf_tables: disallow binding to already bound chain
  netfilter: nft_tunnel: restrict it to netdev family
  netfilter: nft_osf: restrict osf to ipv4, ipv6 and inet families
  netfilter: nf_tables: do not leave chain stats enabled on error
  netfilter: nft_payload: do not truncate csum_offset and csum_type
  netfilter: nft_payload: report ERANGE for too long offset and length
  netfilter: nf_tables: make table handle allocation per-netns friendly
  netfilter: nf_tables: disallow updates of implicit chain
  netfilter: nft_tproxy: restrict to prerouting hook
  netfilter: conntrack: work around exceeded receive window
  netfilter: ebtables: reject blobs that don't provide all entry points
====================

Link: https://lore.kernel.org/r/20220824220330.64283-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMAINTAINERS: rectify file entry in BONDING DRIVER
Lukas Bulwahn [Wed, 24 Aug 2022 07:29:45 +0000 (09:29 +0200)]
MAINTAINERS: rectify file entry in BONDING DRIVER

Commit c078290a2b76 ("selftests: include bonding tests into the kselftest
infra") adds the bonding tests in the directory:

  tools/testing/selftests/drivers/net/bonding/

The file entry in MAINTAINERS for the BONDING DRIVER however refers to:

  tools/testing/selftests/net/bonding/

Hence, ./scripts/get_maintainer.pl --self-test=patterns complains about a
broken file pattern.

Repair this file entry in BONDING DRIVER.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Acked-by: Jonathan Toppins <jtoppins@redhat.com>
Link: https://lore.kernel.org/r/20220824072945.28606-1-lukas.bulwahn@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonetlink: fix some kernel-doc comments
Zhengchao Shao [Wed, 24 Aug 2022 01:36:21 +0000 (09:36 +0800)]
netlink: fix some kernel-doc comments

Modify the comment of input parameter of nlmsg_ and nla_ function.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Link: https://lore.kernel.org/r/20220824013621.365103-1-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: ethernet: ti: davinci_mdio: fix build for mdio bitbang uses
Randy Dunlap [Wed, 24 Aug 2022 02:42:16 +0000 (19:42 -0700)]
net: ethernet: ti: davinci_mdio: fix build for mdio bitbang uses

davinci_mdio.c uses mdio bitbang APIs, so it should select
MDIO_BITBANG to prevent build errors.

arm-linux-gnueabi-ld: drivers/net/ethernet/ti/davinci_mdio.o: in function `davinci_mdio_remove':
drivers/net/ethernet/ti/davinci_mdio.c:649: undefined reference to `free_mdio_bitbang'
arm-linux-gnueabi-ld: drivers/net/ethernet/ti/davinci_mdio.o: in function `davinci_mdio_probe':
drivers/net/ethernet/ti/davinci_mdio.c:545: undefined reference to `alloc_mdio_bitbang'
arm-linux-gnueabi-ld: drivers/net/ethernet/ti/davinci_mdio.o: in function `davinci_mdiobb_read':
drivers/net/ethernet/ti/davinci_mdio.c:236: undefined reference to `mdiobb_read'
arm-linux-gnueabi-ld: drivers/net/ethernet/ti/davinci_mdio.o: in function `davinci_mdiobb_write':
drivers/net/ethernet/ti/davinci_mdio.c:253: undefined reference to `mdiobb_write'

Fixes: d04807b80691 ("net: ethernet: ti: davinci_mdio: Add workaround for errata i2329")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Ravi Gunasekaran <r-gunasekaran@ti.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
Cc: Sudip Mukherjee (Codethink) <sudipm.mukherjee@gmail.com>
Link: https://lore.kernel.org/r/20220824024216.4939-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoDocumentation: sysctl: align cells in second content column
Bagas Sanjaya [Wed, 24 Aug 2022 03:58:04 +0000 (10:58 +0700)]
Documentation: sysctl: align cells in second content column

Stephen Rothwell reported htmldocs warning when merging net-next tree:

Documentation/admin-guide/sysctl/net.rst:37: WARNING: Malformed table.
Text in column margin in table line 4.

========= =================== = ========== ==================
Directory Content               Directory  Content
========= =================== = ========== ==================
802       E802 protocol         mptcp     Multipath TCP
appletalk Appletalk protocol    netfilter Network Filter
ax25      AX25                  netrom     NET/ROM
bridge    Bridging              rose      X.25 PLP layer
core      General parameter     tipc      TIPC
ethernet  Ethernet protocol     unix      Unix domain sockets
ipv4      IP version 4          x25       X.25 protocol
ipv6      IP version 6
========= =================== = ========== ==================

The warning above is caused by cells in second "Content" column of
/proc/sys/net subdirectory table which are in column margin.

Align these cells against the column header to fix the warning.

Link: https://lore.kernel.org/linux-next/20220823134905.57ed08d5@canb.auug.org.au/
Fixes: 1202cdd665315c ("Remove DECnet support from kernel")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://lore.kernel.org/r/20220824035804.204322-1-bagasdotme@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoi40e: Fix incorrect address type for IPv6 flow rules
Sylwester Dziedziuch [Fri, 19 Aug 2022 10:45:52 +0000 (12:45 +0200)]
i40e: Fix incorrect address type for IPv6 flow rules

It was not possible to create 1-tuple flow director
rule for IPv6 flow type. It was caused by incorrectly
checking for source IP address when validating user provided
destination IP address.

Fix this by changing ip6src to correct ip6dst address
in destination IP address validation for IPv6 flow type.

Fixes: efca91e89b67 ("i40e: Add flow director support for IPv6")
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 years agoixgbe: stop resetting SYSTIME in ixgbe_ptp_start_cyclecounter
Jacob Keller [Tue, 2 Aug 2022 00:24:19 +0000 (17:24 -0700)]
ixgbe: stop resetting SYSTIME in ixgbe_ptp_start_cyclecounter

The ixgbe_ptp_start_cyclecounter is intended to be called whenever the
cyclecounter parameters need to be changed.

Since commit a9763f3cb54c ("ixgbe: Update PTP to support X550EM_x
devices"), this function has cleared the SYSTIME registers and reset the
TSAUXC DISABLE_SYSTIME bit.

While these need to be cleared during ixgbe_ptp_reset, it is wrong to clear
them during ixgbe_ptp_start_cyclecounter. This function may be called
during both reset and link status change. When link changes, the SYSTIME
counter is still operating normally, but the cyclecounter should be updated
to account for the possibly changed parameters.

Clearing SYSTIME when link changes causes the timecounter to jump because
the cycle counter now reads zero.

Extract the SYSTIME initialization out to a new function and call this
during ixgbe_ptp_reset. This prevents the timecounter adjustment and avoids
an unnecessary reset of the current time.

This also restores the original SYSTIME clearing that occurred during
ixgbe_ptp_reset before the commit above.

Reported-by: Steve Payne <spayne@aurora.tech>
Reported-by: Ilya Evenbach <ievenbach@aurora.tech>
Fixes: a9763f3cb54c ("ixgbe: Update PTP to support X550EM_x devices")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 years agoMerge tag 'trace-v6.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt...
Linus Torvalds [Wed, 24 Aug 2022 17:43:34 +0000 (10:43 -0700)]
Merge tag 'trace-v6.0-rc2' of git://git./linux/kernel/git/rostedt/linux-trace

Pull tracing fix from Steven Rostedt:

 - Fix build warning for when MODULES and FTRACE_WITH_DIRECT_CALLS are
   not set. A warning happens with ops_references_rec() defined but not
   used.

* tag 'trace-v6.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ftrace: Fix build warning for ops_references_rec() not used

2 years agoMerge branch 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvar...
Linus Torvalds [Wed, 24 Aug 2022 17:19:20 +0000 (10:19 -0700)]
Merge branch 'dmi-for-linus' of git://git./linux/kernel/git/jdelvare/staging

Pull dmi update from Jean Delvare.

Tiny cleanup.

* 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
  firmware: dmi: Use the proper accessor for the version field

2 years agoMerge branch 'sysctl-data-races'
David S. Miller [Wed, 24 Aug 2022 12:46:59 +0000 (13:46 +0100)]
Merge branch 'sysctl-data-races'

Kuniyuki Iwashima says:

====================
net: sysctl: Fix data-races around net.core.XXX

This series fixes data-races around all knobs in net_core_table and
netns_core_table except for bpf stuff.

These knobs are skipped:

  - 4 bpf knobs
  - netdev_rss_key: Written only once by net_get_random_once() and
                    read-only knob
  - rps_sock_flow_entries: Protected with sock_flow_mutex
  - flow_limit_cpu_bitmap: Protected with flow_limit_update_mutex
  - flow_limit_table_len: Protected with flow_limit_update_mutex
  - default_qdisc: Protected with qdisc_mod_lock
  - warnings: Unused
  - high_order_alloc_disable: Protected with static_key_mutex
  - skb_defer_max: Already using READ_ONCE()
  - sysctl_txrehash: Already using READ_ONCE()

Note 5th patch fixes net.core.message_cost and net.core.message_burst,
and lib/ratelimit.c does not have an explicit maintainer.

Changes:
  v3:
    * Fix build failures of CONFIG_SYSCTL=n case in 13th & 14th patches

  v2: https://lore.kernel.org/netdev/20220818035227.81567-1-kuniyu@amazon.com/
    * Remove 4 bpf knobs and added 6 knobs

  v1: https://lore.kernel.org/netdev/20220816052347.70042-1-kuniyu@amazon.com/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: Fix a data-race around sysctl_somaxconn.
Kuniyuki Iwashima [Tue, 23 Aug 2022 17:47:00 +0000 (10:47 -0700)]
net: Fix a data-race around sysctl_somaxconn.

While reading sysctl_somaxconn, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: Fix a data-race around netdev_unregister_timeout_secs.
Kuniyuki Iwashima [Tue, 23 Aug 2022 17:46:59 +0000 (10:46 -0700)]
net: Fix a data-race around netdev_unregister_timeout_secs.

While reading netdev_unregister_timeout_secs, it can be changed
concurrently.  Thus, we need to add READ_ONCE() to its reader.

Fixes: 5aa3afe107d9 ("net: make unregister netdev warning timeout configurable")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: Fix a data-race around gro_normal_batch.
Kuniyuki Iwashima [Tue, 23 Aug 2022 17:46:58 +0000 (10:46 -0700)]
net: Fix a data-race around gro_normal_batch.

While reading gro_normal_batch, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 323ebb61e32b ("net: use listified RX for handling GRO_NORMAL skbs")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: Fix data-races around sysctl_devconf_inherit_init_net.
Kuniyuki Iwashima [Tue, 23 Aug 2022 17:46:57 +0000 (10:46 -0700)]
net: Fix data-races around sysctl_devconf_inherit_init_net.

While reading sysctl_devconf_inherit_init_net, it can be changed
concurrently.  Thus, we need to add READ_ONCE() to its readers.

Fixes: 856c395cfa63 ("net: introduce a knob to control whether to inherit devconf config")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: Fix data-races around sysctl_fb_tunnels_only_for_init_net.
Kuniyuki Iwashima [Tue, 23 Aug 2022 17:46:56 +0000 (10:46 -0700)]
net: Fix data-races around sysctl_fb_tunnels_only_for_init_net.

While reading sysctl_fb_tunnels_only_for_init_net, it can be changed
concurrently.  Thus, we need to add READ_ONCE() to its readers.

Fixes: 79134e6ce2c9 ("net: do not create fallback tunnels for non-default namespaces")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: Fix a data-race around netdev_budget_usecs.
Kuniyuki Iwashima [Tue, 23 Aug 2022 17:46:55 +0000 (10:46 -0700)]
net: Fix a data-race around netdev_budget_usecs.

While reading netdev_budget_usecs, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 7acf8a1e8a28 ("Replace 2 jiffies with sysctl netdev_budget_usecs to enable softirq tuning")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: Fix data-races around sysctl_max_skb_frags.
Kuniyuki Iwashima [Tue, 23 Aug 2022 17:46:54 +0000 (10:46 -0700)]
net: Fix data-races around sysctl_max_skb_frags.

While reading sysctl_max_skb_frags, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: 5f74f82ea34c ("net:Add sysctl_max_skb_frags")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: Fix a data-race around netdev_budget.
Kuniyuki Iwashima [Tue, 23 Aug 2022 17:46:53 +0000 (10:46 -0700)]
net: Fix a data-race around netdev_budget.

While reading netdev_budget, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 51b0bdedb8e7 ("[NET]: Separate two usages of netdev_max_backlog.")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: Fix a data-race around sysctl_net_busy_read.
Kuniyuki Iwashima [Tue, 23 Aug 2022 17:46:52 +0000 (10:46 -0700)]
net: Fix a data-race around sysctl_net_busy_read.

While reading sysctl_net_busy_read, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 2d48d67fa8cd ("net: poll/select low latency socket support")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: Fix a data-race around sysctl_net_busy_poll.
Kuniyuki Iwashima [Tue, 23 Aug 2022 17:46:51 +0000 (10:46 -0700)]
net: Fix a data-race around sysctl_net_busy_poll.

While reading sysctl_net_busy_poll, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 060212928670 ("net: add low latency socket poll")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: Fix a data-race around sysctl_tstamp_allow_data.
Kuniyuki Iwashima [Tue, 23 Aug 2022 17:46:50 +0000 (10:46 -0700)]
net: Fix a data-race around sysctl_tstamp_allow_data.

While reading sysctl_tstamp_allow_data, it can be changed
concurrently.  Thus, we need to add READ_ONCE() to its reader.

Fixes: b245be1f4db1 ("net-timestamp: no-payload only sysctl")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: Fix data-races around sysctl_optmem_max.
Kuniyuki Iwashima [Tue, 23 Aug 2022 17:46:49 +0000 (10:46 -0700)]
net: Fix data-races around sysctl_optmem_max.

While reading sysctl_optmem_max, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoratelimit: Fix data-races in ___ratelimit().
Kuniyuki Iwashima [Tue, 23 Aug 2022 17:46:48 +0000 (10:46 -0700)]
ratelimit: Fix data-races in ___ratelimit().

While reading rs->interval and rs->burst, they can be changed
concurrently via sysctl (e.g. net_ratelimit_state).  Thus, we
need to add READ_ONCE() to their readers.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: Fix data-races around netdev_tstamp_prequeue.
Kuniyuki Iwashima [Tue, 23 Aug 2022 17:46:47 +0000 (10:46 -0700)]
net: Fix data-races around netdev_tstamp_prequeue.

While reading netdev_tstamp_prequeue, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: 3b098e2d7c69 ("net: Consistent skb timestamping")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: Fix data-races around netdev_max_backlog.
Kuniyuki Iwashima [Tue, 23 Aug 2022 17:46:46 +0000 (10:46 -0700)]
net: Fix data-races around netdev_max_backlog.

While reading netdev_max_backlog, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

While at it, we remove the unnecessary spaces in the doc.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: Fix data-races around weight_p and dev_weight_[rt]x_bias.
Kuniyuki Iwashima [Tue, 23 Aug 2022 17:46:45 +0000 (10:46 -0700)]
net: Fix data-races around weight_p and dev_weight_[rt]x_bias.

While reading weight_p, it can be changed concurrently.  Thus, we need
to add READ_ONCE() to its reader.

Also, dev_[rt]x_weight can be read/written at the same time.  So, we
need to use READ_ONCE() and WRITE_ONCE() for its access.  Moreover, to
use the same weight_p while changing dev_[rt]x_weight, we add a mutex
in proc_do_dev_weight().

Fixes: 3d48b53fb2ae ("net: dev_weight: TX/RX orthogonality")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: Fix data-races around sysctl_[rw]mem_(max|default).
Kuniyuki Iwashima [Tue, 23 Aug 2022 17:46:44 +0000 (10:46 -0700)]
net: Fix data-races around sysctl_[rw]mem_(max|default).

While reading sysctl_[rw]mem_(max|default), they can be changed
concurrently.  Thus, we need to add READ_ONCE() to its readers.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'r8169-next'
David S. Miller [Wed, 24 Aug 2022 12:24:10 +0000 (13:24 +0100)]
Merge branch 'r8169-next'

Heiner Kallweit says:

====================
r8169: remove support for few unused chip versions

There's a number of chip versions that apparently never made it to the
mass market. Detection of these chip versions has been disabled for
few kernel versions now and nobody complained. Therefore remove
support for these chip versions.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agor8169: remove support for chip version 60
Heiner Kallweit [Tue, 23 Aug 2022 18:38:06 +0000 (20:38 +0200)]
r8169: remove support for chip version 60

Detection of this chip version has been disabled for few kernel versions now.
Nobody complained, so remove support for this chip version.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agor8169: remove support for chip version 50
Heiner Kallweit [Tue, 23 Aug 2022 18:37:30 +0000 (20:37 +0200)]
r8169: remove support for chip version 50

Detection of this chip version has been disabled for few kernel versions now.
Nobody complained, so remove support for this chip version.

v3:
- rebase patch

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agor8169: remove support for chip version 49
Heiner Kallweit [Tue, 23 Aug 2022 18:36:07 +0000 (20:36 +0200)]
r8169: remove support for chip version 49

Detection of this chip version has been disabled for few kernel versions now.
Nobody complained, so remove support for this chip version.

v2:
- fix a typo: RTL_GIGA_MAC_VER_40 -> RTL_GIGA_MAC_VER_50

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agor8169: remove support for chip versions 45 and 47
Heiner Kallweit [Tue, 23 Aug 2022 18:35:25 +0000 (20:35 +0200)]
r8169: remove support for chip versions 45 and 47

Detection of these chip versions has been disabled for few kernel versions now.
Nobody complained, so remove support for this chip version.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agor8169: remove support for chip version 41
Heiner Kallweit [Tue, 23 Aug 2022 18:34:52 +0000 (20:34 +0200)]
r8169: remove support for chip version 41

Detection of this chip version has been disabled for few kernel versions now.
Nobody complained, so remove support for this chip version.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge tag 'mlx5-updates-2022-08-22' of git://git.kernel.org/pub/scm/linux/kernel...
David S. Miller [Wed, 24 Aug 2022 12:19:39 +0000 (13:19 +0100)]
Merge tag 'mlx5-updates-2022-08-22' of git://git./linux/kernel/git/saeed/linux

mlx5-updates-2022-08-22

Roi Dayan Says:
===============
Add support for SF tunnel offload

Mlx5 driver only supports VF tunnel offload.
To add support for SF tunnel offload the driver needs to:
1. Add send-to-vport metadata matching rules like done for VFs.
2. Set an indirect table for SF vport, same as VF vport.

info smaller sub functions for better maintainability.

rules from esw init phase to representor load phase.
SFs could be created after esw initialized and thus the send-to-vport
meta rules would not be created for those SFs.
By moving the creation of the rules to representor load phase
we ensure creating the rules also for SFs created later.

===============

Lama Kayal Says:
================
Make flow steering API loosely coupled from mlx5e_priv, in a manner to
introduce more readable and maintainable modules.

Make TC's private, let mlx5e_flow_steering struct be dynamically allocated,
and introduce its API to maintain the code via setters and getters
instead of publicly exposing it.

Introduce flow steering debug macros to provide an elegant finish to the
decoupled flow steering API, where errors related to flow steering shall
be reported via them.

All flow steering related files will drop any coupling to mlx5e_priv,
instead they will get the relevant members as input. Among these,
fs_tt_redirect, fs_tc, and arfs.
================

2 years agonet/core/skbuff: Check the return value of skb_copy_bits()
lily [Tue, 23 Aug 2022 05:44:11 +0000 (22:44 -0700)]
net/core/skbuff: Check the return value of skb_copy_bits()

skb_copy_bits() could fail, which requires a check on the return
value.

Signed-off-by: Li Zhong <floridsleeves@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomicrel: ksz8851: fixes struct pointer issue
Jerry Ray [Mon, 22 Aug 2022 21:39:32 +0000 (16:39 -0500)]
micrel: ksz8851: fixes struct pointer issue

Issue found during code review. This bug has no impact as long as the
ks8851_net structure is the first element of the ks8851_net_spi structure.
As long as the offset to the ks8851_net struct is zero, the container_of()
macro is subtracting 0 and therefore no damage done. But if the
ks8851_net_spi struct is ever modified such that the ks8851_net struct
within it is no longer the first element of the struct, then the bug would
manifest itself and cause problems.

struct ks8851_net is contained within ks8851_net_spi.
ks is contained within kss.
kss is the priv_data of the netdev structure.

Signed-off-by: Jerry Ray <jerry.ray@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agotcp: annotate data-race around tcp_md5sig_pool_populated
Eric Dumazet [Mon, 22 Aug 2022 21:15:28 +0000 (21:15 +0000)]
tcp: annotate data-race around tcp_md5sig_pool_populated

tcp_md5sig_pool_populated can be read while another thread
changes its value.

The race has no consequence because allocations
are protected with tcp_md5sig_mutex.

This patch adds READ_ONCE() and WRITE_ONCE() to document
the race and silence KCSAN.

Reported-by: Abhishek Shah <abhishek.shah@columbia.edu>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: marvell: prestera: implement br_port_locked flag offloading
Oleksandr Mazur [Mon, 22 Aug 2022 18:03:15 +0000 (21:03 +0300)]
net: marvell: prestera: implement br_port_locked flag offloading

Both <port> br_port_locked and <lag> interfaces's flag
offloading is supported. No new ABI is being added,
rather existing (port_param_set) API call gets extended.

Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
V2:
  add missing receipents (linux-kernel, netdev)
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
David S. Miller [Wed, 24 Aug 2022 11:51:50 +0000 (12:51 +0100)]
Merge branch 'master' of git://git./linux/kernel/git/klassert/ipsec

Steffen Klassert says:

====================
pull request (net): ipsec 2022-08-24

1) Fix a refcount leak in __xfrm_policy_check.
   From Xin Xiong.

2) Revert "xfrm: update SA curlft.use_time". This
   violates RFC 2367. From Antony Antony.

3) Fix a comment on XFRMA_LASTUSED.
   From Antony Antony.

4) x->lastused is not cloned in xfrm_do_migrate.
   Fix from Antony Antony.

5) Serialize the calls to xfrm_probe_algs.
   From Herbert Xu.

6) Fix a null pointer dereference of dst->dev on a metadata
   dst in xfrm_lookup_with_ifid. From Nikolay Aleksandrov.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agofec: Restart PPS after link state change
Csókás Bence [Mon, 22 Aug 2022 08:10:52 +0000 (10:10 +0200)]
fec: Restart PPS after link state change

On link state change, the controller gets reset,
causing PPS to drop out and the PHC to lose its
time and calibration. So we restart it if needed,
restoring calibration and time registers.

Changes since v2:
* Add `fec_ptp_save_state()`/`fec_ptp_restore_state()`
* Use `ktime_get_real_ns()`
* Use `BIT()` macro
Changes since v1:
* More ECR #define's
* Stop PPS in `fec_ptp_stop()`

Signed-off-by: Csókás Bence <csokas.bence@prolan.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'j7200-support'
David S. Miller [Wed, 24 Aug 2022 08:52:04 +0000 (09:52 +0100)]
Merge branch 'j7200-support'

Siddharth Vadapalli says:

====================
J7200: CPSW5G: Add support for QSGMII mode to am65-cpsw driver

Add support for QSGMII mode to am65-cpsw driver.

Change log:

v4-> v5:
1. Move ti,j7200-cpswxg-nuss compatible to the line above the
   ti,j721e-cpsw-nuss compatible.
2. Add allOf and move if-then statements within it to allow future if-then
   statements to be added easily.

v3 -> v4:
1. Update bindings to disallow ports based on compatible, instead of
   adding a new if/then statement for the new compatible.
2. Add Else-If condition for RMII mode in the set of supported interfaces.
   Support for RMII mode is already present in the driver and I had
   missed out adding a condition for RMII mode in the previous patches.

v2 -> v3:
1. In ti,k3-am654-cpsw-nuss.yaml, restrict if/then statement to port
   nodes.

v1 -> v2:
1. Add new compatible for CPSW5G in ti,k3-am654-cpsw-nuss.yaml and extend
   properties for new compatible.
2. Add extra_modes member to struct am65_cpsw_pdata to be used for QSGMII
   mode by new compatible.
3. Add check for phylink supported modes to ensure that only one phy mode
   is advertised as supported.
4. Check if extra_modes supports QSGMII mode in am65_cpsw_nuss_mac_config()
   for register write.
5. Add check for assigning port->sgmii_base only when extra_modes is valid.

v4: https://lore.kernel.org/r/20220816060139.111934-1-s-vadapalli@ti.com/
v3: https://lore.kernel.org/r/20220606110443.30362-1-s-vadapalli@ti.com/
v2: https://lore.kernel.org/r/20220602114558.6204-1-s-vadapalli@ti.com/
v1: https://lore.kernel.org/r/20220531113058.23708-1-s-vadapalli@ti.com/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ethernet: ti: am65-cpsw: Move phy_set_mode_ext() to correct location
Siddharth Vadapalli [Mon, 22 Aug 2022 07:01:25 +0000 (12:31 +0530)]
net: ethernet: ti: am65-cpsw: Move phy_set_mode_ext() to correct location

In TI's J7200 SoC CPSW5G ports, each of the 4 ports can be configured
as a QSGMII main or QSGMII-SUB port. This configuration is performed
by phy-gmii-sel driver on invoking the phy_set_mode_ext() function.

It is necessary for the QSGMII main port to be configured before any of
the QSGMII-SUB interfaces are brought up. Currently, the QSGMII-SUB
interfaces come up before the QSGMII main port is configured.

Fix this by moving the call to phy_set_mode_ext() from
am65_cpsw_nuss_ndo_slave_open() to am65_cpsw_nuss_init_slave_ports(),
thereby ensuring that the QSGMII main port is configured before any of
the QSGMII-SUB ports are brought up.

Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ethernet: ti: am65-cpsw: Add support for J7200 CPSW5G
Siddharth Vadapalli [Mon, 22 Aug 2022 07:01:24 +0000 (12:31 +0530)]
net: ethernet: ti: am65-cpsw: Add support for J7200 CPSW5G

CPSW5G in J7200 supports additional modes like QSGMII and SGMII.
Add new compatible for J7200 and enable QSGMII mode in am65-cpsw driver.

Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agodt-bindings: net: ti: k3-am654-cpsw-nuss: Update bindings for J7200 CPSW5G
Siddharth Vadapalli [Mon, 22 Aug 2022 07:01:23 +0000 (12:31 +0530)]
dt-bindings: net: ti: k3-am654-cpsw-nuss: Update bindings for J7200 CPSW5G

Update bindings for TI K3 J7200 SoC which contains 5 ports (4 external
ports) CPSW5G module and add compatible for it.

Changes made:
    - Add new compatible ti,j7200-cpswxg-nuss for CPSW5G.
    - Extend pattern properties for new compatible.
    - Change maximum number of CPSW ports to 4 for new compatible.

Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: neigh: don't call kfree_skb() under spin_lock_irqsave()
Yang Yingliang [Mon, 22 Aug 2022 02:53:46 +0000 (10:53 +0800)]
net: neigh: don't call kfree_skb() under spin_lock_irqsave()

It is not allowed to call kfree_skb() from hardware interrupt
context or with interrupts being disabled. So add all skb to
a tmp list, then free them after spin_unlock_irqrestore() at
once.

Fixes: 66ba215cb513 ("neigh: fix possible DoS due to net iface start/stop loop")
Suggested-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: skb: prevent the split of kfree_skb_reason() by gcc
Menglong Dong [Sun, 21 Aug 2022 05:18:58 +0000 (13:18 +0800)]
net: skb: prevent the split of kfree_skb_reason() by gcc

Sometimes, gcc will optimize the function by spliting it to two or
more functions. In this case, kfree_skb_reason() is splited to
kfree_skb_reason and kfree_skb_reason.part.0. However, the
function/tracepoint trace_kfree_skb() in it needs the return address
of kfree_skb_reason().

This split makes the call chains becomes:
  kfree_skb_reason() -> kfree_skb_reason.part.0 -> trace_kfree_skb()

which makes the return address that passed to trace_kfree_skb() be
kfree_skb().

Therefore, introduce '__fix_address', which is the combination of
'__noclone' and 'noinline', and apply it to kfree_skb_reason() to
prevent to from being splited or made inline.

(Is it better to simply apply '__noclone oninline' to kfree_skb_reason?
I'm thinking maybe other functions have the same problems)

Meanwhile, wrap 'skb_unref()' with 'unlikely()', as the compiler thinks
it is likely return true and splits kfree_skb_reason().

Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonetfilter: nf_defrag_ipv6: allow nf_conntrack_frag6_high_thresh increases
Eric Dumazet [Tue, 23 Aug 2022 23:38:48 +0000 (16:38 -0700)]
netfilter: nf_defrag_ipv6: allow nf_conntrack_frag6_high_thresh increases

Currently, net.netfilter.nf_conntrack_frag6_high_thresh can only be lowered.

I found this issue while investigating a probable kernel issue
causing flakes in tools/testing/selftests/net/ip_defrag.sh

In particular, these sysctl changes were ignored:
ip netns exec "${NETNS}" sysctl -w net.netfilter.nf_conntrack_frag6_high_thresh=9000000 >/dev/null 2>&1
ip netns exec "${NETNS}" sysctl -w net.netfilter.nf_conntrack_frag6_low_thresh=7000000  >/dev/null 2>&1

This change is inline with commit 836196239298 ("net/ipfrag: let ip[6]frag_high_thresh
in ns be higher than in init_net")

Fixes: 8db3d41569bb ("netfilter: nf_defrag_ipv6: use net_generic infra")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2 years agonetfilter: flowtable: fix stuck flows on cleanup due to pending work
Pablo Neira Ayuso [Thu, 18 Nov 2021 21:24:15 +0000 (22:24 +0100)]
netfilter: flowtable: fix stuck flows on cleanup due to pending work

To clear the flow table on flow table free, the following sequence
normally happens in order:

  1) gc_step work is stopped to disable any further stats/del requests.
  2) All flow table entries are set to teardown state.
  3) Run gc_step which will queue HW del work for each flow table entry.
  4) Waiting for the above del work to finish (flush).
  5) Run gc_step again, deleting all entries from the flow table.
  6) Flow table is freed.

But if a flow table entry already has pending HW stats or HW add work
step 3 will not queue HW del work (it will be skipped), step 4 will wait
for the pending add/stats to finish, and step 5 will queue HW del work
which might execute after freeing of the flow table.

To fix the above, this patch flushes the pending work, then it sets the
teardown flag to all flows in the flowtable and it forces a garbage
collector run to queue work to remove the flows from hardware, then it
flushes this new pending work and (finally) it forces another garbage
collector run to remove the entry from the software flowtable.

Stack trace:
[47773.882335] BUG: KASAN: use-after-free in down_read+0x99/0x460
[47773.883634] Write of size 8 at addr ffff888103b45aa8 by task kworker/u20:6/543704
[47773.885634] CPU: 3 PID: 543704 Comm: kworker/u20:6 Not tainted 5.12.0-rc7+ #2
[47773.886745] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009)
[47773.888438] Workqueue: nf_ft_offload_del flow_offload_work_handler [nf_flow_table]
[47773.889727] Call Trace:
[47773.890214]  dump_stack+0xbb/0x107
[47773.890818]  print_address_description.constprop.0+0x18/0x140
[47773.892990]  kasan_report.cold+0x7c/0xd8
[47773.894459]  kasan_check_range+0x145/0x1a0
[47773.895174]  down_read+0x99/0x460
[47773.899706]  nf_flow_offload_tuple+0x24f/0x3c0 [nf_flow_table]
[47773.907137]  flow_offload_work_handler+0x72d/0xbe0 [nf_flow_table]
[47773.913372]  process_one_work+0x8ac/0x14e0
[47773.921325]
[47773.921325] Allocated by task 592159:
[47773.922031]  kasan_save_stack+0x1b/0x40
[47773.922730]  __kasan_kmalloc+0x7a/0x90
[47773.923411]  tcf_ct_flow_table_get+0x3cb/0x1230 [act_ct]
[47773.924363]  tcf_ct_init+0x71c/0x1156 [act_ct]
[47773.925207]  tcf_action_init_1+0x45b/0x700
[47773.925987]  tcf_action_init+0x453/0x6b0
[47773.926692]  tcf_exts_validate+0x3d0/0x600
[47773.927419]  fl_change+0x757/0x4a51 [cls_flower]
[47773.928227]  tc_new_tfilter+0x89a/0x2070
[47773.936652]
[47773.936652] Freed by task 543704:
[47773.937303]  kasan_save_stack+0x1b/0x40
[47773.938039]  kasan_set_track+0x1c/0x30
[47773.938731]  kasan_set_free_info+0x20/0x30
[47773.939467]  __kasan_slab_free+0xe7/0x120
[47773.940194]  slab_free_freelist_hook+0x86/0x190
[47773.941038]  kfree+0xce/0x3a0
[47773.941644]  tcf_ct_flow_table_cleanup_work

Original patch description and stack trace by Paul Blakey.

Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support")
Reported-by: Paul Blakey <paulb@nvidia.com>
Tested-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2 years agonetfilter: flowtable: add function to invoke garbage collection immediately
Pablo Neira Ayuso [Mon, 22 Aug 2022 21:13:00 +0000 (23:13 +0200)]
netfilter: flowtable: add function to invoke garbage collection immediately

Expose nf_flow_table_gc_run() to force a garbage collector run from the
offload infrastructure.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2 years agonetfilter: nf_tables: disallow binding to already bound chain
Pablo Neira Ayuso [Mon, 22 Aug 2022 09:06:39 +0000 (11:06 +0200)]
netfilter: nf_tables: disallow binding to already bound chain

Update nft_data_init() to report EINVAL if chain is already bound.

Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING")
Reported-by: Gwangun Jung <exsociety@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2 years agonetfilter: nft_tunnel: restrict it to netdev family
Pablo Neira Ayuso [Sun, 21 Aug 2022 14:32:44 +0000 (16:32 +0200)]
netfilter: nft_tunnel: restrict it to netdev family

Only allow to use this expression from NFPROTO_NETDEV family.

Fixes: af308b94a2a4 ("netfilter: nf_tables: add tunnel support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2 years agonetfilter: nft_osf: restrict osf to ipv4, ipv6 and inet families
Pablo Neira Ayuso [Sun, 21 Aug 2022 14:25:07 +0000 (16:25 +0200)]
netfilter: nft_osf: restrict osf to ipv4, ipv6 and inet families

As it was originally intended, restrict extension to supported families.

Fixes: b96af92d6eaf ("netfilter: nf_tables: implement Passive OS fingerprint module in nft_osf")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2 years agonetfilter: nf_tables: do not leave chain stats enabled on error
Pablo Neira Ayuso [Sun, 21 Aug 2022 10:41:33 +0000 (12:41 +0200)]
netfilter: nf_tables: do not leave chain stats enabled on error

Error might occur later in the nf_tables_addchain() codepath, enable
static key only after transaction has been created.

Fixes: 9f08ea848117 ("netfilter: nf_tables: keep chain counters away from hot path")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2 years agonetfilter: nft_payload: do not truncate csum_offset and csum_type
Pablo Neira Ayuso [Sun, 21 Aug 2022 09:55:19 +0000 (11:55 +0200)]
netfilter: nft_payload: do not truncate csum_offset and csum_type

Instead report ERANGE if csum_offset is too long, and EOPNOTSUPP if type
is not support.

Fixes: 7ec3f7b47b8d ("netfilter: nft_payload: add packet mangling support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2 years agonetfilter: nft_payload: report ERANGE for too long offset and length
Pablo Neira Ayuso [Sun, 21 Aug 2022 09:47:04 +0000 (11:47 +0200)]
netfilter: nft_payload: report ERANGE for too long offset and length

Instead of offset and length are truncation to u8, report ERANGE.

Fixes: 96518518cc41 ("netfilter: add nftables")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2 years agonetfilter: nf_tables: make table handle allocation per-netns friendly
Pablo Neira Ayuso [Sun, 21 Aug 2022 08:52:48 +0000 (10:52 +0200)]
netfilter: nf_tables: make table handle allocation per-netns friendly

mutex is per-netns, move table_netns to the pernet area.

*read-write* to 0xffffffff883a01e8 of 8 bytes by task 6542 on cpu 0:
 nf_tables_newtable+0x6dc/0xc00 net/netfilter/nf_tables_api.c:1221
 nfnetlink_rcv_batch net/netfilter/nfnetlink.c:513 [inline]
 nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:634 [inline]
 nfnetlink_rcv+0xa6a/0x13a0 net/netfilter/nfnetlink.c:652
 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
 netlink_unicast+0x652/0x730 net/netlink/af_netlink.c:1345
 netlink_sendmsg+0x643/0x740 net/netlink/af_netlink.c:1921

Fixes: f102d66b335a ("netfilter: nf_tables: use dedicated mutex to guard transactions")
Reported-by: Abhishek Shah <abhishek.shah@columbia.edu>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2 years agonetfilter: nf_tables: disallow updates of implicit chain
Pablo Neira Ayuso [Sun, 21 Aug 2022 08:28:25 +0000 (10:28 +0200)]
netfilter: nf_tables: disallow updates of implicit chain

Updates on existing implicit chain make no sense, disallow this.

Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2 years agoMerge tag 'cgroup-for-6.0-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Wed, 24 Aug 2022 02:33:28 +0000 (19:33 -0700)]
Merge tag 'cgroup-for-6.0-rc2-fixes' of git://git./linux/kernel/git/tj/cgroup

Pull cgroup fixes from Tejun Heo:

 - The psi data structure was changed to be allocated dynamically but
   it wasn't being cleared leading to it reporting garbage values and
   triggering spurious oom kills.

 - A deadlock involving cpuset and cpu hotplug.

 - When a controller is moved across cgroup hierarchies,
   css->rstat_css_node didn't get RCU drained properly from the previous
   list.

* tag 'cgroup-for-6.0-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: Fix race condition at rebind_subsystems()
  cgroup: Fix threadgroup_rwsem <-> cpus_read_lock() deadlock
  sched/psi: Remove redundant cgroup_psi() when !CONFIG_CGROUPS
  sched/psi: Remove unused parameter nbytes of psi_trigger_create()
  sched/psi: Zero the memory of struct psi_group

2 years agoMerge tag 'audit-pr-20220823' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoor...
Linus Torvalds [Wed, 24 Aug 2022 02:26:48 +0000 (19:26 -0700)]
Merge tag 'audit-pr-20220823' of git://git./linux/kernel/git/pcmoore/audit

Pull audit fix from Paul Moore:
 "A single fix for a potential double-free on a fsnotify error path"

* tag 'audit-pr-20220823' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
  audit: fix potential double free on error path from fsnotify_add_inode_mark

2 years agoMerge tag 'fs.fixes.v6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs...
Linus Torvalds [Wed, 24 Aug 2022 02:17:26 +0000 (19:17 -0700)]
Merge tag 'fs.fixes.v6.0-rc3' of git://git./linux/kernel/git/vfs/idmapping

Pull file_remove_privs() fix from Christian Brauner:
 "As part of Stefan's and Jens' work to add async buffered write
  support to xfs we refactored file_remove_privs() and added
  __file_remove_privs() to avoid calling __remove_privs() when
  IOCB_NOWAIT is passed.

  While debugging a recent performance regression report I found that
  during review we missed that commit faf99b563558 ("fs: add
  __remove_file_privs() with flags parameter") accidently changed
  behavior when dentry_needs_remove_privs() returns zero.

  Before the commit it would still call inode_has_no_xattr() setting
  the S_NOSEC bit and thereby avoiding even calling into
  dentry_needs_remove_privs() the next time this function is called.
  After that commit inode_has_no_xattr() would only be called if
  __remove_privs() had to be called.

  Restore the old behavior. This is likely the cause of the performance
  regression"

* tag 'fs.fixes.v6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
  fs: __file_remove_privs(): restore call to inode_has_no_xattr()

2 years agoMerge tag 'mlx5-fixes-2022-08-22' of git://git.kernel.org/pub/scm/linux/kernel/git...
Jakub Kicinski [Wed, 24 Aug 2022 00:50:26 +0000 (17:50 -0700)]
Merge tag 'mlx5-fixes-2022-08-22' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5 fixes 2022-08-22

This series provides bug fixes to mlx5 driver.

* tag 'mlx5-fixes-2022-08-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5: Unlock on error in mlx5_sriov_enable()
  net/mlx5e: Fix use after free in mlx5e_fs_init()
  net/mlx5e: kTLS, Use _safe() iterator in mlx5e_tls_priv_tx_list_cleanup()
  net/mlx5: unlock on error path in esw_vfs_changed_event_handler()
  net/mlx5e: Fix wrong tc flag used when set hw-tc-offload off
  net/mlx5e: TC, Add missing policer validation
  net/mlx5e: Fix wrong application of the LRO state
  net/mlx5: Avoid false positive lockdep warning by adding lock_class_key
  net/mlx5: Fix cmd error logging for manage pages cmd
  net/mlx5: Disable irq when locking lag_lock
  net/mlx5: Eswitch, Fix forwarding decision to uplink
  net/mlx5: LAG, fix logic over MLX5_LAG_FLAG_NDEVS_READY
  net/mlx5e: Properly disable vlan strip on non-UL reps
====================

Link: https://lore.kernel.org/r/20220822195917.216025-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>