git.osdn.net Git - tomoyo/tomoyo-test1.git/log

tools: net: use python3 explicitly

The scripts require Python 3 and some distros are dropping
Python 2 support.

Reported-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

docs: netlink: add a starting guide for working with specs

We have a bit of documentation about the internals of Netlink
and the specs, but really the goal is for most people to not
worry about those. Add a practical guide for beginners who
want to poke at the specs.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netlink: specs: add partial specification for ethtool

Ethtool is one of the most actively developed families.
With the changes to the CLI it should be possible to use
the YNL based code for easy prototyping and development.
Add a partial family definition. I've tested the string
set and rings. I don't have any MAC Merge implementation
to test with, but I added the definition for it, anyway,
because it's last. New commands can simply be added at
the end without having to worry about manually providing
IDs / values.

Set (with notification support - None is the response,
the data is from the notification):

$ sudo ./tools/net/ynl/cli.py \
    --spec Documentation/netlink/specs/ethtool.yaml \
    --do rings-set \
    --json '{"header":{"dev-name":"enp0s31f6"}, "rx":129}' \
    --subscribe monitor
None
[{'msg': {'header': {'dev-index': 2, 'dev-name': 'enp0s31f6'},
          'rx': 136,
          'rx-max': 4096,
          'tx': 256,
          'tx-max': 4096,
          'tx-push': 0},
  'name': 'rings-ntf'}]

Do / dump (yes, the kernel requires that even for dump and even
if empty - the "header" nest must be there):

$ ./tools/net/ynl/cli.py \
    --spec Documentation/netlink/specs/ethtool.yaml \
    --do rings-get \
    --json '{"header":{"dev-index": 2}}'
{'header': {'dev-index': 2, 'dev-name': 'enp0s31f6'},
'rx': 136,
'rx-max': 4096,
'tx': 256,
'tx-max': 4096,
'tx-push': 0}

$ ./tools/net/ynl/cli.py \
    --spec Documentation/netlink/specs/ethtool.yaml \
    --dump rings-get \
    --json '{"header":{}}'
[{'header': {'dev-index': 2, 'dev-name': 'enp0s31f6'},
  'rx': 136,
  'rx-max': 4096,
  'tx': 256,
  'tx-max': 4096,
  'tx-push': 0},
{'header': {'dev-index': 3, 'dev-name': 'wlp0s20f3'}, 'tx-push': 0},
{'header': {'dev-index': 19, 'dev-name': 'enp58s0u1u1'},
  'rx': 100,
  'rx-max': 4096,
  'tx-push': 0}]

And error reporting:

$ ./tools/net/ynl/cli.py \
    --spec Documentation/netlink/specs/ethtool.yaml \
    --dump rings-get \
    --json '{"header":{"flags":5}}'
Netlink error: Invalid argument
nl_len = 68 (52) nl_flags = 0x300 nl_type = 2
error: -22 extack: {'msg': 'reserved bit set',
                         'bad-attr-offs': 24,
'bad-attr': '.header.flags'}
None

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netlink: specs: finish up operation enum-models

I had a (bright?) idea of introducing the concept of enum-models
to account for all the weird ways families enumerate their messages.
I've never finished it because generating C code for each of them
is pretty daunting. But for languages which can use ID values directly
the support is simple enough, so clean this up a bit.

"unified" model is what I recommend going forward.
"directional" model is what ethtool uses.
"notify-split" is used by the proposed DPLL code, but we can just
make them use "unified", it hasn't been merged :)

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl: load jsonschema on demand

The CLI script tries to validate jsonschema by default.
It's seems better to validate too many times than too few.
However, when copying the scripts to random servers having
to install jsonschema is tedious. Load jsonschema via
importlib, and let the user opt out.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl: use operation names from spec on the CLI

When I wrote the first version of the Python code I was quite
excited that we can generate class methods directly from the
spec. Unfortunately we need to use valid identifiers for method
names (specifically no dashes are allowed). Don't reuse those
names on the CLI, it's much more natural to use the operation
names exactly as listed in the spec.

Instead of:
./cli --do rings_get
use:
./cli --do rings-get

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl: support pretty printing bad attribute names

One of my favorite features of the Netlink specs is that they
make decoding structured extack a ton easier.
Implement pretty printing bad attribute names in YNL.

For example it will now say:

  'bad-attr': '.header.flags'

rather than the useless:

  'bad-attr-offs': 32

Proof:

  $ ./cli.py --spec ethtool.yaml --do rings_get \
     --json '{"header":{"dev-index":1, "flags":4}}'
  Netlink error: Invalid argument
  nl_len = 68 (52) nl_flags = 0x300 nl_type = 2
error: -22 extack: {'msg': 'reserved bit set',
'bad-attr': '.header.flags'}

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl: support multi-attr

Ethtool uses mutli-attr, add the support to YNL.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl: support directional enum-model in CLI

Support families which use different IDs for messages
to and from the kernel.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl: add support for types needed by ethtool

Ethtool needs support for handful of extra types.
It doesn't have the definitions section yet.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl: use the common YAML loading and validation code

Adapt the common object hierarchy in code gen and CLI.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl: add an object hierarchy to represent parsed spec

There's a lot of copy and pasting going on between the "cli"
and code gen when it comes to representing the parsed spec.
Create a library which both can use.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl: move the cli and netlink code around

Move the CLI code out of samples/ and the library part
of it into tools/net/ynl/lib/. This way we can start
sharing some code with the code gen.

Initially I thought that code gen is too C-specific to
share anything but basic stuff like calculating values
for enums can easily be shared.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl-gen: prevent do / dump reordering

An earlier fix tried to address generated code jumping around
one code-gen run to another. Turns out dict()s are already
ordered since Python 3.7, the problem is that we iterate over
operation modes using a set(). Sets are unordered in Python.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ipa: use dev PM wakeirq handling

Replace the enable_irq_wake() call with one to dev_pm_set_wake_irq()
instead. This will let the dev PM framework automatically manage the
the wakeup capability of the ipa IRQ and ensure that userspace requests
to enable/disable wakeup for the IPA via sysfs are respected.

Signed-off-by: Caleb Connolly <caleb.connolly@linaro.org>
Reviewed-by: Alex Elder <elder@linaro.org>
Link: https://lore.kernel.org/r/20230127202758.2913612-1-caleb.connolly@linaro.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: dsa: microchip: ptp: fix up PTP dependency

When NET_DSA_MICROCHIP_KSZ_COMMON is built-in but PTP is a loadable
module, the ksz_ptp support still causes a link failure:

ld.lld-16: error: undefined symbol: ptp_clock_index
>>> referenced by ksz_ptp.c
>>> drivers/net/dsa/microchip/ksz_ptp.o:(ksz_get_ts_info) in archive vmlinux.a

This can happen if NET_DSA_MICROCHIP_KSZ8863_SMI is enabled, or
even if none of the KSZ9477_I2C/KSZ_SPI/KSZ8863_SMI ones are active
but only the common module is.

The most straightforward way to address this is to move the
dependency to NET_DSA_MICROCHIP_KSZ_PTP itself, which can now
only be enabled if both PTP_1588_CLOCK support is reachable
from NET_DSA_MICROCHIP_KSZ_COMMON. Alternatively, one could make
NET_DSA_MICROCHIP_KSZ_COMMON a hidden Kconfig symbol and extend the
PTP_1588_CLOCK_OPTIONAL dependency to NET_DSA_MICROCHIP_KSZ8863_SMI as
well, but that is a little more fragile.

Fixes: eac1ea20261e ("net: dsa: microchip: ptp: add the posix clock support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20230130131808.1084796-1-arnd@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Documentation: networking: correct spelling

Correct spelling problems for Documentation/networking/ as reported
by codespell.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Cc: Jiri Pirko <jiri@nvidia.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: netdev@vger.kernel.org
Link: https://lore.kernel.org/r/20230129231053.20863-5-rdunlap@infradead.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

ibmvnic: Toggle between queue types in affinity mapping

Previously, ibmvnic IRQs were assigned to CPU numbers by assigning all
the IRQs for transmit queues then assigning all the IRQs for receive
queues. With multi-threaded processors, in a heavy RX or TX environment,
physical cores would either be overloaded or underutilized (due to the
IRQ assignment algorithm). This approach is sub-optimal because IRQs for
the same subprocess (RX or TX) would be bound to adjacent CPU numbers,
meaning they were more likely to be contending for the same core.

For example, in a system with 64 CPU's and 32 queues, the IRQs would
be bound to CPU in the following pattern:

IRQ type | CPU number
-----------------------
TX0 | 0-1
TX1 | 2-3
<etc>
RX0 | 32-33
RX1 | 34-35
<etc>

Observe that in SMT-8, the first 4 tx queues would be sharing the
same core.

A more optimal algorithm would balance the number RX and TX IRQ's across
the physical cores. Therefore, to increase performance, distribute RX and
TX IRQs across cores by alternating between assigning IRQs for RX and TX
queues to CPUs.
With a system with 64 CPUs and 32 queues, this results in the following
pattern:

IRQ type | CPU number
-----------------------
TX0 | 0-1
RX0 | 2-3
TX1 | 4-5
RX1 | 6-7
<etc>

Observe that in SMT-8, there is equal distribution of RX and TX IRQs
per core. In the above case, each core handles 2 TX and 2 RX IRQ's.

Signed-off-by: Nick Child <nnac123@linux.ibm.com>
Reviewed-by: Haren Myneni <haren@linux.ibm.com>
Link: https://lore.kernel.org/r/20230127214358.318152-1-nnac123@linux.ibm.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'add-support-for-the-the-vsc7512-internal-copper-phys'

Colin Foster says:

====================
add support for the the vsc7512 internal copper phys

This patch series is a continuation to add support for the VSC7512:
https://patchwork.kernel.org/project/netdevbpf/list/?series=674168&state=*

That series added the framework and initial functionality for the
VSC7512 chip. Several of these patches grew during the initial
development of the framework, which is why v1 will include changelogs.
It was during v9 of that original MFD patch set that these were dropped.

With that out of the way, the VSC7512 is mainly a subset of the VSC7514
chip. The 7512 lacks an internal MIPS processor, but otherwise many of
the register definitions are identical. That is why several of these
patches are simply to expose common resources from
drivers/net/ethernet/mscc/*.

This patch only adds support for the first four ports (swp0-swp3). The
remaining ports require more significant changes to the felix driver,
and will be handled in the future.
====================

Link: https://lore.kernel.org/r/20230127193559.1001051-1-colin.foster@in-advantage.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mfd: ocelot: add external ocelot switch control

Utilize the existing ocelot MFD interface to add switch functionality to
the Microsemi VSC7512 chip.

Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Acked-for-MFD-by: Lee Jones <lee@kernel.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # regression
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: ocelot: add external ocelot switch control

Add control of an external VSC7512 chip.

Currently the four copper phy ports are fully functional. Communication to
external phys is also functional, but the SGMII / QSGMII interfaces are
currently non-functional.

Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # regression
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: mfd: ocelot: add ethernet-switch hardware support

The main purpose of the Ocelot chips are the Ethernet switching
functionalities. Document the support for these features.

Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # regression
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: mscc,vsc7514-switch: add dsa binding for the vsc7512

The VSC7511, VSC7512, VSC7513 and VSC7514 all have the ability to be
controlled either internally by a memory-mapped CPU, or externally via
interfaces like SPI and PCIe. The internal CPU of the VSC7511 and 7512
don't have the resources to run Linux, so must be controlled via these
external interfaces in a DSA configuration.

Add mscc,vsc7512-switch compatible string to indicate that the chips are
being controlled externally in a DSA configuration.

Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # regression
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mfd: ocelot: prepend resource size macros to be 32-bit

The *_RES_SIZE macros are initally <= 0x100. Future resource sizes will be
upwards of 0x200000 in size.

To keep things clean, fully align the RES_SIZE macros to 32-bit to do
nothing more than make the code more consistent.

Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Acked-for-MFD-by: Lee Jones <lee@kernel.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # regression
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: felix: add functionality when not all ports are supported

When the Felix driver would probe the ports and verify functionality, it
would fail if it hit single port mode that wasn't supported by the driver.

The initial case for the VSC7512 driver will have physical ports that
exist, but aren't supported by the driver implementation. Add the
OCELOT_PORT_MODE_NONE macro to handle this scenario, and allow the Felix
driver to continue with all the ports that are currently functional.

Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # regression
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: felix: add support for MFD configurations

The architecture around the VSC7512 differs from existing felix drivers. In
order to add support for all the chip's features (pinctrl, MDIO, gpio) the
device had to be laid out as a multi-function device (MFD).

One difference between an MFD and a standard platform device is that the
regmaps are allocated to the parent device before the child devices are
probed. As such, there is no need for felix to initialize new regmaps in
these configurations, they can simply be requested from the parent device.

Add support for MFD configurations by performing this request from the
parent device.

Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # regression
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: felix: add configurable device quirks

The define FELIX_MAC_QUIRKS was used directly in the felix.c shared driver.
Other devices (VSC7512 for example) don't require the same quirks, so they
need to be configured on a per-device basis.

Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # regression
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mscc: ocelot: expose vsc7514_regmap definition

The VSC7514 target regmap is identical for ones shared with similar
hardware, specifically the VSC7512. Share this resource, and change the
name to match the pattern of other exported resources.

Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # regression
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mscc: ocelot: expose ocelot_reset routine

Resetting the switch core is the same whether it is done internally or
externally. Move this routine to the ocelot library so it can be used by
other drivers.

Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # regression
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mscc: ocelot: expose vcap_props structure

The vcap_props structure is common to other devices, specifically the
VSC7512 chip that can only be controlled externally. Export this structure
so it doesn't need to be recreated.

Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # regression
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mscc: ocelot: expose regfield definition to be used by other drivers

The ocelot_regfields struct is common between several different chips, some
of which can only be controlled externally. Export this structure so it
doesn't have to be duplicated in these other drivers.

Rename the structure as well, to follow the conventions of other shared
resources.

Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # regression
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mscc: ocelot: expose ocelot wm functions

Expose ocelot_wm functions so they can be shared with other drivers.

Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # regression
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: motorcomm: change the phy id of yt8521 and yt8531s to lowercase

The phy id is usually defined in lower case.

Signed-off-by: Frank Sae <Frank.Sae@motor-comm.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230128063558.5850-2-Frank.Sae@motor-comm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: fix the spelling problem of Sentinel

CHECK: 'sentinal' may be misspelled - perhaps 'sentinel'?

Signed-off-by: Frank Sae <Frank.Sae@motor-comm.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230128063558.5850-1-Frank.Sae@motor-comm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sh: checksum: add missing linux/uaccess.h include

SuperH does not include uaccess.h, even tho it calls access_ok().

Fixes: 68f4eae781dd ("net: checksum: drop the linux/uaccess.h include")
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230128073108.1603095-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: b44: Remove the unused function __b44_cam_read()

The function __b44_cam_read() is defined in the b44.c file, but not called
elsewhere, so remove this unused function.

drivers/net/ethernet/broadcom/b44.c:199:20: warning: unused function '__b44_cam_read'.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=3858
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230128090413.79824-1-jiapeng.chong@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'devlink-next'

Jakub Kicinski says:

====================
devlink: fix reload notifications and remove features

First two patches adjust notifications during devlink reload.
The last patch removes no longer needed devlink features.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>

devlink: remove devlink features

Devlink features were introduced to disallow devlink reload calls of
userspace before the devlink was fully initialized. The reason for this
workaround was the fact that devlink reload was originally called
without devlink instance lock held.

However, with recent changes that converted devlink reload to be
performed under devlink instance lock, this is redundant so remove
devlink features entirely.

Note that mlx5 used this to enable devlink reload conditionally only
when device didn't act as multi port slave. Move the multi port check
into mlx5_devlink_reload_down() callback alongside with the other
checks preventing the device from reload in certain states.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: send objects notifications during devlink reload

Currently, the notifications are only sent for params. People who
introduced other objects forgot to add the reload notifications here.

To make sure all notifications happen according to existing comment,
benefit from existence of devlink_notify_register/unregister() helpers
and use them in reload code.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: move devlink reload notifications back in between _down() and _up() calls

This effectively reverts commit 05a7f4a8dff1 ("devlink: Break parameter
notification sequence to be before/after unload/load driver").

Cited commit resolved a problem in mlx5 params implementation,
when param notification code accessed memory previously freed
during reload.

Now, when the params can be registered and unregistered when devlink
instance is registered, mlx5 code unregisters the problematic param
during devlink reload. The fix is therefore no longer needed.

Current behavior is a it problematic, as it sends DEL notifications even
in potential case when reload_down() call fails which might confuse
userspace notifications listener.

So move the reload notifications back where they were originally in
between reload_down() and reload_up() calls.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'sparx5-ES2-VCAP-support'

Steen Hegelund says:

====================
Adding Sparx5 ES2 VCAP support

This provides the Egress Stage 2 (ES2) VCAP (Versatile Content-Aware
Processor) support for the Sparx5 platform.

The ES2 VCAP is an Egress Access Control VCAP that uses frame keyfields and
previously classified keyfields to apply e.g. policing, trapping or
mirroring to frames.

The ES2 VCAP has 2 lookups and they are accessible with a TC chain id:

- chain 20000000: ES2 Lookup 0
- chain 20100000: ES2 Lookup 1

As the other Sparx5 VCAPs the ES2 VCAP has its own lookup/port keyset
configuration that decides which keys will be used for matching on which
traffic type.

The ES2 VCAP has these traffic classifications:

- IPv4 frames
- IPv6 frames
- Other frames

The ES2 VCAP can match on an ISDX key (Ingress Service Index) as one of the
frame metadata keyfields. The IS0 VCAP can update this key using its
actions, and this allows a IS0 VCAP rule to be linked to an ES2 rule.

This is similar to how the IS0 VCAP and the IS2 VCAP use the PAG
(Policy Association Group) keyfield to link rules.

From user space this is exposed via "chain offsets", so an IS0 rule with a
"goto chain 20000015" action will use an ISDX key value of 15 to link to a
rule in the ES2 VCAP attached to the same chain id.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: microchip: sparx5: Add KUNIT tests for enabling/disabling chains

This enhances the KUNIT test of the VCAP API with tests of the chaining
functionality.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: microchip: sparx5: Add TC support for the ES2 VCAP

This enables the TC command to use the Sparx5 ES2 VCAP, and provides a new
ES2 ethertype table and handling of rule links between IS0 and ES2.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: microchip: sparx5: Add ingress information to VCAP instance

This allows the check of the goto action to be specific to the ingress and
egress VCAP instances.

The debugfs support is also updated to show this information.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: microchip: sparx5: Add ES2 VCAP keyset configuration for Sparx5

This adds the ES2 VCAP port keyset configuration for Sparx5 and also
updates the debugFS support to show the keyset configuration and the egress
port mask.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: microchip: sparx5: Add ES2 VCAP model and updated KUNIT VCAP model

This provides the VCAP model for the Sparx5 ES2 (Egress Stage 2) VCAP.

This VCAP provides tagging and remarking functionality

This also renames a VCAP keyfield: VCAP_KF_MIRROR_ENA becomes
VCAP_KF_MIRROR_PROBE, as the first name was caused by a mistake in the
model transformation.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: microchip: sparx5: Improve error message when parsing CVLAN filter

This improves the error message when a TC filter with CVLAN tag is used and
the selected VCAP instance does not support this.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: microchip: sparx5: Improve the IP frame key match for IPv6 frames

This ensures that it will be possible for a VCAP rule to distinguish IPv6
frames from non-IP frames, as the IS0 keyset usually selected for the IPv6
traffic class in (7TUPLE) does not offer a key that specifies IPv6
directly: only non-IPv4.

The IP_SNAP key ensures that we select (at least) IP frames.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: microchip: sparx5: Add support for getting keysets without a type id

When there is only one keyset available for a certain VCAP rule size, the
particular keyset does not need a type id when encoded in the VCAP
Hardware.

This provides support for getting a keyset from a rule, when this is the
case: only one keyset fits this rule size.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'batadv-next-pullrequest-20230127' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
This feature/cleanup patchset includes the following patches:

- bump version strings, by Simon Wunderlich

- drop prandom.h includes, by Sven Eckelmann

- fix mailing list address, by Sven Eckelmann

- multicast feature preparation, by Linus Lüssing (2 patches)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: bcmgenet: Add a check for oversized packets

Occasionnaly we may get oversized packets from the hardware which
exceed the nomimal 2KiB buffer size we allocate SKBs with. Add an early
check which drops the packet to avoid invoking skb_over_panic() and move
on to processing the next packet.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fec: convert to gpio descriptor

The driver can be trivially converted, as it only triggers the gpio
pin briefly to do a reset, and it already only supports DT.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mdio: mux-meson-g12a: use __clk_is_enabled to simplify the code

By using __clk_is_enabled () we can avoid defining an own variable for
tracking whether enable counter is zero.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: netlink: recommend policy range validation

For large ranges (outside of s16) the documentation currently
recommends open-coding the validation, but it's better to use
the NLA_POLICY_FULL_RANGE() or NLA_POLICY_FULL_RANGE_SIGNED()
policy validation instead; recommend that.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20230127084506.09f280619d64.I5dece85f06efa8ab0f474ca77df9e26d3553d4ab@changeid
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'for-netdev' of https://git./linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
bpf-next 2023-01-28

We've added 124 non-merge commits during the last 22 day(s) which contain
a total of 124 files changed, 6386 insertions(+), 1827 deletions(-).

The main changes are:

1) Implement XDP hints via kfuncs with initial support for RX hash and
   timestamp metadata kfuncs, from Stanislav Fomichev and
   Toke Høiland-Jørgensen.
   Measurements on overhead: https://lore.kernel.org/bpf/875yellcx6.fsf@toke.dk

2) Extend libbpf's bpf_tracing.h support for tracing arguments of
   kprobes/uprobes and syscall as a special case, from Andrii Nakryiko.

3) Significantly reduce the search time for module symbols by livepatch
   and BPF, from Jiri Olsa and Zhen Lei.

4) Enable cpumasks to be used as kptrs, which is useful for tracing
   programs tracking which tasks end up running on which CPUs
   in different time intervals, from David Vernet.

5) Fix several issues in the dynptr processing such as stack slot liveness
   propagation, missing checks for PTR_TO_STACK variable offset, etc,
   from Kumar Kartikeya Dwivedi.

6) Various performance improvements, fixes, and introduction of more
   than just one XDP program to XSK selftests, from Magnus Karlsson.

7) Big batch to BPF samples to reduce deprecated functionality,
   from Daniel T. Lee.

8) Enable struct_ops programs to be sleepable in verifier,
   from David Vernet.

9) Reduce pr_warn() noise on BTF mismatches when they are expected under
   the CONFIG_MODULE_ALLOW_BTF_MISMATCH config anyway, from Connor O'Brien.

10) Describe modulo and division by zero behavior of the BPF runtime
    in BPF's instruction specification document, from Dave Thaler.

11) Several improvements to libbpf API documentation in libbpf.h,
    from Grant Seltzer.

12) Improve resolve_btfids header dependencies related to subcmd and add
    proper support for HOSTCC, from Ian Rogers.

13) Add ipip6 and ip6ip decapsulation support for bpf_skb_adjust_room()
    helper along with BPF selftests, from Ziyang Xuan.

14) Simplify the parsing logic of structure parameters for BPF trampoline
    in the x86-64 JIT compiler, from Pu Lehui.

15) Get BTF working for kernels with CONFIG_RUST enabled by excluding
    Rust compilation units with pahole, from Martin Rodriguez Reboredo.

16) Get bpf_setsockopt() working for kTLS on top of TCP sockets,
    from Kui-Feng Lee.

17) Disable stack protection for BPF objects in bpftool given BPF backends
    don't support it, from Holger Hoffstätte.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (124 commits)
  selftest/bpf: Make crashes more debuggable in test_progs
  libbpf: Add documentation to map pinning API functions
  libbpf: Fix malformed documentation formatting
  selftests/bpf: Properly enable hwtstamp in xdp_hw_metadata
  selftests/bpf: Calls bpf_setsockopt() on a ktls enabled socket.
  bpf: Check the protocol of a sock to agree the calls to bpf_setsockopt().
  bpf/selftests: Verify struct_ops prog sleepable behavior
  bpf: Pass const struct bpf_prog * to .check_member
  libbpf: Support sleepable struct_ops.s section
  bpf: Allow BPF_PROG_TYPE_STRUCT_OPS programs to be sleepable
  selftests/bpf: Fix vmtest static compilation error
  tools/resolve_btfids: Alter how HOSTCC is forced
  tools/resolve_btfids: Install subcmd headers
  bpf/docs: Document the nocast aliasing behavior of ___init
  bpf/docs: Document how nested trusted fields may be defined
  bpf/docs: Document cpumask kfuncs in a new file
  selftests/bpf: Add selftest suite for cpumask kfuncs
  selftests/bpf: Add nested trust selftests suite
  bpf: Enable cpumasks to be queried and used as kptrs
  bpf: Disallow NULLable pointers for trusted kfuncs
  ...
====================

Link: https://lore.kernel.org/r/20230128004827.21371-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netpoll: Remove 4s sleep during carrier detection

This patch removes the msleep(4s) during netpoll_setup() if the carrier
appears instantly.

Here are some scenarios where this workaround is counter-productive in
modern ages:

Servers which have BMC communicating over NC-SI via the same NIC as gets
used for netconsole. BMC will keep the PHY up, hence the carrier
appearing instantly.

The link is fibre, SERDES getting sync could happen within 0.1Hz, and
the carrier also appears instantly.

Other than that, if a driver is reporting instant carrier and then
losing it, this is probably a driver bug.

Reported-by: Michael van der Westhuizen <rmikey@meta.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20230125185230.3574681-1-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge git://git./linux/kernel/git/netdev/net

Conflicts:

drivers/net/ethernet/intel/ice/ice_main.c
  418e53401e47 ("ice: move devlink port creation/deletion")
  643ef23bd9dd ("ice: Introduce local var for readability")
https://lore.kernel.org/all/20230127124025.0dacef40@canb.auug.org.au/
https://lore.kernel.org/all/20230124005714.3996270-1-anthony.l.nguyen@intel.com/

drivers/net/ethernet/engleder/tsnep_main.c
  3d53aaef4332 ("tsnep: Fix TX queue stop/wake for multiple queues")
  25faa6a4c5ca ("tsnep: Replace TX spin_lock with __netif_tx_lock")
https://lore.kernel.org/all/20230127123604.36bb3e99@canb.auug.org.au/

net/netfilter/nf_conntrack_proto_sctp.c
  13bd9b31a969 ("Revert "netfilter: conntrack: add sctp DATA_SENT state"")
  a44b7651489f ("netfilter: conntrack: unify established states for SCTP paths")
  f71cb8f45d09 ("netfilter: conntrack: sctp: use nf log infrastructure for invalid packets")
https://lore.kernel.org/all/20230127125052.674281f9@canb.auug.org.au/
https://lore.kernel.org/all/d36076f3-6add-a442-6d4b-ead9f7ffff86@tessares.net/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: mt7530: fix tristate and help description

Fix description for tristate and help sections which include inaccurate
information.

Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Link: https://lore.kernel.org/r/20230126190110.9124-1-arinc.unal@arinc9.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-xdp-execute-xdp_do_flush-before-napi_complete_done'

Magnus Karlsson says:

====================
net: xdp: execute xdp_do_flush() before napi_complete_done()

Make sure that xdp_do_flush() is always executed before
napi_complete_done(). This is important for two reasons. First, a
redirect to an XSKMAP assumes that a call to xdp_do_redirect() from
napi context X on CPU Y will be followed by a xdp_do_flush() from the
same napi context and CPU. This is not guaranteed if the
napi_complete_done() is executed before xdp_do_flush(), as it tells
the napi logic that it is fine to schedule napi context X on another
CPU. Details from a production system triggering this bug using the
veth driver can be found in [1].

The second reason is that the XDP_REDIRECT logic in itself relies on
being inside a single NAPI instance through to the xdp_do_flush() call
for RCU protection of all in-kernel data structures. Details can be
found in [2].

The drivers have only been compile-tested since I do not own any of
the HW below. So if you are a maintainer, it would be great if you
could take a quick look to make sure I did not mess something up.

Note that these were the drivers I found that violated the ordering by
running a simple script and manually checking the ones that came up as
potential offenders. But the script was not perfect in any way. There
might still be offenders out there, since the script can generate
false negatives.

[1] https://lore.kernel.org/r/20221220185903.1105011-1-sbohrer@cloudflare.com
[2] https://lore.kernel.org/all/20210624160609.292325-1-toke@redhat.com/
====================

Link: https://lore.kernel.org/r/20230125074901.2737-1-magnus.karlsson@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dpaa2-eth: execute xdp_do_flush() before napi_complete_done()

Make sure that xdp_do_flush() is always executed before
napi_complete_done(). This is important for two reasons. First, a
redirect to an XSKMAP assumes that a call to xdp_do_redirect() from
napi context X on CPU Y will be followed by a xdp_do_flush() from the
same napi context and CPU. This is not guaranteed if the
napi_complete_done() is executed before xdp_do_flush(), as it tells
the napi logic that it is fine to schedule napi context X on another
CPU. Details from a production system triggering this bug using the
veth driver can be found following the first link below.

The second reason is that the XDP_REDIRECT logic in itself relies on
being inside a single NAPI instance through to the xdp_do_flush() call
for RCU protection of all in-kernel data structures. Details can be
found in the second link below.

Fixes: d678be1dc1ec ("dpaa2-eth: add XDP_REDIRECT support")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20221220185903.1105011-1-sbohrer@cloudflare.com
Link: https://lore.kernel.org/all/20210624160609.292325-1-toke@redhat.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dpaa_eth: execute xdp_do_flush() before napi_complete_done()

Make sure that xdp_do_flush() is always executed before
napi_complete_done(). This is important for two reasons. First, a
redirect to an XSKMAP assumes that a call to xdp_do_redirect() from
napi context X on CPU Y will be followed by a xdp_do_flush() from the
same napi context and CPU. This is not guaranteed if the
napi_complete_done() is executed before xdp_do_flush(), as it tells
the napi logic that it is fine to schedule napi context X on another
CPU. Details from a production system triggering this bug using the
veth driver can be found following the first link below.

The second reason is that the XDP_REDIRECT logic in itself relies on
being inside a single NAPI instance through to the xdp_do_flush() call
for RCU protection of all in-kernel data structures. Details can be
found in the second link below.

Fixes: a1e031ffb422 ("dpaa_eth: add XDP_REDIRECT support")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20221220185903.1105011-1-sbohrer@cloudflare.com
Link: https://lore.kernel.org/all/20210624160609.292325-1-toke@redhat.com/
Acked-by: Camelia Groza <camelia.groza@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

virtio-net: execute xdp_do_flush() before napi_complete_done()

Make sure that xdp_do_flush() is always executed before
napi_complete_done(). This is important for two reasons. First, a
redirect to an XSKMAP assumes that a call to xdp_do_redirect() from
napi context X on CPU Y will be followed by a xdp_do_flush() from the
same napi context and CPU. This is not guaranteed if the
napi_complete_done() is executed before xdp_do_flush(), as it tells
the napi logic that it is fine to schedule napi context X on another
CPU. Details from a production system triggering this bug using the
veth driver can be found following the first link below.

The second reason is that the XDP_REDIRECT logic in itself relies on
being inside a single NAPI instance through to the xdp_do_flush() call
for RCU protection of all in-kernel data structures. Details can be
found in the second link below.

Fixes: 186b3c998c50 ("virtio-net: support XDP_REDIRECT")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20221220185903.1105011-1-sbohrer@cloudflare.com
Link: https://lore.kernel.org/all/20210624160609.292325-1-toke@redhat.com/
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

lan966x: execute xdp_do_flush() before napi_complete_done()

Make sure that xdp_do_flush() is always executed before
napi_complete_done(). This is important for two reasons. First, a
redirect to an XSKMAP assumes that a call to xdp_do_redirect() from
napi context X on CPU Y will be followed by a xdp_do_flush() from the
same napi context and CPU. This is not guaranteed if the
napi_complete_done() is executed before xdp_do_flush(), as it tells
the napi logic that it is fine to schedule napi context X on another
CPU. Details from a production system triggering this bug using the
veth driver can be found following the first link below.

The second reason is that the XDP_REDIRECT logic in itself relies on
being inside a single NAPI instance through to the xdp_do_flush() call
for RCU protection of all in-kernel data structures. Details can be
found in the second link below.

Fixes: a825b611c7c1 ("net: lan966x: Add support for XDP_REDIRECT")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Link: https://lore.kernel.org/r/20221220185903.1105011-1-sbohrer@cloudflare.com
Link: https://lore.kernel.org/all/20210624160609.292325-1-toke@redhat.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

qede: execute xdp_do_flush() before napi_complete_done()

Make sure that xdp_do_flush() is always executed before
napi_complete_done(). This is important for two reasons. First, a
redirect to an XSKMAP assumes that a call to xdp_do_redirect() from
napi context X on CPU Y will be followed by a xdp_do_flush() from the
same napi context and CPU. This is not guaranteed if the
napi_complete_done() is executed before xdp_do_flush(), as it tells
the napi logic that it is fine to schedule napi context X on another
CPU. Details from a production system triggering this bug using the
veth driver can be found following the first link below.

The second reason is that the XDP_REDIRECT logic in itself relies on
being inside a single NAPI instance through to the xdp_do_flush() call
for RCU protection of all in-kernel data structures. Details can be
found in the second link below.

Fixes: d1b25b79e162b ("qede: add .ndo_xdp_xmit() and XDP_REDIRECT support")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20221220185903.1105011-1-sbohrer@cloudflare.com
Link: https://lore.kernel.org/all/20210624160609.292325-1-toke@redhat.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftest/bpf: Make crashes more debuggable in test_progs

Reset stdio before printing verbose log of the SIGSEGV'ed test.
Otherwise, it's hard to understand what's going on in the cases like [0].

With the following patch applied:

--- a/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c
@@ -392,6 +392,11 @@ void test_xdp_metadata(void)
       "generate freplace packet"))
goto out;

+
+ ASSERT_EQ(1, 2, "oops");
+ int *x = 0;
+ *x = 1; /* die */
+
while (!retries--) {
if (bpf_obj2->bss->called)
break;

Before:

#281     xdp_metadata:FAIL
Caught signal #11!
Stack trace:
./test_progs(crash_handler+0x1f)[0x55c919d98bcf]
/lib/x86_64-linux-gnu/libc.so.6(+0x3bf90)[0x7f36aea5df90]
./test_progs(test_xdp_metadata+0x1db0)[0x55c919d8c6d0]
./test_progs(+0x23b438)[0x55c919d9a438]
./test_progs(main+0x534)[0x55c919d99454]
/lib/x86_64-linux-gnu/libc.so.6(+0x2718a)[0x7f36aea4918a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7f36aea49245]
./test_progs(_start+0x21)[0x55c919b82ef1]

After:

test_xdp_metadata:PASS:ip netns add xdp_metadata 0 nsec
open_netns:PASS:malloc token 0 nsec
open_netns:PASS:open /proc/self/ns/net 0 nsec
open_netns:PASS:open netns fd 0 nsec
open_netns:PASS:setns 0 nsec
..
test_xdp_metadata:FAIL:oops unexpected oops: actual 1 != expected 2
#281     xdp_metadata:FAIL
Caught signal #11!
Stack trace:
./test_progs(crash_handler+0x1f)[0x562714a76bcf]
/lib/x86_64-linux-gnu/libc.so.6(+0x3bf90)[0x7fa663f9cf90]
./test_progs(test_xdp_metadata+0x1db0)[0x562714a6a6d0]
./test_progs(+0x23b438)[0x562714a78438]
./test_progs(main+0x534)[0x562714a77454]
/lib/x86_64-linux-gnu/libc.so.6(+0x2718a)[0x7fa663f8818a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7fa663f88245]
./test_progs(_start+0x21)[0x562714860ef1]

0: https://github.com/kernel-patches/bpf/actions/runs/4019879316/jobs/6907358876

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230127215705.1254316-1-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

libbpf: Add documentation to map pinning API functions

This adds documentation for the following API functions:

- bpf_map__set_pin_path()
- bpf_map__pin_path()
- bpf_map__is_pinned()
- bpf_map__pin()
- bpf_map__unpin()
- bpf_object__pin_maps()
- bpf_object__unpin_maps()

Signed-off-by: Grant Seltzer <grantseltzer@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230126024225.520685-1-grantseltzer@gmail.com

libbpf: Fix malformed documentation formatting

This fixes the doxygen format documentation above the
user_ring_buffer__* APIs. There has to be a newline
before the @brief, otherwise doxygen won't render them
for libbpf.readthedocs.org.

Signed-off-by: Grant Seltzer <grantseltzer@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230126024749.522278-1-grantseltzer@gmail.com

Merge branch 'devlink-parama-cleanup'

Jiri Pirko says:

====================
devlink: Cleanup params usage

This patchset takes care of small cleanup of devlink params usage.
Some of the patches (first 2/3) are cosmetic, but I would like to
point couple of interesting ones:

Patch 9 is the main one of this set and introduces devlink instance
locking for params, similar to other devlink objects. That allows params
to be registered/unregistered when devlink instance is registered.

Patches 10-12 change mlx5 code to register non-driverinit params in the
code they are related to, and thanks to patch 8 this might be when
devlink instance is registered - for example during devlink reload.

---
v1->v2:
- Just small fix in the last patch
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Move eswitch port metadata devlink param to flow eswitch code

Move the param registration and handling code into the eswitch offloads
code as they are related to each other. No point in having the
devlink param registration done in separate file.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Move flow steering devlink param to flow steering code

Move the param registration and handling code into the flow steering
code as they are related to each other. No point in having the
devlink param registration done in separate file.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Move fw reset devlink param to fw reset code

Move the param registration and handling code into the fw reset code
as they are related to each other. No point in having the devlink param
registration done in separate file.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: protect devlink param list by instance lock

Commit 1d18bb1a4ddd ("devlink: allow registering parameters after
the instance") as the subject implies introduced possibility to register
devlink params even for already registered devlink instance. This is a
bit problematic, as the consistency or params list was originally
secured by the fact it is static during devlink lifetime. So in order to
protect the params list, take devlink instance lock during the params
operations. Introduce unlocked function variants and use them in drivers
in locked context. Put lock assertions to appropriate places.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: put couple of WARN_ONs in devlink_param_driverinit_value_get()

Put couple of WARN_ONs in devlink_param_driverinit_value_get() function
to clearly indicate, that it is a driver bug if used without reload
support or for non-driverinit param.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: make devlink_param_driverinit_value_set() return void

devlink_param_driverinit_value_set() currently returns int with possible
error, but no user is checking it anyway. The only reason for a fail is
a driver bug. So convert the function to return void and put WARN_ONs
on error paths.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: remove pointless call to devlink_param_driverinit_value_set()

devlink_param_driverinit_value_set() call makes sense only for "
driverinit" params. However here, the param is "runtime".
devlink_param_driverinit_value_set() returns -EOPNOTSUPP in such case
and does not do anything. So remove the pointless call to
devlink_param_driverinit_value_set() entirely.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ice: remove pointless calls to devlink_param_driverinit_value_set()

devlink_param_driverinit_value_set() call makes sense only for
"driverinit" params. However here, both params are "runtime".
devlink_param_driverinit_value_set() returns -EOPNOTSUPP in such case
and does not do anything. So remove the pointless calls to
devlink_param_driverinit_value_set() entirely.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: don't work with possible NULL pointer in devlink_param_unregister()

There is a WARN_ON checking the param_item for being NULL when the param
is not inserted in the list. That indicates a driver BUG. Instead of
continuing to work with NULL pointer with its consequences, return.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: make devlink_param_register/unregister static

There is no user outside the devlink code, so remove the export and make
the functions static. Move them before callers to avoid forward
declarations.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Covert devlink params registration to use devlink_params_register/unregister()

Since mlx5 is the only user of devlink API to register/unregister a
single param, convert it to use array registration function allowing to
simplify the devlink API by removing the single param registration
functions.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Change devlink param register/unregister function names

The functions are registering and unregistering devlink params, so
change the names accordingly.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ethtool-netlink-next'

Jakub Kicinski says:

====================
ethtool: netlink: handle SET intro/outro in the common code

Factor out the boilerplate code from SET handlers to common code.

I volunteered to refactor the extack in GET in a conversation
with Vladimir but I gave up.

The handling of failures during dump in GET handlers is a bit
unclear to me. Some code uses presence of info as indication
of dump and tries to avoid reporting errors altogether
(including extack messages).

There's also the question of whether we should have a validation
callback (similar to .set_validate here) for GET. It looks like
.parse_request was expected to perform the validation. It takes
the extack and tb directly, not via info:

int (*parse_request)(struct ethnl_req_info *req_info,
     struct nlattr **tb,
     struct netlink_ext_ack *extack);

int (*prepare_data)(const struct ethnl_req_info *req_info,
    struct ethnl_reply_data *reply_data,
    struct genl_info *info);

so no crashes dereferencing info possible.

But .parse_request doesn't run under rtnl nor ethnl_ops_begin().
As a result some implementations defer validation until .prepare_data
where all the locks are held and they can call out to the driver.

All this makes me think that maybe we should refactor GET in the
same direction I'm refactoring SET. Split .prepare_data, take
more locks in the core, and add a validation helper which would
take extack directly:

    - ret = ops->prepare_data(req_info, reply_data, info);
    + ret = ops->prepare_data_validate(req_info, reply_data, attrs, extack);
    + if (ret < 1) // if 0 -> skip for dump; -EOPNOTSUPP in do
    +   goto err1;
    +
    + ret = ethnl_ops_begin(dev);
    + if (ret)
    +   goto err1;
    +
    + ret = ops->prepare_data(req_info, reply_data); // no extack
    + ethnl_ops_complete(dev);

I'll file that away as a TODO for posterity / older me.

v2:
- invert checks for coalescing to avoid error code changes
- rebase and convert MM as well

v1: https://lore.kernel.org/all/20230121054430.642280-1-kuba@kernel.org/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ethtool: netlink: convert commands to common SET

Convert all SET commands where new common code is applicable.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethtool: netlink: handle SET intro/outro in the common code

Most ethtool SET callbacks follow the same general structure.

  ethnl_parse_header_dev_get()
  rtnl_lock()
  ethnl_ops_begin()

  ... do stuff ...

  ethtool_notify()
  ethnl_ops_complete()
  rtnl_unlock()
  ethnl_parse_header_dev_put()

This leads to a lot of copy / pasted code an bugs when people
mis-handle the error path.

Add a generic implementation of this pattern with a .set callback
in struct ethnl_request_ops called to "do stuff".

Also add an optional .set_validate which is called before
ethnl_ops_begin() -- a lot of implementations do basic request
capability / sanity checking at that point.

Because we want to avoid generating the notification when
no change happened - adopt a slightly hairy return values:
- 0 means nothing to do (no notification)
- 1 means done / continue
- negative error codes on error

Reuse .hdr_attr from struct ethnl_request_ops, GET and SET
use the same attr spaces in all cases.

Convert pause as an example (and to avoid unused function warnings).

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: qca8k: convert to regmap read/write API

Convert qca8k to regmap read/write bulk API. The mgmt eth can write up
to 32 bytes of data at times. Currently we use a custom function to do
it but regmap now supports declaration of read/write bulk even without a
bus.

Drop the custom function and rework the regmap function to this new
implementation.

Rework the qca8k_fdb_read/write function to use the new
regmap_bulk_read/write as the old qca8k_bulk_read/write are now dropped.

Cc: Mark Brown <broonie@kernel.org>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: qca8k: add QCA8K_ATU_TABLE_SIZE define for fdb access

Add and use QCA8K_ATU_TABLE_SIZE instead of hardcoding the ATU size with
a pure number and using sizeof on the array.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-skbuff-includes'

Jakub Kicinski says:

====================
net: skbuff: clean up unnecessary includes

skbuff.h is included in a significant portion of the tree.
Clean up unused dependencies to speed up builds.

This set only takes care of the most obvious cases.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: remove unnecessary includes from net/flow.h

This file is included by a lot of other commonly included
headers, it doesn't need socket.h or flow_dissector.h.

This reduces the size of this file after pre-processing
from 28165 to 4663.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: skbuff: drop the linux/hrtimer.h include

linux/hrtimer.h include was added because apparently it used
to contain ktime related code. This is no longer the case
and we include linux/time.h explicitly.

Sadly this change is currently a noop because linux/dma-mapping.h
and net/page_pool.h pull in half of the universe.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: skbuff: drop the linux/splice.h include

splice.h is included since commit a60e3cc7c929 ("net: make
skb_splice_bits more configureable") but really even then
all we needed is some forward declarations. Most of that
code is now gone, and remaining has fwd declarations.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add missing includes of linux/splice.h

Number of files depend on linux/splice.h getting included
by linux/skbuff.h which soon will no longer be the case.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: skbuff: drop the linux/sched.h include

linux/sched.h was added for skb_mstamp_* (all the way back
before linux/sched.h got split and linux/sched/clock.h created).
We don't need it in skbuff.h any more.

Sadly this change is currently a noop because linux/dma-mapping.h
and net/page_pool.h pull in half of the universe.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: skbuff: drop the linux/sched/clock.h include

It used to be necessary for skb_mstamp_* static inlines,
but those are gone since we moved to usec timestamps in
TCP, in 2017.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add missing includes of linux/sched/clock.h

Number of files depend on linux/sched/clock.h getting included
by linux/skbuff.h which soon will no longer be the case.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: skbuff: drop the linux/textsearch.h include

This include was added for skb_find_text() but all we need there
is a forward declaration of struct ts_config.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: checksum: drop the linux/uaccess.h include

net/checksum.h pulls in linux/uaccess.h which is large.

In the x86 header the include seems to not be needed at all.
ARM on the other hand does not include uaccess.h, even tho
it calls access_ok().

In the generic implementation guard the include of linux/uaccess.h
with the same condition as the code that needs it.

With this change pre-processed net/checksum.h shrinks on x86
from 30616 lines to just 1193.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: skbuff: drop the linux/net.h include

It appears nothing needs it. The kernel builds fine with this
include removed, building an otherwise empty source file with:

#include <linux/skbuff.h>
#ifdef _LINUX_NET_H
#error linux/net.h is back
#endif

works too (meaning net.h is not just pulled in indirectly).

This gives us a slight 0.5% reduction in the pre-processed size
of skbuff.h.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add missing includes of linux/net.h

linux/net.h will soon not be included by linux/skbuff.h.
Fix the cases where source files were depending on the implicit
include.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ipa-abstract-status'

Alex Elder says:

====================
net: ipa: abstract status parsing

Under some circumstances, IPA generates a "packet status" structure
that describes information about a packet.  This is used, for
example, when offload hardware detects an error in a packet, or
otherwise discovers a packet needs special handling.  In this case,
the status is delivered (along with the packet it describes) to a
"default" endpoint so that it can be handled by the AP.

Until now, the structure of this status information hasn't changed.
However, to support more than 32 endpoints, this structure required
some changes, such that some fields are rearranged in ways that are
tricky to represent using C code.

This series updates code related to the IPA status structure.  The
first patch uses a local variable to avoid recomputing a packet
length more than once.  The second stops using sizeof() to determine
the size of an IPA packet status structure.  Patches 3-5 extend the
definitions for values held in packet status fields.  Patch 6 does a
little general cleanup to make patch 7 simpler.  Patch 7 stops using
a C structure to represent packet status; instead, a new function
fetches values "by name" from a buffer containing such a structure.
The last patch updates this function so it also supports IPA v5.0+.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: add IPA v5.0 packet status support

Update ipa_status_extract() to support IPA v5.0 and beyond. Because
the format of the IPA packet status depends on the version, pass an
IPA pointer to the function.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: introduce generalized status decoder

Stop assuming the IPA packet status has a fixed format (defined by
a C structure). Instead, use a function to extract each field from
a block of data interpreted as an IPA packet status. Define an
enumerated type that identifies the fields that can be extracted.
The current function extracts fields based on the existing
ipa_status structure format (which is no longer used).

Define IPA_STATUS_RULE_MISS, to replace the calls to field_max() to
represent that condition; those depended on the knowing the width of
a filter or router rule in the IPA packet status structure.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>