OSDN Git Service

tomoyo/tomoyo-test1.git
13 months agoselftests: router_bridge_vlan: Set vlan_default_pvid 0 on the bridge
Petr Machata [Fri, 2 Jun 2023 16:20:12 +0000 (18:20 +0200)]
selftests: router_bridge_vlan: Set vlan_default_pvid 0 on the bridge

When everything is configured, VLAN membership on the bridge in this
selftest are as follows:

    # bridge vlan show
    port              vlan-id
    swp2              1 PVID Egress Untagged
                      555
    br1               1 Egress Untagged
                      555 PVID Egress Untagged

Note that it is possible for untagged traffic to just flow through as VLAN
1, instead of using VLAN 555 as intended by the test. This configuration
seems too close to "works by accident", and it would be better to just shut
out VLAN 1 altogether.

To that end, configure vlan_default_pvid of 0:

    # bridge vlan show
    port              vlan-id
    swp2              555
    br1               555 PVID Egress Untagged

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agoselftests: router_bridge_vlan: Add a diagram
Petr Machata [Fri, 2 Jun 2023 16:20:11 +0000 (18:20 +0200)]
selftests: router_bridge_vlan: Add a diagram

Add a topology diagram to this selftest to make the configuration easier to
understand.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agoselftests: mlxsw: egress_vid_classification: Fix the diagram
Petr Machata [Fri, 2 Jun 2023 16:20:10 +0000 (18:20 +0200)]
selftests: mlxsw: egress_vid_classification: Fix the diagram

The topology diagram implies that $swp1 and $swp2 are members of the bridge
br0, when in fact only their uppers, $swp1.10 and $swp2.10 are. Adjust the
diagram.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agoselftests: mlxsw: ingress_rif_conf_1d: Fix the diagram
Petr Machata [Fri, 2 Jun 2023 16:20:09 +0000 (18:20 +0200)]
selftests: mlxsw: ingress_rif_conf_1d: Fix the diagram

The topology diagram implies that $swp1 and $swp2 are members of the bridge
br0, when in fact only their uppers, $swp1.10 and $swp2.10 are. Adjust the
diagram.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agomlxsw: spectrum_router: Do not query MAX_VRS on each iteration
Petr Machata [Fri, 2 Jun 2023 16:20:08 +0000 (18:20 +0200)]
mlxsw: spectrum_router: Do not query MAX_VRS on each iteration

MLXSW_CORE_RES_GET involves a call to spectrum_core, a separate module.
Instead of making the call on every iteration, cache it up front, and use
the value.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agomlxsw: spectrum_router: Do not query MAX_RIFS on each iteration
Petr Machata [Fri, 2 Jun 2023 16:20:07 +0000 (18:20 +0200)]
mlxsw: spectrum_router: Do not query MAX_RIFS on each iteration

MLXSW_CORE_RES_GET involves a call to spectrum_core, a separate module.
Instead of making the call on every iteration, cache it up front, and use
the value.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agomlxsw: spectrum_router: Use extack in mlxsw_sp~_rif_ipip_lb_configure()
Petr Machata [Fri, 2 Jun 2023 16:20:06 +0000 (18:20 +0200)]
mlxsw: spectrum_router: Use extack in mlxsw_sp~_rif_ipip_lb_configure()

In commit 26029225d992 ("mlxsw: spectrum_router: Propagate extack
further"), the mlxsw_sp_rif_ops.configure callback got a new argument,
extack. However the callbacks that deal with tunnel configuration,
mlxsw_sp1_rif_ipip_lb_configure() and mlxsw_sp2_rif_ipip_lb_configure(),
were never updated to pass the parameter further. Do that now.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agomlxsw: spectrum_router: Clarify a comment
Petr Machata [Fri, 2 Jun 2023 16:20:05 +0000 (18:20 +0200)]
mlxsw: spectrum_router: Clarify a comment

"Reserved for X" usually means that only X is supposed to use a given
object. Here, it is used in the sense that X should consider the object
"reserved", as in "restricted".

Replace the comment simply by "X", with the implication that that's where
the field is used.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agoMerge branch 'sja1105-cleanups'
David S. Miller [Mon, 5 Jun 2023 10:26:03 +0000 (11:26 +0100)]
Merge branch 'sja1105-cleanups'

Russell King says:

====================
convert sja1105 xpcs creation and remove xpcs_create

This series of three patches converts sja1105 to use the newly
provided xpcs_create_mdiodev(), and as there become no users of
xpcs_create(), removes this function from the global namespace to
discourage future direct use.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agonet: pcs: xpcs: remove xpcs_create() from public view
Russell King (Oracle) [Fri, 2 Jun 2023 13:58:45 +0000 (14:58 +0100)]
net: pcs: xpcs: remove xpcs_create() from public view

There are now no callers of xpcs_create(), so let's remove it from
public view to discourage future direct usage.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agonet: dsa: sja1105: use xpcs_create_mdiodev()
Russell King (Oracle) [Fri, 2 Jun 2023 13:58:40 +0000 (14:58 +0100)]
net: dsa: sja1105: use xpcs_create_mdiodev()

Use the new xpcs_create_mdiodev() creator, which simplifies the
creation and destruction of the mdio device associated with xpcs.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agonet: dsa: sja1105: allow XPCS to handle mdiodev lifetime
Russell King (Oracle) [Fri, 2 Jun 2023 13:58:35 +0000 (14:58 +0100)]
net: dsa: sja1105: allow XPCS to handle mdiodev lifetime

Put the mdiodev after xpcs_create() so that the XPCS driver can manage
the lifetime of the mdiodev its using.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agoMerge branch 'regmap-TSE-PCS'
David S. Miller [Mon, 5 Jun 2023 08:56:36 +0000 (09:56 +0100)]
Merge branch 'regmap-TSE-PCS'

Maxime Chevallier says:

====================
net: add a regmap-based mdio driver and drop TSE PCS

This is the V4 of a series that follows-up on the work [1] aiming to drop the
altera TSE PCS driver, as it turns out to be a version of the Lynx PCS exposed
as a memory-mapped block, instead of living on an MDIO bus.

One step of this removal involved creating a regmap-based mdio driver
that translates MDIO accesses into the actual underlying bus that
exposes the register. The register layout must of course match the
standard MDIO layout, but we can now account for differences in stride
with recent work on the regmap subsystem [2].

Sorry for repeating this, but I didn't hear anything on this matter in previous
iterations, Mark, Net maintainers, this series depends on the patch
e12ff2876493 that was recently merged into the regmap tree [3].

For this series to be usable in net-next, this patch must be applied
beforehand. Should Mark create a tag that would then be merged into
net-next ? Or should we just wait for the next release to merge this
into net-next ?

This series introduces a new MDIO driver, and uses it to convert Altera
TSE from the actual TSE PCS driver to Lynx PCS.

Since it turns out dwmac_socfpga also uses a TSE PCS block, port that
driver to Lynx as well.

Changes in V4 :
 - Use new pcs_lynx_create/destroy helpers added by Russell
 - Rework the cleanup sequence to avoid leaking data
 - Rework a bit KConfig to properly select dependencies
 - Fix a few hiccups with misplaced hunks in 2 commits

Changes in V3 :
 - Use a dedicated struct for the mii bus's priv data, to avoid
   duplicating the whole struct mdio_regmap_config, from which 2 fields
   only are necessary after init, as suggested by Russell
 - Use ~0 instead of ~0UL for the no-scan bitmask, following Simon's
   review.

Changes in V2 :
 - Use phy_mask to avoid unnecessarily scanning the whole mdio bus
 - Go one step further and completely disable scanning if users
   set the .autoscan flag to false, in case the mdiodevice isn't an
   actual PHY (a PCS for example).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agonet: stmmac: dwmac-sogfpga: use the lynx pcs driver
Maxime Chevallier [Thu, 1 Jun 2023 14:14:54 +0000 (16:14 +0200)]
net: stmmac: dwmac-sogfpga: use the lynx pcs driver

dwmac_socfpga re-implements support for the TSE PCS, which is identical
to the already existing TSE PCS, which in turn is the same as the Lynx
PCS. Drop the existing TSE re-implemenation and use the Lynx PCS
instead, relying on the regmap-mdio driver to translate MDIO accesses
into mmio accesses.

Add a lynx_pcs reference in the stmmac's internal structure, and use
.mac_select_pcs() to return the relevant PCS to be used.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agonet: pcs: Drop the TSE PCS driver
Maxime Chevallier [Thu, 1 Jun 2023 14:14:53 +0000 (16:14 +0200)]
net: pcs: Drop the TSE PCS driver

Now that we can easily create a mdio-device that represents a
memory-mapped device that exposes an MDIO-like register layout, we don't
need the Altera TSE PCS anymore, since we can use the Lynx PCS instead.

Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agonet: ethernet: altera-tse: Convert to mdio-regmap and use PCS Lynx
Maxime Chevallier [Thu, 1 Jun 2023 14:14:52 +0000 (16:14 +0200)]
net: ethernet: altera-tse: Convert to mdio-regmap and use PCS Lynx

The newly introduced regmap-based MDIO driver allows for an easy mapping
of an mdiodevice onto the memory-mapped TSE PCS, which is actually a
Lynx PCS.

Convert Altera TSE to use this PCS instead of the pcs-altera-tse, which
is nothing more than a memory-mapped Lynx PCS.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agonet: mdio: Introduce a regmap-based mdio driver
Maxime Chevallier [Thu, 1 Jun 2023 14:14:51 +0000 (16:14 +0200)]
net: mdio: Introduce a regmap-based mdio driver

There exists several examples today of devices that embed an ethernet
PHY or PCS directly inside an SoC. In this situation, either the device
is controlled through a vendor-specific register set, or sometimes
exposes the standard 802.3 registers that are typically accessed over
MDIO.

As phylib and phylink are designed to use mdiodevices, this driver
allows creating a virtual MDIO bus, that translates mdiodev register
accesses to regmap accesses.

The reason we use regmap is because there are at least 3 such devices
known today, 2 of them are Altera TSE PCS's, memory-mapped, exposed
with a 4-byte stride in stmmac's dwmac-socfpga variant, and a 2-byte
stride in altera-tse. The other one (nxp,sja1110-base-tx-mdio) is
exposed over SPI.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agoipv6: lower "link become ready"'s level message
Matthieu Baerts [Fri, 2 Jun 2023 09:36:07 +0000 (11:36 +0200)]
ipv6: lower "link become ready"'s level message

This following message is printed in the console each time a network
device configured with an IPv6 addresses is ready to be used:

  ADDRCONF(NETDEV_CHANGE): <iface>: link becomes ready

When netns are being extensively used -- e.g. by re-creating netns' with
veth to discuss with each others for testing purposes like mptcp_join.sh
selftest does -- it generates a lot of messages like that: more than 700
when executing mptcp_join.sh with the latest version.

It looks like this message is not that helpful after all: maybe it can
be used as a sign to know if there is something wrong, e.g. if a device
is being regularly reconfigured by accident? But even then, there are
better ways to monitor and diagnose such issues.

When looking at commit 3c21edbd1137 ("[IPV6]: Defer IPv6 device
initialization until the link becomes ready.") which introduces this new
message, it seems it had been added to verify that the new feature was
working as expected. It could have then used a lower level than "info"
from the beginning but it was fine like that back then: 17 years ago.

It seems then OK today to simply lower its level, similar to commit
7c62b8dd5ca8 ("net/ipv6: lower the level of "link is not ready" messages")
and as suggested by Mat [1], Stephen and David [2].

Link: https://lore.kernel.org/mptcp/614e76ac-184e-c553-af72-084f792e60b0@kernel.org/T/
Link: https://lore.kernel.org/netdev/68035bad-b53e-91cb-0e4a-007f27d62b05@tessares.net/T/
Suggested-by: Mat Martineau <martineau@kernel.org>
Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
Suggested-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agonet: phylib: fix phy_read*_poll_timeout()
Russell King (Oracle) [Thu, 1 Jun 2023 15:48:12 +0000 (16:48 +0100)]
net: phylib: fix phy_read*_poll_timeout()

Dan Carpenter reported a signedness bug in genphy_loopback(). Andrew
reports that:

"It is common to get this wrong in general with PHY drivers. Dan
regularly posts fixes like this soon after a PHY driver patch it
merged. I really wish we could somehow get the compiler to warn when
the result from phy_read() is stored into a unsigned type. It would
save Dan a lot of work."

Let's make phy_read*_poll_timeout() immune to further issues when "val"
is an unsigned type by storing the read function's result in a signed
int as well as "val", and using the signed variable both to check for
an error and for propagating that error to the caller.

The advantage of this method is we don't change where the cast from
the signed return code to the user's variable occurs - so users will
see no change.

Previously Heiner changed phy_read_poll_timeout() to check for an error
before evaluating the user supplied condition, but didn't update
phy_read_mmd_poll_timeout(). Make that change there too.

Link: https://lore.kernel.org/r/d7bb312e-2428-45f6-b9b3-59ba544e8b94@kili.mountain
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/E1q4kX6-00BNuM-Mx@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agoMerge branch 'tools-ynl-gen-dust-off-the-user-space-code'
Jakub Kicinski [Sat, 3 Jun 2023 05:10:48 +0000 (22:10 -0700)]
Merge branch 'tools-ynl-gen-dust-off-the-user-space-code'

Jakub Kicinski says:

====================
tools: ynl-gen: dust off the user space code

Every now and then I wish I finished the user space part of
the netlink specs, Python scripts kind of stole the show but
C is useful for selftests and stuff which needs to be fast.
Recently someone asked me how to access devlink and ethtool
from C++ which pushed me over the edge.

Fix things which bit rotted and finish notification handling.
This series contains code gen changes only. I'll follow up
with the fixed component, samples and docs as soon as it's
merged.
====================

Link: https://lore.kernel.org/r/20230602023548.463441-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agotools: ynl-gen: generate static descriptions of notifications
Jakub Kicinski [Fri, 2 Jun 2023 02:35:48 +0000 (19:35 -0700)]
tools: ynl-gen: generate static descriptions of notifications

Notifications may come in at any time. The family must be always
ready to parse a random incoming notification. Generate notification
table for parsing and tell YNL which request we're processing
to distinguish responses from notifications.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agotools: ynl-gen: switch to family struct
Jakub Kicinski [Fri, 2 Jun 2023 02:35:47 +0000 (19:35 -0700)]
tools: ynl-gen: switch to family struct

We'll want to store static info about the family soon.
Generate a struct. This changes creation from, e.g.:

 ys = ynl_sock_create("netdev", &yerr);
to:
 ys = ynl_sock_create(&ynl_netdev_family, &yerr);

on user's side.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agotools: ynl-gen: generate alloc and free helpers for req
Jakub Kicinski [Fri, 2 Jun 2023 02:35:46 +0000 (19:35 -0700)]
tools: ynl-gen: generate alloc and free helpers for req

We expect user to allocate requests with calloc(),
make things a bit more consistent and provide helpers.
Generate free calls, too.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agotools: ynl-gen: move the response reading logic into YNL
Jakub Kicinski [Fri, 2 Jun 2023 02:35:45 +0000 (19:35 -0700)]
tools: ynl-gen: move the response reading logic into YNL

We generate send() and recv() calls and all msg handling for
each operation. It's a lot of repeated code and will only grow
with notification handling. Call back to a helper YNL lib instead.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agotools: ynl-gen: generate enum-to-string helpers
Jakub Kicinski [Fri, 2 Jun 2023 02:35:44 +0000 (19:35 -0700)]
tools: ynl-gen: generate enum-to-string helpers

It's sometimes useful to print the name of an enum value,
flag or name of the op. Python can do it, add C helper
code gen for getting names of things.

Example:

  static const char * const netdev_xdp_act_strmap[] = {
[0] = "basic",
[1] = "redirect",
[2] = "ndo-xmit",
[3] = "xsk-zerocopy",
[4] = "hw-offload",
[5] = "rx-sg",
[6] = "ndo-xmit-sg",
  };

  const char *netdev_xdp_act_str(enum netdev_xdp_act value)
  {
value = ffs(value) - 1;
if (value < 0 || value >= (int)MNL_ARRAY_SIZE(netdev_xdp_act_strmap))
return NULL;
return netdev_xdp_act_strmap[value];
  }

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agotools: ynl-gen: add error checking for nested structs
Jakub Kicinski [Fri, 2 Jun 2023 02:35:43 +0000 (19:35 -0700)]
tools: ynl-gen: add error checking for nested structs

Parsing nested types may return an error, propagate it.
Not marking as a fix, because nothing uses YNL upstream.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agotools: ynl-gen: loosen type consistency check for events
Jakub Kicinski [Fri, 2 Jun 2023 02:35:42 +0000 (19:35 -0700)]
tools: ynl-gen: loosen type consistency check for events

Both event and notify types are always consistent. Rewrite
the condition checking if we can reuse reply types to be
less picky and let notify thru.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agotools: ynl-gen: don't override pure nested struct
Jakub Kicinski [Fri, 2 Jun 2023 02:35:41 +0000 (19:35 -0700)]
tools: ynl-gen: don't override pure nested struct

For pure structs (parsed nested attributes) we track what
forms of the struct exist in request and reply directions.
Make sure we don't overwrite the recorded struct each time,
otherwise the information is lost.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agotools: ynl-gen: fix unused / pad attribute handling
Jakub Kicinski [Fri, 2 Jun 2023 02:35:40 +0000 (19:35 -0700)]
tools: ynl-gen: fix unused / pad attribute handling

Unused and Pad attributes don't carry information.
Unused should never exist, and be rejected.
Pad should be silently skipped.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agotools: ynl-gen: add extra headers for user space
Jakub Kicinski [Fri, 2 Jun 2023 02:35:39 +0000 (19:35 -0700)]
tools: ynl-gen: add extra headers for user space

Make sure all relevant headers are included, we allocate memory,
use memcpy() and Linux types without including the headers.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agoipv4: Drop tos parameter from flowi4_update_output()
Guillaume Nault [Thu, 1 Jun 2023 16:37:46 +0000 (18:37 +0200)]
ipv4: Drop tos parameter from flowi4_update_output()

Callers of flowi4_update_output() never try to update ->flowi4_tos:

  * ip_route_connect() updates ->flowi4_tos with its own current
    value.

  * ip_route_newports() has two users: tcp_v4_connect() and
    dccp_v4_connect. Both initialise fl4 with ip_route_connect(), which
    in turn sets ->flowi4_tos with RT_TOS(inet_sk(sk)->tos) and
    ->flowi4_scope based on SOCK_LOCALROUTE.

    Then ip_route_newports() updates ->flowi4_tos with
    RT_CONN_FLAGS(sk), which is the same as RT_TOS(inet_sk(sk)->tos),
    unless SOCK_LOCALROUTE is set on the socket. In that case, the
    lowest order bit is set to 1, to eventually inform
    ip_route_output_key_hash() to restrict the scope to RT_SCOPE_LINK.
    This is equivalent to properly setting ->flowi4_scope as
    ip_route_connect() did.

  * ip_vs_xmit.c initialises ->flowi4_tos with memset(0), then calls
    flowi4_update_output() with tos=0.

  * sctp_v4_get_dst() uses the same RT_CONN_FLAGS_TOS() when
    initialising ->flowi4_tos and when calling flowi4_update_output().

In the end, ->flowi4_tos never changes. So let's just drop the tos
parameter. This will simplify the conversion of ->flowi4_tos from __u8
to dscp_t.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agonet: lan743x: Remove extranous gotos
Moritz Fischer [Fri, 2 Jun 2023 00:04:14 +0000 (00:04 +0000)]
net: lan743x: Remove extranous gotos

The gotos for cleanup aren't required, the function
might as well just return the actual error code.

Signed-off-by: Moritz Fischer <moritzf@google.com>
Reviewed-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agoip_gre: clean up some inconsistent indenting
Jiapeng Chong [Fri, 2 Jun 2023 05:59:25 +0000 (13:59 +0800)]
ip_gre: clean up some inconsistent indenting

No functional modification involved.

net/ipv4/ip_gre.c:192 ipgre_err() warn: inconsistent indenting.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=5375
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agoselftests: tls: add tests for poll behavior
Jakub Kicinski [Wed, 31 May 2023 15:35:51 +0000 (08:35 -0700)]
selftests: tls: add tests for poll behavior

Make sure we don't generate premature POLLIN events.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agotls: suppress wakeups unless we have a full record
Jakub Kicinski [Wed, 31 May 2023 15:35:50 +0000 (08:35 -0700)]
tls: suppress wakeups unless we have a full record

TLS does not override .poll() so TLS-enabled socket will generate
an event whenever data arrives at the TCP socket. This leads to
unnecessary wakeups on slow connections.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agoselftests/tc-testing: replace mq with invalid parent ID
Zhengchao Shao [Thu, 1 Jun 2023 01:22:50 +0000 (09:22 +0800)]
selftests/tc-testing: replace mq with invalid parent ID

The test case shown in [1] triggers the kernel to access the null pointer.
Therefore, add related test cases to mq.
The test results are as follows:

./tdc.py -e 0531
1..1
ok 1 0531 - Replace mq with invalid parent ID

./tdc.py -c mq
1..8
ok 1 ce7d - Add mq Qdisc to multi-queue device (4 queues)
ok 2 2f82 - Add mq Qdisc to multi-queue device (256 queues)
ok 3 c525 - Add duplicate mq Qdisc
ok 4 128a - Delete nonexistent mq Qdisc
ok 5 03a9 - Delete mq Qdisc twice
ok 6 be0f - Add mq Qdisc to single-queue device
ok 7 1023 - Show mq class
ok 8 0531 - Replace mq with invalid parent ID

[1] https://lore.kernel.org/all/20230527093747.3583502-1-shaozhengchao@huawei.com/
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Link: https://lore.kernel.org/r/20230601012250.52738-1-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: phy: broadcom: Add LPI counter
Florian Fainelli [Wed, 31 May 2023 23:17:29 +0000 (16:17 -0700)]
net: phy: broadcom: Add LPI counter

Add the ability to read the PHY maintained LPI counter which is in the
Clause 45 vendor space, device address 7, offset 0x803F. The counter is
cleared on read.

Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230531231729.1873932-1-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agor8169: use dev_err_probe in all appropriate places in rtl_init_one()
Heiner Kallweit [Wed, 31 May 2023 20:41:32 +0000 (22:41 +0200)]
r8169: use dev_err_probe in all appropriate places in rtl_init_one()

In addition to properly handling probe deferrals dev_err_probe()
conveniently combines printing an error message with returning
the errno. So let's use it for every error path in rtl_init_one()
to simplify the code.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/f0596a19-d517-e301-b649-304f9247b75a@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agoMerge branch 'extend-dt-bindings-for-pse-pd-controllers-and-update-prtt1c-dts'
Jakub Kicinski [Fri, 2 Jun 2023 04:38:32 +0000 (21:38 -0700)]
Merge branch 'extend-dt-bindings-for-pse-pd-controllers-and-update-prtt1c-dts'

Oleksij Rempel says:

====================
Extend dt-bindings for PSE-PD controllers and update prtt1c dts

This patch set comes in response to issues identified while adding PoDL
PSE support to the stm32 prtt1c device tree. The existing pse-pd device
tree bindings did not allow node name patterns like "ethernet-pse-0" and
"ethernet-pse-1", leading to validation failures.

To address these false positives in validation, the device tree bindings
are extended to support these node name patterns. Alongside this, an
example node is added to aid in the improved validation process.
====================

Link: https://lore.kernel.org/r/20230531102113.3353065-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agodt-bindings: net: pse-pd: Allow -N suffix for ethernet-pse node names
Oleksij Rempel [Wed, 31 May 2023 10:21:12 +0000 (12:21 +0200)]
dt-bindings: net: pse-pd: Allow -N suffix for ethernet-pse node names

Extend the pattern matching for PSE-PD controller nodes to allow -N
suffixes. This enables the use of multiple "ethernet-pse" nodes without the
need for a "reg" property.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agodevlink: bring port new reply back
Jiri Pirko [Wed, 31 May 2023 14:20:25 +0000 (16:20 +0200)]
devlink: bring port new reply back

In the offending fixes commit I mistakenly removed the reply message of
the port new command. I was under impression it is a new port
notification, partly due to the "notify" in the name of the helper
function. Bring the code sending reply with new port message back, this
time putting it directly to devlink_nl_cmd_port_new_doit()

Fixes: c496daeb8630 ("devlink: remove duplicate port notification")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230531142025.2605001-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Thu, 1 Jun 2023 22:33:03 +0000 (15:33 -0700)]
Merge git://git./linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR.

No conflicts.

Adjacent changes:

drivers/net/ethernet/sfc/tc.c
  622ab656344a ("sfc: fix error unwinds in TC offload")
  b6583d5e9e94 ("sfc: support TC decap rules matching on enc_src_port")

net/mptcp/protocol.c
  5b825727d087 ("mptcp: add annotations around msk->subflow accesses")
  e76c8ef5cc5b ("mptcp: refactor mptcp_stream_accept()")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agoMerge tag 'net-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 1 Jun 2023 21:29:18 +0000 (17:29 -0400)]
Merge tag 'net-6.4-rc5' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Happy Wear a Dress Day.

  Fairly standard-sized batch of fixes, accounting for the lack of
  sub-tree submissions this week. The mlx5 IRQ fixes are notable, people
  were complaining about that. No fires burning.

  Current release - regressions:

   - eth: mlx5e:
      - multiple fixes for dynamic IRQ allocation
      - prevent encap offload when neigh update is running

   - eth: mana: fix perf regression: remove rx_cqes, tx_cqes counters

  Current release - new code bugs:

   - eth: mlx5e: DR, add missing mutex init/destroy in pattern manager

  Previous releases - always broken:

   - tcp: deny tcp_disconnect() when threads are waiting

   - sched: prevent ingress Qdiscs from getting installed in random
     locations in the hierarchy and moving around

   - sched: flower: fix possible OOB write in fl_set_geneve_opt()

   - netlink: fix NETLINK_LIST_MEMBERSHIPS length report

   - udp6: fix race condition in udp6_sendmsg & connect

   - tcp: fix mishandling when the sack compression is deferred

   - rtnetlink: validate link attributes set at creation time

   - mptcp: fix connect timeout handling

   - eth: stmmac: fix call trace when stmmac_xdp_xmit() is invoked

   - eth: amd-xgbe: fix the false linkup in xgbe_phy_status

   - eth: mlx5e:
      - fix corner cases in internal buffer configuration
      - drain health before unregistering devlink

   - usb: qmi_wwan: set DTR quirk for BroadMobi BM818

  Misc:

   - tcp: return user_mss for TCP_MAXSEG in CLOSE/LISTEN state if
     user_mss set"

* tag 'net-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (71 commits)
  mptcp: fix active subflow finalization
  mptcp: add annotations around sk->sk_shutdown accesses
  mptcp: fix data race around msk->first access
  mptcp: consolidate passive msk socket initialization
  mptcp: add annotations around msk->subflow accesses
  mptcp: fix connect timeout handling
  rtnetlink: add the missing IFLA_GRO_ tb check in validate_linkmsg
  rtnetlink: move IFLA_GSO_ tb check to validate_linkmsg
  rtnetlink: call validate_linkmsg in rtnl_create_link
  ice: recycle/free all of the fragments from multi-buffer frame
  net: phy: mxl-gpy: extend interrupt fix to all impacted variants
  net: renesas: rswitch: Fix return value in error path of xmit
  net: dsa: mv88e6xxx: Increase wait after reset deactivation
  net: ipa: Use correct value for IPA_STATUS_SIZE
  tcp: fix mishandling when the sack compression is deferred.
  net/sched: flower: fix possible OOB write in fl_set_geneve_opt()
  sfc: fix error unwinds in TC offload
  net/mlx5: Read embedded cpu after init bit cleared
  net/mlx5e: Fix error handling in mlx5e_refresh_tirs
  net/mlx5: Ensure af_desc.mask is properly initialized
  ...

13 months agofork, vhost: Use CLONE_THREAD to fix freezer/ps regression
Mike Christie [Thu, 1 Jun 2023 18:32:32 +0000 (13:32 -0500)]
fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

When switching from kthreads to vhost_tasks two bugs were added:
1. The vhost worker tasks's now show up as processes so scripts doing
ps or ps a would not incorrectly detect the vhost task as another
process.  2. kthreads disabled freeze by setting PF_NOFREEZE, but
vhost tasks's didn't disable or add support for them.

To fix both bugs, this switches the vhost task to be thread in the
process that does the VHOST_SET_OWNER ioctl, and has vhost_worker call
get_signal to support SIGKILL/SIGSTOP and freeze signals. Note that
SIGKILL/STOP support is required because CLONE_THREAD requires
CLONE_SIGHAND which requires those 2 signals to be supported.

This is a modified version of the patch written by Mike Christie
<michael.christie@oracle.com> which was a modified version of patch
originally written by Linus.

Much of what depended upon PF_IO_WORKER now depends on PF_USER_WORKER.
Including ignoring signals, setting up the register state, and having
get_signal return instead of calling do_group_exit.

Tidied up the vhost_task abstraction so that the definition of
vhost_task only needs to be visible inside of vhost_task.c.  Making
it easier to review the code and tell what needs to be done where.
As part of this the main loop has been moved from vhost_worker into
vhost_task_fn.  vhost_worker now returns true if work was done.

The main loop has been updated to call get_signal which handles
SIGSTOP, freezing, and collects the message that tells the thread to
exit as part of process exit.  This collection clears
__fatal_signal_pending.  This collection is not guaranteed to
clear signal_pending() so clear that explicitly so the schedule()
sleeps.

For now the vhost thread continues to exist and run work until the
last file descriptor is closed and the release function is called as
part of freeing struct file.  To avoid hangs in the coredump
rendezvous and when killing threads in a multi-threaded exec.  The
coredump code and de_thread have been modified to ignore vhost threads.

Remvoing the special case for exec appears to require teaching
vhost_dev_flush how to directly complete transactions in case
the vhost thread is no longer running.

Removing the special case for coredump rendezvous requires either the
above fix needed for exec or moving the coredump rendezvous into
get_signal.

Fixes: 6e890c5d5021 ("vhost: use vhost_tasks for worker threads")
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Co-developed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 months agoMerge tag 'mlx5-fixes-2023-05-31' of git://git.kernel.org/pub/scm/linux/kernel/git...
Jakub Kicinski [Thu, 1 Jun 2023 17:15:43 +0000 (10:15 -0700)]
Merge tag 'mlx5-fixes-2023-05-31' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5 fixes 2023-05-31

This series provides bug fixes to mlx5 driver.

* tag 'mlx5-fixes-2023-05-31' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5: Read embedded cpu after init bit cleared
  net/mlx5e: Fix error handling in mlx5e_refresh_tirs
  net/mlx5: Ensure af_desc.mask is properly initialized
  net/mlx5: Fix setting of irq->map.index for static IRQ case
  net/mlx5: Remove rmap also in case dynamic MSIX not supported
====================

Link: https://lore.kernel.org/r/20230601031051.131529-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agoMerge branch 'mptcp-fixes-for-connect-timeout-access-annotations-and-subflow-init'
Jakub Kicinski [Thu, 1 Jun 2023 17:04:06 +0000 (10:04 -0700)]
Merge branch 'mptcp-fixes-for-connect-timeout-access-annotations-and-subflow-init'

Mat Martineau says:

====================
mptcp: Fixes for connect timeout, access annotations, and subflow init

Patch 1 allows the SO_SNDTIMEO sockopt to correctly change the connect
timeout on MPTCP sockets.

Patches 2-5 add READ_ONCE()/WRITE_ONCE() annotations to fix KCSAN issues.

Patch 6 correctly initializes some subflow fields on outgoing connections.
====================

Link: https://lore.kernel.org/r/20230531-send-net-20230531-v1-0-47750c420571@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agomptcp: fix active subflow finalization
Paolo Abeni [Wed, 31 May 2023 19:37:08 +0000 (12:37 -0700)]
mptcp: fix active subflow finalization

Active subflow are inserted into the connection list at creation time.
When the MPJ handshake completes successfully, a new subflow creation
netlink event is generated correctly, but the current code wrongly
avoid initializing a couple of subflow data.

The above will cause misbehavior on a few exceptional events: unneeded
mptcp-level retransmission on msk-level sequence wrap-around and infinite
mapping fallback even when a MPJ socket is present.

Address the issue factoring out the needed initialization in a new helper
and invoking the latter from __mptcp_finish_join() time for passive
subflow and from mptcp_finish_join() for active ones.

Fixes: 0530020a7c8f ("mptcp: track and update contiguous data status")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agomptcp: add annotations around sk->sk_shutdown accesses
Paolo Abeni [Wed, 31 May 2023 19:37:07 +0000 (12:37 -0700)]
mptcp: add annotations around sk->sk_shutdown accesses

Christoph reported the mptcp variant of a recently addressed plain
TCP issue. Similar to commit e14cadfd80d7 ("tcp: add annotations around
sk->sk_shutdown accesses") add READ/WRITE ONCE annotations to silence
KCSAN reports around lockless sk_shutdown access.

Fixes: 71ba088ce0aa ("mptcp: cleanup accept and poll")
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/401
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agomptcp: fix data race around msk->first access
Paolo Abeni [Wed, 31 May 2023 19:37:06 +0000 (12:37 -0700)]
mptcp: fix data race around msk->first access

The first subflow socket is accessed outside the msk socket lock
by mptcp_subflow_fail(), we need to annotate each write access
with WRITE_ONCE, but a few spots still lacks it.

Fixes: 76a13b315709 ("mptcp: invoke MP_FAIL response when needed")
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agomptcp: consolidate passive msk socket initialization
Paolo Abeni [Wed, 31 May 2023 19:37:05 +0000 (12:37 -0700)]
mptcp: consolidate passive msk socket initialization

When the msk socket is cloned at MPC handshake time, a few
fields are initialized in a racy way outside mptcp_sk_clone()
and the msk socket lock.

The above is due historical reasons: before commit a88d0092b24b
("mptcp: simplify subflow_syn_recv_sock()") as the first subflow socket
carrying all the needed date was not available yet at msk creation
time

We can now refactor the code moving the missing initialization bit
under the socket lock, removing the init race and avoiding some
code duplication.

This will also simplify the next patch, as all msk->first write
access are now under the msk socket lock.

Fixes: 0397c6d85f9c ("mptcp: keep unaccepted MPC subflow into join list")
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agomptcp: add annotations around msk->subflow accesses
Paolo Abeni [Wed, 31 May 2023 19:37:04 +0000 (12:37 -0700)]
mptcp: add annotations around msk->subflow accesses

The MPTCP can access the first subflow socket in a few spots
outside the socket lock scope. That is actually safe, as MPTCP
will delete the socket itself only after the msk sock close().

Still the such accesses causes a few KCSAN splats, as reported
by Christoph. Silence the harmless warning adding a few annotation
around the relevant accesses.

Fixes: 71ba088ce0aa ("mptcp: cleanup accept and poll")
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/402
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agomptcp: fix connect timeout handling
Paolo Abeni [Wed, 31 May 2023 19:37:03 +0000 (12:37 -0700)]
mptcp: fix connect timeout handling

Ondrej reported a functional issue WRT timeout handling on connect
with a nice reproducer.

The problem is that the current mptcp connect waits for both the
MPTCP socket level timeout, and the first subflow socket timeout.
The latter is not influenced/touched by the exposed setsockopt().

Overall the above makes the SO_SNDTIMEO a no-op on connect.

Since mptcp_connect is invoked via inet_stream_connect and the
latter properly handle the MPTCP level timeout, we can address the
issue making the nested subflow level connect always unblocking.

This also allow simplifying a bit the code, dropping an ugly hack
to handle the fastopen and custom proto_ops connect.

The issues predates the blamed commit below, but the current resolution
requires the infrastructure introduced there.

Fixes: 54f1944ed6d2 ("mptcp: factor out mptcp_connect()")
Reported-by: Ondrej Mosnacek <omosnace@redhat.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/399
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agoMerge branch 'rtnetlink-a-couple-of-fixes-in-linkmsg-validation'
Jakub Kicinski [Thu, 1 Jun 2023 16:59:45 +0000 (09:59 -0700)]
Merge branch 'rtnetlink-a-couple-of-fixes-in-linkmsg-validation'

Xin Long says:

====================
rtnetlink: a couple of fixes in linkmsg validation

validate_linkmsg() was introduced to do linkmsg validation for existing
links. However, the new created links also need this linkmsg validation.

Add validate_linkmsg() check for link creating in Patch 1, and add more
tb checks into validate_linkmsg() in Patch 2 and 3.
====================

Link: https://lore.kernel.org/r/cover.1685548598.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agortnetlink: add the missing IFLA_GRO_ tb check in validate_linkmsg
Xin Long [Wed, 31 May 2023 16:01:44 +0000 (12:01 -0400)]
rtnetlink: add the missing IFLA_GRO_ tb check in validate_linkmsg

This fixes the issue that dev gro_max_size and gso_ipv4_max_size
can be set to a huge value:

  # ip link add dummy1 type dummy
  # ip link set dummy1 gro_max_size 4294967295
  # ip -d link show dummy1
    dummy addrgenmode eui64 ... gro_max_size 4294967295

Fixes: 0fe79f28bfaf ("net: allow gro_max_size to exceed 65536")
Fixes: 9eefedd58ae1 ("net: add gso_ipv4_max_size and gro_ipv4_max_size per device")
Reported-by: Xiumei Mu <xmu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agortnetlink: move IFLA_GSO_ tb check to validate_linkmsg
Xin Long [Wed, 31 May 2023 16:01:43 +0000 (12:01 -0400)]
rtnetlink: move IFLA_GSO_ tb check to validate_linkmsg

These IFLA_GSO_* tb check should also be done for the new created link,
otherwise, they can be set to a huge value when creating links:

  # ip link add dummy1 gso_max_size 4294967295 type dummy
  # ip -d link show dummy1
    dummy addrgenmode eui64 ... gso_max_size 4294967295

Fixes: 46e6b992c250 ("rtnetlink: allow GSO maximums to be set on device creation")
Fixes: 9eefedd58ae1 ("net: add gso_ipv4_max_size and gro_ipv4_max_size per device")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agortnetlink: call validate_linkmsg in rtnl_create_link
Xin Long [Wed, 31 May 2023 16:01:42 +0000 (12:01 -0400)]
rtnetlink: call validate_linkmsg in rtnl_create_link

validate_linkmsg() was introduced by commit 1840bb13c22f5b ("[RTNL]:
Validate hardware and broadcast address attribute for RTM_NEWLINK")
to validate tb[IFLA_ADDRESS/BROADCAST] for existing links. The same
check should also be done for newly created links.

This patch adds validate_linkmsg() call in rtnl_create_link(), to
avoid the invalid address set when creating some devices like:

  # ip link add dummy0 type dummy
  # ip link add link dummy0 name mac0 address 01:02 type macsec

Fixes: 0e06877c6fdb ("[RTNETLINK]: rtnl_link: allow specifying initial device address")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agoice: recycle/free all of the fragments from multi-buffer frame
Maciej Fijalkowski [Wed, 31 May 2023 15:44:57 +0000 (08:44 -0700)]
ice: recycle/free all of the fragments from multi-buffer frame

The ice driver caches next_to_clean value at the beginning of
ice_clean_rx_irq() in order to remember the first buffer that has to be
freed/recycled after main Rx processing loop. The end boundary is
indicated by first descriptor of frame that Rx processing loop has ended
its duties. Note that if mentioned loop ended in the middle of gathering
multi-buffer frame, next_to_clean would be pointing to the descriptor in
the middle of the frame BUT freeing/recycling stage will stop at the
first descriptor. This means that next iteration of ice_clean_rx_irq()
will miss the (first_desc, next_to_clean - 1) entries.

 When running various 9K MTU workloads, such splats were observed:

[  540.780716] BUG: kernel NULL pointer dereference, address: 0000000000000000
[  540.787787] #PF: supervisor read access in kernel mode
[  540.793002] #PF: error_code(0x0000) - not-present page
[  540.798218] PGD 0 P4D 0
[  540.800801] Oops: 0000 [#1] PREEMPT SMP NOPTI
[  540.805231] CPU: 18 PID: 3984 Comm: xskxceiver Tainted: G        W          6.3.0-rc7+ #96
[  540.813619] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
[  540.824209] RIP: 0010:ice_clean_rx_irq+0x2b6/0xf00 [ice]
[  540.829678] Code: 74 24 10 e9 aa 00 00 00 8b 55 78 41 31 57 10 41 09 c4 4d 85 ff 0f 84 83 00 00 00 49 8b 57 08 41 8b 4f 1c 65 8b 35 1a fa 4b 3f <48> 8b 02 48 c1 e8 3a 39 c6 0f 85 a2 00 00 00 f6 42 08 02 0f 85 98
[  540.848717] RSP: 0018:ffffc9000f42fc50 EFLAGS: 00010282
[  540.854029] RAX: 0000000000000004 RBX: 0000000000000002 RCX: 000000000000fffe
[  540.861272] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 00000000ffffffff
[  540.868519] RBP: ffff88984a05ac00 R08: 0000000000000000 R09: dead000000000100
[  540.875760] R10: ffff88983fffcd00 R11: 000000000010f2b8 R12: 0000000000000004
[  540.883008] R13: 0000000000000003 R14: 0000000000000800 R15: ffff889847a10040
[  540.890253] FS:  00007f6ddf7fe640(0000) GS:ffff88afdf800000(0000) knlGS:0000000000000000
[  540.898465] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  540.904299] CR2: 0000000000000000 CR3: 000000010d3da001 CR4: 00000000007706e0
[  540.911542] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  540.918789] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  540.926032] PKRU: 55555554
[  540.928790] Call Trace:
[  540.931276]  <TASK>
[  540.933418]  ice_napi_poll+0x4ca/0x6d0 [ice]
[  540.937804]  ? __pfx_ice_napi_poll+0x10/0x10 [ice]
[  540.942716]  napi_busy_loop+0xd7/0x320
[  540.946537]  xsk_recvmsg+0x143/0x170
[  540.950178]  sock_recvmsg+0x99/0xa0
[  540.953729]  __sys_recvfrom+0xa8/0x120
[  540.957543]  ? do_futex+0xbd/0x1d0
[  540.961008]  ? __x64_sys_futex+0x73/0x1d0
[  540.965083]  __x64_sys_recvfrom+0x20/0x30
[  540.969155]  do_syscall_64+0x38/0x90
[  540.972796]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[  540.977934] RIP: 0033:0x7f6de5f27934

To fix this, set cached_ntc to first_desc so that at the end, when
freeing/recycling buffers, descriptors from first to ntc are not missed.

Fixes: 2fba7dc5157b ("ice: Add support for XDP multi-buffer on Rx side")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20230531154457.3216621-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: phy: mxl-gpy: extend interrupt fix to all impacted variants
Xu Liang [Wed, 31 May 2023 07:48:22 +0000 (15:48 +0800)]
net: phy: mxl-gpy: extend interrupt fix to all impacted variants

The interrupt fix in commit 97a89ed101bb should be applied on all variants
of GPY2xx PHY and GPY115C.

Fixes: 97a89ed101bb ("net: phy: mxl-gpy: disable interrupts on GPY215 by default")
Signed-off-by: Xu Liang <lxu@maxlinear.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230531074822.39136-1-lxu@maxlinear.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: renesas: rswitch: Fix return value in error path of xmit
Yoshihiro Shimoda [Mon, 29 May 2023 07:38:17 +0000 (16:38 +0900)]
net: renesas: rswitch: Fix return value in error path of xmit

Fix return value in the error path of rswitch_start_xmit(). If TX
queues are full, this function should return NETDEV_TX_BUSY.

Fixes: 3590918b5d07 ("net: ethernet: renesas: Add support for "Ethernet Switch"")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Link: https://lore.kernel.org/r/20230529073817.1145208-1-yoshihiro.shimoda.uh@renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agoMerge tag 'firewire-fixes-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Thu, 1 Jun 2023 15:18:20 +0000 (11:18 -0400)]
Merge tag 'firewire-fixes-6.4-rc4' of git://git./linux/kernel/git/ieee1394/linux1394

Pull firewire fix from Takashi Sakamoto:
 "A single patch to use a flexible array rather than a zero-length one"

* tag 'firewire-fixes-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
  firewire: Replace zero-length array with flexible-array member

13 months agoMerge tag 'mailbox-fixes-6.4-rc5' of git://git.linaro.org/landing-teams/working/fujit...
Linus Torvalds [Thu, 1 Jun 2023 15:13:10 +0000 (11:13 -0400)]
Merge tag 'mailbox-fixes-6.4-rc5' of git://git.linaro.org/landing-teams/working/fujitsu/integration

Pull mailbox fix from Jassi Brar:
 "Fix missing mutex unlock in mailbox-test"

* tag 'mailbox-fixes-6.4-rc5' of git://git.linaro.org/landing-teams/working/fujitsu/integration:
  mailbox: mailbox-test: fix a locking issue in mbox_test_message_write()

13 months agonet: dsa: mv88e6xxx: Increase wait after reset deactivation
Andreas Svensson [Tue, 30 May 2023 14:52:23 +0000 (16:52 +0200)]
net: dsa: mv88e6xxx: Increase wait after reset deactivation

A switch held in reset by default needs to wait longer until we can
reliably detect it.

An issue was observed when testing on the Marvell 88E6393X (Link Street).
The driver failed to detect the switch on some upstarts. Increasing the
wait time after reset deactivation solves this issue.

The updated wait time is now also the same as the wait time in the
mv88e6xxx_hardware_reset function.

Fixes: 7b75e49de424 ("net: dsa: mv88e6xxx: wait after reset deactivation")
Signed-off-by: Andreas Svensson <andreas.svensson@axis.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230530145223.1223993-1-andreas.svensson@axis.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
13 months agofirewire: Replace zero-length array with flexible-array member
Gustavo A. R. Silva [Mon, 29 May 2023 18:52:07 +0000 (12:52 -0600)]
firewire: Replace zero-length array with flexible-array member

Zero-length and one-element arrays are deprecated, and we are moving
towards adopting C99 flexible-array members, instead.

Address the following warnings found with GCC-13 and
-fstrict-flex-arrays=3 enabled:
sound/firewire/amdtp-stream.c: In function ‘build_it_pkt_header’:
sound/firewire/amdtp-stream.c:694:17: warning: ‘generate_cip_header’ accessing 8 bytes in a region of size 0 [-Wstringop-overflow=]
  694 |                 generate_cip_header(s, cip_header, data_block_counter, syt);
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
sound/firewire/amdtp-stream.c:694:17: note: referencing argument 2 of type ‘__be32[2]’ {aka ‘unsigned int[2]’}
sound/firewire/amdtp-stream.c:667:13: note: in a call to function ‘generate_cip_header’
  667 | static void generate_cip_header(struct amdtp_stream *s, __be32 cip_header[2],
      |             ^~~~~~~~~~~~~~~~~~~

This helps with the ongoing efforts to tighten the FORTIFY_SOURCE
routines on memcpy() and help us make progress towards globally
enabling -fstrict-flex-arrays=3 [1].

Link: https://github.com/KSPP/linux/issues/21
Link: https://github.com/KSPP/linux/issues/303
Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602902.html
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/ZHT0V3SpvHyxCv5W@work
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
13 months agoMerge tag 'for-linus-2023060101' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Thu, 1 Jun 2023 13:02:04 +0000 (09:02 -0400)]
Merge tag 'for-linus-2023060101' of git://git./linux/kernel/git/hid/hid

Pull HID fixes from Jiri Kosina:

 - Regression fix for overlong long timeouts during initialization on
   some Logitech Unifying devices (Bastien Nocera)

 - error handling and overflow fixes for Wacom driver (Denis Arefev,
   Jason Gerecke, Nikita Zhandarovich)

* tag 'for-linus-2023060101' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
  HID: logitech-hidpp: Handle timeout differently from busy
  HID: wacom: Add error check to wacom_parse_and_register()
  HID: google: add jewel USB id
  HID: wacom: avoid integer overflow in wacom_intuos_inout()
  HID: wacom: Check for string overflow from strscpy calls

13 months agoMerge tag 'ata-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal...
Linus Torvalds [Thu, 1 Jun 2023 12:41:33 +0000 (08:41 -0400)]
Merge tag 'ata-6.4-rc5' of git://git./linux/kernel/git/dlemoal/libata

Pull ata fix from Damien Le Moal:

 - Fix ata_find_dev() use of the device number to find a struct
   ata_device for a port. This addresses issues with some passthrough
   commands with libsas managed devices.

* tag 'ata-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
  ata: libata-scsi: Use correct device no in ata_find_dev()

13 months agoMerge tag '6.4-rc4-smb3-server-fixes' of git://git.samba.org/ksmbd
Linus Torvalds [Thu, 1 Jun 2023 12:27:34 +0000 (08:27 -0400)]
Merge tag '6.4-rc4-smb3-server-fixes' of git://git.samba.org/ksmbd

Pull smb server fixes from Steve French:
 "Eight server fixes (most also for stable):

   - Two fixes for uninitialized pointer reads (rename and link)

   - Fix potential UAF in oplock break

   - Two fixes for potential out of bound reads in negotiate

   - Fix crediting bug

   - Two fixes for xfstests (allocation size fix for test 694 and lookup
     issue shown by test 464)"

* tag '6.4-rc4-smb3-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: call putname after using the last component
  ksmbd: fix incorrect AllocationSize set in smb2_get_info
  ksmbd: fix UAF issue from opinfo->conn
  ksmbd: fix multiple out-of-bounds read during context decoding
  ksmbd: fix slab-out-of-bounds read in smb2_handle_negotiate
  ksmbd: fix credit count leakage
  ksmbd: fix uninitialized pointer read in smb2_create_link()
  ksmbd: fix uninitialized pointer read in ksmbd_vfs_rename()

13 months agoMerge branch 'splice-net-handle-msg_splice_pages-in-chelsio-tls'
Paolo Abeni [Thu, 1 Jun 2023 11:41:40 +0000 (13:41 +0200)]
Merge branch 'splice-net-handle-msg_splice_pages-in-chelsio-tls'

David Howells says:

====================
splice, net: Handle MSG_SPLICE_PAGES in Chelsio-TLS

Here are patches to make Chelsio-TLS handle the MSG_SPLICE_PAGES internal
sendmsg flag.  MSG_SPLICE_PAGES is an internal hint that tells the protocol
that it should splice the pages supplied if it can.  Its sendpage
implementation is then turned into a wrapper around that.
====================

Link: https://lore.kernel.org/r/20230531110008.642903-1-dhowells@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
13 months agochelsio: Convert chtls_sendpage() to use MSG_SPLICE_PAGES
David Howells [Wed, 31 May 2023 11:00:08 +0000 (12:00 +0100)]
chelsio: Convert chtls_sendpage() to use MSG_SPLICE_PAGES

Convert chtls_sendpage() to use sendmsg() with MSG_SPLICE_PAGES rather than
directly splicing in the pages itself.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Ayush Sawal <ayush.sawal@chelsio.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
13 months agochelsio: Support MSG_SPLICE_PAGES
David Howells [Wed, 31 May 2023 11:00:07 +0000 (12:00 +0100)]
chelsio: Support MSG_SPLICE_PAGES

Make Chelsio's TLS offload sendmsg() support MSG_SPLICE_PAGES, splicing in
pages from the source iterator if possible and copying the data in
otherwise.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Ayush Sawal <ayush.sawal@chelsio.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
13 months agonet: ipa: Use correct value for IPA_STATUS_SIZE
Bert Karwatzki [Wed, 31 May 2023 10:36:19 +0000 (12:36 +0200)]
net: ipa: Use correct value for IPA_STATUS_SIZE

IPA_STATUS_SIZE was introduced in commit b8dc7d0eea5a as a replacement
for the size of the removed struct ipa_status which had size
sizeof(__le32[8]). Use this value as IPA_STATUS_SIZE.

Fixes: b8dc7d0eea5a ("net: ipa: stop using sizeof(status)")
Signed-off-by: Bert Karwatzki <spasswolf@web.de>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230531103618.102608-1-spasswolf@web.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
13 months agotcp: fix mishandling when the sack compression is deferred.
fuyuanli [Wed, 31 May 2023 08:01:50 +0000 (16:01 +0800)]
tcp: fix mishandling when the sack compression is deferred.

In this patch, we mainly try to handle sending a compressed ack
correctly if it's deferred.

Here are more details in the old logic:
When sack compression is triggered in the tcp_compressed_ack_kick(),
if the sock is owned by user, it will set TCP_DELACK_TIMER_DEFERRED
and then defer to the release cb phrase. Later once user releases
the sock, tcp_delack_timer_handler() should send a ack as expected,
which, however, cannot happen due to lack of ICSK_ACK_TIMER flag.
Therefore, the receiver would not sent an ack until the sender's
retransmission timeout. It definitely increases unnecessary latency.

Fixes: 5d9f4262b7ea ("tcp: add SACK compression")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: fuyuanli <fuyuanli@didiglobal.com>
Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://lore.kernel.org/netdev/20230529113804.GA20300@didi-ThinkCentre-M920t-N000/
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230531080150.GA20424@didi-ThinkCentre-M920t-N000
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
13 months agonet/sched: flower: fix possible OOB write in fl_set_geneve_opt()
Hangyu Hua [Wed, 31 May 2023 10:28:04 +0000 (18:28 +0800)]
net/sched: flower: fix possible OOB write in fl_set_geneve_opt()

If we send two TCA_FLOWER_KEY_ENC_OPTS_GENEVE packets and their total
size is 252 bytes(key->enc_opts.len = 252) then
key->enc_opts.len = opt->length = data_len / 4 = 0 when the third
TCA_FLOWER_KEY_ENC_OPTS_GENEVE packet enters fl_set_geneve_opt. This
bypasses the next bounds check and results in an out-of-bounds.

Fixes: 0a6e77784f49 ("net/sched: allow flower to match tunnel options")
Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com>
Link: https://lore.kernel.org/r/20230531102805.27090-1-hbh25y@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
13 months agoMerge branch 'wangxun-netdev-features-support'
Jakub Kicinski [Thu, 1 Jun 2023 06:02:28 +0000 (23:02 -0700)]
Merge branch 'wangxun-netdev-features-support'

Mengyuan Lou says:

====================
Wangxun netdev features support

Implement tx_csum and rx_csum to support hardware checksum offload.
Implement ndo_vlan_rx_add_vid and ndo_vlan_rx_kill_vid.
Implement ndo_set_features.
Enable macros in netdev features which wangxun can support.
====================

Link: https://lore.kernel.org/r/20230530022632.17938-1-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: txgbe: Implement vlan add and remove ops
Mengyuan Lou [Tue, 30 May 2023 02:26:32 +0000 (10:26 +0800)]
net: txgbe: Implement vlan add and remove ops

txgbe add ndo_vlan_rx_add_vid and ndo_vlan_rx_kill_vid.

Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: txgbe: Add netdev features support
Mengyuan Lou [Tue, 30 May 2023 02:26:31 +0000 (10:26 +0800)]
net: txgbe: Add netdev features support

Add features and hw_features that ngbe can support.

Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: ngbe: Implement vlan add and remove ops
Mengyuan Lou [Tue, 30 May 2023 02:26:30 +0000 (10:26 +0800)]
net: ngbe: Implement vlan add and remove ops

ngbe add ndo_vlan_rx_add_vid and ndo_vlan_rx_kill_vid.

Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: ngbe: Add netdev features support
Mengyuan Lou [Tue, 30 May 2023 02:26:29 +0000 (10:26 +0800)]
net: ngbe: Add netdev features support

Add features and hw_features that ngbe can support.

Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: libwx: Implement xx_set_features ops
Mengyuan Lou [Tue, 30 May 2023 02:26:28 +0000 (10:26 +0800)]
net: libwx: Implement xx_set_features ops

Implement wx_set_features function which to support
ndo_set_features.

Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: wangxun: Implement vlan add and kill functions
Mengyuan Lou [Tue, 30 May 2023 02:26:27 +0000 (10:26 +0800)]
net: wangxun: Implement vlan add and kill functions

Implement vlan add/kill functions which add and remove
vlan id in hardware.

Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: wangxun: libwx add rx offload functions
Mengyuan Lou [Tue, 30 May 2023 02:26:26 +0000 (10:26 +0800)]
net: wangxun: libwx add rx offload functions

Add rx offload functions for wx_clean_rx_irq
which supports ngbe and txgbe to implement
rx offload function.

Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: wangxun: libwx add tx offload functions
Mengyuan Lou [Tue, 30 May 2023 02:26:25 +0000 (10:26 +0800)]
net: wangxun: libwx add tx offload functions

Add tx offload functions for wx_xmit_frame_ring which
includes wx_encode_tx_desc_ptype, wx_tso and wx_tx_csum.
which supports ngbe and txgbe to implement tx offload
function.

Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agodevlink: make health report on unregistered instance warn just once
Jakub Kicinski [Wed, 31 May 2023 01:55:23 +0000 (18:55 -0700)]
devlink: make health report on unregistered instance warn just once

Devlink health is involved in error recovery. Machines in bad
state tend to be fairly unreliable, and occasionally get stuck
in error loops. Even with a reasonable grace period devlink health
may get a thousand reports in an hour.

In case of reporting on an unregistered devlink instance
the subsequent reports don't add much value. Switch to
WARN_ON_ONCE() to avoid flooding dmesg and fleet monitoring
dashboards.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230531015523.48961-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agoMerge branch 'add-support-for-vsc85xx-dt-rgmii-delays'
Jakub Kicinski [Thu, 1 Jun 2023 05:33:46 +0000 (22:33 -0700)]
Merge branch 'add-support-for-vsc85xx-dt-rgmii-delays'

Harini Katakam says:

====================
Add support for VSC85xx DT RGMII delays

Provide an option to change RGMII delay value via devicetree.
====================

Link: https://lore.kernel.org/r/20230529122017.10620-1-harini.katakam@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agophy: mscc: Add support for RGMII delay configuration
Harini Katakam [Mon, 29 May 2023 12:20:17 +0000 (17:50 +0530)]
phy: mscc: Add support for RGMII delay configuration

Add support for optional rx/tx-internal-delay-ps from devicetree.
- When rx/tx-internal-delay-ps is/are specified, these take priority
- When either is absent,
1) use 2ns for respective settings if rgmii-id/rxid/txid is/are present
2) use 0.2ns for respective settings if mode is rgmii

Signed-off-by: Harini Katakam <harini.katakam@amd.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agophy: mscc: Use PHY_ID_MATCH_VENDOR to minimize PHY ID table
Harini Katakam [Mon, 29 May 2023 12:20:16 +0000 (17:50 +0530)]
phy: mscc: Use PHY_ID_MATCH_VENDOR to minimize PHY ID table

All the PHY devices variants specified have the same mask and
hence can be simplified to one vendor look up for 0x00070400.
Any individual config can be identified by PHY_ID_MATCH_EXACT
in the respective structure.

Signed-off-by: Harini Katakam <harini.katakam@amd.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agosfc: fix error unwinds in TC offload
Edward Cree [Tue, 30 May 2023 20:25:27 +0000 (21:25 +0100)]
sfc: fix error unwinds in TC offload

Failure ladders weren't exactly unwinding what the function had done up
 to that point; most seriously, when we encountered an already offloaded
 rule, the failure path tried to remove the new rule from the hashtable,
 which would in fact remove the already-present 'old' rule (since it has
 the same key) from the table, and leak its resources.

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Closes: https://lore.kernel.org/r/202305200745.xmIlkqjH-lkp@intel.com/
Fixes: d902e1a737d4 ("sfc: bare bones TC offload on EF100")
Fixes: 17654d84b47c ("sfc: add offloading of 'foreign' TC (decap) rules")
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230530202527.53115-1-edward.cree@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: don't set sw irq coalescing defaults in case of PREEMPT_RT
Heiner Kallweit [Sun, 28 May 2023 17:39:59 +0000 (19:39 +0200)]
net: don't set sw irq coalescing defaults in case of PREEMPT_RT

If PREEMPT_RT is set, then assume that the user focuses on minimum
latency. Therefore don't set sw irq coalescing defaults.
This affects the defaults only, users can override these settings
via sysfs.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/f9439c7f-c92c-4c2c-703e-110f96d841b7@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet/mlx5: Read embedded cpu after init bit cleared
Moshe Shemesh [Fri, 28 Apr 2023 10:48:13 +0000 (13:48 +0300)]
net/mlx5: Read embedded cpu after init bit cleared

During driver load it reads embedded_cpu bit from initialization
segment, but the initialization segment is readable only after
initialization bit is cleared.

Move the call to mlx5_read_embedded_cpu() right after initialization bit
cleared.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Fixes: 591905ba9679 ("net/mlx5: Introduce Mellanox SmartNIC and modify page management logic")
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
13 months agonet/mlx5e: Fix error handling in mlx5e_refresh_tirs
Saeed Mahameed [Sun, 28 May 2023 06:07:08 +0000 (23:07 -0700)]
net/mlx5e: Fix error handling in mlx5e_refresh_tirs

Allocation failure is outside the critical lock section and should
return immediately rather than jumping to the unlock section.

Also unlock as soon as required and remove the now redundant jump label.

Fixes: 80a2a9026b24 ("net/mlx5e: Add a lock on tir list")
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
13 months agonet/mlx5: Ensure af_desc.mask is properly initialized
Chuck Lever [Wed, 31 May 2023 19:48:25 +0000 (15:48 -0400)]
net/mlx5: Ensure af_desc.mask is properly initialized

[    9.837087] mlx5_core 0000:02:00.0: firmware version: 16.35.2000
[    9.843126] mlx5_core 0000:02:00.0: 126.016 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x16 link)
[   10.311515] mlx5_core 0000:02:00.0: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[   10.321948] mlx5_core 0000:02:00.0: E-Switch: Total vports 2, per vport: max uc(128) max mc(2048)
[   10.344324] mlx5_core 0000:02:00.0: mlx5_pcie_event:301:(pid 88): PCIe slot advertised sufficient power (27W).
[   10.354339] BUG: unable to handle page fault for address: ffffffff8ff0ade0
[   10.361206] #PF: supervisor read access in kernel mode
[   10.366335] #PF: error_code(0x0000) - not-present page
[   10.371467] PGD 81ec39067 P4D 81ec39067 PUD 81ec3a063 PMD 114b07063 PTE 800ffff7e10f5062
[   10.379544] Oops: 0000 [#1] PREEMPT SMP PTI
[   10.383721] CPU: 0 PID: 117 Comm: kworker/0:6 Not tainted 6.3.0-13028-g7222f123c983 #1
[   10.391625] Hardware name: Supermicro X10SRA-F/X10SRA-F, BIOS 2.0b 06/12/2017
[   10.398750] Workqueue: events work_for_cpu_fn
[   10.403108] RIP: 0010:__bitmap_or+0x10/0x26
[   10.407286] Code: 85 c0 0f 95 c0 c3 cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 89 c9 31 c0 48 83 c1 3f 48 c1 e9 06 39 c>
[   10.426024] RSP: 0000:ffffb45a0078f7b0 EFLAGS: 00010097
[   10.431240] RAX: 0000000000000000 RBX: ffffffff8ff0adc0 RCX: 0000000000000004
[   10.438365] RDX: ffff9156801967d0 RSI: ffffffff8ff0ade0 RDI: ffff9156801967b0
[   10.445489] RBP: ffffb45a0078f7e8 R08: 0000000000000030 R09: 0000000000000000
[   10.452613] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000000000ec
[   10.459737] R13: ffffffff8ff0ade0 R14: 0000000000000001 R15: 0000000000000020
[   10.466862] FS:  0000000000000000(0000) GS:ffff9165bfc00000(0000) knlGS:0000000000000000
[   10.474936] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   10.480674] CR2: ffffffff8ff0ade0 CR3: 00000001011ae003 CR4: 00000000003706f0
[   10.487800] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   10.494922] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   10.502046] Call Trace:
[   10.504493]  <TASK>
[   10.506589]  ? matrix_alloc_area.constprop.0+0x43/0x9a
[   10.511729]  ? prepare_namespace+0x84/0x174
[   10.515914]  irq_matrix_reserve_managed+0x56/0x10c
[   10.520699]  x86_vector_alloc_irqs+0x1d2/0x31e
[   10.525146]  irq_domain_alloc_irqs_hierarchy+0x39/0x3f
[   10.530284]  irq_domain_alloc_irqs_parent+0x1a/0x2a
[   10.535155]  intel_irq_remapping_alloc+0x59/0x5e9
[   10.539859]  ? kmem_cache_debug_flags+0x11/0x26
[   10.544383]  ? __radix_tree_lookup+0x39/0xb9
[   10.548649]  irq_domain_alloc_irqs_hierarchy+0x39/0x3f
[   10.553779]  irq_domain_alloc_irqs_parent+0x1a/0x2a
[   10.558650]  msi_domain_alloc+0x8c/0x120
[   10.567697]  irq_domain_alloc_irqs_locked+0x11d/0x286
[   10.572741]  __irq_domain_alloc_irqs+0x72/0x93
[   10.577179]  __msi_domain_alloc_irqs+0x193/0x3f1
[   10.581789]  ? __xa_alloc+0xcf/0xe2
[   10.585273]  msi_domain_alloc_irq_at+0xa8/0xfe
[   10.589711]  pci_msix_alloc_irq_at+0x47/0x5c

The crash is due to matrix_alloc_area() attempting to access per-CPU
memory for CPUs that are not present on the system. The CPU mask
passed into reserve_managed_vector() via it's @irqd parameter is
corrupted because it contains uninitialized stack data.

Fixes: bbac70c74183 ("net/mlx5: Use newer affinity descriptor")
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
13 months agonet/mlx5: Fix setting of irq->map.index for static IRQ case
Niklas Schnelle [Wed, 31 May 2023 08:48:56 +0000 (10:48 +0200)]
net/mlx5: Fix setting of irq->map.index for static IRQ case

When dynamic IRQ allocation is not supported all IRQs are allocated up
front in mlx5_irq_table_create() instead of dynamically as part of
mlx5_irq_alloc(). In the latter dynamic case irq->map.index is set
via the mapping returned by pci_msix_alloc_irq_at(). In the static case
and prior to commit 1da438c0ae02 ("net/mlx5: Fix indexing of mlx5_irq")
irq->map.index was set in mlx5_irq_alloc() twice once initially to 0 and
then to the requested index before storing in the xarray. After this
commit it is only set to 0 which breaks all other IRQ mappings.

Fix this by setting irq->map.index to the requested index together with
irq->map.virq and improve the related comment to make it clearer which
cases it deals with.

Cc: Chuck Lever III <chuck.lever@oracle.com>
Tested-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
Fixes: 1da438c0ae02 ("net/mlx5: Fix indexing of mlx5_irq")
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Tested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
13 months agonet/mlx5: Remove rmap also in case dynamic MSIX not supported
Shay Drory [Tue, 30 May 2023 08:59:34 +0000 (11:59 +0300)]
net/mlx5: Remove rmap also in case dynamic MSIX not supported

mlx5 add IRQs to rmap upon MSIX request, and mlx5 remove rmap from
MSIX only if msi_map.index is populated. However, msi_map.index is
populated only when dynamic MSIX is supported. This results in freeing
IRQs without removing them from rmap, which triggers the bellow
WARN_ON[1].

rmap is a feature which have no relation to dynamic MSIX.
Hence, remove the check of msi_map.index when removing IRQ from rmap.

[1]
[  200.307160 ] WARNING: CPU: 20 PID: 1702 at kernel/irq/manage.c:2034 free_irq+0x2ac/0x358
[  200.316990 ] CPU: 20 PID: 1702 Comm: modprobe Not tainted 6.4.0-rc3_for_upstream_min_debug_2023_05_24_14_02 #1
[  200.318939 ] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[  200.321659 ] pc : free_irq+0x2ac/0x358
[  200.322400 ] lr : free_irq+0x20/0x358
[  200.337865 ] Call trace:
[  200.338360 ]  free_irq+0x2ac/0x358
[  200.339029 ]  irq_release+0x58/0xd0 [mlx5_core]
[  200.340093 ]  mlx5_irqs_release_vectors+0x80/0xb0 [mlx5_core]
[  200.341344 ]  destroy_comp_eqs+0x120/0x170 [mlx5_core]
[  200.342469 ]  mlx5_eq_table_destroy+0x1c/0x38 [mlx5_core]
[  200.343645 ]  mlx5_unload+0x8c/0xc8 [mlx5_core]
[  200.344652 ]  mlx5_uninit_one+0x78/0x118 [mlx5_core]
[  200.345745 ]  remove_one+0x80/0x108 [mlx5_core]
[  200.346752 ]  pci_device_remove+0x40/0xd8
[  200.347554 ]  device_remove+0x50/0x88
[  200.348272 ]  device_release_driver_internal+0x1c4/0x228
[  200.349312 ]  driver_detach+0x54/0xa0
[  200.350030 ]  bus_remove_driver+0x74/0x100
[  200.350833 ]  driver_unregister+0x34/0x68
[  200.351619 ]  pci_unregister_driver+0x28/0xa0
[  200.352476 ]  mlx5_cleanup+0x14/0x2210 [mlx5_core]
[  200.353536 ]  __arm64_sys_delete_module+0x190/0x2e8
[  200.354495 ]  el0_svc_common.constprop.0+0x6c/0x1d0
[  200.355455 ]  do_el0_svc+0x38/0x98
[  200.356122 ]  el0_svc+0x1c/0x80
[  200.356739 ]  el0t_64_sync_handler+0xb4/0x130
[  200.357604 ]  el0t_64_sync+0x174/0x178
[  200.358345 ] ---[ end trace 0000000000000000  ]---

Fixes: 3354822cde5a ("net/mlx5: Use dynamic msix vectors allocation")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
13 months agoMerge tag '6.4-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Wed, 31 May 2023 23:24:01 +0000 (19:24 -0400)]
Merge tag '6.4-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:
 "Four small smb3 client fixes:

   - two small fixes suggested by kernel test robot

   - small cleanup fix

   - update Paulo's email address in the maintainer file"

* tag '6.4-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: address unused variable warning
  smb: delete an unnecessary statement
  smb3: missing null check in SMB2_change_notify
  smb3: update a reviewer email in MAINTAINERS file

13 months agomailbox: mailbox-test: fix a locking issue in mbox_test_message_write()
Dan Carpenter [Fri, 5 May 2023 09:22:09 +0000 (12:22 +0300)]
mailbox: mailbox-test: fix a locking issue in mbox_test_message_write()

There was a bug where this code forgot to unlock the tdev->mutex if the
kzalloc() failed.  Fix this issue, by moving the allocation outside the
lock.

Fixes: 2d1e952a2b8e ("mailbox: mailbox-test: Fix potential double-free in mbox_test_message_write()")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Lee Jones <lee@kernel.org>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
13 months agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Linus Torvalds [Wed, 31 May 2023 18:13:09 +0000 (14:13 -0400)]
Merge tag 'for-linus' of git://git./linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:

 - Fix 64K ARM page size support in bnxt_re and efa

 - bnxt_re fixes for a memory leak, incorrect error handling and a
   remove a bogus FW failure when running on a VF

 - Update MAINTAINERS for hns and efa

 - Fix two rxe regressions added this merge window in error unwind and
   incorrect spinlock primitives

 - hns gets a better algorithm for allocating page tables to avoid
   running out of resources, and a timeout adjustment

 - Fix a text case failure in hns

 - Use after free in irdma and fix incorrect construction of a WQE
   causing mis-execution

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/irdma: Fix Local Invalidate fencing
  RDMA/irdma: Prevent QP use after free
  MAINTAINERS: Update maintainer of Amazon EFA driver
  RDMA/bnxt_re: Do not enable congestion control on VFs
  RDMA/bnxt_re: Fix return value of bnxt_re_process_raw_qp_pkt_rx
  RDMA/bnxt_re: Fix a possible memory leak
  RDMA/hns: Modify the value of long message loopback slice
  RDMA/hns: Fix base address table allocation
  RDMA/hns: Fix timeout attr in query qp for HIP08
  RDMA/efa: Fix unsupported page sizes in device
  RDMA/rxe: Convert spin_{lock_bh,unlock_bh} to spin_{lock_irqsave,unlock_irqrestore}
  RDMA/rxe: Fix double unlock in rxe_qp.c
  MAINTAINERS: Update maintainers of HiSilicon RoCE
  RDMA/bnxt_re: Fix the page_size used during the MR creation

13 months agoMerge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Wed, 31 May 2023 18:06:01 +0000 (14:06 -0400)]
Merge tag 'ext4_for_linus_stable' of git://git./linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "Fix two regressions in ext4 and a number of issues reported by syzbot"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: enable the lazy init thread when remounting read/write
  ext4: fix fsync for non-directories
  ext4: add lockdep annotations for i_data_sem for ea_inode's
  ext4: disallow ea_inodes with extended attributes
  ext4: set lockdep subclass for the ea_inode in ext4_xattr_inode_cache_find()
  ext4: add EA_INODE checking to ext4_iget()

13 months agoHID: logitech-hidpp: Handle timeout differently from busy
Bastien Nocera [Wed, 31 May 2023 08:24:28 +0000 (10:24 +0200)]
HID: logitech-hidpp: Handle timeout differently from busy

If an attempt at contacting a receiver or a device fails because the
receiver or device never responds, don't restart the communication, only
restart it if the receiver or device answers that it's busy, as originally
intended.

This was the behaviour on communication timeout before commit 586e8fede795
("HID: logitech-hidpp: Retry commands when device is busy").

This fixes some overly long waits in a critical path on boot, when
checking whether the device is connected by getting its HID++ version.

Signed-off-by: Bastien Nocera <hadess@hadess.net>
Suggested-by: Mark Lord <mlord@pobox.com>
Fixes: 586e8fede795 ("HID: logitech-hidpp: Retry commands when device is busy")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217412
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
13 months agoudp6: Fix race condition in udp6_sendmsg & connect
Vladislav Efanov [Tue, 30 May 2023 11:39:41 +0000 (14:39 +0300)]
udp6: Fix race condition in udp6_sendmsg & connect

Syzkaller got the following report:
BUG: KASAN: use-after-free in sk_setup_caps+0x621/0x690 net/core/sock.c:2018
Read of size 8 at addr ffff888027f82780 by task syz-executor276/3255

The function sk_setup_caps (called by ip6_sk_dst_store_flow->
ip6_dst_store) referenced already freed memory as this memory was
freed by parallel task in udpv6_sendmsg->ip6_sk_dst_lookup_flow->
sk_dst_check.

          task1 (connect)              task2 (udp6_sendmsg)
        sk_setup_caps->sk_dst_set |
                                  |  sk_dst_check->
                                  |      sk_dst_set
                                  |      dst_release
        sk_setup_caps references  |
        to already freed dst_entry|

The reason for this race condition is: sk_setup_caps() keeps using
the dst after transferring the ownership to the dst cache.

Found by Linux Verification Center (linuxtesting.org) with syzkaller.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Vladislav Efanov <VEfanov@ispras.ru>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agoMerge branch 'xstats-for-tc-taprio'
David S. Miller [Wed, 31 May 2023 09:00:30 +0000 (10:00 +0100)]
Merge branch 'xstats-for-tc-taprio'

Vladimir Oltean says:

====================
xstats for tc-taprio

As a result of this discussion:
https://lore.kernel.org/intel-wired-lan/20230411055543.24177-1-muhammad.husaini.zulkifli@intel.com/

it became apparent that tc-taprio should make an effort to standardize
statistics counters related to the 802.1Qbv scheduling as implemented
by the NIC. I'm presenting here one counter suggested by the standard,
and one counter defined by the NXP ENETC controller from LS1028A. Both
counters are reported globally and per traffic class - drivers get
different callbacks for reporting both of these, and get to choose what
to report in both cases.

The iproute2 counterpart is available here for testing:
https://github.com/vladimiroltean/iproute2/commits/taprio-xstats
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agonet: enetc: report statistics counters for taprio
Vladimir Oltean [Tue, 30 May 2023 09:19:48 +0000 (12:19 +0300)]
net: enetc: report statistics counters for taprio

Report the "win_drop" counter from the unstructured ethtool -S as
TCA_TAPRIO_OFFLOAD_STATS_WINDOW_DROPS to the Qdisc layer. It is
available both as a global counter as well as a per-TC one.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>