git.osdn.net Git - tomoyo/tomoyo-test1.git/log

net: hns3: debugfs add support dumping page pool info

Add a file node "page_pool_info" for debugfs, then cat this
file node to dump page pool info as below:

QUEUE_ID  ALLOCATE_CNT  FREE_CNT      POOL_SIZE(PAGE_NUM)  ORDER  NUMA_ID  MAX_LEN
0         512           0             512                  0      2        4K
1         512           0             512                  0      2        4K
2         512           0             512                  0      2        4K
3         512           0             512                  0      2        4K
4         512           0             512                  0      2        4K

Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tulip: fix setting device address from rom

I missed removing i from the array index when converting
from a loop to a direct copy.

Fixes: ca8793175564 ("ethernet: tulip: remove direct netdev->dev_addr writes")
Reported-by: Joe Perches <joe@perches.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'Managed-Neighbor-Entries'

Daniel Borkmann says:

====================
Managed Neighbor Entries

This series adds a couple of fixes related to NTF_EXT_LEARNED and NTF_USE
neighbor flags, extends the UAPI with a new NDA_FLAGS_EXT netlink attribute
in order to be able to add new neighbor flags from user space given all
current struct ndmsg / ndm_flags bits are used up. Finally, the core of this
series adds a new NTF_EXT_MANAGED flag to neighbors, which allows user space
control planes to add 'managed' neighbor entries. Meaning, user space may
either transition existing entries or can push down new L3 entries without
lladdr into the kernel where the latter will periodically try to keep such
NTF_EXT_MANAGED managed entries in reachable state. Main use case for this
series are XDP / tc BPF load-balancers which make use of the bpf_fib_lookup()
helper for backends. For more details, please see individual patches. Thanks!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net, neigh: Add NTF_MANAGED flag for managed neighbor entries

Allow a user space control plane to insert entries with a new NTF_EXT_MANAGED
flag. The flag then indicates to the kernel that the neighbor entry should be
periodically probed for keeping the entry in NUD_REACHABLE state iff possible.

The use case for this is targeting XDP or tc BPF load-balancers which use
the bpf_fib_lookup() BPF helper in order to piggyback on neighbor resolution
for their backends. Given they cannot be resolved in fast-path, a control
plane inserts the L3 (without L2) entries manually into the neighbor table
and lets the kernel do the neighbor resolution either on the gateway or on
the backend directly in case the latter resides in the same L2. This avoids
to deal with L2 in the control plane and to rebuild what the kernel already
does best anyway.

NTF_EXT_MANAGED can be combined with NTF_EXT_LEARNED in order to avoid GC
eviction. The kernel then adds NTF_MANAGED flagged entries to a per-neighbor
table which gets triggered by the system work queue to periodically call
neigh_event_send() for performing the resolution. The implementation allows
migration from/to NTF_MANAGED neighbor entries, so that already existing
entries can be converted by the control plane if needed. Potentially, we could
make the interval for periodically calling neigh_event_send() configurable;
right now it's set to DELAY_PROBE_TIME which is also in line with mlxsw which
has similar driver-internal infrastructure c723c735fa6b ("mlxsw: spectrum_router:
Periodically update the kernel's neigh table"). In future, the latter could
possibly reuse the NTF_MANAGED neighbors as well.

Example:

  # ./ip/ip n replace 192.168.178.30 dev enp5s0 managed extern_learn
  # ./ip/ip n
  192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a managed extern_learn REACHABLE
  [...]

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Roopa Prabhu <roopa@nvidia.com>
Link: https://linuxplumbersconf.org/event/11/contributions/953/
Signed-off-by: David S. Miller <davem@davemloft.net>

net, neigh: Extend neigh->flags to 32 bit to allow for extensions

Currently, all bits in struct ndmsg's ndm_flags are used up with the most
recent addition of 435f2e7cc0b7 ("net: bridge: add support for sticky fdb
entries"). This makes it impossible to extend the neighboring subsystem
with new NTF_* flags:

  struct ndmsg {
    __u8   ndm_family;
    __u8   ndm_pad1;
    __u16  ndm_pad2;
    __s32  ndm_ifindex;
    __u16  ndm_state;
    __u8   ndm_flags;
    __u8   ndm_type;
  };

There are ndm_pad{1,2} attributes which are not used. However, due to
uncareful design, the kernel does not enforce them to be zero upon new
neighbor entry addition, and given they've been around forever, it is
not possible to reuse them today due to risk of breakage. One option to
overcome this limitation is to add a new NDA_FLAGS_EXT attribute for
extended flags.

In struct neighbour, there is a 3 byte hole between protocol and ha_lock,
which allows neigh->flags to be extended from 8 to 32 bits while still
being on the same cacheline as before. This also allows for all future
NTF_* flags being in neigh->flags rather than yet another flags field.
Unknown flags in NDA_FLAGS_EXT will be rejected by the kernel.

Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Roopa Prabhu <roopa@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net, neigh: Enable state migration between NUD_PERMANENT and NTF_USE

Currently, it is not possible to migrate a neighbor entry between NUD_PERMANENT
state and NTF_USE flag with a dynamic NUD state from a user space control plane.
Similarly, it is not possible to add/remove NTF_EXT_LEARNED flag from an existing
neighbor entry in combination with NTF_USE flag.

This is due to the latter directly calling into neigh_event_send() without any
meta data updates as happening in __neigh_update(). Thus, to enable this use
case, extend the latter with a NEIGH_UPDATE_F_USE flag where we break the
NUD_PERMANENT state in particular so that a latter neigh_event_send() is able
to re-resolve a neighbor entry.

Before fix, NUD_PERMANENT -> NUD_* & NTF_USE:

  # ./ip/ip n replace 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a
  # ./ip/ip n
  192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a PERMANENT
  [...]
  # ./ip/ip n replace 192.168.178.30 dev enp5s0 use extern_learn
  # ./ip/ip n
  192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a PERMANENT
  [...]

As can be seen, despite the admin-triggered replace, the entry remains in the
NUD_PERMANENT state.

After fix, NUD_PERMANENT -> NUD_* & NTF_USE:

  # ./ip/ip n replace 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a
  # ./ip/ip n
  192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a PERMANENT
  [...]
  # ./ip/ip n replace 192.168.178.30 dev enp5s0 use extern_learn
  # ./ip/ip n
  192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a extern_learn REACHABLE
  [...]
  # ./ip/ip n
  192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a extern_learn STALE
  [...]
  # ./ip/ip n replace 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a
  # ./ip/ip n
  192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a PERMANENT
  [...]

After the fix, the admin-triggered replace switches to a dynamic state from
the NTF_USE flag which triggered a new neighbor resolution. Likewise, we can
transition back from there, if needed, into NUD_PERMANENT.

Similar before/after behavior can be observed for below transitions:

Before fix, NTF_USE -> NTF_USE | NTF_EXT_LEARNED -> NTF_USE:

  # ./ip/ip n replace 192.168.178.30 dev enp5s0 use
  # ./ip/ip n
  192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a REACHABLE
  [...]
  # ./ip/ip n replace 192.168.178.30 dev enp5s0 use extern_learn
  # ./ip/ip n
  192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a REACHABLE
  [...]

After fix, NTF_USE -> NTF_USE | NTF_EXT_LEARNED -> NTF_USE:

  # ./ip/ip n replace 192.168.178.30 dev enp5s0 use
  # ./ip/ip n
  192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a REACHABLE
  [...]
  # ./ip/ip n replace 192.168.178.30 dev enp5s0 use extern_learn
  # ./ip/ip n
  192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a extern_learn REACHABLE
  [...]
  # ./ip/ip n replace 192.168.178.30 dev enp5s0 use
  # ./ip/ip n
  192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a REACHABLE
  [..]

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Roopa Prabhu <roopa@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net, neigh: Fix NTF_EXT_LEARNED in combination with NTF_USE

The NTF_EXT_LEARNED neigh flag is usually propagated back to user space
upon dump of the neighbor table. However, when used in combination with
NTF_USE flag this is not the case despite exempting the entry from the
garbage collector. This results in inconsistent state since entries are
typically marked in neigh->flags with NTF_EXT_LEARNED, but here they are
not. Fix it by propagating the creation flag to ___neigh_create().

Before fix:

  # ./ip/ip n replace 192.168.178.30 dev enp5s0 use extern_learn
  # ./ip/ip n
  192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a REACHABLE
  [...]

After fix:

  # ./ip/ip n replace 192.168.178.30 dev enp5s0 use extern_learn
  # ./ip/ip n
  192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a extern_learn REACHABLE
  [...]

Fixes: 9ce33e46531d ("neighbour: support for NTF_EXT_LEARNED flag")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Roopa Prabhu <roopa@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns: Prefer struct_size over open coded arithmetic

As noted in the "Deprecated Interfaces, Language Features, Attributes,
and Conventions" documentation [1], size calculations (especially
multiplication) should not be performed in memory allocator (or similar)
function arguments due to the risk of them overflowing. This could lead
to values wrapping around and a smaller allocation being made than the
caller was expecting. Using those allocations could lead to linear
overflows of heap memory and other misbehaviors.

So, take the opportunity to refactor the hnae_handle structure to switch
the last member to flexible array, changing the code accordingly. Also,
fix the comment in the hnae_vf_cb structure to inform that the ae_handle
member must be the last member.

Then, use the struct_size() helper to do the arithmetic instead of the
argument "size + count * size" in the kzalloc() function.

This code was detected with the help of Coccinelle and audited and fixed
manually.

[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments

Signed-off-by: Len Baker <len.baker@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlxsw-ECN-mirroring'

Ido Schimmel says:

====================
mlxsw: Add support for ECN mirroring

Petr says:

Patches in this set have been floating around for some time now together
with trap_fwd support. That will however need more work, time for which is
nowhere to be found, apparently. Instead, this patchset enables offload of
only packet mirroring on RED mark qevent, enabling mirroring of ECN-marked
packets.

Formally it enables offload of filters added to blocks bound to the RED
qevent mark if:

- The switch ASIC is Spectrum-2 or above.
- Only a single filter is attached at the block, at chain 0 (the default),
and its classifier is matchall.
- The filter has hw_stats set to disabled.
- The filter has a single action, which is mirror.

This differs from early_drop qevent offload, which supports mirroring and
trapping. However trapping in context of ECN-marked packets is not
suitable, because the HW does not drop the packet, as the trap action
implies. And there is as of now no way to express only the part of trapping
that transfers the packet to the SW datapath, sans the HW-datapath drop.

The patchset progresses as follows:

Patch #1 is an extack propagation.

Mirroring of ECN-marked packets is configured in the ASIC through an ECN
trigger, which is considered "egress", unlike the EARLY_DROP trigger.
In patch #2, add a helper to classify triggers as ingress.

As clarified above, traps cannot be offloaded on mark qevent. Similarly,
given a trap_fwd action, it would not be offloadable on early_drop qevent.
In patch #3, introduce support for tracking actions permissible on a given
block.

Patch #4 actually adds the mark qevent offload.

In patch #5, fix a small style issue in one of the selftests, and in
patch #6 add mark offload selftests.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: mlxsw: RED: Add selftests for the mark qevent

Add do_mark_test(), which is to do_ecn_test() like do_drop_test() is to
do_red_test(): meant to test that actions on the RED mark qevent block are
offloaded, and executed on ECN-marked packets.

The test splits install_qdisc() into its constituents, install_root_qdisc()
and install_qdisc_tcX(). This is in order to test that when mirroring is
enabled on one TC, the other TC does not mirror.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: mlxsw: sch_red_core: Drop two unused variables

These variables are cut'n'pasted from other functions in the file and not
actually used.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_qdisc: Offload RED qevent mark

The RED "mark" qevent can be offloaded under similar conditions as the RED
"early_drop" qevent. Therefore recognize its binding type in the
TC_SETUP_BLOCK handler and translate to the right SPAN trigger, with the
right set of supported actions.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_qdisc: Track permissible actions per binding

One block can be bound to several qevents. The qevent type that the block
is bound to determines which actions make sense in a given context. In the
particular case of mlxsw, trap cannot be offloaded on a RED mark qevent,
because the trap contract specifies that the packet is dropped in the HW
datapath, and the HW trigger that the action is offloaded to is always
forwarding the packet (in addition to marking in).

Therefore keep track of which actions are permissible at each binding
block. When an attempt is made to bind a certain action at a binding point
where it is not supported, bounce the request.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_qdisc: Distinguish between ingress and egress triggers

The following patches will configure the MLXSW_SP_SPAN_TRIGGER_ECN
mirroring trigger. This trigger is considered "egress", unlike the
previously-offloaded _EARLY_DROP. Add a helper to spectrum_span,
mlxsw_sp_span_trigger_is_ingress(), to classify triggers to ingress and
egress. Pass result of this instead of hardcoding true when calling
mlxsw_sp_span_analyzed_port_get()/_put().

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_qdisc: Pass extack to mlxsw_sp_qevent_entry_configure()

This function will report a new failure in the following patches.
Pass extack so that the failure is explicable.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'nfc-minor-printk-cleanup'

Krzysztof Kozlowski says:

====================
nfc: minor printk cleanup

v2: Correct SPDX license in patch 2/7 (as Joe pointed out).
v1: Remove unused variable in pn533 (reported by kbuild).
====================

Link: https://lore.kernel.org/r/20211011133835.236347-1-krzysztof.kozlowski@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

nfc: microread: drop unneeded debug prints

ftrace is a preferred and standard way to debug entering and exiting
functions so drop useless debug prints.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

nfc: trf7970a: drop unneeded debug prints

ftrace is a preferred and standard way to debug entering and exiting
functions so drop useless debug prints.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Acked-by: Mark Greer <mgreer@animalcreek.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

nfc: st21nfca: drop unneeded debug prints

ftrace is a preferred and standard way to debug entering and exiting
functions so drop useless debug prints.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

nfc: st-nci: drop unneeded debug prints

ftrace is a preferred and standard way to debug entering and exiting
functions so drop useless debug prints.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

nfc: s3fwrn5: simplify dereferencing pointer to struct device

Simplify the code dereferencing several pointers to reach the struct
device.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

nfc: nci: replace GPLv2 boilerplate with SPDX

Replace standard GPLv2 license text with SPDX tag. Although the comment
mentions GPLv2-only, it refers to the full license file which allows
later GPL versions.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

nfc: drop unneeded debug prints

ftrace is a preferred and standard way to debug entering and exiting
functions so drop useless debug prints.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/nex
t-queue

Tony Nguyen says:

====================
100GbE Intel Wired LAN Driver Updates 2021-10-11

Wojciech Drewek says:

This series adds support for adding/removing advanced switch filters
in ice driver. Advanced filters are building blocks for HW acceleration
of TC orchestration. Add ndo_setup_tc callback implementation for PF and
VF port representors (when device is configured in switchdev mode).

Define dummy packet headers to allow adding advanced rules in HW.
Supported headers, and thus filters, are:
- MAC + IPv4 + UDP
- MAC + VLAN + IPv4 + UDP
- MAC + IPv4 + TCP
- MAC + VLAN + IPv4 + TCP
- MAC + IPv6 + UDP
- MAC + VLAN + IPv6 + UDP
- MAC + IPv6 + TCP
- MAC + VLAN + IPv6 + TCP
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'gve-improvements'

Jeroen de Borst says:

====================
gve: minor code and performance improvements

This patchset contains a number of independent minor code and performance
improvements.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

gve: Track RX buffer allocation failures

The rx_buf_alloc_fail counter wasn't getting updated.

Fixes: 433e274b8f7b0 ("gve: Add stats for gve.")
Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

gve: Allow pageflips on larger pages

Half pages are just used for small enough packets. This change allows
this to also apply for systems with pages larger than 4 KB.

Fixes: 02b0e0c18ba75 ("gve: Rx Buffer Recycling")
Signed-off-by: Jordan Kim <jrkim@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

gve: Add netif_set_xps_queue call

Configure XPS when adding tx queues to the notification blocks.

Fixes: dbdaa67540512 ("gve: Move some static functions to a common file")
Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David Awogbemila <awogbemila@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

gve: Recover from queue stall due to missed IRQ

Don't always reset the driver on a TX timeout. Attempt to
recover by kicking the queue in case an IRQ was missed.

Fixes: 9e5f7d26a4c08 ("gve: Add workqueue and reset support")
Signed-off-by: John Fraker <jfraker@google.com>
Signed-off-by: David Awogbemila <awogbemila@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

gve: Do lazy cleanup in TX path

When TX queue is full, attemt to process enough TX completions
to avoid stalling the queue.

Fixes: f5cedc84a30d2 ("gve: Add transmit and receive support")
Signed-off-by: Tao Liu <xliutaox@google.com>
Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

gve: Add rx buffer pagecnt bias

Add a pagecnt bias field to rx buffer info struct to eliminate
needing to increment the atomic page ref count on every pass in the
rx hotpath.

Also prefetch two packet pages ahead.

Fixes: ede3fcf5ec67f ("gve: Add support for raw addressing to the rx path")
Signed-off-by: Yanchun Fu <yangchun@google.com>
Signed-off-by: Nathan Lewis <npl@google.com>
Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David Awogbemila <awogbemila@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

gve: Switch to use napi_complete_done

Use napi_complete_done to allow for the use of gro_flush_timeout.

Fixes: f5cedc84a30d2 ("gve: Add transmit and receive support")
Signed-off-by: Yangchun Fu <yangchun@google.com>
Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David Awogbemila <awogbemila@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ice: ndo_setup_tc implementation for PR

Add tc-flower support for VF port representor devices.

Implement ndo_setup_tc callback for TC HW offload on VF port representors
devices. Implemented both methods: add and delete tc-flower flows.

Mark NETIF_F_HW_TC bit in net device's feature set to enable offload TC
infrastructure for port representor.

Implement TC filters replay function required to restore filters settings
while switchdev configuration is rebuilt.

Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: ndo_setup_tc implementation for PF

Implement ndo_setup_tc net device callback for TC HW offload on PF device.

ndo_setup_tc provides support for HW offloading various TC filters.
Add support for configuring the following filter with tc-flower:
- default L2 filters (src/dst mac addresses, ethertype, VLAN)
- variations of L3, L3+L4, L2+L3+L4 filters using advanced filters
(including ipv4 and ipv6 addresses).

Allow for adding/removing TC flows when PF device is configured in
eswitch switchdev mode. Two types of actions are supported at the
moment: FLOW_ACTION_DROP and FLOW_ACTION_REDIRECT.

Co-developed-by: Priyalee Kushwaha <priyalee.kushwaha@intel.com>
Signed-off-by: Priyalee Kushwaha <priyalee.kushwaha@intel.com>
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Allow changing lan_en and lb_en on all kinds of filters

There is no way to change default lan_en and lb_en flags while
adding new rule. Add function that allows changing these flags
on rule determined by rule id and recipe id.

Function checks if the rule is presented on regular rules list or
advance rules list and call the appropriate function to update
rule entry.

As rules with ICE_SW_LKUP_DFLT recipe aren't tracked in a list,
implement function which updates flags without searching for rules
based only on rule id.

Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: cleanup rules info

Change ICE_SW_LKUP_LAST to ICE_MAX_NUM_RECIPES as for now there also can
be recipes other than the default.

Free all structures created for advanced recipes in cleanup function.
Write a function to clean allocated structures on advanced rule info.

Signed-off-by: Victor Raj <victor.raj@intel.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: allow deleting advanced rules

To remove advanced rule the same protocols list like in adding should be
send to function. Based on this information list of advanced rules is
searched to find the correct rule id.

Remove advanced rule if it forwards to only one VSI. If it forwards
to list of VSI remove only input VSI from this list.

Introduce function to remove rule by id. It is used in case rule needs to
be removed even if it forwards to the list of VSI.

Allow removing all advanced rules from a particular VSI. It is useful in
rebuilding VSI path.

Co-developed-by: Dan Nowlin <dan.nowlin@intel.com>
Signed-off-by: Dan Nowlin <dan.nowlin@intel.com>
Signed-off-by: Shivanshu Shukla <shivanshu.shukla@intel.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: allow adding advanced rules

Define dummy packet headers to allow adding advanced rules in HW. This
header is used as admin queue command parameter for adding a rule.
The firmware will extract correct fields and will use them in look ups.

Define each supported packets header and offsets to words used in recipe.
Supported headers:
- MAC + IPv4 + UDP
- MAC + VLAN + IPv4 + UDP
- MAC + IPv4 + TCP
- MAC + VLAN + IPv4 + TCP
- MAC + IPv6 + UDP
- MAC + VLAN + IPv6 + UDP
- MAC + IPv6 + TCP
- MAC + VLAN + IPv6 + TCP

Add code for creating an advanced rule. Rule needs to match defined
dummy packet, if not return error, which means that this type of rule
isn't currently supported.

The first step in adding advanced rule is searching for an advanced
recipe matching this kind of rule. If it doesn't exist new recipe is
created. Dummy packet has to be filled with the correct header field
value from the rule definition. It will be used to do look up in HW.

Support searching for existing advance rule entry. It is used in case
of adding the same rule on different VSI. In this case, instead of
creating new rule, the existing one should be updated with refreshed VSI
list.

Add initialization for prof_res_bm_init flag to zero so that
the possible resource for fv in the files can be initialized.

Co-developed-by: Dan Nowlin <dan.nowlin@intel.com>
Signed-off-by: Dan Nowlin <dan.nowlin@intel.com>
Signed-off-by: Grishma Kotecha <grishma.kotecha@intel.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: create advanced switch recipe

These changes introduce code for creating advanced recipes for the
switch in hardware.

There are a couple of recipes already defined in the HW. They apply to
matching on basic protocol headers, like MAC, VLAN, MACVLAN,
ethertype or direction (promiscuous), etc.. If the user wants to match on
other protocol headers (eg. ip address, src/dst port etc.) or different
variation of already supported protocols, there is a need to create
new, more complex recipe. That new recipe is referred as
'advanced recipe', and the filtering rule created on top of that recipe
is called 'advanced rule'.

One recipe can have up to 5 words, but the first word is always reserved
for match on switch id, so the driver can define up to 4 words for one
recipe. To support recipes with more words up to 5 recipes can be
chained, so 20 words can be programmed for look up.

Input for adding recipe function is a list of protocols to support. Based
on this list correct profile is being chosen. Correct profile means
that it contains all protocol types from a list. Each profile have up to
48 field vector words and each of this word have protocol id and offset.
These two fields need to match with input data for adding recipe
function. If the correct profile can't be found the function returns an
error.

The next step after finding the correct profile is grouping words into
groups. One group can have up to 4 words. This is done to simplify
sending recipes to HW (because recipe also can have up to 4 words).

In case of chaining (so when look up consists of more than 4 words) last
recipe will always have results from the previous recipes used as words.

A recipe to profile map is used to store information about which profile
is associate with this recipe. This map is an array of 64 elements (max
number of recipes) and each element is a 256 bits bitmap (max number of
profiles)

Profile to recipe map is used to store information about which recipe is
associate with this profile. This map is an array of 256 elements (max
number of profiles) and each element is a 64 bits bitmap (max number of
recipes)

Signed-off-by: Dan Nowlin <dan.nowlin@intel.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: manage profiles and field vectors

Implement functions to manage profiles and field vectors in hardware.

In hardware, there are up to 256 profiles and each of these profiles can
have 48 field vector words. Each field vector word is described by
protocol id and offset in the packet. To add a new recipe all used
profiles need to be searched. If the profile contains all required
protocol ids and offsets from the recipe it can be used. The driver has
to add this profile to recipe association to tell hardware that newly
added recipe is going to be associated with this profile.

The amount of used profiles depend on the package. To avoid searching
across not used profile, max profile id value is calculated at init flow.
The profile is considered as unused when all field vector words in the
profile are invalid (protocol id 0xff and offset 0x1ff).

Profiles are read from the package section ICE_SID_FLD_VEC_SW. Empty
field vector words can be used for recipe results. Store all unused field
vector words in prof_res_bm. It is a 256 elements array (max number of
profiles) each element is a 48 bit bitmap (max number of field vector
words).

For now, support only non-tunnel profiles type.

Co-developed-by: Grishma Kotecha <grishma.kotecha@intel.com>
Signed-off-by: Grishma Kotecha <grishma.kotecha@intel.com>
Signed-off-by: Dan Nowlin <dan.nowlin@intel.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: implement low level recipes functions

Add code to manage recipes and profiles on admin queue layer.

Allow the driver to add a new recipe and update an existing one. Get a
recipe and get a recipe to profile association is mostly used in update
existing recipes code.

Only default recipes can be updated. An update is done by reading recipes
from HW, changing their params and calling add recipe command.

Support following admin queue commands:
- ice_aqc_opc_add_recipe (0x0290) - create a recipe with protocol
header information and other details that determine how this recipe
filter works
- ice_aqc_opc_recipe_to_profile (0x0291) - associate a switch recipe
to a profile
- ice_aqc_opc_get_recipe (0x0292) - get details of an existing recipe
- ice_aqc_opc_get_recipe_to_profile (0x0293) - get a recipe associated
with profile ID

Define ICE_AQC_RES_TYPE_RECIPE resource type to hold a switch
recipe. It is needed when a new switch recipe needs to be created.

Co-developed-by: Dan Nowlin <dan.nowlin@intel.com>
Signed-off-by: Dan Nowlin <dan.nowlin@intel.com>
Signed-off-by: Grishma Kotecha <grishma.kotecha@intel.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ethernet: sun: add missing semicolon, fix build

Fix for this build problem:

drivers/net/ethernet/sun/ldmvsw.c: In function 'vsw_alloc_netdev':
drivers/net/ethernet/sun/ldmvsw.c:243:2: error: expected ';' before 'sprintf'
sprintf(dev->name, "vif%d.%d", (int)handle, (int)port_id);
^~~~~~~

Fixes: a7639279c93c ("ethernet: sun: remove direct netdev->dev_addr writes")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Link: https://lore.kernel.org/r/20211011173424.7743035d@canb.auug.org.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

octeontx2-pf: Simplify the receive buffer size calculation

This patch separates the logic of configuring hardware
maximum transmit frame size and receive frame size.
This simplifies the logic to calculate receive buffer
size and using cqe descriptor of different size.
Also additional size of skb_shared_info structure is
allocated for each receive buffer pointer given to
hardware which is not necessary. Hence change the
size calculation to remove the size of
skb_shared_info. Add a check for array out of
bounds while adding fragments to the network stack.

Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: Remove redundant 'flush_workqueue()' calls

'destroy_workqueue()' already drains the queue before destroying it, so
there is no need to flush it explicitly.

Remove the redundant 'flush_workqueue()' calls.

This was generated with coccinelle:

@@
expression E;
@@
- flush_workqueue(E);
destroy_workqueue(E);

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com> #mlx*
Signed-off-by: David S. Miller <davem@davemloft.net>

virtio_net: skip RCU read lock by checking xdp_enabled of vi

networking benchmark shows that __rcu_read_lock and
__rcu_read_unlock takes some cpu cycles, and we can avoid
calling them partially in virtio rx path by check xdp_enabled
of vi, and xdp is disabled most of time

Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: make dev_get_port_parent_id slightly more readable

Cosmetic commit making dev_get_port_parent_id slightly more readable.
There is no need to split the condition to return after calling
devlink_compat_switch_id_get and after that 'recurse' is always true.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: at803x: better describe debug regs

Give a name to known debug regs from Documentation instead of using
unknown hex values.

Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: at803x: enable prefer master for 83xx internal phy

From original QCA source code the port was set to prefer master as port
type in 1000BASE-T mode. Apply the same settings also here.

Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: at803x: add DAC amplitude fix for 8327 phy

QCA8327 internal phy require DAC amplitude adjustement set to +6% with
100m speed. Also add additional define to report a change of the same
reg in QCA8337. (different scope it does set 1000m voltage)
Add link_change_notify function to set the proper amplitude adjustement
on PHY_RUNNING state and disable on any other state.

Fixes: b4df02b562f4 ("net: phy: at803x: add support for qca 8327 A variant internal phy")
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: at803x: fix resume for QCA8327 phy

From Documentation phy resume triggers phy reset and restart
auto-negotiation. Add a dedicated function to wait reset to finish as
it was notice a regression where port sometime are not reliable after a
suspend/resume session. The reset wait logic is copied from phy_poll_reset.
Add dedicated suspend function to use genphy_suspend only with QCA8337
phy and set only additional debug settings for QCA8327. With more test
it was reported that QCA8327 doesn't proprely support this mode and
using this cause the unreliability of the switch ports, especially the
malfunction of the port0.

Fixes: 15b9df4ece17 ("net: phy: at803x: add resume/suspend function to qca83xx phy")
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-use-helpers'

Juhee Kang says:

====================
net-next: replace open code with helper functions

Currently, there are many helper functions on netdevice.h. However,
some code doesn't use the helper functions and remains open code.
So this patchset replaces open code with an appropriate helper function.

First patch modifies to use netif_is_rxfh_configured instead of
dev->priv_flags & IFF_RXFH_CONFIGURED.
Second patch replaces open code with netif_is_bond_master.
Last patch substitutes netif_is_macsec() for dev->priv_flags & IFF_MACSEC.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: use netif_is_macsec() instead of open code

Open code which is dev->priv_flags & IFF_MACSEC has already defined as
netif_is_macsec(). So use netif_is_macsec() instead of open code.
This patch doesn't change logic.

Signed-off-by: Juhee Kang <claudiajkang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

hv_netvsc: use netif_is_bond_master() instead of open code

Use netif_is_bond_master() function instead of open code, which is
((event_dev->priv_flags & IFF_BONDING) && (event_dev->flags & IFF_MASTER)).
This patch doesn't change logic.

Signed-off-by: Juhee Kang <claudiajkang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt: use netif_is_rxfh_configured instead of open code

The open code which is dev->priv_flags & IFF_RXFH_CONFIGURED is defined as
a helper function on netdevice.h. So use netif_is_rxfh_configured()
function instead of open code. This patch doesn't change logic.

Signed-off-by: Juhee Kang <claudiajkang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ionic-vlanid-mgmt'

Shannon Nelson says:

====================
ionic: add vlanid overflow management

Add vlans to the existing rx_filter_sync mechanics currently
used for managing mac filters.

Older versions of our firmware had no enforced limits on the
number of vlans that the driver could request, but requesting
large numbers of vlans caused issues in FW memory management,
so an arbitrary limit was added in the FW.  The FW now
returns -ENOSPC when it hits that limit, which the driver
needs to handle.

Unfortunately, the FW doesn't advertise the vlan id limit,
as it does with mac filters, so the driver won't know the
limit until it bumps into it.  We'll grab the current vlan id
count and use that as the limit from there on and thus prevent
getting any more -ENOSPC errors.

Just as is done for the mac filters, the device puts the device
into promiscuous mode when -ENOSPC is seen for vlan ids, and
the driver will track the vlans that aren't synced to the FW.
When vlans are removed, the driver will retry the un-synced
vlans.  If all outstanding vlans are synced, the promiscuous
mode will be disabled.

The first 6 patches rework the existing filter management to
make it flexible enough for additional filter types.  Next
we add the vlan ids into the management.  The last 2 patches
allow us to catch the max vlan -ENOSPC error without adding
an unnecessary error message to the kernel log.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ionic: tame the filter no space message

Override the automatic AdminQ error message in order to
capture the potential No Space message when we hit the
max vlan limit, and add additional messaging to detail
what filter failed.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

ionic: allow adminq requests to override default error message

The AdminQ handler has an error handler that automatically prints
an error message when the request has failed. However, there are
situations where the caller can expect that it might fail and has
an alternative strategy, thus may not want the error message sent
to the log, such as hitting -ENOSPC when adding a new vlan id.

We add a new interface to the AdminQ API to allow for override of
the default behavior, and an interface to the use standard error
message formatting.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

ionic: handle vlan id overflow

Add vlans to the existing rx_filter_sync mechanics currently
used for managing mac filters.

Older versions of our firmware had no enforced limits on the
number of vlans that the LIF could request, but requesting large
numbers of vlans caused issues in FW memory management, so an
arbitrary limit was added in the FW.  The FW now returns -ENOSPC
when it hits that limit, which the driver needs to handle.

Unfortunately, the FW doesn't advertise the vlan id limit,
as it does with mac filters, so the driver won't know the
limit until it bumps into it.  We'll grab the current vlan id
count and use that as the limit from there on and thus prevent
getting any more -ENOSPC errors.

Just as is done for the mac filters, the device puts the device
into promiscuous mode when -ENOSPC is seen for vlan ids, and
the driver will track the vlans that aren't synced to the FW.
When vlans are removed, the driver will retry the un-synced
vlans.  If all outstanding vlans are synced, the promiscuous
mode will be disabled.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

ionic: generic filter delete

Similar to the filter add, make a generic filter delete.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

ionic: generic filter add

In preparation for adding vlan overflow management, rework
the ionic_lif_addr_add() function to something a little more
generic that can be used for other filter types.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

ionic: add generic filter search

In preparation for enhancing vlan filter management,
add a filter search routine that can figure out for
itself which type of filter search is needed.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

ionic: remove mac overflow flags

The overflow flags really aren't useful and we don't need lif
struct elements to track them.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

ionic: move lif mac address functions

The routines that add and delete mac addresses from the
firmware really should be in the file with the rest of
the filter management. This simply moves the functions
with no logic changes.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

ionic: add filterlist to debugfs

Dump the filter list to debugfs - includes the device-assigned
filter id and the sync'd-to-hardware status.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: net: dsa: document felix family in dsa-tag-protocol

The Vitesse/Microsemi/Microchip family of switches supported by the
Felix DSA driver is capable of selecting between the native tagging
protocol and ocelot-8021q. This is necessary to enable flow control on
the CPU port.

Certain systems where these switches are integrated use the switch as a
port multiplexer, so the termination throughput is paramount. Changing
the tagging protocol at runtime is possible for these systems, but since
it is known beforehand that one tagging protocol will provide strictly
better performance than the other, just allow them to specify the
preference in the device tree.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: net: dsa: fix typo in dsa-tag-protocol description

There is a trivial typo when spelling "protocol", fix it.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: use dev_addr_set()

Use dev_addr_set() instead of writing directly to netdev->dev_addr
in various misc and old drivers.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'dev_addr-direct-writes'

Jakub Kicinski says:

====================
net: remove direct netdev->dev_addr writes

Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.

This series contains top 5 conversions in terms of LoC required
to bring the driver into compliance.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: 8390: remove direct netdev->dev_addr writes

8390 contains a lot of loops assigning netdev->dev_addr
byte by byte. Convert what's possible directly to
eth_hw_addr_set(), use local buf in other places.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: sun: remove direct netdev->dev_addr writes

Consify temporary variables pointing to netdev->dev_addr.

A few places need local storage but pretty simple conversion
over all. Note that macaddr[] is an array of ints, so we need
to keep the loops.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: tulip: remove direct netdev->dev_addr writes

Consify the casts of netdev->dev_addr.

Convert pointless to eth_hw_addr_set() where possible.

Use local buffers in a number of places.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: tg3: remove direct netdev->dev_addr writes

tg3 does various forms of direct writes to netdev->dev_addr.
Use a local buffer. Make sure local buffer is aligned since
eth_platform_get_mac_address() may call ether_addr_copy().

tg3_get_device_address() returns whenever it finds a method
that found a valid address. Instead of modifying all the exit
points pass the buffer from the outside and commit the address
in the caller.

Constify the argument of the set addr helper.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: forcedeth: remove direct netdev->dev_addr writes

forcedeth writes to dev_addr byte by byte, make it use
a local buffer instead. Commit the changes with
eth_hw_addr_set() at the end.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: item: Annotate item helpers with '__maybe_unused'

mlxsw is using helpers to get / set fields in messages exchanged with
the device. It is possible that some fields are only set or only get.
This causes LLVM to emit warnings such as the following when building
with W=1 [1]:

drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_actions.c:2022:1: warning: unused function 'mlxsw_afa_sampler_mirror_agent_get'

The fact that some fields are only set or only get is very much
intentional and not indicative of functions that need to be removed.

Therefore, annotate the item helpers with '__maybe_unused' to suppress
these warnings.

[1] https://lkml.org/lkml/2021/9/29/685

Cc: Nathan Chancellor <nathan@kernel.org>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20211008132315.90211-1-idosch@idosch.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/tls: add SM4 GCM/CCM to tls selftests

Add new cipher as a variant of standard tls selftests.

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Link: https://lore.kernel.org/r/20211008091745.42917-1-tianjia.zhang@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: tg3: fix redundant check of true expression

Remove the redundant check of (tg3_asic_rev(tp) == ASIC_REV_5705) after
it is checked to be true.

Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20211008063147.1421-1-sakiwit@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

qed: Fix compilation for CONFIG_QED_SRIOV undefined scenario

This patch fixes below compliation error in case CONFIG_QED_SRIOV not
defined.
drivers/net/ethernet/qlogic/qed/qed_dev.c: In function
‘qed_fw_err_handler’:
drivers/net/ethernet/qlogic/qed/qed_dev.c:2390:3: error: implicit
declaration of function ‘qed_sriov_vfpf_malicious’; did you mean
‘qed_iov_vf_task’? [-Werror=implicit-function-declaration]
   qed_sriov_vfpf_malicious(p_hwfn, &data->err_data);
   ^~~~~~~~~~~~~~~~~~~~~~~~
   qed_iov_vf_task
drivers/net/ethernet/qlogic/qed/qed_dev.c: In function
‘qed_common_eqe_event’:
drivers/net/ethernet/qlogic/qed/qed_dev.c:2410:10: error: implicit
declaration of function ‘qed_sriov_eqe_event’; did you mean
‘qed_common_eqe_event’? [-Werror=implicit-function-declaration]
   return qed_sriov_eqe_event(p_hwfn, opcode, echo, data,
          ^~~~~~~~~~~~~~~~~~~
          qed_common_eqe_event

Fixes: fe40a830dcde ("qed: Update qed_hsi.h for fw 8.59.1.0")
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: micrel: ksz9131 led errata workaround

Micrel KSZ9131 PHY LED behavior is not correct when configured in
Individual Mode, LED1 (Activity LED) is in the ON state when there is
no-link.

Workaround this by setting bit 9 of register 0x1e after verifying that
the LED configuration is Individual Mode.

This issue is described in KSZ9131RNX Silicon Errata DS80000693B [*]
and according to that it will not be corrected in a future silicon
revision.

[*] https://ww1.microchip.com/downloads/en/DeviceDoc/KSZ9131RNX-Silicon-Errata-and-Data-Sheet-Clarification-80000863B.pdf

Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Reviewed-by: Philippe Schenker <philippe.schenker@toradex.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'netdev-name-in-use'

Antoine Tenart says:

====================
net: introduce a function to check if a netdev name is in use

This was initially part of an RFC series[1] but has value on its own;
hence the standalone report. (It will also help in not having a series
too large).

From patch 1:

"""
__dev_get_by_name is currently used to either retrieve a net device
reference using its name or to check if a name is already used by a
registered net device (per ns). In the later case there is no need to
return a reference to a net device.

Introduce a new helper, netdev_name_in_use, to check if a name is
currently used by a registered net device without leaking a reference
the corresponding net device. This helper uses netdev_name_node_lookup
instead of __dev_get_by_name as we don't need the extra logic retrieving
a reference to the corresponding net device.
"""

Two uses[2] of __dev_get_by_name weren't converted to this new function,
as they are really looking for a net device, not only checking if a net
device name is in use. While checking one or the other currently has
the same result, that might change if the initial RFC series moves
forward. I'll convert them later depending on the outcome of the initial
series.

Thanks,
Antoine

[1] https://lore.kernel.org/all/20210928125500.167943-1-atenart@kernel.org/
[2] drivers/net/Space.c:130 & drivers/nvme/host/tcp.c:2550
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ppp: use the correct function to check if a netdev name is in use

A new helper to detect if a net device name is in use was added. Use it
here as the return reference from __dev_get_by_name was discarded.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: use the correct function to check for netdev name collision

A new helper to detect if a net device name is in use was added. Use it
here as the return reference from __dev_get_by_name was discarded.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: introduce a function to check if a netdev name is in use

__dev_get_by_name is currently used to either retrieve a net device
reference using its name or to check if a name is already used by a
registered net device (per ns). In the later case there is no need to
return a reference to a net device.

Introduce a new helper, netdev_name_in_use, to check if a name is
currently used by a registered net device without leaking a reference
the corresponding net device. This helper uses netdev_name_node_lookup
instead of __dev_get_by_name as we don't need the extra logic retrieving
a reference to the corresponding net device.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'enetc-swtso'

Ioana Ciornei says:

====================
net: enetc: add support for software TSO

This series adds support for driver level TSO in the enetc driver.

Ever since the ENETC MDIO erratum workaround is in place, the Tx path is
incurring a penalty (enetc_lock_mdio/enetc_unlock_mdio) for each skb to
be sent out. On top of this, ENETC does not support Tx checksum
offloading. This means that software TSO would help performance just by
the fact that one single mdio lock/unlock sequence would cover multiple
packets sent. On the other hand, checksum needs to be computed in
software since the controller cannot handle it.

This is why, beside using the usual tso_build_hdr()/tso_build_data()
this specific implementation also has to compute the checksum, both IP
and L4, for each resulted segment.

Even with that, the performance improvement of a TCP flow running on a
single A72@1.3GHz of the LS1028A SoC (2.5Gbit/s port) is the following:

before: 1.63 Gbits/sec
after: 2.34 Gbits/sec

Changes in v2:
- declare NETIF_F_HW_CSUM instead of NETIF_F_IP_CSUM in 1/2
- add support for TSO over IPv6 (NETIF_F_TSO6 and csum compute) in 2/2
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: enetc: add support for software TSO

This patch adds support for driver level TSO in the enetc driver using
the TSO API.

Beside using the usual tso_build_hdr(), tso_build_data() this specific
implementation also has to compute the checksum, both IP and L4, for
each resulted segment. This is because the ENETC controller does not
support Tx checksum offload which is needed in order to take advantage
of TSO.

With the workaround for the ENETC MDIO erratum in place the Tx path of
the driver is forced to lock/unlock for each skb sent. This is why, even
though we are computing the checksum by hand we see the following
improvement in TCP termination on the LS1028A SoC, on a single A72 core
running at 1.3GHz:

before: 1.63 Gbits/sec
after: 2.34 Gbits/sec

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: enetc: declare NETIF_F_HW_CSUM and do it in software

This is just a preparation patch for software TSO in the enetc driver.
Unfortunately, ENETC does not support Tx checksum offload which would
normally render TSO, even software, impossible.

Declare NETIF_F_HW_CSUM as part of the feature set and do it at driver
level using skb_csum_hwoffload_help() so that we can move forward and
also add support for TSO in the next patch.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ip6gre-tests'

Ido Schimmel says:

====================
selftests: forwarding: Add ip6gre tests

This patchset adds forwarding selftests for ip6gre. The tests can be run
with veth pairs or with physical loopbacks.

Patch #1 adds a new config option to determine if 'skip_sw' / 'skip_hw'
flags are used when installing tc filters. By default, it is not set
which means the flags are not used. 'skip_sw' is useful to ensure
traffic is forwarded by the hardware data path.

Patch #2 adds a new helper function.

Patches #3-#4 add the forwarding selftests.

Patch #5 adds a mlxsw-specific selftest to validate correct behavior of
the 'decap_error' trap with IPv6 underlay.

Patches #6-#8 align the corresponding IPv4 underlay test to the IPv6
one.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: mlxsw: devlink_trap_tunnel_ipip: Send a full-length key

As part of adding same test for GRE tunnel with IPv6 underlay, missing
bytes for key were found.

mausezahn does not fill zeros between two colons, so send them
explicitly. For example, use "00:00:00:E9:" instead of ":E9:"

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: mlxsw: devlink_trap_tunnel_ipip: Remove code duplication

As part of adding same test for GRE tunnel with IPv6 underlay, an
optional improvement was found - call ipip_payload_get from
ecn_payload_get, so do not duplicate the code which creates the payload.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: mlxsw: devlink_trap_tunnel_ipip: Align topology drawing correctly

As part of adding same test for GRE tunnel with IPv6 underlay, wrong
alignments were found, fix them.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: mlxsw: devlink_trap_tunnel_ipip6: Add test case for IPv6 decap_error

IPv6 underlay support was added, add test to check that "decap_error" trap
is triggered under the right conditions and that devlink counters increase.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: Add IPv6 GRE hierarchical tests

Add tests that check IPv6-in-IPv6, IPv4-in-IPv6 and MTU change of GRE
tunnel. The tests use hierarchical model - the tunnel is bound to a device
in a different VRF.

These tests can be run with TC_FLAG=skip_sw, so then they will verify
that packets go through hardware as part of enacp and decap phases.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: Add IPv6 GRE flat tests

Add tests that check IPv6-in-IPv6, IPv4-in-IPv6 and MTU change of GRE
tunnel. The tests use flat model - overlay and underlay share the same VRF.

These tests can be run with TC_FLAG=skip_sw, so then they will verify
that packets go through hardware as part of enacp and decap phases.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

testing: selftests: tc_common: Add tc_check_at_least_x_packets()

Add function that checks that at least X packets hit the tc rule.
There are cases that it is not possible to catch only the interesting
packets, so then, it is possible to send many packets and verify that at
least this amount of packets hit the rule.

This function will be used in the next patch for general tc rule that
can be used to test both software and hardware.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

testing: selftests: forwarding.config.sample: Add tc flag

Add TC_FLAG value to tests topology.
This flag supposed to be skip_sw/skip_hw which means do not filter by
software/hardware.

This can be useful for adding tests to forwarding directory, and be able
to verify that packets go through the hardware.

When the flag is not set or set to 'skip_hw', tests can still be executed
with veth pairs.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vsock: Enable y2038 safe timeval for timeout

Reuse the timeval compat code from core/sock to handle 32-bit and
64-bit timeval structures. Also introduce a new socket option define
to allow using y2038 safe timeval under 32-bit.

The existing behavior of sock_set_timeout and vsock's timeout setter
differ when the time value is out of bounds. vsocks current behavior
is retained at the expense of not being able to share the full
implementation.

This allows the LTP test vsock01 to pass under 32-bit compat mode.

Fixes: fe0c72f3db11 ("socket: move compat timeout handling into sock.c")
Signed-off-by: Richard Palethorpe <rpalethorpe@suse.com>
Cc: Richard Palethorpe <rpalethorpe@richiejp.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

vsock: Refactor vsock_*_getsockopt to resemble sock_getsockopt

In preparation for sharing the implementation of sock_get_timeout.

Signed-off-by: Richard Palethorpe <rpalethorpe@suse.com>
Cc: Richard Palethorpe <rpalethorpe@richiejp.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

ath11k: fix m68k and xtensa build failure in ath11k_peer_assoc_h_smps()

Stephen reported that ath11k was failing to build on m68k and xtensa:

In file included from <command-line>:0:0:
In function 'ath11k_peer_assoc_h_smps',
    inlined from 'ath11k_peer_assoc_prepare' at drivers/net/wireless/ath/ath11k/mac.c:2362:2:
include/linux/compiler_types.h:317:38: error: call to '__compiletime_assert_650' declared with attribute error: FIELD_GET: type of reg too small for mask
  _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
                                      ^
include/linux/compiler_types.h:298:4: note: in definition of macro '__compiletime_assert'
    prefix ## suffix();    \
    ^
include/linux/compiler_types.h:317:2: note: in expansion of macro '_compiletime_assert'
  _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
  ^
include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
#define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
                                     ^
include/linux/bitfield.h:52:3: note: in expansion of macro 'BUILD_BUG_ON_MSG'
   BUILD_BUG_ON_MSG((_mask) > (typeof(_reg))~0ull,  \
   ^
include/linux/bitfield.h:108:3: note: in expansion of macro '__BF_FIELD_CHECK'
   __BF_FIELD_CHECK(_mask, _reg, 0U, "FIELD_GET: "); \
   ^
drivers/net/wireless/ath/ath11k/mac.c:2079:10: note: in expansion of macro 'FIELD_GET'
   smps = FIELD_GET(IEEE80211_HE_6GHZ_CAP_SM_PS,

Fix the issue by using le16_get_bits() to specify the size explicitly.

Fixes: 6f4d70308e5e ("ath11k: support SMPS configuration for 6 GHz")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net-sysfs: try not to restart the syscall if it will fail eventually

Due to deadlocks in the networking subsystem spotted 12 years ago[1],
a workaround was put in place[2] to avoid taking the rtnl lock when it
was not available and restarting the syscall (back to VFS, letting
userspace spin). The following construction is found a lot in the net
sysfs and sysctl code:

  if (!rtnl_trylock())
          return restart_syscall();

This can be problematic when multiple userspace threads use such
interfaces in a short period, making them to spin a lot. This happens
for example when adding and moving virtual interfaces: userspace
programs listening on events, such as systemd-udevd and NetworkManager,
do trigger actions reading files in sysfs. It gets worse when a lot of
virtual interfaces are created concurrently, say when creating
containers at boot time.

Returning early without hitting the above pattern when the syscall will
fail eventually does make things better. While it is not a fix for the
issue, it does ease things.

[1] https://lore.kernel.org/netdev/49A4D5D5.5090602@trash.net/
    https://lore.kernel.org/netdev/m14oyhis31.fsf@fess.ebiederm.org/
    and https://lore.kernel.org/netdev/20090226084924.16cb3e08@nehalam/
[2] Rightfully, those deadlocks are *hard* to solve.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phylib: ensure phy device drivers do not match by DT

PHYLIB device drivers must match by either numerical PHY ID or by their
.match_phy_device method. Matching by DT is not permitted.

Link: https://lore.kernel.org/r/2b1dc053-8c9a-e3e4-b450-eecdfca3fe16@gmail.com
Tested-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mdio: ensure the type of mdio devices match mdio drivers

On the MDIO bus, we have PHYLIB devices and drivers, and we have non-
PHYLIB devices and drivers. PHYLIB devices are MDIO devices that are
wrapped with a struct phy_device.

Trying to bind a MDIO device with a PHYLIB driver results in out-of-
bounds accesses as we attempt to access struct phy_device members. So,
let's prevent this by ensuring that the type of the MDIO device
(indicated by the MDIO_DEVICE_FLAG_PHY flag) matches the type of the
MDIO driver (indicated by the MDIO_DEVICE_IS_PHY flag.)

Link: https://lore.kernel.org/r/2b1dc053-8c9a-e3e4-b450-eecdfca3fe16@gmail.com
Tested-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>