OSDN Git Service

tomoyo/tomoyo-test1.git
2 years agoselftests: forwarding: add basic QoS classification test for Ocelot switches
Vladimir Oltean [Mon, 2 May 2022 15:54:24 +0000 (18:54 +0300)]
selftests: forwarding: add basic QoS classification test for Ocelot switches

Test basic (port-default, VLAN PCP and IP DSCP) QoS classification for
Ocelot switches. Advanced QoS classification using tc filters is covered
by tc_flower_chains.sh in the same directory.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220502155424.4098917-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch 'mptcp-userspace-path-manager-prerequisites'
Jakub Kicinski [Tue, 3 May 2022 23:54:58 +0000 (16:54 -0700)]
Merge branch 'mptcp-userspace-path-manager-prerequisites'

Mat Martineau says:

====================
mptcp: Userspace path manager prerequisites

This series builds upon the path manager mode selection changes merged
in 4994d4fa99ba ("Merge branch 'mptcp-path-manager-mode-selection'") to
further modify the path manager code in preparation for adding the new
netlink commands to announce/remove advertised addresses and
create/destroy subflows of an MPTCP connection. The third and final
patch series for the userspace path manager will implement those
commands as discussed in
https://lore.kernel.org/netdev/23ff3b49-2563-1874-fa35-3af55d3088e7@linux.intel.com/#r

Patches 1, 5, and 7 remove some internal constraints on path managers
(in general) without changing in-kernel PM behavior.

Patch 2 adds a self test to validate MPTCP address advertisement ack
behavior.

Patches 3, 4, and 6 add new attributes to existing MPTCP netlink events
and track internal state for populating those attributes.
====================

Link: https://lore.kernel.org/r/20220502205237.129297-1-mathew.j.martineau@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agomptcp: allow ADD_ADDR reissuance by userspace PMs
Kishen Maloor [Mon, 2 May 2022 20:52:37 +0000 (13:52 -0700)]
mptcp: allow ADD_ADDR reissuance by userspace PMs

This change allows userspace PM implementations to reissue ADD_ADDR
announcements (if necessary) based on their chosen policy.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agomptcp: expose server_side attribute in MPTCP netlink events
Kishen Maloor [Mon, 2 May 2022 20:52:36 +0000 (13:52 -0700)]
mptcp: expose server_side attribute in MPTCP netlink events

This change records the 'server_side' attribute of MPTCP_EVENT_CREATED
and MPTCP_EVENT_ESTABLISHED events to inform their recipient about the
Client/Server role of the running MPTCP application.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/246
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agomptcp: establish subflows from either end of connection
Kishen Maloor [Mon, 2 May 2022 20:52:35 +0000 (13:52 -0700)]
mptcp: establish subflows from either end of connection

This change updates internal logic to permit subflows to be
established from either the client or server ends of MPTCP
connections. This symmetry and added flexibility may be
harnessed by PM implementations running on either end in
creating new subflows.

The essence of this change lies in not relying on the
"server_side" flag (which continues to be available if needed).

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agomptcp: reflect remote port (not 0) in ANNOUNCED events
Kishen Maloor [Mon, 2 May 2022 20:52:34 +0000 (13:52 -0700)]
mptcp: reflect remote port (not 0) in ANNOUNCED events

Per RFC 8684, if no port is specified in an ADD_ADDR message, MPTCP
SHOULD attempt to connect to the specified address on the same port
as the port that is already in use by the subflow on which the
ADD_ADDR signal was sent.

To facilitate that, this change reflects the specific remote port in
use by that subflow in MPTCP_EVENT_ANNOUNCED events.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agomptcp: store remote id from MP_JOIN SYN/ACK in local ctx
Kishen Maloor [Mon, 2 May 2022 20:52:33 +0000 (13:52 -0700)]
mptcp: store remote id from MP_JOIN SYN/ACK in local ctx

This change reads the addr id assigned to the remote endpoint
of a subflow from the MP_JOIN SYN/ACK message and stores it
in the related subflow context. The remote id was not being
captured prior to this change, and will now provide a consistent
view of remote endpoints and their ids as seen through netlink
events.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoselftests: mptcp: ADD_ADDR echo test with missing userspace daemon
Mat Martineau [Mon, 2 May 2022 20:52:32 +0000 (13:52 -0700)]
selftests: mptcp: ADD_ADDR echo test with missing userspace daemon

Check userspace PM behavior to ensure ADD_ADDR echoes are only sent when
there is an active userspace daemon. If the daemon is restarting or
hasn't loaded yet, the missing echo will cause the peer to retransmit
the ADD_ADDR - and hopefully the daemon will be ready to receive it at
that later time.

Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agomptcp: bypass in-kernel PM restrictions for non-kernel PMs
Kishen Maloor [Mon, 2 May 2022 20:52:31 +0000 (13:52 -0700)]
mptcp: bypass in-kernel PM restrictions for non-kernel PMs

Current limits on the # of addresses/subflows must apply only to
in-kernel PM managed sockets. Thus this change removes such
restrictions on connections overseen by non-kernel (e.g. userspace)
PMs. This change also ensures that the kernel does not record stats
inside struct mptcp_pm_data updated along kernel code paths when exercised
via non-kernel PMs.

Additionally, address announcements are acknolwedged and subflow
requests are honored only when it's deemed that a userspace path
manager is active at the time.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge tag 'mlx5-updates-2022-05-02' of git://git.kernel.org/pub/scm/linux/kernel...
Paolo Abeni [Tue, 3 May 2022 10:43:41 +0000 (12:43 +0200)]
Merge tag 'mlx5-updates-2022-05-02' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2022-05-02

1) Trivial Misc updates to mlx5 driver

2) From Mark Bloch: Flow steering, general steering refactoring/cleaning

An issue with flow steering deletion flow (when creating a rule without
dests) turned out to be easy to fix but during the fix some issue
with the flow steering creation/deletion flows have been found.

The following patch series tries to fix long standing issues with flow
steering code and hopefully preventing silly future bugs.

  A) Fix an issue where a proper dest type wasn't assigned.
  B) Refactor and fix dests enums values, refactor deletion
     function and do proper bookkeeping of dests.
  C) Change mlx5_del_flow_rules() to delete rules when there are no
     no more rules attached associated with an FTE.
  D) Don't call hard coded deletion function but use the node's
     defined one.
  E) Add a WARN_ON() to catch future bugs when an FTE with dests
     is deleted.

* tag 'mlx5-updates-2022-05-02' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5: fs, an FTE should have no dests when deleted
  net/mlx5: fs, call the deletion function of the node
  net/mlx5: fs, delete the FTE when there are no rules attached to it
  net/mlx5: fs, do proper bookkeeping for forward destinations
  net/mlx5: fs, add unused destination type
  net/mlx5: fs, jump to exit point and don't fall through
  net/mlx5: fs, refactor software deletion rule
  net/mlx5: fs, split software and IFC flow destination definitions
  net/mlx5e: TC, set proper dest type
  net/mlx5e: Remove unused mlx5e_dcbnl_build_rep_netdev function
  net/mlx5e: Drop error CQE handling from the XSK RX handler
  net/mlx5: Print initializing field in case of timeout
  net/mlx5: Delete redundant default assignment of runtime devlink params
  net/mlx5: Remove useless kfree
  net/mlx5: use kvfree() for kvzalloc() in mlx5_ct_fs_smfs_matcher_create
====================

Link: https://lore.kernel.org/r/
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agoMerge branch 'mlxsw-remove-size-limitations-on-egress-descriptor-buffer'
Paolo Abeni [Tue, 3 May 2022 10:10:51 +0000 (12:10 +0200)]
Merge branch 'mlxsw-remove-size-limitations-on-egress-descriptor-buffer'

Ido Schimmel says:

====================
mlxsw: Remove size limitations on egress descriptor buffer

Petr says:

Spectrum machines have two resources related to keeping packets in an
internal buffer: bytes (allocated in cell-sized units) for packet payload,
and descriptors, for keeping headers. Currently, mlxsw only configures the
bytes part of the resource management.

Spectrum switches permit a full parallel configuration for the descriptor
resources, including port-pool and port-TC-pool quotas. By default, these
are all configured to use pool 14, with an infinite quota. The ingress pool
14 is then infinite in size.

However, egress pool 14 has finite size by default. The size is chip
dependent, but always much lower than what the chip actually permits. As a
result, we can easily construct workloads that exhaust the configured
descriptor limit.

Going forward, mlxsw will have to fix this issue properly by maintaining
descriptor buffer sizes, TC bindings, and quotas that match the
architecture recommendation. Short term, fix the issue by configuring the
egress descriptor pool to be infinite in size as well. This will maintain
the same configuration philosophy, but will unlock all chip resources to be
usable.

In this patchset, patch #1 first adds the "desc" field into the pool
configuration register. Then in patch #2, the new field is used to
configure both ingress and egress pool 14 as infinite.

In patches #3 and #4, add a selftest that verifies that a large burst
can be absorbed by the shared buffer. This test specifically exercises a
scenario where descriptor buffer is the limiting factor and the test
fails without the above patches.
====================

Link: https://lore.kernel.org/r/20220502084926.365268-1-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agoselftests: mlxsw: Add a test for soaking up a burst of traffic
Petr Machata [Mon, 2 May 2022 08:49:26 +0000 (11:49 +0300)]
selftests: mlxsw: Add a test for soaking up a burst of traffic

Add a test that sends 1Gbps of traffic through the switch, into which it
then injects a burst of traffic and tests that there are no drops.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agoselftests: forwarding: lib: Add start_traffic_pktsize() helpers
Petr Machata [Mon, 2 May 2022 08:49:25 +0000 (11:49 +0300)]
selftests: forwarding: lib: Add start_traffic_pktsize() helpers

Add two helpers, start_traffic_pktsize() and start_tcp_traffic_pktsize(),
that allow explicit overriding of packet size. Change start_traffic() and
start_tcp_traffic() to dispatch through these helpers with the default
packet size.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agomlxsw: Configure descriptor buffers
Petr Machata [Mon, 2 May 2022 08:49:24 +0000 (11:49 +0300)]
mlxsw: Configure descriptor buffers

Spectrum machines have two resources related to keeping packets in an
internal buffer: bytes (allocated in cell-sized units) for packet payload,
and descriptors, for keeping metadata. Currently, mlxsw only configures the
bytes part of the resource management.

Spectrum switches permit a full parallel configuration for the descriptor
resources, including port-pool and port-TC-pool quotas. By default, these
are all configured to use pool 14, with an infinite quota. The ingress pool
14 is then infinite in size.

However, egress pool 14 has finite size by default. The size is chip
dependent, but always much lower than what the chip actually permits. As a
result, we can easily construct workloads that exhaust the configured
descriptor limit.

Fix the issue by configuring the egress descriptor pool to be infinite in
size as well. This will maintain the configuration philosophy of the
default configuration, but will unlock all chip resources to be usable.

In the code, include both the configuration of ingress and ingress, mostly
for clarity.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agomlxsw: reg: Add "desc" field to SBPR
Petr Machata [Mon, 2 May 2022 08:49:23 +0000 (11:49 +0300)]
mlxsw: reg: Add "desc" field to SBPR

SBPR, or Shared Buffer Pools Register, configures and retrieves the shared
buffer pools and configuration. The desc field determines whether the
configuration relates to the byte pool or the descriptor pool.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agoMerge branch 'use-standard-sysctl-macro'
Paolo Abeni [Tue, 3 May 2022 08:15:09 +0000 (10:15 +0200)]
Merge branch 'use-standard-sysctl-macro'

Tonghao Zhang says:

====================
use standard sysctl macro

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

This patchset introduce sysctl macro or replace var
with macro.
====================

Link: https://lore.kernel.org/r/20220501035524.91205-1-xiangxia.m.yue@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agoselftests/sysctl: add sysctl macro test
Tonghao Zhang [Sun, 1 May 2022 03:55:24 +0000 (11:55 +0800)]
selftests/sysctl: add sysctl macro test

Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: David Ahern <dsahern@kernel.org>
Cc: Simon Horman <horms@verge.net.au>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Jozsef Kadlecsik <kadlec@netfilter.org>
Cc: Florian Westphal <fw@strlen.de>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Cc: Akhmat Karakotov <hmukos@yandex-team.ru>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet: sysctl: introduce sysctl SYSCTL_THREE
Tonghao Zhang [Sun, 1 May 2022 03:55:23 +0000 (11:55 +0800)]
net: sysctl: introduce sysctl SYSCTL_THREE

This patch introdues the SYSCTL_THREE.

KUnit:
[00:10:14] ================ sysctl_test (10 subtests) =================
[00:10:14] [PASSED] sysctl_test_api_dointvec_null_tbl_data
[00:10:14] [PASSED] sysctl_test_api_dointvec_table_maxlen_unset
[00:10:14] [PASSED] sysctl_test_api_dointvec_table_len_is_zero
[00:10:14] [PASSED] sysctl_test_api_dointvec_table_read_but_position_set
[00:10:14] [PASSED] sysctl_test_dointvec_read_happy_single_positive
[00:10:14] [PASSED] sysctl_test_dointvec_read_happy_single_negative
[00:10:14] [PASSED] sysctl_test_dointvec_write_happy_single_positive
[00:10:14] [PASSED] sysctl_test_dointvec_write_happy_single_negative
[00:10:14] [PASSED] sysctl_test_api_dointvec_write_single_less_int_min
[00:10:14] [PASSED] sysctl_test_api_dointvec_write_single_greater_int_max
[00:10:14] =================== [PASSED] sysctl_test ===================

./run_kselftest.sh -c sysctl
...
ok 1 selftests: sysctl: sysctl.sh

Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: David Ahern <dsahern@kernel.org>
Cc: Simon Horman <horms@verge.net.au>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Jozsef Kadlecsik <kadlec@netfilter.org>
Cc: Florian Westphal <fw@strlen.de>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Cc: Akhmat Karakotov <hmukos@yandex-team.ru>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet: sysctl: use shared sysctl macro
Tonghao Zhang [Sun, 1 May 2022 03:55:22 +0000 (11:55 +0800)]
net: sysctl: use shared sysctl macro

This patch replace two, four and long_one to SYSCTL_XXX.

Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: David Ahern <dsahern@kernel.org>
Cc: Simon Horman <horms@verge.net.au>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Jozsef Kadlecsik <kadlec@netfilter.org>
Cc: Florian Westphal <fw@strlen.de>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Cc: Akhmat Karakotov <hmukos@yandex-team.ru>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet/mlx5: fs, an FTE should have no dests when deleted
Mark Bloch [Tue, 22 Mar 2022 12:58:02 +0000 (12:58 +0000)]
net/mlx5: fs, an FTE should have no dests when deleted

When deleting an FTE it should have no dests, which means
fte->dests_size should be 0. Add a WARN_ON() to catch bugs
where the proper tracking wasn't done.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5: fs, call the deletion function of the node
Mark Bloch [Tue, 15 Mar 2022 11:51:55 +0000 (11:51 +0000)]
net/mlx5: fs, call the deletion function of the node

Don't call del_hw_fte() directly, instead use the hardware deletion
function set. This is just a small cleanup and doesn't change anything
as for an FTE the deletion function is already set to del_hw_fte().

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5: fs, delete the FTE when there are no rules attached to it
Mark Bloch [Tue, 15 Mar 2022 11:23:40 +0000 (11:23 +0000)]
net/mlx5: fs, delete the FTE when there are no rules attached to it

When an FTE has no children is means all the rules where removed
and the FTE can be deleted regardless of the dests_size value.
While dests_size should be 0 when there are no children
be extra careful not to leak memory or get firmware syndrome
if the proper bookkeeping of dests_size wasn't done.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5: fs, do proper bookkeeping for forward destinations
Mark Bloch [Tue, 15 Mar 2022 11:22:17 +0000 (11:22 +0000)]
net/mlx5: fs, do proper bookkeeping for forward destinations

Keep track after destinations that are forward destinations.
When a forward destinations is removed from an FTE check if
the actions bits need to be updated.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5: fs, add unused destination type
Mark Bloch [Tue, 15 Mar 2022 14:08:23 +0000 (14:08 +0000)]
net/mlx5: fs, add unused destination type

When the caller doesn't pass a destination fs_core will create a unused
rule just so a context can be returned. This unused rule
is zeroed out and its type is 0 which can be mixed up with
MLX5_FLOW_DESTINATION_TYPE_VPORT.

Create a dedicated type to differentiate between the two
named MLX5_FLOW_DESTINATION_TYPE_NONE.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5: fs, jump to exit point and don't fall through
Mark Bloch [Tue, 15 Mar 2022 10:45:00 +0000 (10:45 +0000)]
net/mlx5: fs, jump to exit point and don't fall through

For code clarity and to prevent future bugs make sure to jump
to the exit point once done handling that specific type.
This aligns the code with the rest logic in the function.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5: fs, refactor software deletion rule
Mark Bloch [Tue, 15 Mar 2022 10:44:14 +0000 (10:44 +0000)]
net/mlx5: fs, refactor software deletion rule

When deleting a rule make sure that for every type dests_size is
decreased only once and no other logic is executed.

Without this dests_size might be decreased twice when dests_size == 1
so the if for that type won't be entered and if action has
MLX5_FLOW_CONTEXT_ACTION_FWD_DEST set dests_size will be decreased again.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5: fs, split software and IFC flow destination definitions
Mark Bloch [Tue, 22 Mar 2022 09:16:39 +0000 (09:16 +0000)]
net/mlx5: fs, split software and IFC flow destination definitions

Separate flow destinations between software and IFC.
Flow destination type passed by callers was used as the input in
firmware commands and over the years software only types were added
which resulted in mixing between the two.

Create an IFC enum that contains only the flow destinations defined
when talking to the firmware.

Now that there is a proper software only enum for flow destinations
the hardcoded values can be removed as the values are no longer used
in firmware commands.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5e: TC, set proper dest type
Mark Bloch [Tue, 22 Mar 2022 16:14:51 +0000 (16:14 +0000)]
net/mlx5e: TC, set proper dest type

Dest type isn't set, this works only because
MLX5_FLOW_DESTINATION_TYPE_VPORT is zero. Set the proper type.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5e: Remove unused mlx5e_dcbnl_build_rep_netdev function
Gal Pressman [Tue, 29 Mar 2022 12:24:33 +0000 (15:24 +0300)]
net/mlx5e: Remove unused mlx5e_dcbnl_build_rep_netdev function

Commit
7a9fb35e8c3a ("net/mlx5e: Do not reload ethernet ports when changing eswitch mode")
removed the usage of mlx5e_dcbnl_build_rep_netdev() from the driver,
delete the function.

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5e: Drop error CQE handling from the XSK RX handler
Maxim Mikityanskiy [Thu, 20 Jan 2022 09:32:04 +0000 (11:32 +0200)]
net/mlx5e: Drop error CQE handling from the XSK RX handler

This commit removes the redundant check and removes the unused cqe parameter
of skb_from_cqe handlers.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5: Print initializing field in case of timeout
Shay Drory [Tue, 22 Mar 2022 14:55:58 +0000 (16:55 +0200)]
net/mlx5: Print initializing field in case of timeout

Print the initializing field in case of FW couldn't initialize before
timeout. This will help to better understand the root cause in some
cases.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5: Delete redundant default assignment of runtime devlink params
Shay Drory [Wed, 3 Nov 2021 10:18:35 +0000 (12:18 +0200)]
net/mlx5: Delete redundant default assignment of runtime devlink params

Runtime devlink params always read their values from the get() callbacks.
Also, it is an error to set driverinit_value for params which don't
support driverinit cmode. Delete such assignments.

In addition, move the set of default matching mode inside eswitch code.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5: Remove useless kfree
Haowen Bai [Fri, 22 Apr 2022 06:08:32 +0000 (14:08 +0800)]
net/mlx5: Remove useless kfree

After alloc fail, we do not need to kfree.

Signed-off-by: Haowen Bai <baihaowen@meizu.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5: use kvfree() for kvzalloc() in mlx5_ct_fs_smfs_matcher_create
Ziyang Xuan [Wed, 20 Apr 2022 10:36:17 +0000 (18:36 +0800)]
net/mlx5: use kvfree() for kvzalloc() in mlx5_ct_fs_smfs_matcher_create

The memory of spec is allocated with kvzalloc(), the corresponding
release function should not be kfree(), use kvfree() instead.

Generated by: scripts/coccinelle/api/kfree_mismatch.cocci

Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agoMerge branch 'vsock-virtio-add-support-for-device-suspend-resume'
Jakub Kicinski [Mon, 2 May 2022 23:04:36 +0000 (16:04 -0700)]
Merge branch 'vsock-virtio-add-support-for-device-suspend-resume'

Stefano Garzarella says:

====================
vsock/virtio: add support for device suspend/resume

Vilas reported that virtio-vsock no longer worked properly after
suspend/resume (echo mem >/sys/power/state).
It was impossible to connect to the host and vice versa.

Indeed, the support has never been implemented.

This series implement .freeze and .restore callbacks of struct virtio_driver
to support device suspend/resume.

The first patch factors our the code to initialize and delete VQs.
The second patch uses that code to support device suspend/resume.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
====================

Link: https://lore.kernel.org/r/20220428132241.152679-1-sgarzare@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agovsock/virtio: add support for device suspend/resume
Stefano Garzarella [Thu, 28 Apr 2022 13:22:41 +0000 (15:22 +0200)]
vsock/virtio: add support for device suspend/resume

Implement .freeze and .restore callbacks of struct virtio_driver
to support device suspend/resume.

During suspension all connected sockets are reset and VQs deleted.
During resume the VQs are re-initialized.

Reported by: Vilas R K <vilas.r.k@intel.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agovsock/virtio: factor our the code to initialize and delete VQs
Stefano Garzarella [Thu, 28 Apr 2022 13:22:40 +0000 (15:22 +0200)]
vsock/virtio: factor our the code to initialize and delete VQs

Add virtio_vsock_vqs_init() and virtio_vsock_vqs_del() with the code
that was in virtio_vsock_probe() and virtio_vsock_remove to initialize
and delete VQs.

These new functions will be used in the next commit to support device
suspend/resume

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoselftests: forwarding: add Per-Stream Filtering and Policing test for Ocelot
Vladimir Oltean [Sun, 1 May 2022 11:29:53 +0000 (14:29 +0300)]
selftests: forwarding: add Per-Stream Filtering and Policing test for Ocelot

The Felix VSC9959 switch in NXP LS1028A supports the tc-gate action
which enforced time-based access control per stream. A stream as seen by
this switch is identified by {MAC DA, VID}.

We use the standard forwarding selftest topology with 2 host interfaces
and 2 switch interfaces. The host ports must require timestamping non-IP
packets and supporting tc-etf offload, for isochron to work. The
isochron program monitors network sync status (ptp4l, phc2sys) and
deterministically transmits packets to the switch such that the tc-gate
action either (a) always accepts them based on its schedule, or
(b) always drops them.

I tried to keep as much of the logic that isn't specific to the NXP
LS1028A in a new tsn_lib.sh, for future reuse. This covers
synchronization using ptp4l and phc2sys, and isochron.

The cycle-time chosen for this selftest isn't particularly impressive
(and the focus is the functionality of the switch), but I didn't really
know what to do better, considering that it will mostly be run during
debugging sessions, various kernel bloatware would be enabled, like
lockdep, KASAN, etc, and we certainly can't run any races with those on.

I tried to look through the kselftest framework for other real time
applications and didn't really find any, so I'm not sure how better to
prepare the environment in case we want to go for a lower cycle time.
At the moment, the only thing the selftest is ensuring is that dynamic
frequency scaling is disabled on the CPU that isochron runs on. It would
probably be useful to have a blacklist of kernel config options (checked
through zcat /proc/config.gz) and some cyclictest scripts to run
beforehand, but I saw none of those.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Link: https://lore.kernel.org/r/20220501112953.3298973-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoipv6: Don't send rs packets to the interface of ARPHRD_TUNNEL
jianghaoran [Fri, 29 Apr 2022 05:38:02 +0000 (13:38 +0800)]
ipv6: Don't send rs packets to the interface of ARPHRD_TUNNEL

ARPHRD_TUNNEL interface can't process rs packets
and will generate TX errors

ex:
ip tunnel add ethn mode ipip local 192.168.1.1 remote 192.168.1.2
ifconfig ethn x.x.x.x

ethn: flags=209<UP,POINTOPOINT,RUNNING,NOARP>  mtu 1480
inet x.x.x.x  netmask 255.255.255.255  destination x.x.x.x
inet6 fe80::5efe:ac1e:3cdb  prefixlen 64  scopeid 0x20<link>
tunnel   txqueuelen 1000  (IPIP Tunnel)
RX packets 0  bytes 0 (0.0 B)
RX errors 0  dropped 0  overruns 0  frame 0
TX packets 0  bytes 0 (0.0 B)
TX errors 3  dropped 0 overruns 0  carrier 0  collisions 0

Signed-off-by: jianghaoran <jianghaoran@kylinos.cn>
Link: https://lore.kernel.org/r/20220429053802.246681-1-jianghaoran@kylinos.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agotcp: optimise skb_zerocopy_iter_stream()
Pavel Begunkov [Thu, 28 Apr 2022 10:57:46 +0000 (11:57 +0100)]
tcp: optimise skb_zerocopy_iter_stream()

It's expensive to make a copy of 40B struct iov_iter to the point it
was taking 0.2-0.5% of all cycles in my tests. iov_iter_revert() should
be fine as it's a simple case without nested reverts/truncates.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a7e1690c00c5dfe700c30eb9a8a81ec59f6545dd.1650884401.git.asml.silence@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoocteontx2-af: debugfs: fix error return of allocations
Niels Dossche [Sat, 30 Apr 2022 19:46:56 +0000 (21:46 +0200)]
octeontx2-af: debugfs: fix error return of allocations

Current memory failure code in the debugfs returns -ENOSPC. This is
normally used for indicating that there is no space left on the
device and is not applicable for memory allocation failures.
Replace this with -ENOMEM.

Signed-off-by: Niels Dossche <dossche.niels@gmail.com>
Link: https://lore.kernel.org/r/20220430194656.44357-1-dossche.niels@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch 'ocelot-stats-improvement'
Jakub Kicinski [Mon, 2 May 2022 21:04:19 +0000 (14:04 -0700)]
Merge branch 'ocelot-stats-improvement'

Colin Foster says:

====================
ocelot stats improvement

A couple of pick-ups after f187bfa6f35 ("net: ethernet: ocelot: remove
the need for num_stats initializer") - one addresses a warning
patchwork flagged about operator precedence when using macro arguments.
The other is a reduction of unnecessary memory allocation.
====================

Link: https://lore.kernel.org/r/20220430232327.4091825-1-colin.foster@in-advantage.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: mscc: ocelot: add missed parentheses around macro argument
Colin Foster [Sat, 30 Apr 2022 23:23:27 +0000 (16:23 -0700)]
net: mscc: ocelot: add missed parentheses around macro argument

Commit 2f187bfa6f35 ("net: ethernet: ocelot: remove the need for num_stats
initializer") added a macro that patchwork warned it lacked parentheses
around an argument. Correct this mistake.

Fixes: 2f187bfa6f35 ("net: ethernet: ocelot: remove the need for num_stats initializer")
Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: mscc: ocelot: remove unnecessary variable
Colin Foster [Sat, 30 Apr 2022 23:23:26 +0000 (16:23 -0700)]
net: mscc: ocelot: remove unnecessary variable

Commit 2f187bfa6f35 ("net: ethernet: ocelot: remove the need for num_stats
initializer") added a flags field to the ocelot stats structure. The same
behavior can be achieved without this additional field taking up extra
memory.

Remove this structure element to free up RAM

Suggested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoStefan Schmidt says:
Jakub Kicinski [Mon, 2 May 2022 20:57:54 +0000 (13:57 -0700)]
Stefan Schmidt says:

====================
pull-request: ieee802154-next 2022-05-01

Miquel Raynal landed two patch series bundled in this pull request.

The first series re-works the symbol duration handling to better
accommodate the needs of the various phy layers in ieee802154.

In the second series Miquel improves th errors handling from drivers
up mac802154. THis streamlines the error handling throughout the
ieee/mac802154 stack in preparation for sync TX to be introduced for
MLME frames.
====================

Link: https://lore.kernel.org/r/20220501194614.1198325-1-stefan@datenfreihafen.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch 'net-more-heap-allocation-and-split-of-rtnl_newlink'
Paolo Abeni [Mon, 2 May 2022 13:14:22 +0000 (15:14 +0200)]
Merge branch 'net-more-heap-allocation-and-split-of-rtnl_newlink'

Jakub Kicinski says:

====================
net: more heap allocation and split of rtnl_newlink()

Small refactoring of rtnl_newlink() to fix a stack usage warning
and make the function shorter.
====================

Link: https://lore.kernel.org/r/20220429235508.268349-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agortnl: move rtnl_newlink_create()
Jakub Kicinski [Fri, 29 Apr 2022 23:55:08 +0000 (16:55 -0700)]
rtnl: move rtnl_newlink_create()

Pure code move.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agortnl: split __rtnl_newlink() into two functions
Jakub Kicinski [Fri, 29 Apr 2022 23:55:07 +0000 (16:55 -0700)]
rtnl: split __rtnl_newlink() into two functions

__rtnl_newlink() is 250LoC, but has a few clear sections.
Move the part which creates a new netdev to a separate
function.

For ease of review code will be moved in the next change.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agortnl: allocate more attr tables on the heap
Jakub Kicinski [Fri, 29 Apr 2022 23:55:06 +0000 (16:55 -0700)]
rtnl: allocate more attr tables on the heap

Commit a293974590cf ("rtnetlink: avoid frame size warning in rtnl_newlink()")
moved to allocating the largest attribute array of rtnl_newlink()
on the heap. Kalle reports the stack has grown above 1k again:

  net/core/rtnetlink.c:3557:1: error: the frame size of 1104 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

Move more attrs to the heap, wrap them in a struct.
Don't bother with linkinfo, it's referenced a lot and we take
its size so it's awkward to move, plus it's small (6 elements).

Reported-by: Kalle Valo <kvalo@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tested-by: Kalle Valo <kvalo@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agoMerge branch 'use-mmd-c45-helpers'
Paolo Abeni [Mon, 2 May 2022 11:21:40 +0000 (13:21 +0200)]
Merge branch 'use-mmd-c45-helpers'

Andrew Lunn says:

====================
Use MMD/C45 helpers

MDIO busses can perform two sorts of bus transaction, defined in
clause 22 and clause 45 of 802.3. This results in two register
addresses spaces. The current driver structure for indicating if C22
or C45 should be used is messy, and many C22 only bus drivers will
wrongly interpret a C45 transaction as a C22 transaction.

This patchset is a preparation step to cleanup the situation. It
converts MDIO bus users to make use of existing _mmd and _c45 helpers
to perform accesses to C45 registers. This will later allow C45 and
C22 to be kept separate.
====================

Link: https://lore.kernel.org/r/20220430173037.156823-1-andrew@lunn.ch
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet: pcs: pcs-xpcs: Convert to mdiobus_c45_read
Andrew Lunn [Sat, 30 Apr 2022 17:30:37 +0000 (19:30 +0200)]
net: pcs: pcs-xpcs: Convert to mdiobus_c45_read

Stop using the helpers to construct a special mdio address which
indicates C45. Instead use the C45 accessors, which will call the
busses C45 specific read/write API.

Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet: dsa: sja1105: Convert to mdiobus_c45_read
Andrew Lunn [Sat, 30 Apr 2022 17:30:36 +0000 (19:30 +0200)]
net: dsa: sja1105: Convert to mdiobus_c45_read

Stop using the helpers to construct a special phy address which
indicates C45. Instead use the C45 accessors, which will call the
busses C45 specific read/write API.

Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet: phy: bcm87xx: Use mmd helpers
Andrew Lunn [Sat, 30 Apr 2022 17:30:35 +0000 (19:30 +0200)]
net: phy: bcm87xx: Use mmd helpers

Rather than construct special phy device addresses to access C45
registers, use the mmd helpers. These will directly call the C45 API
of the MDIO bus driver.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet: phy: Convert to mdiobus_c45_{read|write}
Andrew Lunn [Sat, 30 Apr 2022 17:30:34 +0000 (19:30 +0200)]
net: phy: Convert to mdiobus_c45_{read|write}

Stop using the helpers to construct a special phy address which
indicates C45. Instead use the C45 accessors, which will call the
busses C45 specific read/write API.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet: phylink: Convert to mdiobus_c45_{read|write}
Andrew Lunn [Sat, 30 Apr 2022 17:30:33 +0000 (19:30 +0200)]
net: phylink: Convert to mdiobus_c45_{read|write}

Stop using the helpers to construct a special phy address which
indicates C45. Instead use the C45 accessors, which will call the
busses C45 specific read/write API.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agoMerge tag 'linux-can-next-for-5.19-20220502' of git://git.kernel.org/pub/scm/linux...
Paolo Abeni [Mon, 2 May 2022 11:03:50 +0000 (13:03 +0200)]
Merge tag 'linux-can-next-for-5.19-20220502' of git://git./linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2022-05-02

this is a pull request of 9 patches for net-next/master.

The first patch is by Biju Das and documents renesas,r9a07g043-canfd
support in the renesas,rcar-canfd bindings document.

Jakub Kicinski's patch removes a copy of the NAPI_POLL_WEIGHT define
from the m_can driver.

The last 7 patches all target the ctucanfd driver. Pavel Pisa provides
2 patch which update the documentation. 2 patches by Jiapeng Chong
remove unneeded includes and error messages. And another 3 patches by
Pavel Pisa to further clean up the driver (remove inline keyword,
remove unneeded debug statements, and remove unneeded module parameters).

linux-can-next-for-5.19-20220502

* tag 'linux-can-next-for-5.19-20220502' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next:
  can: ctucanfd: remove PCI module debug parameters
  can: ctucanfd: remove debug statements
  can: ctucanfd: remove inline keyword from local static functions
  can: ctucanfd: ctucan_platform_probe(): remove unnecessary print function dev_err()
  can: ctucanfd: remove unused including <linux/version.h>
  docs: networking: device drivers: can: ctucanfd: update author e-mail
  docs: networking: device drivers: can: add ctucanfd to index
  can: m_can: remove a copy of the NAPI_POLL_WEIGHT define
  dt-bindings: can: renesas,rcar-canfd: Document RZ/G2UL support
====================

Link: https://lore.kernel.org/r/20220502075914.1905039-1-mkl@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonfp: support VxLAN inner TSO with GSO_PARTIAL offload
Fei Qin [Sat, 30 Apr 2022 23:11:50 +0000 (08:11 +0900)]
nfp: support VxLAN inner TSO with GSO_PARTIAL offload

VxLAN belongs to UDP-based encapsulation protocol. Inner TSO for VxLAN
packet with udpcsum requires offloading of outer header csum.

The device doesn't support outer header csum offload. However, inner TSO
for VxLAN with udpcsum can still work with GSO_PARTIAL offload, which
means outer udp csum computed by stack and inner tcp segmentation finished
by hardware. Thus, the patch enable features "NETIF_F_GSO_UDP_TUNNEL_CSUM"
and "NETIF_F_GSO_PARTIAL" and set gso_partial_features.

Signed-off-by: Fei Qin <fei.qin@corigine.com>
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20220430231150.175270-1-simon.horman@corigine.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agoselftests: net: vrf_strict_mode_test: add support to select a test to run
Jaehee Park [Fri, 29 Apr 2022 16:46:58 +0000 (12:46 -0400)]
selftests: net: vrf_strict_mode_test: add support to select a test to run

Add a boilerplate test loop to run all tests in
vrf_strict_mode_test.sh. Add a -t flag that allows a selected test to
run. Remove the vrf_strict_mode_tests function which is now unused.

Signed-off-by: Jaehee Park <jhpark1013@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220429164658.GA656707@jaehee-ThinkPad-X1-Extreme
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agoMerge branch 'devices-always-netif_f_lltx'
Paolo Abeni [Mon, 2 May 2022 08:30:36 +0000 (10:30 +0200)]
Merge branch 'devices-always-netif_f_lltx'

Peilin Ye says:

====================
devices always NETIF_F_LLTX

v1: https://lore.kernel.org/netdev/cover.1650580763.git.peilin.ye@bytedance.com/

change since v1:
  - deleted "depends on patch..." in [1/2]'s commit message

This patchset depends on these fixes [1], which has been merged into
net-next.  Since o_seqno is now atomic_t, we can always turn on
NETIF_F_LLTX for [IP6]GRE[TAP] devices, since we no longer need the TX
lock (&txq->_xmit_lock).

We could probably do the same thing to [IP6]ERSPAN devices as well, but
I'm not familiar with them yet.  For example, ERSPAN devices are
initialized as |= GRE_FEATURES in erspan_tunnel_init(), but I don't see
IP6ERSPAN devices being initialized as |= GRE6_FEATURES.  Where should we
initialize IP6ERSPAN devices' ->features?  Please suggest if I'm missing
something, thanks!

[1] https://lore.kernel.org/netdev/cover.1650575919.git.peilin.ye@bytedance.com/
====================

Link: https://lore.kernel.org/r/cover.1651207788.git.peilin.ye@bytedance.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agoip6_gre: Make IP6GRE and IP6GRETAP devices always NETIF_F_LLTX
Peilin Ye [Fri, 29 Apr 2022 05:25:47 +0000 (22:25 -0700)]
ip6_gre: Make IP6GRE and IP6GRETAP devices always NETIF_F_LLTX

Recently we made o_seqno atomic_t.  Stop special-casing TUNNEL_SEQ, and
always mark IP6GRE[TAP] devices as NETIF_F_LLTX, since we no longer need
the TX lock (&txq->_xmit_lock).

Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agoip_gre: Make GRE and GRETAP devices always NETIF_F_LLTX
Peilin Ye [Fri, 29 Apr 2022 05:25:21 +0000 (22:25 -0700)]
ip_gre: Make GRE and GRETAP devices always NETIF_F_LLTX

Recently we made o_seqno atomic_t.  Stop special-casing TUNNEL_SEQ, and
always mark GRE[TAP] devices as NETIF_F_LLTX, since we no longer need
the TX lock (&txq->_xmit_lock).

Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agocan: ctucanfd: remove PCI module debug parameters
Pavel Pisa [Sun, 24 Apr 2022 16:28:08 +0000 (18:28 +0200)]
can: ctucanfd: remove PCI module debug parameters

This patch removes the PCI module debug parameters, which are not
needed anymore, to make both checkpatch.pl and patchwork happy.

Link: https://lore.kernel.org/all/1fd684bcf5ddb0346aad234072f54e976a5210fb.1650816929.git.pisa@cmp.felk.cvut.cz
Signed-off-by: Pavel Pisa <pisa@cmp.felk.cvut.cz>
[mkl: split into separate patches]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2 years agocan: ctucanfd: remove debug statements
Pavel Pisa [Sun, 24 Apr 2022 16:28:08 +0000 (18:28 +0200)]
can: ctucanfd: remove debug statements

This patch removes the debug statements from the driver to make
checkpatch.pl and patchwork happy.

Link: https://lore.kernel.org/all/1fd684bcf5ddb0346aad234072f54e976a5210fb.1650816929.git.pisa@cmp.felk.cvut.cz
Signed-off-by: Pavel Pisa <pisa@cmp.felk.cvut.cz>
[mkl: split into separate patches]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2 years agocan: ctucanfd: remove inline keyword from local static functions
Pavel Pisa [Sun, 24 Apr 2022 16:28:08 +0000 (18:28 +0200)]
can: ctucanfd: remove inline keyword from local static functions

This patch removes the inline keywords from the local static functions
to make both checkpatch.pl and patchwork happy.

Link: https://lore.kernel.org/all/1fd684bcf5ddb0346aad234072f54e976a5210fb.1650816929.git.pisa@cmp.felk.cvut.cz
Signed-off-by: Pavel Pisa <pisa@cmp.felk.cvut.cz>
[mkl: split into separate patches]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2 years agocan: ctucanfd: ctucan_platform_probe(): remove unnecessary print function dev_err()
Jiapeng Chong [Thu, 21 Apr 2022 20:32:42 +0000 (04:32 +0800)]
can: ctucanfd: ctucan_platform_probe(): remove unnecessary print function dev_err()

The print function dev_err() is redundant because platform_get_irq()
already prints an error.

Eliminate the follow coccicheck warnings:

| drivers/net/can/ctucanfd/ctucanfd_platform.c:67:2-9:
| line 67 is redundant because platform_get_irq() already prints an error.

Link: https://lore.kernel.org/all/20220421203242.7335-1-jiapeng.chong@linux.alibaba.com
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Acked-by: Pave Pisa <pisa@cmp.felk.cvut.cz>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2 years agocan: ctucanfd: remove unused including <linux/version.h>
Jiapeng Chong [Thu, 21 Apr 2022 20:28:52 +0000 (04:28 +0800)]
can: ctucanfd: remove unused including <linux/version.h>

Eliminate the follow versioncheck warning:

| drivers/net/can/ctucanfd/ctucanfd_base.c: 34 linux/version.h not needed.

Link: https://lore.kernel.org/all/20220421202852.2693-1-jiapeng.chong@linux.alibaba.com
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Acked-by: Pave Pisa <pisa@cmp.felk.cvut.cz>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2 years agodocs: networking: device drivers: can: ctucanfd: update author e-mail
Pavel Pisa [Sun, 24 Apr 2022 16:28:11 +0000 (18:28 +0200)]
docs: networking: device drivers: can: ctucanfd: update author e-mail

This patch updates the author's email address.

Link: https://lore.kernel.org/all/e4396244da6b008c671def9f50bb983a10389863.1650816929.git.pisa@cmp.felk.cvut.cz
Cc: Odrej Ille <ondrej.ille@gmail.com>
Signed-off-by: Pavel Pisa <pisa@cmp.felk.cvut.cz>
[mkl: split into separate patches]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2 years agodocs: networking: device drivers: can: add ctucanfd to index
Pavel Pisa [Sun, 24 Apr 2022 16:28:11 +0000 (18:28 +0200)]
docs: networking: device drivers: can: add ctucanfd to index

This patch adds the ctucanfd-driver document to the index.

Link: https://lore.kernel.org/all/e4396244da6b008c671def9f50bb983a10389863.1650816929.git.pisa@cmp.felk.cvut.cz
Signed-off-by: Pavel Pisa <pisa@cmp.felk.cvut.cz>
[mkl: split into separate patches]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2 years agocan: m_can: remove a copy of the NAPI_POLL_WEIGHT define
Jakub Kicinski [Fri, 29 Apr 2022 17:44:46 +0000 (10:44 -0700)]
can: m_can: remove a copy of the NAPI_POLL_WEIGHT define

Defining local versions of NAPI_POLL_WEIGHT with the same values in
the drivers just makes refactoring harder.

Link: https://lore.kernel.org/all/20220429174446.196655-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2 years agodt-bindings: can: renesas,rcar-canfd: Document RZ/G2UL support
Biju Das [Sat, 23 Apr 2022 13:07:43 +0000 (14:07 +0100)]
dt-bindings: can: renesas,rcar-canfd: Document RZ/G2UL support

Add CANFD binding documentation for Renesas R9A07G043 (RZ/G2UL) SoC.

Link: https://lore.kernel.org/all/20220423130743.123198-1-biju.das.jz@bp.renesas.com
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2 years agoMerge branch 'adin1100-industrial-PHY-support'
David S. Miller [Sun, 1 May 2022 16:45:35 +0000 (17:45 +0100)]
Merge branch 'adin1100-industrial-PHY-support'

Alexandru Tachici says:

====================
net: phy: adin1100: Add initial support for ADIN1100 industrial PHY

The ADIN1100 is a low power single port 10BASE-T1L transceiver designed for
industrial Ethernet applications and is compliant with the IEEE 802.3cg
Ethernet standard for long reach 10 Mb/s Single Pair Ethernet.

The ADIN1100 uses Auto-Negotiation capability in accordance
with IEEE 802.3 Clause 98, providing a mechanism for
exchanging information between PHYs to allow link partners to
agree to a common mode of operation.

The concluded operating mode is the transmit amplitude mode and
master/slave preference common across the two devices.

Both device and LP advertise their ability and request for
increased transmit at:
- BASE-T1 autonegotiation advertisement register [47:32]\
Clause 45.2.7.21 of Standard 802.3
- BIT(13) - 10BASE-T1L High Level Transmit Operating Mode Ability
- BIT(12) - 10BASE-T1L High Level Transmit Operating Mode Request

For 2.4 Vpp (high level transmit) operation, both devices need
to have the High Level Transmit Operating Mode Ability bit set,
and only one of them needs to have the High Level Transmit
Operating Mode Request bit set. Otherwise 1.0 Vpp transmit level
will be used.

Settings for eth1:
Supported ports: [ TP  MII ]
Supported link modes:   10baseT1L/Full
Supported pause frame use: Symmetric Receive-only
Supports auto-negotiation: Yes
Supported FEC modes: Not reported
Advertised link modes:  10baseT1L/Full
Advertised pause frame use: No
Advertised auto-negotiation: Yes
Advertised FEC modes: Not reported
Link partner advertised link modes:  10baseT1L/Full
Link partner advertised pause frame use: No
Link partner advertised auto-negotiation: Yes
Link partner advertised FEC modes: Not reported
Speed: 10Mb/s
Duplex: Full
Auto-negotiation: on
master-slave cfg: preferred slave
master-slave status: slave
Port: Twisted Pair
PHYAD: 0
Transceiver: external
MDI-X: Unknown
Link detected: yes
SQI: 7/7

1. Add basic support for ADIN1100.

Alexandru Ardelean (1):
  net: phy: adin1100: Add initial support for ADIN1100 industrial PHY

1. Added 10baset-T1L link modes.

2. Added 10-BasetT1L registers.

3. Added Base-T1 auto-negotiation registers. For Base-T1 these
registers decide master/slave status and TX voltage of the
device and link partner.

4. Added 10BASE-T1L support in phy-c45.c. Now genphy functions will call
Base-T1 functions where registers don't match, like the auto-negotiation ones.

5. Convert MSE to SQI using a predefined table and allow user access
through ethtool.

6. DT bindings for the 2.4 Vpp transmit mode.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agodt-bindings: net: phy: Add 10-baseT1L 2.4 Vpp
Alexandru Tachici [Fri, 29 Apr 2022 15:34:37 +0000 (18:34 +0300)]
dt-bindings: net: phy: Add 10-baseT1L 2.4 Vpp

Add a tristate property to advertise desired transmit level.

If the device supports the 2.4 Vpp operating mode for 10BASE-T1L,
as defined in 802.3gc, and the 2.4 Vpp transmit voltage operation
is desired, property should be set to 1. This property is used
to select whether Auto-Negotiation advertises a request to
operate the 10BASE-T1L PHY in increased transmit level mode.

If property is set to 1, the PHY shall advertise a request
to operate the 10BASE-T1L PHY in increased transmit level mode.
If property is set to zero, the PHY shall not advertise
a request to operate the 10BASE-T1L PHY in increased transmit level mode.

Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: phy: adin1100: Add SQI support
Alexandru Tachici [Fri, 29 Apr 2022 15:34:36 +0000 (18:34 +0300)]
net: phy: adin1100: Add SQI support

Determine the SQI from MSE using a predefined table
for the 10BASE-T1L.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: phy: adin1100: Add initial support for ADIN1100 industrial PHY
Alexandru Ardelean [Fri, 29 Apr 2022 15:34:35 +0000 (18:34 +0300)]
net: phy: adin1100: Add initial support for ADIN1100 industrial PHY

The ADIN1100 is a low power single port 10BASE-T1L transceiver designed for
industrial Ethernet applications and is compliant with the IEEE 802.3cg
Ethernet standard for long reach 10 Mb/s Single Pair Ethernet.

Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: phy: Add 10BASE-T1L support in phy-c45
Alexandru Tachici [Fri, 29 Apr 2022 15:34:34 +0000 (18:34 +0300)]
net: phy: Add 10BASE-T1L support in phy-c45

This patch is needed because the BASE-T1 uses different registers
for status, control and advertisement to those already
employed in the existing phy-c45 functions.

Where required, genphy_c45 functions will now check whether
the device supports BASE-T1 and use the specific registers
instead: 45.2.7.19 BASE-T1 AN control register,
45.2.7.20 BASE-T1 AN status, 45.2.7.21 BASE-T1 AN
advertisement register, 45.2.7.22 BASE-T1 AN LP Base
Page ability register, 45.2.1.185 BASE-T1 PMA/PMD control
register.

Tested-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: phy: Add BaseT1 auto-negotiation registers
Alexandru Tachici [Fri, 29 Apr 2022 15:34:33 +0000 (18:34 +0300)]
net: phy: Add BaseT1 auto-negotiation registers

Added BASE-T1 AN advertisement register (Registers 7.514, 7.515, and
7.516) and BASE-T1 AN LP Base Page ability register (Registers 7.517,
7.518, and 7.519).

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: phy: Add 10-BaseT1L registers
Alexandru Tachici [Fri, 29 Apr 2022 15:34:32 +0000 (18:34 +0300)]
net: phy: Add 10-BaseT1L registers

The 802.3gc specification defines the 10-BaseT1L link
mode for ethernet trafic on twisted wire pair.

PMA status register can be used to detect if the phy supports
2.4 V TX level and PCS control register can be used to
enable/disable PCS level loopback.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoethtool: Add 10base-T1L link mode entry
Alexandru Tachici [Fri, 29 Apr 2022 15:34:31 +0000 (18:34 +0300)]
ethtool: Add 10base-T1L link mode entry

Add entry for the 10base-T1L full duplex mode.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: dsa: mv88e6xxx: Cosmetic change spaces to tabs in dsa_switch_ops
Marek Behún [Fri, 29 Apr 2022 14:32:14 +0000 (16:32 +0200)]
net: dsa: mv88e6xxx: Cosmetic change spaces to tabs in dsa_switch_ops

All but 5 methods in dsa_swith_ops use tabs for indentation.

Change the 5 methods that break this rule.

Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: enable memcg accounting for veth queues
Vasily Averin [Fri, 29 Apr 2022 05:17:35 +0000 (08:17 +0300)]
net: enable memcg accounting for veth queues

veth netdevice defines own rx queues and allocates array containing
up to 4095 ~750-bytes-long 'struct veth_rq' elements. Such allocation
is quite huge and should be accounted to memcg.

Signed-off-by: Vasily Averin <vvs@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'UDP-sock_wfree-opts'
David S. Miller [Sun, 1 May 2022 11:19:01 +0000 (12:19 +0100)]
Merge branch 'UDP-sock_wfree-opts'

Pavel Begunkov says:

====================
UDP sock_wfree optimisations

The series is not UDP specific but that the main beneficiary. 2/3 saves one
atomic in sock_wfree() and on top 3/3 removes an extra barrier.
Tested with UDP over dummy netdev, 2038491 -> 2099071 req/s (or around +3%).

note: in regards to 1/3, there is a "Should agree with poll..." comment
that I don't completely get, and there is no git history to explain it.
Though I can't see how it could rely on having the second check without
racing with tasks woken by wake_up*().

The series was split from a larger patchset, see
https://lore.kernel.org/netdev/cover.1648981570.git.asml.silence@gmail.com/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agosock: optimise sock_def_write_space barriers
Pavel Begunkov [Thu, 28 Apr 2022 10:58:19 +0000 (11:58 +0100)]
sock: optimise sock_def_write_space barriers

Now we have a separate path for sock_def_write_space() and can go one
step further. When it's called from sock_wfree() we know that there is a
preceding atomic for putting down ->sk_wmem_alloc. We can use it to
replace to replace smb_mb() with a less expensive
smp_mb__after_atomic(). It also removes an extra RCU read lock/unlock as
a small bonus.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agosock: optimise UDP sock_wfree() refcounting
Pavel Begunkov [Thu, 28 Apr 2022 10:58:18 +0000 (11:58 +0100)]
sock: optimise UDP sock_wfree() refcounting

For non SOCK_USE_WRITE_QUEUE sockets, sock_wfree() (atomically) puts
->sk_wmem_alloc twice. It's needed to keep the socket alive while
calling ->sk_write_space() after the first put.

However, some sockets, such as UDP, are freed by RCU
(i.e. SOCK_RCU_FREE) and use already RCU-safe sock_def_write_space().
Carve a fast path for such sockets, put down all refs in one go before
calling sock_def_write_space() but guard the socket from being freed
by an RCU read section.

note: because TCP sockets are marked with SOCK_USE_WRITE_QUEUE it
doesn't add extra checks in its path.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agosock: dedup sock_def_write_space wmem_alloc checks
Pavel Begunkov [Thu, 28 Apr 2022 10:58:17 +0000 (11:58 +0100)]
sock: dedup sock_def_write_space wmem_alloc checks

Except for minor rounding differences the first ->sk_wmem_alloc test in
sock_def_write_space() is a hand coded version of sock_writeable().
Replace it with the helper, and also kill the following if duplicating
the check.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: phy: marvell: update abilities and advertising when switching to SGMII
Robert Hancock [Wed, 27 Apr 2022 19:39:28 +0000 (13:39 -0600)]
net: phy: marvell: update abilities and advertising when switching to SGMII

With some SFP modules, such as Finisar FCLF8522P2BTL, the PHY hardware
strapping defaults to 1000BaseX mode, but the kernel prefers to set them
for SGMII mode. When this happens and the PHY is soft reset, the BMSR
status register is updated, but this happens after the kernel has already
read the PHY abilities during probing. This results in support not being
detected for, and the PHY not advertising support for, 10 and 100 Mbps
modes, preventing the link from working with a non-gigabit link partner.

When the PHY is being configured for SGMII mode, call genphy_read_abilities
again in order to re-read the capabilities, and update the advertising
field accordingly.

Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: mac802154: Fix symbol durations
Miquel Raynal [Thu, 28 Apr 2022 16:41:40 +0000 (18:41 +0200)]
net: mac802154: Fix symbol durations

There are two major issues in the logic calculating the symbol durations
based on the page/channel:
- The page number is used in place of the channel value.
- The BIT() macro is missing because we want to check the channel
  value against a bitmask.

Fix these two errors and apologize loudly for this mistake.

Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Acked-by: Alexander Aring <aahringo@redhat.com>
Link: https://lore.kernel.org/r/20220428164140.251965-1-miquel.raynal@bootlin.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2 years agonet: lan966x: Fix compilation error
Horatiu Vultur [Fri, 29 Apr 2022 07:19:53 +0000 (09:19 +0200)]
net: lan966x: Fix compilation error

Starting from the blamed commit, the lan966x build fails with the
following compilation error:

drivers/net/ethernet/microchip/lan966x/lan966x_ptp.c:342:9: error: implicit declaration of function ‘ptp_find_pin_unlocked’ [-Werror=implicit-function-declaration]
  342 |   pin = ptp_find_pin_unlocked(phc->clock, PTP_PF_EXTTS, 0);

The issue is that there is no stub function for ptp_find_pin_unlocked
in case CONFIG_PTP_1588_CLOCK is not selected. Therefore add one.

Reported-by: kernel test robot <lkp@intel.com>
Fixes: f3d8e0a9c28ba0 ("net: lan966x: Add support for PTP_PF_EXTTS")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoipv4: remove unnecessary type castings
Yu Zhe [Fri, 29 Apr 2022 02:14:04 +0000 (19:14 -0700)]
ipv4: remove unnecessary type castings

remove unnecessary void* type castings.

Signed-off-by: Yu Zhe <yuzhe@nfschina.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoeth: remove remaining copies of the NAPI_POLL_WEIGHT define
Jakub Kicinski [Fri, 29 Apr 2022 17:43:30 +0000 (10:43 -0700)]
eth: remove remaining copies of the NAPI_POLL_WEIGHT define

Defining local versions of NAPI_POLL_WEIGHT with the same
values in the drivers just makes refactoring harder.

This patch covers three more drivers which I missed in
commit 5f012b40ef63 ("eth: remove copies of the NAPI_POLL_WEIGHT define").

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agotcp: use tcp_skb_sent_after() instead in RACK
Pengcheng Yang [Fri, 29 Apr 2022 10:32:56 +0000 (18:32 +0800)]
tcp: use tcp_skb_sent_after() instead in RACK

This patch doesn't change any functionality.

Signed-off-by: Pengcheng Yang <yangpc@wangsu.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet/funeth: simplify the return expression of fun_dl_info_get()
Minghao Chi [Fri, 29 Apr 2022 09:01:04 +0000 (09:01 +0000)]
net/funeth: simplify the return expression of fun_dl_info_get()

Simplify the return expression.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoqede: Reduce verbosity of ptp tx timestamp
Prabhakar Kushwaha [Sat, 30 Apr 2022 01:05:13 +0000 (04:05 +0300)]
qede: Reduce verbosity of ptp tx timestamp

Reduce verbosity of ptp tx timestamp error to reduce excessive log
messages.

Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Alok Prasad <palok@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ethernet: ocelot: remove the need for num_stats initializer
Colin Foster [Fri, 29 Apr 2022 21:30:36 +0000 (14:30 -0700)]
net: ethernet: ocelot: remove the need for num_stats initializer

There is a desire to share the oclot_stats_layout struct outside of the
current vsc7514 driver. In order to do so, the length of the array needs to
be known at compile time, and defined in the struct ocelot and struct
felix_info.

Since the array is defined in a .c file and would be declared in the header
file via:
extern struct ocelot_stat_layout[];
the size of the array will not be known at compile time to outside modules.

To fix this, remove the need for defining the number of stats at compile
time and allow this number to be determined at initialization.

Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agotcp: drop skb dst in tcp_rcv_established()
Eric Dumazet [Sat, 30 Apr 2022 01:15:23 +0000 (18:15 -0700)]
tcp: drop skb dst in tcp_rcv_established()

In commit f84af32cbca7 ("net: ip_queue_rcv_skb() helper")
I dropped the skb dst in tcp_data_queue().

This only dealt with so-called TCP input slow path.

When fast path is taken, tcp_rcv_established() calls
tcp_queue_rcv() while skb still has a dst.

This was mostly fine, because most dsts at this point
are not refcounted (thanks to early demux)

However, TCP packets sent over loopback have refcounted dst.

Then commit 68822bdf76f1 ("net: generalize skb freeing
deferral to per-cpu lists") came and had the effect
of delaying skb freeing for an arbitrary time.

If during this time the involved netns is dismantled, cleanup_net()
frees the struct net with embedded net->ipv6.ip6_dst_ops.

Then when eventually dst_destroy_rcu() is called,
if (dst->ops->destroy) ... triggers an use-after-free.

It is not clear if ip6_route_net_exit() lacks a rcu_barrier()
as syzbot reported similar issues before the blamed commit.

( https://groups.google.com/g/syzkaller-bugs/c/CofzW4eeA9A/m/009WjumTAAAJ )

Fixes: 68822bdf76f1 ("net: generalize skb freeing deferral to per-cpu lists")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'lan966x-phy-reset-remove'
David S. Miller [Sat, 30 Apr 2022 12:09:26 +0000 (13:09 +0100)]
Merge branch 'lan966x-phy-reset-remove'

Michael Walle says:

====================
net: lan966x: remove PHY reset support

Remove the unneeded PHY reset node as well as the driver support for it.

This was already discussed [1] and I expect Microchip to Ack on this
removal. Since there is no user, no breakage is expected.

I'm not sure it this should go through net or net-next and if the patches
should have a Fixes: tag or not. In upstream linux there was never any user
of it, so there is no bug to be fixed. But OTOH if the schema fix isn't
backported, then there might be an older schema version still containing
the reset node. Thoughts?

The patches needed for the GPIO part are just waiting to be picked up by
Linus [2,3]. This patch and the GPIO parts are the last pieces of the
puzzle to get ethernet working on the LAN9668 on upstream linux.

[1] https://lore.kernel.org/netdev/20220330110210.3374165-1-michael@walle.cc/
[2] https://lore.kernel.org/linux-gpio/CACRpkdbxmN+SWt95aGHjA2ZGnN61aWaA7c5S4PaG+WePAj=htg@mail.gmail.com/
[3] https://lore.kernel.org/linux-gpio/20220420191926.3411830-1-michael@walle.cc/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: lan966x: remove PHY reset support
Michael Walle [Thu, 28 Apr 2022 11:40:49 +0000 (13:40 +0200)]
net: lan966x: remove PHY reset support

The PHY subsystem as well as the MIIM mdio driver (in case of the
integrated PHYs) will take care of the resets. A separate reset driver
isn't needed. There is no in-tree user of this feature. Remove the
support.

Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agodt-bindings: net: lan966x: remove PHY reset
Michael Walle [Thu, 28 Apr 2022 11:40:48 +0000 (13:40 +0200)]
dt-bindings: net: lan966x: remove PHY reset

The PHY reset was intended to be a phandle for a special PHY reset
driver for the integrated PHYs as well as any external PHYs. It turns
out, that the culprit is how the reset of the switch device is done.
In particular, the switch reset also affects other subsystems like
the GPIO and the SGPIO block and it happens to be the case that the
reset lines of the external PHYs are connected to a common GPIO line.
Thus as soon as the switch issues a reset during probe time, all the
external PHYs will go into reset because all the GPIO lines will
switch to input and the pull-down on that signal will take effect.

So even if there was a special PHY reset driver, it (1) won't fix
the root cause of the problem and (2) it won't fix all the other
consumers of GPIO lines which will also be reset.

It turns out, the Ocelot SoC has the same weird behavior (or the
lack of a dedicated switch reset) and there the problem is already
solved and all the bits and pieces are already there and this PHY
reset property isn't not needed at all.

There are no users of this binding. Just remove it.

Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'ipv6-net-opts'
David S. Miller [Sat, 30 Apr 2022 11:58:45 +0000 (12:58 +0100)]
Merge branch 'ipv6-net-opts'

Pavel Begunkov says:

====================
generic net and ipv6 minor optimisations

1-3 inline simple functions that only reshuffle arguments possibly adding
extra zero args, and call another function. It was benchmarked before with
a bunch of extra patches, see for details

https://lore.kernel.org/netdev/cover.1648981570.git.asml.silence@gmail.com/

It may increase the binary size, but it's the right thing to do and at least
without modules it actually sheds some bytes for some standard-ish config.

   text    data     bss     dec     hex filename
9627200       0       0 9627200  92e640 ./arch/x86_64/boot/bzImage
   text    data     bss     dec     hex filename
9627104       0       0 9627104  92e5e0 ./arch/x86_64/boot/bzImage
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoipv6: refactor ip6_finish_output2()
Pavel Begunkov [Thu, 28 Apr 2022 10:58:48 +0000 (11:58 +0100)]
ipv6: refactor ip6_finish_output2()

Throw neigh checks in ip6_finish_output2() under a single slow path if,
so we don't have the overhead in the hot path.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoipv6: help __ip6_finish_output() inlining
Pavel Begunkov [Thu, 28 Apr 2022 10:58:47 +0000 (11:58 +0100)]
ipv6: help __ip6_finish_output() inlining

There are two callers of __ip6_finish_output(), both are in
ip6_finish_output(). We can combine the call sites into one and handle
return code after, that will inline __ip6_finish_output().

Note, error handling under NET_XMIT_CN will only return 0 if
__ip6_finish_output() succeded, and in this case it return 0.
Considering that NET_XMIT_SUCCESS is 0, it'll be returning exactly the
same result for it as before.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>