git.osdn.net Git - tomoyo/tomoyo-test1.git/log

net/mlx5e: Do not update SBCM when prio2buffer command is invalid

The shared buffer pools configuration which are stored in the SBCM
register are updated when the user changes the prio2buffer mapping.

However, in case the user desired prio2buffer change is invalid,
which can occur due to mapping a lossless priority to a not large enough
buffer, the SBCM update should not be performed, as the user command is
failed.

Thus, Perform the SBCM update only after xoff threshold calculation is
performed and the user prio2buffer mapping is validated.

Fixes: a440030d8946 ("net/mlx5e: Update shared buffer along with device buffer changes")
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Consider internal buffers size in port buffer calculations

Currently, when a user triggers a change in port buffer headroom
(buffers 0-7), the driver checks that the requested headroom does
not exceed the total port buffer size. However, this check does not
take into account the internal buffers (buffers 8-9), which are also
part of the total port buffer. This can result in treating invalid port
buffer change requests as valid, causing unintended changes to the shared
buffer.

To address this, include the internal buffers size in the calculation of
available port buffer space which ensures that port buffer requests do not
exceed the correct limit.

Furthermore, remove internal buffers (8-9) size from the total_size
calculation as these buffers are reserved for internal use and are not
exposed to the user.

While at it, add verbosity to the debug prints in
mlx5e_port_query_buffer() function to ease future debugging.

Fixes: ecdf2dadee8e ("net/mlx5e: Receive buffer support for DCBX")
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Prevent encap offload when neigh update is running

The cited commit adds a compeletion to remove dependency on rtnl
lock. But it causes a deadlock for multiple encapsulations:

crash> bt ffff8aece8a64000
PID: 1514557  TASK: ffff8aece8a64000  CPU: 3    COMMAND: "tc"
  #0 [ffffa6d14183f368] __schedule at ffffffffb8ba7f45
  #1 [ffffa6d14183f3f8] schedule at ffffffffb8ba8418
  #2 [ffffa6d14183f418] schedule_preempt_disabled at ffffffffb8ba8898
  #3 [ffffa6d14183f428] __mutex_lock at ffffffffb8baa7f8
  #4 [ffffa6d14183f4d0] mutex_lock_nested at ffffffffb8baabeb
  #5 [ffffa6d14183f4e0] mlx5e_attach_encap at ffffffffc0f48c17 [mlx5_core]
  #6 [ffffa6d14183f628] mlx5e_tc_add_fdb_flow at ffffffffc0f39680 [mlx5_core]
  #7 [ffffa6d14183f688] __mlx5e_add_fdb_flow at ffffffffc0f3b636 [mlx5_core]
  #8 [ffffa6d14183f6f0] mlx5e_tc_add_flow at ffffffffc0f3bcdf [mlx5_core]
  #9 [ffffa6d14183f728] mlx5e_configure_flower at ffffffffc0f3c1d1 [mlx5_core]
#10 [ffffa6d14183f790] mlx5e_rep_setup_tc_cls_flower at ffffffffc0f3d529 [mlx5_core]
#11 [ffffa6d14183f7a0] mlx5e_rep_setup_tc_cb at ffffffffc0f3d714 [mlx5_core]
#12 [ffffa6d14183f7b0] tc_setup_cb_add at ffffffffb8931bb8
#13 [ffffa6d14183f810] fl_hw_replace_filter at ffffffffc0dae901 [cls_flower]
#14 [ffffa6d14183f8d8] fl_change at ffffffffc0db5c57 [cls_flower]
#15 [ffffa6d14183f970] tc_new_tfilter at ffffffffb8936047
#16 [ffffa6d14183fac8] rtnetlink_rcv_msg at ffffffffb88c7c31
#17 [ffffa6d14183fb50] netlink_rcv_skb at ffffffffb8942853
#18 [ffffa6d14183fbc0] rtnetlink_rcv at ffffffffb88c1835
#19 [ffffa6d14183fbd0] netlink_unicast at ffffffffb8941f27
#20 [ffffa6d14183fc18] netlink_sendmsg at ffffffffb8942245
#21 [ffffa6d14183fc98] sock_sendmsg at ffffffffb887d482
#22 [ffffa6d14183fcb8] ____sys_sendmsg at ffffffffb887d81a
#23 [ffffa6d14183fd38] ___sys_sendmsg at ffffffffb88806e2
#24 [ffffa6d14183fe90] __sys_sendmsg at ffffffffb88807a2
#25 [ffffa6d14183ff28] __x64_sys_sendmsg at ffffffffb888080f
#26 [ffffa6d14183ff38] do_syscall_64 at ffffffffb8b9b6a8
#27 [ffffa6d14183ff50] entry_SYSCALL_64_after_hwframe at ffffffffb8c0007c
crash> bt 0xffff8aeb07544000
PID: 1110766  TASK: ffff8aeb07544000  CPU: 0    COMMAND: "kworker/u20:9"
  #0 [ffffa6d14e6b7bd8] __schedule at ffffffffb8ba7f45
  #1 [ffffa6d14e6b7c68] schedule at ffffffffb8ba8418
  #2 [ffffa6d14e6b7c88] schedule_timeout at ffffffffb8baef88
  #3 [ffffa6d14e6b7d10] wait_for_completion at ffffffffb8ba968b
  #4 [ffffa6d14e6b7d60] mlx5e_take_all_encap_flows at ffffffffc0f47ec4 [mlx5_core]
  #5 [ffffa6d14e6b7da0] mlx5e_rep_update_flows at ffffffffc0f3e734 [mlx5_core]
  #6 [ffffa6d14e6b7df8] mlx5e_rep_neigh_update at ffffffffc0f400bb [mlx5_core]
  #7 [ffffa6d14e6b7e50] process_one_work at ffffffffb80acc9c
  #8 [ffffa6d14e6b7ed0] worker_thread at ffffffffb80ad012
  #9 [ffffa6d14e6b7f10] kthread at ffffffffb80b615d
#10 [ffffa6d14e6b7f50] ret_from_fork at ffffffffb8001b2f

After the first encap is attached, flow will be added to encap
entry's flows list. If neigh update is running at this time, the
following encaps of the flow can't hold the encap_tbl_lock and
sleep. If neigh update thread is waiting for that flow's init_done,
deadlock happens.

Fix it by holding lock outside of the for loop. If neigh update is
running, prevent encap flows from offloading. Since the lock is held
outside of the for loop, concurrent creation of encap entries is not
allowed. So remove unnecessary wait_for_completion call for res_ready.

Fixes: 95435ad7999b ("net/mlx5e: Only access fully initialized flows in neigh update")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Extract remaining tunnel encap code to dedicated file

Move set_encap_dests() and clean_encap_dests() to the tunnel encap
dedicated file. And rename them to mlx5e_tc_tun_encap_dests_set()
and mlx5e_tc_tun_encap_dests_unset().

No functional change in this patch. It is needed in the next patch.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

ipv6: Fix out-of-bounds access in ipv6_find_tlv()

optlen is fetched without checking whether there is more than one byte to parse.
It can lead to out-of-bounds access.

Found by InfoTeCS on behalf of Linux Verification Center
(linuxtesting.org) with SVACE.

Fixes: c61a40432509 ("[IPV6]: Find option offset by type.")
Signed-off-by: Gavrilov Ilia <Ilia.Gavrilov@infotecs.ru>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'mlx5-fixes-2023-05-22' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-fixes-2023-05-22

This series provides bug fixes for the mlx5 driver.
Please pull and let me know if there is any problem.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: mtk_eth_soc: fix QoS on DSA MAC on non MTK_NETSYS_V2 SoCs

The commit c6d96df9fa2c ("net: ethernet: mtk_eth_soc: drop generic vlan rx
offload, only use DSA untagging") makes VLAN RX offloading to be only used
on the SoCs without the MTK_NETSYS_V2 ability (which are not just MT7621
and MT7622). The commit disables the proper handling of special tagged
(DSA) frames, added with commit 87e3df4961f4 ("net-next: ethernet:
mediatek: add CDM able to recognize the tag for DSA"), for non
MTK_NETSYS_V2 SoCs when it finds a MAC that does not use DSA. So if the
other MAC uses DSA, the CDMQ component transmits DSA tagged frames to the
CPU improperly. This issue can be observed on frames with TCP, for example,
a TCP speed test using iperf3 won't work.

The commit disables the proper handling of special tagged (DSA) frames
because it assumes that these SoCs don't use more than one MAC, which is
wrong. Although I made Frank address this false assumption on the patch log
when they sent the patch on behalf of Felix, the code still made changes
with this assumption.

Therefore, the proper handling of special tagged (DSA) frames must be kept
enabled in all circumstances as it doesn't affect non DSA tagged frames.

Hardware DSA untagging, introduced with the commit 2d7605a72906 ("net:
ethernet: mtk_eth_soc: enable hardware DSA untagging"), and VLAN RX
offloading are operations on the two CDM components of the frame engine,
CDMP and CDMQ, which connect to Packet DMA (PDMA) and QoS DMA (QDMA) and
are between the MACs and the CPU. These operations apply to all MACs of the
SoC so if one MAC uses DSA and the other doesn't, the hardware DSA
untagging operation will cause the CDMP component to transmit non DSA
tagged frames to the CPU improperly.

Since the VLAN RX offloading feature configuration was dropped, VLAN RX
offloading can only be used along with hardware DSA untagging. So, for the
case above, we need to disable both features and leave it to the CPU,
therefore software, to untag the DSA and VLAN tags.

So the correct way to handle this is:

For all SoCs:

Enable the proper handling of special tagged (DSA) frames
(MTK_CDMQ_IG_CTRL).

For non MTK_NETSYS_V2 SoCs:

Enable hardware DSA untagging (MTK_CDMP_IG_CTRL).
Enable VLAN RX offloading (MTK_CDMP_EG_CTRL).

When a non MTK_NETSYS_V2 SoC MAC does not use DSA:

Disable hardware DSA untagging (MTK_CDMP_IG_CTRL).
Disable VLAN RX offloading (MTK_CDMP_EG_CTRL).

Fixes: c6d96df9fa2c ("net: ethernet: mtk_eth_soc: drop generic vlan rx offload, only use DSA untagging")
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

docs: netdev: document the existence of the mail bot

We had a good run, but after 4 weeks of use we heard someone
asking about pw-bot commands. Let's explain its existence
in the docs. It's not a complete documentation but hopefully
it's enough for the casual contributor. The project and scope
are in flux so the details would likely become out of date,
if we were to document more in depth.

Link: https://lore.kernel.org/all/20230522140057.GB18381@nucnuc.mle/
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230522230903.1853151-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: fix skb leak in __skb_tstamp_tx()

Commit 50749f2dd685 ("tcp/udp: Fix memleaks of sk and zerocopy skbs with
TX timestamp.") added a call to skb_orphan_frags_rx() to fix leaks with
zerocopy skbs. But it ended up adding a leak of its own. When
skb_orphan_frags_rx() fails, the function just returns, leaking the skb
it just cloned. Free it before returning.

This bug was discovered and resolved using Coverity Static Analysis
Security Testing (SAST) by Synopsys, Inc.

Fixes: 50749f2dd685 ("tcp/udp: Fix memleaks of sk and zerocopy skbs with TX timestamp.")
Signed-off-by: Pratyush Yadav <ptyadav@amazon.de>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20230522153020.32422-1-ptyadav@amazon.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

r8169: Use a raw_spinlock_t for the register locks.

The driver's interrupt service routine is requested with the
IRQF_NO_THREAD if MSI is available. This means that the routine is
invoked in hardirq context even on PREEMPT_RT. The routine itself is
relatively short and schedules a worker, performs register access and
schedules NAPI. On PREEMPT_RT, scheduling NAPI from hardirq results in
waking ksoftirqd for further processing so using NAPI threads with this
driver is highly recommended since it NULL routes the threaded-IRQ
efforts.

Adding rtl_hw_aspm_clkreq_enable() to the ISR is problematic on
PREEMPT_RT because the function uses spinlock_t locks which become
sleeping locks on PREEMPT_RT. The locks are only used to protect
register access and don't nest into other functions or locks. They are
also not used for unbounded period of time. Therefore it looks okay to
convert them to raw_spinlock_t.

Convert the three locks which are used from the interrupt service
routine to raw_spinlock_t.

Fixes: e1ed3e4d9111 ("r8169: disable ASPM during NAPI poll")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/20230522134121.uxjax0F5@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

page_pool: fix inconsistency for page_pool_ring_[un]lock()

page_pool_ring_[un]lock() use in_softirq() to decide which
spin lock variant to use, and when they are called in the
context with in_softirq() being false, spin_lock_bh() is
called in page_pool_ring_lock() while spin_unlock() is
called in page_pool_ring_unlock(), because spin_lock_bh()
has disabled the softirq in page_pool_ring_lock(), which
causes inconsistency for spin lock pair calling.

This patch fixes it by returning in_softirq state from
page_pool_producer_lock(), and use it to decide which
spin lock variant to use in page_pool_producer_unlock().

As pool->ring has both producer and consumer lock, so
rename it to page_pool_producer_[un]lock() to reflect
the actual usage. Also move them to page_pool.c as they
are only used there, and remove the 'inline' as the
compiler may have better idea to do inlining or not.

Fixes: 7886244736a4 ("net: page_pool: Add bulk support for ptr_ring")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Link: https://lore.kernel.org/r/20230522031714.5089-1-linyunsheng@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv{4,6}/raw: fix output xfrm lookup wrt protocol

With a raw socket bound to IPPROTO_RAW (ie with hdrincl enabled), the
protocol field of the flow structure, build by raw_sendmsg() /
rawv6_sendmsg()), is set to IPPROTO_RAW. This breaks the ipsec policy
lookup when some policies are defined with a protocol in the selector.

For ipv6, the sin6_port field from 'struct sockaddr_in6' could be used to
specify the protocol. Just accept all values for IPPROTO_RAW socket.

For ipv4, the sin_port field of 'struct sockaddr_in' could not be used
without breaking backward compatibility (the value of this field was never
checked). Let's add a new kind of control message, so that the userland
could specify which protocol is used.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
CC: stable@vger.kernel.org
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20230522120820.1319391-1-nicolas.dichtel@6wind.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

lan966x: Fix unloading/loading of the driver

It was noticing that after a while when unloading/loading the driver and
sending traffic through the switch, it would stop working. It would stop
forwarding any traffic and the only way to get out of this was to do a
power cycle of the board. The root cause seems to be that the switch
core is initialized twice. Apparently initializing twice the switch core
disturbs the pointers in the queue systems in the HW, so after a while
it would stop sending the traffic.
Unfortunetly, it is not possible to use a reset of the switch here,
because the reset line is connected to multiple devices like MDIO,
SGPIO, FAN, etc. So then all the devices will get reseted when the
network driver will be loaded.
So the fix is to check if the core is initialized already and if that is
the case don't initialize it again.

Fixes: db8bcaad5393 ("net: lan966x: add the basic lan966x driver")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230522120038.3749026-1-horatiu.vultur@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/mlx5: Fix indexing of mlx5_irq

After the cited patch, mlx5_irq xarray index can be different then
mlx5_irq MSIX table index.
Fix it by storing both mlx5_irq xarray index and MSIX table index.

Fixes: 3354822cde5a ("net/mlx5: Use dynamic msix vectors allocation")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Fix irq affinity management

The cited patch deny the user of changing the affinity of mlx5 irqs,
which break backward compatibility.
Hence, allow the user to change the affinity of mlx5 irqs.

Fixes: bbac70c74183 ("net/mlx5: Use newer affinity descriptor")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Free irqs only on shutdown callback

Whenever a shutdown is invoked, free irqs only and keep mlx5_irq
synthetic wrapper intact in order to avoid use-after-free on
system shutdown.

for example:
==================================================================
BUG: KASAN: use-after-free in _find_first_bit+0x66/0x80
Read of size 8 at addr ffff88823fc0d318 by task kworker/u192:0/13608

CPU: 25 PID: 13608 Comm: kworker/u192:0 Tainted: G    B   W  O  6.1.21-cloudflare-kasan-2023.3.21 #1
Hardware name: GIGABYTE R162-R2-GEN0/MZ12-HD2-CD, BIOS R14 05/03/2021
Workqueue: mlx5e mlx5e_tx_timeout_work [mlx5_core]
Call Trace:
  <TASK>
  dump_stack_lvl+0x34/0x48
  print_report+0x170/0x473
  ? _find_first_bit+0x66/0x80
  kasan_report+0xad/0x130
  ? _find_first_bit+0x66/0x80
  _find_first_bit+0x66/0x80
  mlx5e_open_channels+0x3c5/0x3a10 [mlx5_core]
  ? console_unlock+0x2fa/0x430
  ? _raw_spin_lock_irqsave+0x8d/0xf0
  ? _raw_spin_unlock_irqrestore+0x42/0x80
  ? preempt_count_add+0x7d/0x150
  ? __wake_up_klogd.part.0+0x7d/0xc0
  ? vprintk_emit+0xfe/0x2c0
  ? mlx5e_trigger_napi_sched+0x40/0x40 [mlx5_core]
  ? dev_attr_show.cold+0x35/0x35
  ? devlink_health_do_dump.part.0+0x174/0x340
  ? devlink_health_report+0x504/0x810
  ? mlx5e_reporter_tx_timeout+0x29d/0x3a0 [mlx5_core]
  ? mlx5e_tx_timeout_work+0x17c/0x230 [mlx5_core]
  ? process_one_work+0x680/0x1050
  mlx5e_safe_switch_params+0x156/0x220 [mlx5_core]
  ? mlx5e_switch_priv_channels+0x310/0x310 [mlx5_core]
  ? mlx5_eq_poll_irq_disabled+0xb6/0x100 [mlx5_core]
  mlx5e_tx_reporter_timeout_recover+0x123/0x240 [mlx5_core]
  ? __mutex_unlock_slowpath.constprop.0+0x2b0/0x2b0
  devlink_health_reporter_recover+0xa6/0x1f0
  devlink_health_report+0x2f7/0x810
  ? vsnprintf+0x854/0x15e0
  mlx5e_reporter_tx_timeout+0x29d/0x3a0 [mlx5_core]
  ? mlx5e_reporter_tx_err_cqe+0x1a0/0x1a0 [mlx5_core]
  ? mlx5e_tx_reporter_timeout_dump+0x50/0x50 [mlx5_core]
  ? mlx5e_tx_reporter_dump_sq+0x260/0x260 [mlx5_core]
  ? newidle_balance+0x9b7/0xe30
  ? psi_group_change+0x6a7/0xb80
  ? mutex_lock+0x96/0xf0
  ? __mutex_lock_slowpath+0x10/0x10
  mlx5e_tx_timeout_work+0x17c/0x230 [mlx5_core]
  process_one_work+0x680/0x1050
  worker_thread+0x5a0/0xeb0
  ? process_one_work+0x1050/0x1050
  kthread+0x2a2/0x340
  ? kthread_complete_and_exit+0x20/0x20
  ret_from_fork+0x22/0x30
  </TASK>

Freed by task 1:
  kasan_save_stack+0x23/0x50
  kasan_set_track+0x21/0x30
  kasan_save_free_info+0x2a/0x40
  ____kasan_slab_free+0x169/0x1d0
  slab_free_freelist_hook+0xd2/0x190
  __kmem_cache_free+0x1a1/0x2f0
  irq_pool_free+0x138/0x200 [mlx5_core]
  mlx5_irq_table_destroy+0xf6/0x170 [mlx5_core]
  mlx5_core_eq_free_irqs+0x74/0xf0 [mlx5_core]
  shutdown+0x194/0x1aa [mlx5_core]
  pci_device_shutdown+0x75/0x120
  device_shutdown+0x35c/0x620
  kernel_restart+0x60/0xa0
  __do_sys_reboot+0x1cb/0x2c0
  do_syscall_64+0x3b/0x90
  entry_SYSCALL_64_after_hwframe+0x4b/0xb5

The buggy address belongs to the object at ffff88823fc0d300
  which belongs to the cache kmalloc-192 of size 192
The buggy address is located 24 bytes inside of
  192-byte region [ffff88823fc0d300, ffff88823fc0d3c0)

The buggy address belongs to the physical page:
page:0000000010139587 refcount:1 mapcount:0 mapping:0000000000000000
index:0x0 pfn:0x23fc0c
head:0000000010139587 order:1 compound_mapcount:0 compound_pincount:0
flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
raw: 002ffff800010200 0000000000000000 dead000000000122 ffff88810004ca00
raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
  ffff88823fc0d200: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff88823fc0d280: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
>ffff88823fc0d300: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                             ^
  ffff88823fc0d380: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
  ffff88823fc0d400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================
general protection fault, probably for non-canonical address
0xdffffc005c40d7ac: 0000 [#1] PREEMPT SMP KASAN NOPTI
KASAN: probably user-memory-access in range [0x00000002e206bd60-0x00000002e206bd67]
CPU: 25 PID: 13608 Comm: kworker/u192:0 Tainted: G    B   W  O  6.1.21-cloudflare-kasan-2023.3.21 #1
Hardware name: GIGABYTE R162-R2-GEN0/MZ12-HD2-CD, BIOS R14 05/03/2021
Workqueue: mlx5e mlx5e_tx_timeout_work [mlx5_core]
RIP: 0010:__alloc_pages+0x141/0x5c0
Call Trace:
  <TASK>
  ? sysvec_apic_timer_interrupt+0xa0/0xc0
  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
  ? __alloc_pages_slowpath.constprop.0+0x1ec0/0x1ec0
  ? _raw_spin_unlock_irqrestore+0x3d/0x80
  __kmalloc_large_node+0x80/0x120
  ? kvmalloc_node+0x4e/0x170
  __kmalloc_node+0xd4/0x150
  kvmalloc_node+0x4e/0x170
  mlx5e_open_channels+0x631/0x3a10 [mlx5_core]
  ? console_unlock+0x2fa/0x430
  ? _raw_spin_lock_irqsave+0x8d/0xf0
  ? _raw_spin_unlock_irqrestore+0x42/0x80
  ? preempt_count_add+0x7d/0x150
  ? __wake_up_klogd.part.0+0x7d/0xc0
  ? vprintk_emit+0xfe/0x2c0
  ? mlx5e_trigger_napi_sched+0x40/0x40 [mlx5_core]
  ? dev_attr_show.cold+0x35/0x35
  ? devlink_health_do_dump.part.0+0x174/0x340
  ? devlink_health_report+0x504/0x810
  ? mlx5e_reporter_tx_timeout+0x29d/0x3a0 [mlx5_core]
  ? mlx5e_tx_timeout_work+0x17c/0x230 [mlx5_core]
  ? process_one_work+0x680/0x1050
  mlx5e_safe_switch_params+0x156/0x220 [mlx5_core]
  ? mlx5e_switch_priv_channels+0x310/0x310 [mlx5_core]
  ? mlx5_eq_poll_irq_disabled+0xb6/0x100 [mlx5_core]
  mlx5e_tx_reporter_timeout_recover+0x123/0x240 [mlx5_core]
  ? __mutex_unlock_slowpath.constprop.0+0x2b0/0x2b0
  devlink_health_reporter_recover+0xa6/0x1f0
  devlink_health_report+0x2f7/0x810
  ? vsnprintf+0x854/0x15e0
  mlx5e_reporter_tx_timeout+0x29d/0x3a0 [mlx5_core]
  ? mlx5e_reporter_tx_err_cqe+0x1a0/0x1a0 [mlx5_core]
  ? mlx5e_tx_reporter_timeout_dump+0x50/0x50 [mlx5_core]
  ? mlx5e_tx_reporter_dump_sq+0x260/0x260 [mlx5_core]
  ? newidle_balance+0x9b7/0xe30
  ? psi_group_change+0x6a7/0xb80
  ? mutex_lock+0x96/0xf0
  ? __mutex_lock_slowpath+0x10/0x10
  mlx5e_tx_timeout_work+0x17c/0x230 [mlx5_core]
  process_one_work+0x680/0x1050
  worker_thread+0x5a0/0xeb0
  ? process_one_work+0x1050/0x1050
  kthread+0x2a2/0x340
  ? kthread_complete_and_exit+0x20/0x20
  ret_from_fork+0x22/0x30
  </TASK>
---[ end trace 0000000000000000  ]---
RIP: 0010:__alloc_pages+0x141/0x5c0
Code: e0 39 a3 96 89 e9 b8 22 01 32 01 83 e1 0f 48 89 fa 01 c9 48 c1 ea
03 d3 f8 83 e0 03 89 44 24 6c 48 b8 00 00 00 00 00 fc ff df <80> 3c 02
00 0f 85 fc 03 00 00 89 e8 4a 8b 14 f5 e0 39 a3 96 4c 89
RSP: 0018:ffff888251f0f438 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: 1ffff1104a3e1e8b RCX: 0000000000000000
RDX: 000000005c40d7ac RSI: 0000000000000003 RDI: 00000002e206bd60
RBP: 0000000000052dc0 R08: ffff8882b0044218 R09: ffff8882b0045e8a
R10: fffffbfff300fefc R11: ffff888167af4000 R12: 0000000000000003
R13: 0000000000000000 R14: 00000000696c7070 R15: ffff8882373f4380
FS:  0000000000000000(0000) GS:ffff88bf2be80000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005641d031eee8 CR3: 0000002e7ca14000 CR4: 0000000000350ee0
Kernel panic - not syncing: Fatal exception
Kernel Offset: 0x11000000 from 0xffffffff81000000 (relocation range:
0xffffffff80000000-0xffffffffbfffffff)
---[ end Kernel panic - not syncing: Fatal exception  ]---]

Reported-by: Frederick Lawler <fred@cloudflare.com>
Link: https://lore.kernel.org/netdev/be5b9271-7507-19c5-ded1-fa78f1980e69@cloudflare.com
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Devcom, serialize devcom registration

From one hand, mlx5 driver is allowing to probe PFs in parallel.
From the other hand, devcom, which is a share resource between PFs, is
registered without any lock. This might resulted in memory problems.

Hence, use the global mlx5_dev_list_lock in order to serialize devcom
registration.

Fixes: fadd59fc50d0 ("net/mlx5: Introduce inter-device communication mechanism")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Devcom, fix error flow in mlx5_devcom_register_device

In case devcom allocation is failed, mlx5 is always freeing the priv.
However, this priv might have been allocated by a different thread,
and freeing it might lead to use-after-free bugs.
Fix it by freeing the priv only in case it was allocated by the
running thread.

Fixes: fadd59fc50d0 ("net/mlx5: Introduce inter-device communication mechanism")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: E-switch, Devcom, sync devcom events and devcom comp register

devcom events are sent to all registered component. Following the
cited patch, it is possible for two components, e.g.: two eswitches,
to send devcom events, while both components are registered. This
means eswitch layer will do double un/pairing, which is double
allocation and free of resources, even though only one un/pairing is
needed. flow example:

cpu0 cpu1
---- ----

mlx5_devlink_eswitch_mode_set(dev0)
  esw_offloads_devcom_init()
   mlx5_devcom_register_component(esw0)
                                         mlx5_devlink_eswitch_mode_set(dev1)
                                          esw_offloads_devcom_init()
                                           mlx5_devcom_register_component(esw1)
                                           mlx5_devcom_send_event()
   mlx5_devcom_send_event()

Hence, check whether the eswitches are already un/paired before
free/allocation of resources.

Fixes: 09b278462f16 ("net: devlink: enable parallel ops on netlink interface")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: TC, Fix using eswitch mapping in nic mode

Cited patch is using the eswitch object mapping pool while
in nic mode where it isn't initialized. This results in the
trace below [0].

Fix that by using either nic or eswitch object mapping pool
depending if eswitch is enabled or not.

[0]:
[  826.446057] ==================================================================
[  826.446729] BUG: KASAN: slab-use-after-free in mlx5_add_flow_rules+0x30/0x490 [mlx5_core]
[  826.447515] Read of size 8 at addr ffff888194485830 by task tc/6233

[  826.448243] CPU: 16 PID: 6233 Comm: tc Tainted: G        W          6.3.0-rc6+ #1
[  826.448890] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[  826.449785] Call Trace:
[  826.450052]  <TASK>
[  826.450302]  dump_stack_lvl+0x33/0x50
[  826.450650]  print_report+0xc2/0x610
[  826.450998]  ? __virt_addr_valid+0xb1/0x130
[  826.451385]  ? mlx5_add_flow_rules+0x30/0x490 [mlx5_core]
[  826.451935]  kasan_report+0xae/0xe0
[  826.452276]  ? mlx5_add_flow_rules+0x30/0x490 [mlx5_core]
[  826.452829]  mlx5_add_flow_rules+0x30/0x490 [mlx5_core]
[  826.453368]  ? __kmalloc_node+0x5a/0x120
[  826.453733]  esw_add_restore_rule+0x20f/0x270 [mlx5_core]
[  826.454288]  ? mlx5_eswitch_add_send_to_vport_meta_rule+0x260/0x260 [mlx5_core]
[  826.455011]  ? mutex_unlock+0x80/0xd0
[  826.455361]  ? __mutex_unlock_slowpath.constprop.0+0x210/0x210
[  826.455862]  ? mapping_add+0x2cb/0x440 [mlx5_core]
[  826.456425]  mlx5e_tc_action_miss_mapping_get+0x139/0x180 [mlx5_core]
[  826.457058]  ? mlx5e_tc_update_skb_nic+0xb0/0xb0 [mlx5_core]
[  826.457636]  ? __kasan_kmalloc+0x77/0x90
[  826.458000]  ? __kmalloc+0x57/0x120
[  826.458336]  mlx5_tc_ct_flow_offload+0x325/0xe40 [mlx5_core]
[  826.458916]  ? ct_kernel_enter.constprop.0+0x48/0xa0
[  826.459360]  ? mlx5_tc_ct_parse_action+0xf0/0xf0 [mlx5_core]
[  826.459933]  ? mlx5e_mod_hdr_attach+0x491/0x520 [mlx5_core]
[  826.460507]  ? mlx5e_mod_hdr_get+0x12/0x20 [mlx5_core]
[  826.461046]  ? mlx5e_tc_attach_mod_hdr+0x154/0x170 [mlx5_core]
[  826.461635]  mlx5e_configure_flower+0x969/0x2110 [mlx5_core]
[  826.462217]  ? _raw_spin_lock_bh+0x85/0xe0
[  826.462597]  ? __mlx5e_add_fdb_flow+0x750/0x750 [mlx5_core]
[  826.463163]  ? kasan_save_stack+0x2e/0x40
[  826.463534]  ? down_read+0x115/0x1b0
[  826.463878]  ? down_write_killable+0x110/0x110
[  826.464288]  ? tc_setup_action.part.0+0x9f/0x3b0
[  826.464701]  ? mlx5e_is_uplink_rep+0x4c/0x90 [mlx5_core]
[  826.465253]  ? mlx5e_tc_reoffload_flows_work+0x130/0x130 [mlx5_core]
[  826.465878]  tc_setup_cb_add+0x112/0x250
[  826.466247]  fl_hw_replace_filter+0x230/0x310 [cls_flower]
[  826.466724]  ? fl_hw_destroy_filter+0x1a0/0x1a0 [cls_flower]
[  826.467212]  fl_change+0x14e1/0x2030 [cls_flower]
[  826.467636]  ? sock_def_readable+0x89/0x120
[  826.468019]  ? fl_tmplt_create+0x2d0/0x2d0 [cls_flower]
[  826.468509]  ? kasan_unpoison+0x23/0x50
[  826.468873]  ? get_random_u16+0x180/0x180
[  826.469244]  ? __radix_tree_lookup+0x2b/0x130
[  826.469640]  ? fl_get+0x7b/0x140 [cls_flower]
[  826.470042]  ? fl_mask_put+0x200/0x200 [cls_flower]
[  826.470478]  ? __mutex_unlock_slowpath.constprop.0+0x210/0x210
[  826.470973]  ? fl_tmplt_create+0x2d0/0x2d0 [cls_flower]
[  826.471427]  tc_new_tfilter+0x644/0x1050
[  826.471795]  ? tc_get_tfilter+0x860/0x860
[  826.472170]  ? __thaw_task+0x130/0x130
[  826.472525]  ? arch_stack_walk+0x98/0xf0
[  826.472892]  ? cap_capable+0x9f/0xd0
[  826.473235]  ? security_capable+0x47/0x60
[  826.473608]  rtnetlink_rcv_msg+0x1d5/0x550
[  826.473985]  ? rtnl_calcit.isra.0+0x1f0/0x1f0
[  826.474383]  ? __stack_depot_save+0x35/0x4c0
[  826.474779]  ? kasan_save_stack+0x2e/0x40
[  826.475149]  ? kasan_save_stack+0x1e/0x40
[  826.475518]  ? __kasan_record_aux_stack+0x9f/0xb0
[  826.475939]  ? task_work_add+0x77/0x1c0
[  826.476305]  netlink_rcv_skb+0xe0/0x210
[  826.476661]  ? rtnl_calcit.isra.0+0x1f0/0x1f0
[  826.477057]  ? netlink_ack+0x7c0/0x7c0
[  826.477412]  ? rhashtable_jhash2+0xef/0x150
[  826.477796]  ? _copy_from_iter+0x105/0x770
[  826.484386]  netlink_unicast+0x346/0x490
[  826.484755]  ? netlink_attachskb+0x400/0x400
[  826.485145]  ? kernel_text_address+0xc2/0xd0
[  826.485535]  netlink_sendmsg+0x3b0/0x6c0
[  826.485902]  ? kernel_text_address+0xc2/0xd0
[  826.486296]  ? netlink_unicast+0x490/0x490
[  826.486671]  ? iovec_from_user.part.0+0x7a/0x1a0
[  826.487083]  ? netlink_unicast+0x490/0x490
[  826.487461]  sock_sendmsg+0x73/0xc0
[  826.487803]  ____sys_sendmsg+0x364/0x380
[  826.488186]  ? import_iovec+0x7/0x10
[  826.488531]  ? kernel_sendmsg+0x30/0x30
[  826.488893]  ? __copy_msghdr+0x180/0x180
[  826.489258]  ? kasan_save_stack+0x2e/0x40
[  826.489629]  ? kasan_save_stack+0x1e/0x40
[  826.490002]  ? __kasan_record_aux_stack+0x9f/0xb0
[  826.490424]  ? __call_rcu_common.constprop.0+0x46/0x580
[  826.490876]  ___sys_sendmsg+0xdf/0x140
[  826.491231]  ? copy_msghdr_from_user+0x110/0x110
[  826.491649]  ? fget_raw+0x120/0x120
[  826.491988]  ? ___sys_recvmsg+0xd9/0x130
[  826.492355]  ? folio_batch_add_and_move+0x80/0xa0
[  826.492776]  ? _raw_spin_lock+0x7a/0xd0
[  826.493137]  ? _raw_spin_lock+0x7a/0xd0
[  826.493500]  ? _raw_read_lock_irq+0x30/0x30
[  826.493880]  ? kasan_set_track+0x21/0x30
[  826.494249]  ? kasan_save_free_info+0x2a/0x40
[  826.494650]  ? do_sys_openat2+0xff/0x270
[  826.495016]  ? __fget_light+0x1b5/0x200
[  826.495377]  ? __virt_addr_valid+0xb1/0x130
[  826.495763]  __sys_sendmsg+0xb2/0x130
[  826.496118]  ? __sys_sendmsg_sock+0x20/0x20
[  826.496501]  ? __x64_sys_rseq+0x2e0/0x2e0
[  826.496874]  ? do_user_addr_fault+0x276/0x820
[  826.497273]  ? fpregs_assert_state_consistent+0x52/0x60
[  826.497727]  ? exit_to_user_mode_prepare+0x30/0x120
[  826.498158]  do_syscall_64+0x3d/0x90
[  826.498502]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[  826.498949] RIP: 0033:0x7f9b67f4f887
[  826.499294] Code: 0a 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
[  826.500742] RSP: 002b:00007fff5d1a5498 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[  826.501395] RAX: ffffffffffffffda RBX: 0000000064413ce6 RCX: 00007f9b67f4f887
[  826.501975] RDX: 0000000000000000 RSI: 00007fff5d1a5500 RDI: 0000000000000003
[  826.502556] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000001
[  826.503135] R10: 00007f9b67e08708 R11: 0000000000000246 R12: 0000000000000001
[  826.503714] R13: 0000000000000001 R14: 00007fff5d1a9800 R15: 0000000000485400
[  826.504304]  </TASK>

[  826.504753] Allocated by task 3764:
[  826.505090]  kasan_save_stack+0x1e/0x40
[  826.505453]  kasan_set_track+0x21/0x30
[  826.505810]  __kasan_kmalloc+0x77/0x90
[  826.506164]  __mlx5_create_flow_table+0x16d/0xbb0 [mlx5_core]
[  826.506742]  esw_offloads_enable+0x60d/0xfb0 [mlx5_core]
[  826.507292]  mlx5_eswitch_enable_locked+0x4d3/0x680 [mlx5_core]
[  826.507885]  mlx5_devlink_eswitch_mode_set+0x2a3/0x580 [mlx5_core]
[  826.508513]  devlink_nl_cmd_eswitch_set_doit+0xdf/0x1f0
[  826.508969]  genl_family_rcv_msg_doit.isra.0+0x146/0x1c0
[  826.509427]  genl_rcv_msg+0x28d/0x3e0
[  826.509772]  netlink_rcv_skb+0xe0/0x210
[  826.510133]  genl_rcv+0x24/0x40
[  826.510448]  netlink_unicast+0x346/0x490
[  826.510810]  netlink_sendmsg+0x3b0/0x6c0
[  826.511179]  sock_sendmsg+0x73/0xc0
[  826.511519]  __sys_sendto+0x18d/0x220
[  826.511867]  __x64_sys_sendto+0x72/0x80
[  826.512232]  do_syscall_64+0x3d/0x90
[  826.512576]  entry_SYSCALL_64_after_hwframe+0x46/0xb0

[  826.513220] Freed by task 5674:
[  826.513535]  kasan_save_stack+0x1e/0x40
[  826.513893]  kasan_set_track+0x21/0x30
[  826.514245]  kasan_save_free_info+0x2a/0x40
[  826.514629]  ____kasan_slab_free+0x11a/0x1b0
[  826.515021]  __kmem_cache_free+0x14d/0x280
[  826.515399]  tree_put_node+0x109/0x1c0 [mlx5_core]
[  826.515907]  mlx5_destroy_flow_table+0x119/0x630 [mlx5_core]
[  826.516481]  esw_offloads_steering_cleanup+0xe7/0x150 [mlx5_core]
[  826.517084]  esw_offloads_disable+0xe0/0x160 [mlx5_core]
[  826.517632]  mlx5_eswitch_disable_locked+0x26c/0x290 [mlx5_core]
[  826.518225]  mlx5_devlink_eswitch_mode_set+0x128/0x580 [mlx5_core]
[  826.518834]  devlink_nl_cmd_eswitch_set_doit+0xdf/0x1f0
[  826.519286]  genl_family_rcv_msg_doit.isra.0+0x146/0x1c0
[  826.519748]  genl_rcv_msg+0x28d/0x3e0
[  826.520101]  netlink_rcv_skb+0xe0/0x210
[  826.520458]  genl_rcv+0x24/0x40
[  826.520771]  netlink_unicast+0x346/0x490
[  826.521137]  netlink_sendmsg+0x3b0/0x6c0
[  826.521505]  sock_sendmsg+0x73/0xc0
[  826.521842]  __sys_sendto+0x18d/0x220
[  826.522191]  __x64_sys_sendto+0x72/0x80
[  826.522554]  do_syscall_64+0x3d/0x90
[  826.522894]  entry_SYSCALL_64_after_hwframe+0x46/0xb0

[  826.523540] Last potentially related work creation:
[  826.523969]  kasan_save_stack+0x1e/0x40
[  826.524331]  __kasan_record_aux_stack+0x9f/0xb0
[  826.524739]  insert_work+0x30/0x130
[  826.525078]  __queue_work+0x34b/0x690
[  826.525426]  queue_work_on+0x48/0x50
[  826.525766]  __rhashtable_remove_fast_one+0x4af/0x4d0 [mlx5_core]
[  826.526365]  del_sw_flow_group+0x1b5/0x270 [mlx5_core]
[  826.526898]  tree_put_node+0x109/0x1c0 [mlx5_core]
[  826.527407]  esw_offloads_steering_cleanup+0xd3/0x150 [mlx5_core]
[  826.528009]  esw_offloads_disable+0xe0/0x160 [mlx5_core]
[  826.528616]  mlx5_eswitch_disable_locked+0x26c/0x290 [mlx5_core]
[  826.529218]  mlx5_devlink_eswitch_mode_set+0x128/0x580 [mlx5_core]
[  826.529823]  devlink_nl_cmd_eswitch_set_doit+0xdf/0x1f0
[  826.530276]  genl_family_rcv_msg_doit.isra.0+0x146/0x1c0
[  826.530733]  genl_rcv_msg+0x28d/0x3e0
[  826.531079]  netlink_rcv_skb+0xe0/0x210
[  826.531439]  genl_rcv+0x24/0x40
[  826.531755]  netlink_unicast+0x346/0x490
[  826.532123]  netlink_sendmsg+0x3b0/0x6c0
[  826.532487]  sock_sendmsg+0x73/0xc0
[  826.532825]  __sys_sendto+0x18d/0x220
[  826.533175]  __x64_sys_sendto+0x72/0x80
[  826.533533]  do_syscall_64+0x3d/0x90
[  826.533877]  entry_SYSCALL_64_after_hwframe+0x46/0xb0

[  826.534521] The buggy address belongs to the object at ffff888194485800
                which belongs to the cache kmalloc-512 of size 512
[  826.535506] The buggy address is located 48 bytes inside of
                freed 512-byte region [ffff888194485800, ffff888194485a00)

[  826.536666] The buggy address belongs to the physical page:
[  826.537138] page:00000000d75841dd refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x194480
[  826.537915] head:00000000d75841dd order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[  826.538595] flags: 0x200000000010200(slab|head|node=0|zone=2)
[  826.539089] raw: 0200000000010200 ffff888100042c80 ffffea0004523800 dead000000000002
[  826.539755] raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000
[  826.540417] page dumped because: kasan: bad access detected

[  826.541095] Memory state around the buggy address:
[  826.541519]  ffff888194485700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  826.542149]  ffff888194485780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  826.542773] >ffff888194485800: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  826.543400]                                      ^
[  826.543822]  ffff888194485880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  826.544452]  ffff888194485900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  826.545079] ==================================================================

Fixes: 6702782845a5 ("net/mlx5e: TC, Set CT miss to the specific ct action instance")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Fix SQ wake logic in ptp napi_poll context

Check in the mlx5e_ptp_poll_ts_cq context if the ptp tx sq should be woken
up. Before change, the ptp tx sq may never wake up if the ptp tx ts skb
fifo is full when mlx5e_poll_tx_cq checks if the queue should be woken up.

Fixes: 1880bc4e4a96 ("net/mlx5e: Add TX port timestamp support")
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Fix deadlock in tc route query code

Cited commit causes ABBA deadlock[0] when peer flows are created while
holding the devcom rw semaphore. Due to peer flows offload implementation
the lock is taken much higher up the call chain and there is no obvious way
to easily fix the deadlock. Instead, since tc route query code needs the
peer eswitch structure only to perform a lookup in xarray and doesn't
perform any sleeping operations with it, refactor the code for lockless
execution in following ways:

- RCUify the devcom 'data' pointer. When resetting the pointer
synchronously wait for RCU grace period before returning. This is fine
since devcom is currently only used for synchronization of
pairing/unpairing of eswitches which is rare and already expensive as-is.

- Wrap all usages of 'paired' boolean in {READ|WRITE}_ONCE(). The flag has
already been used in some unlocked contexts without proper
annotations (e.g. users of mlx5_devcom_is_paired() function), but it wasn't
an issue since all relevant code paths checked it again after obtaining the
devcom semaphore. Now it is also used by mlx5_devcom_get_peer_data_rcu() as
"best effort" check to return NULL when devcom is being unpaired. Note that
while RCU read lock doesn't prevent the unpaired flag from being changed
concurrently it still guarantees that reader can continue to use 'data'.

- Refactor mlx5e_tc_query_route_vport() function to use new
mlx5_devcom_get_peer_data_rcu() API which fixes the deadlock.

[0]:

[  164.599612] ======================================================
[  164.600142] WARNING: possible circular locking dependency detected
[  164.600667] 6.3.0-rc3+ #1 Not tainted
[  164.601021] ------------------------------------------------------
[  164.601557] handler1/3456 is trying to acquire lock:
[  164.601998] ffff88811f1714b0 (&esw->offloads.encap_tbl_lock){+.+.}-{3:3}, at: mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core]
[  164.603078]
               but task is already holding lock:
[  164.603617] ffff88810137fc98 (&comp->sem){++++}-{3:3}, at: mlx5_devcom_get_peer_data+0x37/0x80 [mlx5_core]
[  164.604459]
               which lock already depends on the new lock.

[  164.605190]
               the existing dependency chain (in reverse order) is:
[  164.605848]
               -> #1 (&comp->sem){++++}-{3:3}:
[  164.606380]        down_read+0x39/0x50
[  164.606772]        mlx5_devcom_get_peer_data+0x37/0x80 [mlx5_core]
[  164.607336]        mlx5e_tc_query_route_vport+0x86/0xc0 [mlx5_core]
[  164.607914]        mlx5e_tc_tun_route_lookup+0x1a4/0x1d0 [mlx5_core]
[  164.608495]        mlx5e_attach_decap_route+0xc6/0x1e0 [mlx5_core]
[  164.609063]        mlx5e_tc_add_fdb_flow+0x1ea/0x360 [mlx5_core]
[  164.609627]        __mlx5e_add_fdb_flow+0x2d2/0x430 [mlx5_core]
[  164.610175]        mlx5e_configure_flower+0x952/0x1a20 [mlx5_core]
[  164.610741]        tc_setup_cb_add+0xd4/0x200
[  164.611146]        fl_hw_replace_filter+0x14c/0x1f0 [cls_flower]
[  164.611661]        fl_change+0xc95/0x18a0 [cls_flower]
[  164.612116]        tc_new_tfilter+0x3fc/0xd20
[  164.612516]        rtnetlink_rcv_msg+0x418/0x5b0
[  164.612936]        netlink_rcv_skb+0x54/0x100
[  164.613339]        netlink_unicast+0x190/0x250
[  164.613746]        netlink_sendmsg+0x245/0x4a0
[  164.614150]        sock_sendmsg+0x38/0x60
[  164.614522]        ____sys_sendmsg+0x1d0/0x1e0
[  164.614934]        ___sys_sendmsg+0x80/0xc0
[  164.615320]        __sys_sendmsg+0x51/0x90
[  164.615701]        do_syscall_64+0x3d/0x90
[  164.616083]        entry_SYSCALL_64_after_hwframe+0x46/0xb0
[  164.616568]
               -> #0 (&esw->offloads.encap_tbl_lock){+.+.}-{3:3}:
[  164.617210]        __lock_acquire+0x159e/0x26e0
[  164.617638]        lock_acquire+0xc2/0x2a0
[  164.618018]        __mutex_lock+0x92/0xcd0
[  164.618401]        mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core]
[  164.618943]        post_process_attr+0x153/0x2d0 [mlx5_core]
[  164.619471]        mlx5e_tc_add_fdb_flow+0x164/0x360 [mlx5_core]
[  164.620021]        __mlx5e_add_fdb_flow+0x2d2/0x430 [mlx5_core]
[  164.620564]        mlx5e_configure_flower+0xe33/0x1a20 [mlx5_core]
[  164.621125]        tc_setup_cb_add+0xd4/0x200
[  164.621531]        fl_hw_replace_filter+0x14c/0x1f0 [cls_flower]
[  164.622047]        fl_change+0xc95/0x18a0 [cls_flower]
[  164.622500]        tc_new_tfilter+0x3fc/0xd20
[  164.622906]        rtnetlink_rcv_msg+0x418/0x5b0
[  164.623324]        netlink_rcv_skb+0x54/0x100
[  164.623727]        netlink_unicast+0x190/0x250
[  164.624138]        netlink_sendmsg+0x245/0x4a0
[  164.624544]        sock_sendmsg+0x38/0x60
[  164.624919]        ____sys_sendmsg+0x1d0/0x1e0
[  164.625340]        ___sys_sendmsg+0x80/0xc0
[  164.625731]        __sys_sendmsg+0x51/0x90
[  164.626117]        do_syscall_64+0x3d/0x90
[  164.626502]        entry_SYSCALL_64_after_hwframe+0x46/0xb0
[  164.626995]
               other info that might help us debug this:

[  164.627725]  Possible unsafe locking scenario:

[  164.628268]        CPU0                    CPU1
[  164.628683]        ----                    ----
[  164.629098]   lock(&comp->sem);
[  164.629421]                                lock(&esw->offloads.encap_tbl_lock);
[  164.630066]                                lock(&comp->sem);
[  164.630555]   lock(&esw->offloads.encap_tbl_lock);
[  164.630993]
                *** DEADLOCK ***

[  164.631575] 3 locks held by handler1/3456:
[  164.631962]  #0: ffff888124b75130 (&block->cb_lock){++++}-{3:3}, at: tc_setup_cb_add+0x5b/0x200
[  164.632703]  #1: ffff888116e512b8 (&esw->mode_lock){++++}-{3:3}, at: mlx5_esw_hold+0x39/0x50 [mlx5_core]
[  164.633552]  #2: ffff88810137fc98 (&comp->sem){++++}-{3:3}, at: mlx5_devcom_get_peer_data+0x37/0x80 [mlx5_core]
[  164.634435]
               stack backtrace:
[  164.634883] CPU: 17 PID: 3456 Comm: handler1 Not tainted 6.3.0-rc3+ #1
[  164.635431] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[  164.636340] Call Trace:
[  164.636616]  <TASK>
[  164.636863]  dump_stack_lvl+0x47/0x70
[  164.637217]  check_noncircular+0xfe/0x110
[  164.637601]  __lock_acquire+0x159e/0x26e0
[  164.637977]  ? mlx5_cmd_set_fte+0x5b0/0x830 [mlx5_core]
[  164.638472]  lock_acquire+0xc2/0x2a0
[  164.638828]  ? mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core]
[  164.639339]  ? lock_is_held_type+0x98/0x110
[  164.639728]  __mutex_lock+0x92/0xcd0
[  164.640074]  ? mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core]
[  164.640576]  ? __lock_acquire+0x382/0x26e0
[  164.640958]  ? mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core]
[  164.641468]  ? mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core]
[  164.641965]  mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core]
[  164.642454]  ? lock_release+0xbf/0x240
[  164.642819]  post_process_attr+0x153/0x2d0 [mlx5_core]
[  164.643318]  mlx5e_tc_add_fdb_flow+0x164/0x360 [mlx5_core]
[  164.643835]  __mlx5e_add_fdb_flow+0x2d2/0x430 [mlx5_core]
[  164.644340]  mlx5e_configure_flower+0xe33/0x1a20 [mlx5_core]
[  164.644862]  ? lock_acquire+0xc2/0x2a0
[  164.645219]  tc_setup_cb_add+0xd4/0x200
[  164.645588]  fl_hw_replace_filter+0x14c/0x1f0 [cls_flower]
[  164.646067]  fl_change+0xc95/0x18a0 [cls_flower]
[  164.646488]  tc_new_tfilter+0x3fc/0xd20
[  164.646861]  ? tc_del_tfilter+0x810/0x810
[  164.647236]  rtnetlink_rcv_msg+0x418/0x5b0
[  164.647621]  ? rtnl_setlink+0x160/0x160
[  164.647982]  netlink_rcv_skb+0x54/0x100
[  164.648348]  netlink_unicast+0x190/0x250
[  164.648722]  netlink_sendmsg+0x245/0x4a0
[  164.649090]  sock_sendmsg+0x38/0x60
[  164.649434]  ____sys_sendmsg+0x1d0/0x1e0
[  164.649804]  ? copy_msghdr_from_user+0x6d/0xa0
[  164.650213]  ___sys_sendmsg+0x80/0xc0
[  164.650563]  ? lock_acquire+0xc2/0x2a0
[  164.650926]  ? lock_acquire+0xc2/0x2a0
[  164.651286]  ? __fget_files+0x5/0x190
[  164.651644]  ? find_held_lock+0x2b/0x80
[  164.652006]  ? __fget_files+0xb9/0x190
[  164.652365]  ? lock_release+0xbf/0x240
[  164.652723]  ? __fget_files+0xd3/0x190
[  164.653079]  __sys_sendmsg+0x51/0x90
[  164.653435]  do_syscall_64+0x3d/0x90
[  164.653784]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[  164.654229] RIP: 0033:0x7f378054f8bd
[  164.654577] Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 6a c3 f4 ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 44 24 08 e8 be c3 f4 ff 48
[  164.656041] RSP: 002b:00007f377fa114b0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
[  164.656701] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f378054f8bd
[  164.657297] RDX: 0000000000000000 RSI: 00007f377fa11540 RDI: 0000000000000014
[  164.657885] RBP: 00007f377fa12278 R08: 0000000000000000 R09: 000000000000015c
[  164.658472] R10: 00007f377fa123d0 R11: 0000000000000293 R12: 0000560962d99bd0
[  164.665317] R13: 0000000000000000 R14: 0000560962d99bd0 R15: 00007f377fa11540

Fixes: f9d196bd632b ("net/mlx5e: Use correct eswitch for stack devices with lag")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Fix error message when failing to allocate device memory

Fix spacing for the error and also the correct error code pointer.

Fixes: c9b9dcb430b3 ("net/mlx5: Move device memory management to mlx5_core")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Use correct encap attribute during invalidation

With introduction of post action infrastructure most of the users of encap
attribute had been modified in order to obtain the correct attribute by
calling mlx5e_tc_get_encap_attr() helper instead of assuming encap action
is always on default attribute. However, the cited commit didn't modify
mlx5e_invalidate_encap() which prevents it from destroying correct modify
header action which leads to a warning [0]. Fix the issue by using correct
attribute.

[0]:

Feb 21 09:47:35 c-237-177-40-045 kernel: WARNING: CPU: 17 PID: 654 at drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:684 mlx5e_tc_attach_mod_hdr+0x1cc/0x230 [mlx5_core]
Feb 21 09:47:35 c-237-177-40-045 kernel: RIP: 0010:mlx5e_tc_attach_mod_hdr+0x1cc/0x230 [mlx5_core]
Feb 21 09:47:35 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:47:35 c-237-177-40-045 kernel:  <TASK>
Feb 21 09:47:35 c-237-177-40-045 kernel:  mlx5e_tc_fib_event_work+0x8e3/0x1f60 [mlx5_core]
Feb 21 09:47:35 c-237-177-40-045 kernel:  ? mlx5e_take_all_encap_flows+0xe0/0xe0 [mlx5_core]
Feb 21 09:47:35 c-237-177-40-045 kernel:  ? lock_downgrade+0x6d0/0x6d0
Feb 21 09:47:35 c-237-177-40-045 kernel:  ? lockdep_hardirqs_on_prepare+0x273/0x3f0
Feb 21 09:47:35 c-237-177-40-045 kernel:  ? lockdep_hardirqs_on_prepare+0x273/0x3f0
Feb 21 09:47:35 c-237-177-40-045 kernel:  process_one_work+0x7c2/0x1310
Feb 21 09:47:35 c-237-177-40-045 kernel:  ? lockdep_hardirqs_on_prepare+0x3f0/0x3f0
Feb 21 09:47:35 c-237-177-40-045 kernel:  ? pwq_dec_nr_in_flight+0x230/0x230
Feb 21 09:47:35 c-237-177-40-045 kernel:  ? rwlock_bug.part.0+0x90/0x90
Feb 21 09:47:35 c-237-177-40-045 kernel:  worker_thread+0x59d/0xec0
Feb 21 09:47:35 c-237-177-40-045 kernel:  ? __kthread_parkme+0xd9/0x1d0

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: DR, Check force-loopback RC QP capability independently from RoCE

SW Steering uses RC QP for writing STEs to ICM. This writingis done in LB
(loopback), and FL (force-loopback) QP is preferred for performance. FL is
available when RoCE is enabled or disabled based on RoCE caps.
This patch adds reading of FL capability from HCA caps in addition to the
existing reading from RoCE caps, thus fixing the case where we didn't
have loopback enabled when RoCE was disabled.

Fixes: 7304d603a57a ("net/mlx5: DR, Add support for force-loopback QP")
Signed-off-by: Itamar Gozlan <igozlan@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: DR, Fix crc32 calculation to work on big-endian (BE) CPUs

When calculating crc for hash index we use the function crc32 that
calculates for little-endian (LE) arch.
Then we convert it to network endianness using htonl(), but it's wrong
to do the conversion in BE archs since the crc32 value is already LE.

The solution is to switch the bytes from the crc result for all types
of arc.

Fixes: 40416d8ede65 ("net/mlx5: DR, Replace CRC32 implementation to use kernel lib")
Signed-off-by: Erez Shitrit <erezsh@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Handle pairing of E-switch via uplink un/load APIs

In case user switch a device from switchdev mode to legacy mode, mlx5
first unpair the E-switch and afterwards unload the uplink vport.
From the other hand, in case user remove or reload a device, mlx5
first unload the uplink vport and afterwards unpair the E-switch.

The latter is causing a bug[1], hence, handle pairing of E-switch as
part of uplink un/load APIs.

[1]
In case VF_LAG is used, every tc fdb flow is duplicated to the peer
esw. However, the original esw keeps a pointer to this duplicated
flow, not the peer esw.
e.g.: if user create tc fdb flow over esw0, the flow is duplicated
over esw1, in FW/HW, but in SW, esw0 keeps a pointer to the duplicated
flow.
During module unload while a peer tc fdb flow is still offloaded, in
case the first device to be removed is the peer device (esw1 in the
example above), the peer net-dev is destroyed, and so the mlx5e_priv
is memset to 0.
Afterwards, the peer device is trying to unpair himself from the
original device (esw0 in the example above). Unpair API invoke the
original device to clear peer flow from its eswitch (esw0), but the
peer flow, which is stored over the original eswitch (esw0), is
trying to use the peer mlx5e_priv, which is memset to 0 and result in
bellow kernel-oops.

[  157.964081 ] BUG: unable to handle page fault for address: 000000000002ce60
[  157.964662 ] #PF: supervisor read access in kernel mode
[  157.965123 ] #PF: error_code(0x0000) - not-present page
[  157.965582 ] PGD 0 P4D 0
[  157.965866 ] Oops: 0000 [#1] SMP
[  157.967670 ] RIP: 0010:mlx5e_tc_del_fdb_flow+0x48/0x460 [mlx5_core]
[  157.976164 ] Call Trace:
[  157.976437 ]  <TASK>
[  157.976690 ]  __mlx5e_tc_del_fdb_peer_flow+0xe6/0x100 [mlx5_core]
[  157.977230 ]  mlx5e_tc_clean_fdb_peer_flows+0x67/0x90 [mlx5_core]
[  157.977767 ]  mlx5_esw_offloads_unpair+0x2d/0x1e0 [mlx5_core]
[  157.984653 ]  mlx5_esw_offloads_devcom_event+0xbf/0x130 [mlx5_core]
[  157.985212 ]  mlx5_devcom_send_event+0xa3/0xb0 [mlx5_core]
[  157.985714 ]  esw_offloads_disable+0x5a/0x110 [mlx5_core]
[  157.986209 ]  mlx5_eswitch_disable_locked+0x152/0x170 [mlx5_core]
[  157.986757 ]  mlx5_eswitch_disable+0x51/0x80 [mlx5_core]
[  157.987248 ]  mlx5_unload+0x2a/0xb0 [mlx5_core]
[  157.987678 ]  mlx5_uninit_one+0x5f/0xd0 [mlx5_core]
[  157.988127 ]  remove_one+0x64/0xe0 [mlx5_core]
[  157.988549 ]  pci_device_remove+0x31/0xa0
[  157.988933 ]  device_release_driver_internal+0x18f/0x1f0
[  157.989402 ]  driver_detach+0x3f/0x80
[  157.989754 ]  bus_remove_driver+0x70/0xf0
[  157.990129 ]  pci_unregister_driver+0x34/0x90
[  157.990537 ]  mlx5_cleanup+0xc/0x1c [mlx5_core]
[  157.990972 ]  __x64_sys_delete_module+0x15a/0x250
[  157.991398 ]  ? exit_to_user_mode_prepare+0xea/0x110
[  157.991840 ]  do_syscall_64+0x3d/0x90
[  157.992198 ]  entry_SYSCALL_64_after_hwframe+0x46/0xb0

Fixes: 04de7dda7394 ("net/mlx5e: Infrastructure for duplicated offloading of TC flows")
Fixes: 1418ddd96afd ("net/mlx5e: Duplicate offloaded TC eswitch rules under uplink LAG")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Collect command failures data only for known commands

DEVX can issue a general command, which is not used by mlx5 driver.
In case such command is failed, mlx5 is trying to collect the failure
data, However, mlx5 doesn't create a storage for this command, since
mlx5 doesn't use it. This lead to array-index-out-of-bounds error.

Fix it by checking whether the command is known before collecting the
failure data.

Fixes: 34f46ae0d4b3 ("net/mlx5: Add command failures data to debugfs")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/handshake: Fix sock->file allocation

sock->file = sock_alloc_file(sock, O_NONBLOCK, NULL);
^^^^ ^^^^

sock_alloc_file() calls release_sock() on error but the left hand
side of the assignment dereferences "sock". This isn't the bug and
I didn't report this earlier because there is an assert that it
doesn't fail.

net/handshake/handshake-test.c:221 handshake_req_submit_test4() error: dereferencing freed memory 'sock'
net/handshake/handshake-test.c:233 handshake_req_submit_test4() warn: 'req' was already freed.
net/handshake/handshake-test.c:254 handshake_req_submit_test5() error: dereferencing freed memory 'sock'
net/handshake/handshake-test.c:290 handshake_req_submit_test6() error: dereferencing freed memory 'sock'
net/handshake/handshake-test.c:321 handshake_req_cancel_test1() error: dereferencing freed memory 'sock'
net/handshake/handshake-test.c:355 handshake_req_cancel_test2() error: dereferencing freed memory 'sock'
net/handshake/handshake-test.c:367 handshake_req_cancel_test2() warn: 'req' was already freed.
net/handshake/handshake-test.c:395 handshake_req_cancel_test3() error: dereferencing freed memory 'sock'
net/handshake/handshake-test.c:407 handshake_req_cancel_test3() warn: 'req' was already freed.
net/handshake/handshake-test.c:451 handshake_req_destroy_test1() error: dereferencing freed memory 'sock'
net/handshake/handshake-test.c:463 handshake_req_destroy_test1() warn: 'req' was already freed.

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Fixes: 88232ec1ec5e ("net/handshake: Add Kunit tests for the handshake consumer API")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://lore.kernel.org/r/168451609436.45209.15407022385441542980.stgit@oracle-102.nfsv4bat.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/handshake: Squelch allocation warning during Kunit test

The "handshake_req_alloc excessive privsize" kunit test is intended
to check what happens when the maximum privsize is exceeded. The
WARN_ON_ONCE_GFP at mm/page_alloc.c:4744 can be disabled safely for
this test.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Fixes: 88232ec1ec5e ("net/handshake: Add Kunit tests for the handshake consumer API")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://lore.kernel.org/r/168451636052.47152.9600443326570457947.stgit@oracle-102.nfsv4bat.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

3c589_cs: Fix an error handling path in tc589_probe()

Should tc589_config() fail, some resources need to be released as already
done in the remove function.

Fixes: 15b99ac17295 ("[PATCH] pcmcia: add return value to _config() functions")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/d8593ae867b24c79063646e36f9b18b0790107cb.1684575975.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

forcedeth: Fix an error handling path in nv_probe()

If an error occures after calling nv_mgmt_acquire_sema(), it should be
undone with a corresponding nv_mgmt_release_sema() call.

Add it in the error handling path of the probe as already done in the
remove function.

Fixes: cac1c52c3621 ("forcedeth: mgmt unit interface")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Link: https://lore.kernel.org/r/355e9a7d351b32ad897251b6f81b5886fcdc6766.1684571393.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sctp: fix an issue that plpmtu can never go to complete state

When doing plpmtu probe, the probe size is growing every time when it
receives the ACK during the Search state until the probe fails. When
the failure occurs, pl.probe_high is set and it goes to the Complete
state.

However, if the link pmtu is huge, like 65535 in loopback_dev, the probe
eventually keeps using SCTP_MAX_PLPMTU as the probe size and never fails.
Because of that, pl.probe_high can not be set, and the plpmtu probe can
never go to the Complete state.

Fix it by setting pl.probe_high to SCTP_MAX_PLPMTU when the probe size
grows to SCTP_MAX_PLPMTU in sctp_transport_pl_recv(). Also, not allow
the probe size greater than SCTP_MAX_PLPMTU in the Complete state.

Fixes: b87641aff9e7 ("sctp: do state transition when a probe succeeds on HB ACK recv path")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'for-net-2023-05-19' of git://git./linux/kernel/git/bluetooth/bluetooth

Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

- Fix compiler warnings on btnxpuart
- Fix potential double free on hci_conn_unlink
- Fix UAF on hci_conn_hash_flush

* tag 'for-net-2023-05-19' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: btnxpuart: Fix compiler warnings
  Bluetooth: Unlink CISes when LE disconnects in hci_conn_del
  Bluetooth: Fix UAF in hci_conn_hash_flush again
  Bluetooth: Refcnt drop must be placed last in hci_conn_unlink
  Bluetooth: Fix potential double free caused by hci_conn_unlink
====================

Link: https://lore.kernel.org/r/20230519233056.2024340-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: fix stack overflow when LRO is disabled for virtual interfaces

When the virtual interface's feature is updated, it synchronizes the
updated feature for its own lower interface.
This propagation logic should be worked as the iteration, not recursively.
But it works recursively due to the netdev notification unexpectedly.
This problem occurs when it disables LRO only for the team and bonding
interface type.

       team0
         |
  +------+------+-----+-----+
  |      |      |     |     |
team1  team2  team3  ...  team200

If team0's LRO feature is updated, it generates the NETDEV_FEAT_CHANGE
event to its own lower interfaces(team1 ~ team200).
It is worked by netdev_sync_lower_features().
So, the NETDEV_FEAT_CHANGE notification logic of each lower interface
work iteratively.
But generated NETDEV_FEAT_CHANGE event is also sent to the upper
interface too.
upper interface(team0) generates the NETDEV_FEAT_CHANGE event for its own
lower interfaces again.
lower and upper interfaces receive this event and generate this
event again and again.
So, the stack overflow occurs.

But it is not the infinite loop issue.
Because the netdev_sync_lower_features() updates features before
generating the NETDEV_FEAT_CHANGE event.
Already synchronized lower interfaces skip notification logic.
So, it is just the problem that iteration logic is changed to the
recursive unexpectedly due to the notification mechanism.

Reproducer:

ip link add team0 type team
ethtool -K team0 lro on
for i in {1..200}
do
        ip link add team$i master team0 type team
        ethtool -K team$i lro on
done

ethtool -K team0 lro off

In order to fix it, the notifier_ctx member of bonding/team is introduced.

Reported-by: syzbot+60748c96cf5c6df8e581@syzkaller.appspotmail.com
Fixes: fd867d51f889 ("net/core: generic support for disabling netdev features down stack")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20230517143010.3596250-1-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Bluetooth: btnxpuart: Fix compiler warnings

This fixes the follwing compiler warning reported by kernel test robot:

drivers/bluetooth/btnxpuart.c:1332:34: warning: unused variable
'nxpuart_of_match_table' [-Wunused-const-variable]

Signed-off-by: Neeraj Sanjay Kale <neeraj.sanjaykale@nxp.com>
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202305161345.eClvTYQ9-lkp@intel.com/
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>

Bluetooth: Unlink CISes when LE disconnects in hci_conn_del

Currently, hci_conn_del calls hci_conn_unlink for BR/EDR, (e)SCO, and
CIS connections, i.e., everything except LE connections. However, if
(e)SCO connections are unlinked when BR/EDR disconnects, CIS connections
should also be unlinked when LE disconnects.

In terms of disconnection behavior, CIS and (e)SCO connections are not
too different. One peculiarity of CIS is that when CIS connections are
disconnected, the CIS handle isn't deleted, as per [BLUETOOTH CORE
SPECIFICATION Version 5.4 | Vol 4, Part E] 7.1.6 Disconnect command:

        All SCO, eSCO, and CIS connections on a physical link should be
        disconnected before the ACL connection on the same physical
        connection is disconnected. If it does not, they will be
        implicitly disconnected as part of the ACL disconnection.
        ...
        Note: As specified in Section 7.7.5, on the Central, the handle
        for a CIS remains valid even after disconnection and, therefore,
        the Host can recreate a disconnected CIS at a later point in
        time using the same connection handle.

Since hci_conn_link invokes both hci_conn_get and hci_conn_hold,
hci_conn_unlink should perform both hci_conn_put and hci_conn_drop as
well. However, currently it performs only hci_conn_put.

This patch makes hci_conn_unlink call hci_conn_drop as well, which
simplifies the logic in hci_conn_del a bit and may benefit future users
of hci_conn_unlink. But it is noted that this change additionally
implies that hci_conn_unlink can queue disc_work on conn itself, with
the following call stack:

        hci_conn_unlink(conn)  [conn->parent == NULL]
                -> hci_conn_unlink(child)  [child->parent == conn]
                        -> hci_conn_drop(child->parent)
                                -> queue_delayed_work(&conn->disc_work)

Queued disc_work after hci_conn_del can be spurious, so during the
process of hci_conn_del, it is necessary to make the call to
cancel_delayed_work(&conn->disc_work) after invoking hci_conn_unlink.

Signed-off-by: Ruihan Li <lrh2000@pku.edu.cn>
Co-developed-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>

Bluetooth: Fix UAF in hci_conn_hash_flush again

Commit 06149746e720 ("Bluetooth: hci_conn: Add support for linking
multiple hcon") reintroduced a previously fixed bug [1] ("KASAN:
slab-use-after-free Read in hci_conn_hash_flush"). This bug was
originally fixed by commit 5dc7d23e167e ("Bluetooth: hci_conn: Fix
possible UAF").

The hci_conn_unlink function was added to avoid invalidating the link
traversal caused by successive hci_conn_del operations releasing extra
connections. However, currently hci_conn_unlink itself also releases
extra connections, resulted in the reintroduced bug.

This patch follows a more robust solution for cleaning up all
connections, by repeatedly removing the first connection until there are
none left. This approach does not rely on the inner workings of
hci_conn_del and ensures proper cleanup of all connections.

Meanwhile, we need to make sure that hci_conn_del never fails. Indeed it
doesn't, as it now always returns zero. To make this a bit clearer, this
patch also changes its return type to void.

Reported-by: syzbot+8bb72f86fc823817bc5d@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-bluetooth/000000000000aa920505f60d25ad@google.com/
Fixes: 06149746e720 ("Bluetooth: hci_conn: Add support for linking multiple hcon")
Signed-off-by: Ruihan Li <lrh2000@pku.edu.cn>
Co-developed-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>

Bluetooth: Refcnt drop must be placed last in hci_conn_unlink

If hci_conn_put(conn->parent) reduces conn->parent's reference count to
zero, it can immediately deallocate conn->parent. At the same time,
conn->link->list has its head in conn->parent, causing use-after-free
problems in the latter list_del_rcu(&conn->link->list).

This problem can be easily solved by reordering the two operations,
i.e., first performing the list removal with list_del_rcu and then
decreasing the refcnt with hci_conn_put.

Reported-by: Luiz Augusto von Dentz <luiz.dentz@gmail.com>
Closes: https://lore.kernel.org/linux-bluetooth/CABBYNZ+1kce8_RJrLNOXd_8=Mdpb=2bx4Nto-hFORk=qiOkoCg@mail.gmail.com/
Fixes: 06149746e720 ("Bluetooth: hci_conn: Add support for linking multiple hcon")
Signed-off-by: Ruihan Li <lrh2000@pku.edu.cn>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>

Bluetooth: Fix potential double free caused by hci_conn_unlink

The hci_conn_unlink function is being called by hci_conn_del, which
means it should not call hci_conn_del with the input parameter conn
again. If it does, conn may have already been released when
hci_conn_unlink returns, leading to potential UAF and double-free
issues.

This patch resolves the problem by modifying hci_conn_unlink to release
only conn's child links when necessary, but never release conn itself.

Reported-by: syzbot+690b90b14f14f43f4688@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-bluetooth/000000000000484a8205faafe216@google.com/
Fixes: 06149746e720 ("Bluetooth: hci_conn: Add support for linking multiple hcon")
Signed-off-by: Ruihan Li <lrh2000@pku.edu.cn>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Reported-by: syzbot+690b90b14f14f43f4688@syzkaller.appspotmail.com
Reported-by: Luiz Augusto von Dentz <luiz.dentz@gmail.com>
Reported-by: syzbot+8bb72f86fc823817bc5d@syzkaller.appspotmail.com

net: fec: add dma_wmb to ensure correct descriptor values

Two dma_wmb() are added in the XDP TX path to ensure proper ordering of
descriptor and buffer updates:
1. A dma_wmb() is added after updating the last BD to make sure
   the updates to rest of the descriptor are visible before
   transferring ownership to FEC.
2. A dma_wmb() is also added after updating the bdp to ensure these
   updates are visible before updating txq->bd.cur.
3. Start the xmit of the frame immediately right after configuring the
   tx descriptor.

Fixes: 6d6b39f180b8 ("net: fec: add initial XDP support")
Signed-off-by: Shenwei Wang <shenwei.wang@nxp.com>
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

MAINTAINERS: add myself as maintainer for enetc

I would like to be copied on new patches submitted on this driver.
I am relatively familiar with the code, having practically maintained
it for a while.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-pf: Fix TSOv6 offload

HW adds segment size to the payload length
in the IPv6 header. Fix payload length to
just TCP header length instead of 'TCP header
size + IPv6 header size'.

Fixes: 86d7476078b8 ("octeontx2-pf: TCP segmentation offload support")
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: fix devlink info error handling

Avoid early devlink info return if errors arise with MCDI commands
executed for getting the required info from the device. The rationale
is some commands can fail but later ones could still give useful data.
Moreover, some nvram partitions could not be present which needs to be
handled as a non error.

The specific errors are reported through system messages and if any
error appears, it will be reported generically through extack.

Fixes 14743ddd2495 ("sfc: add devlink info support for ef100")
Signed-off-by: Alejandro Lucero <alejandro.lucero-palau@amd.com>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: Reset connection when trying to use SMCRv2 fails.

We found a crash when using SMCRv2 with 2 Mellanox ConnectX-4. It
can be reproduced by:

- smc_run nginx
- smc_run wrk -t 32 -c 500 -d 30 http://<ip>:<port>

BUG: kernel NULL pointer dereference, address: 0000000000000014
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 8000000108713067 P4D 8000000108713067 PUD 151127067 PMD 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 4 PID: 2441 Comm: kworker/4:249 Kdump: loaded Tainted: G        W   E      6.4.0-rc1+ #42
Workqueue: smc_hs_wq smc_listen_work [smc]
RIP: 0010:smc_clc_send_confirm_accept+0x284/0x580 [smc]
RSP: 0018:ffffb8294b2d7c78 EFLAGS: 00010a06
RAX: ffff8f1873238880 RBX: ffffb8294b2d7dc8 RCX: 0000000000000000
RDX: 00000000000000b4 RSI: 0000000000000001 RDI: 0000000000b40c00
RBP: ffffb8294b2d7db8 R08: ffff8f1815c5860c R09: 0000000000000000
R10: 0000000000000400 R11: 0000000000000000 R12: ffff8f1846f56180
R13: ffff8f1815c5860c R14: 0000000000000001 R15: 0000000000000001
FS:  0000000000000000(0000) GS:ffff8f1aefd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000014 CR3: 00000001027a0001 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  <TASK>
  ? mlx5_ib_map_mr_sg+0xa1/0xd0 [mlx5_ib]
  ? smcr_buf_map_link+0x24b/0x290 [smc]
  ? __smc_buf_create+0x4ee/0x9b0 [smc]
  smc_clc_send_accept+0x4c/0xb0 [smc]
  smc_listen_work+0x346/0x650 [smc]
  ? __schedule+0x279/0x820
  process_one_work+0x1e5/0x3f0
  worker_thread+0x4d/0x2f0
  ? __pfx_worker_thread+0x10/0x10
  kthread+0xe5/0x120
  ? __pfx_kthread+0x10/0x10
  ret_from_fork+0x2c/0x50
  </TASK>

During the CLC handshake, server sequentially tries available SMCRv2
and SMCRv1 devices in smc_listen_work().

If an SMCRv2 device is found. SMCv2 based link group and link will be
assigned to the connection. Then assumed that some buffer assignment
errors happen later in the CLC handshake, such as RMB registration
failure, server will give up SMCRv2 and try SMCRv1 device instead. But
the resources assigned to the connection won't be reset.

When server tries SMCRv1 device, the connection creation process will
be executed again. Since conn->lnk has been assigned when trying SMCRv2,
it will not be set to the correct SMCRv1 link in
smcr_lgr_conn_assign_link(). So in such situation, conn->lgr points to
correct SMCRv1 link group but conn->lnk points to the SMCRv2 link
mistakenly.

Then in smc_clc_send_confirm_accept(), conn->rmb_desc->mr[link->link_idx]
will be accessed. Since the link->link_idx is not correct, the related
MR may not have been initialized, so crash happens.

| Try SMCRv2 device first
|     |-> conn->lgr: assign existed SMCRv2 link group;
|     |-> conn->link: assign existed SMCRv2 link (link_idx may be 1 in SMC_LGR_SYMMETRIC);
|     |-> sndbuf & RMB creation fails, quit;
|
| Try SMCRv1 device then
|     |-> conn->lgr: create SMCRv1 link group and assign;
|     |-> conn->link: keep SMCRv2 link mistakenly;
|     |-> sndbuf & RMB creation succeed, only RMB->mr[link_idx = 0]
|         initialized.
|
| Then smc_clc_send_confirm_accept() accesses
| conn->rmb_desc->mr[conn->link->link_idx, which is 1], then crash.
v

This patch tries to fix this by cleaning conn->lnk before assigning
link. In addition, it is better to reset the connection and clean the
resources assigned if trying SMCRv2 failed in buffer creation or
registration.

Fixes: e49300a6bf62 ("net/smc: add listen processing for SMC-Rv2")
Link: https://lore.kernel.org/r/20220523055056.2078994-1-liuyacan@corp.netease.com/
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: fib_tests: mute cleanup error message

In the end of the test, there will be an error message induced by the
`ip netns del ns1` command in cleanup()

  Tests passed: 201
  Tests failed:   0
  Cannot remove namespace file "/run/netns/ns1": No such file or directory

This can even be reproduced with just `./fib_tests.sh -h` as we're
calling cleanup() on exit.

Redirect the error message to /dev/null to mute it.

V2: Update commit message and fixes tag.
V3: resubmit due to missing netdev ML in V2

Fixes: b60417a9f2b8 ("selftest: fib_tests: Always cleanup before exit")
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: do as little as possible in napi poll when budget is 0

NAPI gets called with budget of 0 from netpoll, which has interrupts
disabled. We should try to free some space on Tx rings and nothing
else.

Specifically do not try to handle XDP TX or try to refill Rx buffers -
we can't use the page pool from IRQ context. Don't check if IRQs moved,
either, that makes no sense in netpoll. Netpoll calls _all_ the rings
from whatever CPU it happens to be invoked on.

In general do as little as possible, the work quickly adds up when
there's tens of rings to poll.

The immediate stack trace I was seeing is:

    __do_softirq+0xd1/0x2c0
    __local_bh_enable_ip+0xc7/0x120
    </IRQ>
    <TASK>
    page_pool_put_defragged_page+0x267/0x320
    mlx5e_free_xdpsq_desc+0x99/0xd0
    mlx5e_poll_xdpsq_cq+0x138/0x3b0
    mlx5e_napi_poll+0xc3/0x8b0
    netpoll_poll_dev+0xce/0x150

AFAIU page pool takes a BH lock, releases it and since BH is now
enabled tries to run softirqs.

Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Fixes: 60bbf7eeef10 ("mlx5: use page_pool for xdp_return_frame call")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'tls-fixes'

Jakub Kicinski says:

====================
tls: rx: strp: fix inline crypto offload

The local strparser version I added to TLS does not preserve
decryption status, which breaks inline crypto (NIC offload).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tls: rx: strp: don't use GFP_KERNEL in softirq context

When receive buffer is small, or the TCP rx queue looks too
complicated to bother using it directly - we allocate a new
skb and copy data into it.

We already use sk->sk_allocation... but nothing actually
sets it to GFP_ATOMIC on the ->sk_data_ready() path.

Users of HW offload are far more likely to experience problems
due to scheduling while atomic. "Copy mode" is very rarely
triggered with SW crypto.

Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser")
Tested-by: Shai Amiram <samiram@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tls: rx: strp: preserve decryption status of skbs when needed

When receive buffer is small we try to copy out the data from
TCP into a skb maintained by TLS to prevent connection from
stalling. Unfortunately if a single record is made up of a mix
of decrypted and non-decrypted skbs combining them into a single
skb leads to loss of decryption status, resulting in decryption
errors or data corruption.

Similarly when trying to use TCP receive queue directly we need
to make sure that all the skbs within the record have the same
status. If we don't the mixed status will be detected correctly
but we'll CoW the anchor, again collapsing it into a single paged
skb without decrypted status preserved. So the "fixup" code will
not know which parts of skb to re-encrypt.

Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser")
Tested-by: Shai Amiram <samiram@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tls: rx: strp: factor out copying skb data

We'll need to copy input skbs individually in the next patch.
Factor that code out (without assuming we're copying a full record).

Tested-by: Shai Amiram <samiram@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tls: rx: strp: fix determining record length in copy mode

We call tls_rx_msg_size(skb) before doing skb->len += chunk.
So the tls_rx_msg_size() code will see old skb->len, most
likely leading to an over-read.

Worst case we will over read an entire record, next iteration
will try to trim the skb but may end up turning frag len negative
or discarding the subsequent record (since we already told TCP
we've read it during previous read but now we'll trim it out of
the skb).

Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser")
Tested-by: Shai Amiram <samiram@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tls: rx: strp: force mixed decrypted records into copy mode

If a record is partially decrypted we'll have to CoW it, anyway,
so go into copy mode and allocate a writable skb right away.

This will make subsequent fix simpler because we won't have to
teach tls_strp_msg_make_copy() how to copy skbs while preserving
decrypt status.

Tested-by: Shai Amiram <samiram@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tls: rx: strp: set the skb->len of detached / CoW'ed skbs

alloc_skb_with_frags() fills in page frag sizes but does not
set skb->len and skb->data_len. Set those correctly otherwise
device offload will most likely generate an empty skb and
hit the BUG() at the end of __skb_nsg().

Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser")
Tested-by: Shai Amiram <samiram@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tls: rx: device: fix checking decryption status

skb->len covers the entire skb, including the frag_list.
In fact we're guaranteed that rxm->full_len <= skb->len,
so since the change under Fixes we were not checking decrypt
status of any skb but the first.

Note that the skb_pagelen() added here may feel a bit costly,
but it's removed by subsequent fixes, anyway.

Reported-by: Tariq Toukan <tariqt@nvidia.com>
Fixes: 86b259f6f888 ("tls: rx: device: bound the frag walk")
Tested-by: Shai Amiram <samiram@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: cdc_ncm: Deal with too low values of dwNtbOutMaxSize

Currently in cdc_ncm_check_tx_max(), if dwNtbOutMaxSize is lower than
the calculated "min" value, but greater than zero, the logic sets
tx_max to dwNtbOutMaxSize. This is then used to allocate a new SKB in
cdc_ncm_fill_tx_frame() where all the data is handled.

For small values of dwNtbOutMaxSize the memory allocated during
alloc_skb(dwNtbOutMaxSize, GFP_ATOMIC) will have the same size, due to
how size is aligned at alloc time:
size = SKB_DATA_ALIGN(size);
size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
Thus we hit the same bug that we tried to squash with
commit 2be6d4d16a084 ("net: cdc_ncm: Allow for dwNtbOutMaxSize to be unset or zero")

Low values of dwNtbOutMaxSize do not cause an issue presently because at
alloc_skb() time more memory (512b) is allocated than required for the
SKB headers alone (320b), leaving some space (512b - 320b = 192b)
for CDC data (172b).

However, if more elements (for example 3 x u64 = [24b]) were added to
one of the SKB header structs, say 'struct skb_shared_info',
increasing its original size (320b [320b aligned]) to something larger
(344b [384b aligned]), then suddenly the CDC data (172b) no longer
fits in the spare SKB data area (512b - 384b = 128b).

Consequently the SKB bounds checking semantics fails and panics:

skbuff: skb_over_panic: text:ffffffff831f755b len:184 put:172 head:ffff88811f1c6c00 data:ffff88811f1c6c00 tail:0xb8 end:0x80 dev:<NULL>
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:113!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 57 Comm: kworker/0:2 Not tainted 5.15.106-syzkaller-00249-g19c0ed55a470 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/14/2023
Workqueue: mld mld_ifc_work
RIP: 0010:skb_panic net/core/skbuff.c:113 [inline]
RIP: 0010:skb_over_panic+0x14c/0x150 net/core/skbuff.c:118
[snip]
Call Trace:
<TASK>
skb_put+0x151/0x210 net/core/skbuff.c:2047
skb_put_zero include/linux/skbuff.h:2422 [inline]
cdc_ncm_ndp16 drivers/net/usb/cdc_ncm.c:1131 [inline]
cdc_ncm_fill_tx_frame+0x11ab/0x3da0 drivers/net/usb/cdc_ncm.c:1308
cdc_ncm_tx_fixup+0xa3/0x100

Deal with too low values of dwNtbOutMaxSize, clamp it in the range
[USB_CDC_NCM_NTB_MIN_OUT_SIZE, CDC_NCM_NTB_MAX_SIZE_TX]. We ensure
enough data space is allocated to handle CDC data by making sure
dwNtbOutMaxSize is not smaller than USB_CDC_NCM_NTB_MIN_OUT_SIZE.

Fixes: 289507d3364f ("net: cdc_ncm: use sysfs for rx/tx aggregation tuning")
Cc: stable@vger.kernel.org
Reported-by: syzbot+9f575a1f15fc0c01ed69@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=b982f1059506db48409d
Link: https://lore.kernel.org/all/20211202143437.1411410-1-lee.jones@linaro.org/
Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230517133808.1873695-2-tudor.ambarus@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'net-6.4-rc3' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
"Including fixes from can, xfrm, bluetooth and netfilter.

  Current release - regressions:

   - ipv6: fix RCU splat in ipv6_route_seq_show()

   - wifi: iwlwifi: disable RFI feature

  Previous releases - regressions:

   - tcp: fix possible sk_priority leak in tcp_v4_send_reset()

   - tipc: do not update mtu if msg_max is too small in mtu negotiation

   - netfilter: fix null deref on element insertion

   - devlink: change per-devlink netdev notifier to static one

   - phylink: fix ksettings_set() ethtool call

   - wifi: mac80211: fortify the spinlock against deadlock by interrupt

   - wifi: brcmfmac: check for probe() id argument being NULL

   - eth: ice:
      - fix undersized tx_flags variable
      - fix ice VF reset during iavf initialization

   - eth: hns3: fix sending pfc frames after reset issue

  Previous releases - always broken:

   - xfrm: release all offloaded policy memory

   - nsh: use correct mac_offset to unwind gso skb in nsh_gso_segment()

   - vsock: avoid to close connected socket after the timeout

   - dsa: rzn1-a5psw: enable management frames for CPU port

   - eth: virtio_net: fix error unwinding of XDP initialization

   - eth: tun: fix memory leak for detached NAPI queue.

  Misc:

   - MAINTAINERS: sctp: move Neil to CREDITS"

* tag 'net-6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (107 commits)
  MAINTAINERS: skip CCing netdev for Bluetooth patches
  mdio_bus: unhide mdio_bus_init prototype
  bridge: always declare tunnel functions
  atm: hide unused procfs functions
  net: isa: include net/Space.h
  Revert "ARM: dts: stm32: add CAN support on stm32f746"
  netfilter: nft_set_rbtree: fix null deref on element insertion
  netfilter: nf_tables: fix nft_trans type confusion
  netfilter: conntrack: define variables exp_nat_nla_policy and any_addr with CONFIG_NF_NAT
  net: wwan: t7xx: Ensure init is completed before system sleep
  net: selftests: Fix optstring
  net: pcs: xpcs: fix C73 AN not getting enabled
  net: wwan: iosm: fix NULL pointer dereference when removing device
  vlan: fix a potential uninit-value in vlan_dev_hard_start_xmit()
  mailmap: add entries for Nikolay Aleksandrov
  igb: fix bit_shift to be in [1..8] range
  net: dsa: mv88e6xxx: Fix mv88e6393x EPC write command offset
  cassini: Fix a memory leak in the error handling path of cas_init_one()
  tun: Fix memory leak for detached NAPI queue.
  can: kvaser_pciefd: Disable interrupts in probe error path
  ...

Merge tag 'media/v6.4-3' of git://git./linux/kernel/git/mchehab/linux-media

Pull media fixes from Mauro Carvalho Chehab:
"Several fixes for the dvb core and drivers:

   - fix UAF and null pointer de-reference in DVB core

   - fix kernel runtime warning for blocking operation in wait_event*()
     in dvb core

   - fix write size bug in DVB conditional access core

   - fix dvb demux continuity counter debug check logic

   - randconfig build fixes in pvrusb2 and mn88443x

   - fix memory leak in ttusb-dec

   - fix netup_unidvb probe-time error check logic

   - improve error handling in dw2102 if it can't retrieve DVB MAC
     address"

* tag 'media/v6.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  media: dvb-core: Fix use-after-free due to race condition at dvb_ca_en50221
  media: dvb-core: Fix kernel WARNING for blocking operation in wait_event*()
  media: dvb-core: Fix use-after-free due to race at dvb_register_device()
  media: dvb-core: Fix use-after-free due on race condition at dvb_net
  media: dvb-core: Fix use-after-free on race condition at dvb_frontend
  media: mn88443x: fix !CONFIG_OF error by drop of_match_ptr from ID table
  media: ttusb-dec: fix memory leak in ttusb_dec_exit_dvb()
  media: dvb_ca_en50221: fix a size write bug
  media: netup_unidvb: fix irq init by register it at the end of probe
  media: dvb-usb: dw2102: fix uninit-value in su3000_read_mac_address
  media: dvb-usb: digitv: fix null-ptr-deref in digitv_i2c_xfer()
  media: dvb-usb-v2: rtl28xxu: fix null-ptr-deref in rtl28xxu_i2c_xfer
  media: dvb-usb-v2: ce6230: fix null-ptr-deref in ce6230_i2c_master_xfer()
  media: dvb-usb-v2: ec168: fix null-ptr-deref in ec168_i2c_xfer()
  media: dvb-usb: az6027: fix three null-ptr-deref in az6027_i2c_xfer()
  media: netup_unidvb: fix use-after-free at del_timer()
  media: dvb_demux: fix a bug for the continuity counter
  media: pvrusb2: fix DVB_CORE dependency

Merge tag 'linux-can-fixes-for-6.4-20230518' of git://git./linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2023-05-18

this is a pull request of 7 patches for net/master.

The first 6 patches are by Jimmy Assarsson and fix several bugs in the
kvaser_pciefd driver.

The latest patch is from me and reverts a change in stm32f746.dtsi
that causes build errors due to a missing dependent patch.

* tag 'linux-can-fixes-for-6.4-20230518' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
  Revert "ARM: dts: stm32: add CAN support on stm32f746"
  can: kvaser_pciefd: Disable interrupts in probe error path
  can: kvaser_pciefd: Do not send EFLUSH command on TFD interrupt
  can: kvaser_pciefd: Empty SRB buffer in probe
  can: kvaser_pciefd: Call request_irq() before enabling interrupts
  can: kvaser_pciefd: Clear listen-only bit if not explicitly requested
  can: kvaser_pciefd: Set CAN_STATE_STOPPED in kvaser_pciefd_stop()
====================

Link: https://lore.kernel.org/r/20230518073241.1110453-1-mkl@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge tag 'nf-23-05-17' of https://git./linux/kernel/git/netfilter/nf

Florian Westphal says:

====================
Netfilter fixes for net

1. Silence warning about unused variable when CONFIG_NF_NAT=n, from Tom Rix.
2. nftables: Fix possible out-of-bounds access, from myself.
3. nftables: fix null deref+UAF during element insertion into rbtree,
   also from myself.

* tag 'nf-23-05-17' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nft_set_rbtree: fix null deref on element insertion
  netfilter: nf_tables: fix nft_trans type confusion
  netfilter: conntrack: define variables exp_nat_nla_policy and any_addr with CONFIG_NF_NAT
====================

Link: https://lore.kernel.org/r/20230517123756.7353-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'wireless-2023-05-17' of git://git./linux/kernel/git/wireless/wireless

Kalle Valo says:

====================
wireless fixes for v6.4

A lot of fixes this time, for both the stack and the drivers. The
brcmfmac resume fix has been reported by several people so I would say
it's the most important here. The iwlwifi RFI workaround is also
something which was reported as a regression recently.

* tag 'wireless-2023-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: (31 commits)
  wifi: b43: fix incorrect __packed annotation
  wifi: rtw88: sdio: Always use two consecutive bytes for word operations
  mac80211_hwsim: fix memory leak in hwsim_new_radio_nl
  wifi: iwlwifi: mvm: Add locking to the rate read flow
  wifi: iwlwifi: Don't use valid_links to iterate sta links
  wifi: iwlwifi: mvm: don't trust firmware n_channels
  wifi: iwlwifi: mvm: fix OEM's name in the tas approved list
  wifi: iwlwifi: fix OEM's name in the ppag approved list
  wifi: iwlwifi: mvm: fix initialization of a return value
  wifi: iwlwifi: mvm: fix access to fw_id_to_mac_id
  wifi: iwlwifi: fw: fix DBGI dump
  wifi: iwlwifi: mvm: fix number of concurrent link checks
  wifi: iwlwifi: mvm: fix cancel_delayed_work_sync() deadlock
  wifi: iwlwifi: mvm: don't double-init spinlock
  wifi: iwlwifi: mvm: always free dup_data
  wifi: mac80211: recalc chanctx mindef before assigning
  wifi: mac80211: consider reserved chanctx for mindef
  wifi: mac80211: simplify chanctx allocation
  wifi: mac80211: Abort running color change when stopping the AP
  wifi: mac80211: fix min center freq offset tracing
  ...
====================

Link: https://lore.kernel.org/r/20230517151914.B0AF6C433EF@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

MAINTAINERS: skip CCing netdev for Bluetooth patches

As requested by Marcel skip netdev for Bluetooth patches.
Bluetooth has its own mailing list and overloading netdev
leads to fewer people reading it.

Link: https://lore.kernel.org/netdev/639C8EA4-1F6E-42BE-8F04-E4A753A6EFFC@holtmann.org/
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230517014253.1233333-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mdio_bus: unhide mdio_bus_init prototype

mdio_bus_init() is either used as a local module_init() entry,
or it gets called in phy_device.c. In the former case, there
is no declaration, which causes a warning:

drivers/net/phy/mdio_bus.c:1371:12: error: no previous prototype for 'mdio_bus_init' [-Werror=missing-prototypes]

Remove the #ifdef around the declaration to avoid the warning..

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20230516194625.549249-4-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bridge: always declare tunnel functions

When CONFIG_BRIDGE_VLAN_FILTERING is disabled, two functions are still
defined but have no prototype or caller. This causes a W=1 warning for
the missing prototypes:

net/bridge/br_netlink_tunnel.c:29:6: error: no previous prototype for 'vlan_tunid_inrange' [-Werror=missing-prototypes]
net/bridge/br_netlink_tunnel.c:199:5: error: no previous prototype for 'br_vlan_tunnel_info' [-Werror=missing-prototypes]

The functions are already contitional on CONFIG_BRIDGE_VLAN_FILTERING,
and I coulnd't easily figure out the right set of #ifdefs, so just
move the declarations out of the #ifdef to avoid the warning,
at a small cost in code size over a more elaborate fix.

Fixes: 188c67dd1906 ("net: bridge: vlan options: add support for tunnel id dumping")
Fixes: 569da0822808 ("net: bridge: vlan options: add support for tunnel mapping set/del")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20230516194625.549249-3-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

atm: hide unused procfs functions

When CONFIG_PROC_FS is disabled, the function declarations for some
procfs functions are hidden, but the definitions are still build,
as shown by this compiler warning:

net/atm/resources.c:403:7: error: no previous prototype for 'atm_dev_seq_start' [-Werror=missing-prototypes]
net/atm/resources.c:409:6: error: no previous prototype for 'atm_dev_seq_stop' [-Werror=missing-prototypes]
net/atm/resources.c:414:7: error: no previous prototype for 'atm_dev_seq_next' [-Werror=missing-prototypes]

Add another #ifdef to leave these out of the build.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20230516194625.549249-2-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: isa: include net/Space.h

The legacy drivers that still get called from net/Space.c have prototypes
in net/Space, but this header is not included in most of the files that
define those functions:

drivers/net/ethernet/cirrus/cs89x0.c:1649:28: error: no previous prototype for 'cs89x0_probe' [-Werror=missing-prototypes]
drivers/net/ethernet/8390/ne.c:947:28: error: no previous prototype for 'ne_probe' [-Werror=missing-prototypes]
drivers/net/ethernet/8390/smc-ultra.c:167:28: error: no previous prototype for 'ultra_probe' [-Werror=missing-prototypes]
drivers/net/ethernet/amd/lance.c:438:28: error: no previous prototype for 'lance_probe' [-Werror=missing-prototypes]
drivers/net/ethernet/3com/3c515.c:422:20: error: no previous prototype for 'tc515_probe' [-Werror=missing-prototypes]

Add the inclusion to avoids the warnings.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20230516194625.549249-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Revert "ARM: dts: stm32: add CAN support on stm32f746"

This reverts commit 0920ccdf41e3078a4dd2567eb905ea154bc826e6.

The commit 0920ccdf41e3 ("ARM: dts: stm32: add CAN support on
stm32f746") depends on the patch "dt-bindings: mfd: stm32f7: add
binding definition for CAN3" [1], which is not in net/main, yet. This
results in a parsing error of "stm32f746.dtsi".

So revert this commit.

[1] https://lore.kernel.org/all/20230423172528.1398158-2-dario.binacchi@amarulasolutions.com

Cc: Dario Binacchi <dario.binacchi@amarulasolutions.com>
Cc: Alexandre TORGUE <alexandre.torgue@foss.st.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202305172108.x5acbaQG-lkp@intel.com
Closes: https://lore.kernel.org/oe-kbuild-all/202305172130.eGGEUhpi-lkp@intel.com
Fixes: 0920ccdf41e3 ("ARM: dts: stm32: add CAN support on stm32f746")
Suggested-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/20230517181950.1106697-1-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

Merge tag 'linux-kselftest-fixes-6.4-rc3' of git://git./linux/kernel/git/shuah/linux-kselftest

Pull Kselftest fixes from Shuah Khan:

- sgx test fix for false negatives

- ftrace output is hard to parses and it masks inappropriate skips etc.
   This fix addresses the problems by integrating with kselftest runner

* tag 'linux-kselftest-fixes-6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests/ftrace: Improve integration with kselftest runner
  selftests/sgx: Add "test_encl.elf" to TEST_FILES

Merge tag 'nfsd-6.4-1' of git://git./linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:

- A collection of minor bug fixes

* tag 'nfsd-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  NFSD: Remove open coding of string copy
  SUNRPC: Fix trace_svc_register() call site
  SUNRPC: always free ctxt when freeing deferred request
  SUNRPC: double free xprt_ctxt while still in use
  SUNRPC: Fix error handling in svc_setup_socket()
  SUNRPC: Fix encoding of accepted but unsuccessful RPC replies
  lockd: define nlm_port_min,max with CONFIG_SYSCTL
  nfsd: define exports_proc_ops with CONFIG_PROC_FS
  SUNRPC: Avoid relying on crypto API to derive CBC-CTS output IV

Merge tag 'tpmdd-v6.4-rc2' of git://git./linux/kernel/git/jarkko/linux-tpmdd

Pull tpm fixes from Jarkko Sakkinen:
"Three bug fixes for recently discovered issues"

* tag 'tpmdd-v6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
  tpm/tpm_tis: Disable interrupts for more Lenovo devices
  tpm: Prevent hwrng from activating during resume
  tpm_tis: Use tpm_chip_{start,stop} decoration inside tpm_tis_resume

tracing: make ftrace_likely_update() declaration visible

This function is only used when CONFIG_TRACE_BRANCH_PROFILING is set and
DISABLE_BRANCH_PROFILING is not set, and the declaration is hidden
behind this combination of tests.

But that causes a warning when building with CONFIG_TRACING_BRANCHES,
since that sets DISABLE_BRANCH_PROFILING for the tracing code, and the
declaration is thus hidden:

kernel/trace/trace_branch.c:205:6: error: no previous prototype for 'ftrace_likely_update' [-Werror=missing-prototypes]

Move the declaration out of the #ifdef to avoid the warning.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

netfilter: nft_set_rbtree: fix null deref on element insertion

There is no guarantee that rb_prev() will not return NULL in nft_rbtree_gc_elem():

general protection fault, probably for non-canonical address 0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
nft_add_set_elem+0x14b0/0x2990
nf_tables_newsetelem+0x528/0xb30

Furthermore, there is a possible use-after-free while iterating,
'node' can be free'd so we need to cache the next value to use.

Fixes: c9e6978e2725 ("netfilter: nft_set_rbtree: Switch to node list walk for overlap detection")
Signed-off-by: Florian Westphal <fw@strlen.de>

netfilter: nf_tables: fix nft_trans type confusion

nft_trans_FOO objects all share a common nft_trans base structure, but
trailing fields depend on the real object size. Access is only safe after
trans->msg_type check.

Check for rule type first. Found by code inspection.

Fixes: 1a94e38d254b ("netfilter: nf_tables: add NFTA_RULE_ID attribute")
Signed-off-by: Florian Westphal <fw@strlen.de>

netfilter: conntrack: define variables exp_nat_nla_policy and any_addr with CONFIG_NF_NAT

gcc with W=1 and ! CONFIG_NF_NAT
net/netfilter/nf_conntrack_netlink.c:3463:32: error:
  ‘exp_nat_nla_policy’ defined but not used [-Werror=unused-const-variable=]
3463 | static const struct nla_policy exp_nat_nla_policy[CTA_EXPECT_NAT_MAX+1] = {
      |                                ^~~~~~~~~~~~~~~~~~
net/netfilter/nf_conntrack_netlink.c:2979:33: error:
  ‘any_addr’ defined but not used [-Werror=unused-const-variable=]
2979 | static const union nf_inet_addr any_addr;
      |                                 ^~~~~~~~

These variables use is controlled by CONFIG_NF_NAT, so should their definitions.

Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

net: wwan: t7xx: Ensure init is completed before system sleep

When the system attempts to sleep while mtk_t7xx is not ready, the driver
cannot put the device to sleep:
[   12.472918] mtk_t7xx 0000:57:00.0: [PM] Exiting suspend, modem in invalid state
[   12.472936] mtk_t7xx 0000:57:00.0: PM: pci_pm_suspend(): t7xx_pci_pm_suspend+0x0/0x20 [mtk_t7xx] returns -14
[   12.473678] mtk_t7xx 0000:57:00.0: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x1b0 returns -14
[   12.473711] mtk_t7xx 0000:57:00.0: PM: failed to suspend async: error -14
[   12.764776] PM: Some devices failed to suspend, or early wake event detected

Mediatek confirmed the device can take a rather long time to complete
its initialization, so wait for up to 20 seconds until init is done.

Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: selftests: Fix optstring

The cited commit added a stray colon to the 'v' option. That makes the
option work incorrectly.

ex:
tools/testing/selftests/net# ./fib_nexthops.sh -v
(should enable verbose mode, instead it shows help text due to missing arg)

Fixes: 5feba4727395 ("selftests: fib_nexthops: Make ping timeout configurable")
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: pcs: xpcs: fix C73 AN not getting enabled

The XPCS expects clause 73 (copper backplane) autoneg to follow the
ethtool autoneg bit. It actually did that until the blamed
commit inaptly replaced state->an_enabled (coming from ethtool) with
phylink_autoneg_inband() (coming from the device tree or struct
phylink_config), as part of an unrelated phylink_pcs API conversion.

Russell King suggests that state->an_enabled from the original code was
just a proxy for the ethtool Autoneg bit, and that the correct way of
restoring the functionality is to check for this bit in the advertising
mask.

Fixes: 11059740e616 ("net: pcs: xpcs: convert to phylink_pcs_ops")
Link: https://lore.kernel.org/netdev/ZGNt2MFeRolKGFck@shell.armlinux.org.uk/
Suggested-by: Russell King (Oracle) <linux@armlinux.org.uk>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: wwan: iosm: fix NULL pointer dereference when removing device

In suspend and resume cycle, the removal and rescan of device ends
up in NULL pointer dereference.

During driver initialization, if the ipc_imem_wwan_channel_init()
fails to get the valid device capabilities it returns an error and
further no resource (wwan struct) will be allocated. Now in this
situation if driver removal procedure is initiated it would result
in NULL pointer exception since unallocated wwan struct is dereferenced
inside ipc_wwan_deinit().

ipc_imem_run_state_worker() to handle the called functions return value
and to release the resource in failure case. It also reports the link
down event in failure cases. The user space application can handle this
event to do a device reset for restoring the device communication.

Fixes: 3670970dd8c6 ("net: iosm: shared memory IPC interface")
Reported-by: Samuel Wein PhD <sam@samwein.com>
Closes: https://lore.kernel.org/netdev/20230427140819.1310f4bd@kernel.org/T/
Signed-off-by: M Chetan Kumar <m.chetan.kumar@linux.intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vlan: fix a potential uninit-value in vlan_dev_hard_start_xmit()

syzbot triggered the following splat [1], sending an empty message
through pppoe_sendmsg().

When VLAN_FLAG_REORDER_HDR flag is set, vlan_dev_hard_header()
does not push extra bytes for the VLAN header, because vlan is offloaded.

Unfortunately vlan_dev_hard_start_xmit() first reads veth->h_vlan_proto
before testing (vlan->flags & VLAN_FLAG_REORDER_HDR).

We need to swap the two conditions.

[1]
BUG: KMSAN: uninit-value in vlan_dev_hard_start_xmit+0x171/0x7f0 net/8021q/vlan_dev.c:111
vlan_dev_hard_start_xmit+0x171/0x7f0 net/8021q/vlan_dev.c:111
__netdev_start_xmit include/linux/netdevice.h:4883 [inline]
netdev_start_xmit include/linux/netdevice.h:4897 [inline]
xmit_one net/core/dev.c:3580 [inline]
dev_hard_start_xmit+0x253/0xa20 net/core/dev.c:3596
__dev_queue_xmit+0x3c7f/0x5ac0 net/core/dev.c:4246
dev_queue_xmit include/linux/netdevice.h:3053 [inline]
pppoe_sendmsg+0xa93/0xb80 drivers/net/ppp/pppoe.c:900
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg net/socket.c:747 [inline]
____sys_sendmsg+0xa24/0xe40 net/socket.c:2501
___sys_sendmsg+0x2a1/0x3f0 net/socket.c:2555
__sys_sendmmsg+0x411/0xa50 net/socket.c:2641
__do_sys_sendmmsg net/socket.c:2670 [inline]
__se_sys_sendmmsg net/socket.c:2667 [inline]
__x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2667
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Uninit was created at:
slab_post_alloc_hook+0x12d/0xb60 mm/slab.h:774
slab_alloc_node mm/slub.c:3452 [inline]
kmem_cache_alloc_node+0x543/0xab0 mm/slub.c:3497
kmalloc_reserve+0x148/0x470 net/core/skbuff.c:520
__alloc_skb+0x3a7/0x850 net/core/skbuff.c:606
alloc_skb include/linux/skbuff.h:1277 [inline]
sock_wmalloc+0xfe/0x1a0 net/core/sock.c:2583
pppoe_sendmsg+0x3af/0xb80 drivers/net/ppp/pppoe.c:867
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg net/socket.c:747 [inline]
____sys_sendmsg+0xa24/0xe40 net/socket.c:2501
___sys_sendmsg+0x2a1/0x3f0 net/socket.c:2555
__sys_sendmmsg+0x411/0xa50 net/socket.c:2641
__do_sys_sendmmsg net/socket.c:2670 [inline]
__se_sys_sendmmsg net/socket.c:2667 [inline]
__x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2667
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

CPU: 0 PID: 29770 Comm: syz-executor.0 Not tainted 6.3.0-rc6-syzkaller-gc478e5b17829 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/30/2023

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mailmap: add entries for Nikolay Aleksandrov

Turns out I missed a few patches due to use of old addresses by
senders. Add a mailmap entry with my old addresses.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

igb: fix bit_shift to be in [1..8] range

In igb_hash_mc_addr() the expression:
"mc_addr[4] >> 8 - bit_shift", right shifting "mc_addr[4]"
shift by more than 7 bits always yields zero, so hash becomes not so different.
Add initialization with bit_shift = 1 and add a loop condition to ensure
bit_shift will be always in [1..8] range.

Fixes: 9d5c824399de ("igb: PCI-Express 82575 Gigabit Ethernet driver")
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/net-queue

Tony nguyen says:

====================
Intel Wired LAN Driver Updates 2023-05-16

This series contains updates to ice and iavf drivers.

Ahmed adds setting of missed condition for statistics which caused
incorrect reporting of values for ice. For iavf, he removes a call to set
VLAN offloads during re-initialization which can cause incorrect values
to be set.

Dawid adds checks to ensure VF is ready to be reset before executing
commands that will require it to be reset on ice.
---
v2:
Patch 2
- Redo commit message
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: Fix mv88e6393x EPC write command offset

According to datasheet, the command opcode must be specified
into bits [14:12] of the Extended Port Control register (EPC).

Fixes: de776d0d316f ("net: dsa: mv88e6xxx: add support for mv88e6393x family")
Signed-off-by: Marco Migliore <m.migliore@tiesse.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

cassini: Fix a memory leak in the error handling path of cas_init_one()

cas_saturn_firmware_init() allocates some memory using vmalloc(). This
memory is freed in the .remove() function but not it the error handling
path of the probe.

Add the missing vfree() to avoid a memory leak, should an error occur.

Fixes: fcaa40669cd7 ("cassini: use request_firmware")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tun: Fix memory leak for detached NAPI queue.

syzkaller reported [0] memory leaks of sk and skb related to the TUN
device with no repro, but we can reproduce it easily with:

  struct ifreq ifr = {}
  int fd_tun, fd_tmp;
  char buf[4] = {};

  fd_tun = openat(AT_FDCWD, "/dev/net/tun", O_WRONLY, 0);
  ifr.ifr_flags = IFF_TUN | IFF_NAPI | IFF_MULTI_QUEUE;
  ioctl(fd_tun, TUNSETIFF, &ifr);

  ifr.ifr_flags = IFF_DETACH_QUEUE;
  ioctl(fd_tun, TUNSETQUEUE, &ifr);

  fd_tmp = socket(AF_PACKET, SOCK_PACKET, 0);
  ifr.ifr_flags = IFF_UP;
  ioctl(fd_tmp, SIOCSIFFLAGS, &ifr);

  write(fd_tun, buf, sizeof(buf));
  close(fd_tun);

If we enable NAPI and multi-queue on a TUN device, we can put skb into
tfile->sk.sk_write_queue after the queue is detached.  We should prevent
it by checking tfile->detached before queuing skb.

Note this must be done under tfile->sk.sk_write_queue.lock because write()
and ioctl(IFF_DETACH_QUEUE) can run concurrently.  Otherwise, there would
be a small race window:

  write()                             ioctl(IFF_DETACH_QUEUE)
  `- tun_get_user                     `- __tun_detach
     |- if (tfile->detached)             |- tun_disable_queue
     |  `-> false                        |  `- tfile->detached = tun
     |                                   `- tun_queue_purge
     |- spin_lock_bh(&queue->lock)
     `- __skb_queue_tail(queue, skb)

Another solution is to call tun_queue_purge() when closing and
reattaching the detached queue, but it could paper over another
problems.  Also, we do the same kind of test for IFF_NAPI_FRAGS.

[0]:
unreferenced object 0xffff88801edbc800 (size 2048):
  comm "syz-executor.1", pid 33269, jiffies 4295743834 (age 18.756s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 07 40 00 00 00 00 00 00 00 00 00 00 00 00  ...@............
  backtrace:
    [<000000008c16ea3d>] __do_kmalloc_node mm/slab_common.c:965 [inline]
    [<000000008c16ea3d>] __kmalloc+0x4a/0x130 mm/slab_common.c:979
    [<000000003addde56>] kmalloc include/linux/slab.h:563 [inline]
    [<000000003addde56>] sk_prot_alloc+0xef/0x1b0 net/core/sock.c:2035
    [<000000003e20621f>] sk_alloc+0x36/0x2f0 net/core/sock.c:2088
    [<0000000028e43843>] tun_chr_open+0x3d/0x190 drivers/net/tun.c:3438
    [<000000001b0f1f28>] misc_open+0x1a6/0x1f0 drivers/char/misc.c:165
    [<000000004376f706>] chrdev_open+0x111/0x300 fs/char_dev.c:414
    [<00000000614d379f>] do_dentry_open+0x2f9/0x750 fs/open.c:920
    [<000000008eb24774>] do_open fs/namei.c:3636 [inline]
    [<000000008eb24774>] path_openat+0x143f/0x1a30 fs/namei.c:3791
    [<00000000955077b5>] do_filp_open+0xce/0x1c0 fs/namei.c:3818
    [<00000000b78973b0>] do_sys_openat2+0xf0/0x260 fs/open.c:1356
    [<00000000057be699>] do_sys_open fs/open.c:1372 [inline]
    [<00000000057be699>] __do_sys_openat fs/open.c:1388 [inline]
    [<00000000057be699>] __se_sys_openat fs/open.c:1383 [inline]
    [<00000000057be699>] __x64_sys_openat+0x83/0xf0 fs/open.c:1383
    [<00000000a7d2182d>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    [<00000000a7d2182d>] do_syscall_64+0x3c/0x90 arch/x86/entry/common.c:80
    [<000000004cc4e8c4>] entry_SYSCALL_64_after_hwframe+0x72/0xdc

unreferenced object 0xffff88802f671700 (size 240):
  comm "syz-executor.1", pid 33269, jiffies 4295743854 (age 18.736s)
  hex dump (first 32 bytes):
    68 c9 db 1e 80 88 ff ff 68 c9 db 1e 80 88 ff ff  h.......h.......
    00 c0 7b 2f 80 88 ff ff 00 c8 db 1e 80 88 ff ff  ..{/............
  backtrace:
    [<00000000e9d9fdb6>] __alloc_skb+0x223/0x250 net/core/skbuff.c:644
    [<000000002c3e4e0b>] alloc_skb include/linux/skbuff.h:1288 [inline]
    [<000000002c3e4e0b>] alloc_skb_with_frags+0x6f/0x350 net/core/skbuff.c:6378
    [<00000000825f98d7>] sock_alloc_send_pskb+0x3ac/0x3e0 net/core/sock.c:2729
    [<00000000e9eb3df3>] tun_alloc_skb drivers/net/tun.c:1529 [inline]
    [<00000000e9eb3df3>] tun_get_user+0x5e1/0x1f90 drivers/net/tun.c:1841
    [<0000000053096912>] tun_chr_write_iter+0xac/0x120 drivers/net/tun.c:2035
    [<00000000b9282ae0>] call_write_iter include/linux/fs.h:1868 [inline]
    [<00000000b9282ae0>] new_sync_write fs/read_write.c:491 [inline]
    [<00000000b9282ae0>] vfs_write+0x40f/0x530 fs/read_write.c:584
    [<00000000524566e4>] ksys_write+0xa1/0x170 fs/read_write.c:637
    [<00000000a7d2182d>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    [<00000000a7d2182d>] do_syscall_64+0x3c/0x90 arch/x86/entry/common.c:80
    [<000000004cc4e8c4>] entry_SYSCALL_64_after_hwframe+0x72/0xdc

Fixes: cde8b15f1aab ("tuntap: add ioctl to attach or detach a file form tuntap device")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge patch series "can: kvaser_pciefd: Bug fixes"

Jimmy Assarsson <extja@kvaser.com> says:

This patch series contains various bug fixes for the kvaser_pciefd
driver.

Link: https://lore.kernel.org/r/20230516134318.104279-1-extja@kvaser.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: kvaser_pciefd: Disable interrupts in probe error path

Disable interrupts in error path of probe function.

Fixes: 26ad340e582d ("can: kvaser_pciefd: Add driver for Kvaser PCIEcan devices")
Cc: stable@vger.kernel.org
Signed-off-by: Jimmy Assarsson <extja@kvaser.com>
Link: https://lore.kernel.org/r/20230516134318.104279-7-extja@kvaser.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: kvaser_pciefd: Do not send EFLUSH command on TFD interrupt

Under certain circumstances we send two EFLUSH commands, resulting in two
EFLUSH ack packets, while only expecting a single EFLUSH ack.
This can cause the driver Tx flush completion to get out of sync.

To avoid this problem, don't enable the "Transmit buffer flush done" (TFD)
interrupt and remove the code handling it.
Now we only send EFLUSH command after receiving status packet with
"Init detected" (IDET) bit set.

Fixes: 26ad340e582d ("can: kvaser_pciefd: Add driver for Kvaser PCIEcan devices")
Cc: stable@vger.kernel.org
Signed-off-by: Jimmy Assarsson <extja@kvaser.com>
Link: https://lore.kernel.org/r/20230516134318.104279-6-extja@kvaser.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: kvaser_pciefd: Empty SRB buffer in probe

Empty the "Shared receive buffer" (SRB) in probe, to assure we start in a
known state, and don't process any irrelevant packets.

Fixes: 26ad340e582d ("can: kvaser_pciefd: Add driver for Kvaser PCIEcan devices")
Cc: stable@vger.kernel.org
Signed-off-by: Jimmy Assarsson <extja@kvaser.com>
Link: https://lore.kernel.org/r/20230516134318.104279-5-extja@kvaser.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: kvaser_pciefd: Call request_irq() before enabling interrupts

Make sure the interrupt handler is registered before enabling interrupts.

Fixes: 26ad340e582d ("can: kvaser_pciefd: Add driver for Kvaser PCIEcan devices")
Cc: stable@vger.kernel.org
Signed-off-by: Jimmy Assarsson <extja@kvaser.com>
Link: https://lore.kernel.org/r/20230516134318.104279-4-extja@kvaser.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: kvaser_pciefd: Clear listen-only bit if not explicitly requested

The listen-only bit was never cleared, causing the controller to
always use listen-only mode, if previously set.

Fixes: 26ad340e582d ("can: kvaser_pciefd: Add driver for Kvaser PCIEcan devices")
Cc: stable@vger.kernel.org
Signed-off-by: Jimmy Assarsson <extja@kvaser.com>
Link: https://lore.kernel.org/r/20230516134318.104279-3-extja@kvaser.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: kvaser_pciefd: Set CAN_STATE_STOPPED in kvaser_pciefd_stop()

Set can.state to CAN_STATE_STOPPED in kvaser_pciefd_stop().
Without this fix, wrong CAN state was repported after the interface was
brought down.

Fixes: 26ad340e582d ("can: kvaser_pciefd: Add driver for Kvaser PCIEcan devices")
Cc: stable@vger.kernel.org
Signed-off-by: Jimmy Assarsson <extja@kvaser.com>
Link: https://lore.kernel.org/r/20230516134318.104279-2-extja@kvaser.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

wifi: b43: fix incorrect __packed annotation

clang warns about an unpacked structure inside of a packed one:

drivers/net/wireless/broadcom/b43/b43.h:654:4: error: field data within 'struct b43_iv' is less aligned than 'union (unnamed union at /home/arnd/arm-soc/drivers/net/wireless/broadcom/b43/b43.h:651:2)' and is usually due to 'struct b43_iv' being packed, which can lead to unaligned accesses [-Werror,-Wunaligned-access]

The problem here is that the anonymous union has the default alignment
from its members, apparently because the original author mixed up the
placement of the __packed attribute by placing it next to the struct
member rather than the union definition. As the struct itself is
also marked as __packed, there is no need to mark its members, so just
move the annotation to the inner type instead.

As Michael noted, the same problem is present in b43legacy, so
change both at the same time.

Acked-by: Michael Büsch <m@bues.ch>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Larry Finger <Larry.Finger@lwfinger.net>
Link: https://lore.kernel.org/oe-kbuild-all/202305160749.ay1HAoyP-lkp@intel.com/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230516183442.536589-1-arnd@kernel.org

wifi: rtw88: sdio: Always use two consecutive bytes for word operations

The Allwinner sunxi-mmc controller cannot handle word (16 bit)
transfers. So and sdio_{read,write}w fails with messages like the
following example using an RTL8822BS (but the same problems were also
observed with RTL8822CS and RTL8723DS chips):
  rtw_8822bs mmc1:0001:1: Firmware version 27.2.0, H2C version 13
  sunxi-mmc 4021000.mmc: unaligned scatterlist: os f80 length 2
  sunxi-mmc 4021000.mmc: map DMA failed
  rtw_8822bs mmc1:0001:1: sdio read16 failed (0x10230): -22

Use two consecutive single byte accesses for word operations instead. It
turns out that upon closer inspection this is also what the vendor
driver does, even though it does have support for sdio_{read,write}w. So
we can conclude that the rtw88 chips do support word access but only on
SDIO controllers that also support it. Since there's no way to detect if
the controller supports word access or not the rtw88 sdio driver
switches to the easiest approach: avoiding word access.

Reported-by: Larry Finger <Larry.Finger@lwfinger.net>
Closes: https://lore.kernel.org/linux-wireless/527585e5-9cdd-66ed-c3af-6da162f4b720@lwfinger.net/
Reported-by: Rudi Heitbaum <rudi@heitbaum.com>
Link: https://github.com/LibreELEC/LibreELEC.tv/pull/7837#issue-1708469467
Fixes: 65371a3f14e7 ("wifi: rtw88: sdio: Add HCI implementation for SDIO based chipsets")
Reviewed-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230515195043.572375-1-martin.blumenstingl@googlemail.com

Merge tag 'ipsec-2023-05-16' of git://git./linux/kernel/git/klassert/ipsec

Steffen Klassert says:

====================
pull request (net): ipsec 2023-05-16

1) Don't check the policy default if we have an allow
   policy. Fix from Sabrina Dubroca.

2) Fix netdevice refount usage on offload.
   From Leon Romanovsky.

3) Use netdev_put instead of dev_puti to correctly release
   the netdev on failure in xfrm_dev_policy_add.
   From Leon Romanovsky.

4) Revert "Fix XFRM-I support for nested ESP tunnels"
   This broke Netfilter policy matching.
   From Martin Willi.

5) Reject optional tunnel/BEET mode templates in outbound policies
   on netlink and pfkey sockets. From Tobias Brunner.

6) Check if_id in inbound policy/secpath match to make
   it symetric to the outbound codepath.
   From Benedict Wong.

* tag 'ipsec-2023-05-16' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
  xfrm: Check if_id in inbound policy/secpath match
  af_key: Reject optional tunnel/BEET mode templates in outbound policies
  xfrm: Reject optional tunnel/BEET mode templates in outbound policies
  Revert "Fix XFRM-I support for nested ESP tunnels"
  xfrm: Fix leak of dev tracker
  xfrm: release all offloaded policy memory
  xfrm: don't check the default policy if the policy allows the packet
====================

Link: https://lore.kernel.org/r/20230516052405.2677554-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'linux-can-fixes-for-6.4-20230515' of git://git./linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2023-05-15

The first 2 patches are by Oliver Hartkopp and allow the
MSG_CMSG_COMPAT flag for isotp and j1939.

The next patch is by Oliver Hartkopp, too and adds missing CAN XL
support in can_put_echo_skb().

Geert Uytterhoeven's patch let's the bxcan driver depend on
ARCH_STM32.

The last 5 patches are from Dario Binacchi and also affect the bxcan
driver. The bxcan driver hit mainline with v6.4-rc1 and was originally
written for IP cores containing 2 CAN interfaces with shared
resources. Dario's series updates the DT bindings and driver to
support IP cores with a single CAN interface instance as well as
adding the bxcan to the stm32f746's device tree.

* tag 'linux-can-fixes-for-6.4-20230515' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
  ARM: dts: stm32: add CAN support on stm32f746
  can: bxcan: add support for single peripheral configuration
  ARM: dts: stm32: add pin map for CAN controller on stm32f7
  ARM: dts: stm32f429: put can2 in secondary mode
  dt-bindings: net: can: add "st,can-secondary" property
  can: CAN_BXCAN should depend on ARCH_STM32
  can: dev: fix missing CAN XL support in can_put_echo_skb()
  can: j1939: recvmsg(): allow MSG_CMSG_COMPAT flag
  can: isotp: recvmsg(): allow MSG_CMSG_COMPAT flag
====================

Link: https://lore.kernel.org/r/20230515204722.1000957-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: Fix crash with CONFIG_NET_NS=n

'__net_initdata' becomes a no-op with CONFIG_NET_NS=y, but when this
option is disabled it becomes '__initdata', which means the data can be
freed after the initialization phase. This annotation is obviously
incorrect for the devlink net device notifier block which is still
registered after the initialization phase [1].

Fix this crash by removing the '__net_initdata' annotation.

[1]
general protection fault, probably for non-canonical address 0xcccccccccccccccc: 0000 [#1] PREEMPT SMP
CPU: 3 PID: 117 Comm: (udev-worker) Not tainted 6.4.0-rc1-custom-gdf0acdc59b09 #64
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc37 04/01/2014
RIP: 0010:notifier_call_chain+0x58/0xc0
[...]
Call Trace:
<TASK>
dev_set_mac_address+0x85/0x120
dev_set_mac_address_user+0x30/0x50
do_setlink+0x219/0x1270
rtnl_setlink+0xf7/0x1a0
rtnetlink_rcv_msg+0x142/0x390
netlink_rcv_skb+0x58/0x100
netlink_unicast+0x188/0x270
netlink_sendmsg+0x214/0x470
__sys_sendto+0x12f/0x1a0
__x64_sys_sendto+0x24/0x30
do_syscall_64+0x38/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Fixes: e93c9378e33f ("devlink: change per-devlink netdev notifier to static one")
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Closes: https://lore.kernel.org/netdev/600ddf9e-589a-2aa0-7b69-a438f833ca10@samsung.com/
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230515162925.1144416-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mac80211_hwsim: fix memory leak in hwsim_new_radio_nl

When parse_pmsr_capa failed in hwsim_new_radio_nl, the memory resources
applied for by pmsr_capa are not released. Add release processing to the
incorrect path.

Fixes: 92d13386ec55 ("mac80211_hwsim: add PMSR capability support")
Reported-by: syzbot+904ce6fbb38532d9795c@syzkaller.appspotmail.com
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230515092227.2691437-1-shaozhengchao@huawei.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: iwlwifi: mvm: Add locking to the rate read flow

The rs_drv_get_rate flow reads the lq_sta to return the optimal rate
for tx frames. This read flow is not protected thereby leaving
a small window, a few instructions wide, open to contention by an
asynchronous rate update. Indeed this race condition was hit and the
update occurred in the middle of the read.

Fix this by locking the lq_sta struct during read.

Signed-off-by: Ariel Malamud <ariel.malamud@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230514120631.b52c9ed5c379.I15290b78e0d966c1b68278263776ca9de841d5fe@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: iwlwifi: Don't use valid_links to iterate sta links

This bitmap equals to zero when in a non-MLO mode, and then we won't
be iterating on any link. Use for_each_sta_active_link() instead, as
it handles also the case of non-MLO mode.

Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230514120631.f32a8c08730a.Ib02248cd0b7f2bc885f91005c3c110dd027f9dcd@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>