OSDN Git Service
Michael Walle [Mon, 9 Jan 2023 12:30:13 +0000 (13:30 +0100)]
net: phy: mxl-gpy: disable interrupts on GPY215 by default
The interrupts on the GPY215B and GPY215C are broken and the only viable
fix is to disable them altogether. There is still the possibility to
opt-in via the device tree.
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Michael Walle [Mon, 9 Jan 2023 12:30:12 +0000 (13:30 +0100)]
net: phy: allow a phy to opt-out of interrupt handling
Until now, it has not been possible for a PHY driver to disable interrupts
during runtime. If a driver offers the .config_intr() as well as the
.handle_interrupt() ops, it is eligible for interrupt handling.
Introduce a new flag for the dev_flags property of struct phy_device, which
can be set by PHY driver to skip interrupt setup and fall back to polling
mode.
At the moment, this is used for the MaxLinear PHY which has broken
interrupt handling and there is a need to disable interrupts in some
cases.
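The opt-out can be pictured with a small userspace model. This is only a sketch of the idea, not the kernel code: the structs, the flag name DEMO_PHY_F_NO_IRQ, its bit value and the helper demo_phy_uses_irq() are all hypothetical stand-ins.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit; the real name and value live in the kernel headers. */
#define DEMO_PHY_F_NO_IRQ (1u << 31)

struct demo_phy_driver {
	int (*config_intr)(void);	/* enable/disable the PHY interrupt */
	int (*handle_interrupt)(void);	/* service the PHY interrupt */
};

struct demo_phy_device {
	uint32_t dev_flags;
	const struct demo_phy_driver *drv;
};

/* Stub op used only to populate the driver struct in this model. */
static int demo_nop(void)
{
	return 0;
}

/*
 * A PHY is eligible for interrupt handling only if the driver provides
 * both ops and has not opted out at runtime through dev_flags.
 */
static bool demo_phy_uses_irq(const struct demo_phy_device *phy)
{
	if (!phy->drv->config_intr || !phy->drv->handle_interrupt)
		return false;
	if (phy->dev_flags & DEMO_PHY_F_NO_IRQ)
		return false;	/* skip interrupt setup, fall back to polling */
	return true;
}
```

In this model a driver would set the flag from its probe path when it detects affected hardware, and the core would consult the helper before requesting the interrupt.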
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Michael Walle [Mon, 9 Jan 2023 12:30:11 +0000 (13:30 +0100)]
dt-bindings: net: phy: add MaxLinear GPY2xx bindings
Add the device tree bindings for the MaxLinear GPY2xx PHYs, which
essentially adds just one flag: maxlinear,use-broken-interrupts.
One might argue that if interrupts are broken, just don't use
the interrupt property in the first place. But it needs to be more
nuanced. First, this interrupt line is also used to wake up systems by
WoL, which has nothing to do with the (broken) PHY interrupt handling.
Second and more importantly, there are devicetrees which have this
property set. Thus, within the driver we have to switch off interrupt
handling by default as a workaround. But OTOH, a systems designer who
knows the hardware and knows there are no shared interrupts for example,
can use this new property as a hint to the driver that it can enable the
interrupt nonetheless.
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Michael Walle [Mon, 9 Jan 2023 12:30:10 +0000 (13:30 +0100)]
dt-bindings: vendor-prefixes: add MaxLinear
MaxLinear is a manufacturer of integrated circuits.
https://www.maxlinear.com
Signed-off-by: Michael Walle <michael@walle.cc>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Tue, 10 Jan 2023 10:58:41 +0000 (11:58 +0100)]
Merge branch 'mv88e6xxx-add-mab-offload-support'
Hans J. Schultz says:
====================
mv88e6xxx: Add MAB offload support
This patch-set adds MAB [1] offload support in mv88e6xxx.
Patch #1: Correct default return value for mv88e6xxx_port_bridge_flags.
Patch #2: Shorten the locked section in
mv88e6xxx_g1_atu_prob_irq_thread_fn().
Patch #3: The MAB implementation for mv88e6xxx.
[1] https://git.kernel.org/netdev/net-next/c/4bf24ad09bc0
====================
Link: https://lore.kernel.org/r/20230108094849.1789162-1-netdev@kapio-technology.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Hans J. Schultz [Sun, 8 Jan 2023 09:48:49 +0000 (10:48 +0100)]
net: dsa: mv88e6xxx: mac-auth/MAB implementation
This implementation for the Marvell mv88e6xxx chip series is based on
handling ATU miss violations occurring when packets ingress on a port
that is locked with learning on. This will trigger a
SWITCHDEV_FDB_ADD_TO_BRIDGE event, which will result in the bridge module
adding a locked FDB entry. This bridge FDB entry will not age out as
it has the extern_learn flag set.
Userspace daemons can listen to these events and either accept or deny
access for the host, by either replacing the locked FDB entry with a
simple entry or leaving the locked entry in place.
If the host MAC address is already present on another port, an ATU
member violation will occur, but to no real effect, and the packet will
be dropped in hardware. Statistics on these violations can be shown with
the command and example output of interest:
ethtool -S ethX
NIC statistics:
...
atu_member_violation: 5
atu_miss_violation: 23
...
Where ethX is the interface of the MAB enabled port.
Furthermore, as added vlan interfaces where the vid is not added to the
VTU will cause ATU miss violations reporting the FID as
MV88E6XXX_FID_STANDALONE, we need to check and skip the miss violations
handling in this case.
Signed-off-by: Hans J. Schultz <netdev@kapio-technology.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Hans J. Schultz [Sun, 8 Jan 2023 09:48:48 +0000 (10:48 +0100)]
net: dsa: mv88e6xxx: shorten the locked section in mv88e6xxx_g1_atu_prob_irq_thread_fn()
As only the hardware access functions up to and including
mv88e6xxx_g1_atu_mac_read() called under the interrupt handler
need to take the chip lock, we release the chip lock after this call.
The follow up code that handles the violations can run without the
chip lock held.
In further patches, the violation handler function will even be
incompatible with having the chip lock held. This is due to an AB/BA
ordering inversion with rtnl_lock().
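The locking pattern can be sketched in userspace as follows. This is an illustration under assumptions, not the driver code: a pthread mutex stands in for the chip lock, and struct demo_chip, struct demo_violation and both functions are hypothetical.

```c
#include <pthread.h>
#include <stdint.h>

struct demo_chip {
	pthread_mutex_t reg_lock;	/* stands in for the chip lock */
	uint16_t atu_op;		/* pretend hardware registers */
	uint16_t atu_fid;
	int handled;			/* number of violations handled */
};

/* Snapshot of everything the handler needs from the hardware. */
struct demo_violation {
	uint16_t op;
	uint16_t fid;
};

/* Runs without the chip lock, so it is free to take other locks. */
static void demo_handle_violation(struct demo_chip *chip,
				  const struct demo_violation *v)
{
	if (v->op)
		chip->handled++;
}

static void demo_irq_thread(struct demo_chip *chip)
{
	struct demo_violation v;

	/* Only the register reads need the lock. */
	pthread_mutex_lock(&chip->reg_lock);
	v.op = chip->atu_op;
	v.fid = chip->atu_fid;
	pthread_mutex_unlock(&chip->reg_lock);

	/* The follow-up work operates on the local snapshot. */
	demo_handle_violation(chip, &v);
}
```

Copying the register values into a local snapshot is what makes the early unlock safe: nothing after the unlock reads the hardware state again.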
Signed-off-by: Hans J. Schultz <netdev@kapio-technology.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Hans J. Schultz [Sun, 8 Jan 2023 09:48:47 +0000 (10:48 +0100)]
net: dsa: mv88e6xxx: change default return of mv88e6xxx_port_bridge_flags
The default return value -EOPNOTSUPP of mv88e6xxx_port_bridge_flags()
came from the return value of the DSA method port_egress_floods() in
commit 4f85901f0063 ("net: dsa: mv88e6xxx: add support for bridge flags"),
but the DSA API was changed in commit a8b659e7ff75 ("net: dsa: act as
passthrough for bridge port flags"), resulting in the return value
-EOPNOTSUPP not being valid anymore, and sections for new flags will not
need to set the return value to zero on success, as with the new mab flag
added in a following patch.
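A simplified model of why a zero default is convenient. The flag names, struct demo_port and the helpers are hypothetical and only mirror the shape of such a function:

```c
#include <stdbool.h>

#define DEMO_BR_LEARNING (1u << 0)
#define DEMO_BR_MAB      (1u << 1)

struct demo_port {
	bool learning;
	bool mab;
};

/* Pretend hardware access that can fail; it always succeeds here. */
static int demo_hw_set_learning(struct demo_port *p, bool on)
{
	p->learning = on;
	return 0;
}

static int demo_port_bridge_flags(struct demo_port *p, unsigned int mask,
				  unsigned int val)
{
	int err = 0;	/* an empty or software-only update is not an error */

	if (mask & DEMO_BR_LEARNING) {
		err = demo_hw_set_learning(p, !!(val & DEMO_BR_LEARNING));
		if (err)
			return err;
	}

	/* Software-only flag: no hardware call, so nothing has to reset err. */
	if (mask & DEMO_BR_MAB)
		p->mab = !!(val & DEMO_BR_MAB);

	return err;
}
```

With a non-zero default, the DEMO_BR_MAB branch would have to clear err explicitly or the caller would see a spurious failure.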
Signed-off-by: Hans J. Schultz <netdev@kapio-technology.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Raju Rangoju [Mon, 9 Jan 2023 10:18:19 +0000 (15:48 +0530)]
amd-xgbe: Add support for 10 Mbps speed
Add the necessary changes to support 10 Mbps speed for BaseT and SFP
port modes. This is supported in MAC ver >= 30H.
Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Link: https://lore.kernel.org/r/20230109101819.747572-1-Raju.Rangoju@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jamie Gloudon [Tue, 3 Jan 2023 23:06:53 +0000 (15:06 -0800)]
e1000e: Enable Link Partner Advertised Support
This enables link partner advertised support to show link modes and
pause frame use.
Signed-off-by: Jamie Gloudon <jamie.gloudon@gmx.fr>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230103230653.1102544-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sat, 7 Jan 2023 02:29:04 +0000 (18:29 -0800)]
net: skb: remove old comments about frag_size for build_skb()
Since commit ce098da1497c ("skbuff: Introduce slab_build_skb()")
drivers trying to build skb around slab-backed buffers should
go via slab_build_skb() rather than passing frag_size = 0 to
the main build_skb().
Remove the copy'n'pasted comments about 0 meaning slab.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 9 Jan 2023 07:39:53 +0000 (07:39 +0000)]
Merge branch 'r8152-NCM-firmwares'
Bjørn Mork says:
====================
r8152: allow firmwares with NCM support
Some device and firmware combinations with NCM support will
end up using the cdc_ncm driver by default. This is
sub-optimal for the same reasons we've previously accepted the
blacklist hack in cdc_ether.
The recent support for subclassing the generic USB device
driver allows us to create a very slim driver with the same
functionality. This patch set uses that to implement a
device specific configuration default which is independent
of any USB interface drivers. This means that it works
equally whether the device initially ends up in NCM or ECM
mode, without depending on any code in the respective class
drivers.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Fri, 6 Jan 2023 16:07:39 +0000 (17:07 +0100)]
cdc_ether: no need to blacklist any r8152 devices
The r8152 driver does not need this anymore.
Dropping blacklist entries adds optional support for these
devices in ECM mode.
The 8153 devices are handled by the r8153_ecm driver when
in ECM mode, and must still be blacklisted here.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Fri, 6 Jan 2023 16:07:38 +0000 (17:07 +0100)]
r8152: add USB device driver for config selection
Subclassing the generic USB device driver to override the
default configuration selection regardless of matching interface
drivers.
The r815x family devices expose a vendor specific function which
the r8152 interface driver wants to handle. This is the preferred
device mode. Additionally one or more USB class functions are
usually supported for hosts lacking a vendor specific driver. The
choice is USB configuration based, with one alternate function per
configuration.
Example device with both NCM and ECM alternate cfgs:
T: Bus=02 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 4 Spd=5000 MxCh= 0
D: Ver= 3.20 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 9 #Cfgs= 3
P: Vendor=0bda ProdID=8156 Rev=31.00
S: Manufacturer=Realtek
S: Product=USB 10/100/1G/2.5G LAN
S: SerialNumber=001000001
C:* #Ifs= 1 Cfg#= 1 Atr=a0 MxPwr=256mA
I:* If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=00 Driver=r8152
E: Ad=81(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
E: Ad=02(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
E: Ad=83(I) Atr=03(Int.) MxPS= 2 Ivl=128ms
C: #Ifs= 2 Cfg#= 2 Atr=a0 MxPwr=256mA
I: If#= 0 Alt= 0 #EPs= 1 Cls=02(comm.) Sub=0d Prot=00 Driver=
E: Ad=83(I) Atr=03(Int.) MxPS= 16 Ivl=128ms
I: If#= 1 Alt= 0 #EPs= 0 Cls=0a(data ) Sub=00 Prot=01 Driver=
I: If#= 1 Alt= 1 #EPs= 2 Cls=0a(data ) Sub=00 Prot=01 Driver=
E: Ad=81(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
E: Ad=02(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
C: #Ifs= 2 Cfg#= 3 Atr=a0 MxPwr=256mA
I: If#= 0 Alt= 0 #EPs= 1 Cls=02(comm.) Sub=06 Prot=00 Driver=
E: Ad=83(I) Atr=03(Int.) MxPS= 16 Ivl=128ms
I: If#= 1 Alt= 0 #EPs= 0 Cls=0a(data ) Sub=00 Prot=00 Driver=
I: If#= 1 Alt= 1 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver=
E: Ad=81(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
E: Ad=02(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
A problem with this is that Linux will prefer class functions over
vendor specific functions. Using the above example, Linux defaults
to cfg #2, running the device in a sub-optimal NCM mode.
Previously we've attempted to work around the problem by
blacklisting the devices in the ECM class driver "cdc_ether", and
matching on the ECM class function in the vendor specific interface
driver. The latter has been used to switch back to the vendor
specific configuration when the driver is probed for a class
function.
This workaround has several issues:
- class driver blacklists are additional maintenance cruft in an
unrelated driver
- class driver blacklists prevent users from optionally running
the devices in class mode
- each device needs double match entries in the vendor driver
- the initial probing as a class function slows down device
discovery
Now these issues have become even worse with the introduction of
firmware supporting both NCM and ECM, where NCM ends up as the
default mode in Linux. To use the same workaround, we now have
to blacklist the devices in two different class drivers and
add yet another match entry to the vendor specific driver.
This patch implements an alternative workaround strategy -
independent of the interface drivers. It avoids adding a
blacklist to the cdc_ncm driver and will let us remove the
existing blacklist from the cdc_ether driver.
As an additional bonus, removing the blacklists allows users to
select one of the other device modes if wanted.
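The selection idea can be sketched as a pure function over the configuration list. This is a model under assumptions, not the driver: struct demo_usb_config and demo_choose_configuration() are hypothetical, and only the class code 0xff (vendor specific) is taken from the listing above.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_USB_CLASS_VENDOR_SPEC 0xff

/* One entry per USB configuration offered by the device. */
struct demo_usb_config {
	uint8_t config_value;		/* Cfg# in the listing above */
	uint8_t first_intf_class;	/* Cls of interface 0 */
};

/*
 * Return the configuration value to select, or -1 to leave the choice
 * to the generic default.
 */
static int demo_choose_configuration(const struct demo_usb_config *cfgs,
				     size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (cfgs[i].first_intf_class == DEMO_USB_CLASS_VENDOR_SPEC)
			return cfgs[i].config_value;
	}
	return -1;
}
```

For the example device this picks cfg #1 regardless of whether the NCM (cfg #2) or ECM (cfg #3) configuration would otherwise have been preferred.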
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 9 Jan 2023 07:30:50 +0000 (07:30 +0000)]
Merge branch 'mptcp-next'
Mat Martineau says:
====================
mptcp: Protocol in-use tracking and code cleanup
Here's a collection of commits from the MPTCP tree:
Patches 1-4 and 6 contain miscellaneous code cleanup for more consistent
use of helper functions, existing local variables, and better naming.
Patches 5, 7, and 9 add sock_prot_inuse tracking for MPTCP and an
associated self test.
Patch 8 modifies the mptcp_connect self test tool to exit on SIGUSR1
when in "slow mode".
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Menglong Dong [Fri, 6 Jan 2023 18:57:25 +0000 (10:57 -0800)]
selftest: mptcp: add test for mptcp socket in use
Add the function chk_msk_inuse() to diag.sh, which is used to check the
statistics of mptcp socket in use. As mptcp socket in listen state will
be closed randomly after 'accept', we need to get the count of listening
mptcp socket through 'ss' command.
All tests pass.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Menglong Dong [Fri, 6 Jan 2023 18:57:24 +0000 (10:57 -0800)]
selftest: mptcp: exit from copyfd_io_poll() when receive SIGUSR1
For now, mptcp_connect won't exit after receiving the 'SIGUSR1' signal
if '-r' is set. Fix this by skipping poll and sleep in copyfd_io_poll()
if 'quit' is set.
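A minimal sketch of the pattern, assuming a signal handler that only sets a flag. The names demo_quit, demo_on_sigusr1() and demo_copy_loop() are hypothetical and the loop body is a placeholder for the real poll and copy work:

```c
#include <signal.h>

/* Set from the signal handler, read from the main loop. */
static volatile sig_atomic_t demo_quit;

static void demo_on_sigusr1(int sig)
{
	(void)sig;
	demo_quit = 1;
}

/* Returns the number of iterations that actually ran. */
static int demo_copy_loop(int max_iters)
{
	int iters = 0;

	while (iters < max_iters) {
		if (demo_quit)
			break;	/* skip the poll and the sleep, exit promptly */

		/* poll() on the descriptors and the optional sleep go here */
		iters++;
	}
	return iters;
}
```

The handler would be installed with signal(SIGUSR1, demo_on_sigusr1) before the loop starts.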
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Menglong Dong [Fri, 6 Jan 2023 18:57:23 +0000 (10:57 -0800)]
mptcp: add statistics for mptcp socket in use
Track the number of MPTCP sockets in use with sock_prot_inuse_add().
This lets us read the count of in-use MPTCP sockets from
/proc/net/protocols:
$ cat /proc/net/protocols
protocol size sockets memory press maxhdr slab module cl co di ac io in de sh ss gs se re sp bi br ha uh gp em
MPTCPv6 2048 0 0 no 0 yes kernel y n y y y y y y y y y y n n n y y y n
MPTCP 1896 1 0 no 0 yes kernel y n y y y y y y y y y y n n n y y y n
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Menglong Dong [Fri, 6 Jan 2023 18:57:22 +0000 (10:57 -0800)]
mptcp: rename 'sk' to 'ssk' in mptcp_token_new_connect()
'ssk' is a more appropriate name for the first argument
of mptcp_token_new_connect().
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Menglong Dong [Fri, 6 Jan 2023 18:57:21 +0000 (10:57 -0800)]
mptcp: init sk->sk_prot in build_msk()
The 'sk_prot' field in token KUNIT self-tests will be dereferenced in
mptcp_token_new_connect(). Therefore, init it with tcp_prot.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Menglong Dong [Fri, 6 Jan 2023 18:57:20 +0000 (10:57 -0800)]
mptcp: introduce 'sk' to replace 'sock->sk' in mptcp_listen()
'sock->sk' is used frequently in mptcp_listen(). Therefore, we can
introduce the 'sk' and replace 'sock->sk' with it.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geliang Tang [Fri, 6 Jan 2023 18:57:19 +0000 (10:57 -0800)]
mptcp: use local variable ssk in write_options
The local variable 'ssk' has been defined at the beginning of the function
mptcp_write_options(), use it instead of getting 'ssk' again.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geliang Tang [Fri, 6 Jan 2023 18:57:18 +0000 (10:57 -0800)]
mptcp: use net instead of sock_net
Use the local variable 'net' instead of sock_net() in the functions where
the variable 'struct net *net' has been defined.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geliang Tang [Fri, 6 Jan 2023 18:57:17 +0000 (10:57 -0800)]
mptcp: use msk_owned_by_me helper
The helper msk_owned_by_me() is defined in protocol.h, so use it instead
of sock_owned_by_me().
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Leesoo Ahn [Fri, 6 Jan 2023 10:49:49 +0000 (19:49 +0900)]
usbnet: optimize usbnet_bh() to reduce CPU load
The current source pushes skb into the dev->done queue by calling
skb_queue_tail() and then pops it by skb_dequeue() to branch to
rx_cleanup state for freeing urb/skb in usbnet_bh(). This costs extra CPU
load, 2.21% (skb_queue_tail), as follows:
- 11.58% 0.26% swapper [k] usbnet_bh
- 11.32% usbnet_bh
- 6.43% skb_dequeue
6.34% _raw_spin_unlock_irqrestore
- 2.21% skb_queue_tail
2.19% _raw_spin_unlock_irqrestore
- 1.68% consume_skb
- 0.97% kfree_skbmem
0.80% kmem_cache_free
0.53% skb_release_data
To reduce the extra CPU load, use return values to call the helper function
usb_free_skb() to free the resources instead of calling skb_queue_tail()
and skb_dequeue() for push and pop respectively.
- 7.87% 0.25% swapper [k] usbnet_bh
- 7.62% usbnet_bh
- 4.81% skb_dequeue
4.74% _raw_spin_unlock_irqrestore
- 1.75% consume_skb
- 0.98% kfree_skbmem
0.78% kmem_cache_free
0.58% skb_release_data
0.53% smsc95xx_rx_fixup
Signed-off-by: Leesoo Ahn <lsahn@ooseel.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 9 Jan 2023 07:21:42 +0000 (07:21 +0000)]
Merge branch 'phy-micrel-warnings'
Divya Koppera says:
====================
Fixed warnings
Fixed warnings related to PTR_ERR and initialization.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Divya Koppera [Fri, 6 Jan 2023 08:29:05 +0000 (13:59 +0530)]
net: phy: micrel: Fix warn: passing zero to PTR_ERR
Handle the NULL pointer case
Fixes New smatch warnings:
drivers/net/phy/micrel.c:2613 lan8814_ptp_probe_once() warn: passing zero to 'PTR_ERR'
vim +/PTR_ERR +2613 drivers/net/phy/micrel.c
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Divya Koppera <Divya.Koppera@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Divya Koppera [Fri, 6 Jan 2023 08:29:04 +0000 (13:59 +0530)]
net: phy: micrel: Fixed error related to uninitialized symbol ret
Initialized return variable
Fixes Old smatch warnings:
drivers/net/phy/micrel.c:1750 ksz886x_cable_test_get_status() error:
uninitialized symbol 'ret'.
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Divya Koppera <Divya.Koppera@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Sat, 7 Jan 2023 03:38:01 +0000 (19:38 -0800)]
Merge branch 'net-wangxun-adjust-code-structure'
Jiawen Wu says:
====================
net: wangxun: Adjust code structure
Remove the useless structs 'txgbe_hw' and 'ngbe_hw' to make the code clearer.
Move the code that sets the MAC address, which txgbe and ngbe share,
to libwx. Furthermore, rename struct 'wx_hw' to 'wx' and move all
adapter members to wx.
====================
Link: https://lore.kernel.org/r/20230106033853.2806007-1-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mengyuan Lou [Fri, 6 Jan 2023 03:38:53 +0000 (11:38 +0800)]
net: ngbe: Remove structure ngbe_adapter
Move the entire private structure to libwx.
Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiawen Wu [Fri, 6 Jan 2023 03:38:52 +0000 (11:38 +0800)]
net: txgbe: Remove structure txgbe_adapter
Move the entire private structure to libwx.
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiawen Wu [Fri, 6 Jan 2023 03:38:51 +0000 (11:38 +0800)]
net: wangxun: Rename private structure in libwx
All members of struct adapter are to be moved into struct wx_hw to keep
the code clean, so the name 'wx_hw', which suggests hardware only, no
longer fits. Rename 'wx_hw' to 'wx', and rename the pointers in use.
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiawen Wu [Fri, 6 Jan 2023 03:38:50 +0000 (11:38 +0800)]
net: wangxun: Move MAC address handling to libwx
For setting MAC address, both txgbe and ngbe drivers have the same handling
flow with different parameters. Move the shared code to libwx.
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiawen Wu [Fri, 6 Jan 2023 03:38:49 +0000 (11:38 +0800)]
net: ngbe: Move defines into unified file
Remove ngbe.h, move defines into ngbe_type.h file.
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiawen Wu [Fri, 6 Jan 2023 03:38:48 +0000 (11:38 +0800)]
net: txgbe: Move defines into unified file
Remove txgbe.h, move defines into txgbe_type.h file.
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiawen Wu [Fri, 6 Jan 2023 03:38:47 +0000 (11:38 +0800)]
net: ngbe: Remove structure ngbe_hw
Remove the useless structure ngbe_hw to make the code clearer.
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiawen Wu [Fri, 6 Jan 2023 03:38:46 +0000 (11:38 +0800)]
net: txgbe: Remove structure txgbe_hw
Remove the useless structure txgbe_hw to make the code clearer.
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Horatiu Vultur [Wed, 4 Jan 2023 19:42:18 +0000 (20:42 +0100)]
net: phy: micrel: Change handler interrupt for lan8814
The lan8814 represents a package of 4 PHYs. All of them are sharing the
same interrupt line. So when a link was going down/up or a frame was
timestamped, then the interrupt handler of all the PHYs was called.
Which is all fine and expected but the problem is the way the handler
interrupt works.
Basically if one of the PHYs timestamps a frame, then all the other 3
PHYs were polling the status of the interrupt until that PHY actually
cleared the interrupt by reading the timestamp.
The reason for polling was that if another PHY was also timestamping a
frame at the same time, its interrupt could be missed. But this is not
the right approach, because it is the interrupt controller that needs to
call the interrupt handlers again if the interrupt line is still
active.
Therefore change this such that when the interrupt handler is called it
checks only whether the interrupt is for itself, and otherwise just exits.
This saves CPU usage.
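The resulting handler shape can be sketched as below. This is a userspace model under assumptions: struct demo_phy, the status bits and the return codes are hypothetical, and a plain field stands in for the per-PHY interrupt status register.

```c
#include <stdint.h>

enum demo_irqreturn { DEMO_IRQ_NONE, DEMO_IRQ_HANDLED };

#define DEMO_INT_LINK (1u << 0)	/* link went up or down */
#define DEMO_INT_TS   (1u << 1)	/* a frame was timestamped */

struct demo_phy {
	uint16_t irq_status;	/* pending sources of this PHY only */
	int events;		/* number of interrupts serviced */
};

/* Called for every PHY that shares the interrupt line. */
static enum demo_irqreturn demo_handle_interrupt(struct demo_phy *phy)
{
	uint16_t status = phy->irq_status;

	/* Not ours: return at once instead of polling for the sibling PHYs. */
	if (!status)
		return DEMO_IRQ_NONE;

	phy->irq_status = 0;	/* servicing the sources clears them */
	phy->events++;
	return DEMO_IRQ_HANDLED;
}
```

If a second PHY raises its interrupt while the first one is being serviced, the line stays active and the handlers are simply invoked again, so no handler has to wait on another PHY.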
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://lore.kernel.org/r/20230104194218.3785229-1-horatiu.vultur@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kees Cook [Fri, 6 Jan 2023 04:28:48 +0000 (20:28 -0800)]
ethtool: Replace 0-length array with flexible array
Zero-length arrays are deprecated[1]. Replace struct ethtool_rxnfc's
"rule_locs" 0-length array with a flexible array. Detected with GCC 13,
using -fstrict-flex-arrays=3:
net/ethtool/common.c: In function 'ethtool_get_max_rxnfc_channel':
net/ethtool/common.c:558:55: warning: array subscript i is outside array bounds of '__u32[0]' {aka 'unsigned int[]'} [-Warray-bounds=]
558 | .fs.location = info->rule_locs[i],
| ~~~~~~~~~~~~~~~^~~
In file included from include/linux/ethtool.h:19,
from include/uapi/linux/ethtool_netlink.h:12,
from include/linux/ethtool_netlink.h:6,
from net/ethtool/common.c:3:
include/uapi/linux/ethtool.h:1186:41: note: while referencing 'rule_locs'
1186 | __u32 rule_locs[0];
| ^~~~~~~~~
[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays
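For readers unfamiliar with the difference, here is a standalone sketch of a C99 flexible array member. struct demo_rxnfc and demo_rxnfc_alloc() are hypothetical and only loosely modelled on the struct named above:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct demo_rxnfc {
	uint32_t rule_cnt;
	uint32_t rule_locs[];	/* flexible array member, must be last */
};

/* Allocate the header plus room for n trailing elements. */
static struct demo_rxnfc *demo_rxnfc_alloc(uint32_t n)
{
	struct demo_rxnfc *info;

	info = malloc(sizeof(*info) + (size_t)n * sizeof(info->rule_locs[0]));
	if (!info)
		return NULL;

	info->rule_cnt = n;
	for (uint32_t i = 0; i < n; i++)
		info->rule_locs[i] = i;
	return info;
}
```

Unlike a `[0]` array, the `[]` form tells the compiler that the element count is determined at allocation time, so bounds checking no longer treats every index as out of range.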
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: kernel test robot <lkp@intel.com>
Cc: Oleksij Rempel <linux@rempel-privat.de>
Cc: Sean Anderson <sean.anderson@seco.com>
Cc: Alexandru Tachici <alexandru.tachici@analog.com>
Cc: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20230106042844.give.885-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kees Cook [Thu, 5 Jan 2023 22:15:37 +0000 (14:15 -0800)]
net: ipv6: rpl_iptunnel: Replace 0-length arrays with flexible arrays
Zero-length arrays are deprecated[1]. Replace struct ipv6_rpl_sr_hdr's
"segments" union of 0-length arrays with flexible arrays. Detected with
GCC 13, using -fstrict-flex-arrays=3:
In function 'rpl_validate_srh',
inlined from 'rpl_build_state' at ../net/ipv6/rpl_iptunnel.c:96:7:
../net/ipv6/rpl_iptunnel.c:60:28: warning: array subscript <unknown> is outside array bounds of 'struct in6_addr[0]' [-Warray-bounds=]
60 | if (ipv6_addr_type(&srh->rpl_segaddr[srh->segments_left - 1]) &
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from ../include/net/rpl.h:12,
from ../net/ipv6/rpl_iptunnel.c:13:
../include/uapi/linux/rpl.h: In function 'rpl_build_state':
../include/uapi/linux/rpl.h:40:33: note: while referencing 'addr'
40 | struct in6_addr addr[0];
| ^~~~
[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230105221533.never.711-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kees Cook [Thu, 5 Jan 2023 22:21:16 +0000 (14:21 -0800)]
ipv6: ioam: Replace 0-length array with flexible array
Zero-length arrays are deprecated[1]. Replace struct ioam6_trace_hdr's
"data" 0-length array with a flexible array. Detected with GCC 13,
using -fstrict-flex-arrays=3:
net/ipv6/ioam6_iptunnel.c: In function 'ioam6_build_state':
net/ipv6/ioam6_iptunnel.c:194:37: warning: array subscript <unknown> is outside array bounds of '__u8[0]' {aka 'unsigned char[]'} [-Warray-bounds=]
194 | tuninfo->traceh.data[trace->remlen * 4] = IPV6_TLV_PADN;
| ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~
In file included from include/linux/ioam6.h:11,
from net/ipv6/ioam6_iptunnel.c:13:
include/uapi/linux/ioam6.h:130:17: note: while referencing 'data'
130 | __u8 data[0];
| ^~~~
[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Justin Iurman <justin.iurman@uliege.be>
Tested-by: Justin Iurman <justin.iurman@uliege.be>
Link: https://lore.kernel.org/r/20230105222115.never.661-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David S. Miller [Fri, 6 Jan 2023 12:56:20 +0000 (12:56 +0000)]
Merge branch 'devlink-unregister'
Jakub Kicinski says:
====================
devlink: remove the wait-for-references on unregister
Move the registration and unregistration of the devlink instances
under their instance locks. Don't perform the netdev-style wait
for all references when unregistering the instance.
Instead the devlink instance refcount will only ensure that
the memory of the instance is not freed. All places which acquire
access to devlink instances via a reference must check that the
instance is still registered under the instance lock.
This fixes the problem of the netdev code accessing devlink
instances before they are registered.
RFC: https://lore.kernel.org/all/20221217011953.152487-1-kuba@kernel.org/
- rewrite the cover letter
- rewrite the commit message for patch 1
- un-export and rename devl_is_alive
- squash the netdevsim patches
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 6 Jan 2023 06:34:02 +0000 (22:34 -0800)]
netdevsim: move devlink registration under the instance lock
To prevent races with netdev code accessing free devlink instances
move the registration under the devlink instance lock.
Core now waits for the instance to be registered before accessing it.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 6 Jan 2023 06:34:01 +0000 (22:34 -0800)]
netdevsim: rename a label
err_dl_unregister should unregister the devlink instance.
Looks like renaming it was missed in one of the reshufflings.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 6 Jan 2023 06:34:00 +0000 (22:34 -0800)]
devlink: allow registering parameters after the instance
It's most natural to register the instance first and then its
subobjects. Now that we can use the instance lock to protect
the atomicity of all init - it should also be safe.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 6 Jan 2023 06:33:59 +0000 (22:33 -0800)]
devlink: don't require setting features before registration
Requiring devlink_set_features() to be run before devlink is
registered is overzealous. devlink_set_features() itself is
a leftover from old workarounds which were trying to prevent
initiating reload before probe was complete.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 6 Jan 2023 06:33:58 +0000 (22:33 -0800)]
devlink: remove the registration guarantee of references
The objective of exposing the devlink instance locks to
drivers was to let them use these locks to prevent user space
from accessing the device before it's fully initialized.
This is difficult because devlink_unregister() waits for all
references to be released, meaning that devlink_unregister()
can't itself be called under the instance lock.
To avoid this issue devlink_register() was moved after subobject
registration a while ago. Unfortunately the netdev paths get
a hold of the devlink instances _before_ they are registered.
Ideally netdev should wait for devlink init to finish (synchronizing
on the instance lock). This can't work because we don't know if the
instance will _ever_ be registered (in case of failures it may not).
The other option of returning an error until devlink_register()
is called is unappealing (user space would get a notification that a
netdev exists but would have to wait an arbitrary amount of time
before accessing some of its attributes).
Weaken the guarantees of the devlink references.
Holding a reference will now only guarantee that the memory
of the object is around. Another way of looking at it is that
the reference now protects the object, not its "registered" status.
Use devlink instance lock to synchronize unregistration.
This implies that releasing of the "main" reference of the devlink
instance moves from devlink_unregister() to devlink_free().
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
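The weakened reference guarantee from "devlink: remove the registration guarantee of references" can be modelled in user space as follows. This is only a sketch: DevlinkModel, try_get(), put() and access() are hypothetical names, not the kernel API. It assumes a reference keeps the object's memory alive and that the "registered" state must be re-checked under the instance lock.

```python
import threading


class DevlinkModel:
    """Toy stand-in for a devlink instance (not the kernel structure)."""

    def __init__(self):
        self.lock = threading.Lock()   # models the instance lock
        self.refs = 1                  # "main" reference, dropped on free
        self.registered = False

    def register(self):
        with self.lock:
            self.registered = True

    def unregister(self):
        # Unregistration synchronizes on the instance lock only;
        # it does not wait for outstanding references.
        with self.lock:
            self.registered = False


def try_get(dl):
    """Take a reference: guarantees the object's memory, nothing more."""
    dl.refs += 1
    return dl


def put(dl):
    """Drop a reference taken with try_get()."""
    dl.refs -= 1


def access(dl):
    """Return True if the instance may be used, False if it must be skipped."""
    with dl.lock:
        if not dl.registered:   # re-check the state under the lock
            return False
        return True
```

In this model a holder of a reference taken before register() or after unregister() can still touch the object safely, but access() tells it to skip the instance.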
Jakub Kicinski [Fri, 6 Jan 2023 06:33:57 +0000 (22:33 -0800)]
devlink: always check if the devlink instance is registered
Always check under the instance lock whether the devlink instance
is still / already registered.
This is a no-op for the most part, as the unregistration path currently
waits for all references. On the init path, however, we may temporarily
open up a race with netdev code, if netdevs are registered before the
devlink instance. This is temporary, the next change fixes it, and this
commit has been split out for the ease of review.
Note that in case of iterating over sub-objects which have their
own lock (regions and line cards) we assume an implicit dependency
between those objects existing and devlink unregistration.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 6 Jan 2023 06:33:56 +0000 (22:33 -0800)]
devlink: protect devlink->dev by the instance lock
devlink->dev is assumed to be always valid as long as any
outstanding reference to the devlink instance exists.
In prep for weakening of the references take the instance lock.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 6 Jan 2023 06:33:55 +0000 (22:33 -0800)]
devlink: update the code in netns move to latest helpers
devlink_pernet_pre_exit() is the only obvious place which takes
the instance lock without using the devl_ helpers. Update the code
and move the error print after releasing the reference
(having unlock and put together feels slightly idiomatic).
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 6 Jan 2023 06:33:54 +0000 (22:33 -0800)]
devlink: bump the instance index directly when iterating
xa_find_after() is designed to handle multi-index entries correctly.
If an xarray has two entries, one which spans indexes 0-3 and one at
index 4, xa_find_after(0) will return the entry at index 4.
Having to juggle the two callbacks, however, is unnecessary in case
of the devlink xarray, as there is a 1:1 relationship with indexes.
Always use xa_find() and increment the index manually.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
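The walk described in "devlink: bump the instance index directly when iterating" can be sketched with a plain dict standing in for the xarray. The xa_find() below is a toy model written for this example, not the kernel function; it assumes one entry per index, which is the 1:1 relationship the commit relies on.

```python
def xa_find(entries, start):
    """Return (index, entry) for the first present index >= start, or None.

    Models the lookup on a dict keyed by integer index; this is a toy
    stand-in, not the kernel xarray API.
    """
    candidates = [i for i in entries if i >= start]
    if not candidates:
        return None
    idx = min(candidates)
    return idx, entries[idx]


def walk(entries):
    """Visit every entry by calling xa_find() and bumping the index by hand."""
    seen = []
    index = 0
    while True:
        found = xa_find(entries, index)
        if found is None:
            break
        index, entry = found
        seen.append((index, entry))
        index += 1          # manual increment replaces xa_find_after()
    return seen
```

With single-index entries the manual increment visits exactly the same entries a find/find-after pair would, using one lookup helper instead of two.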
Mahesh Bandewar [Thu, 5 Jan 2023 02:28:42 +0000 (18:28 -0800)]
sysctl: expose all net/core sysctls inside netns
All were not visible to the non-priv users inside netns. However, with
4ecb90090c84 ("sysctl: allow override of /proc/sys/net with CAP_NET_ADMIN"),
these vars are protected from getting modified.
A proc with capable(CAP_NET_ADMIN) can change the values so
not having them visible inside netns is just causing nuisance to
processes that check certain values (e.g. net.core.somaxconn) and
see different behavior in root-netns vs. other-netns.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 6 Jan 2023 06:09:10 +0000 (22:09 -0800)]
Merge branch 'devlink-code-split-and-structured-instance-walk'
Jakub Kicinski says:
====================
devlink: code split and structured instance walk
Split devlink.c into a handful of files, trying to keep the "core"
code away from all the command-specific implementations.
The core code has been quite scattered until now. Going forward we can
consider using a source file per-subobject, I think that it's quite
beneficial to newcomers (based on relative ease with which folks
contribute to ethtool vs devlink). But this series doesn't split
everything out, yet - partially due to backporting concerns,
but mostly due to lack of time. Bulk of the netlink command
handling is left in a leftover.c file.
Introduce a context structure for dumps, and use it to store
the devlink instance ID of the last dumped devlink instance.
This means we don't have to restart the walk from 0 each time.
Finally - introduce a "structured walk". A centralized dump handler
in devlink/netlink.c which walks the devlink instances, deals with
refcounting/locking, simplifying the per-object implementations quite
a bit. Inspired by the ethtool code.
v1: https://lore.kernel.org/all/20230104041636.226398-1-kuba@kernel.org/
RFC: https://lore.kernel.org/all/20221215020155.1619839-1-kuba@kernel.org/
====================
Link: https://lore.kernel.org/r/20230105040531.353563-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 5 Jan 2023 04:05:31 +0000 (20:05 -0800)]
devlink: convert remaining dumps to the by-instance scheme
Soon we'll have to check if a devlink instance is alive after
locking it. Convert to the by-instance dumping scheme to make
refactoring easier.
Most of the subobject code no longer has to worry about any devlink
locking / lifetime rules (the only ones that still do are the two subobject
types which stubbornly use their own locking). Both dump and do callbacks
are given a devlink instance which is already locked and good-to-access
(do from the .pre_doit handler, dump from the new dump indirection).
Note that we'll now check presence of an op (e.g. for sb_pool_get)
under the devlink instance lock, that will soon be necessary anyway,
because we don't hold refs on the driver modules so the memory
in which ops live may be gone for a dead instance, after upcoming
locking changes.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 5 Jan 2023 04:05:30 +0000 (20:05 -0800)]
devlink: add by-instance dump infra
Most dumpit implementations walk the devlink instances.
This requires careful lock taking and reference dropping.
Factor the loop out and provide just a callback to handle
a single instance dump.
Convert one user as an example, other users converted
in the next change.
Slightly inspired by ethtool netlink code.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
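The factoring described in "devlink: add by-instance dump infra" can be illustrated with a small user-space model. Instance, dump_each_instance() and dump_ports() are hypothetical names chosen for this sketch. It assumes the shared loop owns reference counting and locking, and the callback only formats one instance.

```python
import threading


class Instance:
    """Toy devlink instance with an index, some ports, a lock and a refcount."""

    def __init__(self, index, ports):
        self.index = index
        self.ports = ports
        self.lock = threading.Lock()
        self.refs = 0


def dump_each_instance(instances, dump_one):
    """Shared loop: take a reference, lock, call the per-instance callback."""
    out = []
    for inst in sorted(instances, key=lambda i: i.index):
        inst.refs += 1                 # hold the instance across the callback
        try:
            with inst.lock:            # callback runs with the lock held
                out.extend(dump_one(inst))
        finally:
            inst.refs -= 1             # always drop the reference
    return out


def dump_ports(inst):
    """Per-object callback: only knows how to dump one instance."""
    return [f"{inst.index}:{p}" for p in inst.ports]
```

The point of the split is that dump_ports() contains no locking or reference handling at all, so each further per-object dump becomes a few lines.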
Jakub Kicinski [Thu, 5 Jan 2023 04:05:29 +0000 (20:05 -0800)]
devlink: uniformly take the devlink instance lock in the dump loop
Move the lock taking out of devlink_nl_cmd_region_get_devlink_dumpit().
This way all dumps will take the instance lock in the main iteration
loop directly, making refactoring and reading the code easier.
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 5 Jan 2023 04:05:28 +0000 (20:05 -0800)]
devlink: restart dump based on devlink instance ids (function)
Use xarray id for cases of sub-objects which are iterated in
a function.
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 5 Jan 2023 04:05:27 +0000 (20:05 -0800)]
devlink: restart dump based on devlink instance ids (nested)
Use xarray id for cases of simple sub-object iteration.
We'll now use the state->instance for the devlink instances
and state->idx for subobject index.
Moving the definition of idx into the inner loop makes sense,
so while at it also move other sub-object local variables into
the loop.
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 5 Jan 2023 04:05:26 +0000 (20:05 -0800)]
devlink: restart dump based on devlink instance ids (simple)
xarray gives each devlink instance an id and allows us to restart
walk based on that id quite neatly. This is nice both from the
perspective of code brevity and from the stability of the dump
(devlink instances disappearing from before the resumption point
will not cause inconsistent dumps).
This patch takes care of simple cases where state->idx counts
devlink instances only.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
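The stability argument in "devlink: restart dump based on devlink instance ids (simple)" can be checked with a toy model. The dict keyed by instance id and the dump_chunk() helper are assumptions made for this sketch; only the idea of storing the instance id in the dump state comes from the commit message.

```python
def dump_chunk(instances, state, budget):
    """Emit up to `budget` instance ids, resuming from state["instance"].

    `instances` is a dict keyed by instance id; `state` plays the role of
    the dump context that survives between calls.
    """
    out = []
    for idx in sorted(i for i in instances if i >= state["instance"]):
        if len(out) == budget:
            break
        out.append(idx)
        state["instance"] = idx + 1    # next call resumes after this id
    return out
```

If instance 0 disappears between two calls, the second call still resumes at id 2. A resume point that counted how many instances had been dumped would skip one entry in the same situation.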
Jakub Kicinski [Thu, 5 Jan 2023 04:05:25 +0000 (20:05 -0800)]
devlink: health: combine loops in dump
Walk devlink instances only once. Dump the instance reporters
and port reporters before moving to the next instance.
User space should not depend on ordering of messages.
This will make improving stability of the walk easier.
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 5 Jan 2023 04:05:24 +0000 (20:05 -0800)]
devlink: drop the filter argument from devlinks_xa_find_get
Looks like devlinks_xa_find_get() was intended to get the mark
from the @filter argument. It doesn't actually use @filter, passing
DEVLINK_REGISTERED to xa_find_fn() directly. Walking marks other
than registered is unlikely so drop @filter argument completely.
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 5 Jan 2023 04:05:23 +0000 (20:05 -0800)]
devlink: remove start variables from dumps
The start variables made the code clearer when we had to access
cb->args[0] directly, as the name args doesn't explain much.
Now that we use a structure to hold state this seems no longer
needed.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 5 Jan 2023 04:05:22 +0000 (20:05 -0800)]
devlink: use an explicit structure for dump context
Create a dump context structure instead of using cb->args
as an unsigned long array. This is a pure conversion which
is intended to be as much of a noop as possible.
Subsequent changes will use this to simplify the code.
The two non-trivial parts are:
- devlink_nl_cmd_health_reporter_dump_get_dumpit() checks args[0]
to see if devlink_fmsg_dumpit() has already been called (whether
this is the first msg), but doesn't use the exact value, so we
can drop the local variable there already
- devlink_nl_cmd_region_read_dumpit() uses args[0] for address
but we'll use args[1] now, shouldn't matter
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 5 Jan 2023 04:05:21 +0000 (20:05 -0800)]
netlink: add macro for checking dump ctx size
We encourage casting struct netlink_callback::ctx to a local
struct (in a comment above the field). Provide a convenience
macro for checking if the local struct fits into the ctx.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
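The size check from "netlink: add macro for checking dump ctx size" is a compile-time assertion in the kernel. The sketch below only mimics the idea at run time with ctypes. CTX_SIZE, DumpState and assert_dump_ctx_fits() are hypothetical names, and the 48-byte scratch area is an assumption made for the example.

```python
import ctypes

CTX_SIZE = 48  # assumed size of the scratch area, for illustration only


class DumpState(ctypes.Structure):
    """Hypothetical per-dump state that a subsystem wants to keep in ctx."""
    _fields_ = [
        ("instance", ctypes.c_ulong),
        ("idx", ctypes.c_int),
    ]


def assert_dump_ctx_fits(struct_type, ctx_size=CTX_SIZE):
    """Raise if struct_type is too large to be stored in the ctx area."""
    size = ctypes.sizeof(struct_type)
    if size > ctx_size:
        raise ValueError(
            f"{struct_type.__name__} is {size} bytes, ctx holds {ctx_size}"
        )
    return size
```

Calling assert_dump_ctx_fits(DumpState) once at start-up plays the role of the build-time check: a state structure that outgrows the scratch area is caught before any dump runs.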
Jakub Kicinski [Thu, 5 Jan 2023 04:05:20 +0000 (20:05 -0800)]
devlink: split out netlink code
Move out the netlink glue into a separate file.
Leave the ops in the old file because we'd have to export a ton
of functions. Going forward we should switch to split ops which
will let us put the new ops in the netlink.c file.
Pure code move, no functional changes.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 5 Jan 2023 04:05:19 +0000 (20:05 -0800)]
devlink: split out core code
Move core code into a separate file. It's spread around the main
file which makes refactoring and figuring out how devlink works
harder.
Move the xarray, all the most core devlink instance code out like
locking, ref counting, alloc, register, etc. Leave port stuff in
leftover.c, if we want to move port code it'd probably be to its
own file.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 5 Jan 2023 04:05:18 +0000 (20:05 -0800)]
devlink: rename devlink_netdevice_event -> devlink_port_netdevice_event
To make the upcoming change a pure(er?) code move rename
devlink_netdevice_event -> devlink_port_netdevice_event.
This makes it clear that it only touches ports and doesn't
belong cleanly in the core.
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 5 Jan 2023 04:05:17 +0000 (20:05 -0800)]
devlink: move code to a dedicated directory
The devlink code is hard to navigate with 13kLoC in one file.
I really like the way Michal split the ethtool into per-command
files and core. It'd probably be too much to split it all up,
but we can at least separate the core parts out of the per-cmd
implementations and put it in a directory so that new commands
can be separate files.
Move the code, subsequent commit will do a partial split.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 6 Jan 2023 06:03:15 +0000 (22:03 -0800)]
Merge branch 'net-ipa-simplify-ipa-interrupt-handling'
Alex Elder says:
====================
net: ipa: simplify IPA interrupt handling
One of the IPA's two IRQs fires when data on a suspended channel is
available (to request that the channel--or system--be resumed to
receive the pending data). This interrupt also handles a few
conditions signaled by the embedded microcontroller.
For this "IPA interrupt", the current code requires a handler to be
dynamically registered for each interrupt condition. Any condition
that has no registered handler is quietly ignored. This design is
derived from the downstream IPA driver implementation.
There isn't any need for this complexity. Even in the downstream
code, only four of the available 30 or so IPA interrupt conditions
are ever handled. So these handlers can pretty easily just be
called directly in the main IRQ handler function.
This series simplifies the interrupt handling code by having the
small number of IPA interrupt handlers be called directly, rather
than having them be registered dynamically.
Version 2 just adds a missing forward-reference, as suggested by
Caleb.
====================
Link: https://lore.kernel.org/r/20230104175233.2862874-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Wed, 4 Jan 2023 17:52:33 +0000 (11:52 -0600)]
net: ipa: don't maintain IPA interrupt handler array
We can call the two IPA interrupt handler functions directly;
there's no need to maintain the array of handler function pointers
any more.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
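The simplification in "net: ipa: don't maintain IPA interrupt handler array" replaces a table of registered handlers with direct calls from the main handler. A rough user-space model follows. The IrqId values and the handler names are made up for illustration and do not match the driver's definitions.

```python
from enum import IntEnum


class IrqId(IntEnum):
    """Hypothetical IPA interrupt condition ids (bit positions)."""
    UC_EVENT = 2
    UC_RESPONSE = 3
    SUSPEND = 14


def uc_handler(irq_id, log):
    """Single handler shared by both microcontroller conditions."""
    log.append(f"uc:{irq_id.name}")


def suspend_handler(irq_id, log):
    """Handler for the suspend condition."""
    log.append("suspend")


def process_irq(pending_mask, log):
    """Walk the pending bits and call the matching handler directly."""
    for bit in range(pending_mask.bit_length()):
        if not pending_mask & (1 << bit):
            continue
        if bit in (IrqId.UC_EVENT, IrqId.UC_RESPONSE):
            uc_handler(IrqId(bit), log)
        elif bit == IrqId.SUSPEND:
            suspend_handler(IrqId(bit), log)
        # any other condition is quietly ignored
    return log
```

With only three handled conditions, the if/elif chain carries the same information as a handler array, without the registration and removal code around it.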
Alex Elder [Wed, 4 Jan 2023 17:52:32 +0000 (11:52 -0600)]
net: ipa: kill ipa_interrupt_add()
The dynamic assignment of IPA interrupt handlers isn't needed; we
only handle three IPA interrupt types, and their handler functions
are now assigned directly. We can get rid of ipa_interrupt_add()
and ipa_interrupt_remove() now, because they serve no purpose.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Wed, 4 Jan 2023 17:52:31 +0000 (11:52 -0600)]
net: ipa: register IPA interrupt handlers directly
Declare the microcontroller IPA interrupt handler publicly, and
assign it directly in ipa_interrupt_config(). Make the SUSPEND IPA
interrupt handler public, and rename it ipa_power_suspend_handler().
Assign it directly in ipa_interrupt_config() as well.
This makes it unnecessary to do this in ipa_interrupt_add(). Make
similar changes for removing IPA interrupt handlers.
The next two patches will finish the cleanup, removing the
add/remove functions and the handler array entirely.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Wed, 4 Jan 2023 17:52:30 +0000 (11:52 -0600)]
net: ipa: enable IPA interrupt handlers separate from registration
Expose ipa_interrupt_enable() and have functions that register
IPA interrupt handlers enable them directly, rather than having the
registration process do that. Do the same for disabling IPA
interrupt handlers.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Wed, 4 Jan 2023 17:52:29 +0000 (11:52 -0600)]
net: ipa: introduce ipa_interrupt_enable()
Create new function ipa_interrupt_enable() to encapsulate enabling
one of the IPA interrupt types. Introduce ipa_interrupt_disable()
to reverse that operation. Add a helper function to factor out the
common register update used by both.
Use these in ipa_interrupt_add() and ipa_interrupt_remove().
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Wed, 4 Jan 2023 17:52:28 +0000 (11:52 -0600)]
net: ipa: introduce a common microcontroller interrupt handler
The prototype for an IPA interrupt handler supplies the IPA
interrupt ID, so it's possible to use a single function to handle
any type of microcontroller interrupt.
Introduce ipa_uc_interrupt_handler(), which calls the event or the
response handler depending on the IRQ ID provided. Register the new
function as the handler for both microcontroller IPA interrupt types.
The called functions don't use their "irq_id" arguments, so remove
them.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 6 Jan 2023 05:38:33 +0000 (21:38 -0800)]
Merge branch 'enetc-unlock-xdp_redirect-for-xdp-non-linear-buffers'
Lorenzo Bianconi says:
====================
enetc: unlock XDP_REDIRECT for XDP non-linear buffers
Unlock XDP_REDIRECT for S/G XDP buffer and rely on XDP stack to properly
take care of the frames.
Rely on XDP_FLAGS_HAS_FRAGS flag to check if it is really necessary to access
non-linear part of the xdp_buff/xdp_frame.
====================
Link: https://lore.kernel.org/r/cover.1672840490.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lorenzo Bianconi [Wed, 4 Jan 2023 13:57:12 +0000 (14:57 +0100)]
net: ethernet: enetc: do not always access skb_shared_info in the XDP path
Move XDP skb_shared_info structure initialization from
enetc_map_rx_buff_to_xdp() to enetc_add_rx_buff_to_xdp() and do not always
access skb_shared_info in the xdp_buff/xdp_frame since it is located in a
different cacheline with respect to hard_start and data xdp pointers.
Rely on XDP_FLAGS_HAS_FRAGS flag to check if it is really necessary to access
non-linear part of the xdp_buff/xdp_frame.
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lorenzo Bianconi [Wed, 4 Jan 2023 13:57:11 +0000 (14:57 +0100)]
net: ethernet: enetc: get rid of xdp_redirect_sg counter
Remove xdp_redirect_sg counter and the related ethtool entry since it is
no longer used.
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lorenzo Bianconi [Wed, 4 Jan 2023 13:57:10 +0000 (14:57 +0100)]
net: ethernet: enetc: unlock XDP_REDIRECT for XDP non-linear buffers
Even if full XDP_REDIRECT is not supported yet for non-linear XDP buffers
since we allow redirecting just into CPUMAPs, unlock XDP_REDIRECT for
S/G XDP buffer and rely on XDP stack to properly take care of the
frames.
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 5 Jan 2023 23:34:11 +0000 (15:34 -0800)]
Merge git://git./linux/kernel/git/netdev/net
No conflicts.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 5 Jan 2023 20:40:50 +0000 (12:40 -0800)]
Merge tag 'net-6.2-rc3' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf, wifi, and netfilter.
Current release - regressions:
- bpf: fix nullness propagation for reg to reg comparisons, avoid
null-deref
- inet: control sockets should not use current thread task_frag
- bpf: always use maximal size for copy_array()
- eth: bnxt_en: don't link netdev to a devlink port for VFs
Current release - new code bugs:
- rxrpc: fix a couple of potential use-after-frees
- netfilter: conntrack: fix IPv6 exthdr error check
- wifi: iwlwifi: fw: skip PPAG for JF, avoid FW crashes
- eth: dsa: qca8k: various fixes for the in-band register access
- eth: nfp: fix schedule in atomic context when sync mc address
- eth: renesas: rswitch: fix getting mac address from device tree
- mobile: ipa: use proper endpoint mask for suspend
Previous releases - regressions:
- tcp: add TIME_WAIT sockets in bhash2, fix regression caught by
Jiri / python tests
- net: tc: don't interpret cls results when asked to drop, fix
oob-access
- vrf: determine the dst using the original ifindex for multicast
- eth: bnxt_en:
- fix XDP RX path if BPF adjusted packet length
- fix HDS (header placement) and jumbo thresholds for RX packets
- eth: ice: xsk: do not use xdp_return_frame() on tx_buf->raw_buf,
avoid memory corruptions
Previous releases - always broken:
- ulp: prevent ULP without clone op from entering the LISTEN status
- veth: fix race with AF_XDP exposing old or uninitialized
descriptors
- bpf:
- pull before calling skb_postpull_rcsum() (fix checksum support
and avoid a WARN())
- fix panic due to wrong pageattr of im->image (when livepatch and
kretfunc coexist)
- keep a reference to the mm, in case the task is dead
- mptcp: fix deadlock in fastopen error path
- netfilter:
- nf_tables: perform type checking for existing sets
- nf_tables: honor set timeout and garbage collection updates
- ipset: fix hash:net,port,net hang with /0 subnet
- ipset: avoid hung task warning when adding/deleting entries
- selftests: net:
- fix cmsg_so_mark.sh test hang on non-x86 systems
- fix the arp_ndisc_evict_nocarrier test for IPv6
- usb: rndis_host: secure rndis_query check against int overflow
- eth: r8169: fix dmar pte write access during suspend/resume with
WOL
- eth: lan966x: fix configuration of the PCS
- eth: sparx5: fix reading of the MAC address
- eth: qed: allow sleep in qed_mcp_trace_dump()
- eth: hns3:
- fix interrupts re-initialization after VF FLR
- fix handling of promisc when MAC addr table gets full
- refine the handling for VF heartbeat
- eth: mlx5:
- properly handle ingress QinQ-tagged packets on VST
- fix io_eq_size and event_eq_size params validation on big endian
- fix RoCE setting at HCA level if not supported at all
- don't turn CQE compression on by default for IPoIB
- eth: ena:
- fix toeplitz initial hash key value
- account for the number of XDP-processed bytes in interface stats
- fix rx_copybreak value update
Misc:
- ethtool: harden phy stat handling against buggy drivers
- docs: netdev: convert maintainer's doc from FAQ to a normal
document"
* tag 'net-6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (112 commits)
caif: fix memory leak in cfctrl_linkup_request()
inet: control sockets should not use current thread task_frag
net/ulp: prevent ULP without clone op from entering the LISTEN status
qed: allow sleep in qed_mcp_trace_dump()
MAINTAINERS: Update maintainers for ptp_vmw driver
usb: rndis_host: Secure rndis_query check against int overflow
net: dpaa: Fix dtsec check for PCS availability
octeontx2-pf: Fix lmtst ID used in aura free
drivers/net/bonding/bond_3ad: return when there's no aggregator
netfilter: ipset: Rework long task execution when adding/deleting entries
netfilter: ipset: fix hash:net,port,net hang with /0 subnet
net: sparx5: Fix reading of the MAC address
vxlan: Fix memory leaks in error path
net: sched: htb: fix htb_classify() kernel-doc
net: sched: cbq: dont intepret cls results when asked to drop
net: sched: atm: dont intepret cls results when asked to drop
dt-bindings: net: marvell,orion-mdio: Fix examples
dt-bindings: net: sun8i-emac: Add phy-supply property
net: ipa: use proper endpoint mask for suspend
selftests: net: return non-zero for failures reported in arp_ndisc_evict_nocarrier
...
Linus Torvalds [Thu, 5 Jan 2023 20:06:40 +0000 (12:06 -0800)]
Merge tag 'gpio-fixes-for-v6.2-rc3' of git://git./linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
"A reference leak fix, two fixes for using uninitialized variables and
more drivers converted to using immutable irqchips:
- fix a reference leak in gpio-sifive
- fix a potential use of an uninitialized variable in core gpiolib
- fix a potential use of an uninitialized variable in gpio-pca953x
- make GPIO irqchips immutable in gpio-pmic-eic-sprd, gpio-eic-sprd
and gpio-sprd"
* tag 'gpio-fixes-for-v6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: sifive: Fix refcount leak in sifive_gpio_probe
gpio: sprd: Make the irqchip immutable
gpio: pmic-eic-sprd: Make the irqchip immutable
gpio: eic-sprd: Make the irqchip immutable
gpio: pca953x: avoid to use uninitialized value pinctrl
gpiolib: Fix using uninitialized lookup-flags on ACPI platforms
Linus Torvalds [Thu, 5 Jan 2023 19:24:33 +0000 (11:24 -0800)]
Merge tag 'fbdev-for-6.2-rc3' of git://git./linux/kernel/git/deller/linux-fbdev
Pull fbdev fixes from Helge Deller:
- Fix Matrox G200eW initialization failure
- Fix build failure of offb driver when built as module
- Optimize stack usage in omapfb
* tag 'fbdev-for-6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
fbdev: omapfb: avoid stack overflow warning
fbdev: matroxfb: G200eW: Increase max memory from 1 MB to 16 MB
fbdev: atyfb: use strscpy() to instead of strncpy()
fbdev: omapfb: use strscpy() to instead of strncpy()
fbdev: make offb driver tristate
Paolo Abeni [Thu, 5 Jan 2023 11:12:21 +0000 (12:12 +0100)]
Merge branch 'add-support-for-qsgmii-mode-for-j721e-cpsw9g-to-am65-cpsw-driver'
Siddharth Vadapalli says:
====================
Add support for QSGMII mode for J721e CPSW9G to am65-cpsw driver
Add compatible to am65-cpsw driver for J721e CPSW9G, which contains 8
external ports and 1 internal host port.
Add support to power on and power off the SERDES PHY which is used by the
CPSW MAC.
=========
Changelog
=========
v5: https://lore.kernel.org/r/20221109042203.375042-1-s-vadapalli@ti.com/
v4: https://lore.kernel.org/r/20221108080606.124596-1-s-vadapalli@ti.com/
v3: https://lore.kernel.org/r/20221026090957.180592-1-s-vadapalli@ti.com/
v2: https://lore.kernel.org/r/20221018085810.151327-1-s-vadapalli@ti.com/
v1: https://lore.kernel.org/r/20220914095053.189851-1-s-vadapalli@ti.com/
====================
Link: https://lore.kernel.org/r/20230104103432.1126403-1-s-vadapalli@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Siddharth Vadapalli [Wed, 4 Jan 2023 10:34:32 +0000 (16:04 +0530)]
net: ethernet: ti: am65-cpsw: Add support for SERDES configuration
Use PHY framework APIs to initialize the SERDES PHY connected to CPSW MAC.
Define the functions am65_cpsw_disable_phy(), am65_cpsw_enable_phy(),
am65_cpsw_disable_serdes_phy() and am65_cpsw_enable_serdes_phy().
Add new member "serdes_phy" to struct "am65_cpsw_slave_data" to store the
SERDES PHY for each port, if it exists. Use it later while disabling the
SERDES PHY for each port.
Power on and initialize the SerDes PHY in am65_cpsw_nuss_init_slave_ports()
by invoking am65_cpsw_enable_serdes_phy().
Power off the SerDes PHY in am65_cpsw_nuss_remove() by invoking
am65_cpsw_disable_serdes_phy().
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Siddharth Vadapalli [Wed, 4 Jan 2023 10:34:31 +0000 (16:04 +0530)]
net: ethernet: ti: am65-cpsw: Enable QSGMII mode for J721e CPSW9G
CPSW9G in J721e supports additional modes like QSGMII.
Add new compatible for J721e in am65-cpsw driver.
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Siddharth Vadapalli [Wed, 4 Jan 2023 10:34:30 +0000 (16:04 +0530)]
dt-bindings: net: ti: k3-am654-cpsw-nuss: Add J721e CPSW9G support
Update bindings for TI K3 J721e SoC which contains 9 ports (8 external
ports) CPSW9G module and add compatible for it.
Changes made:
- Add new compatible ti,j721e-cpswxg-nuss for CPSW9G.
- Extend pattern properties for new compatible.
- Change maximum number of CPSW ports to 8 for new compatible.
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Arnd Bergmann [Thu, 15 Dec 2022 17:02:28 +0000 (18:02 +0100)]
fbdev: omapfb: avoid stack overflow warning
The dsi_irq_stats structure is a little too big to fit on the
stack of a 32-bit task, depending on the specific gcc options:
fbdev/omap2/omapfb/dss/dsi.c: In function 'dsi_dump_dsidev_irqs':
fbdev/omap2/omapfb/dss/dsi.c:1621:1: error: the frame size of 1064 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
Since this is only a debugfs file, performance is not critical,
so just dynamically allocate it, and print an error message
in there in place of a failure code when the allocation fails.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Helge Deller <deller@gmx.de>
Zhengchao Shao [Wed, 4 Jan 2023 06:51:46 +0000 (14:51 +0800)]
caif: fix memory leak in cfctrl_linkup_request()
When linktype is unknown or kzalloc failed in cfctrl_linkup_request(),
pkt is not released. Add release process to error path.
Fixes: b482cd2053e3 ("net-caif: add CAIF core protocol stack")
Fixes: 8d545c8f958f ("caif: Disconnect without waiting for response")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230104065146.1153009-1-shaozhengchao@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Eric Dumazet [Tue, 3 Jan 2023 19:27:36 +0000 (19:27 +0000)]
inet: control sockets should not use current thread task_frag
Because ICMP handlers run from softirq contexts,
they must not use current thread task_frag.
Previously, all sockets allocated by inet_ctl_sock_create()
would use the per-socket page fragment, with no chance of
recursion.
Fixes: 98123866fcf3 ("Treewide: Stop corrupting socket's task_frag")
Reported-by: syzbot+bebc6f1acdf4cbb79b03@syzkaller.appspotmail.com
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Benjamin Coddington <bcodding@redhat.com>
Acked-by: Guillaume Nault <gnault@redhat.com>
Link: https://lore.kernel.org/r/20230103192736.454149-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Tue, 3 Jan 2023 11:19:17 +0000 (12:19 +0100)]
net/ulp: prevent ULP without clone op from entering the LISTEN status
When an ULP-enabled socket enters the LISTEN status, the listener ULP data
pointer is copied inside the child/accepted sockets by sk_clone_lock().
The relevant ULP can take care of de-duplicating the context pointer via
the clone() operation, but only MPTCP and SMC implement such op.
Other ULPs may end up with a double-free at socket disposal time.
We can't simply clear the ULP data at clone time, as TLS replaces the
socket ops with custom ones assuming a valid TLS ULP context is
available.
Instead completely prevent clone-less ULP sockets from entering the
LISTEN status.
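The guard described above can be modelled in a few lines of C. This is a simplified sketch under stated assumptions, not the kernel implementation: struct sock_model, struct ulp_ops_model, model_listen() and the two example ops tables are hypothetical, and the -EINVAL return value is a choice made for this sketch. It only illustrates the rule that a socket with a ULP attached may enter LISTEN only if that ULP provides a clone operation.

```c
#include <errno.h>
#include <stddef.h>

struct sock_model;

/* Hypothetical model of a ULP ops table; clone is NULL if not implemented. */
struct ulp_ops_model {
    const char *name;
    void (*clone)(struct sock_model *child);
};

/* Hypothetical model of a socket; ulp is NULL when no ULP is attached. */
struct sock_model {
    const struct ulp_ops_model *ulp;
    int listening;
};

static void noop_clone(struct sock_model *child)
{
    (void)child; /* a real clone op would de-duplicate the context pointer */
}

static const struct ulp_ops_model ulp_with_clone = { "with-clone", noop_clone };
static const struct ulp_ops_model ulp_without_clone = { "without-clone", NULL };

/* Refuse LISTEN for a socket whose ULP cannot clone its context. */
static int model_listen(struct sock_model *sk)
{
    if (sk->ulp && !sk->ulp->clone)
        return -EINVAL; /* error code assumed for this sketch */

    sk->listening = 1;
    return 0;
}
```

Rejecting the transition up front means no child socket can ever inherit a context pointer that it would later free a second time.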
Fixes: 734942cc4ea6 ("tcp: ULP infrastructure")
Reported-by: slipper <slipper.alive@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/4b80c3d1dbe3d0ab072f80450c202d9bc88b4b03.1672740602.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Caleb Sander [Tue, 3 Jan 2023 23:30:21 +0000 (16:30 -0700)]
qed: allow sleep in qed_mcp_trace_dump()
By default, qed_mcp_cmd_and_union() delays 10us at a time in a loop
that can run 500K times, so calls to qed_mcp_nvm_rd_cmd()
may block the current thread for over 5s.
We observed thread scheduling delays over 700ms in production,
with stacktraces pointing to this code as the culprit.
qed_mcp_trace_dump() is called from ethtool, so sleeping is permitted.
It already can sleep in qed_mcp_halt(), which calls qed_mcp_cmd().
Add a "can sleep" parameter to qed_find_nvram_image() and
qed_nvram_read() so they can sleep during qed_mcp_trace_dump().
qed_mcp_trace_get_meta_info() and qed_mcp_trace_read_meta(),
called only by qed_mcp_trace_dump(), allow these functions to sleep.
I can't tell if the other caller (qed_grc_dump_mcp_hw_dump()) can sleep,
so keep b_can_sleep set to false when it calls these functions.
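The worst-case figure quoted above follows from simple arithmetic: 10 us per iteration times 500K iterations is 5,000,000 us, or 5 s. The sketch below reproduces that number and models a polling loop that takes a "can sleep" flag. It is a userspace illustration only: POLL_DELAY_US, POLL_MAX_ITERS, struct poll_stats and poll_mailbox() are hypothetical names, and the loop merely accounts for the time that would be spent spinning or sleeping rather than actually delaying.

```c
#include <stdbool.h>

#define POLL_DELAY_US  10UL     /* delay per iteration, from the text above */
#define POLL_MAX_ITERS 500000UL /* maximum iterations, from the text above */

/* Worst-case blocking time of the polling loop, in milliseconds. */
static unsigned long worst_case_wait_ms(void)
{
    return POLL_DELAY_US * POLL_MAX_ITERS / 1000UL;
}

/* Hypothetical accounting of how the wait time was spent. */
struct poll_stats {
    unsigned long iters;
    unsigned long busy_us;  /* time the CPU would have spent spinning */
    unsigned long slept_us; /* time the thread would have yielded the CPU */
};

/*
 * Poll until the (modelled) response is ready after ready_after iterations.
 * With can_sleep set, the wait is booked as sleeping; otherwise as
 * busy-waiting. Returns false if the iteration limit is reached first.
 */
static bool poll_mailbox(unsigned long ready_after, bool can_sleep,
                         struct poll_stats *st)
{
    for (unsigned long i = 0; i < POLL_MAX_ITERS; i++) {
        if (i >= ready_after)
            return true;
        st->iters++;
        if (can_sleep)
            st->slept_us += POLL_DELAY_US;
        else
            st->busy_us += POLL_DELAY_US;
    }
    return false;
}
```

In this model, a caller that is allowed to sleep accumulates no busy-wait time at all, which is the property the "can sleep" parameter is meant to provide to callers reached from ethtool.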
An example stacktrace from a custom warning we added to the kernel
showing a thread that has not scheduled despite long needing resched:
[ 2745.362925,17] ------------[ cut here ]------------
[ 2745.362941,17] WARNING: CPU: 23 PID: 5640 at arch/x86/kernel/irq.c:233 do_IRQ+0x15e/0x1a0()
[ 2745.362946,17] Thread not rescheduled for 744 ms after irq 99
[ 2745.362956,17] Modules linked in: ...
[ 2745.363339,17] CPU: 23 PID: 5640 Comm: lldpd Tainted: P O 4.4.182+ #202104120910+6d1da174272d.61x
[ 2745.363343,17] Hardware name: FOXCONN MercuryB/Quicksilver Controller, BIOS H11P1N09 07/08/2020
[ 2745.363346,17] 0000000000000000 ffff885ec07c3ed8 ffffffff8131eb2f ffff885ec07c3f20
[ 2745.363358,17] ffffffff81d14f64 ffff885ec07c3f10 ffffffff81072ac2 ffff88be98ed0000
[ 2745.363369,17] 0000000000000063 0000000000000174 0000000000000074 0000000000000000
[ 2745.363379,17] Call Trace:
[ 2745.363382,17] <IRQ> [<ffffffff8131eb2f>] dump_stack+0x8e/0xcf
[ 2745.363393,17] [<ffffffff81072ac2>] warn_slowpath_common+0x82/0xc0
[ 2745.363398,17] [<ffffffff81072b4c>] warn_slowpath_fmt+0x4c/0x50
[ 2745.363404,17] [<ffffffff810d5a8e>] ? rcu_irq_exit+0xae/0xc0
[ 2745.363408,17] [<ffffffff817c99fe>] do_IRQ+0x15e/0x1a0
[ 2745.363413,17] [<ffffffff817c7ac9>] common_interrupt+0x89/0x89
[ 2745.363416,17] <EOI> [<ffffffff8132aa74>] ? delay_tsc+0x24/0x50
[ 2745.363425,17] [<ffffffff8132aa04>] __udelay+0x34/0x40
[ 2745.363457,17] [<ffffffffa04d45ff>] qed_mcp_cmd_and_union+0x36f/0x7d0 [qed]
[ 2745.363473,17] [<ffffffffa04d5ced>] qed_mcp_nvm_rd_cmd+0x4d/0x90 [qed]
[ 2745.363490,17] [<ffffffffa04e1dc7>] qed_mcp_trace_dump+0x4a7/0x630 [qed]
[ 2745.363504,17] [<ffffffffa04e2556>] ? qed_fw_asserts_dump+0x1d6/0x1f0 [qed]
[ 2745.363520,17] [<ffffffffa04e4ea7>] qed_dbg_mcp_trace_get_dump_buf_size+0x37/0x80 [qed]
[ 2745.363536,17] [<ffffffffa04ea881>] qed_dbg_feature_size+0x61/0xa0 [qed]
[ 2745.363551,17] [<ffffffffa04eb427>] qed_dbg_all_data_size+0x247/0x260 [qed]
[ 2745.363560,17] [<ffffffffa0482c10>] qede_get_regs_len+0x30/0x40 [qede]
[ 2745.363566,17] [<ffffffff816c9783>] ethtool_get_drvinfo+0xe3/0x190
[ 2745.363570,17] [<ffffffff816cc152>] dev_ethtool+0x1362/0x2140
[ 2745.363575,17] [<ffffffff8109bcc6>] ? finish_task_switch+0x76/0x260
[ 2745.363580,17] [<ffffffff817c2116>] ? __schedule+0x3c6/0x9d0
[ 2745.363585,17] [<ffffffff810dbd50>] ? hrtimer_start_range_ns+0x1d0/0x370
[ 2745.363589,17] [<ffffffff816c1e5b>] ? dev_get_by_name_rcu+0x6b/0x90
[ 2745.363594,17] [<ffffffff816de6a8>] dev_ioctl+0xe8/0x710
[ 2745.363599,17] [<ffffffff816a58a8>] sock_do_ioctl+0x48/0x60
[ 2745.363603,17] [<ffffffff816a5d87>] sock_ioctl+0x1c7/0x280
[ 2745.363608,17] [<ffffffff8111f393>] ? seccomp_phase1+0x83/0x220
[ 2745.363612,17] [<ffffffff811e3503>] do_vfs_ioctl+0x2b3/0x4e0
[ 2745.363616,17] [<ffffffff811e3771>] SyS_ioctl+0x41/0x70
[ 2745.363619,17] [<ffffffff817c6ffe>] entry_SYSCALL_64_fastpath+0x1e/0x79
[ 2745.363622,17] ---[ end trace f6954aa440266421 ]---
Fixes: c965db4446291 ("qed: Add support for debug data collection")
Signed-off-by: Caleb Sander <csander@purestorage.com>
Acked-by: Alok Prasad <palok@marvell.com>
Link: https://lore.kernel.org/r/20230103233021.1457646-1-csander@purestorage.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 5 Jan 2023 04:21:25 +0000 (20:21 -0800)]
Merge tag 'for-netdev' of https://git./linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
bpf-next 2023-01-04
We've added 45 non-merge commits during the last 21 day(s) which contain
a total of 50 files changed, 1454 insertions(+), 375 deletions(-).
The main changes are:
1) Fixes, improvements and refactoring of parts of BPF verifier's
state equivalence checks, from Andrii Nakryiko.
2) Fix a few corner cases in libbpf's BTF-to-C converter in particular
around padding handling and enums, also from Andrii Nakryiko.
3) Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to better
support decap on GRE tunnel devices not operating in collect metadata,
from Christian Ehrig.
4) Improve x86 JIT's codegen for PROBE_MEM runtime error checks,
from Dave Marchevsky.
5) Remove the need for trace_printk_lock for bpf_trace_printk
and bpf_trace_vprintk helpers, from Jiri Olsa.
6) Add proper documentation for BPF_MAP_TYPE_SOCK{MAP,HASH} maps,
from Maryam Tahhan.
7) Improvements in libbpf's btf_parse_elf error handling, from Changbin Du.
8) Bigger batch of improvements to BPF tracing code samples,
from Daniel T. Lee.
9) Add LoongArch support to libbpf's bpf_tracing helper header,
from Hengqi Chen.
10) Fix a libbpf compiler warning in perf_event_open_probe on arm32,
from Khem Raj.
11) Optimize bpf_local_storage_elem by removing 56 bytes of padding,
from Martin KaFai Lau.
12) Use pkg-config to locate libelf for resolve_btfids build,
from Shen Jiamin.
13) Various libbpf improvements around API documentation and errno
handling, from Xin Liu.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (45 commits)
libbpf: Return -ENODATA for missing btf section
libbpf: Add LoongArch support to bpf_tracing.h
libbpf: Restore errno after pr_warn.
libbpf: Added the description of some API functions
libbpf: Fix invalid return address register in s390
samples/bpf: Use BPF_KSYSCALL macro in syscall tracing programs
samples/bpf: Fix tracex2 by using BPF_KSYSCALL macro
samples/bpf: Change _kern suffix to .bpf with syscall tracing program
samples/bpf: Use vmlinux.h instead of implicit headers in syscall tracing program
samples/bpf: Use kyscall instead of kprobe in syscall tracing program
bpf: rename list_head -> graph_root in field info types
libbpf: fix errno is overwritten after being closed.
bpf: fix regs_exact() logic in regsafe() to remap IDs correctly
bpf: perform byte-by-byte comparison only when necessary in regsafe()
bpf: reject non-exact register type matches in regsafe()
bpf: generalize MAYBE_NULL vs non-MAYBE_NULL rule
bpf: reorganize struct bpf_reg_state fields
bpf: teach refsafe() to take into account ID remapping
bpf: Remove unused field initialization in bpf's ctl_table
selftests/bpf: Add jit probe_mem corner case tests to s390x denylist
...
====================
Link: https://lore.kernel.org/r/20230105000926.31350-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 5 Jan 2023 04:17:19 +0000 (20:17 -0800)]
Merge tag 'for-netdev' of https://git./linux/kernel/git/bpf/bpf
Alexei Starovoitov says:
====================
bpf 2023-01-04
We've added 5 non-merge commits during the last 8 day(s) which contain
a total of 5 files changed, 112 insertions(+), 18 deletions(-).
The main changes are:
1) Always use maximal size for copy_array in the verifier to fix
KASAN tracking, from Kees.
2) Fix bpf task iterator walking through dead tasks, from Kui-Feng.
3) Make sure livepatch and bpf fexit can coexist, from Chuang.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Always use maximal size for copy_array()
selftests/bpf: add a test for iter/task_vma for short-lived processes
bpf: keep a reference to the mm, in case the task is dead.
selftests/bpf: Temporarily disable part of btf_dump:var_data test.
bpf: Fix panic due to wrong pageattr of im->image
====================
Link: https://lore.kernel.org/r/20230104215500.79435-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 5 Jan 2023 01:13:53 +0000 (17:13 -0800)]
Merge tag 'for_linus' of git://git./linux/kernel/git/mst/vhost
Pull virtio updates from Michael Tsirkin:
"Mostly fixes all over the place, a couple of cleanups"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (32 commits)
virtio_blk: Fix signedness bug in virtblk_prep_rq()
vdpa_sim_net: should not drop the multicast/broadcast packet
vdpasim: fix memory leak when freeing IOTLBs
vdpa: conditionally fill max max queue pair for stats
vdpa/vp_vdpa: fix kfree a wrong pointer in vp_vdpa_remove
vduse: Validate vq_num in vduse_validate_config()
tools/virtio: remove smp_read_barrier_depends()
tools/virtio: remove stray characters
vhost_vdpa: fix the crash in unmap a large memory
virtio: Implementing attribute show with sysfs_emit
virtio-crypto: fix memory leak in virtio_crypto_alg_skcipher_close_session()
tools/virtio: Variable type completion
vdpa_sim: fix vringh initialization in vdpasim_queue_ready()
virtio_blk: use UINT_MAX instead of -1U
vhost-vdpa: fix an iotlb memory leak
vhost: fix range used in translate_desc()
vringh: fix range used in iotlb_translate()
vhost/vsock: Fix error handling in vhost_vsock_init()
vdpa_sim: fix possible memory leak in vdpasim_net_init() and vdpasim_blk_init()
tools: Delete the unneeded semicolon after curly braces
...
Linus Torvalds [Wed, 4 Jan 2023 20:11:29 +0000 (12:11 -0800)]
Merge tag 'x86-urgent-2023-01-04' of git://git./linux/kernel/git/tip/tip
Pull misc x86 fixes from Ingo Molnar:
"Fix a double-free bug, a binutils warning, a header namespace clash
and a bug in ib_prctl_set()"
* tag 'x86-urgent-2023-01-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/bugs: Flush IBP in ib_prctl_set()
x86/insn: Avoid namespace clash by separating instruction decoder MMIO type from MMIO trace type
x86/asm: Fix an assembler warning with current binutils
x86/kexec: Fix double-free of elf header buffer
Linus Torvalds [Wed, 4 Jan 2023 20:02:26 +0000 (12:02 -0800)]
Merge tag 'f2fs-fix-6.2-rc3' of git://git./linux/kernel/git/jaegeuk/f2fs
Pull f2fs fixes from Jaegeuk Kim:
- fix a null pointer dereference in f2fs_issue_flush, which is triggered
by a combination of mount/remount options.
- fix a bug in the per-block age-based extent_cache newly introduced in
6.2-rc1, which reported wrong age information in extent_cache.
- fix a kernel panic if extent_tree was not created, which was caught
by a wrong BUG_ON
* tag 'f2fs-fix-6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
f2fs: let's avoid panic if extent_tree is not created
f2fs: should use a temp extent_info for lookup
f2fs: don't mix to use union values in extent_info
f2fs: initialize extent_cache parameter
f2fs: fix to avoid NULL pointer dereference in f2fs_issue_flush()
Linus Torvalds [Wed, 4 Jan 2023 19:26:36 +0000 (11:26 -0800)]
Merge tag 'nfsd-6.2-2' of git://git./linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- Fix a filecache UAF during NFSD shutdown
- Avoid exposing automounted mounts on NFS re-exports
* tag 'nfsd-6.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: fix handling of readdir in v4root vs. mount upcall timeout
nfsd: shut down the NFSv4 state objects before the filecache
Rodrigo Branco [Tue, 3 Jan 2023 20:17:51 +0000 (14:17 -0600)]
x86/bugs: Flush IBP in ib_prctl_set()
We missed the window between the TIF flag update and the next reschedule.
Signed-off-by: Rodrigo Branco <bsdaemon@google.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: <stable@vger.kernel.org>
David S. Miller [Wed, 4 Jan 2023 08:57:24 +0000 (08:57 +0000)]
Merge branch '1GbE' of git://git./linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-01-03 (igc)
Muhammad Husaini Zulkifli says:
Improvements to the Time-Sensitive Networking (TSN) Qbv Scheduling
capabilities were included in this patch series for I226 SKU.
An overview of each patch is given below:
Patch 1: To enable basetime scheduling in the future, remove the existing
restriction for i226 stepping while maintaining the restriction for i225.
Patch 2: Remove the restriction which requires a controller reset when
setting the basetime register for new i226 steps and enable the second
GCL configuration.
Patch 3: Remove the adapter power reset when disabling the TSN config.
---
Patches remaining from initial PR:
https://lore.kernel.org/netdev/20221205212414.3197525-1-anthony.l.nguyen@intel.com/
after sending net patches:
https://lore.kernel.org/netdev/20221215230758.3595578-1-anthony.l.nguyen@intel.com/
Note: patch 3 is an additional patch from the initial PR.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>