OSDN Git Service

tomoyo/tomoyo-test1.git
4 years agomt76: add missing locking around ampdu action
Felix Fietkau [Mon, 7 Oct 2019 10:32:14 +0000 (12:32 +0200)]
mt76: add missing locking around ampdu action

This is needed primarily to avoid races in dealing with rx aggregation
related data structures

Signed-off-by: Felix Fietkau <nbd@nbd.name>
4 years agomt76: do not use devm API for led classdev
Felix Fietkau [Sun, 29 Sep 2019 20:04:37 +0000 (22:04 +0200)]
mt76: do not use devm API for led classdev

With the devm API, the unregister happens after the device cleanup is done,
after which the struct mt76_dev which contains the led_cdev has already been
freed. This leads to a use-after-free bug that can crash the system.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
4 years agomt76: enable airtime fairness
Felix Fietkau [Fri, 6 Sep 2019 09:29:09 +0000 (11:29 +0200)]
mt76: enable airtime fairness

It is supported by all hardware drivers now

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
4 years agomt76: mt7615: track tx/rx airtime for airtime fairness
Lorenzo Bianconi [Wed, 18 Sep 2019 10:58:05 +0000 (12:58 +0200)]
mt76: mt7615: track tx/rx airtime for airtime fairness

Poll per-station hardware counters available in WTBL after tx/rx
status events in order to report tx/rx airtime to mac80211 layer

Co-developed-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
4 years agomt76: mt7615: introduce mt7615_mac_wtbl_update routine
Lorenzo Bianconi [Wed, 18 Sep 2019 10:58:03 +0000 (12:58 +0200)]
mt76: mt7615: introduce mt7615_mac_wtbl_update routine

Introduce mt7615_mac_wtbl_update utility routine in order to update
WTBL update register

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
4 years agomt76: mt7615: fix survey channel busy time
Felix Fietkau [Thu, 26 Sep 2019 16:23:56 +0000 (18:23 +0200)]
mt76: mt7615: fix survey channel busy time

Like on mt7603, MIB status register 16 tracks CCA time, but does not
include tx time. Switch to status register 9 to includ NAV and tx
time as well.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
4 years agomt76: mt7615: report tx_time, bss_rx and busy time to mac80211
Lorenzo Bianconi [Wed, 18 Sep 2019 10:58:02 +0000 (12:58 +0200)]
mt76: mt7615: report tx_time, bss_rx and busy time to mac80211

Report tx time/rx time and obss time from hw mib counters to fill survey
info requested by mac80211

Co-developed-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
4 years agomt76: mt76x02: track approximate tx airtime for airtime fairness and survey
Felix Fietkau [Fri, 6 Sep 2019 08:19:19 +0000 (10:19 +0200)]
mt76: mt76x02: track approximate tx airtime for airtime fairness and survey

Estimate by calculating duration for EWMA packet size + estimated A-MPDU
length on tx status events

Signed-off-by: Felix Fietkau <nbd@nbd.name>
4 years agomt76: mt76x02: move MT_CH_TIME_CFG init to mt76x02_mac_cc_reset
Felix Fietkau [Thu, 5 Sep 2019 17:30:21 +0000 (19:30 +0200)]
mt76: mt76x02: move MT_CH_TIME_CFG init to mt76x02_mac_cc_reset

Reduces code duplication and adds missing bits for USB variants

Signed-off-by: Felix Fietkau <nbd@nbd.name>
4 years agomt76: unify channel survey update code
Felix Fietkau [Thu, 5 Sep 2019 16:29:13 +0000 (18:29 +0200)]
mt76: unify channel survey update code

Host time is used to calculate the channel active time on mt7603 and mt7615.
Use the same on mt76x02 and move the lock to core code to get rid of some
duplicated code.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
4 years agomt76: mt7603: switch to a different counter for survey busy time
Felix Fietkau [Wed, 4 Sep 2019 19:12:10 +0000 (21:12 +0200)]
mt76: mt7603: switch to a different counter for survey busy time

MT_MIB_STAT_PSCCA only counts rx CCA busy time, which does not include
tx time. MT_MIB_STAT_CCA counts full busy time, including Rx, Tx and NAV

Signed-off-by: Felix Fietkau <nbd@nbd.name>
4 years agomt76: mt7603: track tx airtime for airtime fairness and survey
Felix Fietkau [Wed, 4 Sep 2019 18:50:14 +0000 (20:50 +0200)]
mt76: mt7603: track tx airtime for airtime fairness and survey

Poll per-station hardware counters after tx status events

Signed-off-by: Felix Fietkau <nbd@nbd.name>
4 years agomt76: track rx airtime for airtime fairness and survey
Felix Fietkau [Wed, 4 Sep 2019 15:45:02 +0000 (17:45 +0200)]
mt76: track rx airtime for airtime fairness and survey

Report total rx airtime for valid stations as BSS rx time in survey

mt7615 is left out for now, it will be supported later by reading
hardware counters instead of calculating airtime in software

Signed-off-by: Felix Fietkau <nbd@nbd.name>
4 years agomt76: store current channel survey_state in struct mt76_dev
Felix Fietkau [Thu, 5 Sep 2019 14:58:08 +0000 (16:58 +0200)]
mt76: store current channel survey_state in struct mt76_dev

Move mt76_channel_state() from mt76.h to mac80211.c
Preparation for updating channel state from more places in the drivers/core

Signed-off-by: Felix Fietkau <nbd@nbd.name>
4 years agomt76: rename mt76_driver_ops txwi_flags to drv_flags and include tx aligned4
Felix Fietkau [Sat, 31 Aug 2019 10:41:59 +0000 (12:41 +0200)]
mt76: rename mt76_driver_ops txwi_flags to drv_flags and include tx aligned4

This reduces the struct size and is useful for adding more flags later

Signed-off-by: Felix Fietkau <nbd@nbd.name>
4 years agomt76: report rx a-mpdu subframe status
Felix Fietkau [Wed, 28 Aug 2019 09:28:36 +0000 (11:28 +0200)]
mt76: report rx a-mpdu subframe status

This can be used in monitor mode to figure out which subframes were sent as
part of which A-MPDU

Signed-off-by: Felix Fietkau <nbd@nbd.name>
4 years agomt76: mt7603: remove q_rx field from struct mt7603_dev
Felix Fietkau [Wed, 28 Aug 2019 09:07:39 +0000 (11:07 +0200)]
mt76: mt7603: remove q_rx field from struct mt7603_dev

It is no longer used

Signed-off-by: Felix Fietkau <nbd@nbd.name>
4 years agomt76: mt7603: collect aggregation stats
Lorenzo Bianconi [Fri, 13 Sep 2019 07:05:54 +0000 (09:05 +0200)]
mt76: mt7603: collect aggregation stats

Introduce ampdu_stat entry in mt7603 debugfs in order to dump 802.11
aggr cumulative statistics

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
4 years agomt76: mt7615: collect aggregation stats
Lorenzo Bianconi [Fri, 13 Sep 2019 07:05:53 +0000 (09:05 +0200)]
mt76: mt7615: collect aggregation stats

Introduce ampdu_stat entry in mt7615 debugfs in order to dump 802.11
aggr cumulative statistics

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
4 years agomt76: move aggr_stats array in mt76_dev
Lorenzo Bianconi [Fri, 13 Sep 2019 07:05:52 +0000 (09:05 +0200)]
mt76: move aggr_stats array in mt76_dev

Move aggr_stats array from mt76x02_dev to mt76_dev in order to be reused
adding aggregation stats for mt7603/mt7615 drivers

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
4 years agomt76: mt7615: add queue entry in debugfs
Lorenzo Bianconi [Fri, 13 Sep 2019 07:05:51 +0000 (09:05 +0200)]
mt76: mt7615: add queue entry in debugfs

Introduce mt7615_queues_read routine to dump hw queue related info.
Add hw ac queues statistics in mt7615 debugfs

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
4 years agomt76: move queue debugfs entry to driver specific code
Lorenzo Bianconi [Fri, 13 Sep 2019 07:05:50 +0000 (09:05 +0200)]
mt76: move queue debugfs entry to driver specific code

Move queue debugfs entry to driver specific code since mt7615 devices
rely on a different queue layout

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
4 years agomt76: mt76x02u: move mt76x02u_mac_start in mt76x02-usb module
Lorenzo Bianconi [Thu, 12 Sep 2019 09:06:38 +0000 (11:06 +0200)]
mt76: mt76x02u: move mt76x02u_mac_start in mt76x02-usb module

Unify mt76x02u_mac_start between mt76x2u and mt76x0u since the
code is shared between both drivers

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
4 years agomt76: mt76x0u: reset counter starting the device
Lorenzo Bianconi [Thu, 12 Sep 2019 09:06:37 +0000 (11:06 +0200)]
mt76: mt76x0u: reset counter starting the device

Remove mt76x02_mac_reset_counters from mt76x0_init_hardware since
it will be run starting the device

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
4 years agomt76: mt76x2: move mt76x02_mac_reset_counters in mt76x02_mac_start
Lorenzo Bianconi [Thu, 12 Sep 2019 09:06:36 +0000 (11:06 +0200)]
mt76: mt76x2: move mt76x02_mac_reset_counters in mt76x02_mac_start

Move mt76x02_mac_reset_counters in mt76x02_mac_start and get rid of
mt76x2_mac_start since it is just a wrapper for mt76x02_mac_start

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
4 years agomt76: mt76x02: move mac_reset_counter in mt76x02_lib module
Lorenzo Bianconi [Thu, 12 Sep 2019 09:06:35 +0000 (11:06 +0200)]
mt76: mt76x02: move mac_reset_counter in mt76x02_lib module

Unify mac_reset_counter routine and move it in mt76x02_lib module
since it is shared by all mt76x02 drivers (pci/usb)

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
4 years agomt76: mt7615: enable SCS by default
Lorenzo Bianconi [Fri, 6 Sep 2019 15:29:04 +0000 (17:29 +0200)]
mt76: mt7615: enable SCS by default

Enable Smart Carrier Sense algorithm by default in order to improve
performances in a noisy environment

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
4 years agomt76: mt76x0e: make array mt76x0_chan_map static const, makes object smaller
Colin Ian King [Fri, 6 Sep 2019 12:19:26 +0000 (13:19 +0100)]
mt76: mt76x0e: make array mt76x0_chan_map static const, makes object smaller

Don't populate the array mt76x0_chan_map on the stack but instead make it
static const. Makes the object code smaller by 80 bytes.

Before:
   text    data     bss     dec     hex filename
   7685    1192       0    8877    22ad mediatek/mt76/mt76x0/eeprom.o

After:
   text    data     bss     dec     hex filename
   7541    1256       0    8797    225d mediatek/mt76/mt76x0/eeprom.o

(gcc version 9.2.1, amd64)

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
4 years agomt76: usb: add lockdep_assert_held in __mt76u_vendor_request
Lorenzo Bianconi [Thu, 5 Sep 2019 16:32:58 +0000 (18:32 +0200)]
mt76: usb: add lockdep_assert_held in __mt76u_vendor_request

Introduce lockdep_assert_held macro in __mt76u_vendor_request routine
and remove comments regarding usb_ctrl_mtx lock

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
4 years agomt76: remove empty flag in mt76_txq_schedule_list
Lorenzo Bianconi [Thu, 22 Aug 2019 09:49:10 +0000 (11:49 +0200)]
mt76: remove empty flag in mt76_txq_schedule_list

Remove empty flag in mt76_txq_schedule_list and mt76_txq_send_burst
since we just need retry_q length to notify mac80211 to reschedule the
current tx queue

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
4 years agomt76: use cancel_delayed_work_sync in mt76_rx_aggr_shutdown
Felix Fietkau [Sun, 15 Sep 2019 16:43:59 +0000 (18:43 +0200)]
mt76: use cancel_delayed_work_sync in mt76_rx_aggr_shutdown

The workqueue item needs to be fully shut down before the struct can be
freed.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
4 years agomt76: remove aggr_work field from struct mt76_wcid
Felix Fietkau [Sun, 15 Sep 2019 16:43:14 +0000 (18:43 +0200)]
mt76: remove aggr_work field from struct mt76_wcid

It is unused

Signed-off-by: Felix Fietkau <nbd@nbd.name>
4 years agomt76: mt7615: fix control frame rx in monitor mode
Felix Fietkau [Thu, 12 Sep 2019 14:42:53 +0000 (16:42 +0200)]
mt76: mt7615: fix control frame rx in monitor mode

Adjust filters and ensure frames don't get sent to MCU instead of host

Signed-off-by: Felix Fietkau <nbd@nbd.name>
4 years agortl8xxxu: Remove set but not used variable 'vif','dev','len'
zhengbin [Tue, 19 Nov 2019 02:25:14 +0000 (10:25 +0800)]
rtl8xxxu: Remove set but not used variable 'vif','dev','len'

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c: In function rtl8xxxu_c2hcmd_callback:
drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c:5396:24: warning: variable vif set but not used [-Wunused-but-set-variable]
drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c: In function rtl8xxxu_c2hcmd_callback:
drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c:5397:17: warning: variable dev set but not used [-Wunused-but-set-variable]
drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c: In function rtl8xxxu_c2hcmd_callback:
drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c:5400:6: warning: variable len set but not used [-Wunused-but-set-variable]

They are introduced by commit e542e66b7c2e ("rtl8xxxu:
add bluetooth co-existence support for single antenna"), but never used,
so remove them.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Reviewed-by: Chris Chiu <chiu@endlessm.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
4 years agobrcmfmac: remove monitor interface when detaching
Rafał Miłecki [Mon, 18 Nov 2019 12:38:55 +0000 (13:38 +0100)]
brcmfmac: remove monitor interface when detaching

This fixes a minor WARNING in the cfg80211:
[  130.658034] ------------[ cut here ]------------
[  130.662805] WARNING: CPU: 1 PID: 610 at net/wireless/core.c:954 wiphy_unregister+0xb4/0x198 [cfg80211]

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
4 years agobrcmfmac: disable PCIe interrupts before bus reset
Rafał Miłecki [Mon, 18 Nov 2019 11:53:08 +0000 (12:53 +0100)]
brcmfmac: disable PCIe interrupts before bus reset

Keeping interrupts on could result in brcmfmac freeing some resources
and then IRQ handlers trying to use them. That was obviously a straight
path for crashing a kernel.

Example:
CPU0                           CPU1
----                           ----
brcmf_pcie_reset
  brcmf_pcie_bus_console_read
  brcmf_detach
    ...
    brcmf_fweh_detach
    brcmf_proto_detach
                               brcmf_pcie_isr_thread
                                 ...
                                 brcmf_proto_msgbuf_rx_trigger
                                   ...
                                   drvr->proto->pd
    brcmf_pcie_release_irq

[  363.789218] Unable to handle kernel NULL pointer dereference at virtual address 00000038
[  363.797339] pgd = c0004000
[  363.800050] [00000038] *pgd=00000000
[  363.803635] Internal error: Oops: 17 [#1] SMP ARM
(...)
[  364.029209] Backtrace:
[  364.031725] [<bf243838>] (brcmf_proto_msgbuf_rx_trigger [brcmfmac]) from [<bf2471dc>] (brcmf_pcie_isr_thread+0x228/0x274 [brcmfmac])
[  364.043662]  r7:00000001 r6:c8ca0000 r5:00010000 r4:c7b4f800

Fixes: 4684997d9eea ("brcmfmac: reset PCIe bus on a firmware crash")
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
4 years agortw88: allows to enable/disable HCI link PS mechanism
Yan-Hsuan Chuang [Mon, 18 Nov 2019 09:54:32 +0000 (17:54 +0800)]
rtw88: allows to enable/disable HCI link PS mechanism

Different interfaces have its own link-related power save mechanism.
Such as PCI can enter L1 state based on the traffic on the link, and
sometimes driver needs to enable/disable it to avoid some issues, like
throughput degrade when PCI trying to enter L1 state even if driver is
having heavy traffic.

For now, rtw88 only supports PCIE chips, and they just need to disable
ASPM L1 when driver is not in power save mode, such as IPS and LPS.

Signed-off-by: Yan-Hsuan Chuang <yhchuang@realtek.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
4 years agortw88: pci: enable CLKREQ function if host supports it
Yan-Hsuan Chuang [Mon, 18 Nov 2019 09:54:31 +0000 (17:54 +0800)]
rtw88: pci: enable CLKREQ function if host supports it

By Realtek's design, there are two HW modules associated for CLKREQ,
one is responsible to follow the PCIE host settings, and another
is to actually working on it. But the module that is actually working
on it is default disabled, and driver should enable that module if
host and device have successfully sync'ed with each other.

The module is default disabled because sometimes the host does not
support it, and if there is any incorrect settings (ex. CLKREQ# is
not Bi-Direction), device can be lost and disconnected to the host.

So driver should first check after host and device are sync'ed, and
the host does support the function and set it in configuration
space, then driver can turn on the HW module to working on it.

Signed-off-by: Yan-Hsuan Chuang <yhchuang@realtek.com>
Reviewed-by: Chris Chiu <chiu@endlessm.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
4 years agortw88: pci: use for loop instead of while loop for DBI/MDIO
Yan-Hsuan Chuang [Mon, 18 Nov 2019 09:54:30 +0000 (17:54 +0800)]
rtw88: pci: use for loop instead of while loop for DBI/MDIO

Use a for loop to polling DBI/MDIO read/write flags to avoid
infinite loop happens

Signed-off-by: Yan-Hsuan Chuang <yhchuang@realtek.com>
Reviewed-by: Chris Chiu <chiu@endlessm.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
4 years agortw88: pci: use macros to access PCI DBI/MDIO registers
Yan-Hsuan Chuang [Mon, 18 Nov 2019 09:54:29 +0000 (17:54 +0800)]
rtw88: pci: use macros to access PCI DBI/MDIO registers

Add some register and bit macros to access DBI/MDIO register. This
should not change the logic.

Signed-off-by: Yan-Hsuan Chuang <yhchuang@realtek.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
4 years agoqtnfmac: process HE capabilities requests
Mikhail Karpenko [Mon, 18 Nov 2019 08:23:16 +0000 (08:23 +0000)]
qtnfmac: process HE capabilities requests

Pass HE interface type data requests between firmware and driver.

Signed-off-by: Mikhail Karpenko <mkarpenko@quantenna.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
4 years agoqtnfmac: add TLV for extension IEs
Mikhail Karpenko [Mon, 18 Nov 2019 08:23:14 +0000 (08:23 +0000)]
qtnfmac: add TLV for extension IEs

Extension information elements have additional field for ID. This
commit adds TLV for such elements and a structure for interface HE
capabilities communication with firmware.

Signed-off-by: Mikhail Karpenko <mkarpenko@quantenna.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
4 years agoqtnfmac: signal that all packets coming from device are already flooded
Igor Mitsyanko [Mon, 18 Nov 2019 08:23:12 +0000 (08:23 +0000)]
qtnfmac: signal that all packets coming from device are already flooded

Firmware floods all packets that need to be flooded (multicast, broadcast,
unknown unicast) as required. Tell kernel bridge subsystem it does not
need to flood packet itself by marking each incoming frame
with skb->offload_fwd_mark flag.

Signed-off-by: Igor Mitsyanko <igor.mitsyanko.os@quantenna.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
4 years agoqtnfmac: advertise netdev port parent ID
Igor Mitsyanko [Mon, 18 Nov 2019 08:23:10 +0000 (08:23 +0000)]
qtnfmac: advertise netdev port parent ID

Use MAC address of the first active radio as a unique device ID.

Signed-off-by: Igor Mitsyanko <igor.mitsyanko.os@quantenna.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
4 years agoqtnfmac: add interface ID to each packet
Igor Mitsyanko [Mon, 18 Nov 2019 08:23:08 +0000 (08:23 +0000)]
qtnfmac: add interface ID to each packet

Add interface ID information to the tail of each transmitted packet
so that firmware can know to which interface the packet belongs to.
This is only needed if device supports HW switch capability.

Signed-off-by: Igor Mitsyanko <igor.mitsyanko.os@quantenna.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
4 years agoqtnfmac: track broadcast domain of each interface
Igor Mitsyanko [Mon, 18 Nov 2019 08:23:06 +0000 (08:23 +0000)]
qtnfmac: track broadcast domain of each interface

If firmware reports that it supports hardware switch capabilities,
driver needs to track and notify device whenever broadcast domain
of a particular network device changes (ie. whenever it's upper
master device changes).

Firmware needs a unique ID to tell broadcast domains from each other
which is an opaque number otherwise. For that purpose we can use
netspace:ifidx pair to uniquely identify each broadcast domain:
 - if netdev is not part of a bridge, then use it's own ifidx
   as a broadcast domain ID
 - if netdev is part of a bridge, then use bridge netdev ifidx
   as broadcast domain ID

Firmware makes sure that packets are only forwarded between
interfaces marked with the same broadcast domain ID.

Signed-off-by: Igor Mitsyanko <igor.mitsyanko.os@quantenna.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
4 years agoqtnfmac: remove VIF in firmware in case of error
Igor Mitsyanko [Mon, 18 Nov 2019 08:23:04 +0000 (08:23 +0000)]
qtnfmac: remove VIF in firmware in case of error

Currently in case of error when registering network device with the
kernel, we won't properly cleanup VIF state in firmware due to DEL_VIF
command will not be send to wifi card. Make sure it does.

Signed-off-by: Igor Mitsyanko <igor.mitsyanko.os@quantenna.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
4 years agortlwifi: set proper udelay within rf_serial_read
Ping-Ke Shih [Mon, 18 Nov 2019 03:14:55 +0000 (11:14 +0800)]
rtlwifi: set proper udelay within rf_serial_read

Since read RF register is an indirect access that hardware needs time to
accomplish read action, but there's no ready bit, so delay is required to
guarantee the read value is correct. After investigating internal documents,
these delays are reduced as proper values.

Reported-by: Lucas Stach <dev@lynxeye.de>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
4 years agortlwifi: rf_lock use non-irqsave spin_lock
Ping-Ke Shih [Mon, 18 Nov 2019 03:14:54 +0000 (11:14 +0800)]
rtlwifi: rf_lock use non-irqsave spin_lock

rf_lock is used to protect RF register access, but they will not called
from interrupt context, so *_irqsave version isn't necessary. Then, these
delays don't affect IRQ services.

The old code holds spin_lock_irqsave() that will be detected a long delay
as below:

  kworker/-276     4d...    0us : _raw_spin_lock_irqsave
  kworker/-276     4d...    0us : rtl8723_phy_rf_serial_read <-rtl8723de_phy_set_rf_reg
  kworker/-276     4d...    1us : rtl8723_phy_query_bb_reg <-rtl8723_phy_rf_serial_read
  kworker/-276     4d...    3us : rtl8723_phy_set_bb_reg <-rtl8723_phy_rf_serial_read
  kworker/-276     4d...    4us : __const_udelay <-rtl8723_phy_rf_serial_read
  kworker/-276     4d...    4us!: delay_mwaitx <-rtl8723_phy_rf_serial_read
  kworker/-276     4d... 1004us : rtl8723_phy_set_bb_reg <-rtl8723_phy_rf_serial_read
  [...]

Reported-by: Lucas Stach <dev@lynxeye.de>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
4 years agoipw2x00: remove set but not used variable 'force_update'
zhengbin [Sat, 16 Nov 2019 07:41:23 +0000 (15:41 +0800)]
ipw2x00: remove set but not used variable 'force_update'

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/net/wireless/intel/ipw2x00/ipw2100.c: In function shim__set_security:
drivers/net/wireless/intel/ipw2x00/ipw2100.c:5582:9: warning: variable force_update set but not used [-Wunused-but-set-variable]

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
4 years agoipw2x00: remove set but not used variable 'reason'
zhengbin [Sat, 16 Nov 2019 07:41:22 +0000 (15:41 +0800)]
ipw2x00: remove set but not used variable 'reason'

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/net/wireless/intel/ipw2x00/ipw2200.c: In function ipw_wx_set_mlme:
drivers/net/wireless/intel/ipw2x00/ipw2200.c:6805:9: warning: variable reason set but not used [-Wunused-but-set-variable]

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
4 years agobrcmfmac: remove set but not used variable 'mpnum','nsp','nmp'
zhengbin [Sat, 16 Nov 2019 07:22:47 +0000 (15:22 +0800)]
brcmfmac: remove set but not used variable 'mpnum','nsp','nmp'

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/net/wireless/broadcom/brcm80211/brcmfmac/chip.c: In function brcmf_chip_dmp_get_regaddr:
drivers/net/wireless/broadcom/brcm80211/brcmfmac/chip.c:790:5: warning: variable mpnum set but not used [-Wunused-but-set-variable]
drivers/net/wireless/broadcom/brcm80211/brcmfmac/chip.c: In function brcmf_chip_dmp_erom_scan:
drivers/net/wireless/broadcom/brcm80211/brcmfmac/chip.c:866:10: warning: variable nsp set but not used [-Wunused-but-set-variable]
drivers/net/wireless/broadcom/brcm80211/brcmfmac/chip.c: In function brcmf_chip_dmp_erom_scan:
drivers/net/wireless/broadcom/brcm80211/brcmfmac/chip.c:866:5: warning: variable nmp set but not used [-Wunused-but-set-variable]

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
4 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
David S. Miller [Sun, 17 Nov 2019 02:47:31 +0000 (18:47 -0800)]
Merge git://git./linux/kernel/git/netdev/net

Lots of overlapping changes and parallel additions, stuff
like that.

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Linus Torvalds [Sun, 17 Nov 2019 02:14:32 +0000 (18:14 -0800)]
Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6

Pull crypto fix from Herbert Xu:
 "This reverts a number of changes to the khwrng thread which feeds the
  kernel random number pool from hwrng drivers. They were trying to fix
  issues with suspend-and-resume but ended up causing regressions"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  Revert "hwrng: core - Freeze khwrng thread during suspend"

4 years agoRevert "hwrng: core - Freeze khwrng thread during suspend"
Herbert Xu [Sun, 17 Nov 2019 00:48:17 +0000 (08:48 +0800)]
Revert "hwrng: core - Freeze khwrng thread during suspend"

This reverts commit 03a3bb7ae631 ("hwrng: core - Freeze khwrng
thread during suspend"), ff296293b353 ("random: Support freezable
kthreads in add_hwgenerator_randomness()") and 59b569480dc8 ("random:
Use wait_event_freezable() in add_hwgenerator_randomness()").

These patches introduced regressions and we need more time to
get them ready for mainline.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
4 years agoMerge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 17 Nov 2019 00:10:59 +0000 (16:10 -0800)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:
 "Two fixes: disable unreliable HPET on Intel Coffe Lake platforms, and
  fix a lockdep splat in the resctrl code"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/resctrl: Fix potential lockdep warning
  x86/quirks: Disable HPET on Intel Coffe Lake platforms

4 years agoMerge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 17 Nov 2019 00:08:46 +0000 (16:08 -0800)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull timer fix from Ingo Molnar:
 "Fix integer truncation bug in __do_adjtimex()"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  ntp/y2038: Remove incorrect time_t truncation

4 years agoMerge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 16 Nov 2019 23:56:01 +0000 (15:56 -0800)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar:
 "Misc fixes: a handful of AUX event handling related fixes, a Sparse
  fix and two ABI fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Fix missing static inline on perf_cgroup_switch()
  perf/core: Consistently fail fork on allocation failures
  perf/aux: Disallow aux_output for kernel events
  perf/core: Reattach a misplaced comment
  perf/aux: Fix the aux_output group inheritance fix
  perf/core: Disallow uncore-cgroup events

4 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Sat, 16 Nov 2019 23:52:00 +0000 (15:52 -0800)]
Merge git://git./linux/kernel/git/netdev/net

Pull networking fixes from David Miller:

 1) Fix memory leak in xfrm_state code, from Steffen Klassert.

 2) Fix races between devlink reload operations and device
    setup/cleanup, from Jiri Pirko.

 3) Null deref in NFC code, from Stephan Gerhold.

 4) Refcount fixes in SMC, from Ursula Braun.

 5) Memory leak in slcan open error paths, from Jouni Hogander.

 6) Fix ETS bandwidth validation in hns3, from Yonglong Liu.

 7) Info leak on short USB request answers in ax88172a driver, from
    Oliver Neukum.

 8) Release mem region properly in ep93xx_eth, from Chuhong Yuan.

 9) PTP config timestamp flags validation, from Richard Cochran.

10) Dangling pointers after SKB data realloc in seg6, from Andrea Mayer.

11) Missing free_netdev() in gemini driver, from Chuhong Yuan.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (56 commits)
  ipmr: Fix skb headroom in ipmr_get_route().
  net: hns3: cleanup of stray struct hns3_link_mode_mapping
  net/smc: fix fastopen for non-blocking connect()
  rds: ib: update WR sizes when bringing up connection
  net: gemini: add missed free_netdev
  net: dsa: tag_8021q: Fix dsa_8021q_restore_pvid for an absent pvid
  seg6: fix skb transport_header after decap_and_validate()
  seg6: fix srh pointer in get_srh()
  net: stmmac: Use the correct style for SPDX License Identifier
  octeontx2-af: Use the correct style for SPDX License Identifier
  ptp: Extend the test program to check the external time stamp flags.
  mlx5: Reject requests to enable time stamping on both edges.
  igb: Reject requests that fail to enable time stamping on both edges.
  dp83640: Reject requests to enable time stamping on both edges.
  mv88e6xxx: Reject requests to enable time stamping on both edges.
  ptp: Introduce strict checking of external time stamp options.
  renesas: reject unsupported external timestamp flags
  mlx5: reject unsupported external timestamp flags
  igb: reject unsupported external timestamp flags
  dp83640: reject unsupported external timestamp flags
  ...

4 years agomscc.c: fix semicolon.cocci warnings
kbuild test robot [Fri, 15 Nov 2019 22:38:34 +0000 (06:38 +0800)]
mscc.c: fix semicolon.cocci warnings

drivers/net/phy/mscc.c:1683:3-4: Unneeded semicolon

 Remove unneeded semicolon.

Generated by: scripts/coccinelle/misc/semicolon.cocci

Fixes: 75a1ccfe6c72 ("mscc.c: Add support for additional VSC PHYs")
CC: Bryan Whitehead <Bryan.Whitehead@microchip.com>
Signed-off-by: kbuild test robot <lkp@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoselftests: net: avoid ptl lock contention in tcp_mmap
Eric Dumazet [Sat, 16 Nov 2019 01:55:54 +0000 (17:55 -0800)]
selftests: net: avoid ptl lock contention in tcp_mmap

tcp_mmap is used as a reference program for TCP rx zerocopy,
so it is important to point out some potential issues.

If multiple threads are concurrently using getsockopt(...
TCP_ZEROCOPY_RECEIVE), there is a chance the low-level mm
functions compete on shared ptl lock, if vma are arbitrary placed.

Instead of letting the mm layer place the chunks back to back,
this patch enforces an alignment so that each thread uses
a different ptl lock.

Performance measured on a 100 Gbit NIC, with 8 tcp_mmap clients
launched at the same time :

$ for f in {1..8}; do ./tcp_mmap -H 2002:a05:6608:290:: & done

In the following run, we reproduce the old behavior by requesting no alignment :

$ tcp_mmap -sz -C $((128*1024)) -a 4096
received 32768 MB (100 % mmap'ed) in 9.69532 s, 28.3516 Gbit
  cpu usage user:0.08634 sys:3.86258, 120.511 usec per MB, 171839 c-switches
received 32768 MB (100 % mmap'ed) in 25.4719 s, 10.7914 Gbit
  cpu usage user:0.055268 sys:21.5633, 659.745 usec per MB, 9065 c-switches
received 32768 MB (100 % mmap'ed) in 28.5419 s, 9.63069 Gbit
  cpu usage user:0.057401 sys:23.8761, 730.392 usec per MB, 14987 c-switches
received 32768 MB (100 % mmap'ed) in 28.655 s, 9.59268 Gbit
  cpu usage user:0.059689 sys:23.8087, 728.406 usec per MB, 18509 c-switches
received 32768 MB (100 % mmap'ed) in 28.7808 s, 9.55074 Gbit
  cpu usage user:0.066042 sys:23.4632, 718.056 usec per MB, 24702 c-switches
received 32768 MB (100 % mmap'ed) in 28.8259 s, 9.5358 Gbit
  cpu usage user:0.056547 sys:23.6628, 723.858 usec per MB, 23518 c-switches
received 32768 MB (100 % mmap'ed) in 28.8808 s, 9.51767 Gbit
  cpu usage user:0.059357 sys:23.8515, 729.703 usec per MB, 14691 c-switches
received 32768 MB (100 % mmap'ed) in 28.8879 s, 9.51534 Gbit
  cpu usage user:0.047115 sys:23.7349, 725.769 usec per MB, 21773 c-switches

New behavior (automatic alignment based on Hugepagesize),
we can see the system overhead being dramatically reduced.

$ tcp_mmap -sz -C $((128*1024))
received 32768 MB (100 % mmap'ed) in 13.5339 s, 20.3103 Gbit
  cpu usage user:0.122644 sys:3.4125, 107.884 usec per MB, 168567 c-switches
received 32768 MB (100 % mmap'ed) in 16.0335 s, 17.1439 Gbit
  cpu usage user:0.132428 sys:3.55752, 112.608 usec per MB, 188557 c-switches
received 32768 MB (100 % mmap'ed) in 17.5506 s, 15.6621 Gbit
  cpu usage user:0.155405 sys:3.24889, 103.891 usec per MB, 226652 c-switches
received 32768 MB (100 % mmap'ed) in 19.1924 s, 14.3222 Gbit
  cpu usage user:0.135352 sys:3.35583, 106.542 usec per MB, 207404 c-switches
received 32768 MB (100 % mmap'ed) in 22.3649 s, 12.2906 Gbit
  cpu usage user:0.142429 sys:3.53187, 112.131 usec per MB, 250225 c-switches
received 32768 MB (100 % mmap'ed) in 22.5336 s, 12.1986 Gbit
  cpu usage user:0.140654 sys:3.61971, 114.757 usec per MB, 253754 c-switches
received 32768 MB (100 % mmap'ed) in 22.5483 s, 12.1906 Gbit
  cpu usage user:0.134035 sys:3.55952, 112.718 usec per MB, 252997 c-switches
received 32768 MB (100 % mmap'ed) in 22.6442 s, 12.139 Gbit
  cpu usage user:0.126173 sys:3.71251, 117.147 usec per MB, 253728 c-switches

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Arjun Roy <arjunroy@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agor8169: load firmware for RTL8168fp/RTL8117
Heiner Kallweit [Fri, 15 Nov 2019 21:38:25 +0000 (22:38 +0100)]
r8169: load firmware for RTL8168fp/RTL8117

Load Realtek-provided firmware for RTL8168fp/RTL8117. Unlike the
firmware for other chip versions which is for the PHY, firmware for
RTL8168fp/RTL8117 is for the MAC.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agor8169: improve conditional firmware loading for RTL8168d
Heiner Kallweit [Fri, 15 Nov 2019 20:35:22 +0000 (21:35 +0100)]
r8169: improve conditional firmware loading for RTL8168d

Using constant MII_EXPANSION is misleading here because register 0x06
has a different meaning on page 0x0005. Here a proprietary PHY
parameter is read by writing the parameter id to register 0x05 on page
0x0005, followed by reading the parameter value from register 0x06.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: phylink: update to use phy_support_asym_pause()
Russell King [Fri, 15 Nov 2019 20:05:45 +0000 (20:05 +0000)]
net: phylink: update to use phy_support_asym_pause()

Use phy_support_asym_pause() rather than open-coding it.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoipmr: Fix skb headroom in ipmr_get_route().
Guillaume Nault [Fri, 15 Nov 2019 17:29:52 +0000 (18:29 +0100)]
ipmr: Fix skb headroom in ipmr_get_route().

In route.c, inet_rtm_getroute_build_skb() creates an skb with no
headroom. This skb is then used by inet_rtm_getroute() which may pass
it to rt_fill_info() and, from there, to ipmr_get_route(). The later
might try to reuse this skb by cloning it and prepending an IPv4
header. But since the original skb has no headroom, skb_push() triggers
skb_under_panic():

skbuff: skb_under_panic: text:00000000ca46ad8a len:80 put:20 head:00000000cd28494e data:000000009366fd6b tail:0x3c end:0xec0 dev:veth0
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:108!
invalid opcode: 0000 [#1] SMP KASAN PTI
CPU: 6 PID: 587 Comm: ip Not tainted 5.4.0-rc6+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
RIP: 0010:skb_panic+0xbf/0xd0
Code: 41 a2 ff 8b 4b 70 4c 8b 4d d0 48 c7 c7 20 76 f5 8b 44 8b 45 bc 48 8b 55 c0 48 8b 75 c8 41 54 41 57 41 56 41 55 e8 75 dc 7a ff <0f> 0b 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
RSP: 0018:ffff888059ddf0b0 EFLAGS: 00010286
RAX: 0000000000000086 RBX: ffff888060a315c0 RCX: ffffffff8abe4822
RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88806c9a79cc
RBP: ffff888059ddf118 R08: ffffed100d9361b1 R09: ffffed100d9361b0
R10: ffff88805c68aee3 R11: ffffed100d9361b1 R12: ffff88805d218000
R13: ffff88805c689fec R14: 000000000000003c R15: 0000000000000ec0
FS:  00007f6af184b700(0000) GS:ffff88806c980000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffc8204a000 CR3: 0000000057b40006 CR4: 0000000000360ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 skb_push+0x7e/0x80
 ipmr_get_route+0x459/0x6fa
 rt_fill_info+0x692/0x9f0
 inet_rtm_getroute+0xd26/0xf20
 rtnetlink_rcv_msg+0x45d/0x630
 netlink_rcv_skb+0x1a5/0x220
 rtnetlink_rcv+0x15/0x20
 netlink_unicast+0x305/0x3a0
 netlink_sendmsg+0x575/0x730
 sock_sendmsg+0xb5/0xc0
 ___sys_sendmsg+0x497/0x4f0
 __sys_sendmsg+0xcb/0x150
 __x64_sys_sendmsg+0x48/0x50
 do_syscall_64+0xd2/0xac0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Actually the original skb used to have enough headroom, but the
reserve_skb() call was lost with the introduction of
inet_rtm_getroute_build_skb() by commit 404eb77ea766 ("ipv4: support
sport, dport and ip_proto in RTM_GETROUTE").

We could reserve some headroom again in inet_rtm_getroute_build_skb(),
but this function shouldn't be responsible for handling the special
case of ipmr_get_route(). Let's handle that directly in
ipmr_get_route() by calling skb_realloc_headroom() instead of
skb_clone().

Fixes: 404eb77ea766 ("ipv4: support sport, dport and ip_proto in RTM_GETROUTE")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge tag 'wireless-drivers-next-2019-11-15' of git://git.kernel.org/pub/scm/linux...
David S. Miller [Sat, 16 Nov 2019 21:06:07 +0000 (13:06 -0800)]
Merge tag 'wireless-drivers-next-2019-11-15' of git://git./linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
wireless-drivers-next patches for v5.5

Second set of patches for v5.5. Nothing special this time, smaller
features to various drivers and of course fixes all over.

Major changes:

iwlwifi

* update scan FW API

* bump the supported FW API version

* add debug dump collection on assert in WoWLAN

* enable adaptive dwell on P2P interfaces

ath10k

* request for PM_QOS_CPU_DMA_LATENCY to improve firmware initialisation time

qtnfmac

* add support for getting/setting transmit power

* handle MIC failure event from firmware

rtl8xxxu

* add support for Edimax EW-7611ULB

wil6210

* add SPDX license identifiers
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: hns3: cleanup of stray struct hns3_link_mode_mapping
Salil Mehta [Fri, 15 Nov 2019 11:52:32 +0000 (11:52 +0000)]
net: hns3: cleanup of stray struct hns3_link_mode_mapping

This patch cleans-up the stray left over code. It has no
functionality impact.

Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/smc: fix fastopen for non-blocking connect()
Ursula Braun [Fri, 15 Nov 2019 11:39:30 +0000 (12:39 +0100)]
net/smc: fix fastopen for non-blocking connect()

FASTOPEN does not work with SMC-sockets. Since SMC allows fallback to
TCP native during connection start, the FASTOPEN setsockopts trigger
this fallback, if the SMC-socket is still in state SMC_INIT.
But if a FASTOPEN setsockopt is called after a non-blocking connect(),
this is broken, and fallback does not make sense.
This change complements
commit cd2063604ea6 ("net/smc: avoid fallback in case of non-blocking connect")
and fixes the syzbot reported problem "WARNING in smc_unhash_sk".

Reported-by: syzbot+8488cc4cf1c9e09b8b86@syzkaller.appspotmail.com
Fixes: e1bbdd570474 ("net/smc: reduce sock_put() for fallback sockets")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agobonding: symmetric ICMP transmit
Matteo Croce [Fri, 15 Nov 2019 11:10:37 +0000 (12:10 +0100)]
bonding: symmetric ICMP transmit

A bonding with layer2+3 or layer3+4 hashing uses the IP addresses and the ports
to balance packets between slaves. With some network errors, we receive an ICMP
error packet by the remote host or a router. If sent by a router, the source IP
can differ from the remote host one. Additionally the ICMP protocol has no port
numbers, so a layer3+4 bonding will get a different hash than the previous one.
These two conditions could let the packet go through a different interface than
the other packets of the same flow:

    # tcpdump -qltnni veth0 |sed 's/^/0: /' &
    # tcpdump -qltnni veth1 |sed 's/^/1: /' &
    # hping3 -2 192.168.0.2 -p 9
    0: IP 192.168.0.1.2251 > 192.168.0.2.9: UDP, length 0
    1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
    1: IP 192.168.0.1.2252 > 192.168.0.2.9: UDP, length 0
    1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
    1: IP 192.168.0.1.2253 > 192.168.0.2.9: UDP, length 0
    1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
    0: IP 192.168.0.1.2254 > 192.168.0.2.9: UDP, length 0
    1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36

An ICMP error packet contains the header of the packet which caused the network
error, so inspect it and match the flow against it, so we can send the ICMP via
the same interface of the previous packet in the flow.
Move the IP and port dissect code into a generic function bond_flow_ip() and if
we are dissecting an ICMP error packet, call it again with the adjusted offset.

    # hping3 -2 192.168.0.2 -p 9
    1: IP 192.168.0.1.1224 > 192.168.0.2.9: UDP, length 0
    1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
    1: IP 192.168.0.1.1225 > 192.168.0.2.9: UDP, length 0
    1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
    0: IP 192.168.0.1.1226 > 192.168.0.2.9: UDP, length 0
    0: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
    0: IP 192.168.0.1.1227 > 192.168.0.2.9: UDP, length 0
    0: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: mscc: ocelot: omit error check from of_get_phy_mode
Horatiu Vultur [Fri, 15 Nov 2019 10:11:15 +0000 (11:11 +0100)]
net: mscc: ocelot: omit error check from of_get_phy_mode

The commit 0c65b2b90d13c ("net: of_get_phy_mode: Change API to solve
int/unit warnings") updated the function of_get_phy_mode declaration.
Now it returns an error code and in case the node doesn't contain the
property 'phy-mode' or 'phy-connection-type' it returns -EINVAL and would
set the phy_interface_t to PHY_INTERFACE_MODE_NA.

Ocelot VSC7514 has 4 internal phys which have the phy interface
PHY_INTERFACE_MODE_NA. So because of_get_phy_mode would assign
PHY_INTERFACE_MODE_NA to phy_mode when there is an error, there is no need
to add the error check.

Updates for v2:
 - drop error check because of_get_phy_mode already assigns phy_interface
   to PHY_INTERFACE_MODE in case of error.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: core: allow fast GRO for skbs with Ethernet header in head
Alexander Lobakin [Fri, 15 Nov 2019 09:11:35 +0000 (12:11 +0300)]
net: core: allow fast GRO for skbs with Ethernet header in head

Commit 78d3fd0b7de8 ("gro: Only use skb_gro_header for completely
non-linear packets") back in May'09 (v2.6.31-rc1) has changed the
original condition '!skb_headlen(skb)' to
'skb->mac_header == skb->tail' in gro_reset_offset() saying: "Since
the drivers that need this optimisation all provide completely
non-linear packets" (note that this condition has become the current
'skb_mac_header(skb) == skb_tail_pointer(skb)' later with commmit
ced14f6804a9 ("net: Correct comparisons and calculations using
skb->tail and skb-transport_header") without any functional changes).

For now, we have the following rough statistics for v5.4-rc7:
1) napi_gro_frags: 14
2) napi_gro_receive with skb->head containing (most of) payload: 83
3) napi_gro_receive with skb->head containing all the headers: 20
4) napi_gro_receive with skb->head containing only Ethernet header: 2

With the current condition, fast GRO with the usage of
NAPI_GRO_CB(skb)->frag0 is available only in the [1] case.
Packets pushed by [2] and [3] go through the 'slow' path, but
it's not a problem for them as they already contain all the needed
headers in skb->head, so pskb_may_pull() only moves skb->data.

The layout of skbs in the fourth [4] case at the moment of
dev_gro_receive() is identical to skbs that have come through [1],
as napi_frags_skb() pulls Ethernet header to skb->head. The only
difference is that the mentioned condition is always false for them,
because skb_put() and friends irreversibly alter the tail pointer.
They also go through the 'slow' path, but now every single
pskb_may_pull() in every single .gro_receive() will call the *really*
slow __pskb_pull_tail() to pull headers to head. This significantly
decreases the overall performance for no visible reasons.

The only two users of method [4] is:
* drivers/staging/qlge
* drivers/net/wireless/iwlwifi (all three variants: dvm, mvm, mvm-mq)

Note that in case with wireless drivers we can't use [1]
(napi_gro_frags()) at least for now and mac80211 stack always
performs pushes and pulls anyways, so performance hit is inavoidable.

At the moment of v2.6.31 the mentioned change was necessary (that's
why I don't add the "Fixes:" tag), but it became obsolete since
skb_gro_mac_header() has gone in commit a50e233c50db ("net-gro:
restore frag0 optimization"), so we can simply revert the condition
in gro_reset_offset() to allow skbs from [4] go through the 'fast'
path just like in case [1].

This was tested on a 600 MHz MIPS CPU and a custom driver and this
patch gave boosts up to 40 Mbps to method [4] in both directions
comparing to net-next, which made overall performance relatively
close to [1] (without it, [4] is the slowest).

v2:
- Add more references and explanations to commit message
- Fix some typos ibid
- No functional changes

Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agords: ib: update WR sizes when bringing up connection
Dag Moxnes [Fri, 15 Nov 2019 08:56:01 +0000 (09:56 +0100)]
rds: ib: update WR sizes when bringing up connection

Currently WR sizes are updated from rds_ib_sysctl_max_send_wr and
rds_ib_sysctl_max_recv_wr when a connection is shut down. As a result,
a connection being down while rds_ib_sysctl_max_send_wr or
rds_ib_sysctl_max_recv_wr are updated, will not update the sizes when
it comes back up.

Move resizing of WRs to rds_ib_setup_qp so that connections will be setup
with the most current WR sizes.

Signed-off-by: Dag Moxnes <dag.moxnes@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: gemini: add missed free_netdev
Chuhong Yuan [Fri, 15 Nov 2019 06:24:54 +0000 (14:24 +0800)]
net: gemini: add missed free_netdev

This driver forgets to free allocated netdev in remove like
what is done in probe failure.
Add the free to fix it.

Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'bnx2x-Remove-function-casts'
David S. Miller [Sat, 16 Nov 2019 20:50:57 +0000 (12:50 -0800)]
Merge branch 'bnx2x-Remove-function-casts'

Kees Cook says:

====================
bnx2x: Remove function casts

In order to make the entire kernel usable under Clang's Control Flow
Integrity protections, function prototype casts need to be avoided
because this will trip CFI checks at runtime (i.e. a mismatch between
the caller's expected function prototype and the destination function's
prototype). Many of these cases can be found with -Wcast-function-type,
which found that bnx2x had a bunch of needless (or at least confusing)
function casts. This series removes them all.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agobnx2x: Remove hw_reset_t function casts
Kees Cook [Fri, 15 Nov 2019 05:07:15 +0000 (21:07 -0800)]
bnx2x: Remove hw_reset_t function casts

All .rw_reset callbacks except bnx2x_84833_hw_reset_phy() use a
void return type. No callers of .hw_reset check a return value and
bnx2x_84833_hw_reset_phy() unconditionally returns 0. Remove all
hw_reset_t casts and fix the return type to void.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agobnx2x: Remove format_fw_ver_t function casts
Kees Cook [Fri, 15 Nov 2019 05:07:14 +0000 (21:07 -0800)]
bnx2x: Remove format_fw_ver_t function casts

The return values for format_fw_ver_t callbacks are supposed to be
"int", not "u8". Ultimately, the top-level caller doesn't actually check
the return value at all, but just clean this all up anyway and fix the
prototypes so that casts are no longer needed.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agobnx2x: Remove config_init_t function casts
Kees Cook [Fri, 15 Nov 2019 05:07:13 +0000 (21:07 -0800)]
bnx2x: Remove config_init_t function casts

No callers of .config_init check return values. Remove the casting and
change all callbacks to have the correct function prototype.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agobnx2x: Remove read_status_t function casts
Kees Cook [Fri, 15 Nov 2019 05:07:12 +0000 (21:07 -0800)]
bnx2x: Remove read_status_t function casts

The function casts for .read_status callbacks end up casting some int
return values to u8. This seems to be bug-prone (-EINVAL being returned
into something that appears to be true/false), but fixing the function
prototypes doesn't change the existing behavior. Fix the return values
to remove the casts.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agobnx2x: Drop redundant callback function casts
Kees Cook [Fri, 15 Nov 2019 05:07:11 +0000 (21:07 -0800)]
bnx2x: Drop redundant callback function casts

NULL is already "void *" so it will auto-cast in assignments and
initializers. Additionally, all the callbacks for .link_reset,
.config_loopback, .set_link_led, and .phy_specific_func are already
correct. No casting is needed for these, so remove them.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoenetc: update TSN Qbv PSPEED set according to adjust link speed
Po Liu [Fri, 15 Nov 2019 03:33:41 +0000 (03:33 +0000)]
enetc: update TSN Qbv PSPEED set according to adjust link speed

ENETC has a register PSPEED to indicate the link speed of hardware.
It is need to update accordingly. PSPEED field needs to be updated
with the port speed for QBV scheduling purposes. Or else there is
chance for gate slot not free by frame taking the MAC if PSPEED and
phy speed not match. So update PSPEED when link adjust. This is
implement by the adjust_link.

Signed-off-by: Po Liu <Po.Liu@nxp.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoenetc: Configure the Time-Aware Scheduler via tc-taprio offload
Po Liu [Fri, 15 Nov 2019 03:33:33 +0000 (03:33 +0000)]
enetc: Configure the Time-Aware Scheduler via tc-taprio offload

ENETC supports in hardware for time-based egress shaping according
to IEEE 802.1Qbv. This patch implement the Qbv enablement by the
hardware offload method qdisc tc-taprio method.
Also update cbdr writeback to up level since control bd ring may
writeback data to control bd ring.

Signed-off-by: Po Liu <Po.Liu@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agopage_pool: do not release pool until inflight == 0.
Jonathan Lemon [Thu, 14 Nov 2019 22:13:00 +0000 (14:13 -0800)]
page_pool: do not release pool until inflight == 0.

The page pool keeps track of the number of pages in flight, and
it isn't safe to remove the pool until all pages are returned.

Disallow removing the pool until all pages are back, so the pool
is always available for page producers.

Make the page pool responsible for its own delayed destruction
instead of relying on XDP, so the page pool can be used without
the xdp memory model.

When all pages are returned, free the pool and notify xdp if the
pool is registered with the xdp memory system.  Have the callback
perform a table walk since some drivers (cpsw) may share the pool
among multiple xdp_rxq_info.

Note that the increment of pages_state_release_cnt may result in
inflight == 0, resulting in the pool being released.

Fixes: d956a048cd3f ("xdp: force mem allocator removal and periodic warning")
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'smc-last-part-of-termination-improvements'
David S. Miller [Sat, 16 Nov 2019 20:26:49 +0000 (12:26 -0800)]
Merge branch 'smc-last-part-of-termination-improvements'

Karsten Graul says:

====================
last part of termination improvements

Patches 1 and 2 finish the set of termination patches, introducing
a reboot handler that terminates all link groups. Patch 3 adds an
rcu_barrier before the module is unloaded, and patch 4 is cleanup.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/smc: remove unused constant
Ursula Braun [Sat, 16 Nov 2019 16:47:32 +0000 (17:47 +0100)]
net/smc: remove unused constant

Constant SMC_CLOSE_WAIT_LISTEN_CLCSOCK_TIME is defined, but since
commit 3d502067599f ("net/smc: simplify wait when closing listen socket")
no longer used. Remove it.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/smc: use rcu_barrier() on module unload
Ursula Braun [Sat, 16 Nov 2019 16:47:31 +0000 (17:47 +0100)]
net/smc: use rcu_barrier() on module unload

Add rcu_barrier() to make sure no RCU readers or callbacks are
pending when the module is unloaded.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/smc: guarantee removal of link groups in reboot
Ursula Braun [Sat, 16 Nov 2019 16:47:30 +0000 (17:47 +0100)]
net/smc: guarantee removal of link groups in reboot

When rebooting it should be guaranteed all link groups are cleaned
up and freed.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/smc: introduce bookkeeping of SMCR link groups
Ursula Braun [Sat, 16 Nov 2019 16:47:29 +0000 (17:47 +0100)]
net/smc: introduce bookkeeping of SMCR link groups

If the smc module is unloaded return control from exit routine only,
if all link groups are freed.
If an IB device is thrown away return control from device removal only,
if all link groups belonging to this device are freed.
Counters for the total number of SMCR link groups and for the total
number of SMCR links per IB device are introduced. smc module unloading
continues only if the total number of SMCR link groups is zero. IB device
removal continues only it the total number of SMCR links per IB device
has decreased to zero.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: dsa: tag_8021q: Fix dsa_8021q_restore_pvid for an absent pvid
Vladimir Oltean [Sat, 16 Nov 2019 16:08:25 +0000 (18:08 +0200)]
net: dsa: tag_8021q: Fix dsa_8021q_restore_pvid for an absent pvid

This sequence of operations:
ip link set dev br0 type bridge vlan_filtering 1
bridge vlan del dev swp2 vid 1
ip link set dev br0 type bridge vlan_filtering 1
ip link set dev br0 type bridge vlan_filtering 0

apparently fails with the message:

[   31.305716] sja1105 spi0.1: Reset switch and programmed static config. Reason: VLAN filtering
[   31.322161] sja1105 spi0.1: Couldn't determine PVID attributes (pvid 0)
[   31.328939] sja1105 spi0.1: Failed to setup VLAN tagging for port 1: -2
[   31.335599] ------------[ cut here ]------------
[   31.340215] WARNING: CPU: 1 PID: 194 at net/switchdev/switchdev.c:157 switchdev_port_attr_set_now+0x9c/0xa4
[   31.349981] br0: Commit of attribute (id=6) failed.
[   31.354890] Modules linked in:
[   31.357942] CPU: 1 PID: 194 Comm: ip Not tainted 5.4.0-rc6-01792-gf4f632e07665-dirty #2062
[   31.366167] Hardware name: Freescale LS1021A
[   31.370437] [<c03144dc>] (unwind_backtrace) from [<c030e184>] (show_stack+0x10/0x14)
[   31.378153] [<c030e184>] (show_stack) from [<c11d1c1c>] (dump_stack+0xe0/0x10c)
[   31.385437] [<c11d1c1c>] (dump_stack) from [<c034c730>] (__warn+0xf4/0x10c)
[   31.392373] [<c034c730>] (__warn) from [<c034c7bc>] (warn_slowpath_fmt+0x74/0xb8)
[   31.399827] [<c034c7bc>] (warn_slowpath_fmt) from [<c11ca204>] (switchdev_port_attr_set_now+0x9c/0xa4)
[   31.409097] [<c11ca204>] (switchdev_port_attr_set_now) from [<c117036c>] (__br_vlan_filter_toggle+0x6c/0x118)
[   31.418971] [<c117036c>] (__br_vlan_filter_toggle) from [<c115d010>] (br_changelink+0xf8/0x518)
[   31.427637] [<c115d010>] (br_changelink) from [<c0f8e9ec>] (__rtnl_newlink+0x3f4/0x76c)
[   31.435613] [<c0f8e9ec>] (__rtnl_newlink) from [<c0f8eda8>] (rtnl_newlink+0x44/0x60)
[   31.443329] [<c0f8eda8>] (rtnl_newlink) from [<c0f89f20>] (rtnetlink_rcv_msg+0x2cc/0x51c)
[   31.451477] [<c0f89f20>] (rtnetlink_rcv_msg) from [<c1008df8>] (netlink_rcv_skb+0xb8/0x110)
[   31.459796] [<c1008df8>] (netlink_rcv_skb) from [<c1008648>] (netlink_unicast+0x17c/0x1f8)
[   31.468026] [<c1008648>] (netlink_unicast) from [<c1008980>] (netlink_sendmsg+0x2bc/0x3b4)
[   31.476261] [<c1008980>] (netlink_sendmsg) from [<c0f43858>] (___sys_sendmsg+0x230/0x250)
[   31.484408] [<c0f43858>] (___sys_sendmsg) from [<c0f44c84>] (__sys_sendmsg+0x50/0x8c)
[   31.492209] [<c0f44c84>] (__sys_sendmsg) from [<c0301000>] (ret_fast_syscall+0x0/0x28)
[   31.500090] Exception stack(0xedf47fa8 to 0xedf47ff0)
[   31.505122] 7fa0:                   00000002 b6f2e060 00000003 beabd6a4 00000000 00000000
[   31.513265] 7fc0: 00000002 b6f2e060 5d6e3213 00000128 00000000 00000001 00000006 000619c4
[   31.521405] 7fe0: 00086078 beabd658 0005edbc b6e7ce68

The reason is the implementation of br_get_pvid:

static inline u16 br_get_pvid(const struct net_bridge_vlan_group *vg)
{
if (!vg)
return 0;

smp_rmb();
return vg->pvid;
}

Since VID 0 is an invalid pvid from the bridge's point of view, let's
add this check in dsa_8021q_restore_pvid to avoid restoring a pvid that
doesn't really exist.

Fixes: 5f33183b7fdf ("net: dsa: tag_8021q: Restore bridge VLANs when enabling vlan_filtering")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'seg6-fixes-to-Segment-Routing-in-IPv6'
David S. Miller [Sat, 16 Nov 2019 20:18:32 +0000 (12:18 -0800)]
Merge branch 'seg6-fixes-to-Segment-Routing-in-IPv6'

Andrea Mayer says:

====================
seg6: fixes to Segment Routing in IPv6

This patchset is divided in 2 patches and it introduces some fixes
to Segment Routing in IPv6, which are:

- in function get_srh() fix the srh pointer after calling
  pskb_may_pull();

- fix the skb->transport_header after calling decap_and_validate()
  function;

Any comments on the patchset are welcome.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoseg6: fix skb transport_header after decap_and_validate()
Andrea Mayer [Sat, 16 Nov 2019 15:05:53 +0000 (16:05 +0100)]
seg6: fix skb transport_header after decap_and_validate()

in the receive path (more precisely in ip6_rcv_core()) the
skb->transport_header is set to skb->network_header + sizeof(*hdr). As a
consequence, after routing operations, destination input expects to find
skb->transport_header correctly set to the next protocol (or extension
header) that follows the network protocol. However, decap behaviors (DX*,
DT*) remove the outer IPv6 and SRH extension and do not set again the
skb->transport_header pointer correctly. For this reason, the patch sets
the skb->transport_header to the skb->network_header + sizeof(hdr) in each
DX* and DT* behavior.

Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoseg6: fix srh pointer in get_srh()
Andrea Mayer [Sat, 16 Nov 2019 15:05:52 +0000 (16:05 +0100)]
seg6: fix srh pointer in get_srh()

pskb_may_pull may change pointers in header. For this reason, it is
mandatory to reload any pointer that points into skb header.

Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: stmmac: Use the correct style for SPDX License Identifier
Nishad Kamdar [Sat, 16 Nov 2019 09:40:59 +0000 (15:10 +0530)]
net: stmmac: Use the correct style for SPDX License Identifier

This patch corrects the SPDX License Identifier style in
header files related to STMicroelectronics based Multi-Gigabit
Ethernet driver. For C header files Documentation/process/license-rules.rst
mandates C-like comments (opposed to C source files where
C++ style should be used).

Changes made by using a script provided by Joe Perches here:
https://lkml.org/lkml/2019/2/7/46.

Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoocteontx2-af: Use the correct style for SPDX License Identifier
Nishad Kamdar [Sat, 16 Nov 2019 09:20:45 +0000 (14:50 +0530)]
octeontx2-af: Use the correct style for SPDX License Identifier

This patch corrects the SPDX License Identifier style in
header files related to Marvell OcteonTX2 network devices.
It uses an expilict block comment for the SPDX License
Identifier.

Changes made by using a script provided by Joe Perches here:
https://lkml.org/lkml/2019/2/7/46.

Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Sat, 16 Nov 2019 16:20:43 +0000 (08:20 -0800)]
Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
 "11 fixes"

MM fixes and one xz decompressor fix.

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm/debug.c: PageAnon() is true for PageKsm() pages
  mm/debug.c: __dump_page() prints an extra line
  mm/page_io.c: do not free shared swap slots
  mm/memory_hotplug: fix try_offline_node()
  mm,thp: recheck each page before collapsing file THP
  mm: slub: really fix slab walking for init_on_free
  mm: hugetlb: switch to css_tryget() in hugetlb_cgroup_charge_cgroup()
  mm: memcg: switch to css_tryget() in get_mem_cgroup_from_mm()
  lib/xz: fix XZ_DYNALLOC to avoid useless memory reallocations
  mm: fix trying to reclaim unevictable lru page when calling madvise_pageout
  mm: mempolicy: fix the wrong return value and potential pages leak of mbind

4 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Linus Torvalds [Sat, 16 Nov 2019 02:37:20 +0000 (18:37 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input

Pull more input fixes from Dmitry Torokhov:
 "A couple of fixes in driver teardown paths and another ID for
  Synaptics RMI mode"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: synaptics - enable RMI mode for X1 Extreme 2nd Generation
  Input: synaptics-rmi4 - destroy F54 poller workqueue when removing
  Input: ff-memless - kill timer in destroy()

4 years agomm/debug.c: PageAnon() is true for PageKsm() pages
Ralph Campbell [Sat, 16 Nov 2019 01:35:07 +0000 (17:35 -0800)]
mm/debug.c: PageAnon() is true for PageKsm() pages

PageAnon() and PageKsm() use the low two bits of the page->mapping
pointer to indicate the page type.  PageAnon() only checks the LSB while
PageKsm() checks the least significant 2 bits are equal to 3.

Therefore, PageAnon() is true for KSM pages.  __dump_page() incorrectly
will never print "ksm" because it checks PageAnon() first.  Fix this by
checking PageKsm() first.

Link: http://lkml.kernel.org/r/20191113000651.20677-1-rcampbell@nvidia.com
Fixes: 1c6fb1d89e73 ("mm: print more information about mapping in __dump_page")
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/debug.c: __dump_page() prints an extra line
Ralph Campbell [Sat, 16 Nov 2019 01:35:04 +0000 (17:35 -0800)]
mm/debug.c: __dump_page() prints an extra line

When dumping struct page information, __dump_page() prints the page type
with a trailing blank followed by the page flags on a separate line:

  anon
  flags: 0x100000000090034(uptodate|lru|active|head|swapbacked)

It looks like the intent was to use pr_cont() for printing "flags:" but
pr_cont() usage is discouraged so fix this by extending the format to
include the flags into a single line:

  anon flags: 0x100000000090034(uptodate|lru|active|head|swapbacked)

If the page is file backed, the name might be long so use two lines:

  shmem_aops name:"dev/zero"
  flags: 0x10000000008000c(uptodate|dirty|swapbacked)

Eliminate pr_conf() usage as well for appending compound_mapcount.

Link: http://lkml.kernel.org/r/20191112012608.16926-1-rcampbell@nvidia.com
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/page_io.c: do not free shared swap slots
Vinayak Menon [Sat, 16 Nov 2019 01:35:00 +0000 (17:35 -0800)]
mm/page_io.c: do not free shared swap slots

The following race is observed due to which a processes faulting on a
swap entry, finds the page neither in swapcache nor swap.  This causes
zram to give a zero filled page that gets mapped to the process,
resulting in a user space crash later.

Consider parent and child processes Pa and Pb sharing the same swap slot
with swap_count 2.  Swap is on zram with SWP_SYNCHRONOUS_IO set.
Virtual address 'VA' of Pa and Pb points to the shared swap entry.

Pa                                       Pb

fault on VA                              fault on VA
do_swap_page                             do_swap_page
lookup_swap_cache fails                  lookup_swap_cache fails
                                         Pb scheduled out
swapin_readahead (deletes zram entry)
swap_free (makes swap_count 1)
                                         Pb scheduled in
                                         swap_readpage (swap_count == 1)
                                         Takes SWP_SYNCHRONOUS_IO path
                                         zram enrty absent
                                         zram gives a zero filled page

Fix this by making sure that swap slot is freed only when swap count
drops down to one.

Link: http://lkml.kernel.org/r/1571743294-14285-1-git-send-email-vinmenon@codeaurora.org
Fixes: aa8d22a11da9 ("mm: swap: SWP_SYNCHRONOUS_IO: skip swapcache only if swapped page has no other reference")
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
Suggested-by: Minchan Kim <minchan@google.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/memory_hotplug: fix try_offline_node()
David Hildenbrand [Sat, 16 Nov 2019 01:34:57 +0000 (17:34 -0800)]
mm/memory_hotplug: fix try_offline_node()

try_offline_node() is pretty much broken right now:

 - The node span is updated when onlining memory, not when adding it. We
   ignore memory that was mever onlined. Bad.

 - We touch possible garbage memmaps. The pfn_to_nid(pfn) can easily
   trigger a kernel panic. Bad for memory that is offline but also bad
   for subsection hotadd with ZONE_DEVICE, whereby the memmap of the
   first PFN of a section might contain garbage.

 - Sections belonging to mixed nodes are not properly considered.

As memory blocks might belong to multiple nodes, we would have to walk
all pageblocks (or at least subsections) within present sections.
However, we don't have a way to identify whether a memmap that is not
online was initialized (relevant for ZONE_DEVICE).  This makes things
more complicated.

Luckily, we can piggy pack on the node span and the nid stored in memory
blocks.  Currently, the node span is grown when calling
move_pfn_range_to_zone() - e.g., when onlining memory, and shrunk when
removing memory, before calling try_offline_node().  Sysfs links are
created via link_mem_sections(), e.g., during boot or when adding
memory.

If the node still spans memory or if any memory block belongs to the
nid, we don't set the node offline.  As memory blocks that span multiple
nodes cannot get offlined, the nid stored in memory blocks is reliable
enough (for such online memory blocks, the node still spans the memory).

Introduce for_each_memory_block() to efficiently walk all memory blocks.

Note: We will soon stop shrinking the ZONE_DEVICE zone and the node span
when removing ZONE_DEVICE memory to fix similar issues (access of
garbage memmaps) - until we have a reliable way to identify whether
these memmaps were properly initialized.  This implies later, that once
a node had ZONE_DEVICE memory, we won't be able to set a node offline -
which should be acceptable.

Since commit f1dd2cd13c4b ("mm, memory_hotplug: do not associate
hotadded memory to zones until online") memory that is added is not
assoziated with a zone/node (memmap not initialized).  The introducing
commit 60a5a19e7419 ("memory-hotplug: remove sysfs file of node")
already missed that we could have multiple nodes for a section and that
the zone/node span is updated when onlining pages, not when adding them.

I tested this by hotplugging two DIMMs to a memory-less and cpu-less
NUMA node.  The node is properly onlined when adding the DIMMs.  When
removing the DIMMs, the node is properly offlined.

Masayoshi Mizuma reported:

: Without this patch, memory hotplug fails as panic:
:
:  BUG: kernel NULL pointer dereference, address: 0000000000000000
:  ...
:  Call Trace:
:   remove_memory_block_devices+0x81/0xc0
:   try_remove_memory+0xb4/0x130
:   __remove_memory+0xa/0x20
:   acpi_memory_device_remove+0x84/0x100
:   acpi_bus_trim+0x57/0x90
:   acpi_bus_trim+0x2e/0x90
:   acpi_device_hotplug+0x2b2/0x4d0
:   acpi_hotplug_work_fn+0x1a/0x30
:   process_one_work+0x171/0x380
:   worker_thread+0x49/0x3f0
:   kthread+0xf8/0x130
:   ret_from_fork+0x35/0x40

[david@redhat.com: v3]
Link: http://lkml.kernel.org/r/20191102120221.7553-1-david@redhat.com
Link: http://lkml.kernel.org/r/20191028105458.28320-1-david@redhat.com
Fixes: 60a5a19e7419 ("memory-hotplug: remove sysfs file of node")
Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") # visiable after d0dc12e86b319
Signed-off-by: David Hildenbrand <david@redhat.com>
Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Nayna Jain <nayna@linux.ibm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm,thp: recheck each page before collapsing file THP
Song Liu [Sat, 16 Nov 2019 01:34:53 +0000 (17:34 -0800)]
mm,thp: recheck each page before collapsing file THP

In collapse_file(), for !is_shmem case, current check cannot guarantee
the locked page is up-to-date.  Specifically, xas_unlock_irq() should
not be called before lock_page() and get_page(); and it is necessary to
recheck PageUptodate() after locking the page.

With this bug and CONFIG_READ_ONLY_THP_FOR_FS=y, madvise(HUGE)'ed .text
may contain corrupted data.  This is because khugepaged mistakenly
collapses some not up-to-date sub pages into a huge page, and assumes
the huge page is up-to-date.  This will NOT corrupt data in the disk,
because the page is read-only and never written back.  Fix this by
properly checking PageUptodate() after locking the page.  This check
replaces "VM_BUG_ON_PAGE(!PageUptodate(page), page);".

Also, move PageDirty() check after locking the page.  Current khugepaged
should not try to collapse dirty file THP, because it is limited to
read-only .text.  The only case we hit a dirty page here is when the
page hasn't been written since write.  Bail out and retry when this
happens.

syzbot reported bug on previous version of this patch.

Link: http://lkml.kernel.org/r/20191106060930.2571389-2-songliubraving@fb.com
Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS")
Signed-off-by: Song Liu <songliubraving@fb.com>
Reported-by: syzbot+efb9e48b9fbdc49bb34a@syzkaller.appspotmail.com
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>