OSDN Git Service

uclinux-h8/linux.git
3 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 10 Dec 2020 23:30:13 +0000 (15:30 -0800)]
Merge git://git./linux/kernel/git/netdev/net

Pull networking fixes from David Miller:

 1) IPsec compat fixes, from Dmitry Safonov.

 2) Fix memory leak in xfrm_user_policy(). Fix from Yu Kuai.

 3) Fix polling in xsk sockets by using sk_poll_wait() instead of
    datagram_poll() which keys off of sk_wmem_alloc and such which xsk
    sockets do not update. From Xuan Zhuo.

 4) Missing init of rekey_data in cfgh80211, from Sara Sharon.

 5) Fix destroy of timer before init, from Davide Caratti.

 6) Missing CRYPTO_CRC32 selects in ethernet driver Kconfigs, from Arnd
    Bergmann.

 7) Missing error return in rtm_to_fib_config() switch case, from Zhang
    Changzhong.

 8) Fix some src/dest address handling in vrf and add a testcase. From
    Stephen Suryaputra.

 9) Fix multicast handling in Seville switches driven by mscc-ocelot
    driver. From Vladimir Oltean.

10) Fix proto value passed to skb delivery demux in udp, from Xin Long.

11) HW pkt counters not reported correctly in enetc driver, from Claudiu
    Manoil.

12) Fix deadlock in bridge, from Joseph Huang.

13) Missing of_node_pur() in dpaa2 driver, fromn Christophe JAILLET.

14) Fix pid fetching in bpftool when there are a lot of results, from
    Andrii Nakryiko.

15) Fix long timeouts in nft_dynset, from Pablo Neira Ayuso.

16) Various stymmac fixes, from Fugang Duan.

17) Fix null deref in tipc, from Cengiz Can.

18) When mss is biog, coose more resonable rcvq_space in tcp, fromn Eric
    Dumazet.

19) Revert a geneve change that likely isnt necessary, from Jakub
    Kicinski.

20) Avoid premature rx buffer reuse in various Intel driversm from Björn
    Töpel.

21) retain EcT bits during TIS reflection in tcp, from Wei Wang.

22) Fix Tso deferral wrt. cwnd limiting in tcp, from Neal Cardwell.

23) MPLS_OPT_LSE_LABEL attribute is 342 ot 8 bits, from Guillaume Nault

24) Fix propagation of 32-bit signed bounds in bpf verifier and add test
    cases, from Alexei Starovoitov.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (81 commits)
  selftests: fix poll error in udpgro.sh
  selftests/bpf: Fix "dubious pointer arithmetic" test
  selftests/bpf: Fix array access with signed variable test
  selftests/bpf: Add test for signed 32-bit bound check bug
  bpf: Fix propagation of 32-bit signed bounds from 64-bit bounds.
  MAINTAINERS: Add entry for Marvell Prestera Ethernet Switch driver
  net: sched: Fix dump of MPLS_OPT_LSE_LABEL attribute in cls_flower
  net/mlx4_en: Handle TX error CQE
  net/mlx4_en: Avoid scheduling restart task if it is already running
  tcp: fix cwnd-limited bug for TSO deferral where we send nothing
  net: flow_offload: Fix memory leak for indirect flow block
  tcp: Retain ECT bits for tos reflection
  ethtool: fix stack overflow in ethnl_parse_bitset()
  e1000e: fix S0ix flow to allow S0i3.2 subset entry
  ice: avoid premature Rx buffer reuse
  ixgbe: avoid premature Rx buffer reuse
  i40e: avoid premature Rx buffer reuse
  igb: avoid transmit queue timeout in xdp path
  igb: use xdp_do_flush
  igb: skb add metasize for xdp
  ...

3 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
David S. Miller [Thu, 10 Dec 2020 22:29:30 +0000 (14:29 -0800)]
Merge git://git./pub/scm/linux/kernel/git/bpf/bpf

Alexei Starovoitov says:

====================
pull-request: bpf 2020-12-10

The following pull-request contains BPF updates for your *net* tree.

We've added 21 non-merge commits during the last 12 day(s) which contain
a total of 21 files changed, 163 insertions(+), 88 deletions(-).

The main changes are:

1) Fix propagation of 32-bit signed bounds from 64-bit bounds, from Alexei.

2) Fix ring_buffer__poll() return value, from Andrii.

3) Fix race in lwt_bpf, from Cong.

4) Fix test_offload, from Toke.

5) Various xsk fixes.

Please consider pulling these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git

Thanks a lot!

Also thanks to reporters, reviewers and testers of commits in this pull-request:

Cong Wang, Hulk Robot, Jakub Kicinski, Jean-Philippe Brucker, John
Fastabend, Magnus Karlsson, Maxim Mikityanskiy, Yonghong Song
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoselftests: fix poll error in udpgro.sh
Paolo Abeni [Wed, 9 Dec 2020 11:21:13 +0000 (12:21 +0100)]
selftests: fix poll error in udpgro.sh

The test program udpgso_bench_rx always invokes the poll()
syscall with a timeout of 10ms. If a larger timeout is specified
via the command line, udpgso_bench_rx is supposed to do multiple
poll() calls till the timeout is expired or an event is received.

Currently the poll() loop errors out after the first invocation with
no events, and may causes self-tests failure alike:

failed
 GRO with custom segment size            ./udpgso_bench_rx: poll: 0x0 expected 0x1

This change addresses the issue allowing the poll() loop to consume
all the configured timeout.

Fixes: ada641ff6ed3 ("selftests: fixes for UDP GRO")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoselftests/bpf: Fix "dubious pointer arithmetic" test
Jean-Philippe Brucker [Tue, 8 Dec 2020 18:01:54 +0000 (19:01 +0100)]
selftests/bpf: Fix "dubious pointer arithmetic" test

The verifier trace changed following a bugfix. After checking the 64-bit
sign, only the upper bit mask is known, not bit 31. Update the test
accordingly.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agoselftests/bpf: Fix array access with signed variable test
Jean-Philippe Brucker [Tue, 8 Dec 2020 18:01:53 +0000 (19:01 +0100)]
selftests/bpf: Fix array access with signed variable test

The test fails because of a recent fix to the verifier, even though this
program is valid. In details what happens is:

    7: (61) r1 = *(u32 *)(r0 +0)

Load a 32-bit value, with signed bounds [S32_MIN, S32_MAX]. The bounds
of the 64-bit value are [0, U32_MAX]...

    8: (65) if r1 s> 0xffffffff goto pc+1

... therefore this is always true (the operand is sign-extended).

    10: (b4) w2 = 11
    11: (6d) if r2 s> r1 goto pc+1

When true, the 64-bit bounds become [0, 10]. The 32-bit bounds are still
[S32_MIN, 10].

    13: (64) w1 <<= 2

Because this is a 32-bit operation, the verifier propagates the new
32-bit bounds to the 64-bit ones, and the knowledge gained from insn 11
is lost.

    14: (0f) r0 += r1
    15: (7a) *(u64 *)(r0 +0) = 4

Then the verifier considers r0 unbounded here, rejecting the test. To
make the test work, change insn 8 to check the sign of the 32-bit value.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agoselftests/bpf: Add test for signed 32-bit bound check bug
Jean-Philippe Brucker [Tue, 8 Dec 2020 18:01:52 +0000 (19:01 +0100)]
selftests/bpf: Add test for signed 32-bit bound check bug

After a 32-bit load followed by a branch, the verifier would reduce the
maximum bound of the register to 0x7fffffff, allowing a user to bypass
bound checks. Ensure such a program is rejected.

In the second test, the 64-bit compare should not sufficient to
determine whether the signed 32-bit lower bound is 0, so the verifier
should reject the second branch.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agobpf: Fix propagation of 32-bit signed bounds from 64-bit bounds.
Alexei Starovoitov [Tue, 8 Dec 2020 18:01:51 +0000 (19:01 +0100)]
bpf: Fix propagation of 32-bit signed bounds from 64-bit bounds.

The 64-bit signed bounds should not affect 32-bit signed bounds unless the
verifier knows that upper 32-bits are either all 1s or all 0s. For example the
register with smin_value==1 doesn't mean that s32_min_value is also equal to 1,
since smax_value could be larger than 32-bit subregister can hold.
The verifier refines the smax/s32_max return value from certain helpers in
do_refine_retval_range(). Teach the verifier to recognize that smin/s32_min
value is also bounded. When both smin and smax bounds fit into 32-bit
subregister the verifier can propagate those bounds.

Fixes: 3f50f132d840 ("bpf: Verifier, do explicit ALU32 bounds tracking")
Reported-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Linus Torvalds [Thu, 10 Dec 2020 19:00:27 +0000 (11:00 -0800)]
Merge tag 'for-linus' of git://git./linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "Two user triggerable crashers and a some EFA related regressions:

   - Syzkaller found a bug in CM

   - Restore access to the GID table and fix modify_qp for EFA

   - Crasher in qedr"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/cm: Fix an attempt to use non-valid pointer when cleaning timewait
  RDMA/core: Fix empty gid table for non IB/RoCE devices
  RDMA/efa: Use the correct current and new states in modify QP
  RDMA/qedr: iWARP invalid(zero) doorbell address fix

3 years agoMerge tag 'media/v5.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab...
Linus Torvalds [Thu, 10 Dec 2020 18:48:49 +0000 (10:48 -0800)]
Merge tag 'media/v5.10-4' of git://git./linux/kernel/git/mchehab/linux-media

Pull media fixes from Mauro Carvalho Chehab:
 "A couple of fixes:

   - videobuf2: fix a DMABUF bug, preventing it to properly handle cache
     sync/flush

   - vidtv: an usage after free and a few sparse/smatch warning fixes

   - pulse8-cec: a duplicate free and a bug related to new firmware
     usage

   - mtk-cir: fix a regression on a clock setting"

* tag 'media/v5.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  media: vidtv: fix some warnings
  media: vidtv: fix kernel-doc markups
  media: [next] media: vidtv: fix a read from an object after it has been freed
  media: vb2: set cache sync hints when init buffers
  media: pulse8-cec: add support for FW v10 and up
  media: pulse8-cec: fix duplicate free at disconnect or probe error
  media: mtk-cir: fix calculation of chk period

3 years agoMAINTAINERS: Add entry for Marvell Prestera Ethernet Switch driver
Mickey Rachamim [Wed, 9 Dec 2020 13:47:39 +0000 (15:47 +0200)]
MAINTAINERS: Add entry for Marvell Prestera Ethernet Switch driver

Add maintainers info for new Marvell Prestera Ethernet switch driver.

Signed-off-by: Mickey Rachamim <mickeyr@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: sched: Fix dump of MPLS_OPT_LSE_LABEL attribute in cls_flower
Guillaume Nault [Wed, 9 Dec 2020 15:48:41 +0000 (16:48 +0100)]
net: sched: Fix dump of MPLS_OPT_LSE_LABEL attribute in cls_flower

TCA_FLOWER_KEY_MPLS_OPT_LSE_LABEL is a u32 attribute (MPLS label is
20 bits long).

Fixes the following bug:

 $ tc filter add dev ethX ingress protocol mpls_uc \
     flower mpls lse depth 2 label 256             \
     action drop

 $ tc filter show dev ethX ingress
   filter protocol mpls_uc pref 49152 flower chain 0
   filter protocol mpls_uc pref 49152 flower chain 0 handle 0x1
     eth_type 8847
     mpls
       lse depth 2 label 0  <-- invalid label 0, should be 256
   ...

Fixes: 61aec25a6db5 ("cls_flower: Support filtering on multiple MPLS Label Stack Entries")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
David S. Miller [Thu, 10 Dec 2020 02:55:46 +0000 (18:55 -0800)]
Merge git://git./pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Switch to RCU in x_tables to fix possible NULL pointer dereference,
   from Subash Abhinov Kasiviswanathan.

2) Fix netlink dump of dynset timeouts later than 23 days.

3) Add comment for the indirect serialization of the nft commit mutex
   with rtnl_mutex.

4) Remove bogus check for confirmed conntrack when matching on the
   conntrack ID, from Brett Mastbergen.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
David S. Miller [Thu, 10 Dec 2020 02:48:29 +0000 (18:48 -0800)]
Merge branch '1GbE' of git://git./linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2020-12-09

This series contains updates to igb, ixgbe, i40e, and ice drivers.

Sven Auhagen fixes issues with igb XDP: return correct error value in XDP
xmit back, increase header padding to include space for double VLAN, add
an extack error when Rx buffer is too small for frame size, set metasize if
it is set in xdp, change xdp_do_flush_map to xdp_do_flush, and update
trans_start to avoid possible Tx timeout.

Björn fixes an issue where an Rx buffer can be reused prematurely with
XDP redirect for ixgbe, i40e, and ice drivers.

The following are changes since commit 323a391a220c4a234cb1e678689d7f4c3b73f863:
  can: isotp: isotp_setsockopt(): block setsockopt on bound sockets
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue 1GbE
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'mlx4_en-fixes'
David S. Miller [Thu, 10 Dec 2020 00:44:35 +0000 (16:44 -0800)]
Merge branch 'mlx4_en-fixes'

Tariq Toukan says:

====================
mlx4_en fixes

This patchset by Moshe contains fixes to the mlx4 Eth driver,
addressing issues in restart flow.

Patch 1 protects the restart task from being rescheduled while active.
  Please queue for -stable >= v2.6.
Patch 2 reconstructs SQs stuck in error state, and adds prints for improved
  debuggability.
  Please queue for -stable >= v3.12.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet/mlx4_en: Handle TX error CQE
Moshe Shemesh [Wed, 9 Dec 2020 13:03:39 +0000 (15:03 +0200)]
net/mlx4_en: Handle TX error CQE

In case error CQE was found while polling TX CQ, the QP is in error
state and all posted WQEs will generate error CQEs without any data
transmitted. Fix it by reopening the channels, via same method used for
TX timeout handling.

In addition add some more info on error CQE and WQE for debug.

Fixes: bd2f631d7c60 ("net/mlx4_en: Notify user when TX ring in error state")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet/mlx4_en: Avoid scheduling restart task if it is already running
Moshe Shemesh [Wed, 9 Dec 2020 13:03:38 +0000 (15:03 +0200)]
net/mlx4_en: Avoid scheduling restart task if it is already running

Add restarting state flag to avoid scheduling another restart task while
such task is already running. Change task name from watchdog_task to
restart_task to better fit the task role.

Fixes: 1e338db56e5a ("mlx4_en: Fix a race at restart task")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agotcp: fix cwnd-limited bug for TSO deferral where we send nothing
Neal Cardwell [Wed, 9 Dec 2020 03:57:59 +0000 (22:57 -0500)]
tcp: fix cwnd-limited bug for TSO deferral where we send nothing

When cwnd is not a multiple of the TSO skb size of N*MSS, we can get
into persistent scenarios where we have the following sequence:

(1) ACK for full-sized skb of N*MSS arrives
  -> tcp_write_xmit() transmit full-sized skb with N*MSS
  -> move pacing release time forward
  -> exit tcp_write_xmit() because pacing time is in the future

(2) TSQ callback or TCP internal pacing timer fires
  -> try to transmit next skb, but TSO deferral finds remainder of
     available cwnd is not big enough to trigger an immediate send
     now, so we defer sending until the next ACK.

(3) repeat...

So we can get into a case where we never mark ourselves as
cwnd-limited for many seconds at a time, even with
bulk/infinite-backlog senders, because:

o In case (1) above, every time in tcp_write_xmit() we have enough
cwnd to send a full-sized skb, we are not fully using the cwnd
(because cwnd is not a multiple of the TSO skb size). So every time we
send data, we are not cwnd limited, and so in the cwnd-limited
tracking code in tcp_cwnd_validate() we mark ourselves as not
cwnd-limited.

o In case (2) above, every time in tcp_write_xmit() that we try to
transmit the "remainder" of the cwnd but defer, we set the local
variable is_cwnd_limited to true, but we do not send any packets, so
sent_pkts is zero, so we don't call the cwnd-limited logic to update
tp->is_cwnd_limited.

Fixes: ca8a22634381 ("tcp: make cwnd-limited checks measurement-based, and gentler")
Reported-by: Ingemar Johansson <ingemar.s.johansson@ericsson.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20201209035759.1225145-1-ncardwell.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: flow_offload: Fix memory leak for indirect flow block
Chris Mi [Tue, 8 Dec 2020 02:48:35 +0000 (10:48 +0800)]
net: flow_offload: Fix memory leak for indirect flow block

The offending commit introduces a cleanup callback that is invoked
when the driver module is removed to clean up the tunnel device
flow block. But it returns on the first iteration of the for loop.
The remaining indirect flow blocks will never be freed.

Fixes: 1fac52da5942 ("net: flow_offload: consolidate indirect flow_block infrastructure")
CC: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
3 years agotcp: Retain ECT bits for tos reflection
Wei Wang [Tue, 8 Dec 2020 17:55:08 +0000 (09:55 -0800)]
tcp: Retain ECT bits for tos reflection

For DCTCP, we have to retain the ECT bits set by the congestion control
algorithm on the socket when reflecting syn TOS in syn-ack, in order to
make ECN work properly.

Fixes: ac8f1710c12b ("tcp: reflect tos value received in SYN to the socket")
Reported-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoethtool: fix stack overflow in ethnl_parse_bitset()
Michal Kubecek [Tue, 8 Dec 2020 22:13:51 +0000 (23:13 +0100)]
ethtool: fix stack overflow in ethnl_parse_bitset()

Syzbot reported a stack overflow in bitmap_from_arr32() called from
ethnl_parse_bitset() when bitset from netlink message is longer than
target bitmap length. While ethnl_compact_sanity_checks() makes sure that
trailing part is all zeros (i.e. the request does not try to touch bits
kernel does not recognize), we also need to cap change_bits to nbits so
that we don't try to write past the prepared bitmaps.

Fixes: 88db6d1e4f62 ("ethtool: add ethnl_parse_bitset() helper")
Reported-by: syzbot+9d39fa49d4df294aab93@syzkaller.appspotmail.com
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Link: https://lore.kernel.org/r/3487ee3a98e14cd526f55b6caaa959d2dcbcad9f.1607465316.git.mkubecek@suse.cz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoe1000e: fix S0ix flow to allow S0i3.2 subset entry
Vitaly Lifshits [Tue, 8 Dec 2020 18:56:32 +0000 (12:56 -0600)]
e1000e: fix S0ix flow to allow S0i3.2 subset entry

Changed a configuration in the flows to align with
architecture requirements to achieve S0i3.2 substate.

This helps both i219V and i219LM configurations.

Also fixed a typo in the previous commit 632fbd5eb5b0
("e1000e: fix S0ix flows for cable connected case").

Fixes: 632fbd5eb5b0 ("e1000e: fix S0ix flows for cable connected case").
Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Mario Limonciello <mario.limonciello@dell.com>
Link: https://lore.kernel.org/r/20201208185632.151052-1-mario.limonciello@dell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoice: avoid premature Rx buffer reuse
Björn Töpel [Tue, 25 Aug 2020 17:27:36 +0000 (19:27 +0200)]
ice: avoid premature Rx buffer reuse

The page recycle code, incorrectly, relied on that a page fragment
could not be freed inside xdp_do_redirect(). This assumption leads to
that page fragments that are used by the stack/XDP redirect can be
reused and overwritten.

To avoid this, store the page count prior invoking xdp_do_redirect().

Fixes: efc2214b6047 ("ice: Add support for XDP")
Reported-and-analyzed-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
3 years agoixgbe: avoid premature Rx buffer reuse
Björn Töpel [Tue, 25 Aug 2020 17:27:35 +0000 (19:27 +0200)]
ixgbe: avoid premature Rx buffer reuse

The page recycle code, incorrectly, relied on that a page fragment
could not be freed inside xdp_do_redirect(). This assumption leads to
that page fragments that are used by the stack/XDP redirect can be
reused and overwritten.

To avoid this, store the page count prior invoking xdp_do_redirect().

Fixes: 6453073987ba ("ixgbe: add initial support for xdp redirect")
Reported-and-analyzed-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
3 years agoi40e: avoid premature Rx buffer reuse
Björn Töpel [Tue, 25 Aug 2020 17:27:34 +0000 (19:27 +0200)]
i40e: avoid premature Rx buffer reuse

The page recycle code, incorrectly, relied on that a page fragment
could not be freed inside xdp_do_redirect(). This assumption leads to
that page fragments that are used by the stack/XDP redirect can be
reused and overwritten.

To avoid this, store the page count prior invoking xdp_do_redirect().

Longer explanation:

Intel NICs have a recycle mechanism. The main idea is that a page is
split into two parts. One part is owned by the driver, one part might
be owned by someone else, such as the stack.

t0: Page is allocated, and put on the Rx ring
              +---------------
used by NIC ->| upper buffer
(rx_buffer)   +---------------
              | lower buffer
              +---------------
  page count  == USHRT_MAX
  rx_buffer->pagecnt_bias == USHRT_MAX

t1: Buffer is received, and passed to the stack (e.g.)
              +---------------
              | upper buff (skb)
              +---------------
used by NIC ->| lower buffer
(rx_buffer)   +---------------
  page count  == USHRT_MAX
  rx_buffer->pagecnt_bias == USHRT_MAX - 1

t2: Buffer is received, and redirected
              +---------------
              | upper buff (skb)
              +---------------
used by NIC ->| lower buffer
(rx_buffer)   +---------------

Now, prior calling xdp_do_redirect():
  page count  == USHRT_MAX
  rx_buffer->pagecnt_bias == USHRT_MAX - 2

This means that buffer *cannot* be flipped/reused, because the skb is
still using it.

The problem arises when xdp_do_redirect() actually frees the
segment. Then we get:
  page count  == USHRT_MAX - 1
  rx_buffer->pagecnt_bias == USHRT_MAX - 2

From a recycle perspective, the buffer can be flipped and reused,
which means that the skb data area is passed to the Rx HW ring!

To work around this, the page count is stored prior calling
xdp_do_redirect().

Note that this is not optimal, since the NIC could actually reuse the
"lower buffer" again. However, then we need to track whether
XDP_REDIRECT consumed the buffer or not.

Fixes: d9314c474d4f ("i40e: add support for XDP_REDIRECT")
Reported-and-analyzed-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
3 years agoigb: avoid transmit queue timeout in xdp path
Sven Auhagen [Wed, 11 Nov 2020 17:04:53 +0000 (18:04 +0100)]
igb: avoid transmit queue timeout in xdp path

Since we share the transmit queue with the network stack,
it is possible that we run into a transmit queue timeout.
This will reset the queue.
This happens under high load when XDP is using the
transmit queue pretty much exclusively.

netdev_start_xmit() sets the trans_start variable of the
transmit queue to jiffies which is later utilized by dev_watchdog(),
so to avoid timeout, let stack know that XDP xmit happened by
bumping the trans_start within XDP Tx routines to jiffies.

Fixes: 9cbc948b5a20 ("igb: add XDP support")
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Sven Auhagen <sven.auhagen@voleatech.de>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
3 years agoigb: use xdp_do_flush
Sven Auhagen [Wed, 11 Nov 2020 17:04:52 +0000 (18:04 +0100)]
igb: use xdp_do_flush

Since it is a new XDP implementation change xdp_do_flush_map
to xdp_do_flush.

Fixes: 9cbc948b5a20 ("igb: add XDP support")
Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Sven Auhagen <sven.auhagen@voleatech.de>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
3 years agoigb: skb add metasize for xdp
Sven Auhagen [Wed, 11 Nov 2020 17:04:51 +0000 (18:04 +0100)]
igb: skb add metasize for xdp

add metasize if it is set in xdp

Fixes: 9cbc948b5a20 ("igb: add XDP support")
Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Sven Auhagen <sven.auhagen@voleatech.de>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
3 years agoigb: XDP extack message on error
Sven Auhagen [Wed, 11 Nov 2020 17:04:50 +0000 (18:04 +0100)]
igb: XDP extack message on error

Add an extack error message when the RX buffer size is too small
for the frame size.

Fixes: 9cbc948b5a20 ("igb: add XDP support")
Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Sven Auhagen <sven.auhagen@voleatech.de>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
3 years agoigb: take VLAN double header into account
Sven Auhagen [Wed, 11 Nov 2020 17:04:49 +0000 (18:04 +0100)]
igb: take VLAN double header into account

Increase the packet header padding to include double VLAN tagging.
This patch uses a macro for this.

Fixes: 9cbc948b5a20 ("igb: add XDP support")
Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Sven Auhagen <sven.auhagen@voleatech.de>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
3 years agoigb: XDP xmit back fix error code
Sven Auhagen [Wed, 11 Nov 2020 17:04:48 +0000 (18:04 +0100)]
igb: XDP xmit back fix error code

The igb XDP xmit back function should only return
defined error codes.

Fixes: 9cbc948b5a20 ("igb: add XDP support")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Sven Auhagen <sven.auhagen@voleatech.de>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
3 years agoMerge tag 'arm-soc-fixes-v5.10-4b' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Wed, 9 Dec 2020 22:49:48 +0000 (14:49 -0800)]
Merge tag 'arm-soc-fixes-v5.10-4b' of git://git./linux/kernel/git/soc/soc

Pull ARM SoC fixes from Arnd Bergmann:
 "There are a few more PHY mode changes for allwinner SoC based boards
  with a Realtek PHY after the driver changed its behavior, I assume
  there will be more of these in the future. Also on for Allwinner, the
  Banana Pi M2 board had a regression that led to some devices not
  working because of a slightly incorrect voltage being applied.

  By popular demand, I picked up a change from Krzysztof Kozlowski to
  actually list the SoC tree in the MAINTAINERS file. We don't want to
  get Cc'd on normal patches that are picked up by platform maintainers,
  but the lack of an entry has led to confusion in the past.

  All the other changes are fairly benign, fixing boot-time or
  compile-time warning messages in various places:

   - A dtc warning on the OLPC XO-1.75

   - A boot-time warning on i.MX6 wandboard

   - A harmless compile-time warning

   - A regression causing one of the i.MX6 SoCs to be identified as
     another

   - Missing SoC identification of Allwinner V3 and S3"

* tag 'arm-soc-fixes-v5.10-4b' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  firmware: xilinx: Mark pm_api_features_map with static keyword
  ARM: dts: mmp2-olpc-xo-1-75: clear the warnings when make dtbs
  MAINTAINERS: add a limited ARM and ARM64 SoC entry
  MAINTAINERS: correct SoC Git address (formerly: arm-soc)
  ARM: keystone: remove SECTION_SIZE_BITS/MAX_PHYSMEM_BITS
  arm64: dts: allwinner: H5: NanoPi Neo Plus2: phy-mode rgmii-id
  arm64: dts: allwinner: A64 Sopine: phy-mode rgmii-id
  ARM: dts: imx6qdl-kontron-samx6i: fix I2C_PM scl pin
  ARM: dts: imx6qdl-wandboard-revd1: Remove PAD_GPIO_6 from enetgrp
  ARM: imx: Use correct SRC base address
  ARM: dts: sun7i: pcduino3-nano: enable RGMII RX/TX delay on PHY
  ARM: dts: sun8i: v3s: fix GIC node memory range
  ARM: dts: sun8i: v40: bananapi-m2-berry: Fix ethernet node
  ARM: dts: sun8i: r40: bananapi-m2-berry: Fix dcdc1 regulator
  ARM: dts: sun7i: bananapi: Enable RGMII RX/TX delay on Ethernet PHY
  ARM: dts: s3: pinecube: align compatible property to other S3 boards
  ARM: sunxi: Add machine match for the Allwinner V3 SoC
  arm64: dts: allwinner: h6: orangepi-one-plus: Fix ethernet

3 years agoRevert "geneve: pull IP header before ECN decapsulation"
Jakub Kicinski [Wed, 9 Dec 2020 22:39:56 +0000 (14:39 -0800)]
Revert "geneve: pull IP header before ECN decapsulation"

This reverts commit 4179b00c04d1 ("geneve: pull IP header before ECN decapsulation").

Eric says: "network header should have been pulled already before
hitting geneve_rx()". Let's revert the syzbot fix since it's causing
more harm than good, and revisit.

Suggested-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jianlin Shi <jishi@redhat.com>
Fixes: 4179b00c04d1 ("geneve: pull IP header before ECN decapsulation")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=210569
Link: https://lore.kernel.org/netdev/CANn89iJVWfb=2i7oU1=D55rOyQnBbbikf+Mc6XHMkY7YX-yGEw@mail.gmail.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agofirmware: xilinx: Mark pm_api_features_map with static keyword
Zou Wei [Tue, 1 Dec 2020 11:51:53 +0000 (19:51 +0800)]
firmware: xilinx: Mark pm_api_features_map with static keyword

Fix the following sparse warning:

drivers/firmware/xilinx/zynqmp.c:32:1: warning: symbol 'pm_api_features_map' was not declared. Should it be static?

Signed-off-by: Zou Wei <zou_wei@huawei.com>
Link: https://lore.kernel.org/r/1606823513-121578-1-git-send-email-zou_wei@huawei.com
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
3 years agoARM: dts: mmp2-olpc-xo-1-75: clear the warnings when make dtbs
Zhen Lei [Mon, 7 Dec 2020 08:47:52 +0000 (16:47 +0800)]
ARM: dts: mmp2-olpc-xo-1-75: clear the warnings when make dtbs

The check_spi_bus_bridge() in scripts/dtc/checks.c requires that the node
have "spi-slave" property must with "#address-cells = <0>" and
"#size-cells = <0>". But currently both "#address-cells" and "#size-cells"
properties are deleted, the corresponding default values are 2 and 1. As a
result, the check fails and below warnings is displayed.

arch/arm/boot/dts/mmp2.dtsi:472.23-480.6: Warning (spi_bus_bridge): \
/soc/apb@d4000000/spi@d4037000: incorrect #address-cells for SPI bus
  also defined at arch/arm/boot/dts/mmp2-olpc-xo-1-75.dts:225.7-237.3
arch/arm/boot/dts/mmp2.dtsi:472.23-480.6: Warning (spi_bus_bridge): \
/soc/apb@d4000000/spi@d4037000: incorrect #size-cells for SPI bus
  also defined at arch/arm/boot/dts/mmp2-olpc-xo-1-75.dts:225.7-237.3
arch/arm/boot/dts/mmp2-olpc-xo-1-75.dtb: Warning (spi_bus_reg): \
Failed prerequisite 'spi_bus_bridge'

Because the value of "#size-cells" is already defined as zero in the node
"ssp3: spi@d4037000" in arch/arm/boot/dts/mmp2.dtsi. So we only need to
explicitly add "#address-cells = <0>" and keep "#size-cells" no change.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/r/20201207084752.1665-2-thunder.leizhen@huawei.com'
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
3 years agoRDMA/cm: Fix an attempt to use non-valid pointer when cleaning timewait
Leon Romanovsky [Fri, 4 Dec 2020 06:42:05 +0000 (08:42 +0200)]
RDMA/cm: Fix an attempt to use non-valid pointer when cleaning timewait

If cm_create_timewait_info() fails, the timewait_info pointer will contain
an error value and will be used in cm_remove_remote() later.

  general protection fault, probably for non-canonical address 0xdffffc0000000024: 0000 [#1] SMP KASAN PTI
  KASAN: null-ptr-deref in range [0×0000000000000120-0×0000000000000127]
  CPU: 2 PID: 12446 Comm: syz-executor.3 Not tainted 5.10.0-rc5-5d4c0742a60e #27
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  RIP: 0010:cm_remove_remote.isra.0+0x24/0×170 drivers/infiniband/core/cm.c:978
  Code: 84 00 00 00 00 00 41 54 55 53 48 89 fb 48 8d ab 2d 01 00 00 e8 7d bf 4b fe 48 89 ea 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <0f> b6 04 02 48 89 ea 83 e2 07 38 d0 7f 08 84 c0 0f 85 fc 00 00 00
  RSP: 0018:ffff888013127918 EFLAGS: 00010006
  RAX: dffffc0000000000 RBX: fffffffffffffff4 RCX: ffffc9000a18b000
  RDX: 0000000000000024 RSI: ffffffff82edc573 RDI: fffffffffffffff4
  RBP: 0000000000000121 R08: 0000000000000001 R09: ffffed1002624f1d
  R10: 0000000000000003 R11: ffffed1002624f1c R12: ffff888107760c70
  R13: ffff888107760c40 R14: fffffffffffffff4 R15: ffff888107760c9c
  FS:  00007fe1ffcc1700(0000) GS:ffff88811a600000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000001b2ff21000 CR3: 000000010f504001 CR4: 0000000000370ee0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   cm_destroy_id+0x189/0×15b0 drivers/infiniband/core/cm.c:1155
   cma_connect_ib drivers/infiniband/core/cma.c:4029 [inline]
   rdma_connect_locked+0x1100/0×17c0 drivers/infiniband/core/cma.c:4107
   rdma_connect+0x2a/0×40 drivers/infiniband/core/cma.c:4140
   ucma_connect+0x277/0×340 drivers/infiniband/core/ucma.c:1069
   ucma_write+0x236/0×2f0 drivers/infiniband/core/ucma.c:1724
   vfs_write+0x220/0×830 fs/read_write.c:603
   ksys_write+0x1df/0×240 fs/read_write.c:658
   do_syscall_64+0x33/0×40 arch/x86/entry/common.c:46
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: a977049dacde ("[PATCH] IB: Add the kernel CM implementation")
Link: https://lore.kernel.org/r/20201204064205.145795-1-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Reported-by: Amit Matityahu <mitm@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoMerge tag 'iommu-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Linus Torvalds [Wed, 9 Dec 2020 17:59:14 +0000 (09:59 -0800)]
Merge tag 'iommu-fixes' of git://git./linux/kernel/git/arm64/linux

Pull iommu fix from Will Deacon:
 "Fix interrupt table length definition for AMD IOMMU.

  It's actually a fix for a fix, where the size of the interrupt
  remapping table was increased but a related constant for the
  size of the interrupt table was forgotten"

* tag 'iommu-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  iommu/amd: Set DTE[IntTabLen] to represent 512 IRTEs

3 years agocan: isotp: isotp_setsockopt(): block setsockopt on bound sockets
Oliver Hartkopp [Fri, 4 Dec 2020 13:35:07 +0000 (14:35 +0100)]
can: isotp: isotp_setsockopt(): block setsockopt on bound sockets

The isotp socket can be widely configured in its behaviour regarding addressing
types, fill-ups, receive pattern tests and link layer length. Usually all
these settings need to be fixed before bind() and can not be changed
afterwards.

This patch adds a check to enforce the common usage pattern.

Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol")
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Tested-by: Thomas Wagner <thwa1@web.de>
Link: https://lore.kernel.org/r/20201203140604.25488-2-socketcan@hartkopp.net
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Link: https://lore.kernel.org/r/20201204133508.742120-3-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoMerge branch 'bpf-xdp-offload-fixes'
Daniel Borkmann [Wed, 9 Dec 2020 15:27:42 +0000 (16:27 +0100)]
Merge branch 'bpf-xdp-offload-fixes'

Toke Høiland-Jørgensen says:

====================
This series restores the test_offload.py selftest to working order. It seems a
number of subtle behavioural changes have crept into various subsystems which
broke test_offload.py in a number of ways. Most of these are fairly benign
changes where small adjustments to the test script seems to be the best fix,
but one is an actual kernel bug that I've observed in the wild caused by a bad
interaction between xdp_attachment_flags_ok() and the rework of XDP program
handling in the core netdev code.

Patch 1 fixes the bug by removing xdp_attachment_flags_ok(), and the reminder of
the patches are adjustments to test_offload.py, including a new feature for
netdevsim to force a BPF verification fail. Please see the individual patches
for details.

Changelog:

v4:
  - Accidentally truncated the Fixes: hashes in patches 3/4 to 11 chars
v3:
  - Add Fixes: tags
v2:
  - Replace xdp_attachment_flags_ok() with a check in dev_xdp_attach()
  - Better packing of struct nsim_dev
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
3 years agoselftests/bpf/test_offload.py: Filter bpftool internal map when counting maps
Toke Høiland-Jørgensen [Wed, 9 Dec 2020 13:57:43 +0000 (14:57 +0100)]
selftests/bpf/test_offload.py: Filter bpftool internal map when counting maps

A few of the tests in test_offload.py expects to see a certain number of
maps created, and checks this by counting the number of maps returned by
bpftool. There is already a filter that will remove any maps already there
at the beginning of the test, but bpftool now creates a map for the PID
iterator rodata on each invocation, which makes the map count wrong. Fix
this by also filtering the pid_iter.rodata map by name when counting.

Fixes: d53dee3fe013 ("tools/bpftool: Show info for processes holding BPF map/prog/link/btf FDs")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/bpf/160752226387.110217.9887866138149423444.stgit@toke.dk
3 years agoselftests/bpf/test_offload.py: Reset ethtool features after failed setting
Toke Høiland-Jørgensen [Wed, 9 Dec 2020 13:57:42 +0000 (14:57 +0100)]
selftests/bpf/test_offload.py: Reset ethtool features after failed setting

When setting the ethtool feature flag fails (as expected for the test), the
kernel now tracks that the feature was requested to be 'off' and refuses to
subsequently disable it again. So reset it back to 'on' so a subsequent
disable (that's not supposed to fail) can succeed.

Fixes: 417ec26477a5 ("selftests/bpf: add offload test based on netdevsim")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/bpf/160752226280.110217.10696241563705667871.stgit@toke.dk
3 years agoselftests/bpf/test_offload.py: Fix expected case of extack messages
Toke Høiland-Jørgensen [Wed, 9 Dec 2020 13:57:41 +0000 (14:57 +0100)]
selftests/bpf/test_offload.py: Fix expected case of extack messages

Commit 7f0a838254bd ("bpf, xdp: Maintain info on attached XDP BPF programs
in net_device") changed the case of some of the extack messages being
returned when attaching of XDP programs failed. This broke test_offload.py,
so let's fix the test to reflect this.

Fixes: 7f0a838254bd ("bpf, xdp: Maintain info on attached XDP BPF programs in net_device")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/bpf/160752226175.110217.11214100824416344952.stgit@toke.dk
3 years agoselftests/bpf/test_offload.py: Only check verifier log on verification fails
Toke Høiland-Jørgensen [Wed, 9 Dec 2020 13:57:40 +0000 (14:57 +0100)]
selftests/bpf/test_offload.py: Only check verifier log on verification fails

Since commit 6f8a57ccf851 ("bpf: Make verifier log more relevant by
default"), the verifier discards log messages for successfully-verified
programs. This broke test_offload.py which is looking for a verification
message from the driver callback. Change test_offload.py to use the toggle
in netdevsim to make the verification fail before looking for the
verification message.

Fixes: 6f8a57ccf851 ("bpf: Make verifier log more relevant by default")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/bpf/160752226069.110217.12370824996153348073.stgit@toke.dk
3 years agonetdevsim: Add debugfs toggle to reject BPF programs in verifier
Toke Høiland-Jørgensen [Wed, 9 Dec 2020 13:57:39 +0000 (14:57 +0100)]
netdevsim: Add debugfs toggle to reject BPF programs in verifier

This adds a new debugfs toggle ('bpf_bind_verifier_accept') that can be
used to make netdevsim reject BPF programs from being accepted by the
verifier. If this toggle (which defaults to true) is set to false,
nsim_bpf_verify_insn() will return EOPNOTSUPP on the last
instruction (after outputting the 'Hello from netdevsim' verifier message).

This makes it possible to check the verification callback in the driver
from test_offload.py in selftests, since the verifier now clears the
verifier log on a successful load, hiding the message from the driver.

Fixes: 6f8a57ccf851 ("bpf: Make verifier log more relevant by default")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/bpf/160752225964.110217.12584017165318065332.stgit@toke.dk
3 years agoselftests/bpf/test_offload.py: Remove check for program load flags match
Toke Høiland-Jørgensen [Wed, 9 Dec 2020 13:57:38 +0000 (14:57 +0100)]
selftests/bpf/test_offload.py: Remove check for program load flags match

Since we just removed the xdp_attachment_flags_ok() callback, also remove
the check for it in test_offload.py, and replace it with a test for the new
ambiguity-avoid check when multiple programs are loaded.

Fixes: 7f0a838254bd ("bpf, xdp: Maintain info on attached XDP BPF programs in net_device")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/bpf/160752225858.110217.13036901876869496246.stgit@toke.dk
3 years agoxdp: Remove the xdp_attachment_flags_ok() callback
Toke Høiland-Jørgensen [Wed, 9 Dec 2020 13:57:37 +0000 (14:57 +0100)]
xdp: Remove the xdp_attachment_flags_ok() callback

Since commit 7f0a838254bd ("bpf, xdp: Maintain info on attached XDP BPF
programs in net_device"), the XDP program attachment info is now maintained
in the core code. This interacts badly with the xdp_attachment_flags_ok()
check that prevents unloading an XDP program with different load flags than
it was loaded with. In practice, two kinds of failures are seen:

- An XDP program loaded without specifying a mode (and which then ends up
  in driver mode) cannot be unloaded if the program mode is specified on
  unload.

- The dev_xdp_uninstall() hook always calls the driver callback with the
  mode set to the type of the program but an empty flags argument, which
  means the flags_ok() check prevents the program from being removed,
  leading to bpf prog reference leaks.

The original reason this check was added was to avoid ambiguity when
multiple programs were loaded. With the way the checks are done in the core
now, this is quite simple to enforce in the core code, so let's add a check
there and get rid of the xdp_attachment_flags_ok() callback entirely.

Fixes: 7f0a838254bd ("bpf, xdp: Maintain info on attached XDP BPF programs in net_device")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/bpf/160752225751.110217.10267659521308669050.stgit@toke.dk
3 years agonetfilter: nft_ct: Remove confirmation check for NFT_CT_ID
Brett Mastbergen [Tue, 8 Dec 2020 21:39:24 +0000 (16:39 -0500)]
netfilter: nft_ct: Remove confirmation check for NFT_CT_ID

Since commit 656c8e9cc1ba ("netfilter: conntrack: Use consistent ct id
hash calculation") the ct id will not change from initialization to
confirmation.  Removing the confirmation check allows for things like
adding an element to a 'typeof ct id' set in prerouting upon reception
of the first packet of a new connection, and then being able to
reference that set consistently both before and after the connection
is confirmed.

Fixes: 656c8e9cc1ba ("netfilter: conntrack: Use consistent ct id hash calculation")
Signed-off-by: Brett Mastbergen <brett.mastbergen@gmail.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agomm/madvise: remove racy mm ownership check
Minchan Kim [Wed, 9 Dec 2020 04:57:18 +0000 (20:57 -0800)]
mm/madvise: remove racy mm ownership check

Jann spotted the security hole due to race of mm ownership check.

If the task is sharing the mm_struct but goes through execve() before
mm_access(), it could skip process_madvise_behavior_valid check.  That
makes *any advice hint* to reach into the remote process.

This patch removes the mm ownership check.  With it, it will lose the
ability that local process could give *any* advice hint with vector
interface for some reason (e.g., performance).  Since there is no
concrete example in upstream yet, it would be better to remove the
abiliity at this moment and need to review when such new advice comes
up.

Fixes: ecb8ac8b1f14 ("mm/madvise: introduce process_madvise() syscall: an external memory hinting API")
Reported-by: Jann Horn <jannh@google.com>
Suggested-by: Jann Horn <jannh@google.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agobpf, doc: Update KP's email in MAINTAINERS
KP Singh [Tue, 8 Dec 2020 21:49:00 +0000 (22:49 +0100)]
bpf, doc: Update KP's email in MAINTAINERS

Helps me use a single account to sign off and send patches use
appropriate email redirection without needing to update MAINTAINERS.

Signed-off-by: KP Singh <kpsingh@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201208214900.80684-1-kpsingh@kernel.org
3 years agotcp: select sane initial rcvq_space.space for big MSS
Eric Dumazet [Tue, 8 Dec 2020 16:21:31 +0000 (08:21 -0800)]
tcp: select sane initial rcvq_space.space for big MSS

Before commit a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB")
small tcp_rmem[1] values were overridden by tcp_fixup_rcvbuf() to accommodate various MSS.

This is no longer the case, and Hazem Mohamed Abuelfotoh reported
that DRS would not work for MTU 9000 endpoints receiving regular (1500 bytes) frames.

Root cause is that tcp_init_buffer_space() uses tp->rcv_wnd for upper limit
of rcvq_space.space computation, while it can select later a smaller
value for tp->rcv_ssthresh and tp->window_clamp.

ss -temoi on receiver would show :

skmem:(r0,rb131072,t0,tb46080,f0,w0,o0,bl0,d0) rcv_space:62496 rcv_ssthresh:56596

This means that TCP can not increase its window in tcp_grow_window(),
and that DRS can never kick.

Fix this by making sure that rcvq_space.space is not bigger than number of bytes
that can be held in TCP receive queue.

People unable/unwilling to change their kernel can work around this issue by
selecting a bigger tcp_rmem[1] value as in :

echo "4096 196608 6291456" >/proc/sys/net/ipv4/tcp_rmem

Based on an initial report and patch from Hazem Mohamed Abuelfotoh
 https://lore.kernel.org/netdev/20201204180622.14285-1-abuehaze@amazon.com/

Fixes: a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB")
Fixes: 041a14d26715 ("tcp: start receiver buffer autotuning sooner")
Reported-by: Hazem Mohamed Abuelfotoh <abuehaze@amazon.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ll_temac: Fix potential NULL dereference in temac_probe()
Zhang Changzhong [Tue, 8 Dec 2020 01:53:42 +0000 (09:53 +0800)]
net: ll_temac: Fix potential NULL dereference in temac_probe()

platform_get_resource() may fail and in this case a NULL dereference
will occur.

Fix it to use devm_platform_ioremap_resource() instead of calling
platform_get_resource() and devm_ioremap().

This is detected by Coccinelle semantic patch.

@@
expression pdev, res, n, t, e, e1, e2;
@@

res = \(platform_get_resource\|platform_get_resource_byname\)(pdev, t, n);
+ if (!res)
+   return -EINVAL;
... when != res == NULL
e = devm_ioremap(e1, res->start, e2);

Fixes: 8425c41d1ef7 ("net: ll_temac: Extend support to non-device-tree platforms")
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Acked-by: Esben Haabendal <esben@geanix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoafs: Fix memory leak when mounting with multiple source parameters
David Howells [Tue, 8 Dec 2020 23:52:03 +0000 (23:52 +0000)]
afs: Fix memory leak when mounting with multiple source parameters

There's a memory leak in afs_parse_source() whereby multiple source=
parameters overwrite fc->source in the fs_context struct without freeing
the previously recorded source.

Fix this by only permitting a single source parameter and rejecting with
an error all subsequent ones.

This was caught by syzbot with the kernel memory leak detector, showing
something like the following trace:

  unreferenced object 0xffff888114375440 (size 32):
    comm "repro", pid 5168, jiffies 4294923723 (age 569.948s)
    backtrace:
      slab_post_alloc_hook+0x42/0x79
      __kmalloc_track_caller+0x125/0x16a
      kmemdup_nul+0x24/0x3c
      vfs_parse_fs_string+0x5a/0xa1
      generic_parse_monolithic+0x9d/0xc5
      do_new_mount+0x10d/0x15a
      do_mount+0x5f/0x8e
      __do_sys_mount+0xff/0x127
      do_syscall_64+0x2d/0x3a
      entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 13fcc6837049 ("afs: Add fs_context support")
Reported-by: syzbot+86dc6632faaca40133ab@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agonet: tipc: prevent possible null deref of link
Cengiz Can [Mon, 7 Dec 2020 08:14:24 +0000 (11:14 +0300)]
net: tipc: prevent possible null deref of link

`tipc_node_apply_property` does a null check on a `tipc_link_entry`
pointer but also accesses the same pointer out of the null check block.

This triggers a warning on Coverity Static Analyzer because we're
implying that `e->link` can BE null.

Move "Update MTU for node link entry" line into if block to make sure
that we're not in a state that `e->link` is null.

Signed-off-by: Cengiz Can <cengiz@kernel.wtf>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge tag 'sunxi-fixes-for-5.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git...
Arnd Bergmann [Tue, 8 Dec 2020 23:13:37 +0000 (00:13 +0100)]
Merge tag 'sunxi-fixes-for-5.10-3' of git://git./linux/kernel/git/sunxi/linux into arm/fixes

A few more RGMII-ID fixes

* tag 'sunxi-fixes-for-5.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
  arm64: dts: allwinner: H5: NanoPi Neo Plus2: phy-mode rgmii-id
  arm64: dts: allwinner: A64 Sopine: phy-mode rgmii-id

Link: https://lore.kernel.org/r/2a351c9c-470f-4c5e-ba37-80065ae0586d.lettre@localhost
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
3 years agoMerge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Tue, 8 Dec 2020 23:03:39 +0000 (15:03 -0800)]
Merge branch 'fixes' of git://git./linux/kernel/git/viro/vfs

Pull sparc64 csum fix from Al Viro:
 "Fix for a brown paperbag regression in sparc64"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  [regression fix] really dumb fuckup in sparc64 __csum_partial_copy() changes

3 years agoRevert "scsi: megaraid_sas: Added support for shared host tagset for cpuhotplug"
Linus Torvalds [Tue, 8 Dec 2020 23:00:36 +0000 (15:00 -0800)]
Revert "scsi: megaraid_sas: Added support for shared host tagset for cpuhotplug"

This reverts commit 103fbf8e4020845e4fcf63819288cedb092a3c91.

It turns out that it causes long boot-time latencies (to the point of
timeouts and failed boots).

The cause is the increase in request queues, and a fix for that is
queued up for 5.11, but we're reverting this commit that triggered the
problem for now.

Reported-and-tested-by: John Garry <john.garry@huawei.com>
Reported-and-tested-by: Julia Lawall <julia.lawall@inria.fr>
Reported-by: Qian Cai <cai@redhat.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/linux-scsi/fe3dff7dae4494e5a88caffbb4d877bbf472dceb.camel@redhat.com/
Link: https://lore.kernel.org/lkml/alpine.DEB.2.22.394.2012081813310.2680@hadrien/
Link: https://lore.kernel.org/linux-block/20201203012638.543321-1-ming.lei@redhat.com/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoMerge branch 'stmmac-fixes'
David S. Miller [Tue, 8 Dec 2020 22:52:29 +0000 (14:52 -0800)]
Merge branch 'stmmac-fixes'

Joakim Zhang says:

====================
patches for stmmac

A patch set for stmmac, fix some driver issues.

ChangeLogs:
V1->V2:
* add Fixes tag.
* add patch 5/5 into this patch set.

V2->V3:
* rebase to latest net tree where fixes go.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: stmmac: overwrite the dma_cap.addr64 according to HW design
Fugang Duan [Mon, 7 Dec 2020 10:51:41 +0000 (18:51 +0800)]
net: stmmac: overwrite the dma_cap.addr64 according to HW design

The current IP register MAC_HW_Feature1[ADDR64] only defines
32/40/64 bit width, but some SOCs support others like i.MX8MP
support 34 bits but it maps to 40 bits width in MAC_HW_Feature1[ADDR64].
So overwrite dma_cap.addr64 according to HW real design.

Fixes: 94abdad6974a ("net: ethernet: dwmac: add ethernet glue logic for NXP imx8 chip")
Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: stmmac: delete the eee_ctrl_timer after napi disabled
Fugang Duan [Mon, 7 Dec 2020 10:51:40 +0000 (18:51 +0800)]
net: stmmac: delete the eee_ctrl_timer after napi disabled

There have chance to re-enable the eee_ctrl_timer and fire the timer
in napi callback after delete the timer in .stmmac_release(), which
introduces to access eee registers in the timer function after clocks
are disabled then causes system hang. Found this issue when do
suspend/resume and reboot stress test.

It is safe to delete the timer after napi disabled and disable lpi mode.

Fixes: d765955d2ae0b ("stmmac: add the Energy Efficient Ethernet support")
Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: stmmac: free tx skb buffer in stmmac_resume()
Fugang Duan [Mon, 7 Dec 2020 10:51:39 +0000 (18:51 +0800)]
net: stmmac: free tx skb buffer in stmmac_resume()

When do suspend/resume test, there have WARN_ON() log dump from
stmmac_xmit() funciton, the code logic:
entry = tx_q->cur_tx;
first_entry = entry;
WARN_ON(tx_q->tx_skbuff[first_entry]);

In normal case, tx_q->tx_skbuff[txq->cur_tx] should be NULL because
the skb should be handled and freed in stmmac_tx_clean().

But stmmac_resume() reset queue parameters like below, skb buffers
may not be freed.
tx_q->cur_tx = 0;
tx_q->dirty_tx = 0;

So free tx skb buffer in stmmac_resume() to avoid warning and
memory leak.

log:
[   46.139824] ------------[ cut here ]------------
[   46.144453] WARNING: CPU: 0 PID: 0 at drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:3235 stmmac_xmit+0x7a0/0x9d0
[   46.154969] Modules linked in: crct10dif_ce vvcam(O) flexcan can_dev
[   46.161328] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O      5.4.24-2.1.0+g2ad925d15481 #1
[   46.170369] Hardware name: NXP i.MX8MPlus EVK board (DT)
[   46.175677] pstate: 80000005 (Nzcv daif -PAN -UAO)
[   46.180465] pc : stmmac_xmit+0x7a0/0x9d0
[   46.184387] lr : dev_hard_start_xmit+0x94/0x158
[   46.188913] sp : ffff800010003cc0
[   46.192224] x29: ffff800010003cc0 x28: ffff000177e2a100
[   46.197533] x27: ffff000176ef0840 x26: ffff000176ef0090
[   46.202842] x25: 0000000000000000 x24: 0000000000000000
[   46.208151] x23: 0000000000000003 x22: ffff8000119ddd30
[   46.213460] x21: ffff00017636f000 x20: ffff000176ef0cc0
[   46.218769] x19: 0000000000000003 x18: 0000000000000000
[   46.224078] x17: 0000000000000000 x16: 0000000000000000
[   46.229386] x15: 0000000000000079 x14: 0000000000000000
[   46.234695] x13: 0000000000000003 x12: 0000000000000003
[   46.240003] x11: 0000000000000010 x10: 0000000000000010
[   46.245312] x9 : ffff00017002b140 x8 : 0000000000000000
[   46.250621] x7 : ffff00017636f000 x6 : 0000000000000010
[   46.255930] x5 : 0000000000000001 x4 : ffff000176ef0000
[   46.261238] x3 : 0000000000000003 x2 : 00000000ffffffff
[   46.266547] x1 : ffff000177e2a000 x0 : 0000000000000000
[   46.271856] Call trace:
[   46.274302]  stmmac_xmit+0x7a0/0x9d0
[   46.277874]  dev_hard_start_xmit+0x94/0x158
[   46.282056]  sch_direct_xmit+0x11c/0x338
[   46.285976]  __qdisc_run+0x118/0x5f0
[   46.289549]  net_tx_action+0x110/0x198
[   46.293297]  __do_softirq+0x120/0x23c
[   46.296958]  irq_exit+0xb8/0xd8
[   46.300098]  __handle_domain_irq+0x64/0xb8
[   46.304191]  gic_handle_irq+0x5c/0x148
[   46.307936]  el1_irq+0xb8/0x180
[   46.311076]  cpuidle_enter_state+0x84/0x360
[   46.315256]  cpuidle_enter+0x34/0x48
[   46.318829]  call_cpuidle+0x18/0x38
[   46.322314]  do_idle+0x1e0/0x280
[   46.325539]  cpu_startup_entry+0x24/0x40
[   46.329460]  rest_init+0xd4/0xe0
[   46.332687]  arch_call_rest_init+0xc/0x14
[   46.336695]  start_kernel+0x420/0x44c
[   46.340353] ---[ end trace bc1ee695123cbacd ]---

Fixes: 47dd7a540b8a0 ("net: add support for STMicroelectronics Ethernet controllers.")
Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: stmmac: start phylink instance before stmmac_hw_setup()
Fugang Duan [Mon, 7 Dec 2020 10:51:38 +0000 (18:51 +0800)]
net: stmmac: start phylink instance before stmmac_hw_setup()

Start phylink instance and resume back the PHY to supply
RX clock to MAC before MAC layer initialization by calling
.stmmac_hw_setup(), since DMA reset depends on the RX clock,
otherwise DMA reset cost maximum timeout value then finally
timeout.

Fixes: 74371272f97f ("net: stmmac: Convert to phylink and remove phylib logic")
Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: stmmac: increase the timeout for dma reset
Fugang Duan [Mon, 7 Dec 2020 10:51:37 +0000 (18:51 +0800)]
net: stmmac: increase the timeout for dma reset

Current timeout value is not enough for gmac5 dma reset
on imx8mp platform, increase the timeout range.

Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago[regression fix] really dumb fuckup in sparc64 __csum_partial_copy() changes
Al Viro [Tue, 8 Dec 2020 21:37:47 +0000 (16:37 -0500)]
[regression fix] really dumb fuckup in sparc64 __csum_partial_copy() changes

~0U is -1, not 1

Reported-by: Anatoly Pugachev <matorola@gmail.com>
Tested-by: Anatoly Pugachev <matorola@gmail.com>
Fixes: fdf8bee96f9a "sparc64: propagate the calling convention changes down to __csum_partial_copy_...()"
X-brown-paperbag: yes
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
3 years agonetfilter: nftables: comment indirect serialization of commit_mutex with rtnl_mutex
Pablo Neira Ayuso [Tue, 8 Dec 2020 17:57:02 +0000 (18:57 +0100)]
netfilter: nftables: comment indirect serialization of commit_mutex with rtnl_mutex

Add an explicit comment in the code to describe the indirect
serialization of the holders of the commit_mutex with the rtnl_mutex.
Commit 90d2723c6d4c ("netfilter: nf_tables: do not hold reference on
netdevice from preparation phase") already describes this, but a comment
in this case is better for reference.

Reported-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agoMerge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Tue, 8 Dec 2020 20:20:34 +0000 (12:20 -0800)]
Merge branch 'fixes' of git://git./linux/kernel/git/viro/vfs

Pull seq_file fix from Al Viro:
 "This fixes a regression introduced in this cycle wrt iov_iter based
  variant for reading a seq_file"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fix return values of seq_read_iter()

3 years agonetfilter: nft_dynset: fix timeouts later than 23 days
Pablo Neira Ayuso [Tue, 8 Dec 2020 17:25:53 +0000 (18:25 +0100)]
netfilter: nft_dynset: fix timeouts later than 23 days

Use nf_msecs_to_jiffies64 and nf_jiffies64_to_msecs as provided by
8e1102d5a159 ("netfilter: nf_tables: support timeouts larger than 23
days"), otherwise ruleset listing breaks.

Fixes: a8b1e36d0d1d ("netfilter: nft_dynset: fix element timeout for HZ != 1000")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agobonding: fix feature flag setting at init time
Jarod Wilson [Sat, 5 Dec 2020 17:22:29 +0000 (12:22 -0500)]
bonding: fix feature flag setting at init time

Don't try to adjust XFRM support flags if the bond device isn't yet
registered. Bad things can currently happen when netdev_change_features()
is called without having wanted_features fully filled in yet. This code
runs both on post-module-load mode changes, as well as at module init
time, and when run at module init time, it is before register_netdevice()
has been called and filled in wanted_features. The empty wanted_features
led to features also getting emptied out, which was definitely not the
intended behavior, so prevent that from happening.

Originally, I'd hoped to stop adjusting wanted_features at all in the
bonding driver, as it's documented as being something only the network
core should touch, but we actually do need to do this to properly update
both the features and wanted_features fields when changing the bond type,
or we get to a situation where ethtool sees:

    esp-hw-offload: off [requested on]

I do think we should be using netdev_update_features instead of
netdev_change_features here though, so we only send notifiers when the
features actually changed.

Fixes: a3b658cfb664 ("bonding: allow xfrm offload setup post-module-load")
Reported-by: Ivan Vecera <ivecera@redhat.com>
Suggested-by: Ivan Vecera <ivecera@redhat.com>
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Link: https://lore.kernel.org/r/20201205172229.576587-1-jarod@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agotools/bpftool: Fix PID fetching with a lot of results
Andrii Nakryiko [Fri, 4 Dec 2020 23:20:01 +0000 (15:20 -0800)]
tools/bpftool: Fix PID fetching with a lot of results

In case of having so many PID results that they don't fit into a singe page
(4096) bytes, bpftool will erroneously conclude that it got corrupted data due
to 4096 not being a multiple of struct pid_iter_entry, so the last entry will
be partially truncated. Fix this by sizing the buffer to fit exactly N entries
with no truncation in the middle of record.

Fixes: d53dee3fe013 ("tools/bpftool: Show info for processes holding BPF map/prog/link/btf FDs")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20201204232002.3589803-1-andrii@kernel.org
3 years agonetfilter: x_tables: Switch synchronization to RCU
Subash Abhinov Kasiviswanathan [Wed, 25 Nov 2020 18:27:22 +0000 (11:27 -0700)]
netfilter: x_tables: Switch synchronization to RCU

When running concurrent iptables rules replacement with data, the per CPU
sequence count is checked after the assignment of the new information.
The sequence count is used to synchronize with the packet path without the
use of any explicit locking. If there are any packets in the packet path using
the table information, the sequence count is incremented to an odd value and
is incremented to an even after the packet process completion.

The new table value assignment is followed by a write memory barrier so every
CPU should see the latest value. If the packet path has started with the old
table information, the sequence counter will be odd and the iptables
replacement will wait till the sequence count is even prior to freeing the
old table info.

However, this assumes that the new table information assignment and the memory
barrier is actually executed prior to the counter check in the replacement
thread. If CPU decides to execute the assignment later as there is no user of
the table information prior to the sequence check, the packet path in another
CPU may use the old table information. The replacement thread would then free
the table information under it leading to a use after free in the packet
processing context-

Unable to handle kernel NULL pointer dereference at virtual
address 000000000000008e
pc : ip6t_do_table+0x5d0/0x89c
lr : ip6t_do_table+0x5b8/0x89c
ip6t_do_table+0x5d0/0x89c
ip6table_filter_hook+0x24/0x30
nf_hook_slow+0x84/0x120
ip6_input+0x74/0xe0
ip6_rcv_finish+0x7c/0x128
ipv6_rcv+0xac/0xe4
__netif_receive_skb+0x84/0x17c
process_backlog+0x15c/0x1b8
napi_poll+0x88/0x284
net_rx_action+0xbc/0x23c
__do_softirq+0x20c/0x48c

This could be fixed by forcing instruction order after the new table
information assignment or by switching to RCU for the synchronization.

Fixes: 80055dab5de0 ("netfilter: x_tables: make xt_replace_table wait until old rules are not used anymore")
Reported-by: Sean Tranchetti <stranche@codeaurora.org>
Reported-by: kernel test robot <lkp@intel.com>
Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agomedia: vidtv: fix some warnings
Mauro Carvalho Chehab [Tue, 8 Dec 2020 06:57:13 +0000 (07:57 +0100)]
media: vidtv: fix some warnings

As reported by sparse:

drivers/media/test-drivers/vidtv/vidtv_ts.h:47:47: warning: array of flexible structures
drivers/media/test-drivers/vidtv/vidtv_channel.c:458:54: warning: incorrect type in argument 3 (different base types)
drivers/media/test-drivers/vidtv/vidtv_channel.c:458:54:    expected unsigned short [usertype] service_id
drivers/media/test-drivers/vidtv/vidtv_channel.c:458:54:    got restricted __be16 [usertype] service_id
drivers/media/test-drivers/vidtv/vidtv_s302m.c:471 vidtv_s302m_encoder_init() warn: possible memory leak of 'e'

Address such warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
3 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Jakub Kicinski [Tue, 8 Dec 2020 02:29:53 +0000 (18:29 -0800)]
Merge branch 'master' of git://git./linux/kernel/git/klassert/ipsec

Steffen Klassert says:

====================
pull request (net): ipsec 2020-12-07

1) Sysbot reported fixes for the new 64/32 bit compat layer.
   From Dmitry Safonov.

2) Fix a memory leak in xfrm_user_policy that was introduced
   by adding the 64/32 bit compat layer. From Yu Kuai.

* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
  net: xfrm: fix memory leak in xfrm_user_policy()
  xfrm/compat: Don't allocate memory with __GFP_ZERO
  xfrm/compat: memset(0) 64-bit padding at right place
  xfrm/compat: Translate by copying XFRMA_UNSPEC attribute
====================

Link: https://lore.kernel.org/r/20201207093937.2874932-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: stmmac: dwmac-meson8b: fix mask definition of the m250_sel mux
Martin Blumenstingl [Sat, 5 Dec 2020 21:32:07 +0000 (22:32 +0100)]
net: stmmac: dwmac-meson8b: fix mask definition of the m250_sel mux

The m250_sel mux clock uses bit 4 in the PRG_ETH0 register. Fix this by
shifting the PRG_ETH0_CLK_M250_SEL_MASK accordingly as the "mask" in
struct clk_mux expects the mask relative to the "shift" field in the
same struct.

While here, get rid of the PRG_ETH0_CLK_M250_SEL_SHIFT macro and use
__ffs() to determine it from the existing PRG_ETH0_CLK_M250_SEL_MASK
macro.

Fixes: 566e8251625304 ("net: stmmac: add a glue driver for the Amlogic Meson 8b / GXBB DWMAC")
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Jerome Brunet <jbrunet@baylibre.com>
Link: https://lore.kernel.org/r/20201205213207.519341-1-martin.blumenstingl@googlemail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agodpaa2-mac: Add a missing of_node_put after of_device_is_available
Christophe JAILLET [Sun, 6 Dec 2020 15:13:39 +0000 (16:13 +0100)]
dpaa2-mac: Add a missing of_node_put after of_device_is_available

Add an 'of_node_put()' call when a tested device node is not available.

Fixes: 94ae899b2096 ("dpaa2-mac: add PCS support through the Lynx module")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://lore.kernel.org/r/20201206151339.44306-1-christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agomptcp: print new line in mptcp_seq_show() if mptcp isn't in use
Jianguo Wu [Sat, 5 Dec 2020 07:56:33 +0000 (15:56 +0800)]
mptcp: print new line in mptcp_seq_show() if mptcp isn't in use

When do cat /proc/net/netstat, the output isn't append with a new line, it looks like this:
[root@localhost ~]# cat /proc/net/netstat
...
MPTcpExt: 0 0 0 0 0 0 0 0 0 0 0 0 0[root@localhost ~]#

This is because in mptcp_seq_show(), if mptcp isn't in use, net->mib.mptcp_statistics is NULL,
so it just puts all 0 after "MPTcpExt:", and return, forgot the '\n'.

After this patch:

[root@localhost ~]# cat /proc/net/netstat
...
MPTcpExt: 0 0 0 0 0 0 0 0 0 0 0 0 0
[root@localhost ~]#

Fixes: fc518953bc9c8d7d ("mptcp: add and use MIB counter infrastructure")
Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn>
Acked-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/142e2fd9-58d9-bb13-fb75-951cccc2331e@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agobridge: Fix a deadlock when enabling multicast snooping
Joseph Huang [Fri, 4 Dec 2020 23:56:28 +0000 (18:56 -0500)]
bridge: Fix a deadlock when enabling multicast snooping

When enabling multicast snooping, bridge module deadlocks on multicast_lock
if 1) IPv6 is enabled, and 2) there is an existing querier on the same L2
network.

The deadlock was caused by the following sequence: While holding the lock,
br_multicast_open calls br_multicast_join_snoopers, which eventually causes
IP stack to (attempt to) send out a Listener Report (in igmp6_join_group).
Since the destination Ethernet address is a multicast address, br_dev_xmit
feeds the packet back to the bridge via br_multicast_rcv, which in turn
calls br_multicast_add_group, which then deadlocks on multicast_lock.

The fix is to move the call br_multicast_join_snoopers outside of the
critical section. This works since br_multicast_join_snoopers only deals
with IP and does not modify any multicast data structures of the bridge,
so there's no need to hold the lock.

Steps to reproduce:
1. sysctl net.ipv6.conf.all.force_mld_version=1
2. have another querier
3. ip link set dev bridge type bridge mcast_snooping 0 && \
   ip link set dev bridge type bridge mcast_snooping 1 < deadlock >

A typical call trace looks like the following:

[  936.251495]  _raw_spin_lock+0x5c/0x68
[  936.255221]  br_multicast_add_group+0x40/0x170 [bridge]
[  936.260491]  br_multicast_rcv+0x7ac/0xe30 [bridge]
[  936.265322]  br_dev_xmit+0x140/0x368 [bridge]
[  936.269689]  dev_hard_start_xmit+0x94/0x158
[  936.273876]  __dev_queue_xmit+0x5ac/0x7f8
[  936.277890]  dev_queue_xmit+0x10/0x18
[  936.281563]  neigh_resolve_output+0xec/0x198
[  936.285845]  ip6_finish_output2+0x240/0x710
[  936.290039]  __ip6_finish_output+0x130/0x170
[  936.294318]  ip6_output+0x6c/0x1c8
[  936.297731]  NF_HOOK.constprop.0+0xd8/0xe8
[  936.301834]  igmp6_send+0x358/0x558
[  936.305326]  igmp6_join_group.part.0+0x30/0xf0
[  936.309774]  igmp6_group_added+0xfc/0x110
[  936.313787]  __ipv6_dev_mc_inc+0x1a4/0x290
[  936.317885]  ipv6_dev_mc_inc+0x10/0x18
[  936.321677]  br_multicast_open+0xbc/0x110 [bridge]
[  936.326506]  br_multicast_toggle+0xec/0x140 [bridge]

Fixes: 4effd28c1245 ("bridge: join all-snoopers multicast address")
Signed-off-by: Joseph Huang <Joseph.Huang@garmin.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Link: https://lore.kernel.org/r/20201204235628.50653-1-Joseph.Huang@garmin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoenetc: Fix reporting of h/w packet counters
Claudiu Manoil [Fri, 4 Dec 2020 17:15:05 +0000 (19:15 +0200)]
enetc: Fix reporting of h/w packet counters

Noticed some inconsistencies in packet statistics reporting.
This patch adds the missing Tx packet counter registers to
ethtool reporting and fixes the information strings for a
few of them.

Fixes: 16eb4c85c964 ("enetc: Add ethtool statistics")
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Link: https://lore.kernel.org/r/20201204171505.21389-1-claudiu.manoil@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoRDMA/core: Fix empty gid table for non IB/RoCE devices
Gal Pressman [Sun, 6 Dec 2020 15:32:38 +0000 (17:32 +0200)]
RDMA/core: Fix empty gid table for non IB/RoCE devices

The query_gid_table ioctl skips non IB/RoCE ports, which as a result
returns an empty gid table for devices such as EFA which have a GID table,
but are not IB/RoCE.

Fixes: c4b4d548fabc ("RDMA/core: Introduce new GID table query API")
Link: https://lore.kernel.org/r/20201206153238.34878-1-galpress@amazon.com
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agolwt_bpf: Replace preempt_disable() with migrate_disable()
Cong Wang [Sat, 5 Dec 2020 07:59:46 +0000 (23:59 -0800)]
lwt_bpf: Replace preempt_disable() with migrate_disable()

migrate_disable() is just a wrapper for preempt_disable() in
non-RT kernel. It is safe to replace it, and RT kernel will
benefit.

Note that it is introduced since Feb 2020.

Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201205075946.497763-2-xiyou.wangcong@gmail.com
3 years agolwt: Disable BH too in run_lwt_bpf()
Dongdong Wang [Sat, 5 Dec 2020 07:59:45 +0000 (23:59 -0800)]
lwt: Disable BH too in run_lwt_bpf()

The per-cpu bpf_redirect_info is shared among all skb_do_redirect()
and BPF redirect helpers. Callers on RX path are all in BH context,
disabling preemption is not sufficient to prevent BH interruption.

In production, we observed strange packet drops because of the race
condition between LWT xmit and TC ingress, and we verified this issue
is fixed after we disable BH.

Although this bug was technically introduced from the beginning, that
is commit 3a0af8fd61f9 ("bpf: BPF for lightweight tunnel infrastructure"),
at that time call_rcu() had to be call_rcu_bh() to match the RCU context.
So this patch may not work well before RCU flavor consolidation has been
completed around v5.0.

Update the comments above the code too, as call_rcu() is now BH friendly.

Signed-off-by: Dongdong Wang <wangdongdong.6@bytedance.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Link: https://lore.kernel.org/bpf/20201205075946.497763-1-xiyou.wangcong@gmail.com
3 years agoMerge tag 'trace-v5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt...
Linus Torvalds [Mon, 7 Dec 2020 19:20:26 +0000 (11:20 -0800)]
Merge tag 'trace-v5.10-rc7' of git://git./linux/kernel/git/rostedt/linux-trace

Pull tracing fix from Steven Rostedt:
 "Fix userstacktrace option for instances

  While writing an application that requires user stack trace option to
  work in instances, I found that the instance option has a bug that
  makes it a nop. The check for performing the user stack trace in an
  instance, checks the top level options (not the instance options) to
  determine if a user stack trace should be performed or not.

  This is not only incorrect, but also confusing for users. It confused
  me for a bit!"

* tag 'trace-v5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix userstacktrace option for instances

3 years agoMAINTAINERS: add a limited ARM and ARM64 SoC entry
Krzysztof Kozlowski [Tue, 1 Dec 2020 21:15:16 +0000 (23:15 +0200)]
MAINTAINERS: add a limited ARM and ARM64 SoC entry

It is expected for ARM and ARM64 SoC related code to go through
sub-architecture maintainers.  Their addresses were therefore not
documented to push patch traffic through sub-architecture maintainers.

However when patches touch generic code, e.g. multi_v7_defconfig, the
patch might not be picked up by them and instead should go to the SoC
maintainers - Arnd and Olof.

Add a minimal maintainer's entry for SoC covering only Makefile, so it
will not appear on most of submissions (except new devicetree boards).
It will though serve as a documentation and reference for cases when
submitter does not know where to send his SoC-related patches.

Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Link: https://lore.kernel.org/r/20201201211516.24921-2-krzk@kernel.org'
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
3 years agoMAINTAINERS: correct SoC Git address (formerly: arm-soc)
Krzysztof Kozlowski [Tue, 1 Dec 2020 21:15:15 +0000 (23:15 +0200)]
MAINTAINERS: correct SoC Git address (formerly: arm-soc)

The SoC Git was moved from arm/arm-soc.git to soc/soc.git.  Correct the
ARM Sub-architectures entry.

Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Link: https://lore.kernel.org/r/20201201211516.24921-1-krzk@kernel.org'
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
3 years agoARM: keystone: remove SECTION_SIZE_BITS/MAX_PHYSMEM_BITS
Arnd Bergmann [Thu, 3 Dec 2020 23:18:40 +0000 (00:18 +0100)]
ARM: keystone: remove SECTION_SIZE_BITS/MAX_PHYSMEM_BITS

These definitions are evidently left over from the days when
sparsemem settings were platform specific. This was no longer
the case when the platform got merged.

There was no warning in the past, but now the asm/sparsemem.h
header ends up being included indirectly, causing this warning:

In file included from /git/arm-soc/arch/arm/mach-keystone/keystone.c:24:
arch/arm/mach-keystone/memory.h:10:9: warning: 'SECTION_SIZE_BITS' macro redefined [-Wmacro-redefined]
 #define SECTION_SIZE_BITS       34
        ^
arch/arm/include/asm/sparsemem.h:23:9: note: previous definition is here
 #define SECTION_SIZE_BITS       28
        ^

Clearly the definitions never had any effect here, so remove them.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20201203231847.1484900-1-arnd@kernel.org'
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
3 years agoMerge tag 'imx-fixes-5.10-5' of git://git.kernel.org/pub/scm/linux/kernel/git/shawngu...
Arnd Bergmann [Mon, 7 Dec 2020 14:28:59 +0000 (15:28 +0100)]
Merge tag 'imx-fixes-5.10-5' of git://git./linux/kernel/git/shawnguo/linux into arm/fixes

i.MX fixes for 5.10, round 5:

- Fix a regression on SoC revision detection with ANATOP, that is
  introduced by commit  4a4fb66119eb ("ARM: imx: Add missing
  of_node_put()").
- Drop PAD_GPIO_6 from imx6qdl-wandboard ENET pin group, as the pin is
  used by camera sensor now.
- Fix I2C3_SCL pinmux on imx6qdl-kontron-samx6i board, so that SoM EEPROM
  can be accessed.

* tag 'imx-fixes-5.10-5' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
  ARM: dts: imx6qdl-kontron-samx6i: fix I2C_PM scl pin
  ARM: dts: imx6qdl-wandboard-revd1: Remove PAD_GPIO_6 from enetgrp
  ARM: imx: Use correct SRC base address

Link: https://lore.kernel.org/r/20201201091820.GW4072@dragon
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
3 years agoMerge tag 'sunxi-fixes-for-5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Arnd Bergmann [Mon, 7 Dec 2020 14:28:27 +0000 (15:28 +0100)]
Merge tag 'sunxi-fixes-for-5.10-2' of git://git./linux/kernel/git/sunxi/linux into arm/fixes

A few more RGMII-ID fixes, and a bunch of other more random fixes

* tag 'sunxi-fixes-for-5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
  ARM: dts: sun7i: pcduino3-nano: enable RGMII RX/TX delay on PHY
  ARM: dts: sun8i: v3s: fix GIC node memory range
  ARM: dts: sun8i: v40: bananapi-m2-berry: Fix ethernet node
  ARM: dts: sun8i: r40: bananapi-m2-berry: Fix dcdc1 regulator
  ARM: dts: sun7i: bananapi: Enable RGMII RX/TX delay on Ethernet PHY
  ARM: dts: s3: pinecube: align compatible property to other S3 boards
  ARM: sunxi: Add machine match for the Allwinner V3 SoC
  arm64: dts: allwinner: h6: orangepi-one-plus: Fix ethernet

Link: https://lore.kernel.org/r/1280f1de-1b6d-4cc2-8448-e5a9096a41e8.lettre@localhost
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
3 years agoiommu/amd: Set DTE[IntTabLen] to represent 512 IRTEs
Suravee Suthikulpanit [Mon, 7 Dec 2020 09:19:20 +0000 (03:19 -0600)]
iommu/amd: Set DTE[IntTabLen] to represent 512 IRTEs

According to the AMD IOMMU spec, the commit 73db2fc595f3
("iommu/amd: Increase interrupt remapping table limit to 512 entries")
also requires the interrupt table length (IntTabLen) to be set to 9
(power of 2) in the device table mapping entry (DTE).

Fixes: 73db2fc595f3 ("iommu/amd: Increase interrupt remapping table limit to 512 entries")
Reported-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Link: https://lore.kernel.org/r/20201207091920.3052-1-suravee.suthikulpanit@amd.com
Signed-off-by: Will Deacon <will@kernel.org>
3 years agoudp: fix the proto value passed to ip_protocol_deliver_rcu for the segments
Xin Long [Mon, 7 Dec 2020 07:55:40 +0000 (15:55 +0800)]
udp: fix the proto value passed to ip_protocol_deliver_rcu for the segments

Guillaume noticed that: for segments udp_queue_rcv_one_skb() returns the
proto, and it should pass "ret" unmodified to ip_protocol_deliver_rcu().
Otherwize, with a negtive value passed, it will underflow inet_protos.

This can be reproduced with IPIP FOU:

  # ip fou add port 5555 ipproto 4
  # ethtool -K eth1 rx-gro-list on

Fixes: cf329aa42b66 ("udp: cope with UDP GRO packet misdirection")
Reported-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: hns3: remove a misused pragma packed
Huazhong Tan [Mon, 7 Dec 2020 07:20:25 +0000 (15:20 +0800)]
net: hns3: remove a misused pragma packed

hclge_dbg_reg_info[] is defined as an array of packed structure
accidentally. However, this array contains pointers, which are
no longer aligned naturally, and cannot be relocated on PPC64.
Hence, when compile-testing this driver on PPC64 with
CONFIG_RELOCATABLE=y (e.g. PowerPC allyesconfig), there will be
some warnings.

Since each field in structure hclge_qos_pri_map_cmd and
hclge_dbg_bitmap_cmd is type u8, the pragma packed is unnecessary
for these two structures as well, so remove the pragma packed in
hclge_debugfs.h to fix this issue, and this increases
hclge_dbg_reg_info[] by 4 bytes per entry.

Fixes: a582b78dfc33 ("net: hns3: code optimization for debugfs related to "dump reg"")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoLinux 5.10-rc7
Linus Torvalds [Sun, 6 Dec 2020 22:25:12 +0000 (14:25 -0800)]
Linux 5.10-rc7

3 years agoMerge tag 'char-misc-5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregk...
Linus Torvalds [Sun, 6 Dec 2020 19:48:17 +0000 (11:48 -0800)]
Merge tag 'char-misc-5.10-rc7' of git://git./linux/kernel/git/gregkh/char-misc

Pull char/misc driver fixes from Greg KH:
 "Here are some small driver fixes, and one "large" revert, for
  5.10-rc7.

  They include:

   - revert mei patch from 5.10-rc1 that was using a reserved userspace
     value. It will be resubmitted once the proper id has been assigned
     by the virtio people.

   - habanalabs fixes found by the fall-through audit from Gustavo

   - speakup driver fixes for reported issues

   - fpga config build fix for reported issue.

  All of these except the revert have been in linux-next with no
  reported issues. The revert is "clean" and just removes a
  previously-added driver, so no real issue there"

* tag 'char-misc-5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  Revert "mei: virtio: virtualization frontend driver"
  fpga: Specify HAS_IOMEM dependency for FPGA_DFL
  habanalabs: put devices before driver removal
  habanalabs: free host huge va_range if not used
  speakup: Reject setting the speakup line discipline outside of speakup

3 years agoMerge tag 'tty-5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Linus Torvalds [Sun, 6 Dec 2020 19:43:50 +0000 (11:43 -0800)]
Merge tag 'tty-5.10-rc7' of git://git./linux/kernel/git/gregkh/tty

Pull tty fixes from Greg KH:
 "Here are two tty core fixes for 5.10-rc7.

  They resolve some reported locking issues in the tty core. While they
  have not been in a released linux-next yet, they have passed all of
  the 0-day bot testing as well as the submitter's testing"

* tag 'tty-5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  tty: Fix ->session locking
  tty: Fix ->pgrp locking in tiocspgrp()

3 years agoMerge tag 'usb-5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Linus Torvalds [Sun, 6 Dec 2020 19:38:36 +0000 (11:38 -0800)]
Merge tag 'usb-5.10-rc7' of git://git./linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
 "Here are some small USB fixes for 5.10-rc7 that resolve a number of
  reported issues, and add some new device ids.

  Nothing major here, but these solve some problems that people were
  having with the 5.10-rc tree:

   - reverts for USB storage dma settings that broke working devices

   - thunderbolt use-after-free fix

   - cdns3 driver fixes

   - gadget driver userspace copy fix

   - new device ids

  All of these except for the reverts have been in linux-next with no
  reported issues. The reverts are "clean" and were tested by Hans, as
  well as passing the 0-day tests"

* tag 'usb-5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: gadget: f_fs: Use local copy of descriptors for userspace copy
  usb: ohci-omap: Fix descriptor conversion
  Revert "usb-storage: fix sdev->host->dma_dev"
  Revert "uas: fix sdev->host->dma_dev"
  Revert "uas: bump hw_max_sectors to 2048 blocks for SS or faster drives"
  USB: serial: kl5kusb105: fix memleak on open
  USB: serial: ch341: sort device-id entries
  USB: serial: ch341: add new Product ID for CH341A
  USB: serial: option: fix Quectel BG96 matching
  usb: cdns3: core: fix goto label for error path
  usb: cdns3: gadget: clear trb->length as zero after preparing every trb
  usb: cdns3: Fix hardware based role switch
  USB: serial: option: add support for Thales Cinterion EXS82
  USB: serial: option: add Fibocom NL668 variants
  thunderbolt: Fix use-after-free in remove_unplugged_switch()

3 years agoMerge tag 'x86-urgent-2020-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 6 Dec 2020 19:22:39 +0000 (11:22 -0800)]
Merge tag 'x86-urgent-2020-12-06' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
 "A set of fixes for x86:

   - Make the AMD L3 QoS code and data priorization enable/disable
     mechanism work correctly.

     The control bit was only set/cleared on one of the CPUs in a L3
     domain, but it has to be modified on all CPUs in the domain. The
     initial documentation was not clear about this, but the updated one
     from Oct 2020 spells it out.

   - Fix an off by one in the UV platform detection code which causes
     the UV hubs to be identified wrongly.

     The chip revisions start at 1 not at 0.

   - Fix a long standing bug in the evaluation of prefixes in the
     uprobes code which fails to handle repeated prefixes properly.

     The aggregate size of the prefixes can be larger than the bytes
     array but the code blindly iterated over the aggregate size beyond
     the array boundary. Add a macro to handle this case properly and
     use it at the affected places"

* tag 'x86-urgent-2020-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sev-es: Use new for_each_insn_prefix() macro to loop over prefixes bytes
  x86/insn-eval: Use new for_each_insn_prefix() macro to loop over prefixes bytes
  x86/uprobes: Do not use prefixes.nbytes when looping over prefixes.bytes
  x86/platform/uv: Fix UV4 hub revision adjustment
  x86/resctrl: Fix AMD L3 QOS CDP enable/disable

3 years agoMerge tag 'perf-urgent-2020-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 6 Dec 2020 19:20:18 +0000 (11:20 -0800)]
Merge tag 'perf-urgent-2020-12-06' of git://git./linux/kernel/git/tip/tip

Pull perf fixes from Thomas Gleixner:
 "Two fixes for performance monitoring on X86:

   - Add recursion protection to another callchain invoked from
     x86_pmu_stop() which can recurse back into x86_pmu_stop(). The
     first attempt to fix this missed this extra code path.

   - Use the already filtered status variable to check for PEBS counter
     overflow bits and not the unfiltered full status read from
     IA32_PERF_GLOBAL_STATUS which can have unrelated bits check which
     would be evaluated incorrectly"

* tag 'perf-urgent-2020-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Check PEBS status correctly
  perf/x86/intel: Fix a warning on x86_pmu_stop() with large PEBS

3 years agoMerge tag 'irq-urgent-2020-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 6 Dec 2020 19:15:55 +0000 (11:15 -0800)]
Merge tag 'irq-urgent-2020-12-06' of git://git./linux/kernel/git/tip/tip

Pull irq fixes from Thomas Gleixner:
 "A set of updates for the interrupt subsystem:

   - Make multiqueue devices which use the managed interrupt affinity
     infrastructure work on PowerPC/Pseries. PowerPC does not use the
     generic infrastructure for setting up PCI/MSI interrupts and the
     multiqueue changes failed to update the legacy PCI/MSI
     infrastructure. Make this work by passing the affinity setup
     information down to the mapping and allocation functions.

   - Move Jason Cooper from MAINTAINERS to CREDITS as his mail is
     bouncing and he's not reachable. We hope all is well with him and
     say thanks for his work over the years"

* tag 'irq-urgent-2020-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  powerpc/pseries: Pass MSI affinity to irq_create_mapping()
  genirq/irqdomain: Add an irq_create_mapping_affinity() function
  MAINTAINERS: Move Jason Cooper to CREDITS

3 years agoMerge tag 'locking-urgent-2020-12-06' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 6 Dec 2020 19:11:32 +0000 (11:11 -0800)]
Merge tag 'locking-urgent-2020-12-06' of git://git./linux/kernel/git/tip/tip

Pull intel_idle build fix from Thomas Gleixner:
 "A tiny build fix for a recent change in the intel_idle driver which
  missed a CONFIG dependency and broke the build for certain
  configurations"

* tag 'locking-urgent-2020-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  intel_idle: Build fix

3 years agoMerge tag 'kbuild-fixes-v5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 6 Dec 2020 18:31:39 +0000 (10:31 -0800)]
Merge tag 'kbuild-fixes-v5.10-2' of git://git./linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

 - Move -Wcast-align to W=3, which tends to be false-positive and there
   is no tree-wide solution.

 - Pass -fmacro-prefix-map to KBUILD_CPPFLAGS because it is a
   preprocessor option and makes sense for .S files as well.

 - Disable -gdwarf-2 for Clang's integrated assembler to avoid warnings.

 - Disable --orphan-handling=warn for LLD 10.0.1 to avoid warnings.

 - Fix undesirable line breaks in *.mod files.

* tag 'kbuild-fixes-v5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kbuild: avoid split lines in .mod files
  kbuild: Disable CONFIG_LD_ORPHAN_WARN for ld.lld 10.0.1
  kbuild: Hoist '--orphan-handling' into Kconfig
  Kbuild: do not emit debug info for assembly with LLVM_IAS=1
  kbuild: use -fmacro-prefix-map for .S sources
  Makefile.extrawarn: move -Wcast-align to W=3

3 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Sun, 6 Dec 2020 18:20:59 +0000 (10:20 -0800)]
Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
 "12 patches.

  Subsystems affected by this patch series: mm (memcg, zsmalloc, swap,
  mailmap, selftests, pagecache, hugetlb, pagemap), lib, and coredump"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm/mmap.c: fix mmap return value when vma is merged after call_mmap()
  hugetlb_cgroup: fix offline of hugetlb cgroup with reservations
  mm/filemap: add static for function __add_to_page_cache_locked
  userfaultfd: selftests: fix SIGSEGV if huge mmap fails
  tools/testing/selftests/vm: fix build error
  mailmap: add two more addresses of Uwe Kleine-König
  mm/swapfile: do not sleep with a spin lock held
  mm/zsmalloc.c: drop ZSMALLOC_PGTABLE_MAPPING
  mm: list_lru: set shrinker map bit when child nr_items is not zero
  mm: memcg/slab: fix obj_cgroup_charge() return value handling
  coredump: fix core_pattern parse error
  zlib: export S390 symbols for zlib modules

3 years agomm/mmap.c: fix mmap return value when vma is merged after call_mmap()
Liu Zixian [Sun, 6 Dec 2020 06:15:15 +0000 (22:15 -0800)]
mm/mmap.c: fix mmap return value when vma is merged after call_mmap()

On success, mmap should return the begin address of newly mapped area,
but patch "mm: mmap: merge vma after call_mmap() if possible" set
vm_start of newly merged vma to return value addr.  Users of mmap will
get wrong address if vma is merged after call_mmap().  We fix this by
moving the assignment to addr before merging vma.

We have a driver which changes vm_flags, and this bug is found by our
testcases.

Fixes: d70cec898324 ("mm: mmap: merge vma after call_mmap() if possible")
Signed-off-by: Liu Zixian <liuzixian4@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Hongxiang Lou <louhongxiang@huawei.com>
Cc: Hu Shiyuan <hushiyuan@huawei.com>
Cc: Matthew Wilcox <willy@infradead.org>
Link: https://lkml.kernel.org/r/20201203085350.22624-1-liuzixian4@huawei.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agohugetlb_cgroup: fix offline of hugetlb cgroup with reservations
Mike Kravetz [Sun, 6 Dec 2020 06:15:12 +0000 (22:15 -0800)]
hugetlb_cgroup: fix offline of hugetlb cgroup with reservations

Adrian Moreno was ruuning a kubernetes 1.19 + containerd/docker workload
using hugetlbfs.  In this environment the issue is reproduced by:

 - Start a simple pod that uses the recently added HugePages medium
   feature (pod yaml attached)

 - Start a DPDK app. It doesn't need to run successfully (as in transfer
   packets) nor interact with real hardware. It seems just initializing
   the EAL layer (which handles hugepage reservation and locking) is
   enough to trigger the issue

 - Delete the Pod (or let it "Complete").

This would result in a kworker thread going into a tight loop (top output):

   1425 root      20   0       0      0      0 R  99.7   0.0   5:22.45 kworker/28:7+cgroup_destroy

'perf top -g' reports:

  -   63.28%     0.01%  [kernel]                    [k] worker_thread
     - 49.97% worker_thread
        - 52.64% process_one_work
           - 62.08% css_killed_work_fn
              - hugetlb_cgroup_css_offline
                   41.52% _raw_spin_lock
                 - 2.82% _cond_resched
                      rcu_all_qs
                   2.66% PageHuge
        - 0.57% schedule
           - 0.57% __schedule

We are spinning in the do-while loop in hugetlb_cgroup_css_offline.
Worse yet, we are holding the master cgroup lock (cgroup_mutex) while
infinitely spinning.  Little else can be done on the system as the
cgroup_mutex can not be acquired.

Do note that the issue can be reproduced by simply offlining a hugetlb
cgroup containing pages with reservation counts.

The loop in hugetlb_cgroup_css_offline is moving page counts from the
cgroup being offlined to the parent cgroup.  This is done for each
hstate, and is repeated until hugetlb_cgroup_have_usage returns false.
The routine moving counts (hugetlb_cgroup_move_parent) is only moving
'usage' counts.  The routine hugetlb_cgroup_have_usage is checking for
both 'usage' and 'reservation' counts.  Discussion about what to do with
reservation counts when reparenting was discussed here:

https://lore.kernel.org/linux-kselftest/CAHS8izMFAYTgxym-Hzb_JmkTK1N_S9tGN71uS6MFV+R7swYu5A@mail.gmail.com/

The decision was made to leave a zombie cgroup for with reservation
counts.  Unfortunately, the code checking reservation counts was
incorrectly added to hugetlb_cgroup_have_usage.

To fix the issue, simply remove the check for reservation counts.  While
fixing this issue, a related bug in hugetlb_cgroup_css_offline was
noticed.  The hstate index is not reinitialized each time through the
do-while loop.  Fix this as well.

Fixes: 1adc4d419aa2 ("hugetlb_cgroup: add interface for charge/uncharge hugetlb reservations")
Reported-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Adrian Moreno <amorenoz@redhat.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201203220242.158165-1-mike.kravetz@oracle.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm/filemap: add static for function __add_to_page_cache_locked
Alex Shi [Sun, 6 Dec 2020 06:15:09 +0000 (22:15 -0800)]
mm/filemap: add static for function __add_to_page_cache_locked

  mm/filemap.c:830:14: warning: no previous prototype for `__add_to_page_cache_locked' [-Wmissing-prototypes]

Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Link: https://lkml.kernel.org/r/1604661895-5495-1-git-send-email-alex.shi@linux.alibaba.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>