git.osdn.net Git - android-x86/kernel.git/log

x86/crypto: aesni - fix crash in cryptomgr_test

Fix a freeze issue in early boot stages happening with x86_64
that could be avoided only with CONFIG_CRYPTO_MANAGER_DISABLE_TESTS=y

Bisecting identifies the bad commit as 1476db2d129d5
("crypto: aesni - Move HashKey computation from stack to gcm_context")

Use unaligned mov instructions to avoid general protection fault

Here follows the console log of the problem:

[    1.377775] general protection fault: 0000 [#1] PREEMPT SMP
[    1.378746] CPU: 3 PID: 958 Comm: cryptomgr_test Not tainted 4.18.0-rc8-android-x86_64-g1a7fa0435ab6 #24
[    1.378746] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[    1.378746] RIP: 0010:aesni_gcm_init+0x89/0x30f
[    1.378746] Code: 0f 6f ca 66 0f 73 fa 08 66 0f 73 d9 08 66 0f eb da 66 0f 70 d1 24 66 0f 76 15 83 11 03 01 66 0f db 15 6b 11 03 01 66 0f ef da <66> 0f 7f 5e 60 66 0f 6f eb 66 0f 70 cb 4e 66 0f ef cb 66 0f 7f 8e
[    1.378746] RSP: 0000:ffff9fef010377d8 EFLAGS: 00010246
[    1.378746] RAX: ffff9fef01037b98 RBX: 0000000000000010 RCX: ffff9ab4de49d050
[    1.378746] RDX: ffff9fef01037b98 RSI: ffff9fef010378c8 RDI: ffff9ab4de49d060
[    1.378746] RBP: 0000000000000000 R08: ffff9ab4de48e000 R09: 0000000000000008
[    1.378746] R10: ffff9ab55e0e1000 R11: 0000000000000000 R12: ffff9ab4de49d050
[    1.378746] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000008
[    1.378746] FS:  0000000000000000(0000) GS:ffff9ab4e3d80000(0000) knlGS:0000000000000000
[    1.378746] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.378746] CR2: 0000000000000000 CR3: 000000009980a001 CR4: 00000000000606e0
[    1.378746] Call Trace:
[    1.378746]  ? gcmaes_crypt_by_sg+0x12c/0x610
[    1.378746]  ? __switch_to_asm+0x34/0x70
[    1.378746]  ? __switch_to_asm+0x34/0x70
[    1.378746]  ? __switch_to_asm+0x40/0x70
[    1.378746]  ? __switch_to_asm+0x34/0x70
[    1.378746]  ? __switch_to_asm+0x40/0x70
[    1.378746]  ? __switch_to_asm+0x34/0x70
[    1.378746]  ? __switch_to_asm+0x40/0x70
[    1.378746]  ? __switch_to_asm+0x34/0x70
[    1.378746]  ? __switch_to_asm+0x40/0x70
[    1.378746]  ? __switch_to_asm+0x34/0x70
[    1.378746]  ? __switch_to_asm+0x40/0x70
[    1.378746]  ? __switch_to_asm+0x34/0x70
[    1.378746]  ? __switch_to_asm+0x40/0x70
[    1.378746]  ? __switch_to_asm+0x34/0x70
[    1.378746]  ? __switch_to_asm+0x40/0x70
[    1.378746]  ? __switch_to_asm+0x34/0x70
[    1.378746]  ? __switch_to+0x14d/0x4a0
[    1.378746]  ? __switch_to_asm+0x34/0x70
[    1.378746]  ? __switch_to_asm+0x34/0x70
[    1.378746]  ? cache_alloc_refill+0x600/0x8e0
[    1.378746]  ? gcmaes_encrypt+0x1b8/0x380
[    1.378746]  ? rfc4106_set_hash_subkey+0x65/0xb0
[    1.378746]  ? aesni_enc+0xf/0x14
[    1.378746]  ? rfc4106_set_hash_subkey+0x65/0xb0
[    1.378746]  ? crypto_aead_setkey+0xa6/0xe0
[    1.378746]  ? try_to_wake_up+0x4b0/0x4b0
[    1.378746]  ? crypto_aead_setkey+0xa6/0xe0
[    1.378746]  ? helper_rfc4106_encrypt+0x91/0xc0
[    1.378746]  ? __test_aead+0xd8c/0x1290
[    1.378746]  ? __kmalloc+0x126/0x200
[    1.378746]  ? crypto_create_tfm+0x39/0xe0
[    1.378746]  ? test_aead+0x33/0xd0
[    1.378746]  ? alg_test_aead+0x4d/0xb0
[    1.378746]  ? alg_test.part.11+0xd7/0x280
[    1.378746]  ? __switch_to_asm+0x34/0x70
[    1.378746]  ? __switch_to_asm+0x34/0x70
[    1.378746]  ? finish_task_switch+0x90/0x240
[    1.378746]  ? __switch_to_asm+0x34/0x70
[    1.378746]  ? __schedule+0x311/0x881
[    1.378746]  ? __wake_up_common+0x86/0x120
[    1.378746]  ? cryptomgr_probe+0xe0/0xe0
[    1.378746]  ? cryptomgr_test+0x40/0x50
[    1.378746]  ? kthread+0xfa/0x130
[    1.378746]  ? __switch_to_asm+0x34/0x70
[    1.378746]  ? __kthread_parkme+0x70/0x70
[    1.378746]  ? ret_from_fork+0x35/0x40
[    1.378746] Modules linked in:
[    1.490849] ---[ end trace db6c6409ac47aa26 ]---
[    1.491659] RIP: 0010:aesni_gcm_init+0x89/0x30f
[    1.492819] Code: 0f 6f ca 66 0f 73 fa 08 66 0f 73 d9 08 66 0f eb da 66 0f 70 d1 24 66 0f 76 15 83 11 03 01 66 0f db 15 6b 11 03 01 66 0f ef da <66> 0f 7f 5e 60 66 0f 6f eb 66 0f 70 cb 4e 66 0f ef cb 66 0f 7f 8e
[    1.497069] RSP: 0000:ffff9fef010377d8 EFLAGS: 00010246
[    1.498193] RAX: ffff9fef01037b98 RBX: 0000000000000010 RCX: ffff9ab4de49d050
[    1.499901] RDX: ffff9fef01037b98 RSI: ffff9fef010378c8 RDI: ffff9ab4de49d060
[    1.502322] RBP: 0000000000000000 R08: ffff9ab4de48e000 R09: 0000000000000008
[    1.505797] R10: ffff9ab55e0e1000 R11: 0000000000000000 R12: ffff9ab4de49d050
[    1.507597] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000008
[    1.509530] FS:  0000000000000000(0000) GS:ffff9ab4e3d80000(0000) knlGS:0000000000000000
[    1.511639] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.513132] CR2: 0000000000000000 CR3: 000000009980a001 CR4: 00000000000606e0
[    1.514915] note: cryptomgr_test[958] exited with preempt_count 2
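
The alignment problem can be checked from the register dump above. Decoded
by hand, the bytes marked in the Code: line (<66> 0f 7f 5e 60) are an aligned
store, movdqa %xmm3,0x60(%rsi), which faults unless its target is 16-byte
aligned. The sketch below only redoes that arithmetic; the function names are
made up for illustration and are not part of the patch.

```c
#include <stdint.h>

/* Effective address of the faulting store: base register plus displacement. */
uint64_t effective_address(uint64_t base, uint64_t disp)
{
    return base + disp;
}

/* Number of bytes the address lies past the previous 16-byte boundary. */
unsigned misalignment16(uint64_t addr)
{
    return (unsigned)(addr & 0xf);
}
```

Feeding in RSI = 0xffff9fef010378c8 and the displacement 0x60 gives a store
target that is 8 bytes past a 16-byte boundary, which is consistent with
switching to unaligned moves.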

Fixes: 1476db2d129d5 ("crypto: aesni - Move HashKey computation from stack to gcm_context")
Reported-by: Mauro Rossi <issor.oruam@gmail.com>
Tested-by: Mauro Rossi <issor.oruam@gmail.com>

drm/radeon: enable ABGR and XBGR formats (v2)

Add support for DRM_FORMAT_{A,X}BGR8888 in atombios_crtc
Swapping of red and blue channels is implemented for radeon chipsets:
DCE2/R6xx and later - crossbar registers are defined and used
DCE1/R5xx - AVIVO_D1GRPH_SWAP_RB bit is used

(v2) Set AVIVO_D1GRPH_SWAP_RB bit in fb_format, using bitwise OR for DCE1 path
Use bitwise OR where required for big endian settings in fb_swap
Use existing code style CHIP_R600 condition, fix typo in R600 blue crossbar
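
For reference, the effect of the red/blue swap on a single pixel can be
sketched in software. This is only an illustration of what the crossbar or
SWAP_RB setting achieves, assuming the usual 0xAARRGGBB layout for
{A,X}RGB8888 and 0xAABBGGRR for {A,X}BGR8888; swap_red_blue() is a
hypothetical helper, not driver code.

```c
#include <stdint.h>

/*
 * Convert one 32-bit pixel between the ARGB8888/XRGB8888 layout (0xAARRGGBB)
 * and the ABGR8888/XBGR8888 layout (0xAABBGGRR). The function is its own
 * inverse: applying it twice returns the original pixel.
 */
uint32_t swap_red_blue(uint32_t pixel)
{
    uint32_t alpha_green = pixel & 0xff00ff00u; /* A and G stay in place */
    uint32_t red = (pixel >> 16) & 0xffu;
    uint32_t blue = pixel & 0xffu;

    return alpha_green | (blue << 16) | red;
}
```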

Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>

drm/amdgpu: enable ABGR and XBGR formats (v2)

Add support for DRM_FORMAT_{A,X}BGR8888 in amdgpu with amd dc disabled

(v2) Crossbar registers are defined and used to swap red and blue channels,
keeping the existing coding style in each of the dce modules.
After setting crossbar bits in fb_swap, use bitwise OR for big endian

Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>

drm/amd/display: enable ABGR and XBGR formats (v4)

SURFACE_PIXEL_FORMAT_GRPH_ABGR8888 is supported in amd/display/dc/dc_hw_types.h
and the necessary crossbars register controls to swap red and blue channels
are already implemented in drm/amd/display/dc/dce/dce_mem_input.c

(v4) Logic to handle new formats is added only in amdgpu_dm module.

Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>

Revert "drm/amdgpu: Don't default to DC support for Kaveri and older"

This reverts commit d9fda248046ac035f18a6e663f2f9245b4bf9470.

ALSA: x86: modify modalias to match more devices

The driver declares a modalias platform:hdmi_lpe_audio. However, in
/sys/devices/pci0000:00/0000:00:02.0/hdmi-lpe-audio/modalias of
ASUS VivoStick PC (TS10) it is platform:hdmi-lpe-audio.

Extend the modalias pattern to match this device.

Tested-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>

drm: amdgpu,radeon: force amdgpu for si,cik

The current defaults for si_support and cik_support prefer radeon.
Set the opposite defaults in order to prefer amdgpu,
which means that radeon is disabled by default for SI and CIK parts.

Comments are not updated, as these changes are just for testing.

brcmfmac: disable power saving to stabilize wifi connectivity

From https://drive.google.com/open?id=0B4DiU2o72Fbub0U2ZzJaUzl5OEE

brcmfmac: fix setp2p error

From https://drive.google.com/open?id=0B4DiU2o72Fbub0U2ZzJaUzl5OEE

i915: pm: Be less aggressive with clockfreq changes on Bay Trail

Bay Trail devices are known to hang when changing the frequency often,
this is discussed in great length in:
https://bugzilla.kernel.org/show_bug.cgi?id=109051

Commit 6067a27d1f01 ("drm/i915: Avoid tweaking evaluation thresholds
on Baytrail v3") is an attempt to work around this. Several users in
bko109051 report that an earlier version of this patch, v1:
https://bugzilla.kernel.org/attachment.cgi?id=251471
works better for them, and that they still see hangs with the merged v3.

Comparing the two versions shows that they are indeed not equivalent:
v1 not only skips writing the GEN6_RP* registers from valleyview_set_rps,
as v3 does, it also contains these modifications to i915_irq.c:

     if (pm_iir & GEN6_PM_RP_DOWN_EI_EXPIRED) {
         if (!vlv_c0_above(dev_priv,
                   &dev_priv->rps.down_ei, &now,
-                  dev_priv->rps.down_threshold))
+                  VLV_RP_DOWN_EI_THRESHOLD))
             events |= GEN6_PM_RP_DOWN_THRESHOLD;
         dev_priv->rps.down_ei = now;
     }

     if (pm_iir & GEN6_PM_RP_UP_EI_EXPIRED) {
         if (vlv_c0_above(dev_priv,
                  &dev_priv->rps.up_ei, &now,
-                 dev_priv->rps.up_threshold))
+                 VLV_RP_UP_EI_THRESHOLD))
             events |= GEN6_PM_RP_UP_THRESHOLD;
         dev_priv->rps.up_ei = now;
     }

These use less aggressive up/down thresholds, which results in fewer
GEN6_PM_RP_*_THRESHOLD events and thus in fewer calls to intel_set_rps() ->
valleyview_set_rps() -> vlv_punit_write(PUNIT_REG_GPU_FREQ_REQ),
with the last call being the likely cause of the hang.

This commit hardcodes the threshold_up and _down values for Bay Trail to
less aggressive values, reducing the number of clock frequency changes,
thus avoiding the hangs some people are still seeing with the merged fix.
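
The relationship between threshold width and event count can be illustrated
with a toy model. The sketch below is not i915 code: the busy percentages and
the threshold values used with it are invented, and count_threshold_events()
is a hypothetical helper that only mirrors the shape of the two
vlv_c0_above() checks quoted above.

```c
#include <stddef.h>

/*
 * Count how many evaluation intervals would raise an up or a down event.
 * busy_pct[i] is a made-up GPU busy percentage for interval i.
 * An up event fires when busy is above up_pct; a down event fires when
 * busy is not above down_pct.
 */
unsigned count_threshold_events(const unsigned *busy_pct, size_t n,
                                unsigned up_pct, unsigned down_pct)
{
    unsigned events = 0;

    for (size_t i = 0; i < n; i++) {
        if (busy_pct[i] > up_pct)
            events++;           /* would request a higher frequency */
        if (!(busy_pct[i] > down_pct))
            events++;           /* would request a lower frequency */
    }
    return events;
}
```

With the same load samples, a wider band (higher up threshold, lower down
threshold) yields fewer events, and in this model each event stands for one
potential frequency change request.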

Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=109051
Signed-off-by: Hans de Goede <hdegoede@redhat.com>

intel_idle: Disable C6N and C6S on Bay Trail

It seems that Bay Trail SoCs sometimes have issues waking from C6,
a lot of users even report Bay Trail devices only being stable
when passing intel_idle.max_cstate=1 to the kernel.

This commit disables the C6 states while leaving the C7 states
available so that the cores can still reach deep sleep states.

There are several indicators that this is part of the solution for
all the users who need to pass intel_idle.max_cstate=1:

1) The "VLP52 EOI Transactions May Not be Sent if Software
   Enters Core C6 During an Interrupt Service Routine" errata.

2) Several users who need intel_idle.max_cstate=1 indicate in bko109051
   (which has over 800 comments!) that using a shell script which
   disables C6N and C6S through sysfs allows them to remove
   intel_idle.max_cstate=1 and still have a stable system which does
   use the C7 states for power-saving.
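
The sysfs workaround mentioned in 2) can be sketched as follows. The cpuidle
"disable" attribute path is the standard sysfs location; the helper names are
hypothetical, and matching on the "C6N"/"C6S" name prefix is an assumption
about how the states are reported in the per-state "name" file.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if a cpuidle state name denotes one of the C6 states to disable. */
int is_c6_state(const char *name)
{
    return strncmp(name, "C6N", 3) == 0 || strncmp(name, "C6S", 3) == 0;
}

/*
 * Build the path of the per-state "disable" attribute. Writing "1" to that
 * file (as root) stops the idle governor from selecting the state.
 * Returns what snprintf() returns.
 */
int cstate_disable_path(char *buf, size_t len, unsigned cpu, unsigned state)
{
    return snprintf(buf, len,
                    "/sys/devices/system/cpu/cpu%u/cpuidle/state%u/disable",
                    cpu, state);
}
```

A script or small tool would walk every cpuN/cpuidle/stateM directory, read
its name, and write "1" to the path above when is_c6_state() matches. This
commit makes that step unnecessary.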

BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=109051
Signed-off-by: Hans de Goede <hdegoede@redhat.com>

drm/nouveau: expose atomic ioctl by default (kernel 4.18)

The atomic module parameter is currently disabled by default;
enable it by default for convenience and simplicity.

ANDROID: brcmfmac: Use monotonic boot time for BSS TSF (v2)

Reverts 8e6cffb3b42f "brmc80211: dont use jiffies for BSS TSF".
v2: Use monotonic boot time instead of jiffies to avoid overflow
after 5 minutes and problems after system goes into suspend.

Android uses the TSF as timestamp when scanning for WiFi networks
and discards scan results entirely if the TSF is before the time
the scan started.

Use the monotonic boot time to avoid discarding all scan results
when using brcmfmac on Android.
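
The overflow mentioned in v2 can be shown with a small model. It assumes a
32-bit jiffies counter that starts 300 seconds before its wrap point (the
kernel's INITIAL_JIFFIES convention) and a simplified "is the result newer
than the scan start" check; all function names are hypothetical, and the
real Android check compares against its own clock.

```c
#include <stdint.h>

/* Value of a 32-bit jiffies counter at a given time since boot. */
uint32_t jiffies32_at(uint32_t seconds_since_boot, uint32_t hz)
{
    uint32_t initial = 0u - 300u * hz; /* wraps 300 s after boot */

    return initial + seconds_since_boot * hz;
}

/* Monotonic boot time in microseconds for the same point in time. */
uint64_t boot_usec_at(uint64_t seconds_since_boot)
{
    return seconds_since_boot * 1000000u;
}

/* Simplified filter: keep a result only if its stamp is not before scan start. */
int scan_result_is_fresh(uint64_t stamp, uint64_t scan_start)
{
    return stamp >= scan_start;
}
```

In this model a scan that starts at 299 s and yields a result at 301 s is
discarded when both stamps come from the wrapped jiffies value, and kept when
they come from the monotonic boot time.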

Fixes: 8e6cffb3b42f "brmc80211: dont use jiffies for BSS TSF"

staging: rtl8812au update Kconfig and Makefile

staging: rtl8812au: add updated driver

Cloned from https://github.com/lwfinger/rtl8812au
branch master commit 8a7beb9 (stable driver v4.2.2)
and removed .git/ folder and .gitignore file

staging: rtl8723bu update Kconfig and Makefile

staging: rtl8723bu: add updated driver

Cloned from https://github.com/lwfinger/rtl8723bu
branch master commit 8091632
removed .git/ folder and .gitignore file
forced git add for required .bin firmware files

Input: goodix - enable support for GDIX1002 parts

At least two vendors (Chuwi and Onda) have parts with GDIX1002 id

net: wireless: broadcom: wl: add patch for kernel 4.15

build.mk is modified to apply incremental patch linux-415.patch
which implements the use of timer_setup() instead of init_timer()
for kernel 4.15 and later, thus avoiding following error:

error: implicit declaration of function 'init_timer'

A similar problem arises in device/generic/common/tp_smapi/hdaps.c:782,
which also needs timer_setup() instead of init_timer()
for kernel 4.15 and later.

Input: add a new driver for D-WAV MultiTouch Screen

net: wireless: broadcom: wl: add patch for kernel 4.12

build.mk is modified to apply incremental patch linux-412.patch
copied from https://aur.archlinux.org/cgit/aur.git/tree/?h=broadcom-wl

Porting of 0542e4e "add linux412.patch created by wichmannpas"

net: wireless: broadcom: wl: add patch for kernel 4.11

build.mk is modified to apply incremental patch linux-411.patch
copied from https://aur.archlinux.org/cgit/aur.git/tree/?h=broadcom-wl

Porting of 5224162 "add linux411.patch created by wichmannpas"

net: wireless: broadcom: wl: fix kernel >= 4.8 panic

linux-recent.patch was sufficient up to kernel 4.7.
With kernel 4.8 (and 4.9), additional changes are required to avoid a crash.

build.mk is modified to apply incremental patch linux-48.patch
copied from https://aur.archlinux.org/cgit/aur.git/tree/?h=broadcom-wl

Reference: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=839629

v2: cat all patches to apply them once (cwhuang)

android: use ld.bfd instead of ld.gold

We build the kernel with android toolchain. But kernel 4.9 needs
ld.bfd to be linked correctly.

net: wireless: broadcom: wl: refine the rules

Sync with kernel-4.4 branch.

vmwgfx: change the default resolution to 1280x720

HID: multitouch: add ids of Uiworks

x86: instruction set emulation for SSSE3 and SSE4.1

This commit implements emulation of the full SSSE3 and SSE4.1
instruction sets, plus popcnt/movbe.

Known limitation:
1. SSSE3 instructions over MMX registers are not implemented
2. REX prefix support is not complete

Change-Id: I9b2d927b690b27460b9a944971587fe5cb7de8e1

net: wireless: wl: allows curl to perform insecure SSL connections

shmem: enable user xattr for tmpfs

Forward port the commit 0b98841 of Michael Müller to kernel 4.8.
This is necessary for android-x86 live mode.

Input: add the driver for NextWindow touchscreens

Copy the code from https://github.com/DadaMonad/nwfermi
commit 63d7fd5071b9c8e6937a6d5d186d1d689641b99b with cosmetics.

net: wireless: wl: allow wifi scan when max_scan_ie_len is zero

NO_REF_TASK
Tested: with max_scan_ie_len patch in wl, this kernel should work with
ACER ES1

Change-Id: I3eecdaac1fd4962aa69289a13bdb72f3e3284456

net: wireless: wl: fix driver cannot trigger scan

The wl driver didn't set max_scan_ie_len, so scan requests were
rejected as invalid-argument requests.

NO_REF_TASK
Tested: run on acer-es13, wifi will work

Change-Id: I74703851e159d6f39afa7439bfff2be5eebf0b66

mwifiex: change interface name to wlanX

The other name doesn't work well in android.

drivers/base: cacheinfo: remove noisy error message

This error message is considered annoying and irrelevant.
Lower the message to debug level.

net: wireless: broadcom: add support for wl sta driver

Porting of support for Broadcom wl driver 6.30.223.271.

Since kernel 4.6 wireless drivers folders are organized per vendor.
So wl folder is moved to drivers/net/wireless/broadcom/.

The linux-recent.patch required specific changes for kernel 4.7
local paths in {wl,linux-recent}.patch are aligned for simplicity,
so they can be applied by build.mk with the same command: patch -p1.

The build.mk was originally extracted from build/core/tasks/kernel.mk.

x86/intel: force tsc to be reliable on Cherrytrail

x86: dv11p-tablet: enabling resume via PBTN

A patch from https://bugzilla.kernel.org/show_bug.cgi?id=102281#c51
by Alexander Diewald.

x86/vdso: fix a build break on CONFIG_FUNCTION_TRACER=y

The -mfentry must be filtered out, otherwise it causes the error:

arch/x86/entry/vdso/vdso32/note.S:1:0: sorry, unimplemented: -mfentry isn't supported for 32-bit in combination with -fpic

staging: add GSLx680 I2C touch controller

Copied from the latest commit 667efce of
https://github.com/onitake/gslx680-acpi.

Kionix KXCJ9: add KIOX000A identifier

x86, build: globally set -fno-pic

Android toolchain enables -fpic by default.
We never want this for 32-bit and 64-bit kernels
and it will break the build.

sony-laptop: Add poll/select support to tablet mode change

x86/intel: force tsc to be reliable on Baytrail

SSSE3 emulation for invalid opcode

Use the SSEPlus reference implementation of SSSE3 instructions
for CPUs without SSSE3 support.

Signed-off-by: Robert Mazur <robert.mazur.mazur@gmail.com>

iio: ak8975: Added autodetect feature for ACPI

Using i2c auto detect feature and auto device creation feature,
enumerate ak8975 device, by checking their presence.
This is needed because when this device sits behind an i2c mux, there
is no way to define i2c mux in ACPI. This will enable ak8975 on
windows based tablets/laptops running Linux when connected via a mux.
Since DT model already can define an i2c mux and devices connected to
it, this feature is only enabled for ACPI.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>

sony-laptop: Added support for multi-flip screen transformations.

android-x86: add an empty Android.mk

It prevents findleaves.py from searching into the subdirectories.

HID: input: add asus vendor keys

HACK: drm: disable GPU authentication

Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>

usbnet: [TEMP HACK] force the interface name to ethX

Currently the android framework only recognizes ethX as the
Ethernet interface name.

ACPICA: Add Android-IA string for _OSI method

This change adds a new _OSI string "Android-IA" for ASUS BIOS
to query if the OS supports Android features.

Add device ID for Egalax multitouch 0x72e9.

net/wireless: ipw2200: change interface name to wlan0

x86: add driver for Lenovo ideapad S10-3T rotate button

Provided by Javier S. Pedro.

ALSA: add audio support for Eee PC 1004

Linux 4.18

Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"Eight fixes.

  The most important one is the mpt3sas fix which makes the driver work
  again on big endian systems. The rest are mostly minor error path or
  checker issues and the vmw_scsi one fixes a performance problem"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: vmw_pvscsi: Return DID_RESET for status SAM_STAT_COMMAND_TERMINATED
  scsi: sr: Avoid that opening a CD-ROM hangs with runtime power management enabled
  scsi: mpt3sas: Swap I/O memory read value back to cpu endianness
  scsi: fcoe: clear FC_RP_STARTED flags when receiving a LOGO
  scsi: fcoe: drop frames in ELS LOGO error path
  scsi: fcoe: fix use-after-free in fcoe_ctlr_els_send
  scsi: qedi: Fix a potential buffer overflow
  scsi: qla2xxx: Fix memory leak for allocating abort IOCB

init: rename and re-order boot_cpu_state_init()

This is purely a preparatory patch for upcoming changes during the 4.19
merge window.

We have a function called "boot_cpu_state_init()" that isn't really
about the bootup cpu state: that is done much earlier by the similarly
named "boot_cpu_init()" (note lack of "state" in name).

This function initializes some hotplug CPU state, and needs to run after
the percpu data has been properly initialized.  It even has a comment to
that effect.

Except it _doesn't_ actually run after the percpu data has been properly
initialized.  On x86 it happens to do that, but on at least arm and
arm64, the percpu base pointers are initialized by the arch-specific
'smp_prepare_boot_cpu()' hook, which ran _after_ boot_cpu_state_init().

This had some unexpected results, and in particular we have a patch
pending for the merge window that did the obvious cleanup of using
'this_cpu_write()' in the cpu hotplug init code:

  -       per_cpu_ptr(&cpuhp_state, smp_processor_id())->state = CPUHP_ONLINE;
  +       this_cpu_write(cpuhp_state.state, CPUHP_ONLINE);

which is obviously the right thing to do.  Except because of the
ordering issue, it actually failed miserably and unexpectedly on arm64.

So this just fixes the ordering, and changes the name of the function to
be 'boot_cpu_hotplug_init()' to make it obvious that it's about cpu
hotplug state, because the core CPU state was supposed to have already
been done earlier.
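
The ordering problem can be modelled in a few lines. This is a deliberately
simplified userspace model, not kernel code: it assumes that per_cpu_ptr()
style accesses go through an offset table, while this_cpu_*() style accesses
use a separate "my offset" value that only becomes valid once the
arch-specific smp_prepare_boot_cpu() hook has run. All toy_* names are
hypothetical.

```c
#include <stddef.h>

enum { TOY_NR_CPUS = 2, TOY_OFFLINE = 0, TOY_ONLINE = 1 };

struct toy_percpu {
    int slot[TOY_NR_CPUS + 1];        /* slot[0] models the static template copy */
    size_t offset_table[TOY_NR_CPUS]; /* models the per-CPU offset table */
    size_t my_offset;                 /* models the arch "this CPU" base, 0 until set */
};

/* Fill the offset table: CPU n lives in slot n + 1. */
void toy_setup_per_cpu_areas(struct toy_percpu *p)
{
    for (size_t cpu = 0; cpu < TOY_NR_CPUS; cpu++)
        p->offset_table[cpu] = cpu + 1;
}

/* Only from here on do this-CPU accesses reach the right slot. */
void toy_smp_prepare_boot_cpu(struct toy_percpu *p, size_t cpu)
{
    p->my_offset = p->offset_table[cpu];
}

/* per_cpu_ptr()-like access: always goes through the offset table. */
int toy_per_cpu_read(const struct toy_percpu *p, size_t cpu)
{
    return p->slot[p->offset_table[cpu]];
}

/* this_cpu_write()-like access: trusts my_offset. */
void toy_this_cpu_write(struct toy_percpu *p, int val)
{
    p->slot[p->my_offset] = val;
}
```

In this model, a this-CPU write issued before toy_smp_prepare_boot_cpu()
lands in the template slot and the boot CPU still reads as offline; the same
write issued afterwards reaches the boot CPU's own slot.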

Marked for stable, since the (not yet merged) patch that will show this
problem is marked for stable.

Reported-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Mian Yousaf Kaukab <yousaf.kaukab@suse.com>
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'fixes' of git://git./linux/kernel/git/viro/vfs

Pull vfs fixes from Al Viro:
"A bunch of race fixes, mostly around lazy pathwalk.

  All of it is -stable fodder, a large part going back to 2013"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  make sure that __dentry_kill() always invalidates d_seq, unhashed or not
  fix __legitimize_mnt()/mntput() race
  fix mntput/mntput race
  root dentries need RCU-delayed freeing

Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:
"Last bit of straggler fixes...

  1) Fix btf library licensing to LGPL, from Martin KaFai lau.

  2) Fix error handling in bpf sockmap code, from Daniel Borkmann.

  3) XDP cpumap teardown handling wrt. execution contexts, from Jesper
     Dangaard Brouer.

  4) Fix loss of runtime PM on failed vlan add/del, from Ivan
     Khoronzhuk.

  5) xen-netfront caches skb_shinfo(skb) across a __pskb_pull_tail()
     call, which potentially changes the skb's data buffer, and thus
     skb_shinfo(). Fix from Juergen Gross"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  xen/netfront: don't cache skb_shinfo()
  net: ethernet: ti: cpsw: fix runtime_pm while add/kill vlan
  net: ethernet: ti: cpsw: clear all entries when delete vid
  xdp: fix bug in devmap teardown code path
  samples/bpf: xdp_redirect_cpu adjustment to reproduce teardown race easier
  xdp: fix bug in cpumap teardown code path
  bpf, sockmap: fix cork timeout for select due to epipe
  bpf, sockmap: fix leak in bpf_tcp_sendmsg wait for mem path
  bpf, sockmap: fix bpf_tcp_sendmsg sock error handling
  bpf: btf: Change tools/lib/bpf/btf to LGPL

xen/netfront: don't cache skb_shinfo()

skb_shinfo() can change when calling __pskb_pull_tail(): Don't cache
its return value.
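
The hazard is the usual one for a pointer derived from a buffer that may be
reallocated. The sketch below is a userspace toy, not the netfront code: the
toy_* types and functions are hypothetical stand-ins for the skb, its shared
info block at the end of the data buffer, and a pull operation that may move
the buffer.

```c
#include <stdlib.h>
#include <string.h>

struct toy_shinfo { int nr_frags; };
struct toy_skb { unsigned char *head; size_t end; };

/* Like skb_shinfo(): derived from the buffer pointer on every call. */
struct toy_shinfo *toy_shinfo(const struct toy_skb *skb)
{
    return (struct toy_shinfo *)(skb->head + skb->end);
}

int toy_skb_init(struct toy_skb *skb, size_t data_len)
{
    skb->end = (data_len + 15u) & ~(size_t)15u; /* keep shinfo aligned */
    skb->head = calloc(1, skb->end + sizeof(struct toy_shinfo));
    return skb->head ? 0 : -1;
}

/* Like a pull that needs more room: the data buffer moves to a new address. */
int toy_skb_grow(struct toy_skb *skb, size_t extra)
{
    size_t new_end = (skb->end + extra + 15u) & ~(size_t)15u;
    unsigned char *new_head = calloc(1, new_end + sizeof(struct toy_shinfo));

    if (!new_head)
        return -1;
    memcpy(new_head, skb->head, skb->end);
    memcpy(new_head + new_end, skb->head + skb->end, sizeof(struct toy_shinfo));
    free(skb->head); /* any pointer cached from the old buffer is now stale */
    skb->head = new_head;
    skb->end = new_end;
    return 0;
}
```

Calling toy_shinfo() again after toy_skb_grow() returns the valid block; a
pointer saved before the call would point into freed memory.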

Cc: stable@vger.kernel.org
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'cpsw-runtime-pm-fix'

Grygorii Strashko says:

====================
net: ethernet: ti: cpsw: fix runtime pm while add/del reserved vid

Here are 2 non-critical fixes for:
- vlan ale table leak on error when deleting a vlan (simplifies next fix)
- runtime pm when trying to set a reserved vlan
====================

Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: ti: cpsw: fix runtime_pm while add/kill vlan

This does not happen in normal operation, but if an attempt is made to set
a vlan to one of the reserved values, the cpsw runtime pm is broken.

Fixes: a6c5d14f5136 ("drivers: net: cpsw: ndev: fix accessing to suspended device")
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: ti: cpsw: clear all entries when delete vid

If some of the entries are not found in the forwarding table
while killing a vlan, the remaining unneeded entries are left in the
table. There is no need to stop, as the entry is gone anyway. So fix this
by returning the error only after everything has been cleaned up. To
implement this, return -ENOENT in cpsw_ale_del_mcast(), as it is supposed to.
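
The pattern described here, continue cleaning up and report the failure at
the end, can be sketched as below. This is an illustration with hypothetical
toy_* helpers and a toy table, not the cpsw code; in particular, keeping the
first error is just one possible way to combine results.

```c
#include <errno.h>
#include <stddef.h>

/* Toy forwarding table: a nonzero slot means the entry is present. */
int toy_del_entry(int *table, size_t idx)
{
    if (!table[idx])
        return -ENOENT; /* nothing to delete */
    table[idx] = 0;
    return 0;
}

/* Delete every listed entry; report failure only after all were attempted. */
int toy_del_all(int *table, const size_t *idx, size_t n)
{
    int ret = 0;

    for (size_t i = 0; i < n; i++) {
        int err = toy_del_entry(table, idx[i]);

        if (err && !ret)
            ret = err; /* remember the first error, keep cleaning */
    }
    return ret;
}
```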

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

zram: remove BD_CAP_SYNCHRONOUS_IO with writeback feature

If zram supports the writeback feature, it's no longer a
BD_CAP_SYNCHRONOUS_IO device because zram does asynchronous IO operations
for incompressible pages.

Do not pretend to be a synchronous IO device.  It makes the system very
sluggish due to waiting for IO completion from upper layers.

Furthermore, it causes a use-after-free problem because swap thinks the
operation is done when the IO function returns, so it can free the page
(e.g., lock_page_or_retry and goto out_release in do_swap_page), but in
fact the IO is asynchronous, so the driver could access a just-freed page
afterward.

This patch fixes the problem.

  BUG: Bad page state in process qemu-system-x86  pfn:3dfab21
  page:ffffdfb137eac840 count:0 mapcount:0 mapping:0000000000000000 index:0x1
  flags: 0x17fffc000000008(uptodate)
  raw: 017fffc000000008 dead000000000100 dead000000000200 0000000000000000
  raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
  page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag set
  bad because of flags: 0x8(uptodate)
  CPU: 4 PID: 1039 Comm: qemu-system-x86 Tainted: G    B 4.18.0-rc5+ #1
  Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0b 05/02/2017
  Call Trace:
    dump_stack+0x5c/0x7b
    bad_page+0xba/0x120
    get_page_from_freelist+0x1016/0x1250
    __alloc_pages_nodemask+0xfa/0x250
    alloc_pages_vma+0x7c/0x1c0
    do_swap_page+0x347/0x920
    __handle_mm_fault+0x7b4/0x1110
    handle_mm_fault+0xfc/0x1f0
    __get_user_pages+0x12f/0x690
    get_user_pages_unlocked+0x148/0x1f0
    __gfn_to_pfn_memslot+0xff/0x3c0 [kvm]
    try_async_pf+0x87/0x230 [kvm]
    tdp_page_fault+0x132/0x290 [kvm]
    kvm_mmu_page_fault+0x74/0x570 [kvm]
    kvm_arch_vcpu_ioctl_run+0x9b3/0x1990 [kvm]
    kvm_vcpu_ioctl+0x388/0x5d0 [kvm]
    do_vfs_ioctl+0xa2/0x630
    ksys_ioctl+0x70/0x80
    __x64_sys_ioctl+0x16/0x20
    do_syscall_64+0x55/0x100
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

Link: https://lore.kernel.org/lkml/0516ae2d-b0fd-92c5-aa92-112ba7bd32fc@contabo.de/
Link: http://lkml.kernel.org/r/20180802051112.86174-1-minchan@kernel.org
[minchan@kernel.org: fix changelog, add comment]
Link: http://lkml.kernel.org/r/20180805233722.217347-1-minchan@kernel.org
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: Tino Lehnig <tino.lehnig@contabo.de>
Tested-by: Tino Lehnig <tino.lehnig@contabo.de>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: <stable@vger.kernel.org> [4.15+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/memory.c: check return value of ioremap_prot

ioremap_prot() can return NULL which could lead to an oops.

Link: http://lkml.kernel.org/r/1533195441-58594-1-git-send-email-chenjie6@huawei.com
Signed-off-by: chen jie <chenjie6@huawei.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: chenjie <chenjie6@huawei.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

lib/ubsan: remove null-pointer checks

With gcc-8, -fsanitize=null became very noisy.  GCC started to complain
about things like &a->b, where 'a' is a NULL pointer.  There is no NULL
dereference; we just calculate the address of a struct member.  It's
technically undefined behavior, so UBSAN is correct to report it.  But as
long as there is no real NULL dereference, I think we should be fine.

The -fno-delete-null-pointer-checks compiler flag should protect us from any
consequences.  So let's just not use -fsanitize=null, as it's not useful
for us.  If there is a real NULL dereference we will see a crash.  Even if
userspace mapped something at NULL (root can do this), things like
SMAP should catch the issue.

Link: http://lkml.kernel.org/r/20180802153209.813-1-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

MAINTAINERS: GDB: update e-mail address

This entry was created with my personal e-mail address. Update this entry
to my open-source kernel.org account.

Link: http://lkml.kernel.org/r/20180806143904.4716-4-kieran.bingham@ideasonboard.com
Signed-off-by: Kieran Bingham <kbingham@kernel.org>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux

Pull i2c fix from Wolfram Sang:
"A single driver bugfix for I2C.

  The bug was found by systematically stress testing the driver, so I am
  confident to merge it that late in the cycle although it is probably
  unusually large"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: xlp9xx: Fix case where SSIF read transaction completes early

Merge git://git./pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2018-08-10

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix cpumap and devmap on teardown as they're under RCU context
   and won't have same assumption as running under NAPI protection,
   from Jesper.

2) Fix various sockmap bugs in bpf_tcp_sendmsg() code, e.g. we had
   a bug where socket error was not propagated correctly, from Daniel.

3) Fix incompatible libbpf header license for BTF code and match it
   before it gets officially released with the rest of libbpf which
   is LGPL-2.1, from Martin.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

make sure that __dentry_kill() always invalidates d_seq, unhashed or not

RCU pathwalk relies upon the assumption that anything that changes
->d_inode of a dentry will invalidate its ->d_seq.  That's almost
true - the one exception is that the final dput() of already unhashed
dentry does *not* touch ->d_seq at all.  Unhashing does, though,
so for anything we'd found by RCU dcache lookup we are fine.
Unfortunately, we can *start* with an unhashed dentry or jump into
it.

We could try and be careful in the (few) places where that could
happen.  Or we could just make the final dput() invalidate the damn
thing, unhashed or not.  The latter is much simpler and easier to
backport, so let's do it that way.
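
The assumption RCU pathwalk relies on can be shown with a minimal sequence
counter model. This is a single-threaded toy with hypothetical toy_* names;
it has no memory barriers and is not the dcache implementation. It only
shows why a change that skips the counter goes unnoticed by a reader.

```c
struct toy_dentry {
    unsigned seq; /* bumped around every change readers must notice */
    int inode;    /* stands in for ->d_inode */
};

/* Reader side: sample the counter before looking at the dentry. */
unsigned toy_read_begin(const struct toy_dentry *d)
{
    return d->seq;
}

/* Reader side: nonzero means "something changed, retry". */
int toy_read_retry(const struct toy_dentry *d, unsigned start)
{
    return d->seq != start;
}

/* Correct writer: the change is bracketed by counter updates. */
void toy_set_inode(struct toy_dentry *d, int inode)
{
    d->seq++;
    d->inode = inode;
    d->seq++;
}

/* Models the problem case: the inode changes but the counter is untouched. */
void toy_set_inode_no_seq(struct toy_dentry *d, int inode)
{
    d->inode = inode;
}
```

A reader that sampled the counter before toy_set_inode_no_seq() passes its
retry check despite the change, which is the situation the final dput() of
an unhashed dentry used to create.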

Reported-by: "Dae R. Jeong" <threeearcat@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fix __legitimize_mnt()/mntput() race

__legitimize_mnt() has two problems - one is that in case of success
the check of mount_lock is not ordered wrt preceding increment of
refcount, making it possible to have successful __legitimize_mnt()
on one CPU just before the otherwise final mntput() on another,
with __legitimize_mnt() not seeing mntput() taking the lock and
mntput() not seeing the increment done by __legitimize_mnt().
Solved by a pair of barriers.

Another is that failure of __legitimize_mnt() on the second
read_seqretry() leaves us with reference that'll need to be
dropped by caller; however, if that races with final mntput()
we can end up with caller dropping rcu_read_lock() and doing
mntput() to release that reference - with the first mntput()
having freed the damn thing just as rcu_read_lock() had been
dropped. Solution: in "do mntput() yourself" failure case
grab mount_lock, check if MNT_DOOMED has been set by racing
final mntput() that has missed our increment and if it has -
undo the increment and treat that as "failure, caller doesn't
need to drop anything" case.

It's not easy to hit - the final mntput() has to come right
after the first read_seqretry() in __legitimize_mnt() *and*
manage to miss the increment done by __legitimize_mnt() before
the second read_seqretry() in there. The things that are almost
impossible to hit on bare hardware are not impossible on SMP
KVM, though...

Reported-by: Oleg Nesterov <oleg@redhat.com>
Fixes: 48a066e72d97 ("RCU'd vfsmounts")
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fix mntput/mntput race

mntput_no_expire() does the calculation of total refcount under mount_lock;
unfortunately, the decrement (as well as all increments) are done outside
of it, leading to false positives in the "are we dropping the last reference"
test.  Consider the following situation:
* mnt is a lazy-umounted mount, kept alive by two opened files.  One
of those files gets closed.  Total refcount of mnt is 2.  On CPU 42
mntput(mnt) (called from __fput()) drops one reference, decrementing component
* After it has looked at component #0, the process on CPU 0 does
mntget(), incrementing component #0, gets preempted and gets to run again -
on CPU 69.  There it does mntput(), which drops the reference (component #69)
and proceeds to spin on mount_lock.
* On CPU 42 our first mntput() finishes counting.  It observes the
decrement of component #69, but not the increment of component #0.  As the
result, the total it gets is not 1 as it should've been - it's 0.  At which
point we decide that vfsmount needs to be killed and proceed to free it and
shut the filesystem down.  However, there's still another opened file
on that filesystem, with reference to (now freed) vfsmount, etc. and we are
screwed.
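
The miscount described above can be replayed deterministically in user
space.  This is only a sketch: the per-CPU components are a plain array,
the helper names are made up, and the interleaving is hard-coded rather
than produced by real concurrency.  It assumes the first mntput()
decremented component #42, as the CPU number suggests.

```c
#include <assert.h>
#include <stddef.h>

/* Plain sum of all per-CPU components, with nothing running in between. */
long sketch_sum(const long *count, size_t ncpus)
{
	long total = 0;

	for (size_t i = 0; i < ncpus; i++)
		total += count[i];
	return total;
}

/*
 * The same sum, but with another task's mntget() on CPU 0 and mntput()
 * on CPU 69 landing after component #0 has already been read.
 */
long sketch_racy_sum(long *count, size_t ncpus)
{
	long total;

	assert(ncpus > 69);
	total = count[0];	/* component #0 is read before the increment */
	count[0]++;		/* mntget() on CPU 0 */
	count[69]--;		/* mntput() on CPU 69 */
	for (size_t i = 1; i < ncpus; i++)
		total += count[i];	/* observes the decrement of #69 */
	return total;
}
```

Starting from a total of 2 held in component #0 and one decrement of
component #42, the racy sum comes out as 0 while the components really
add up to 1.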

It's not a wide race, but it can be reproduced with artificial slowdown of
the mnt_get_count() loop, and it should be easier to hit on SMP KVM setups.

Fix consists of moving the refcount decrement under mount_lock; the tricky
part is that we want (and can) keep the fast case (i.e. mount that still
has non-NULL ->mnt_ns) entirely out of mount_lock.  All places that zero
mnt->mnt_ns are dropping some reference to mnt and they call synchronize_rcu()
before that mntput().  IOW, if mntput() observes (under rcu_read_lock())
a non-NULL ->mnt_ns, it is guaranteed that there is another reference yet to
be dropped.

Reported-by: Jann Horn <jannh@google.com>
Tested-by: Jann Horn <jannh@google.com>
Fixes: 48a066e72d97 ("RCU'd vfsmounts")
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Merge branch 'bpf-fix-cpu-and-devmap-teardown'

Jesper Dangaard Brouer says:

====================
Removing entries from cpumap and devmap goes through a number of
synchronization steps to make sure no new xdp_frames can be enqueued.
But there is a small chance that some xdp_frames remain which have not
been flushed/processed yet. Flushing these during teardown happens
from RCU context and not, as usual, under RX NAPI context.

The optimization introduced in commit 389ab7f01af9 ("xdp: introduce
xdp_return_frame_rx_napi") missed that the flush operation can also
be called from RCU context. Thus, we cannot always use the
xdp_return_frame_rx_napi call, which takes advantage of the protection
provided by XDP RX running under NAPI protection.

The samples/bpf xdp_redirect_cpu program has a --stress-mode, which is
adjusted to make the race easier to reproduce (verified by Red Hat QA).
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

xdp: fix bug in devmap teardown code path

Like cpumap teardown, the devmap teardown code also flushes remaining
xdp_frames, via bq_xmit_all(), in case a map entry is removed. The code
can call xdp_return_frame_rx_napi() from the wrong context in case
ndo_xdp_xmit() fails.

Fixes: 389ab7f01af9 ("xdp: introduce xdp_return_frame_rx_napi")
Fixes: 735fc4054b3a ("xdp: change ndo_xdp_xmit API to support bulking")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

samples/bpf: xdp_redirect_cpu adjustment to reproduce teardown race easier

The teardown race in cpumap is really hard to reproduce. These changes
make it easier to reproduce, for QA.

The --stress-mode now has a case with a very small queue size of 8, which
helps the teardown flush encounter a full queue, which results in calling
the xdp_return_frame API in a non-NAPI-protected context.

Also increase MAX_CPUS, as my QA department has larger machines than I do.

Tested-by: Jean-Tsung Hsiao <jhsiao@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

xdp: fix bug in cpumap teardown code path

When removing a cpumap entry, a number of synchronization steps happen.
Eventually the teardown code __cpu_map_entry_free is invoked from/via
call_rcu.

The teardown code __cpu_map_entry_free() flushes remaining xdp_frames,
by invoking bq_flush_to_queue, which calls xdp_return_frame_rx_napi().
The issue is that the teardown code is not running in the RX NAPI
code path. Thus, it is not allowed to invoke the NAPI variant of
xdp_return_frame.

This bug was found and triggered by using the --stress-mode option to
the samples/bpf program xdp_redirect_cpu. It is hard to trigger,
because the ptr_ring has to be full, the cpumap bulk queue contains
at most 8 packets, and a remote CPU is racing to empty the ptr_ring
queue.

Fixes: 389ab7f01af9 ("xdp: introduce xdp_return_frame_rx_napi")
Tested-by: Jean-Tsung Hsiao <jhsiao@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6

Pull crypto fix from Herbert Xu:
"This fixes a performance regression in arm64 NEON crypto as well as a
  crash in x86 aegis/morus on unsupported CPUs"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: x86/aegis,morus - Fix and simplify CPUID checks
  crypto: arm64 - revert NEON yield for fast AEAD implementations

Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) The real fix for the ipv6 route metric leak Sabrina was seeing, from
    Cong Wang.

2) Fix syzbot-triggered AF_PACKET v3 ring buffer insufficient-room
    conditions, from Willem de Bruijn.

3) vsock can reinitialize active work struct, fix from Cong Wang.

4) RXRPC keepalive generator can wedge a cpu, fix from David Howells.

5) Fix locking in AF_SMC ioctl, from Ursula Braun.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  dsa: slave: eee: Allow ports to use phylink
  net/smc: move sock lock in smc_ioctl()
  net/smc: allow sysctl rmem and wmem defaults for servers
  net/smc: no shutdown in state SMC_LISTEN
  net: aquantia: Fix IFF_ALLMULTI flag functionality
  rxrpc: Fix the keepalive generator [ver #2]
  net/mlx5e: Cleanup of dcbnl related fields
  net/mlx5e: Properly check if hairpin is possible between two functions
  vhost: reset metadata cache when initializing new IOTLB
  llc: use refcount_inc_not_zero() for llc_sap_find()
  dccp: fix undefined behavior with 'cwnd' shift in ccid2_cwnd_restart()
  tipc: fix an interrupt unsafe locking scenario
  vsock: split dwork to avoid reinitializations
  net: thunderx: check for failed allocation lmac->dmacs
  cxgb4: mk_act_open_req() buggers ->{local, peer}_ip on big-endian hosts
  packet: refine ring v3 block size test to hold one frame
  ip6_tunnel: use the right value for ipv4 min mtu check in ip6_tnl_xmit
  ipv6: fix double refcount of fib6_metrics

i2c: xlp9xx: Fix case where SSIF read transaction completes early

During ipmi stress tests we see occasional transaction failures
at boot time. This happens for I2C_M_RECV_LEN
transactions, when the read transfer completes (with the initial
read length of 34) before the driver gets a chance to handle interrupts.

The current driver code expects at least 2 interrupts for I2C_M_RECV_LEN
transactions. The length is updated during the first interrupt, and the
buffer contents are only copied during subsequent interrupts. In case of
just one interrupt, we will complete the transaction without copying
out the bytes from RX fifo.

Update the code to drain the RX fifo after the length update,
so that the transaction completes correctly in all cases.

Signed-off-by: George Cherian <george.cherian@cavium.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org

dsa: slave: eee: Allow ports to use phylink

For a port to be able to use EEE, both the MAC and the PHY must
support EEE. A PHY can be provided by either a phydev or phylink. Verify
that at least one of these exists, not just phydev.

Fixes: aab9c4067d23 ("net: dsa: Plug in PHYLINK support")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'smc-fixes'

Ursula Braun says:

====================
net/smc: fixes 2018-08-08

Here are small fixes for SMC: The first patch makes sure shutdown code
is not executed for sockets in state SMC_LISTEN. The second patch resets
send and receive buffer values for accepted sockets, since TCP buffer size
optimizations for the internal CLC socket should not be forwarded to the
outer SMC socket. The third patch solves a race between connect and ioctl
reported by syzbot.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: move sock lock in smc_ioctl()

When an SMC socket is connecting, it is decided whether fallback to
TCP is needed. To avoid races between connect and ioctl, move the
sock lock before the use_fallback check.

Reported-by: syzbot+5b2cece1a8ecb2ca77d8@syzkaller.appspotmail.com
Reported-by: syzbot+19557374321ca3710990@syzkaller.appspotmail.com
Fixes: 1992d99882af ("net/smc: take sock lock in smc_ioctl()")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: allow sysctl rmem and wmem defaults for servers

Without setsockopt SO_SNDBUF and SO_RCVBUF settings, the sysctl
defaults net.ipv4.tcp_wmem and net.ipv4.tcp_rmem should be the base
for the sizes of the SMC sndbuf and rcvbuf. Any TCP buffer size
optimizations for servers should be ignored.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: no shutdown in state SMC_LISTEN

Invoking shutdown for a socket in state SMC_LISTEN does not make
sense. Nevertheless programs like syzbot fuzzing the kernel may
try to do this. For SMC this means a socket refcounting problem.
This patch makes sure a shutdown call for an SMC socket in state
SMC_LISTEN simply returns with -ENOTCONN.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: aquantia: Fix IFF_ALLMULTI flag functionality

It was noticed that the NIC always passes all multicast traffic to the host
regardless of the IFF_ALLMULTI flag on the interface.
The rule in the NIC's MC Filter Table that is configured to accept any
multicast packets is turned on if the IFF_MULTICAST flag is set on the
interface. This leads to all multicast traffic being passed to the host.
This fix changes the condition for turning on that rule to check the
IFF_ALLMULTI flag, as it should.

Fixes: b21f502f84be ("net:ethernet:aquantia: Fix for multicast filter handling.")
Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rxrpc: Fix the keepalive generator [ver #2]

AF_RXRPC has a keepalive message generator that generates a message for a
peer ~20s after the last transmission to that peer to keep firewall ports
open.  The implementation is incorrect in the following ways:

(1) It mixes up ktime_t and time64_t types.

(2) It uses ktime_get_real(), the output of which may jump forward or
     backward due to adjustments to the time of day.

(3) If the current time jumps forward too much or jumps backwards, the
     generator function will crank the base of the time ring round one slot
     at a time (ie. a 1s period) until it catches up, spewing out VERSION
     packets as it goes.

Fix the problem by:

(1) Only using time64_t.  There's no need for sub-second resolution.

(2) Use ktime_get_seconds() rather than ktime_get_real() so that time
     isn't perceived to go backwards.

(3) Simplifying rxrpc_peer_keepalive_worker() by splitting it into two
     parts:

     (a) The "worker" function that manages the buckets and the timer.

     (b) The "dispatch" function that takes the pending peers and
      potentially transmits a keepalive packet before putting them back
      in the ring into the slot appropriate to the revised last-Tx time.

(4) Taking everything that's pending out of the ring and splicing it into
     a temporary collector list for processing.

     In the case that there's been a significant jump forward, the ring
     gets entirely emptied and then the time base can be warped forward
     before the peers are processed.

     The warping can't happen if the ring isn't empty because the slot a
     peer is in is keepalive-time dependent, relative to the base time.

(5) Limit the number of iterations of the bucket array when scanning it.

(6) Set the timer to skip any empty slots as there's no point waking up if
     there's nothing to do yet.

This can be triggered by an incoming call from a server after a reboot with
AF_RXRPC and AFS built into the kernel causing a peer record to be set up
before userspace is started.  The system clock is then adjusted by
userspace, thereby potentially causing the keepalive generator to have a
meltdown - which leads to a message like:

watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/0:1:23]
...
Workqueue: krxrpcd rxrpc_peer_keepalive_worker
EIP: lock_acquire+0x69/0x80
...
Call Trace:
? rxrpc_peer_keepalive_worker+0x5e/0x350
? _raw_spin_lock_bh+0x29/0x60
? rxrpc_peer_keepalive_worker+0x5e/0x350
? rxrpc_peer_keepalive_worker+0x5e/0x350
? __lock_acquire+0x3d3/0x870
? process_one_work+0x110/0x340
? process_one_work+0x166/0x340
? process_one_work+0x110/0x340
? worker_thread+0x39/0x3c0
? kthread+0xdb/0x110
? cancel_delayed_work+0x90/0x90
? kthread_stop+0x70/0x70
? ret_from_fork+0x19/0x24

Fixes: ace45bec6d77 ("rxrpc: Fix firewall route keepalive")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlx5-fixes'

Saeed Mahameed says:

====================
Mellanox, mlx5e fixes 2018-08-07

I know it is late into 4.18 release, and this is why I am submitting
only two mlx5e ethernet fixes.

The first one from Or, is needed for -stable and it fixes hairpin
for "same device" check.

The second fix is a no-risk fix from Huy which cleans up and improves
error return value reporting for dcbnl_ieee_setapp.

For -stable v4.16
- net/mlx5e: Properly check if hairpin is possible between two functions
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: Cleanup of dcbnl related fields

Remove unused netdev_registered_init/remove in en.h
Return ENOSUPPORT if the check MLX5_DSCP_SUPPORTED fails.
Remove extra white space

Fixes: 2a5e7a1344f4 ("net/mlx5e: Add dcbnl dscp to priority support")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Cc: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: Properly check if hairpin is possible between two functions

The current check relies on function BDF addresses and can give
wrong results, e.g. when two VFs are assigned to a VM and the PCI
v-address is set by the hypervisor.

Fixes: 5c65c564c962 ('net/mlx5e: Support offloading TC NIC hairpin flows')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Alaa Hleihel <alaa@mellanox.com>
Tested-by: Alaa Hleihel <alaa@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

parisc: Define mb() and add memory barriers to assembler unlock sequences

For years I thought all parisc machines executed loads and stores in
order. However, Jeff Law recently indicated on gcc-patches that this is
not correct. There are various degrees of out-of-order execution all the
way back to the PA7xxx processor series (hit-under-miss). The PA8xxx
series has full out-of-order execution for both integer operations, and
loads and stores.

This is described in the following article:
http://web.archive.org/web/20040214092531/http://www.cpus.hp.com/technical_references/advperf.shtml

For this reason, we need to define mb() and to insert a memory barrier
before the store unlocking spinlocks. This ensures that all memory
accesses are complete prior to unlocking. The ldcw instruction performs
the same function on entry.
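
The ordering requirement can be sketched in portable C11 atomics rather
than parisc assembler.  This is an analogy only: sketch_trylock() and
sketch_unlock() are hypothetical names, the atomic exchange stands in
for ldcw, and the seq_cst fence stands in for mb().  The lock word
convention assumed here is that non-zero means free.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Lock word convention assumed here: non-zero means free, zero means held. */

/* Try to take the lock by atomically fetching the word and clearing it. */
bool sketch_trylock(atomic_int *lock)
{
	assert(lock != NULL);
	return atomic_exchange_explicit(lock, 0, memory_order_acquire) != 0;
}

/*
 * Release the lock.  The full fence comes first, so that loads and
 * stores issued inside the critical section are ordered before the
 * store that marks the lock as free again.
 */
void sketch_unlock(atomic_int *lock)
{
	assert(lock != NULL);
	atomic_thread_fence(memory_order_seq_cst);	/* stands in for mb() */
	atomic_store_explicit(lock, 1, memory_order_relaxed);
}
```

Without the fence, the relaxed store in sketch_unlock() would be free to
become visible before earlier accesses from the critical section.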

Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # 4.0+
Signed-off-by: Helge Deller <deller@gmx.de>

parisc: Enable CONFIG_MLONGCALLS by default

Enable the -mlong-calls compiler option by default, because otherwise in most
cases linking the vmlinux binary fails due to truncations of R_PARISC_PCREL22F
relocations. This fixes building the 64-bit defconfig.

Cc: stable@vger.kernel.org # 4.0+
Signed-off-by: Helge Deller <deller@gmx.de>

Merge branch 'sockmap-fixes'

Daniel Borkmann says:

====================
Two sockmap fixes in bpf_tcp_sendmsg(), and one fix for the
sockmap kernel selftest. Thanks!
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf, sockmap: fix cork timeout for select due to epipe

I ran into the same issue as a009f1f396d0 ("selftests/bpf:
test_sockmap, timing improvements") where I had a broken
pipe error on the socket due to remote end timing out on
select and then shutting down its sockets while the other
side was still sending. We may need to do a bigger rework
in general on the test_sockmap.c, but for now increase it
to a more suitable timeout.

Fixes: a18fda1a62c3 ("bpf: reduce runtime of test_sockmap tests")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf, sockmap: fix leak in bpf_tcp_sendmsg wait for mem path

In bpf_tcp_sendmsg() the sk_alloc_sg() may fail. In the case of
ENOMEM, it may also mean that we've partially filled the scatterlist
entries with pages. Later, jumping to sk_stream_wait_memory(),
we could further fail with an error for several reasons; however,
we fail to call free_start_sg() if the local sk_msg_buff was used.

Fixes: 4f738adba30a ("bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf, sockmap: fix bpf_tcp_sendmsg sock error handling

While working on bpf_tcp_sendmsg() code, I noticed that when a
sk->sk_err is set we error out with err = sk->sk_err. However
this is problematic since sk->sk_err is a positive error value
and therefore we will neither go into sk_stream_error() nor will
we report an error back to user space. I had this case with EPIPE
and user space was thinking sendmsg() succeeded since EPIPE is
a positive value, thinking we submitted 32 bytes. Fix it by
negating the sk->sk_err value.
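
The sign problem can be shown with a minimal user-space sketch.  The
struct and function below are hypothetical stand-ins, not the
bpf_tcp_sendmsg() code; they only model the convention that a send
path returns a byte count on success and a negative errno on failure.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the socket; only the pending-error field. */
struct fake_sock {
	int sk_err;	/* pending error, stored as a positive errno value */
};

/*
 * Returns the number of bytes "sent", or a negative errno on failure.
 * With negate == 0 the pending error is returned as-is (the buggy
 * behaviour); with negate != 0 it is negated (the fixed behaviour).
 */
long fake_sendmsg(const struct fake_sock *sk, long len, int negate)
{
	assert(sk != NULL && len >= 0);
	if (sk->sk_err)
		return negate ? -sk->sk_err : sk->sk_err;
	return len;
}
```

A caller that treats any non-negative return value as a byte count sees
the un-negated EPIPE as a successful short write.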

Fixes: 4f738adba30a ("bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

vhost: reset metadata cache when initializing new IOTLB

We need to reset the metadata cache during new IOTLB initialization,
otherwise stale pointers to the previous IOTLB may still be accessed,
which will lead to a use-after-free.

Reported-by: syzbot+c51e6736a1bf614b3272@syzkaller.appspotmail.com
Fixes: f88949138058 ("vhost: introduce O(1) vq metadata cache")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

llc: use refcount_inc_not_zero() for llc_sap_find()

llc_sap_put() decreases the refcnt before deleting sap
from the global list. Therefore, there is a chance
llc_sap_find() could find a sap with zero refcnt
in this global list.

Close this race condition by checking whether refcnt is zero
in llc_sap_find(). If it is zero, the sap is being
removed, so we can just treat it as gone.
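
The "take a reference only if the count is not already zero" pattern
can be sketched with C11 atomics.  This is a user-space illustration
with a made-up function name, not the kernel's refcount_inc_not_zero()
implementation, and it omits the saturation checks a real refcount type
would have.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Increment *ref unless it is zero.  Returns true if a reference was
 * taken, false if the object is already on its way out.
 */
bool sketch_inc_not_zero(atomic_uint *ref)
{
	unsigned int old;

	assert(ref != NULL);
	old = atomic_load(ref);
	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(ref, &old, old + 1))
			return true;
	}
	return false;
}
```

A lookup that gets false back skips the entry, exactly as if it had
already been removed from the list.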

Reported-by: <syzbot+278893f3f7803871f7ce@syzkaller.appspotmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dccp: fix undefined behavior with 'cwnd' shift in ccid2_cwnd_restart()

The shift of 'cwnd' with '(now - hc->tx_lsndtime) / hc->tx_rto' value
can lead to undefined behavior [1].

In order to fix this use a gradual shift of the window with a 'while'
loop, similar to what tcp_cwnd_restart() is doing.

When comparing delta and RTO there is a minor difference between TCP
and DCCP: the latter also invokes ccid2_cwnd_restart() and reduces
'cwnd' if delta equals RTO. That case is preserved in this change.
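
A gradual shift of this kind can be sketched as a stand-alone function.
The function name and parameters are illustrative assumptions, not the
ccid2 code: restart_cwnd is the floor the window must not drop below,
delta is the idle time and rto the retransmission timeout, both in the
same unit.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Halve cwnd once per elapsed RTO, never going below restart_cwnd.
 * Each step shifts by exactly 1, so the shift count can never reach
 * the width of the type, unlike cwnd >> (delta / rto).
 */
uint32_t cwnd_restart_sketch(uint32_t cwnd, uint32_t restart_cwnd,
			     int64_t delta, uint32_t rto)
{
	assert(rto > 0);
	while ((delta -= rto) >= 0 && cwnd > restart_cwnd)
		cwnd >>= 1;
	return cwnd > restart_cwnd ? cwnd : restart_cwnd;
}
```

With delta equal to rto the window is still halved once, and a very
long idle period simply stops at restart_cwnd instead of shifting by
an oversized exponent.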

[1]:
[40850.963623] UBSAN: Undefined behaviour in net/dccp/ccids/ccid2.c:237:7
[40851.043858] shift exponent 67 is too large for 32-bit type 'unsigned int'
[40851.127163] CPU: 3 PID: 15940 Comm: netstress Tainted: G        W   E     4.18.0-rc7.x86_64 #1
...
[40851.377176] Call Trace:
[40851.408503]  dump_stack+0xf1/0x17b
[40851.451331]  ? show_regs_print_info+0x5/0x5
[40851.503555]  ubsan_epilogue+0x9/0x7c
[40851.548363]  __ubsan_handle_shift_out_of_bounds+0x25b/0x2b4
[40851.617109]  ? __ubsan_handle_load_invalid_value+0x18f/0x18f
[40851.686796]  ? xfrm4_output_finish+0x80/0x80
[40851.739827]  ? lock_downgrade+0x6d0/0x6d0
[40851.789744]  ? xfrm4_prepare_output+0x160/0x160
[40851.845912]  ? ip_queue_xmit+0x810/0x1db0
[40851.895845]  ? ccid2_hc_tx_packet_sent+0xd36/0x10a0 [dccp]
[40851.963530]  ccid2_hc_tx_packet_sent+0xd36/0x10a0 [dccp]
[40852.029063]  dccp_xmit_packet+0x1d3/0x720 [dccp]
[40852.086254]  dccp_write_xmit+0x116/0x1d0 [dccp]
[40852.142412]  dccp_sendmsg+0x428/0xb20 [dccp]
[40852.195454]  ? inet_dccp_listen+0x200/0x200 [dccp]
[40852.254833]  ? sched_clock+0x5/0x10
[40852.298508]  ? sched_clock+0x5/0x10
[40852.342194]  ? inet_create+0xdf0/0xdf0
[40852.388988]  sock_sendmsg+0xd9/0x160
...

Fixes: 113ced1f52e5 ("dccp ccid-2: Perform congestion-window validation")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>