OSDN Git Service

android-x86/kernel.git
7 years agolib/syscall: Clear return values when no stack
Kees Cook [Thu, 23 Mar 2017 22:46:16 +0000 (15:46 -0700)]
lib/syscall: Clear return values when no stack

commit 854fbd6e5f60fe99e8e3a569865409fca378f143 upstream.

Commit:

  aa1f1a639621 ("lib/syscall: Pin the task stack in collect_syscall()")

... added logic to handle a process stack not existing, but left sp and pc
uninitialized, which can be later reported via /proc/$pid/syscall for zombie
processes, potentially exposing kernel memory to userspace.

  Zombie /proc/$pid/syscall before:
  -1 0xffffffff9a060100 0xffff92f42d6ad900

  Zombie /proc/$pid/syscall after:
  -1 0x0 0x0

Reported-by: Robert Święcki <robert@swiecki.net>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: aa1f1a639621 ("lib/syscall: Pin the task stack in collect_syscall()")
Link: http://lkml.kernel.org/r/20170323224616.GA92694@beast
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agox86/mce: Fix copy/paste error in exception table entries
Tony Luck [Mon, 20 Mar 2017 21:40:30 +0000 (14:40 -0700)]
x86/mce: Fix copy/paste error in exception table entries

commit 26a37ab319a26d330bab298770d692bb9c852aff upstream.

Back in commit:

  92b0729c34cab ("x86/mm, x86/mce: Add memcpy_mcsafe()")

... I made a copy/paste error setting up the exception table entries
and ended up with two for label .L_cache_w3 and none for .L_cache_w2.

This means that if we take a machine check on:

  .L_cache_w2: movq 2*8(%rsi), %r10

then we don't have an exception table entry for this instruction
and we can't recover.

Fix: s/3/2/

Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 92b0729c34cab ("x86/mm, x86/mce: Add memcpy_mcsafe()")
Link: http://lkml.kernel.org/r/1490046030-25862-1-git-send-email-tony.luck@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agox86/mm/KASLR: Exclude EFI region from KASLR VA space randomization
Baoquan He [Fri, 24 Mar 2017 04:59:52 +0000 (12:59 +0800)]
x86/mm/KASLR: Exclude EFI region from KASLR VA space randomization

commit a46f60d76004965e5669dbf3fc21ef3bc3632eb4 upstream.

Currently KASLR is enabled on three regions: the direct mapping of physical
memory, vamlloc and vmemmap. However the EFI region is also mistakenly
included for VA space randomization because of misusing EFI_VA_START macro
and assuming EFI_VA_START < EFI_VA_END.

(This breaks kexec and possibly other things that rely on stable addresses.)

The EFI region is reserved for EFI runtime services virtual mapping which
should not be included in KASLR ranges. In Documentation/x86/x86_64/mm.txt,
we can see:

  ffffffef00000000 - fffffffeffffffff (=64 GB) EFI region mapping space

EFI uses the space from -4G to -64G thus EFI_VA_START > EFI_VA_END,
Here EFI_VA_START = -4G, and EFI_VA_END = -64G.

Changing EFI_VA_START to EFI_VA_END in mm/kaslr.c fixes this problem.

Signed-off-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Bhupesh Sharma <bhsharma@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
Acked-by: Thomas Garnier <thgarnie@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1490331592-31860-1-git-send-email-bhe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodrm/etnaviv: (re-)protect fence allocation with GPU mutex
Lucas Stach [Wed, 22 Mar 2017 11:07:23 +0000 (12:07 +0100)]
drm/etnaviv: (re-)protect fence allocation with GPU mutex

commit f3cd1b064f1179d9e6188c6d67297a2360880e10 upstream.

The fence allocation needs to be protected by the GPU mutex, otherwise
the fence seqnos of concurrent submits might not match the insertion order
of the jobs in the kernel ring. This breaks the assumption that jobs
complete with monotonically increasing fence seqnos.

Fixes: d9853490176c (drm/etnaviv: take GPU lock later in the submit process)
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodrm/vc4: Allocate the right amount of space for boot-time CRTC state.
Eric Anholt [Tue, 28 Mar 2017 20:13:43 +0000 (13:13 -0700)]
drm/vc4: Allocate the right amount of space for boot-time CRTC state.

commit 6d6e500391875cc372336c88e9a8af377be19c36 upstream.

Without this, the first modeset would dereference past the allocation
when trying to free the mm node.

Signed-off-by: Eric Anholt <eric@anholt.net>
Tested-by: Stefan Wahren <stefan.wahren@i2se.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170328201343.4884-1-eric@anholt.net
Fixes: d8dbf44f13b9 ("drm/vc4: Make the CRTCs cooperate on allocating display lists.")
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodrm/radeon: Override fpfn for all VRAM placements in radeon_evict_flags
Michel Dänzer [Fri, 24 Mar 2017 10:01:09 +0000 (19:01 +0900)]
drm/radeon: Override fpfn for all VRAM placements in radeon_evict_flags

commit ce4b4f228e51219b0b79588caf73225b08b5b779 upstream.

We were accidentally only overriding the first VRAM placement. For BOs
with the RADEON_GEM_NO_CPU_ACCESS flag set,
radeon_ttm_placement_from_domain creates a second VRAM placment with
fpfn == 0. If VRAM is almost full, the first VRAM placement with
fpfn > 0 may not work, but the second one with fpfn == 0 always will
(the BO's current location trivially satisfies it). Because "moving"
the BO to its current location puts it back on the LRU list, this
results in an infinite loop.

Fixes: 2a85aedd117c ("drm/radeon: Try evicting from CPU accessible to
                      inaccessible VRAM first")
Reported-by: Zachary Michaels <zmichaels@oblong.com>
Reported-and-Tested-by: Julien Isorce <jisorce@oblong.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoKVM: kvm_io_bus_unregister_dev() should never fail
David Hildenbrand [Thu, 23 Mar 2017 17:24:19 +0000 (18:24 +0100)]
KVM: kvm_io_bus_unregister_dev() should never fail

commit 90db10434b163e46da413d34db8d0e77404cc645 upstream.

No caller currently checks the return value of
kvm_io_bus_unregister_dev(). This is evil, as all callers silently go on
freeing their device. A stale reference will remain in the io_bus,
getting at least used again, when the iobus gets teared down on
kvm_destroy_vm() - leading to use after free errors.

There is nothing the callers could do, except retrying over and over
again.

So let's simply remove the bus altogether, print an error and make
sure no one can access this broken bus again (returning -ENOMEM on any
attempt to access it).

Fixes: e93f8a0f821e ("KVM: convert io_bus to SRCU")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoKVM: x86: clear bus pointer when destroyed
Peter Xu [Wed, 15 Mar 2017 08:01:17 +0000 (16:01 +0800)]
KVM: x86: clear bus pointer when destroyed

commit df630b8c1e851b5e265dc2ca9c87222e342c093b upstream.

When releasing the bus, let's clear the bus pointers to mark it out. If
any further device unregister happens on this bus, we know that we're
done if we found the bus being released already.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoserial: mxs-auart: Fix baudrate calculation
Uwe Kleine-König [Mon, 20 Mar 2017 09:05:38 +0000 (10:05 +0100)]
serial: mxs-auart: Fix baudrate calculation

commit a6040bc610554c66088fda3608ae5d6307c548e4 upstream.

The reference manual for the i.MX28 recommends to calculate the divisor
as

divisor = (UARTCLK * 32) / baud rate, rounded to the nearest integer

, so let's do this. For a typical setup of UARTCLK = 24 MHz and baud
rate = 115200 this changes the divisor from 6666 to 6667 and so the
actual baud rate improves from 115211.521 Bd (error ≅ 0.01 %) to
115194.240 Bd (error ≅ 0.005 %).

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoUSB: fix linked-list corruption in rh_call_control()
Alan Stern [Fri, 24 Mar 2017 17:38:28 +0000 (13:38 -0400)]
USB: fix linked-list corruption in rh_call_control()

commit 1633682053a7ee8058e10c76722b9b28e97fb73f upstream.

Using KASAN, Dmitry found a bug in the rh_call_control() routine: If
buffer allocation fails, the routine returns immediately without
unlinking its URB from the control endpoint, eventually leading to
linked-list corruption.

This patch fixes the problem by jumping to the end of the routine
(where the URB is unlinked) when an allocation failure occurs.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-and-tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agotty/serial: atmel: fix TX path in atmel_console_write()
Nicolas Ferre [Mon, 20 Mar 2017 15:38:57 +0000 (16:38 +0100)]
tty/serial: atmel: fix TX path in atmel_console_write()

commit 497e1e16f45c70574dc9922c7f75c642c2162119 upstream.

A side effect of 89d8232411a8 ("tty/serial: atmel_serial: BUG: stop DMA
from transmitting in stop_tx") is that the console can be called with
TX path disabled. Then the system would hang trying to push charecters
out in atmel_console_putchar().

Signed-off-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Fixes: 89d8232411a8 ("tty/serial: atmel_serial: BUG: stop DMA from transmitting in stop_tx")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agotty/serial: atmel: fix race condition (TX+DMA)
Richard Genoud [Mon, 20 Mar 2017 10:52:41 +0000 (11:52 +0100)]
tty/serial: atmel: fix race condition (TX+DMA)

commit 31ca2c63fdc0aee725cbd4f207c1256f5deaabde upstream.

If uart_flush_buffer() is called between atmel_tx_dma() and
atmel_complete_tx_dma(), the circular buffer has been cleared, but not
atmel_port->tx_len.
That leads to a circular buffer overflow (dumping (UART_XMIT_SIZE -
atmel_port->tx_len) bytes).

Tested-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: Richard Genoud <richard.genoud@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoACPI: Do not create a platform_device for IOAPIC/IOxAPIC
Joerg Roedel [Wed, 22 Mar 2017 17:33:25 +0000 (18:33 +0100)]
ACPI: Do not create a platform_device for IOAPIC/IOxAPIC

commit 08f63d97749185fab942a3a47ed80f5bd89b8b7d upstream.

No platform-device is required for IO(x)APICs, so don't even
create them.

[ rjw: This fixes a problem with leaking platform device objects
  after IOAPIC/IOxAPIC hot-removal events.]

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoACPI: Fix incompatibility with mcount-based function graph tracing
Josh Poimboeuf [Thu, 16 Mar 2017 13:56:28 +0000 (08:56 -0500)]
ACPI: Fix incompatibility with mcount-based function graph tracing

commit 61b79e16c68d703dde58c25d3935d67210b7d71b upstream.

Paul Menzel reported a warning:

  WARNING: CPU: 0 PID: 774 at /build/linux-ROBWaj/linux-4.9.13/kernel/trace/trace_functions_graph.c:233 ftrace_return_to_handler+0x1aa/0x1e0
  Bad frame pointer: expected f6919d98, received f6919db0
    from func acpi_pm_device_sleep_wake return to c43b6f9d

The warning means that function graph tracing is broken for the
acpi_pm_device_sleep_wake() function.  That's because the ACPI Makefile
unconditionally sets the '-Os' gcc flag to optimize for size.  That's an
issue because mcount-based function graph tracing is incompatible with
'-Os' on x86, thanks to the following gcc bug:

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42109

I have another patch pending which will ensure that mcount-based
function graph tracing is never used with CONFIG_CC_OPTIMIZE_FOR_SIZE on
x86.

But this patch is needed in addition to that one because the ACPI
Makefile overrides that config option for no apparent reason.  It has
had this flag since the beginning of git history, and there's no related
comment, so I don't know why it's there.  As far as I can tell, there's
no reason for it to be there.  The appropriate behavior is for it to
honor CONFIG_CC_OPTIMIZE_FOR_{SIZE,PERFORMANCE} like the rest of the
kernel.

Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoparisc: Fix access fault handling in pa_memcpy()
Helge Deller [Wed, 29 Mar 2017 19:41:05 +0000 (21:41 +0200)]
parisc: Fix access fault handling in pa_memcpy()

commit 554bfeceb8a22d448cd986fc9efce25e833278a1 upstream.

pa_memcpy() is the major memcpy implementation in the parisc kernel which is
used to do any kind of userspace/kernel memory copies.

Al Viro noticed various bugs in the implementation of pa_mempcy(), most notably
that in case of faults it may report back to have copied more bytes than it
actually did.

Fixing those bugs is quite hard in the C-implementation, because the compiler
is messing around with the registers and we are not guaranteed that specific
variables are always in the same processor registers. This makes proper fault
handling complicated.

This patch implements pa_memcpy() in assembler. That way we have correct fault
handling and adding a 64-bit copy routine was quite easy.

Runtime tested with 32- and 64bit kernels.

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoparisc: Avoid stalled CPU warnings after system shutdown
Helge Deller [Wed, 29 Mar 2017 06:25:30 +0000 (08:25 +0200)]
parisc: Avoid stalled CPU warnings after system shutdown

commit 476e75a44b56038bee9207242d4bc718f6b4de06 upstream.

Commit 73580dac7618 ("parisc: Fix system shutdown halt") introduced an endless
loop for systems which don't provide a software power off function.  But the
soft lockup detector will detect this and report stalled CPUs after some time.
Avoid those unwanted warnings by disabling the soft lockup detector.

Fixes: 73580dac7618 ("parisc: Fix system shutdown halt")
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoparisc: Clean up fixup routines for get_user()/put_user()
Helge Deller [Sat, 25 Mar 2017 10:59:15 +0000 (11:59 +0100)]
parisc: Clean up fixup routines for get_user()/put_user()

commit d19f5e41b344a057bb2450024a807476f30978d2 upstream.

Al Viro noticed that userspace accesses via get_user()/put_user() can be
simplified a lot with regard to usage of the exception handling.

This patch implements a fixup routine for get_user() and put_user() in such
that the exception handler will automatically load -EFAULT into the register
%r8 (the error value) in case on a fault on userspace.  Additionally the fixup
routine will zero the target register on fault in case of a get_user() call.
The target register is extracted out of the faulting assembly instruction.

This patch brings a few benefits over the old implementation:
1. Exception handling gets much cleaner, easier and smaller in size.
2. Helper functions like fixup_get_user_skip_1 (all of fixup.S) can be dropped.
3. No need to hardcode %r9 as target register for get_user() any longer. This
   helps the compiler register allocator and thus creates less assembler
   statements.
4. No dependency on the exception_data contents any longer.
5. Nested faults will be handled cleanly.

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonfsd: map the ENOKEY to nfserr_perm for avoiding warning
Kinglong Mee [Fri, 10 Mar 2017 01:52:20 +0000 (09:52 +0800)]
nfsd: map the ENOKEY to nfserr_perm for avoiding warning

commit c952cd4e949ab3d07287efc2e80246e03727d15d upstream.

Now that Ext4 and f2fs filesystems support encrypted directories and
files, attempts to access those files may return ENOKEY, resulting in
the following WARNING.

Map ENOKEY to nfserr_perm instead of nfserr_io.

[ 1295.411759] ------------[ cut here ]------------
[ 1295.411787] WARNING: CPU: 0 PID: 12786 at fs/nfsd/nfsproc.c:796 nfserrno+0x74/0x80 [nfsd]
[ 1295.411806] nfsd: non-standard errno: -126
[ 1295.411816] Modules linked in: nfsd nfs_acl auth_rpcgss nfsv4 nfs lockd fscache tun bridge stp llc fuse ip_set nfnetlink vmw_vsock_vmci_transport vsock snd_seq_midi snd_seq_midi_event coretemp crct10dif_pclmul crc32_generic crc32_pclmul snd_ens1371 gameport ghash_clmulni_intel snd_ac97_codec f2fs intel_rapl_perf ac97_bus snd_seq ppdev snd_pcm snd_rawmidi snd_timer vmw_balloon snd_seq_device snd joydev soundcore parport_pc parport nfit acpi_cpufreq tpm_tis vmw_vmci tpm_tis_core tpm shpchp i2c_piix4 grace sunrpc xfs libcrc32c vmwgfx drm_kms_helper ttm drm crc32c_intel e1000 mptspi scsi_transport_spi serio_raw mptscsih mptbase ata_generic pata_acpi fjes [last unloaded: nfs_acl]
[ 1295.412522] CPU: 0 PID: 12786 Comm: nfsd Tainted: G        W       4.11.0-rc1+ #521
[ 1295.412959] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[ 1295.413814] Call Trace:
[ 1295.414252]  dump_stack+0x63/0x86
[ 1295.414666]  __warn+0xcb/0xf0
[ 1295.415087]  warn_slowpath_fmt+0x5f/0x80
[ 1295.415502]  ? put_filp+0x42/0x50
[ 1295.415927]  nfserrno+0x74/0x80 [nfsd]
[ 1295.416339]  nfsd_open+0xd7/0x180 [nfsd]
[ 1295.416746]  nfs4_get_vfs_file+0x367/0x3c0 [nfsd]
[ 1295.417182]  ? security_inode_permission+0x41/0x60
[ 1295.417591]  nfsd4_process_open2+0x9b2/0x1200 [nfsd]
[ 1295.418007]  nfsd4_open+0x481/0x790 [nfsd]
[ 1295.418409]  nfsd4_proc_compound+0x395/0x680 [nfsd]
[ 1295.418812]  nfsd_dispatch+0xb8/0x1f0 [nfsd]
[ 1295.419233]  svc_process_common+0x4d9/0x830 [sunrpc]
[ 1295.419631]  svc_process+0xfe/0x1b0 [sunrpc]
[ 1295.420033]  nfsd+0xe9/0x150 [nfsd]
[ 1295.420420]  kthread+0x101/0x140
[ 1295.420802]  ? nfsd_destroy+0x60/0x60 [nfsd]
[ 1295.421199]  ? kthread_park+0x90/0x90
[ 1295.421598]  ret_from_fork+0x2c/0x40
[ 1295.421996] ---[ end trace 0d5a969cd7852e1f ]---

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoNFSv4.1 fix infinite loop on IO BAD_STATEID error
Olga Kornievskaia [Thu, 30 Mar 2017 17:49:03 +0000 (13:49 -0400)]
NFSv4.1 fix infinite loop on IO BAD_STATEID error

commit 0e3d3e5df07dcf8a50d96e0ecd6ab9a888f55dfc upstream.

Commit 63d63cbf5e03 "NFSv4.1: Don't recheck delegations that
have already been checked" introduced a regression where when a
client received BAD_STATEID error it would not send any TEST_STATEID
and instead go into an infinite loop of resending the IO that caused
the BAD_STATEID.

Fixes: 63d63cbf5e03 ("NFSv4.1: Don't recheck delegations that have already been checked")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agommc: sdhci-of-at91: fix MMC_DDR_52 timing selection
Ludovic Desroches [Tue, 28 Mar 2017 09:00:45 +0000 (11:00 +0200)]
mmc: sdhci-of-at91: fix MMC_DDR_52 timing selection

commit d0918764c17b94c30bbb2619929b1719ff52707a upstream.

The controller has different timings for MMC_TIMING_UHS_DDR50 and
MMC_TIMING_MMC_DDR52. Configuring the controller with SDHCI_CTRL_UHS_DDR50,
when MMC_TIMING_MMC_DDR52 timings are requested, is not correct and can
lead to unexpected behavior.

Signed-off-by: Ludovic Desroches <ludovic.desroches@microchip.com>
Fixes: bb5f8ea4d514 ("mmc: sdhci-of-at91: introduce driver for the Atmel SDMMC")
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agommc: sdhci: Disable runtime pm when the sdio_irq is enabled
Hans de Goede [Sun, 26 Mar 2017 11:14:45 +0000 (13:14 +0200)]
mmc: sdhci: Disable runtime pm when the sdio_irq is enabled

commit 923713b357455cfb9aca2cd3429cb0806a724ed2 upstream.

SDIO cards may need clock to send the card interrupt to the host.

On a cherrytrail tablet with a RTL8723BS wifi chip, without this patch
pinging the tablet results in:

PING 192.168.1.14 (192.168.1.14) 56(84) bytes of data.
64 bytes from 192.168.1.14: icmp_seq=1 ttl=64 time=78.6 ms
64 bytes from 192.168.1.14: icmp_seq=2 ttl=64 time=1760 ms
64 bytes from 192.168.1.14: icmp_seq=3 ttl=64 time=753 ms
64 bytes from 192.168.1.14: icmp_seq=4 ttl=64 time=3.88 ms
64 bytes from 192.168.1.14: icmp_seq=5 ttl=64 time=795 ms
64 bytes from 192.168.1.14: icmp_seq=6 ttl=64 time=1841 ms
64 bytes from 192.168.1.14: icmp_seq=7 ttl=64 time=810 ms
64 bytes from 192.168.1.14: icmp_seq=8 ttl=64 time=1860 ms
64 bytes from 192.168.1.14: icmp_seq=9 ttl=64 time=812 ms
64 bytes from 192.168.1.14: icmp_seq=10 ttl=64 time=48.6 ms

Where as with this patch I get:

PING 192.168.1.14 (192.168.1.14) 56(84) bytes of data.
64 bytes from 192.168.1.14: icmp_seq=1 ttl=64 time=3.96 ms
64 bytes from 192.168.1.14: icmp_seq=2 ttl=64 time=1.97 ms
64 bytes from 192.168.1.14: icmp_seq=3 ttl=64 time=17.2 ms
64 bytes from 192.168.1.14: icmp_seq=4 ttl=64 time=2.46 ms
64 bytes from 192.168.1.14: icmp_seq=5 ttl=64 time=2.83 ms
64 bytes from 192.168.1.14: icmp_seq=6 ttl=64 time=1.40 ms
64 bytes from 192.168.1.14: icmp_seq=7 ttl=64 time=2.10 ms
64 bytes from 192.168.1.14: icmp_seq=8 ttl=64 time=1.40 ms
64 bytes from 192.168.1.14: icmp_seq=9 ttl=64 time=2.04 ms
64 bytes from 192.168.1.14: icmp_seq=10 ttl=64 time=1.40 ms

Cc: Dong Aisheng <b29396@freescale.com>
Cc: Ian W MORRISON <ianwmorrison@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Dong Aisheng <aisheng.dong@nxp.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoHID: wacom: Don't add ghost interface as shared data
Aaron Armstrong Skomra [Wed, 29 Mar 2017 17:35:39 +0000 (10:35 -0700)]
HID: wacom: Don't add ghost interface as shared data

commit 8b4073596997f2ccbf68d8e72e07b827388a4536 upstream.

A previous commit (below) adds a check for already probed interfaces to
Wacom's matching heuristic. Unfortunately this causes the Bamboo Pen
(CTL-460) to match itself to its 'ghost' touch interface. After
subsequent changes to the driver this match to the ghost causes the
kernel to crash. This patch avoids calling wacom_add_shared_data()
for the BAMBOO_PEN's ghost touch interface.

Fixes: 41372d5d40e7 ("HID: wacom: Augment 'oVid' and 'oPid' with heuristics for HID_GENERIC")
Signed-off-by: Aaron Armstrong Skomra <aaron.skomra@wacom.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoASoC: Intel: Skylake: fix invalid memory access due to wrong reference of pointer
Takashi Sakamoto [Fri, 24 Feb 2017 02:48:41 +0000 (11:48 +0900)]
ASoC: Intel: Skylake: fix invalid memory access due to wrong reference of pointer

commit d1a6fe41d3c4ff0d26f0b186d774493555ca5282 upstream.

In 'skl_tplg_set_module_init_data()', a pointer to 'params' member of
'struct skl_algo_data' is calculated, then casted to (u32 *) and assigned
to a member of configuration data. The configuration data is passed to the
other functions and used to process intel IPC. In this processing, the
value of member is used to get message data, however this can bring invalid
memory access in 'skl_set_module_params()' as a result of calculation of
a pointer for actual message data.

(sound/soc/intel/skylake/skl-topology.c)
skl_tplg_init_pipe_modules()
->skl_tplg_set_module_init_data() (has this bug)
->skl_tplg_set_module_params()
  (sound/soc/intel/skylake/skl-messages.c)
  ->skl_set_module_params()
    ((char *)param) + data_offset

This commit fixes the bug.

Fixes: abb740033b56 ("ASoC: Intel: Skylake: Add support to configure module params")
Signed-off-by: Takashi Sakamoto <takashi.sakamoto@miraclelinux.com>
Acked-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoASoC: atmel-classd: fix audio clock rate
Songjun Wu [Fri, 24 Feb 2017 07:10:43 +0000 (15:10 +0800)]
ASoC: atmel-classd: fix audio clock rate

commit cd3ac9affc43b44f49d7af70d275f0bd426ba643 upstream.

Fix the audio clock rate according to the datasheet.

Reported-by: Dushara Jayasinghe <dushara@successful.com.au>
Signed-off-by: Songjun Wu <songjun.wu@microchip.com>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoALSA: hda - fix a problem for lineout on a Dell AIO machine
Hui Wang [Fri, 31 Mar 2017 02:31:40 +0000 (10:31 +0800)]
ALSA: hda - fix a problem for lineout on a Dell AIO machine

commit 2f726aec19a9d2c63bec9a8a53a3910ffdcd09f8 upstream.

On this Dell AIO machine, the lineout jack does not work.

We found the pin 0x1a is assigned to lineout on this machine, and in
the past, we applied ALC298_FIXUP_DELL1_MIC_NO_PRESENCE to fix the
heaset-set mic problem for this machine, this fixup will redefine
the pin 0x1a to headphone-mic, as a result the lineout doesn't
work anymore.

After consulting with Dell, they told us this machine doesn't support
microphone via headset jack, so we add a new fixup which only defines
the pin 0x18 as the headset-mic.

[rearranged the fixup insertion position by tiwai in order to make the
 merge with other branches easier -- tiwai]

Fixes: 59ec4b57bcae ("ALSA: hda - Fix headset mic detection problem for two dell machines")
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoALSA: seq: Fix race during FIFO resize
Takashi Iwai [Fri, 24 Mar 2017 16:07:57 +0000 (17:07 +0100)]
ALSA: seq: Fix race during FIFO resize

commit 2d7d54002e396c180db0c800c1046f0a3c471597 upstream.

When a new event is queued while processing to resize the FIFO in
snd_seq_fifo_clear(), it may lead to a use-after-free, as the old pool
that is being queued gets removed.  For avoiding this race, we need to
close the pool to be deleted and sync its usage before actually
deleting it.

The issue was spotted by syzkaller.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoPCI: iproc: Save host bridge window resource in struct iproc_pcie
Bjorn Helgaas [Thu, 9 Mar 2017 17:27:07 +0000 (11:27 -0600)]
PCI: iproc: Save host bridge window resource in struct iproc_pcie

commit 6e347b5e05ea2ac4ac467a5a1cfaebb2c7f06f80 upstream.

The host bridge memory window resource is inserted into the iomem_resource
tree and cannot be deallocated until the host bridge itself is removed.

Previously, the window was on the stack, which meant the iomem_resource
entry pointed into the stack and was corrupted as soon as the probe
function returned, which caused memory corruption and errors like this:

  pcie_iproc_bcma bcma0:8: resource collision: [mem 0x40000000-0x47ffffff] conflicts with PCIe MEM space [mem 0x40000000-0x47ffffff]

Move the memory window resource from the stack into struct iproc_pcie so
its lifetime matches that of the host bridge.

Fixes: c3245a566400 ("PCI: iproc: Request host bridge window resources")
Reported-and-tested-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoscsi: scsi_dh_alua: Ensure that alua_activate() calls the completion function
Bart Van Assche [Sat, 18 Mar 2017 00:02:02 +0000 (17:02 -0700)]
scsi: scsi_dh_alua: Ensure that alua_activate() calls the completion function

commit 7cb689fe42927281b8d98606ae5450173fcc66a9 upstream.

Callers of scsi_dh_activate(), e.g. dm-mpath, assume that this function
either returns an error code or calls the completion function. Make
alua_activate() call the completion function even if scsi_device_get()
fails.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Tang Junhui <tang.junhui@zte.com.cn>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoscsi: scsi_dh_alua: Check scsi_device_get() return value
Bart Van Assche [Sat, 18 Mar 2017 00:02:01 +0000 (17:02 -0700)]
scsi: scsi_dh_alua: Check scsi_device_get() return value

commit 625fe857e4fac6518716f3c0ff5e5deb8ec6d238 upstream.

Do not queue ALUA work nor call scsi_device_put() if the
scsi_device_get() call fails. This patch fixes the following crash:

general protection fault: 0000 [#1] SMP
RIP: 0010:scsi_device_put+0xb/0x30
Call Trace:
 scsi_disk_put+0x2d/0x40
 sd_release+0x3d/0xb0
 __blkdev_put+0x29e/0x360
 blkdev_put+0x49/0x170
 dm_put_table_device+0x58/0xc0 [dm_mod]
 dm_put_device+0x70/0xc0 [dm_mod]
 free_priority_group+0x92/0xc0 [dm_multipath]
 free_multipath+0x70/0xc0 [dm_multipath]
 multipath_dtr+0x19/0x20 [dm_multipath]
 dm_table_destroy+0x67/0x120 [dm_mod]
 dev_suspend+0xde/0x240 [dm_mod]
 ctl_ioctl+0x1f5/0x520 [dm_mod]
 dm_ctl_ioctl+0xe/0x20 [dm_mod]
 do_vfs_ioctl+0x8f/0x700
 SyS_ioctl+0x3c/0x70
 entry_SYSCALL_64_fastpath+0x18/0xad

Fixes: commit 03197b61c5ec ("scsi_dh_alua: Use workqueue for RTPG")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Tang Junhui <tang.junhui@zte.com.cn>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoscsi: libsas: fix ata xfer length
John Garry [Thu, 16 Mar 2017 15:07:28 +0000 (23:07 +0800)]
scsi: libsas: fix ata xfer length

commit 9702c67c6066f583b629cf037d2056245bb7a8e6 upstream.

The total ata xfer length may not be calculated properly, in that we do
not use the proper method to get an sg element dma length.

According to the code comment, sg_dma_len() should be used after
dma_map_sg() is called.

This issue was found by turning on the SMMUv3 in front of the hisi_sas
controller in hip07. Multiple sg elements were being combined into a
single element, but the original first element length was being use as
the total xfer length.

Fixes: ff2aeb1eb64c8a4770a6 ("libata: convert to chained sg")
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoscsi: sg: check length passed to SG_NEXT_CMD_LEN
peter chang [Wed, 15 Feb 2017 22:11:54 +0000 (14:11 -0800)]
scsi: sg: check length passed to SG_NEXT_CMD_LEN

commit bf33f87dd04c371ea33feb821b60d63d754e3124 upstream.

The user can control the size of the next command passed along, but the
value passed to the ioctl isn't checked against the usable max command
size.

Signed-off-by: Peter Chang <dpf@google.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: try any AG when allocating the first btree block when reflinking
Christoph Hellwig [Wed, 8 Mar 2017 18:38:53 +0000 (10:38 -0800)]
xfs: try any AG when allocating the first btree block when reflinking

commit 2fcc319d2467a5f5b78f35f79fd6e22741a31b1e upstream.

When a reflink operation causes the bmap code to allocate a btree block
we're currently doing single-AG allocations due to having ->firstblock
set and then try any higher AG due a little reflink quirk we've put in
when adding the reflink code.  But given that we do not have a minleft
reservation of any kind in this AG we can still not have any space in
the same or higher AG even if the file system has enough free space.
To fix this use a XFS_ALLOCTYPE_FIRST_AG allocation in this fall back
path instead.

[And yes, we need to redo this properly instead of piling hacks over
 hacks.  I'm working on that, but it's not going to be a small series.
 In the meantime this fixes the customer reported issue]

Also add a warning for failing allocations to make it easier to debug.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: use iomap new flag for newly allocated delalloc blocks
Brian Foster [Wed, 8 Mar 2017 17:58:08 +0000 (09:58 -0800)]
xfs: use iomap new flag for newly allocated delalloc blocks

commit f65e6fad293b3a5793b7fa2044800506490e7a2e upstream.

Commit fa7f138 ("xfs: clear delalloc and cache on buffered write
failure") fixed one regression in the iomap error handling code and
exposed another. The fundamental problem is that if a buffered write
is a rewrite of preexisting delalloc blocks and the write fails, the
failure handling code can punch out preexisting blocks with valid
file data.

This was reproduced directly by sub-block writes in the LTP
kernel/syscalls/write/write03 test. A first 100 byte write allocates
a single block in a file. A subsequent 100 byte write fails and
punches out the block, including the data successfully written by
the previous write.

To address this problem, update the ->iomap_begin() handler to
distinguish newly allocated delalloc blocks from preexisting
delalloc blocks via the IOMAP_F_NEW flag. Use this flag in the
->iomap_end() handler to decide when a failed or short write should
punch out delalloc blocks.

This introduces the subtle requirement that ->iomap_begin() should
never combine newly allocated delalloc blocks with existing blocks
in the resulting iomap descriptor. This can occur when a new
delalloc reservation merges with a neighboring extent that is part
of the current write, for example. Therefore, drop the
post-allocation extent lookup from xfs_bmapi_reserve_delalloc() and
just return the record inserted into the fork. This ensures only new
blocks are returned and thus that preexisting delalloc blocks are
always handled as "found" blocks and not punched out on a failed
rewrite.

Reported-by: Xiong Zhou <xzhou@redhat.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: Use xfs_icluster_size_fsb() to calculate inode alignment mask
Chandan Rajendra [Thu, 2 Mar 2017 23:06:33 +0000 (15:06 -0800)]
xfs: Use xfs_icluster_size_fsb() to calculate inode alignment mask

commit d5825712ee98d68a2c17bc89dad2c30276894cba upstream.

When block size is larger than inode cluster size, the call to
XFS_B_TO_FSBT(mp, mp->m_inode_cluster_size) returns 0. Also, mkfs.xfs
would have set xfs_sb->sb_inoalignmt to 0. Hence in
xfs_set_inoalignment(), xfs_mount->m_inoalign_mask gets initialized to
-1 instead of 0. However, xfs_mount->m_sinoalign would get correctly
intialized to 0 because for every positive value of xfs_mount->m_dalign,
the condition "!(mp->m_dalign & mp->m_inoalign_mask)" would evaluate to
false.

Also, xfs_imap() worked fine even with xfs_mount->m_inoalign_mask having
-1 as the value because blks_per_cluster variable would have the value 1
and hence we would never have a need to use xfs_mount->m_inoalign_mask
to compute the inode chunk's agbno and offset within the chunk.

Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: fix and streamline error handling in xfs_end_io
Christoph Hellwig [Thu, 2 Mar 2017 23:02:51 +0000 (15:02 -0800)]
xfs: fix and streamline error handling in xfs_end_io

commit 787eb485509f9d58962bd8b4dbc6a5ac6e2034fe upstream.

There are two different cases of buffered I/O errors:

 - first we can have an already shutdown fs.  In that case we should skip
   any on-disk operations and just clean up the appen transaction if
   present and destroy the ioend
 - a real I/O error.  In that case we should cleanup any lingering COW
   blocks.  This gets skipped in the current code and is fixed by this
   patch.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: only reclaim unwritten COW extents periodically
Christoph Hellwig [Wed, 8 Mar 2017 00:45:58 +0000 (16:45 -0800)]
xfs: only reclaim unwritten COW extents periodically

commit 3802a345321a08093ba2ddb1849e736f84e8d450 upstream.

We only want to reclaim preallocations from our periodic work item.
Currently this is archived by looking for a dirty inode, but that check
is rather fragile.  Instead add a flag to xfs_reflink_cancel_cow_* so
that the caller can ask for just cancelling unwritten extents in the COW
fork.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: fix typos in commit message]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: tune down agno asserts in the bmap code
Christoph Hellwig [Fri, 17 Feb 2017 01:12:51 +0000 (17:12 -0800)]
xfs: tune down agno asserts in the bmap code

commit 410d17f67e583559be3a922f8b6cc336331893f3 upstream.

In various places we currently assert that xfs_bmap_btalloc allocates
from the same as the firstblock value passed in, unless it's either
NULLAGNO or the dop_low flag is set.  But the reflink code does not
fully follow this convention as it passes in firstblock purely as
a hint for the allocator without actually having previous allocations
in the transaction, and without having a minleft check on the current
AG, leading to the assert firing on a very full and heavily used
file system.  As even the reflink code only allocates from equal or
higher AGs for now we can simply the check to always allow for equal
or higher AGs.

Note that we need to eventually split the two meanings of the firstblock
value.  At that point we can also allow the reflink code to allocate
from any AG instead of limiting it in any way.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: Use xfs_icluster_size_fsb() to calculate inode chunk alignment
Chandan Rajendra [Fri, 17 Feb 2017 01:12:16 +0000 (17:12 -0800)]
xfs: Use xfs_icluster_size_fsb() to calculate inode chunk alignment

commit 8ee9fdbebc84b39f1d1c201c5e32277c61d034aa upstream.

On a ppc64 system, executing generic/256 test with 32k block size gives the following call trace,

XFS: Assertion failed: args->maxlen > 0, file: /root/repos/linux/fs/xfs/libxfs/xfs_alloc.c, line: 2026

kernel BUG at /root/repos/linux/fs/xfs/xfs_message.c:113!
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=2048
DEBUG_PAGEALLOC
NUMA
pSeries
Modules linked in:
CPU: 2 PID: 19361 Comm: mkdir Not tainted 4.10.0-rc5 #58
task: c000000102606d80 task.stack: c0000001026b8000
NIP: c0000000004ef798 LR: c0000000004ef798 CTR: c00000000082b290
REGS: c0000001026bb090 TRAP: 0700   Not tainted  (4.10.0-rc5)
MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI>
CR: 28004428  XER: 00000000
CFAR: c0000000004ef180 SOFTE: 1
GPR00: c0000000004ef798 c0000001026bb310 c000000001157300 ffffffffffffffea
GPR04: 000000000000000a c0000001026bb130 0000000000000000 ffffffffffffffc0
GPR08: 00000000000000d1 0000000000000021 00000000ffffffd1 c000000000dd4990
GPR12: 0000000022004444 c00000000fe00800 0000000020000000 0000000000000000
GPR16: 0000000000000000 0000000043a606fc 0000000043a76c08 0000000043a1b3d0
GPR20: 000001002a35cd60 c0000001026bbb80 0000000000000000 0000000000000001
GPR24: 0000000000000240 0000000000000004 c00000062dc55000 0000000000000000
GPR28: 0000000000000004 c00000062ecd9200 0000000000000000 c0000001026bb6c0
NIP [c0000000004ef798] .assfail+0x28/0x30
LR [c0000000004ef798] .assfail+0x28/0x30
Call Trace:
[c0000001026bb310] [c0000000004ef798] .assfail+0x28/0x30 (unreliable)
[c0000001026bb380] [c000000000455d74] .xfs_alloc_space_available+0x194/0x1b0
[c0000001026bb410] [c00000000045b914] .xfs_alloc_fix_freelist+0x144/0x480
[c0000001026bb580] [c00000000045c368] .xfs_alloc_vextent+0x698/0xa90
[c0000001026bb650] [c0000000004a6200] .xfs_ialloc_ag_alloc+0x170/0x820
[c0000001026bb7c0] [c0000000004a9098] .xfs_dialloc+0x158/0x320
[c0000001026bb8a0] [c0000000004e628c] .xfs_ialloc+0x7c/0x610
[c0000001026bb990] [c0000000004e8138] .xfs_dir_ialloc+0xa8/0x2f0
[c0000001026bbaa0] [c0000000004e8814] .xfs_create+0x494/0x790
[c0000001026bbbf0] [c0000000004e5ebc] .xfs_generic_create+0x2bc/0x410
[c0000001026bbce0] [c0000000002b4a34] .vfs_mkdir+0x154/0x230
[c0000001026bbd70] [c0000000002bc444] .SyS_mkdirat+0x94/0x120
[c0000001026bbe30] [c00000000000b760] system_call+0x38/0xfc
Instruction dump:
4e800020 60000000 7c0802a6 7c862378 3c82ffca 7ca72b78 38841c18 7c651b78
38600000 f8010010 f821ff91 4bfff94d <0fe0000060000000 7c0802a6 7c892378

When block size is larger than inode cluster size, the call to
XFS_B_TO_FSBT(mp, mp->m_inode_cluster_size) returns 0. Also, mkfs.xfs
would have set xfs_sb->sb_inoalignmt to 0. This causes
xfs_ialloc_cluster_alignment() to return 0.  Due to this
args.minalignslop (in xfs_ialloc_ag_alloc()) gets the unsigned
equivalent of -1 assigned to it. This later causes alloc_len in
xfs_alloc_space_available() to have a value of 0. In such a scenario
when args.total is also 0, the assert statement "ASSERT(args->maxlen >
0);" fails.

This commit fixes the bug by replacing the call to XFS_B_TO_FSBT() in
xfs_ialloc_cluster_alignment() with a call to xfs_icluster_size_fsb().

Suggested-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: don't reserve blocks for right shift transactions
Brian Foster [Wed, 15 Feb 2017 18:18:10 +0000 (10:18 -0800)]
xfs: don't reserve blocks for right shift transactions

commit 48af96ab92bc68fb645068b978ce36df2379e076 upstream.

The block reservation for the transaction allocated in
xfs_shift_file_space() is an artifact of the original collapse range
support. It exists to handle the case where a collapse range occurs,
the initial extent is left shifted into a location that forms a
contiguous boundary with the previous extent and thus the extents
are merged. This code was subsequently refactored and reused for
insert range (right shift) support.

If an insert range occurs under low free space conditions, the
extent at the starting offset is split before the first shift
transaction is allocated. If the block reservation fails, this
leaves separate, but contiguous extents around in the inode. While
not a fatal problem, this is unexpected and will flag a warning on
subsequent insert range operations on the inode. This problem has
been reproduce intermittently by generic/270 running against a
ramdisk device.

Since right shift does not create new extent boundaries in the
inode, a block reservation for extent merge is unnecessary. Update
xfs_shift_file_space() to conditionally reserve fs blocks for left
shift transactions only. This avoids the warning reproduced by
generic/270.

Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: fix uninitialized variable in _reflink_convert_cow
Darrick J. Wong [Tue, 14 Feb 2017 06:52:27 +0000 (22:52 -0800)]
xfs: fix uninitialized variable in _reflink_convert_cow

commit 93aaead52a9eebdc20dc8fa673c350e592a06949 upstream.

Fix an uninitialize variable.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: split indlen reservations fairly when under reserved
Brian Foster [Tue, 14 Feb 2017 06:48:30 +0000 (22:48 -0800)]
xfs: split indlen reservations fairly when under reserved

commit 75d65361cf3c0dae2af970c305e19c727b28a510 upstream.

Certain workoads that punch holes into speculative preallocation can
cause delalloc indirect reservation splits when the delalloc extent is
split in two. If further splits occur, an already short-handed extent
can be split into two in a manner that leaves zero indirect blocks for
one of the two new extents. This occurs because the shortage is large
enough that the xfs_bmap_split_indlen() algorithm completely drains the
requested indlen of one of the extents before it honors the existing
reservation.

This ultimately results in a warning from xfs_bmap_del_extent(). This
has been observed during file copies of large, sparse files using 'cp
--sparse=always.'

To avoid this problem, update xfs_bmap_split_indlen() to explicitly
apply the reservation shortage fairly between both extents. This smooths
out the overall indlen shortage and defers the situation where we end up
with a delalloc extent with zero indlen reservation to extreme
circumstances.

Reported-by: Patrick Dung <mpatdung@gmail.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: handle indlen shortage on delalloc extent merge
Brian Foster [Tue, 14 Feb 2017 06:48:18 +0000 (22:48 -0800)]
xfs: handle indlen shortage on delalloc extent merge

commit 0e339ef8556d9e567aa7925f8892c263d79430d9 upstream.

When a delalloc extent is created, it can be merged with pre-existing,
contiguous, delalloc extents. When this occurs,
xfs_bmap_add_extent_hole_delay() merges the extents along with the
associated indirect block reservations. The expectation here is that the
combined worst case indlen reservation is always less than or equal to
the indlen reservation for the individual extents.

This is not always the case, however, as existing extents can less than
the expected indlen reservation if the extent was previously split due
to a hole punch. If a new extent merges with such an extent, the total
indlen requirement may be larger than the sum of the indlen reservations
held by both extents.

xfs_bmap_add_extent_hole_delay() assumes that the worst case indlen
reservation is always available and assigns it to the merged extent
without consideration for the indlen held by the pre-existing extent. As
a result, the subsequent xfs_mod_fdblocks() call can attempt an
unintentional allocation rather than a free (indicated by an ASSERT()
failure). Further, if the allocation happens to fail in this context,
the failure goes unhandled and creates a filesystem wide block
accounting inconsistency.

Fix xfs_bmap_add_extent_hole_delay() to function as designed. Cap the
indlen reservation assigned to the merged extent to the sum of the
indlen reservations held by each of the individual extents.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: don't fail xfs_extent_busy allocation
Christoph Hellwig [Tue, 7 Feb 2017 22:06:46 +0000 (14:06 -0800)]
xfs: don't fail xfs_extent_busy allocation

commit 5e30c23d13919a718b22d4921dc5c0accc59da27 upstream.

We don't just need the structure to track busy extents which can be
avoided with a synchronous transaction, but also to keep track of
pending discard.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: reject all unaligned direct writes to reflinked files
Christoph Hellwig [Mon, 6 Feb 2017 21:00:54 +0000 (13:00 -0800)]
xfs: reject all unaligned direct writes to reflinked files

commit 54a4ef8af4e0dc5c983d17fcb9cf5fd25666d94e upstream.

We currently fall back from direct to buffered writes if we detect a
remaining shared extent in the iomap_begin callback.  But by the time
iomap_begin is called for the potentially unaligned end block we might
have already written most of the data to disk, which we'd now write
again using buffered I/O.  To avoid this reject all writes to reflinked
files before starting I/O so that we are guaranteed to only write the
data once.

The alternative would be to unshare the unaligned start and/or end block
before doing the I/O. I think that's doable, and will actually be
required to support reflinks on DAX file system.  But it will take a
little more time and I'd rather get rid of the double write ASAP.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[slight changes in context due to the new direct I/O code in 4.10+]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: update ctime and mtime on clone destinatation inodes
Christoph Hellwig [Tue, 7 Feb 2017 01:45:51 +0000 (17:45 -0800)]
xfs: update ctime and mtime on clone destinatation inodes

commit c5ecb42342852892f978572ddc6dca703460f25a upstream.

We're changing both metadata and data, so we need to update the
timestamps for clone operations.  Dedupe on the other hand does
not change file data, and only changes invisible metadata so the
timestamps should not be updated.

This follows existing btrfs behavior.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: remove redundant is_dedupe test]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: reset b_first_retry_time when clear the retry status of xfs_buf_t
Hou Tao [Fri, 3 Feb 2017 22:39:07 +0000 (14:39 -0800)]
xfs: reset b_first_retry_time when clear the retry status of xfs_buf_t

commit 4dd2eb633598cb6a5a0be2fd9a2be0819f5eeb5f upstream.

After successful IO or permanent error, b_first_retry_time also
needs to be cleared, else the invalid first retry time will be
used by the next retry check.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: mark speculative prealloc CoW fork extents unwritten
Darrick J. Wong [Thu, 2 Feb 2017 23:14:02 +0000 (15:14 -0800)]
xfs: mark speculative prealloc CoW fork extents unwritten

commit 5eda43000064a69a39fb7869cc63c9571535ad29 upstream.

Christoph Hellwig pointed out that there's a potentially nasty race when
performing simultaneous nearby directio cow writes:

"Thread 1 writes a range from B to c

"                    B --------- C
                           p

"a little later thread 2 writes from A to B

"        A --------- B
               p

[editor's note: the 'p' denote cowextsize boundaries, which I added to
make this more clear]

"but the code preallocates beyond B into the range where thread
"1 has just written, but ->end_io hasn't been called yet.
"But once ->end_io is called thread 2 has already allocated
"up to the extent size hint into the write range of thread 1,
"so the end_io handler will splice the unintialized blocks from
"that preallocation back into the file right after B."

We can avoid this race by ensuring that thread 1 cannot accidentally
remap the blocks that thread 2 allocated (as part of speculative
preallocation) as part of t2's write preparation in t1's end_io handler.
The way we make this happen is by taking advantage of the unwritten
extent flag as an intermediate step.

Recall that when we begin the process of writing data to shared blocks,
we create a delayed allocation extent in the CoW fork:

D: --RRRRRRSSSRRRRRRRR---
C: ------DDDDDDD---------

When a thread prepares to CoW some dirty data out to disk, it will now
convert the delalloc reservation into an /unwritten/ allocated extent in
the cow fork.  The da conversion code tries to opportunistically
allocate as much of a (speculatively prealloc'd) extent as possible, so
we may end up allocating a larger extent than we're actually writing
out:

D: --RRRRRRSSSRRRRRRRR---
U: ------UUUUUUU---------

Next, we convert only the part of the extent that we're actively
planning to write to normal (i.e. not unwritten) status:

D: --RRRRRRSSSRRRRRRRR---
U: ------UURRUUU---------

If the write succeeds, the end_cow function will now scan the relevant
range of the CoW fork for real extents and remap only the real extents
into the data fork:

D: --RRRRRRRRSRRRRRRRR---
U: ------UU--UUU---------

This ensures that we never obliterate valid data fork extents with
unwritten blocks from the CoW fork.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: allow unwritten extents in the CoW fork
Darrick J. Wong [Thu, 2 Feb 2017 23:14:01 +0000 (15:14 -0800)]
xfs: allow unwritten extents in the CoW fork

commit 05a630d76bd3f39baf0eecfa305bed2820796dee upstream.

In the data fork, we only allow extents to perform the following state
transitions:

delay -> real <-> unwritten

There's no way to move directly from a delalloc reservation to an
/unwritten/ allocated extent.  However, for the CoW fork we want to be
able to do the following to each extent:

delalloc -> unwritten -> written -> remapped to data fork

This will help us to avoid a race in the speculative CoW preallocation
code between a first thread that is allocating a CoW extent and a second
thread that is remapping part of a file after a write.  In order to do
this, however, we need two things: first, we have to be able to
transition from da to unwritten, and second the function that converts
between real and unwritten has to be made aware of the cow fork.  Do
both of those things.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: verify free block header fields
Darrick J. Wong [Thu, 2 Feb 2017 23:14:00 +0000 (15:14 -0800)]
xfs: verify free block header fields

commit de14c5f541e78c59006bee56f6c5c2ef1ca07272 upstream.

Perform basic sanity checking of the directory free block header
fields so that we avoid hanging the system on invalid data.

(Granted that just means that now we shutdown on directory write,
but that seems better than hanging...)

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: check for obviously bad level values in the bmbt root
Darrick J. Wong [Thu, 2 Feb 2017 23:13:59 +0000 (15:13 -0800)]
xfs: check for obviously bad level values in the bmbt root

commit b3bf607d58520ea8c0666aeb4be60dbb724cd3a2 upstream.

We can't handle a bmbt that's taller than BTREE_MAXLEVELS, and there's
no such thing as a zero-level bmbt (for that we have extents format),
so if we see this, send back an error code.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: filter out obviously bad btree pointers
Darrick J. Wong [Thu, 2 Feb 2017 23:13:58 +0000 (15:13 -0800)]
xfs: filter out obviously bad btree pointers

commit d5a91baeb6033c3392121e4d5c011cdc08dfa9f7 upstream.

Don't let anybody load an obviously bad btree pointer.  Since the values
come from disk, we must return an error, not just ASSERT.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: fail _dir_open when readahead fails
Darrick J. Wong [Thu, 2 Feb 2017 23:13:58 +0000 (15:13 -0800)]
xfs: fail _dir_open when readahead fails

commit 7a652bbe366464267190c2792a32ce4fff5595ef upstream.

When we open a directory, we try to readahead block 0 of the directory
on the assumption that we're going to need it soon.  If the bmbt is
corrupt, the directory will never be usable and the readahead fails
immediately, so we might as well prevent the directory from being opened
at all.  This prevents a subsequent read or modify operation from
hitting it and taking the fs offline.

NOTE: We're only checking for early failures in the block mapping, not
the readahead directory block itself.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: fix toctou race when locking an inode to access the data map
Darrick J. Wong [Thu, 2 Feb 2017 23:13:57 +0000 (15:13 -0800)]
xfs: fix toctou race when locking an inode to access the data map

commit 4b5bd5bf3fb182dc504b1b64e0331300f156e756 upstream.

We use di_format and if_flags to decide whether we're grabbing the ilock
in btree mode (btree extents not loaded) or shared mode (anything else),
but the state of those fields can be changed by other threads that are
also trying to load the btree extents -- IFEXTENTS gets set before the
_bmap_read_extents call and cleared if it fails.

We don't actually need to have IFEXTENTS set until after the bmbt
records are successfully loaded and validated, which will fix the race
between multiple threads trying to read the same directory.  The next
patch strengthens directory bmbt validation by refusing to open the
directory if reading the bmbt to start directory readahead fails.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: fix eofblocks race with file extending async dio writes
Brian Foster [Sat, 28 Jan 2017 07:22:57 +0000 (23:22 -0800)]
xfs: fix eofblocks race with file extending async dio writes

commit e4229d6b0bc9280f29624faf170cf76a9f1ca60e upstream.

It's possible for post-eof blocks to end up being used for direct I/O
writes. dio write performs an upfront unwritten extent allocation, sends
the dio and then updates the inode size (if necessary) on write
completion. If a file release occurs while a file extending dio write is
in flight, it is possible to mistake the post-eof blocks for speculative
preallocation and incorrectly truncate them from the inode. This means
that the resulting dio write completion can discover a hole and allocate
new blocks rather than perform unwritten extent conversion.

This requires a strange mix of I/O and is thus not likely to reproduce
in real world workloads. It is intermittently reproduced by generic/299.
The error manifests as an assert failure due to transaction overrun
because the aforementioned write completion transaction has only
reserved enough blocks for btree operations:

  XFS: Assertion failed: tp->t_blk_res_used <= tp->t_blk_res, \
   file: fs/xfs//xfs_trans.c, line: 309

The root cause is that xfs_free_eofblocks() uses i_size to truncate
post-eof blocks from the inode, but async, file extending direct writes
do not update i_size until write completion, long after inode locks are
dropped. Therefore, xfs_free_eofblocks() effectively truncates the inode
to the incorrect size.

Update xfs_free_eofblocks() to serialize against dio similar to how
extending writes are serialized against i_size updates before post-eof
block zeroing. Specifically, wait on dio while under the iolock. This
ensures that dio write completions have updated i_size before post-eof
blocks are processed.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: sync eofblocks scans under iolock are livelock prone
Brian Foster [Sat, 28 Jan 2017 07:22:56 +0000 (23:22 -0800)]
xfs: sync eofblocks scans under iolock are livelock prone

commit c3155097ad89a956579bc305856a1f2878494e52 upstream.

The xfs_eofblocks.eof_scan_owner field is an internal field to
facilitate invoking eofb scans from the kernel while under the iolock.
This is necessary because the eofb scan acquires the iolock of each
inode. Synchronous scans are invoked on certain buffered write failures
while under iolock. In such cases, the scan owner indicates that the
context for the scan already owns the particular iolock and prevents a
double lock deadlock.

eofblocks scans while under iolock are still livelock prone in the event
of multiple parallel scans, however. If multiple buffered writes to
different inodes fail and invoke eofblocks scans at the same time, each
scan avoids a deadlock with its own inode by virtue of the
eof_scan_owner field, but will never be able to acquire the iolock of
the inode from the parallel scan. Because the low free space scans are
invoked with SYNC_WAIT, the scan will not return until it has processed
every tagged inode and thus both scans will spin indefinitely on the
iolock being held across the opposite scan. This problem can be
reproduced reliably by generic/224 on systems with higher cpu counts
(x16).

To avoid this problem, simplify the semantics of eofblocks scans to
never invoke a scan while under iolock. This means that the buffered
write context must drop the iolock before the scan. It must reacquire
the lock before the write retry and also repeat the initial write
checks, as the original state might no longer be valid once the iolock
was dropped.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: pull up iolock from xfs_free_eofblocks()
Brian Foster [Sat, 28 Jan 2017 07:22:55 +0000 (23:22 -0800)]
xfs: pull up iolock from xfs_free_eofblocks()

commit a36b926180cda375ac2ec89e1748b47137cfc51c upstream.

xfs_free_eofblocks() requires the IOLOCK_EXCL lock, but is called from
different contexts where the lock may or may not be held. The
need_iolock parameter exists for this reason, to indicate whether
xfs_free_eofblocks() must acquire the iolock itself before it can
proceed.

This is ugly and confusing. Simplify the semantics of
xfs_free_eofblocks() to require the caller to acquire the iolock
appropriately and kill the need_iolock parameter. While here, the mp
param can be removed as well as the xfs_mount is accessible from the
xfs_inode structure. This patch does not change behavior.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: use per-AG reservations for the finobt
Christoph Hellwig [Wed, 25 Jan 2017 15:49:35 +0000 (07:49 -0800)]
xfs: use per-AG reservations for the finobt

commit 76d771b4cbe33c581bd6ca2710c120be51172440 upstream.

Currently we try to rely on the global reserved block pool for block
allocations for the free inode btree, but I have customer reports
(fairly complex workload, need to find an easier reproducer) where that
is not enough as the AG where we free an inode that requires a new
finobt block is entirely full.  This causes us to cancel a dirty
transaction and thus a file system shutdown.

I think the right way to guard against this is to treat the finot the same
way as the refcount btree and have a per-AG reservations for the possible
worst case size of it, and the patch below implements that.

Note that this could increase mount times with large finobt trees.  In
an ideal world we would have added a field for the number of finobt
fields to the AGI, similar to what we did for the refcount blocks.
We should do add it next time we rev the AGI or AGF format by adding
new fields.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfs: only update mount/resv fields on success in __xfs_ag_resv_init
Christoph Hellwig [Wed, 25 Jan 2017 15:49:34 +0000 (07:49 -0800)]
xfs: only update mount/resv fields on success in __xfs_ag_resv_init

commit 4dfa2b84118fd6c95202ae87e62adf5000ccd4d0 upstream.

Try to reserve the blocks first and only then update the fields in
or hanging off the mount structure.  This way we can call __xfs_ag_resv_init
again after a previous failure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxen/setup: Don't relocate p2m over existing one
Ross Lagerwall [Mon, 12 Dec 2016 14:35:13 +0000 (14:35 +0000)]
xen/setup: Don't relocate p2m over existing one

commit 7ecec8503af37de6be4f96b53828d640a968705f upstream.

When relocating the p2m, take special care not to relocate it so
that is overlaps with the current location of the p2m/initrd. This is
needed since the full extent of the current location is not marked as a
reserved region in the e820.

This was seen to happen to a dom0 with a large initial p2m and a small
reserved region in the middle of the initial p2m.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agolibceph: force GFP_NOIO for socket allocations
Ilya Dryomov [Tue, 21 Mar 2017 12:44:28 +0000 (13:44 +0100)]
libceph: force GFP_NOIO for socket allocations

commit 633ee407b9d15a75ac9740ba9d3338815e1fcb95 upstream.

sock_alloc_inode() allocates socket+inode and socket_wq with
GFP_KERNEL, which is not allowed on the writeback path:

    Workqueue: ceph-msgr con_work [libceph]
    ffff8810871cb018 0000000000000046 0000000000000000 ffff881085d40000
    0000000000012b00 ffff881025cad428 ffff8810871cbfd8 0000000000012b00
    ffff880102fc1000 ffff881085d40000 ffff8810871cb038 ffff8810871cb148
    Call Trace:
    [<ffffffff816dd629>] schedule+0x29/0x70
    [<ffffffff816e066d>] schedule_timeout+0x1bd/0x200
    [<ffffffff81093ffc>] ? ttwu_do_wakeup+0x2c/0x120
    [<ffffffff81094266>] ? ttwu_do_activate.constprop.135+0x66/0x70
    [<ffffffff816deb5f>] wait_for_completion+0xbf/0x180
    [<ffffffff81097cd0>] ? try_to_wake_up+0x390/0x390
    [<ffffffff81086335>] flush_work+0x165/0x250
    [<ffffffff81082940>] ? worker_detach_from_pool+0xd0/0xd0
    [<ffffffffa03b65b1>] xlog_cil_force_lsn+0x81/0x200 [xfs]
    [<ffffffff816d6b42>] ? __slab_free+0xee/0x234
    [<ffffffffa03b4b1d>] _xfs_log_force_lsn+0x4d/0x2c0 [xfs]
    [<ffffffff811adc1e>] ? lookup_page_cgroup_used+0xe/0x30
    [<ffffffffa039a723>] ? xfs_reclaim_inode+0xa3/0x330 [xfs]
    [<ffffffffa03b4dcf>] xfs_log_force_lsn+0x3f/0xf0 [xfs]
    [<ffffffffa039a723>] ? xfs_reclaim_inode+0xa3/0x330 [xfs]
    [<ffffffffa03a62c6>] xfs_iunpin_wait+0xc6/0x1a0 [xfs]
    [<ffffffff810aa250>] ? wake_atomic_t_function+0x40/0x40
    [<ffffffffa039a723>] xfs_reclaim_inode+0xa3/0x330 [xfs]
    [<ffffffffa039ac07>] xfs_reclaim_inodes_ag+0x257/0x3d0 [xfs]
    [<ffffffffa039bb13>] xfs_reclaim_inodes_nr+0x33/0x40 [xfs]
    [<ffffffffa03ab745>] xfs_fs_free_cached_objects+0x15/0x20 [xfs]
    [<ffffffff811c0c18>] super_cache_scan+0x178/0x180
    [<ffffffff8115912e>] shrink_slab_node+0x14e/0x340
    [<ffffffff811afc3b>] ? mem_cgroup_iter+0x16b/0x450
    [<ffffffff8115af70>] shrink_slab+0x100/0x140
    [<ffffffff8115e425>] do_try_to_free_pages+0x335/0x490
    [<ffffffff8115e7f9>] try_to_free_pages+0xb9/0x1f0
    [<ffffffff816d56e4>] ? __alloc_pages_direct_compact+0x69/0x1be
    [<ffffffff81150cba>] __alloc_pages_nodemask+0x69a/0xb40
    [<ffffffff8119743e>] alloc_pages_current+0x9e/0x110
    [<ffffffff811a0ac5>] new_slab+0x2c5/0x390
    [<ffffffff816d71c4>] __slab_alloc+0x33b/0x459
    [<ffffffff815b906d>] ? sock_alloc_inode+0x2d/0xd0
    [<ffffffff8164bda1>] ? inet_sendmsg+0x71/0xc0
    [<ffffffff815b906d>] ? sock_alloc_inode+0x2d/0xd0
    [<ffffffff811a21f2>] kmem_cache_alloc+0x1a2/0x1b0
    [<ffffffff815b906d>] sock_alloc_inode+0x2d/0xd0
    [<ffffffff811d8566>] alloc_inode+0x26/0xa0
    [<ffffffff811da04a>] new_inode_pseudo+0x1a/0x70
    [<ffffffff815b933e>] sock_alloc+0x1e/0x80
    [<ffffffff815ba855>] __sock_create+0x95/0x220
    [<ffffffff815baa04>] sock_create_kern+0x24/0x30
    [<ffffffffa04794d9>] con_work+0xef9/0x2050 [libceph]
    [<ffffffffa04aa9ec>] ? rbd_img_request_submit+0x4c/0x60 [rbd]
    [<ffffffff81084c19>] process_one_work+0x159/0x4f0
    [<ffffffff8108561b>] worker_thread+0x11b/0x530
    [<ffffffff81085500>] ? create_worker+0x1d0/0x1d0
    [<ffffffff8108b6f9>] kthread+0xc9/0xe0
    [<ffffffff8108b630>] ? flush_kthread_worker+0x90/0x90
    [<ffffffff816e1b98>] ret_from_fork+0x58/0x90
    [<ffffffff8108b630>] ? flush_kthread_worker+0x90/0x90

Use memalloc_noio_{save,restore}() to temporarily force GFP_NOIO here.

Link: http://tracker.ceph.com/issues/19309
Reported-by: Sergey Jerusalimov <wintchester@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoLinux 4.9.20
Greg Kroah-Hartman [Fri, 31 Mar 2017 08:32:02 +0000 (10:32 +0200)]
Linux 4.9.20

7 years agousb: musb: fix possible spinlock deadlock
Bin Liu [Fri, 10 Mar 2017 20:43:37 +0000 (14:43 -0600)]
usb: musb: fix possible spinlock deadlock

commit bc1e2154542071e3cfe1734b143af9b8bdacf8bd upstream.

The DSPS glue calls del_timer_sync() in its musb_platform_disable()
implementation, which requires the caller to not hold a lock. But
musb_remove() calls musb_platform_disable() will musb->lock held. This
could causes spinlock deadlock.

So change musb_remove() to call musb_platform_disable() without holds
musb->lock. This doesn't impact the musb_platform_disable implementation
in other glue drivers.

root@am335x-evm:~# modprobe -r musb-dsps
[  126.134879] musb-hdrc musb-hdrc.1: remove, state 1
[  126.140465] usb usb2: USB disconnect, device number 1
[  126.146178] usb 2-1: USB disconnect, device number 2
[  126.416985] musb-hdrc musb-hdrc.1: USB bus 2 deregistered
[  126.423943]
[  126.425525] ======================================================
[  126.431997] [ INFO: possible circular locking dependency detected ]
[  126.438564] 4.11.0-rc1-00003-g1557f13bca04-dirty #77 Not tainted
[  126.444852] -------------------------------------------------------
[  126.451414] modprobe/778 is trying to acquire lock:
[  126.456523]  (((&glue->timer))){+.-...}, at: [<c01b8788>] del_timer_sync+0x0/0xd0
[  126.464403]
[  126.464403] but task is already holding lock:
[  126.470511]  (&(&musb->lock)->rlock){-.-...}, at: [<bf30b7f8>] musb_remove+0x50/0x1
30 [musb_hdrc]
[  126.479965]
[  126.479965] which lock already depends on the new lock.
[  126.479965]
[  126.488531]
[  126.488531] the existing dependency chain (in reverse order) is:
[  126.496368]
[  126.496368] -> #1 (&(&musb->lock)->rlock){-.-...}:
[  126.502968]        otg_timer+0x80/0xec [musb_dsps]
[  126.507990]        call_timer_fn+0xb4/0x390
[  126.512372]        expire_timers+0xf0/0x1fc
[  126.516754]        run_timer_softirq+0x80/0x178
[  126.521511]        __do_softirq+0xc4/0x554
[  126.525802]        irq_exit+0xe8/0x158
[  126.529735]        __handle_domain_irq+0x58/0xb8
[  126.534583]        __irq_usr+0x54/0x80
[  126.538507]
[  126.538507] -> #0 (((&glue->timer))){+.-...}:
[  126.544636]        del_timer_sync+0x40/0xd0
[  126.549066]        musb_remove+0x6c/0x130 [musb_hdrc]
[  126.554370]        platform_drv_remove+0x24/0x3c
[  126.559206]        device_release_driver_internal+0x14c/0x1e0
[  126.565225]        bus_remove_device+0xd8/0x108
[  126.569970]        device_del+0x1e4/0x308
[  126.574170]        platform_device_del+0x24/0x8c
[  126.579006]        platform_device_unregister+0xc/0x20
[  126.584394]        dsps_remove+0x14/0x30 [musb_dsps]
[  126.589595]        platform_drv_remove+0x24/0x3c
[  126.594432]        device_release_driver_internal+0x14c/0x1e0
[  126.600450]        driver_detach+0x38/0x6c
[  126.604740]        bus_remove_driver+0x4c/0xa0
[  126.609407]        SyS_delete_module+0x11c/0x1e4
[  126.614252]        __sys_trace_return+0x0/0x10

Fixes: ea2f35c01d5ea ("usb: musb: Fix sleeping function called from invalid context for hdrc glue")
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Bin Liu <b-liu@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agosched/rt: Add a missing rescheduling point
Sebastian Andrzej Siewior [Tue, 24 Jan 2017 14:40:06 +0000 (15:40 +0100)]
sched/rt: Add a missing rescheduling point

commit 619bd4a71874a8fd78eb6ccf9f272c5e98bcc7b7 upstream.

Since the change in commit:

  fd7a4bed1835 ("sched, rt: Convert switched_{from, to}_rt() / prio_changed_rt() to balance callbacks")

... we don't reschedule a task under certain circumstances:

Lets say task-A, SCHED_OTHER, is running on CPU0 (and it may run only on
CPU0) and holds a PI lock. This task is removed from the CPU because it
used up its time slice and another SCHED_OTHER task is running. Task-B on
CPU1 runs at RT priority and asks for the lock owned by task-A. This
results in a priority boost for task-A. Task-B goes to sleep until the
lock has been made available. Task-A is already runnable (but not active),
so it receives no wake up.

The reality now is that task-A gets on the CPU once the scheduler decides
to remove the current task despite the fact that a high priority task is
enqueued and waiting. This may take a long time.

The desired behaviour is that CPU0 immediately reschedules after the
priority boost which made task-A the task with the lowest priority.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: fd7a4bed1835 ("sched, rt: Convert switched_{from, to}_rt() prio_changed_rt() to balance callbacks")
Link: http://lkml.kernel.org/r/20170124144006.29821-1-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agofscrypt: remove broken support for detecting keyring key revocation
Eric Biggers [Tue, 21 Feb 2017 23:07:11 +0000 (15:07 -0800)]
fscrypt: remove broken support for detecting keyring key revocation

commit 1b53cf9815bb4744958d41f3795d5d5a1d365e2d upstream.

Filesystem encryption ostensibly supported revoking a keyring key that
had been used to "unlock" encrypted files, causing those files to become
"locked" again.  This was, however, buggy for several reasons, the most
severe of which was that when key revocation happened to be detected for
an inode, its fscrypt_info was immediately freed, even while other
threads could be using it for encryption or decryption concurrently.
This could be exploited to crash the kernel or worse.

This patch fixes the use-after-free by removing the code which detects
the keyring key having been revoked, invalidated, or expired.  Instead,
an encrypted inode that is "unlocked" now simply remains unlocked until
it is evicted from memory.  Note that this is no worse than the case for
block device-level encryption, e.g. dm-crypt, and it still remains
possible for a privileged user to evict unused pages, inodes, and
dentries by running 'sync; echo 3 > /proc/sys/vm/drop_caches', or by
simply unmounting the filesystem.  In fact, one of those actions was
already needed anyway for key revocation to work even somewhat sanely.
This change is not expected to break any applications.

In the future I'd like to implement a real API for fscrypt key
revocation that interacts sanely with ongoing filesystem operations ---
waiting for existing operations to complete and blocking new operations,
and invalidating and sanitizing key material and plaintext from the VFS
caches.  But this is a hard problem, and for now this bug must be fixed.

This bug affected almost all versions of ext4, f2fs, and ubifs
encryption, and it was potentially reachable in any kernel configured
with encryption support (CONFIG_EXT4_ENCRYPTION=y,
CONFIG_EXT4_FS_ENCRYPTION=y, CONFIG_F2FS_FS_ENCRYPTION=y, or
CONFIG_UBIFS_FS_ENCRYPTION=y).  Note that older kernels did not use the
shared fs/crypto/ code, but due to the potential security implications
of this bug, it may still be worthwhile to backport this fix to them.

Fixes: b7236e21d55f ("ext4 crypto: reorganize how we store keys in the inode")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Acked-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agometag/ptrace: Reject partial NT_METAG_RPIPE writes
Dave Martin [Mon, 27 Mar 2017 14:10:57 +0000 (15:10 +0100)]
metag/ptrace: Reject partial NT_METAG_RPIPE writes

commit 7195ee3120d878259e8d94a5d9f808116f34d5ea upstream.

It's not clear what behaviour is sensible when doing partial write of
NT_METAG_RPIPE, so just don't bother.

This patch assumes that userspace will never rely on a partial SETREGSET
in this case, since it's not clear what should happen anyway.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agometag/ptrace: Provide default TXSTATUS for short NT_PRSTATUS
Dave Martin [Mon, 27 Mar 2017 14:10:56 +0000 (15:10 +0100)]
metag/ptrace: Provide default TXSTATUS for short NT_PRSTATUS

commit 5fe81fe98123ce41265c65e95d34418d30d005d1 upstream.

Ensure that if userspace supplies insufficient data to PTRACE_SETREGSET
to fill TXSTATUS, a well-defined default value is used, based on the
task's current value.

Suggested-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agometag/ptrace: Preserve previous registers for short regset write
Dave Martin [Mon, 27 Mar 2017 14:10:55 +0000 (15:10 +0100)]
metag/ptrace: Preserve previous registers for short regset write

commit a78ce80d2c9178351b34d78fec805140c29c193e upstream.

Ensure that if userspace supplies insufficient data to PTRACE_SETREGSET
to fill all the registers, the thread's old registers are preserved.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agosparc/ptrace: Preserve previous registers for short regset write
Dave Martin [Mon, 27 Mar 2017 14:10:59 +0000 (15:10 +0100)]
sparc/ptrace: Preserve previous registers for short regset write

commit d3805c546b275c8cc7d40f759d029ae92c7175f2 upstream.

Ensure that if userspace supplies insufficient data to PTRACE_SETREGSET
to fill all the registers, the thread's old registers are preserved.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agomips/ptrace: Preserve previous registers for short regset write
Dave Martin [Mon, 27 Mar 2017 14:10:58 +0000 (15:10 +0100)]
mips/ptrace: Preserve previous registers for short regset write

commit d614fd58a2834cfe4efa472c33c8f3ce2338b09b upstream.

Ensure that if userspace supplies insufficient data to PTRACE_SETREGSET
to fill all the registers, the thread's old registers are preserved.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoh8300/ptrace: Fix incorrect register transfer count
Dave Martin [Mon, 27 Mar 2017 14:10:54 +0000 (15:10 +0100)]
h8300/ptrace: Fix incorrect register transfer count

commit 502585c7555083d4a949c08350306b9ec196779e upstream.

regs_set() and regs_get() are vulnerable to an off-by-1 buffer overrun
if CONFIG_CPU_H8S is set, since this adds an extra entry to
register_offset[] but not to user_regs_struct.

So, iterate over user_regs_struct based on its actual size, not based on
the length of register_offset[].

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoc6x/ptrace: Remove useless PTRACE_SETREGSET implementation
Dave Martin [Mon, 27 Mar 2017 14:10:53 +0000 (15:10 +0100)]
c6x/ptrace: Remove useless PTRACE_SETREGSET implementation

commit fb411b837b587a32046dc4f369acb93a10b1def8 upstream.

gpr_set won't work correctly and can never have been tested, and the
correct behaviour is not clear due to the endianness-dependent task
layout.

So, just remove it.  The core code will now return -EOPNOTSUPPORT when
trying to set NT_PRSTATUS on this architecture until/unless a correct
implementation is supplied.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agopinctrl: qcom: Don't clear status bit on irq_unmask
Bjorn Andersson [Tue, 14 Mar 2017 15:23:26 +0000 (08:23 -0700)]
pinctrl: qcom: Don't clear status bit on irq_unmask

commit a6566710adaa4a7dd5e0d99820ff9c9c30ee5951 upstream.

Clearing the status bit on irq_unmask will discard any pending interrupt
that did arrive after the irq_ack, i.e. while the IRQ handler function
was executing.

Fixes: f365be092572 ("pinctrl: Add Qualcomm TLMM driver")
Cc: Stephen Boyd <sboyd@codeaurora.org>
Reported-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agovirtio_balloon: init 1st buffer in stats vq
Ladi Prosek [Thu, 23 Mar 2017 07:04:18 +0000 (08:04 +0100)]
virtio_balloon: init 1st buffer in stats vq

commit fc8653228c8588a120f6b5dad6983b7b61ff669e upstream.

When init_vqs runs, virtio_balloon.stats is either uninitialized or
contains stale values. The host updates its state with garbage data
because it has no way of knowing that this is just a marker buffer
used for signaling.

This patch updates the stats before pushing the initial buffer.

Alternative fixes:
* Push an empty buffer in init_vqs. Not easily done with the current
  virtio implementation and violates the spec "Driver MUST supply the
  same subset of statistics in all buffers submitted to the statsq".
* Push a buffer with invalid tags in init_vqs. Violates the same
  spec clause, plus "invalid tag" is not really defined.

Note: the spec says:
When using the legacy interface, the device SHOULD ignore all values in
the first buffer in the statsq supplied by the driver after device
initialization. Note: Historically, drivers supplied an uninitialized
buffer in the first buffer.

Unfortunately QEMU does not seem to implement the recommendation
even for the legacy interface.

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoKVM: x86: cleanup the page tracking SRCU instance
Paolo Bonzini [Mon, 27 Mar 2017 15:53:50 +0000 (17:53 +0200)]
KVM: x86: cleanup the page tracking SRCU instance

commit 2beb6dad2e8f95d710159d5befb390e4f62ab5cf upstream.

SRCU uses a delayed work item.  Skip cleaning it up, and
the result is use-after-free in the work item callbacks.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Suggested-by: Dmitry Vyukov <dvyukov@google.com>
Fixes: 0eb05bf290cfe8610d9680b49abef37febd1c38a
Reviewed-by: Xiao Guangrong <xiaoguangrong.eric@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfrm_user: validate XFRM_MSG_NEWAE incoming ESN size harder
Andy Whitcroft [Thu, 23 Mar 2017 07:45:44 +0000 (07:45 +0000)]
xfrm_user: validate XFRM_MSG_NEWAE incoming ESN size harder

commit f843ee6dd019bcece3e74e76ad9df0155655d0df upstream.

Kees Cook has pointed out that xfrm_replay_state_esn_len() is subject to
wrapping issues.  To ensure we are correctly ensuring that the two ESN
structures are the same size compare both the overall size as reported
by xfrm_replay_state_esn_len() and the internal length are the same.

CVE-2017-7184
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfrm_user: validate XFRM_MSG_NEWAE XFRMA_REPLAY_ESN_VAL replay_window
Andy Whitcroft [Wed, 22 Mar 2017 07:29:31 +0000 (07:29 +0000)]
xfrm_user: validate XFRM_MSG_NEWAE XFRMA_REPLAY_ESN_VAL replay_window

commit 677e806da4d916052585301785d847c3b3e6186a upstream.

When a new xfrm state is created during an XFRM_MSG_NEWSA call we
validate the user supplied replay_esn to ensure that the size is valid
and to ensure that the replay_window size is within the allocated
buffer.  However later it is possible to update this replay_esn via a
XFRM_MSG_NEWAE call.  There we again validate the size of the supplied
buffer matches the existing state and if so inject the contents.  We do
not at this point check that the replay_window is within the allocated
memory.  This leads to out-of-bounds reads and writes triggered by
netlink packets.  This leads to memory corruption and the potential for
priviledge escalation.

We already attempt to validate the incoming replay information in
xfrm_new_ae() via xfrm_replay_verify_len().  This confirms that the user
is not trying to change the size of the replay state buffer which
includes the replay_esn.  It however does not check the replay_window
remains within that buffer.  Add validation of the contained
replay_window.

CVE-2017-7184
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxfrm: policy: init locks early
Florian Westphal [Wed, 8 Feb 2017 10:52:29 +0000 (11:52 +0100)]
xfrm: policy: init locks early

commit c282222a45cb9503cbfbebfdb60491f06ae84b49 upstream.

Dmitry reports following splat:
 INFO: trying to register non-static key.
 the code is fine but needs lockdep annotation.
 turning off the locking correctness validator.
 CPU: 0 PID: 13059 Comm: syz-executor1 Not tainted 4.10.0-rc7-next-20170207 #1
[..]
 spin_lock_bh include/linux/spinlock.h:304 [inline]
 xfrm_policy_flush+0x32/0x470 net/xfrm/xfrm_policy.c:963
 xfrm_policy_fini+0xbf/0x560 net/xfrm/xfrm_policy.c:3041
 xfrm_net_init+0x79f/0x9e0 net/xfrm/xfrm_policy.c:3091
 ops_init+0x10a/0x530 net/core/net_namespace.c:115
 setup_net+0x2ed/0x690 net/core/net_namespace.c:291
 copy_net_ns+0x26c/0x530 net/core/net_namespace.c:396
 create_new_namespaces+0x409/0x860 kernel/nsproxy.c:106
 unshare_nsproxy_namespaces+0xae/0x1e0 kernel/nsproxy.c:205
 SYSC_unshare kernel/fork.c:2281 [inline]

Problem is that when we get error during xfrm_net_init we will call
xfrm_policy_fini which will acquire xfrm_policy_lock before it was
initialized.  Just move it around so locks get set up first.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Fixes: 283bc9f35bbbcb0e9 ("xfrm: Namespacify xfrm state/policy locks")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoLinux 4.9.19
Greg Kroah-Hartman [Thu, 30 Mar 2017 07:41:57 +0000 (09:41 +0200)]
Linux 4.9.19

7 years agocrypto: algif_hash - avoid zero-sized array
Jiri Slaby [Thu, 15 Dec 2016 13:31:01 +0000 (14:31 +0100)]
crypto: algif_hash - avoid zero-sized array

commit 6207119444595d287b1e9e83a2066c17209698f3 upstream.

With this reproducer:
  struct sockaddr_alg alg = {
          .salg_family = 0x26,
          .salg_type = "hash",
          .salg_feat = 0xf,
          .salg_mask = 0x5,
          .salg_name = "digest_null",
  };
  int sock, sock2;

  sock = socket(AF_ALG, SOCK_SEQPACKET, 0);
  bind(sock, (struct sockaddr *)&alg, sizeof(alg));
  sock2 = accept(sock, NULL, NULL);
  setsockopt(sock, SOL_ALG, ALG_SET_KEY, "\x9b\xca", 2);
  accept(sock2, NULL, NULL);

==== 8< ======== 8< ======== 8< ======== 8< ====

one can immediatelly see an UBSAN warning:
UBSAN: Undefined behaviour in crypto/algif_hash.c:187:7
variable length array bound value 0 <= 0
CPU: 0 PID: 15949 Comm: syz-executor Tainted: G            E      4.4.30-0-default #1
...
Call Trace:
...
 [<ffffffff81d598fd>] ? __ubsan_handle_vla_bound_not_positive+0x13d/0x188
 [<ffffffff81d597c0>] ? __ubsan_handle_out_of_bounds+0x1bc/0x1bc
 [<ffffffffa0e2204d>] ? hash_accept+0x5bd/0x7d0 [algif_hash]
 [<ffffffffa0e2293f>] ? hash_accept_nokey+0x3f/0x51 [algif_hash]
 [<ffffffffa0e206b0>] ? hash_accept_parent_nokey+0x4a0/0x4a0 [algif_hash]
 [<ffffffff8235c42b>] ? SyS_accept+0x2b/0x40

It is a correct warning, as hash state is propagated to accept as zero,
but creating a zero-length variable array is not allowed in C.

Fix this as proposed by Herbert -- do "?: 1" on that site. No sizeof or
similar happens in the code there, so we just allocate one byte even
though we do not use the array.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net> (maintainer:CRYPTO API)
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agofbcon: Fix vc attr at deinit
Takashi Iwai [Wed, 11 Jan 2017 16:09:50 +0000 (17:09 +0100)]
fbcon: Fix vc attr at deinit

commit 8aac7f34369726d1a158788ae8aff3002d5eb528 upstream.

fbcon can deal with vc_hi_font_mask (the upper 256 chars) and adjust
the vc attrs dynamically when vc_hi_font_mask is changed at
fbcon_init().  When the vc_hi_font_mask is set, it remaps the attrs in
the existing console buffer with one bit shift up (for 9 bits), while
it remaps with one bit shift down (for 8 bits) when the value is
cleared.  It works fine as long as the font gets updated after fbcon
was initialized.

However, we hit a bizarre problem when the console is switched to
another fb driver (typically from vesafb or efifb to drmfb).  At
switching to the new fb driver, we temporarily rebind the console to
the dummy console, then rebind to the new driver.  During the
switching, we leave the modified attrs as is.  Thus, the new fbcon
takes over the old buffer as if it were to contain 8 bits chars
(although the attrs are still shifted for 9 bits), and effectively
this results in the yellow color texts instead of the original white
color, as found in the bugzilla entry below.

An easy fix for this is to re-adjust the attrs before leaving the
fbcon at con_deinit callback.  Since the code to adjust the attrs is
already present in the current fbcon code, in this patch, we simply
factor out the relevant code, and call it from fbcon_deinit().

Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1000619
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodrm: reference count event->completion
Daniel Vetter [Wed, 21 Dec 2016 10:23:30 +0000 (11:23 +0100)]
drm: reference count event->completion

commit 24835e442f289813aa568d142a755672a740503c upstream.

When writing the generic nonblocking commit code I assumed that
through clever lifetime management I can assure that the completion
(stored in drm_crtc_commit) only gets freed after it is completed. And
that worked.

I also wanted to make nonblocking helpers resilient against driver
bugs, by having timeouts everywhere. And that worked too.

Unfortunately taking boths things together results in oopses :( Well,
at least sometimes: What seems to happen is that the drm event hangs
around forever stuck in limbo land. The nonblocking helpers eventually
time out, move on and release it. Now the bug I tested all this
against is drivers that just entirely fail to deliver the vblank
events like they should, and in those cases the event is simply
leaked. But what seems to happen, at least sometimes, on i915 is that
the event is set up correctly, but somohow the vblank fails to fire in
time. Which means the event isn't leaked, it's still there waiting for
eventually a vblank to fire. That tends to happen when re-enabling the
pipe, and then the trap springs and the kernel oopses.

The correct fix here is simply to refcount the crtc commit to make
sure that the event sticks around even for drivers which only
sometimes fail to deliver vblanks for some arbitrary reasons. Since
crtc commits are already refcounted that's easy to do.

References: https://bugs.freedesktop.org/show_bug.cgi?id=96781
Cc: Jim Rees <rees@umich.edu>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161221102331.31033-1-daniel.vetter@ffwll.ch
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonl80211: fix dumpit error path RTNL deadlocks
Johannes Berg [Wed, 15 Mar 2017 13:26:04 +0000 (14:26 +0100)]
nl80211: fix dumpit error path RTNL deadlocks

commit ea90e0dc8cecba6359b481e24d9c37160f6f524f upstream.

Sowmini pointed out Dmitry's RTNL deadlock report to me, and it turns out
to be perfectly accurate - there are various error paths that miss unlock
of the RTNL.

To fix those, change the locking a bit to not be conditional in all those
nl80211_prepare_*_dump() functions, but make those require the RTNL to
start with, and fix the buggy error paths. This also let me use sparse
(by appropriately overriding the rtnl_lock/rtnl_unlock functions) to
validate the changes.

Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodrm/bridge: analogix dp: Fix runtime PM state on driver bind
Marek Szyprowski [Fri, 30 Dec 2016 09:57:46 +0000 (10:57 +0100)]
drm/bridge: analogix dp: Fix runtime PM state on driver bind

commit f0a8b49c03d22a511a601dc54b2a3425a41e35fa upstream.

Analogix_dp_bind() can be called from component framework, which doesn't
guarantee proper runtime PM state of the device during bind operation,
so ensure that device is runtime active before doing any register access.
This ensures that the power domain, to which DP module belongs, is turned
on. While at it, also fix the unbalanced call to phy_power_on() in
analogix_dp_bind() function.

This patch solves the following kernel oops on Samsung Exynos5250 Snow
board:

Unhandled fault: imprecise external abort (0x406) at 0x00000000
pgd = c0004000
[00000000] *pgd=00000000
Internal error: : 406 [#1] PREEMPT SMP ARM
Modules linked in:
CPU: 0 PID: 75 Comm: kworker/0:2 Not tainted 4.9.0 #1046
Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
Workqueue: events deferred_probe_work_func
task: ee272300 task.stack: ee312000
PC is at analogix_dp_enable_sw_function+0x18/0x2c
LR is at analogix_dp_init_dp+0x2c/0x50
...
[<c03fcb38>] (analogix_dp_enable_sw_function) from [<c03fa9c4>] (analogix_dp_init_dp+0x2c/0x50)
[<c03fa9c4>] (analogix_dp_init_dp) from [<c03fab6c>] (analogix_dp_bind+0x184/0x42c)
[<c03fab6c>] (analogix_dp_bind) from [<c03fdb84>] (component_bind_all+0xf0/0x218)
[<c03fdb84>] (component_bind_all) from [<c03ed64c>] (exynos_drm_load+0x134/0x200)
[<c03ed64c>] (exynos_drm_load) from [<c03d5058>] (drm_dev_register+0xa0/0xd0)
[<c03d5058>] (drm_dev_register) from [<c03d66b8>] (drm_platform_init+0x58/0xb0)
[<c03d66b8>] (drm_platform_init) from [<c03fe0c4>] (try_to_bring_up_master+0x14c/0x188)
[<c03fe0c4>] (try_to_bring_up_master) from [<c03fe188>] (component_add+0x88/0x138)
[<c03fe188>] (component_add) from [<c0403a38>] (platform_drv_probe+0x50/0xb0)
[<c0403a38>] (platform_drv_probe) from [<c0402470>] (driver_probe_device+0x1f0/0x2a8)
[<c0402470>] (driver_probe_device) from [<c0400a54>] (bus_for_each_drv+0x44/0x8c)
[<c0400a54>] (bus_for_each_drv) from [<c04021f8>] (__device_attach+0x9c/0x100)
[<c04021f8>] (__device_attach) from [<c04018e8>] (bus_probe_device+0x84/0x8c)
[<c04018e8>] (bus_probe_device) from [<c0401d1c>] (deferred_probe_work_func+0x60/0x8c)
[<c0401d1c>] (deferred_probe_work_func) from [<c012fc14>] (process_one_work+0x120/0x318)
[<c012fc14>] (process_one_work) from [<c012fe34>] (process_scheduled_works+0x28/0x38)
[<c012fe34>] (process_scheduled_works) from [<c0130048>] (worker_thread+0x204/0x4ac)
[<c0130048>] (worker_thread) from [<c01352c4>] (kthread+0xd8/0xf4)
[<c01352c4>] (kthread) from [<c0107978>] (ret_from_fork+0x14/0x3c)
Code: e59035f0 e5935018 f57ff04f e3c55001 (f57ff04e)
---[ end trace 3d1d0d87796de344 ]---

Reviewed-by: Sean Paul <seanpaul@chromium.org>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Archit Taneja <architt@codeaurora.org>
Link: http://patchwork.freedesktop.org/patch/msgid/1483091866-1088-1-git-send-email-m.szyprowski@samsung.com
Cc: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodevice-dax: fix pmd/pte fault fallback handling
Dave Jiang [Fri, 10 Mar 2017 20:24:22 +0000 (13:24 -0700)]
device-dax: fix pmd/pte fault fallback handling

commit 0134ed4fb9e78672ee9f7b18007114404c81e63f upstream.

Jeff Moyer reports:

    With a device dax alignment of 4KB or 2MB, I get sigbus when running
    the attached fio job file for the current kernel (4.11.0-rc1+).  If
    I specify an alignment of 1GB, it works.

    I turned on debug output, and saw that it was failing in the huge
    fault code.

     dax dax1.0: dax_open
     dax dax1.0: dax_mmap
     dax dax1.0: dax_dev_huge_fault: fio: write (0x7f08f0a00000 -
     dax dax1.0: __dax_dev_pud_fault: phys_to_pgoff(0xffffffffcf60
     dax dax1.0: dax_release

    fio config for reproduce:
    [global]
    ioengine=dev-dax
    direct=0
    filename=/dev/dax0.0
    bs=2m

    [write]
    rw=write

    [read]
    stonewall
    rw=read

The driver fails to fallback when taking a fault that is larger than
the device alignment, or handling a larger fault when a smaller
mapping is already established. While we could support larger
mappings for a device with a smaller alignment, that change is
too large for the immediate fix. The simplest change is to force
fallback until the fault size matches the alignment.

Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap")
Cc: <stable@vger.kernel.org>
Reported-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agolibceph: don't set weight to IN when OSD is destroyed
Ilya Dryomov [Wed, 1 Mar 2017 16:33:27 +0000 (17:33 +0100)]
libceph: don't set weight to IN when OSD is destroyed

commit b581a5854eee4b7851dedb0f8c2ceb54fb902c06 upstream.

Since ceph.git commit 4e28f9e63644 ("osd/OSDMap: clear osd_info,
osd_xinfo on osd deletion"), weight is set to IN when OSD is deleted.
This changes the result of applying an incremental for clients, not
just OSDs.  Because CRUSH computations are obviously affected,
pre-4e28f9e63644 servers disagree with post-4e28f9e63644 clients on
object placement, resulting in misdirected requests.

Mirrors ceph.git commit a6009d1039a55e2c77f431662b3d6cc5a8e8e63f.

Fixes: 930c53286977 ("libceph: apply new_state before new_up_client on incrementals")
Link: http://tracker.ceph.com/issues/19122
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoDrivers: hv: vmbus: Don't leak memory when a channel is rescinded
K. Y. Srinivasan [Mon, 13 Mar 2017 03:00:30 +0000 (20:00 -0700)]
Drivers: hv: vmbus: Don't leak memory when a channel is rescinded

commit 5e030d5ce9d99a899b648413139ff65bab12b038 upstream.

When we close a channel that has been rescinded, we will leak memory since
vmbus_teardown_gpadl() returns an error. Fix this so that we can properly
cleanup the memory allocated to the ring buffers.

Fixes: ccb61f8a99e6 ("Drivers: hv: vmbus: Fix a rescind handling bug")

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoDrivers: hv: vmbus: Don't leak channel ids
K. Y. Srinivasan [Mon, 13 Mar 2017 22:57:09 +0000 (15:57 -0700)]
Drivers: hv: vmbus: Don't leak channel ids

commit 9a5476020a5f06a0fc6f17097efc80275d2f03cd upstream.

If we cannot allocate memory for the channel, free the relid
associated with the channel.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agointel_th: Don't leak module refcount on failure to activate
Alexander Shishkin [Fri, 24 Feb 2017 14:04:15 +0000 (16:04 +0200)]
intel_th: Don't leak module refcount on failure to activate

commit e609ccef5222c73b46b322be7d3796d60bff353d upstream.

Output 'activation' may fail for the reasons of the output driver,
for example, if msc's buffer is not allocated. We forget, however,
to drop the module reference in this case. So each attempt at
activation in this case leaks a reference, preventing the module
from ever unloading.

This patch adds the missing module_put() in the activation error
path.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agojbd2: don't leak memory if setting up journal fails
Eric Biggers [Wed, 15 Mar 2017 19:08:48 +0000 (15:08 -0400)]
jbd2: don't leak memory if setting up journal fails

commit cd9cb405e0b948363811dc74dbb2890f56f2cb87 upstream.

In journal_init_common(), if we failed to allocate the j_wbuf array, or
if we failed to create the buffer_head for the journal superblock, we
leaked the memory allocated for the revocation tables.  Fix this.

Fixes: f0c9fd5458bacf7b12a9a579a727dc740cbe047e
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoauxdisplay: img-ascii-lcd: add missing sentinel entry in img_ascii_lcd_matches
Dmitry Torokhov [Mon, 20 Feb 2017 00:33:35 +0000 (16:33 -0800)]
auxdisplay: img-ascii-lcd: add missing sentinel entry in img_ascii_lcd_matches

commit abda288bb207e5c681306299126af8c022709c18 upstream.

The OF device table must be terminated, otherwise we'll be walking past it
and into areas unknown.

Fixes: 0cad855fbd08 ("auxdisplay: img-ascii-lcd: driver for simple ASCII...")
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Tested-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodrm/amdgpu: reinstate oland workaround for sclk
Alex Deucher [Thu, 16 Mar 2017 01:13:25 +0000 (21:13 -0400)]
drm/amdgpu: reinstate oland workaround for sclk

commit e11ddff68a7c455e63c4b46154a3e75c699a7b55 upstream.

Higher sclks seem to be unstable on some boards.

bug: https://bugs.freedesktop.org/show_bug.cgi?id=100222

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoblk-mq: don't complete un-started request in timeout handler
Ming Lei [Wed, 22 Mar 2017 02:14:43 +0000 (10:14 +0800)]
blk-mq: don't complete un-started request in timeout handler

commit 95a49603707d982b25d17c5b70e220a05556a2f9 upstream.

When iterating busy requests in timeout handler,
if the STARTED flag of one request isn't set, that means
the request is being processed in block layer or driver, and
isn't submitted to hardware yet.

In current implementation of blk_mq_check_expired(),
if the request queue becomes dying, un-started requests are
handled as being completed/freed immediately. This way is
wrong, and can cause rq corruption or double allocation[1][2],
when doing I/O and removing&resetting NVMe device at the sametime.

This patch fixes several issues reported by Yi Zhang.

[1]. oops log 1
[  581.789754] ------------[ cut here ]------------
[  581.789758] kernel BUG at block/blk-mq.c:374!
[  581.789760] invalid opcode: 0000 [#1] SMP
[  581.789761] Modules linked in: vfat fat ipmi_ssif intel_rapl sb_edac
edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm nvme
irqbypass crct10dif_pclmul nvme_core crc32_pclmul ghash_clmulni_intel
intel_cstate ipmi_si mei_me ipmi_devintf intel_uncore sg ipmi_msghandler
intel_rapl_perf iTCO_wdt mei iTCO_vendor_support mxm_wmi lpc_ich dcdbas shpchp
pcspkr acpi_power_meter wmi nfsd auth_rpcgss nfs_acl lockd dm_multipath grace
sunrpc ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper
syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ahci libahci
crc32c_intel tg3 libata megaraid_sas i2c_core ptp fjes pps_core dm_mirror
dm_region_hash dm_log dm_mod
[  581.789796] CPU: 1 PID: 1617 Comm: kworker/1:1H Not tainted 4.10.0.bz1420297+ #4
[  581.789797] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.2.5 09/06/2016
[  581.789804] Workqueue: kblockd blk_mq_timeout_work
[  581.789806] task: ffff8804721c8000 task.stack: ffffc90006ee4000
[  581.789809] RIP: 0010:blk_mq_end_request+0x58/0x70
[  581.789810] RSP: 0018:ffffc90006ee7d50 EFLAGS: 00010202
[  581.789811] RAX: 0000000000000001 RBX: ffff8802e4195340 RCX: ffff88028e2f4b88
[  581.789812] RDX: 0000000000001000 RSI: 0000000000001000 RDI: 0000000000000000
[  581.789813] RBP: ffffc90006ee7d60 R08: 0000000000000003 R09: ffff88028e2f4b00
[  581.789814] R10: 0000000000001000 R11: 0000000000000001 R12: 00000000fffffffb
[  581.789815] R13: ffff88042abe5780 R14: 000000000000002d R15: ffff88046fbdff80
[  581.789817] FS:  0000000000000000(0000) GS:ffff88047fc00000(0000) knlGS:0000000000000000
[  581.789818] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  581.789819] CR2: 00007f64f403a008 CR3: 000000014d078000 CR4: 00000000001406e0
[  581.789820] Call Trace:
[  581.789825]  blk_mq_check_expired+0x76/0x80
[  581.789828]  bt_iter+0x45/0x50
[  581.789830]  blk_mq_queue_tag_busy_iter+0xdd/0x1f0
[  581.789832]  ? blk_mq_rq_timed_out+0x70/0x70
[  581.789833]  ? blk_mq_rq_timed_out+0x70/0x70
[  581.789840]  ? __switch_to+0x140/0x450
[  581.789841]  blk_mq_timeout_work+0x88/0x170
[  581.789845]  process_one_work+0x165/0x410
[  581.789847]  worker_thread+0x137/0x4c0
[  581.789851]  kthread+0x101/0x140
[  581.789853]  ? rescuer_thread+0x3b0/0x3b0
[  581.789855]  ? kthread_park+0x90/0x90
[  581.789860]  ret_from_fork+0x2c/0x40
[  581.789861] Code: 48 85 c0 74 0d 44 89 e6 48 89 df ff d0 5b 41 5c 5d c3 48
8b bb 70 01 00 00 48 85 ff 75 0f 48 89 df e8 7d f0 ff ff 5b 41 5c 5d c3 <0f>
0b e8 71 f0 ff ff 90 eb e9 0f 1f 40 00 66 2e 0f 1f 84 00 00
[  581.789882] RIP: blk_mq_end_request+0x58/0x70 RSP: ffffc90006ee7d50
[  581.789889] ---[ end trace bcaf03d9a14a0a70 ]---

[2]. oops log2
[ 6984.857362] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[ 6984.857372] IP: nvme_queue_rq+0x6e6/0x8cd [nvme]
[ 6984.857373] PGD 0
[ 6984.857374]
[ 6984.857376] Oops: 0000 [#1] SMP
[ 6984.857379] Modules linked in: ipmi_ssif vfat fat intel_rapl sb_edac
edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm
irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ipmi_si iTCO_wdt
iTCO_vendor_support mxm_wmi ipmi_devintf intel_cstate sg dcdbas intel_uncore
mei_me intel_rapl_perf mei pcspkr lpc_ich ipmi_msghandler shpchp
acpi_power_meter wmi nfsd auth_rpcgss dm_multipath nfs_acl lockd grace sunrpc
ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea
sysfillrect crc32c_intel sysimgblt fb_sys_fops ttm nvme drm nvme_core ahci
libahci i2c_core tg3 libata ptp megaraid_sas pps_core fjes dm_mirror
dm_region_hash dm_log dm_mod
[ 6984.857416] CPU: 7 PID: 1635 Comm: kworker/7:1H Not tainted
4.10.0-2.el7.bz1420297.x86_64 #1
[ 6984.857417] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.2.5 09/06/2016
[ 6984.857427] Workqueue: kblockd blk_mq_run_work_fn
[ 6984.857429] task: ffff880476e3da00 task.stack: ffffc90002e90000
[ 6984.857432] RIP: 0010:nvme_queue_rq+0x6e6/0x8cd [nvme]
[ 6984.857433] RSP: 0018:ffffc90002e93c50 EFLAGS: 00010246
[ 6984.857434] RAX: 0000000000000000 RBX: ffff880275646600 RCX: 0000000000001000
[ 6984.857435] RDX: 0000000000000fff RSI: 00000002fba2a000 RDI: ffff8804734e6950
[ 6984.857436] RBP: ffffc90002e93d30 R08: 0000000000002000 R09: 0000000000001000
[ 6984.857437] R10: 0000000000001000 R11: 0000000000000000 R12: ffff8804741d8000
[ 6984.857438] R13: 0000000000000040 R14: ffff880475649f80 R15: ffff8804734e6780
[ 6984.857439] FS:  0000000000000000(0000) GS:ffff88047fcc0000(0000) knlGS:0000000000000000
[ 6984.857440] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6984.857442] CR2: 0000000000000010 CR3: 0000000001c09000 CR4: 00000000001406e0
[ 6984.857443] Call Trace:
[ 6984.857451]  ? mempool_free+0x2b/0x80
[ 6984.857455]  ? bio_free+0x4e/0x60
[ 6984.857459]  blk_mq_dispatch_rq_list+0xf5/0x230
[ 6984.857462]  blk_mq_process_rq_list+0x133/0x170
[ 6984.857465]  __blk_mq_run_hw_queue+0x8c/0xa0
[ 6984.857467]  blk_mq_run_work_fn+0x12/0x20
[ 6984.857473]  process_one_work+0x165/0x410
[ 6984.857475]  worker_thread+0x137/0x4c0
[ 6984.857478]  kthread+0x101/0x140
[ 6984.857480]  ? rescuer_thread+0x3b0/0x3b0
[ 6984.857481]  ? kthread_park+0x90/0x90
[ 6984.857489]  ret_from_fork+0x2c/0x40
[ 6984.857490] Code: 8b bd 70 ff ff ff 89 95 50 ff ff ff 89 8d 58 ff ff ff 44
89 95 60 ff ff ff e8 b7 dd 12 e1 8b 95 50 ff ff ff 48 89 85 68 ff ff ff <4c>
8b 48 10 44 8b 58 18 8b 8d 58 ff ff ff 44 8b 95 60 ff ff ff
[ 6984.857511] RIP: nvme_queue_rq+0x6e6/0x8cd [nvme] RSP: ffffc90002e93c50
[ 6984.857512] CR2: 0000000000000010
[ 6984.895359] ---[ end trace 2d7ceb528432bf83 ]---

Reported-by: Yi Zhang <yizhan@redhat.com>
Tested-by: Yi Zhang <yizhan@redhat.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocgroup, net_cls: iterate the fds of only the tasks which are being migrated
Tejun Heo [Tue, 14 Mar 2017 23:25:56 +0000 (19:25 -0400)]
cgroup, net_cls: iterate the fds of only the tasks which are being migrated

commit a05d4fd9176003e0c1f9c3d083f4dac19fd346ab upstream.

The net_cls controller controls the classid field of each socket which
is associated with the cgroup.  Because the classid is per-socket
attribute, when a task migrates to another cgroup or the configured
classid of the cgroup changes, the controller needs to walk all
sockets and update the classid value, which was implemented by
3b13758f51de ("cgroups: Allow dynamically changing net_classid").

While the approach is not scalable, migrating tasks which have a lot
of fds attached to them is rare and the cost is born by the ones
initiating the operations.  However, for simplicity, both the
migration and classid config change paths call update_classid() which
scans all fds of all tasks in the target css.  This is an overkill for
the migration path which only needs to cover a much smaller subset of
tasks which are actually getting migrated in.

On cgroup v1, this can lead to unexpected scalability issues when one
tries to migrate a task or process into a net_cls cgroup which already
contains a lot of fds.  Even if the migration traget doesn't have many
to get scanned, update_classid() ends up scanning all fds in the
target cgroup which can be extremely numerous.

Unfortunately, on cgroup v2 which doesn't use net_cls, the problem is
even worse.  Before bfc2cf6f61fc ("cgroup: call subsys->*attach() only
for subsystems which are actually affected by migration"), cgroup core
would call the ->css_attach callback even for controllers which don't
see actual migration to a different css.

As net_cls is always disabled but still mounted on cgroup v2, whenever
a process is migrated on the cgroup v2 hierarchy, net_cls sees
identity migration from root to root and cgroup core used to call
->css_attach callback for those.  The net_cls ->css_attach ends up
calling update_classid() on the root net_cls css to which all
processes on the system belong to as the controller isn't used.  This
makes any cgroup v2 migration O(total_number_of_fds_on_the_system)
which is horrible and easily leads to noticeable stalls triggering RCU
stall warnings and so on.

The worst symptom is already fixed in upstream by bfc2cf6f61fc
("cgroup: call subsys->*attach() only for subsystems which are
actually affected by migration"); however, backporting that commit is
too invasive and we want to avoid other cases too.

This patch updates net_cls's cgrp_attach() to iterate fds of only the
processes which are actually getting migrated.  This removes the
surprising migration cost which is dependent on the total number of
fds in the target cgroup.  As this leaves write_classid() the only
user of update_classid(), open-code the helper into write_classid().

Reported-by: David Goode <dgoode@fb.com>
Fixes: 3b13758f51de ("cgroups: Allow dynamically changing net_classid")
Cc: Nina Schiff <ninasc@fb.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocpufreq: Restore policy min/max limits on CPU online
Viresh Kumar [Tue, 21 Mar 2017 06:06:06 +0000 (11:36 +0530)]
cpufreq: Restore policy min/max limits on CPU online

commit ff010472fb75670cb5c08671e820eeea3af59c87 upstream.

On CPU online the cpufreq core restores the previous governor (or
the previous "policy" setting for ->setpolicy drivers), but it does
not restore the min/max limits at the same time, which is confusing,
inconsistent and real pain for users who set the limits and then
suspend/resume the system (using full suspend), in which case the
limits are reset on all CPUs except for the boot one.

Fix this by making cpufreq_online() restore the limits when an inactive
policy is brought online.

The commit log and patch are inspired from Rafael's earlier work.

Reported-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoarm64: kaslr: Fix up the kernel image alignment
Neeraj Upadhyay [Wed, 22 Mar 2017 11:38:25 +0000 (17:08 +0530)]
arm64: kaslr: Fix up the kernel image alignment

commit afd0e5a876703accb95894f23317a13e2c49b523 upstream.

If kernel image extends across alignment boundary, existing
code increases the KASLR offset by size of kernel image. The
offset is masked after resizing. There are cases, where after
masking, we may still have kernel image extending across
boundary. This eventually results in only 2MB block getting
mapped while creating the page tables. This results in data aborts
while accessing unmapped regions during second relocation (with
kaslr offset) in __primary_switch. To fix this problem, round up the
kernel image size, by swapper block size, before adding it for
correction.

For example consider below case, where kernel image still crosses
1GB alignment boundary, after masking the offset, which is fixed
by rounding up kernel image size.

SWAPPER_TABLE_SHIFT = 30
Swapper using section maps with section size 2MB.
CONFIG_PGTABLE_LEVELS = 3
VA_BITS = 39

_text  : 0xffffff8008080000
_end   : 0xffffff800aa1b000
offset : 0x1f35600000
mask = ((1UL << (VA_BITS - 2)) - 1) & ~(SZ_2M - 1)

(_text + offset) >> SWAPPER_TABLE_SHIFT = 0x3fffffe7c
(_end + offset) >> SWAPPER_TABLE_SHIFT  = 0x3fffffe7d

offset after existing correction (before mask) = 0x1f37f9b000
(_text + offset) >> SWAPPER_TABLE_SHIFT = 0x3fffffe7d
(_end + offset) >> SWAPPER_TABLE_SHIFT  = 0x3fffffe7d

offset (after mask) = 0x1f37e00000
(_text + offset) >> SWAPPER_TABLE_SHIFT = 0x3fffffe7c
(_end + offset) >> SWAPPER_TABLE_SHIFT  = 0x3fffffe7d

new offset w/ rounding up = 0x1f38000000
(_text + offset) >> SWAPPER_TABLE_SHIFT = 0x3fffffe7d
(_end + offset) >> SWAPPER_TABLE_SHIFT  = 0x3fffffe7d

Fixes: f80fb3a3d508 ("arm64: add support for kernel ASLR")
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Srinivas Ramana <sramana@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoARM: at91: pm: cpu_idle: switch DDR to power-down mode
Nicolas Ferre [Tue, 14 Mar 2017 08:38:04 +0000 (09:38 +0100)]
ARM: at91: pm: cpu_idle: switch DDR to power-down mode

commit 60b89f1928af80b546b5c3fd8714a62f6f4b8844 upstream.

On some DDR controllers, compatible with the sama5d3 one,
the sequence to enter/exit/re-enter the self-refresh mode adds
more constrains than what is currently written in the at91_idle
driver. An actual access to the DDR chip is needed between exit
and re-enter of this mode which is somehow difficult to implement.
This sequence can completely hang the SoC. It is particularly
experienced on parts which embed a L2 cache if the code run
between IDLE calls fits in it...

Moreover, as the intention is to enter and exit pretty rapidly
from IDLE, the power-down mode is a good candidate.

So now we use power-down instead of self-refresh. As we can
simplify the code for sama5d3 compatible DDR controllers,
we instantiate a new sama5d3_ddr_standby() function.

Signed-off-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Fixes: 017b5522d5e3 ("ARM: at91: Add new binding for sama5d3-ddramc")
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoRevert "ARM: at91/dt: sama5d2: Use new compatible for ohci node"
Romain Izard [Fri, 17 Feb 2017 15:12:50 +0000 (16:12 +0100)]
Revert "ARM: at91/dt: sama5d2: Use new compatible for ohci node"

commit 9e10889a3177340dcda7d29c6d8fbd97247b007b upstream.

This reverts commit cab43282682e ("ARM: at91/dt: sama5d2: Use new
compatible for ohci node")

It depends from commit 7150bc9b4d43 ("usb: ohci-at91: Forcibly suspend
ports while USB suspend") which was reverted and implemented
differently. With the new implementation, the compatible string must
remain the same.

The compatible string introduced by this commit has been used in the
default SAMA5D2 dtsi starting from Linux 4.8. As it has never been
working correctly in an official release, removing it should not be
breaking the stability rules.

Fixes: cab43282682e ("ARM: at91/dt: sama5d2: Use new compatible for ohci node")
Signed-off-by: Romain Izard <romain.izard.pro@gmail.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoiommu/vt-d: Fix NULL pointer dereference in device_to_iommu
Koos Vriezen [Wed, 1 Mar 2017 20:02:50 +0000 (21:02 +0100)]
iommu/vt-d: Fix NULL pointer dereference in device_to_iommu

commit 5003ae1e735e6bfe4679d9bed6846274f322e77e upstream.

The function device_to_iommu() in the Intel VT-d driver
lacks a NULL-ptr check, resulting in this oops at boot on
some platforms:

 BUG: unable to handle kernel NULL pointer dereference at 00000000000007ab
 IP: [<ffffffff8132234a>] device_to_iommu+0x11a/0x1a0
 PGD 0

 [...]

 Call Trace:
   ? find_or_alloc_domain.constprop.29+0x1a/0x300
   ? dw_dma_probe+0x561/0x580 [dw_dmac_core]
   ? __get_valid_domain_for_dev+0x39/0x120
   ? __intel_map_single+0x138/0x180
   ? intel_alloc_coherent+0xb6/0x120
   ? sst_hsw_dsp_init+0x173/0x420 [snd_soc_sst_haswell_pcm]
   ? mutex_lock+0x9/0x30
   ? kernfs_add_one+0xdb/0x130
   ? devres_add+0x19/0x60
   ? hsw_pcm_dev_probe+0x46/0xd0 [snd_soc_sst_haswell_pcm]
   ? platform_drv_probe+0x30/0x90
   ? driver_probe_device+0x1ed/0x2b0
   ? __driver_attach+0x8f/0xa0
   ? driver_probe_device+0x2b0/0x2b0
   ? bus_for_each_dev+0x55/0x90
   ? bus_add_driver+0x110/0x210
   ? 0xffffffffa11ea000
   ? driver_register+0x52/0xc0
   ? 0xffffffffa11ea000
   ? do_one_initcall+0x32/0x130
   ? free_vmap_area_noflush+0x37/0x70
   ? kmem_cache_alloc+0x88/0xd0
   ? do_init_module+0x51/0x1c4
   ? load_module+0x1ee9/0x2430
   ? show_taint+0x20/0x20
   ? kernel_read_file+0xfd/0x190
   ? SyS_finit_module+0xa3/0xb0
   ? do_syscall_64+0x4a/0xb0
   ? entry_SYSCALL64_slow_path+0x25/0x25
 Code: 78 ff ff ff 4d 85 c0 74 ee 49 8b 5a 10 0f b6 9b e0 00 00 00 41 38 98 e0 00 00 00 77 da 0f b6 eb 49 39 a8 88 00 00 00 72 ce eb 8f <41> f6 82 ab 07 00 00 04 0f 85 76 ff ff ff 0f b6 4d 08 88 0e 49
 RIP  [<ffffffff8132234a>] device_to_iommu+0x11a/0x1a0
  RSP <ffffc90001457a78>
 CR2: 00000000000007ab
 ---[ end trace 16f974b6d58d0aad ]---

Add the missing pointer check.

Fixes: 1c387188c60f53b338c20eee32db055dfe022a9b ("iommu/vt-d: Fix IOMMU lookup for SR-IOV Virtual Functions")
Signed-off-by: Koos Vriezen <koos.vriezen@gmail.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxen/acpi: upload PM state from init-domain to Xen
Ankur Arora [Tue, 21 Mar 2017 22:43:38 +0000 (15:43 -0700)]
xen/acpi: upload PM state from init-domain to Xen

commit 1914f0cd203c941bba72f9452c8290324f1ef3dc upstream.

This was broken in commit cd979883b9ed ("xen/acpi-processor:
fix enabling interrupts on syscore_resume"). do_suspend (from
xen/manage.c) and thus xen_resume_notifier never get called on
the initial-domain at resume (it is if running as guest.)

The rationale for the breaking change was that upload_pm_data()
potentially does blocking work in syscore_resume(). This patch
addresses the original issue by scheduling upload_pm_data() to
execute in workqueue context.

Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Based-on-patch-by: Konrad Wilk <konrad.wilk@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agocrypto: ccp - Assign DMA commands to the channel's CCP
Gary R Hook [Fri, 10 Mar 2017 18:28:18 +0000 (12:28 -0600)]
crypto: ccp - Assign DMA commands to the channel's CCP

commit 7c468447f40645fbf2a033dfdaa92b1957130d50 upstream.

The CCP driver generally uses a round-robin approach when
assigning operations to available CCPs. For the DMA engine,
however, the DMA mappings of the SGs are associated with a
specific CCP. When an IOMMU is enabled, the IOMMU is
programmed based on this specific device.

If the DMA operations are not performed by that specific
CCP then addressing errors and I/O page faults will occur.

Update the CCP driver to allow a specific CCP device to be
requested for an operation and use this in the DMA engine
support.

Signed-off-by: Gary R Hook <gary.hook@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>