OSDN Git Service
Markos Chandras [Mon, 9 Mar 2015 14:54:49 +0000 (14:54 +0000)]
MIPS: asm: asm-eva: Introduce kernel load/store variants
commit
60cd7e08e453bc6828ac4b539f949e4acd80f143 upstream.
Introduce new macros for kernel load/store variants which will be
used to perform regular kernel space load/store operations in EVA
mode.
Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/9500/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Markos Chandras [Fri, 27 Feb 2015 07:51:32 +0000 (07:51 +0000)]
MIPS: Malta: Detect and fix bad memsize values
commit
f7f8aea4b97c4d48e42f02cb37026bee445f239f upstream.
memsize denotes the amount of RAM we can access from kseg{0,1} and
that should be up to 256M. In case the bootloader reports a value
higher than that (perhaps reporting all the available RAM) it's best
if we fix it ourselves and just warn the user about that. This is
usually a problem with the bootloader and/or its environment.
[ralf@linux-mips.org: Remove useless parens as suggested bei Sergei.
Reformat long pr_warn statement to fit into 80 column limit.]
Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/9362/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
James Hogan [Wed, 25 Feb 2015 13:08:05 +0000 (13:08 +0000)]
MIPS: lose_fpu(): Disable FPU when MSA enabled
commit
acaf6a97d623af123314c2f8ce4cf7254f6b2fc1 upstream.
The lose_fpu() function only disables the FPU in CP0_Status.CU1 if the
FPU is in use and MSA isn't enabled.
This isn't necessarily a problem because KSTK_STATUS(current), the
version of CP0_Status stored on the kernel stack on entry from user
mode, does always get updated and gets restored when returning to user
mode, but I don't think it was intended, and it is inconsistent with the
case of only the FPU being in use. Sometimes leaving the FPU enabled may
also mask kernel bugs where FPU operations are executed when the FPU
might not be enabled.
So lets disable the FPU in the MSA case too.
Fixes:
33c771ba5c5d ("MIPS: save/disable MSA in lose_fpu")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/9323/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
James Hogan [Fri, 6 Feb 2015 11:11:56 +0000 (11:11 +0000)]
MIPS: KVM: Handle MSA Disabled exceptions from guest
commit
98119ad53376885819d93dfb8737b6a9a61ca0ba upstream.
Guest user mode can generate a guest MSA Disabled exception on an MSA
capable core by simply trying to execute an MSA instruction. Since this
exception is unknown to KVM it will be passed on to the guest kernel.
However guest Linux kernels prior to v3.15 do not set up an exception
handler for the MSA Disabled exception as they don't support any MSA
capable cores. This results in a guest OS panic.
Since an older processor ID may be being emulated, and MSA support is
not advertised to the guest, the correct behaviour is to generate a
Reserved Instruction exception in the guest kernel so it can send the
guest process an illegal instruction signal (SIGILL), as would happen
with a non-MSA-capable core.
Fix this as minimally as reasonably possible by preventing
kvm_mips_check_privilege() from relaying MSA Disabled exceptions from
guest user mode to the guest kernel, and handling the MSA Disabled
exception by emulating a Reserved Instruction exception in the guest,
via a new handle_msa_disabled() KVM callback.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ben Serebrin [Thu, 16 Apr 2015 18:58:05 +0000 (11:58 -0700)]
KVM: VMX: Preserve host CR4.MCE value while in guest mode.
commit
085e68eeafbf76e21848ad5bafaecec88a11dd64 upstream.
The host's decision to enable machine check exceptions should remain
in force during non-root mode. KVM was writing 0 to cr4 on VCPU reset
and passed a slightly-modified 0 to the vmcs.guest_cr4 value.
Tested: Built.
On earlier version, tested by injecting machine check
while a guest is spinning.
Before the change, if guest CR4.MCE==0, then the machine check is
escalated to Catastrophic Error (CATERR) and the machine dies.
If guest CR4.MCE==1, then the machine check causes VMEXIT and is
handled normally by host Linux. After the change, injecting a machine
check causes normal Linux machine check handling.
Signed-off-by: Ben Serebrin <serebrin@google.com>
Reviewed-by: Venkatesh Srinivas <venkateshs@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andre Przywara [Fri, 10 Apr 2015 15:17:59 +0000 (16:17 +0100)]
KVM: arm/arm64: check IRQ number on userland injection
commit
fd1d0ddf2ae92fb3df42ed476939861806c5d785 upstream.
When userland injects a SPI via the KVM_IRQ_LINE ioctl we currently
only check it against a fixed limit, which historically is set
to 127. With the new dynamic IRQ allocation the effective limit may
actually be smaller (64).
So when now a malicious or buggy userland injects a SPI in that
range, we spill over on our VGIC bitmaps and bytemaps memory.
I could trigger a host kernel NULL pointer dereference with current
mainline by injecting some bogus IRQ number from a hacked kvmtool:
-----------------
....
DEBUG: kvm_vgic_inject_irq(kvm, cpu=0, irq=114, level=1)
DEBUG: vgic_update_irq_pending(kvm, cpu=0, irq=114, level=1)
DEBUG: IRQ #114 still in the game, writing to bytemap now...
Unable to handle kernel NULL pointer dereference at virtual address
00000000
pgd =
ffffffc07652e000
[
00000000] *pgd=
00000000f658b003, *pud=
00000000f658b003, *pmd=
0000000000000000
Internal error: Oops:
96000006 [#1] PREEMPT SMP
Modules linked in:
CPU: 1 PID: 1053 Comm: lkvm-msi-irqinj Not tainted 4.0.0-rc7+ #3027
Hardware name: FVP Base (DT)
task:
ffffffc0774e9680 ti:
ffffffc0765a8000 task.ti:
ffffffc0765a8000
PC is at kvm_vgic_inject_irq+0x234/0x310
LR is at kvm_vgic_inject_irq+0x30c/0x310
pc : [<
ffffffc0000ae0a8>] lr : [<
ffffffc0000ae180>] pstate:
80000145
.....
So this patch fixes this by checking the SPI number against the
actual limit. Also we remove the former legacy hard limit of
127 in the ioctl code.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
[maz: wrap KVM_ARM_IRQ_GIC_MAX with #ifndef __KERNEL__,
as suggested by Christopher Covington]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Radim Krčmář [Wed, 8 Apr 2015 12:16:48 +0000 (14:16 +0200)]
KVM: use slowpath for cross page cached accesses
commit
ca3f0874723fad81d0c701b63ae3a17a408d5f25 upstream.
kvm_write_guest_cached() does not mark all written pages as dirty and
code comments in kvm_gfn_to_hva_cache_init() talk about NULL memslot
with cross page accesses. Fix all the easy way.
The check is '<= 1' to have the same result for 'len = 0' cache anywhere
in the page. (nr_pages_needed is 0 on page boundary.)
Fixes:
8f964525a121 ("KVM: Allow cross page reads and writes from cached translations.")
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Message-Id: <
20150408121648.GA3519@potion.brq.redhat.com>
Reviewed-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Heiko Carstens [Wed, 25 Mar 2015 09:13:33 +0000 (10:13 +0100)]
s390/hibernate: fix save and restore of kernel text section
commit
d74419495633493c9cd3f2bbeb7f3529d0edded6 upstream.
Sebastian reported a crash caused by a jump label mismatch after resume.
This happens because we do not save the kernel text section during suspend
and therefore also do not restore it during resume, but use the kernel image
that restores the old system.
This means that after a suspend/resume cycle we lost all modifications done
to the kernel text section.
The reason for this is the pfn_is_nosave() function, which incorrectly
returns that read-only pages don't need to be saved. This is incorrect since
we mark the kernel text section read-only.
We still need to make sure to not save and restore pages contained within
NSS and DCSS segment.
To fix this add an extra case for the kernel text section and only save
those pages if they are not contained within an NSS segment.
Fixes the following crash (and the above bugs as well):
Jump label code mismatch at netif_receive_skb_internal+0x28/0xd0
Found: c0 04 00 00 00 00
Expected: c0 f4 00 00 00 11
New: c0 04 00 00 00 00
Kernel panic - not syncing: Corrupted kernel text
CPU: 0 PID: 9 Comm: migration/0 Not tainted
3.19.0-01975-gb1b096e70f23 #4
Call Trace:
[<
0000000000113972>] show_stack+0x72/0xf0
[<
000000000081f15e>] dump_stack+0x6e/0x90
[<
000000000081c4e8>] panic+0x108/0x2b0
[<
000000000081be64>] jump_label_bug.isra.2+0x104/0x108
[<
0000000000112176>] __jump_label_transform+0x9e/0xd0
[<
00000000001121e6>] __sm_arch_jump_label_transform+0x3e/0x50
[<
00000000001d1136>] multi_cpu_stop+0x12e/0x170
[<
00000000001d1472>] cpu_stopper_thread+0xb2/0x168
[<
000000000015d2ac>] smpboot_thread_fn+0x134/0x1b0
[<
0000000000158baa>] kthread+0x10a/0x110
[<
0000000000824a86>] kernel_thread_starter+0x6/0xc
Reported-and-tested-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jens Freimann [Mon, 16 Mar 2015 11:17:13 +0000 (12:17 +0100)]
KVM: s390: fix get_all_floating_irqs
commit
94aa033efcac47b09db22cb561e135baf37b7887 upstream.
This fixes a bug introduced with commit
c05c4186bbe4 ("KVM: s390:
add floating irq controller").
get_all_floating_irqs() does copy_to_user() while holding
a spin lock. Let's fix this by filling a temporary buffer
first and copy it to userspace after giving up the lock.
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ekaterina Tumanova [Tue, 3 Mar 2015 08:54:41 +0000 (09:54 +0100)]
KVM: s390: Zero out current VMDB of STSI before including level3 data.
commit
b75f4c9afac2604feb971441116c07a24ecca1ec upstream.
s390 documentation requires words 0 and 10-15 to be reserved and stored as
zeros. As we fill out all other fields, we can memset the full structure.
Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
David Hildenbrand [Wed, 4 Feb 2015 14:59:11 +0000 (15:59 +0100)]
KVM: s390: reinjection of irqs can fail in the tpi handler
commit
15462e37ca848abac7477dece65f8af25febd744 upstream.
The reinjection of an I/O interrupt can fail if the list is at the limit
and between the dequeue and the reinjection, another I/O interrupt is
injected (e.g. if user space floods kvm with I/O interrupts).
This patch avoids this memory leak and returns -EFAULT in this special
case. This error is not recoverable, so let's fail hard. This can later
be avoided by not dequeuing the interrupt but working directly on the
locked list.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
David Hildenbrand [Wed, 4 Feb 2015 14:53:42 +0000 (15:53 +0100)]
KVM: s390: fix handling of write errors in the tpi handler
commit
261520dcfcba93ca5dfe671b88ffab038cd940c8 upstream.
If the I/O interrupt could not be written to the guest provided
area (e.g. access exception), a program exception was injected into the
guest but "inti" wasn't freed, therefore resulting in a memory leak.
In addition, the I/O interrupt wasn't reinjected. Therefore the dequeued
interrupt is lost.
This patch fixes the problem while cleaning up the function and making the
cc and rc logic easier to handle.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andrzej Pietrasiewicz [Tue, 3 Mar 2015 09:52:05 +0000 (10:52 +0100)]
usb: gadget: printer: enqueue printer's response for setup request
commit
eb132ccbdec5df46e29c9814adf76075ce83576b upstream.
Function-specific setup requests should be handled in such a way, that
apart from filling in the data buffer, the requests are also actually
enqueued: if function-specific setup is called from composte_setup(),
the "usb_ep_queue()" block of code in composite_setup() is skipped.
The printer function lacks this part and it results in e.g. get device id
requests failing: the host expects some response, the device prepares it
but does not equeue it for sending to the host, so the host finally asserts
timeout.
This patch adds enqueueing the prepared responses.
Fixes:
2e87edf49227: "usb: gadget: make g_printer use composite"
Signed-off-by: Andrzej Pietrasiewicz <andrzej.p@samsung.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Scott Wood [Sat, 11 Apr 2015 00:37:34 +0000 (19:37 -0500)]
powerpc/hugetlb: Call mm_dec_nr_pmds() in hugetlb_free_pmd_range()
commit
50c6a665b383cb5839e45d04e36faeeefaffa052 upstream.
Commit
dc6c9a35b66b5 ("mm: account pmd page tables to the process")
added a counter that is incremented whenever a PMD is allocated and
decremented whenever a PMD is freed. For hugepages on PPC, common code
is used to allocated PMDs, but arch-specific code is used to free PMDs.
This results in kernel output such as "BUG: non-zero nr_pmds on freeing
mm: 1" when using hugepages.
Update the PPC hugepage PMD freeing code to decrement the count, just
as the above commit did for free_pmd_range().
Fixes:
dc6c9a35b66b5 ("mm: account pmd page tables to the process")
Signed-off-by: Scott Wood <scottwood@freescale.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Gerald Schaefer [Tue, 14 Apr 2015 22:42:30 +0000 (15:42 -0700)]
mm/hugetlb: use pmd_page() in follow_huge_pmd()
commit
97534127012f0e396eddea4691f4c9b170aed74b upstream.
Commit
61f77eda9bbf ("mm/hugetlb: reduce arch dependent code around
follow_huge_*") broke follow_huge_pmd() on s390, where pmd and pte
layout differ and using pte_page() on a huge pmd will return wrong
results. Using pmd_page() instead fixes this.
All architectures that were touched by that commit have pmd_page()
defined, so this should not break anything on other architectures.
Fixes:
61f77eda "mm/hugetlb: reduce arch dependent code around follow_huge_*"
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Filipe Manana [Mon, 30 Mar 2015 17:26:47 +0000 (18:26 +0100)]
Btrfs: fix inode eviction infinite loop after extent_same ioctl
commit
113e8283869b9855c8b999796aadd506bbac155f upstream.
If we pass a length of 0 to the extent_same ioctl, we end up locking an
extent range with a start offset greater then its end offset (if the
destination file's offset is greater than zero). This results in a warning
from extent_io.c:insert_state through the following call chain:
btrfs_extent_same()
btrfs_double_lock()
lock_extent_range()
lock_extent(inode->io_tree, offset, offset + len - 1)
lock_extent_bits()
__set_extent_bit()
insert_state()
--> WARN_ON(end < start)
This leads to an infinite loop when evicting the inode. This is the same
problem that my previous patch titled
"Btrfs: fix inode eviction infinite loop after cloning into it" addressed
but for the extent_same ioctl instead of the clone ioctl.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Filipe Manana [Mon, 30 Mar 2015 17:23:59 +0000 (18:23 +0100)]
Btrfs: fix inode eviction infinite loop after cloning into it
commit
ccccf3d67294714af2d72a6fd6fd7d73b01c9329 upstream.
If we attempt to clone a 0 length region into a file we can end up
inserting a range in the inode's extent_io tree with a start offset
that is greater then the end offset, which triggers immediately the
following warning:
[ 3914.619057] WARNING: CPU: 17 PID: 4199 at fs/btrfs/extent_io.c:435 insert_state+0x4b/0x10b [btrfs]()
[ 3914.620886] BTRFS: end < start 4095 4096
(...)
[ 3914.638093] Call Trace:
[ 3914.638636] [<
ffffffff81425fd9>] dump_stack+0x4c/0x65
[ 3914.639620] [<
ffffffff81045390>] warn_slowpath_common+0xa1/0xbb
[ 3914.640789] [<
ffffffffa03ca44f>] ? insert_state+0x4b/0x10b [btrfs]
[ 3914.642041] [<
ffffffff810453f0>] warn_slowpath_fmt+0x46/0x48
[ 3914.643236] [<
ffffffffa03ca44f>] insert_state+0x4b/0x10b [btrfs]
[ 3914.644441] [<
ffffffffa03ca729>] __set_extent_bit+0x107/0x3f4 [btrfs]
[ 3914.645711] [<
ffffffffa03cb256>] lock_extent_bits+0x65/0x1bf [btrfs]
[ 3914.646914] [<
ffffffff8142b2fb>] ? _raw_spin_unlock+0x28/0x33
[ 3914.648058] [<
ffffffffa03cbac4>] ? test_range_bit+0xcc/0xde [btrfs]
[ 3914.650105] [<
ffffffffa03cb3c3>] lock_extent+0x13/0x15 [btrfs]
[ 3914.651361] [<
ffffffffa03db39e>] lock_extent_range+0x3d/0xcd [btrfs]
[ 3914.652761] [<
ffffffffa03de1fe>] btrfs_ioctl_clone+0x278/0x388 [btrfs]
[ 3914.654128] [<
ffffffff811226dd>] ? might_fault+0x58/0xb5
[ 3914.655320] [<
ffffffffa03e0909>] btrfs_ioctl+0xb51/0x2195 [btrfs]
(...)
[ 3914.669271] ---[ end trace
14843d3e2e622fc1 ]---
This later makes the inode eviction handler enter an infinite loop that
keeps dumping the following warning over and over:
[ 3915.117629] WARNING: CPU: 22 PID: 4228 at fs/btrfs/extent_io.c:435 insert_state+0x4b/0x10b [btrfs]()
[ 3915.119913] BTRFS: end < start 4095 4096
(...)
[ 3915.137394] Call Trace:
[ 3915.137913] [<
ffffffff81425fd9>] dump_stack+0x4c/0x65
[ 3915.139154] [<
ffffffff81045390>] warn_slowpath_common+0xa1/0xbb
[ 3915.140316] [<
ffffffffa03ca44f>] ? insert_state+0x4b/0x10b [btrfs]
[ 3915.141505] [<
ffffffff810453f0>] warn_slowpath_fmt+0x46/0x48
[ 3915.142709] [<
ffffffffa03ca44f>] insert_state+0x4b/0x10b [btrfs]
[ 3915.143849] [<
ffffffffa03ca729>] __set_extent_bit+0x107/0x3f4 [btrfs]
[ 3915.145120] [<
ffffffffa038c1e3>] ? btrfs_kill_super+0x17/0x23 [btrfs]
[ 3915.146352] [<
ffffffff811548f6>] ? deactivate_locked_super+0x3b/0x50
[ 3915.147565] [<
ffffffffa03cb256>] lock_extent_bits+0x65/0x1bf [btrfs]
[ 3915.148785] [<
ffffffff8142b7e2>] ? _raw_write_unlock+0x28/0x33
[ 3915.149931] [<
ffffffffa03bc325>] btrfs_evict_inode+0x196/0x482 [btrfs]
[ 3915.151154] [<
ffffffff81168904>] evict+0xa0/0x148
[ 3915.152094] [<
ffffffff811689e5>] dispose_list+0x39/0x43
[ 3915.153081] [<
ffffffff81169564>] evict_inodes+0xdc/0xeb
[ 3915.154062] [<
ffffffff81154418>] generic_shutdown_super+0x49/0xef
[ 3915.155193] [<
ffffffff811546d1>] kill_anon_super+0x13/0x1e
[ 3915.156274] [<
ffffffffa038c1e3>] btrfs_kill_super+0x17/0x23 [btrfs]
(...)
[ 3915.167404] ---[ end trace
14843d3e2e622fc2 ]---
So just bail out of the clone ioctl if the length of the region to clone
is zero, without locking any extent range, in order to prevent this issue
(same behaviour as a pwrite with a 0 length for example).
This is trivial to reproduce. For example, the steps for the test I just
made for fstests:
mkfs.btrfs -f SCRATCH_DEV
mount SCRATCH_DEV $SCRATCH_MNT
touch $SCRATCH_MNT/foo
touch $SCRATCH_MNT/bar
$CLONER_PROG -s 0 -d 4096 -l 0 $SCRATCH_MNT/foo $SCRATCH_MNT/bar
umount $SCRATCH_MNT
A test case for fstests follows soon.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
David Sterba [Wed, 25 Mar 2015 18:26:41 +0000 (19:26 +0100)]
btrfs: don't accept bare namespace as a valid xattr
commit
3c3b04d10ff1811a27f86684ccd2f5ba6983211d upstream.
Due to insufficient check in btrfs_is_valid_xattr, this unexpectedly
works:
$ touch file
$ setfattr -n user. -v 1 file
$ getfattr -d file
user.="1"
ie. the missing attribute name after the namespace.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=94291
Reported-by: William Douglas <william.douglas@intel.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Filipe Manana [Mon, 23 Mar 2015 14:07:40 +0000 (14:07 +0000)]
Btrfs: fix log tree corruption when fs mounted with -o discard
commit
dcc82f4783ad91d4ab654f89f37ae9291cdc846a upstream.
While committing a transaction we free the log roots before we write the
new super block. Freeing the log roots implies marking the disk location
of every node/leaf (metadata extent) as pinned before the new super block
is written. This is to prevent the disk location of log metadata extents
from being reused before the new super block is written, otherwise we
would have a corrupted log tree if before the new super block is written
a crash/reboot happens and the location of any log tree metadata extent
ended up being reused and rewritten.
Even though we pinned the log tree's metadata extents, we were issuing a
discard against them if the fs was mounted with the -o discard option,
resulting in corruption of the log tree if a crash/reboot happened before
writing the new super block - the next time the fs was mounted, during
the log replay process we would find nodes/leafs of the log btree with
a content full of zeroes, causing the process to fail and require the
use of the tool btrfs-zero-log to wipeout the log tree (and all data
previously fsynced becoming lost forever).
Fix this by not doing a discard when pinning an extent. The discard will
be done later when it's safe (after the new super block is committed) at
extent-tree.c:btrfs_finish_extent_commit().
Fixes:
e688b7252f78 (Btrfs: fix extent pinning bugs in the tree log)
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Nadav Amit [Sun, 12 Apr 2015 18:47:15 +0000 (21:47 +0300)]
KVM: x86: Fix MSR_IA32_BNDCFGS in msrs_to_save
commit
9e9c3fe40bcd28e3f98f0ad8408435f4503f2781 upstream.
kvm_init_msr_list is currently called before hardware_setup. As a result,
vmx_mpx_supported always returns false when kvm_init_msr_list checks whether to
save MSR_IA32_BNDCFGS.
Move kvm_init_msr_list after vmx_hardware_setup is called to fix this issue.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Message-Id: <
1428864435-4732-1-git-send-email-namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Linus Torvalds [Thu, 23 Apr 2015 15:33:59 +0000 (08:33 -0700)]
x86: fix special __probe_kernel_write() tail zeroing case
commit
d869844bd081081bf537e806a44811884230643e upstream.
Commit
cae2a173fe94 ("x86: clean up/fix 'copy_in_user()' tail zeroing")
fixed the failure case tail zeroing of one special case of the x86-64
generic user-copy routine, namely when used for the user-to-user case
("copy_in_user()").
But in the process it broke an even more unusual case: using the user
copy routine for kernel-to-kernel copying.
Now, normally kernel-kernel copies are obviously done using memcpy(),
but we have a couple of special cases when we use the user-copy
functions. One is when we pass a kernel buffer to a regular user-buffer
routine, using set_fs(KERNEL_DS). That's a "normal" case, and continued
to work fine, because it never takes any faults (with the possible
exception of a silent and successful vmalloc fault).
But Jan Beulich pointed out another, very unusual, special case: when we
use the user-copy routines not because it's a path that expects a user
pointer, but for a couple of ftrace/kgdb cases that want to do a kernel
copy, but do so using "unsafe" buffers, and use the user-copy routine to
gracefully handle faults. IOW, for probe_kernel_write().
And that broke for the case of a faulting kernel destination, because we
saw the kernel destination and wanted to try to clear the tail of the
buffer. Which doesn't work, since that's what faults.
This only triggers for things like kgdb and ftrace users (eg trying
setting a breakpoint on read-only memory), but it's definitely a bug.
The fix is to not compare against the kernel address start (TASK_SIZE),
but instead use the same limits "access_ok()" uses.
Reported-and-tested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Zijlstra [Sat, 11 Apr 2015 10:16:22 +0000 (12:16 +0200)]
perf/x86/intel: Fix Core2,Atom,NHM,WSM cycles:pp events
commit
517e6341fa123ec3a2f9ea78ad547be910529881 upstream.
Ingo reported that cycles:pp didn't work for him on some machines.
It turns out that in this commit:
af4bdcf675cf perf/x86/intel: Disallow flags for most Core2/Atom/Nehalem/Westmere events
Andi forgot to explicitly allow that event when he
disabled event flags for PEBS on those uarchs.
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes:
af4bdcf675cf ("perf/x86/intel: Disallow flags for most Core2/Atom/Nehalem/Westmere events")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mike Galbraith [Sat, 18 Jan 2014 16:14:44 +0000 (17:14 +0100)]
sched/idle/x86: Optimize unnecessary mwait_idle() resched IPIs
commit
f8e617f4582995f7c25ef25b4167213120ad122b upstream.
To fully take advantage of MWAIT, apparently the CLFLUSH instruction needs
another quirk on certain CPUs: proper barriers around it on certain machines.
On a Q6600 SMP system, pipe-test scheduling performance, cross core,
improves significantly:
3.8.13 487.2 KHz 1.000
3.13.0-master 415.5 KHz .852
3.13.0-master+ 415.2 KHz .852 + restore mwait_idle
3.13.0-master++ 488.5 KHz 1.002 + restore mwait_idle + IPI fix
Since X86_BUG_CLFLUSH_MONITOR is already a quirk, don't create a separate
quirk for the extra smp_mb()s.
Signed-off-by: Mike Galbraith <bitbucket@online.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ian Malone <ibmalone@gmail.com>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1390061684.5566.4.camel@marge.simpson.net
[ Ported to recent kernel, added comments about the quirk. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Len Brown [Wed, 15 Jan 2014 05:37:34 +0000 (00:37 -0500)]
sched/idle/x86: Restore mwait_idle() to fix boot hangs, to improve power savings and to improve performance
commit
b253149b843f89cd300cbdbea27ce1f847506f99 upstream.
In Linux-3.9 we removed the mwait_idle() loop:
69fb3676df33 ("x86 idle: remove mwait_idle() and "idle=mwait" cmdline param")
The reasoning was that modern machines should be sufficiently
happy during the boot process using the default_idle() HALT
loop, until cpuidle loads and either acpi_idle or intel_idle
invoke the newer MWAIT-with-hints idle loop.
But two machines reported problems:
1. Certain Core2-era machines support MWAIT-C1 and HALT only.
MWAIT-C1 is preferred for optimal power and performance.
But if they support just C1, cpuidle never loads and
so they use the boot-time default idle loop forever.
2. Some laptops will boot-hang if HALT is used,
but will boot successfully if MWAIT is used.
This appears to be a hidden assumption in BIOS SMI,
that is presumably valid on the proprietary OS
where the BIOS was validated.
https://bugzilla.kernel.org/show_bug.cgi?id=60770
So here we effectively revert the patch above, restoring
the mwait_idle() loop. However, we don't bother restoring
the idle=mwait cmdline parameter, since it appears to add
no value.
Maintainer notes:
For 3.9, simply revert
69fb3676df
for 3.10, patch -F3 applies, fuzz needed due to __cpuinit use in
context For 3.11, 3.12, 3.13, this patch applies cleanly
Tested-by: Mike Galbraith <bitbucket@online.de>
Signed-off-by: Len Brown <len.brown@intel.com>
Acked-by: Mike Galbraith <bitbucket@online.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ian Malone <ibmalone@gmail.com>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/345254a551eb5a6a866e048d7ab570fd2193aca4.1389763084.git.len.brown@intel.com
[ Ported to recent kernels. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Radim Krčmář [Thu, 2 Apr 2015 18:44:23 +0000 (20:44 +0200)]
x86: vdso: fix pvclock races with task migration
commit
80f7fdb1c7f0f9266421f823964fd1962681f6ce upstream.
If we were migrated right after __getcpu, but before reading the
migration_count, we wouldn't notice that we read TSC of a different
VCPU, nor that KVM's bug made pvti invalid, as only migration_count
on source VCPU is increased.
Change vdso instead of updating migration_count on destination.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Fixes:
0a4e6be9ca17 ("x86: kvm: Revert "remove sched notifier for cross-cpu migrations"")
Message-Id: <
1428000263-11892-1-git-send-email-rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Marcelo Tosatti [Mon, 23 Mar 2015 23:21:51 +0000 (20:21 -0300)]
x86: kvm: Revert "remove sched notifier for cross-cpu migrations"
commit
0a4e6be9ca17c54817cf814b4b5aa60478c6df27 upstream.
The following point:
2. per-CPU pvclock time info is updated if the
underlying CPU changes.
Is not true anymore since "KVM: x86: update pvclock area conditionally,
on cpu migration".
Add task migration notification back.
Problem noticed by Andy Lutomirski.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andy Lutomirski [Wed, 28 Jan 2015 00:06:02 +0000 (16:06 -0800)]
x86/asm/decoder: Fix and enforce max instruction size in the insn decoder
commit
91e5ed49fca09c2b83b262b9757d1376ee2b46c3 upstream.
x86 instructions cannot exceed 15 bytes, and the instruction
decoder should enforce that. Prior to
6ba48ff46f76, the
instruction length limit was implicitly set to 16, which was an
approximation of 15, but there is currently no limit at all.
Fix MAX_INSN_SIZE (it should be 15, not 16), and fix the decoder
to reject instructions that exceed MAX_INSN_SIZE.
Other than potentially confusing some of the decoder sanity
checks, I'm not aware of any actual problems that omitting this
check would cause, nor am I aware of any practical problems
caused by the MAX_INSN_SIZE error.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Fixes:
6ba48ff46f76 ("x86: Remove arbitrary instruction size limit ...
Link: http://lkml.kernel.org/r/f8f0bc9b8c58cfd6830f7d88400bf1396cbdcd0f.1422403511.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Gu Zheng [Fri, 3 Apr 2015 00:44:47 +0000 (08:44 +0800)]
md: fix md io stats accounting broken
commit
74672d069b298b03e9f657fd70915e055739882e upstream.
Simon reported the md io stats accounting issue:
"
I'm seeing "iostat -x -k 1" print this after a RAID1 rebuild on 4.0-rc5.
It's not abnormal other than it's 3-disk, with one being SSD (sdc) and
the other two being write-mostly:
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 345.00 0.00 0.00 0.00 0.00 100.00
md2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 58779.00 0.00 0.00 0.00 0.00 100.00
md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 12.00 0.00 0.00 0.00 0.00 100.00
"
The cause is commit "
18c0b223cf9901727ef3b02da6711ac930b4e5d4" uses the
generic_start_io_acct to account the disk stats rather than the open code,
but it also introduced the increase to .in_flight[rw] which is needless to
md. So we re-use the open code here to fix it.
Reported-by: Simon Kirby <sim@hostway.ca>
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Amir Vadai [Mon, 27 Apr 2015 10:40:56 +0000 (13:40 +0300)]
net/mlx4_en: Prevent setting invalid RSS hash function
[ Upstream commit
b37069090b7c5615610a8aa6b36533d67b364d38 ]
mlx4_en_check_rxfh_func() was checking for hardware support before
setting a known RSS hash function, but didn't do any check before
setting unknown RSS hash function. Need to make it fail on such values.
In this occasion, moved the actual setting of the new value from the
check function into mlx4_en_set_rxfh().
Fixes:
947cbb0 ("net/mlx4_en: Support for configurable RSS hash function")
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Eric Dumazet [Sat, 25 Apr 2015 16:35:24 +0000 (09:35 -0700)]
net: rfs: fix crash in get_rps_cpus()
[ Upstream commit
a31196b07f8034eba6a3487a1ad1bb5ec5cd58a5 ]
Commit
567e4b79731c ("net: rfs: add hash collision detection") had one
mistake :
RPS_NO_CPU is no longer the marker for invalid cpu in set_rps_cpu()
and get_rps_cpu(), as @next_cpu was the result of an AND with
rps_cpu_mask
This bug showed up on a host with 72 cpus :
next_cpu was 0x7f, and the code was trying to access percpu data of an
non existent cpu.
In a follow up patch, we might get rid of compares against nr_cpu_ids,
if we init the tables with 0. This is silly to test for a very unlikely
condition that exists only shortly after table initialization, as
we got rid of rps_reset_sock_flow() and similar functions that were
writing this RPS_NO_CPU magic value at flow dismantle : When table is
old enough, it never contains this value anymore.
Fixes:
567e4b79731c ("net: rfs: add hash collision detection")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <tom@herbertland.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alexey Khoroshilov [Sat, 25 Apr 2015 01:07:03 +0000 (04:07 +0300)]
pxa168: fix double deallocation of managed resources
[ Upstream commit
0e03fd3e335d272bee88fe733d5fd13f5c5b7140 ]
Commit
43d3ddf87a57 ("net: pxa168_eth: add device tree support") starts
to use managed resources by adding devm_clk_get() and
devm_ioremap_resource(), but it leaves explicit iounmap() and clock_put()
in pxa168_eth_remove() and in failure handling code of pxa168_eth_probe().
As a result double free can happen.
The patch removes explicit resource deallocation. Also it converts
clk_disable() to clk_disable_unprepare() to make it symmetrical with
clk_prepare_enable().
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Eric Dumazet [Fri, 24 Apr 2015 23:05:01 +0000 (16:05 -0700)]
net: fix crash in build_skb()
[ Upstream commit
2ea2f62c8bda242433809c7f4e9eae1c52c40bbe ]
When I added pfmemalloc support in build_skb(), I forgot netlink
was using build_skb() with a vmalloc() area.
In this patch I introduce __build_skb() for netlink use,
and build_skb() is a wrapper handling both skb->head_frag and
skb->pfmemalloc
This means netlink no longer has to hack skb->head_frag
[ 1567.700067] kernel BUG at arch/x86/mm/physaddr.c:26!
[ 1567.700067] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
[ 1567.700067] Dumping ftrace buffer:
[ 1567.700067] (ftrace buffer empty)
[ 1567.700067] Modules linked in:
[ 1567.700067] CPU: 9 PID: 16186 Comm: trinity-c182 Not tainted
4.0.0-next-20150424-sasha-00037-g4796e21 #2167
[ 1567.700067] task:
ffff880127efb000 ti:
ffff880246770000 task.ti:
ffff880246770000
[ 1567.700067] RIP: __phys_addr (arch/x86/mm/physaddr.c:26 (discriminator 3))
[ 1567.700067] RSP: 0018:
ffff8802467779d8 EFLAGS:
00010202
[ 1567.700067] RAX:
000041000ed8e000 RBX:
ffffc9008ed8e000 RCX:
000000000000002c
[ 1567.700067] RDX:
0000000000000004 RSI:
0000000000000000 RDI:
ffffffffb3fd6049
[ 1567.700067] RBP:
ffff8802467779f8 R08:
0000000000000019 R09:
ffff8801d0168000
[ 1567.700067] R10:
ffff8801d01680c7 R11:
ffffed003a02d019 R12:
ffffc9000ed8e000
[ 1567.700067] R13:
0000000000000f40 R14:
0000000000001180 R15:
ffffc9000ed8e000
[ 1567.700067] FS:
00007f2a7da3f700(0000) GS:
ffff8801d1000000(0000) knlGS:
0000000000000000
[ 1567.700067] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 1567.700067] CR2:
0000000000738308 CR3:
000000022e329000 CR4:
00000000000007e0
[ 1567.700067] Stack:
[ 1567.700067]
ffffc9000ed8e000 ffff8801d0168000 ffffc9000ed8e000 ffff8801d0168000
[ 1567.700067]
ffff880246777a28 ffffffffad7c0a21 0000000000001080 ffff880246777c08
[ 1567.700067]
ffff88060d302e68 ffff880246777b58 ffff880246777b88 ffffffffad9a6821
[ 1567.700067] Call Trace:
[ 1567.700067] build_skb (include/linux/mm.h:508 net/core/skbuff.c:316)
[ 1567.700067] netlink_sendmsg (net/netlink/af_netlink.c:1633 net/netlink/af_netlink.c:2329)
[ 1567.774369] ? sched_clock_cpu (kernel/sched/clock.c:311)
[ 1567.774369] ? netlink_unicast (net/netlink/af_netlink.c:2273)
[ 1567.774369] ? netlink_unicast (net/netlink/af_netlink.c:2273)
[ 1567.774369] sock_sendmsg (net/socket.c:614 net/socket.c:623)
[ 1567.774369] sock_write_iter (net/socket.c:823)
[ 1567.774369] ? sock_sendmsg (net/socket.c:806)
[ 1567.774369] __vfs_write (fs/read_write.c:479 fs/read_write.c:491)
[ 1567.774369] ? get_lock_stats (kernel/locking/lockdep.c:249)
[ 1567.774369] ? default_llseek (fs/read_write.c:487)
[ 1567.774369] ? vtime_account_user (kernel/sched/cputime.c:701)
[ 1567.774369] ? rw_verify_area (fs/read_write.c:406 (discriminator 4))
[ 1567.774369] vfs_write (fs/read_write.c:539)
[ 1567.774369] SyS_write (fs/read_write.c:586 fs/read_write.c:577)
[ 1567.774369] ? SyS_read (fs/read_write.c:577)
[ 1567.774369] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
[ 1567.774369] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2594 kernel/locking/lockdep.c:2636)
[ 1567.774369] ? trace_hardirqs_on_thunk (arch/x86/lib/thunk_64.S:42)
[ 1567.774369] system_call_fastpath (arch/x86/kernel/entry_64.S:261)
Fixes:
79930f5892e ("net: do not deplete pfmemalloc reserve")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Eric Dumazet [Wed, 22 Apr 2015 14:33:36 +0000 (07:33 -0700)]
net: do not deplete pfmemalloc reserve
[ Upstream commit
79930f5892e134c6da1254389577fffb8bd72c66 ]
build_skb() should look at the page pfmemalloc status.
If set, this means page allocator allocated this page in the
expectation it would help to free other pages. Networking
stack can do that only if skb->pfmemalloc is also set.
Also, we must refrain using high order pages from the pfmemalloc
reserve, so __page_frag_refill() must also use __GFP_NOMEMALLOC for
them. Under memory pressure, using order-0 pages is probably the best
strategy.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Eric Dumazet [Thu, 23 Apr 2015 17:42:39 +0000 (10:42 -0700)]
tcp: avoid looping in tcp_send_fin()
[ Upstream commit
845704a535e9b3c76448f52af1b70e4422ea03fd ]
Presence of an unbound loop in tcp_send_fin() had always been hard
to explain when analyzing crash dumps involving gigantic dying processes
with millions of sockets.
Lets try a different strategy :
In case of memory pressure, try to add the FIN flag to last packet
in write queue, even if packet was already sent. TCP stack will
be able to deliver this FIN after a timeout event. Note that this
FIN being delivered by a retransmit, it also carries a Push flag
given our current implementation.
By checking sk_under_memory_pressure(), we anticipate that cooking
many FIN packets might deplete tcp memory.
In the case we could not allocate a packet, even with __GFP_WAIT
allocation, then not sending a FIN seems quite reasonable if it allows
to get rid of this socket, free memory, and not block the process from
eventually doing other useful work.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Eric Dumazet [Wed, 22 Apr 2015 01:32:24 +0000 (18:32 -0700)]
tcp: fix possible deadlock in tcp_send_fin()
[ Upstream commit
d83769a580f1132ac26439f50068a29b02be535e ]
Using sk_stream_alloc_skb() in tcp_send_fin() is dangerous in
case a huge process is killed by OOM, and tcp_mem[2] is hit.
To be able to free memory we need to make progress, so this
patch allows FIN packets to not care about tcp_mem[2], if
skb allocation succeeded.
In a follow-up patch, we might abort tcp_send_fin() infinite loop
in case TIF_MEMDIE is set on this thread, as memory allocator
did its best getting extra memory already.
This patch reverts
d22e15371811 ("tcp: fix tcp fin memory accounting")
Fixes:
d22e15371811 ("tcp: fix tcp fin memory accounting")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tom Herbert [Mon, 20 Apr 2015 21:10:05 +0000 (14:10 -0700)]
ppp: call skb_checksum_complete_unset in ppp_receive_frame
[ Upstream commit
3dfb05340ec6676e6fc71a9ae87bbbe66d3c2998 ]
Call checksum_complete_unset in PPP receive to discard checksum-complete
value. PPP does not pull checksum for headers and also modifies packet
as in VJ compression.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tom Herbert [Mon, 20 Apr 2015 21:10:04 +0000 (14:10 -0700)]
net: add skb_checksum_complete_unset
[ Upstream commit
4e18b9adf2f910ec4d30b811a74a5b626e6c6125 ]
This function changes ip_summed to CHECKSUM_NONE if CHECKSUM_COMPLETE
is set. This is called to discard checksum-complete when packet
is being modified and checksum is not pulled for headers in a layer.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sebastian Pöhn [Mon, 20 Apr 2015 07:19:20 +0000 (09:19 +0200)]
ip_forward: Drop frames with attached skb->sk
[ Upstream commit
2ab957492d13bb819400ac29ae55911d50a82a13 ]
Initial discussion was:
[FYI] xfrm: Don't lookup sk_policy for timewait sockets
Forwarded frames should not have a socket attached. Especially
tw sockets will lead to panics later-on in the stack.
This was observed with TPROXY assigning a tw socket and broken
policy routing (misconfigured). As a result frame enters
forwarding path instead of input. We cannot solve this in
TPROXY as it cannot know that policy routing is broken.
v2:
Remove useless comment
Signed-off-by: Sebastian Poehn <sebastian.poehn@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kroah-Hartman [Wed, 29 Apr 2015 08:22:30 +0000 (10:22 +0200)]
Linux 4.0.1
Jann Horn [Sun, 19 Apr 2015 00:48:39 +0000 (02:48 +0200)]
fs: take i_mutex during prepare_binprm for set[ug]id executables
commit
8b01fc86b9f425899f8a3a8fc1c47d73c2c20543 upstream.
This prevents a race between chown() and execve(), where chowning a
setuid-user binary to root would momentarily make the binary setuid
root.
This patch was mostly written by Linus Torvalds.
Signed-off-by: Jann Horn <jann@thejh.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Herbert Xu [Thu, 16 Apr 2015 01:03:27 +0000 (09:03 +0800)]
skbuff: Do not scrub skb mark within the same name space
[ Upstream commit
213dd74aee765d4e5f3f4b9607fef0cf97faa2af ]
On Wed, Apr 15, 2015 at 05:41:26PM +0200, Nicolas Dichtel wrote:
> Le 15/04/2015 15:57, Herbert Xu a écrit :
> >On Wed, Apr 15, 2015 at 06:22:29PM +0800, Herbert Xu wrote:
> [snip]
> >Subject: skbuff: Do not scrub skb mark within the same name space
> >
> >The commit
ea23192e8e577dfc51e0f4fc5ca113af334edff9 ("tunnels:
> Maybe add a Fixes tag?
> Fixes:
ea23192e8e57 ("tunnels: harmonize cleanup done on skb on rx path")
>
> >harmonize cleanup done on skb on rx path") broke anyone trying to
> >use netfilter marking across IPv4 tunnels. While most of the
> >fields that are cleared by skb_scrub_packet don't matter, the
> >netfilter mark must be preserved.
> >
> >This patch rearranges skb_scurb_packet to preserve the mark field.
> nit: s/scurb/scrub
>
> Else it's fine for me.
Sure.
PS I used the wrong email for James the first time around. So
let me repeat the question here. Should secmark be preserved
or cleared across tunnels within the same name space? In fact,
do our security models even support name spaces?
---8<---
The commit
ea23192e8e577dfc51e0f4fc5ca113af334edff9 ("tunnels:
harmonize cleanup done on skb on rx path") broke anyone trying to
use netfilter marking across IPv4 tunnels. While most of the
fields that are cleared by skb_scrub_packet don't matter, the
netfilter mark must be preserved.
This patch rearranges skb_scrub_packet to preserve the mark field.
Fixes:
ea23192e8e57 ("tunnels: harmonize cleanup done on skb on rx path")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Herbert Xu [Thu, 16 Apr 2015 08:12:53 +0000 (16:12 +0800)]
Revert "net: Reset secmark when scrubbing packet"
[ Upstream commit
4c0ee414e877b899f7fc80aafb98d9425c02797f ]
This patch reverts commit
b8fb4e0648a2ab3734140342002f68fb0c7d1602
because the secmark must be preserved even when a packet crosses
namespace boundaries. The reason is that security labels apply to
the system as a whole and is not per-namespace.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alexei Starovoitov [Tue, 14 Apr 2015 22:57:13 +0000 (15:57 -0700)]
bpf: fix verifier memory corruption
[ Upstream commit
c3de6317d748e23b9e46ba36e10483728d00d144 ]
Due to missing bounds check the DAG pass of the BPF verifier can corrupt
the memory which can cause random crashes during program loading:
[8.449451] BUG: unable to handle kernel paging request at
ffffffffffffffff
[8.451293] IP: [<
ffffffff811de33d>] kmem_cache_alloc_trace+0x8d/0x2f0
[8.452329] Oops: 0000 [#1] SMP
[8.452329] Call Trace:
[8.452329] [<
ffffffff8116cc82>] bpf_check+0x852/0x2000
[8.452329] [<
ffffffff8116b7e4>] bpf_prog_load+0x1e4/0x310
[8.452329] [<
ffffffff811b190f>] ? might_fault+0x5f/0xb0
[8.452329] [<
ffffffff8116c206>] SyS_bpf+0x806/0xa30
Fixes:
f1bca824dabb ("bpf: add search pruning optimization to verifier")
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Eric Dumazet [Wed, 15 Apr 2015 01:45:00 +0000 (18:45 -0700)]
bnx2x: Fix busy_poll vs netpoll
[ Upstream commit
074975d0374333f656c48487aa046a21a9b9d7a1 ]
Commit
9a2620c877454 ("bnx2x: prevent WARN during driver unload")
switched the napi/busy_lock locking mechanism from spin_lock() into
spin_lock_bh(), breaking inter-operability with netconsole, as netpoll
disables interrupts prior to calling our napi mechanism.
This switches the driver into using atomic assignments instead of the
spinlock mechanisms previously employed.
Based on initial patch from Yuval Mintz & Ariel Elior
I basically added softirq starvation avoidance, and mixture
of atomic operations, plain writes and barriers.
Note this slightly reduces the overhead for this driver when no
busy_poll sockets are in use.
Fixes:
9a2620c877454 ("bnx2x: prevent WARN during driver unload")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Eric Dumazet [Thu, 9 Apr 2015 20:31:56 +0000 (13:31 -0700)]
tcp: tcp_make_synack() should clear skb->tstamp
[ Upstream commit
b50edd7812852d989f2ef09dcfc729690f54a42d ]
I noticed tcpdump was giving funky timestamps for locally
generated SYNACK messages on loopback interface.
11:42:46.938990 IP 127.0.0.1.48245 > 127.0.0.2.23850: S
945476042:
945476042(0) win 43690 <mss 65495,nop,nop,sackOK,nop,wscale 7>
20:28:58.502209 IP 127.0.0.2.23850 > 127.0.0.1.48245: S
3160535375:
3160535375(0) ack
945476043 win 43690 <mss
65495,nop,nop,sackOK,nop,wscale 7>
This is because we need to clear skb->tstamp before
entering lower stack, otherwise net_timestamp_check()
does not set skb->tstamp.
Fixes:
7faee5c0d514 ("tcp: remove TCP_SKB_CB(skb)->when")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jesse Gross [Thu, 9 Apr 2015 18:19:14 +0000 (11:19 -0700)]
udptunnels: Call handle_offloads after inserting vlan tag.
[ Upstream commit
b736a623bd099cdf5521ca9bd03559f3bc7fa31c ]
handle_offloads() calls skb_reset_inner_headers() to store
the layer pointers to the encapsulated packet. However, we
currently push the vlag tag (if there is one) onto the packet
afterwards. This changes the MAC header for the encapsulated
packet but it is not reflected in skb->inner_mac_header, which
breaks GSO and drivers which attempt to use this for encapsulation
offloads.
Fixes:
1eaa8178 ("vxlan: Add tx-vlan offload support.")
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Linus Torvalds [Sun, 12 Apr 2015 22:12:50 +0000 (15:12 -0700)]
Linux 4.0
Linus Torvalds [Sun, 12 Apr 2015 17:56:12 +0000 (10:56 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs
Pull vfs and fs fixes from Al Viro:
"Several AIO and OCFS2 fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
ocfs2: _really_ sync the right range
ocfs2_file_write_iter: keep return value and current position update in sync
[regression] ocfs2: do *not* increment ->ki_pos twice
ioctx_alloc(): fix vma (and file) leak on failure
fix mremap() vs. ioctx_kill() race
Linus Torvalds [Sun, 12 Apr 2015 17:43:30 +0000 (10:43 -0700)]
Merge branch 'fixes' of git://git./linux/kernel/git/evalenti/linux-soc-thermal
Pull last minute thermal-SoC management fixes from Eduardo Valentin:
"Specifics:
- Minor fixes on ST and RCAR thermal drivers.
- Avoid flooding kernel log when driver returns -EAGAIN.
Note: I am sending this pull on Rui's behalf while he fixes issues in
his Linux box"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal:
drivers: thermal: st: remove several sparse warnings
thermal: constify of_device_id array
thermal: Do not log an error if thermal_zone_get_temp returns -EAGAIN
thermal: rcar: Fix typo in r8a73a4 SoC name
Linus Torvalds [Sat, 11 Apr 2015 22:57:36 +0000 (15:57 -0700)]
Merge tag 'asoc-fix-v4.0-rc7' of git://git./linux/kernel/git/broonie/sound
Pull last-minute ASoC fix from Mark Brown:
"This patch backs out a change that came in during the merge window
which selects a configuration for GPIO4 on pcm512x CODECs that may not
be suitable for all systems using the device. Changes for v4.1 will
make this properly configurable but for now it's safest to revert to
the v3.19 behaviour and leave the pin configuration alone.
Sorry for sending this direct at the last minute but due to the GPIO
misuse it'd be really good to get it in the release and I'd not
realised it hadn't been sent yet - between some travel, a job change
and other non-urgent fixes coming in I'd lost track of the urgency.
It's been in -next for several weeks now, is isolated to the driver
and fairly clear to inspection"
* tag 'asoc-fix-v4.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound:
ASoC: pcm512x: Remove hardcoding of pll-lock to GPIO4
Howard Mitchell [Mon, 23 Mar 2015 21:17:01 +0000 (21:17 +0000)]
ASoC: pcm512x: Remove hardcoding of pll-lock to GPIO4
Currently GPIO4 is hardcoded to output the pll-lock signal.
Unfortunately this is after the pll-out GPIO is configured which
is selectable in the device tree. Therefore it is not possible to
use GPIO4 for pll-out. Therefore this patch removes the
configuration of GPIO4.
Signed-off-by: Howard Mitchell <hm@hmbedded.co.uk>
Signed-off-by: Mark Brown <broonie@kernel.org>
Linus Torvalds [Sat, 11 Apr 2015 20:46:07 +0000 (13:46 -0700)]
Revert "dmaengine: Add a warning for drivers not using the generic slave caps retrieval"
This reverts commit
ecc19d17868be9c9f8f00ed928791533c420f3e0.
It added a new warning to try to encourage driver writers to set the
device capabities properly, but drivers haven't been updated and in the
meantime it just generaters a scary message that users cannot actually
do anything about.
Warnings like these are appropriate if you actually expect to fix the
code that causes them. They are not appropriate for releases.
Requested-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 9 Apr 2015 21:12:22 +0000 (14:12 -0700)]
blk-mq: initialize 'struct request' and associated data to zero
Jan Engelhardt reports a strange oops with an invalid ->sense_buffer
pointer in scsi_init_cmd_errh() with the blk-mq code.
The sense_buffer pointer should have been initialized by the call to
scsi_init_request() from blk_mq_init_rq_map(), but there seems to be
some non-repeatable memory corruptor.
This patch makes sure we initialize the whole struct request allocation
(and the associated 'struct scsi_cmnd' for the SCSI case) to zero, by
using __GFP_ZERO in the allocation. The old code initialized a couple
of individual fields, leaving the rest undefined (although many of them
are then initialized in later phases, like blk_mq_rq_ctx_init() etc.
It's not entirely clear why this matters, but it's the rigth thing to do
regardless, and with 4.0 imminent this is the defensive "let's just make
sure everything is initialized properly" patch.
Tested-by: Jan Engelhardt <jengelh@inai.de>
Acked-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 11 Apr 2015 17:52:13 +0000 (10:52 -0700)]
Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dma
Pull dmaengine fix from Vinod Koul:
"I have one more fix to fix the boot warning on cppi driver due to
missing capabilities"
* 'fixes' of git://git.infradead.org/users/vkoul/slave-dma:
dmaengine: cppi41: add missing bitfields
Linus Torvalds [Sat, 11 Apr 2015 17:47:17 +0000 (10:47 -0700)]
Merge tag 'for-linus-4.0-1' of git://git.code.sf.net/p/openipmi/linux-ipmi
Pull late ipmi fixes from Corey Minyard:
"Some annoying issues in the IPMI driver that would be good to have
fixed before 4.0 is released.
These got reported or discovered late, but they will avoid some
situations that would cause lots of log spam and in one case a
deadlock"
* tag 'for-linus-4.0-1' of git://git.code.sf.net/p/openipmi/linux-ipmi:
ipmi_ssif: Use interruptible completion for waiting in the thread
ipmi/powernv: Fix minor locking bug
ipmi: Handle BMCs that don't allow clearing the rcv irq bit
Felipe Balbi [Wed, 8 Apr 2015 16:45:42 +0000 (11:45 -0500)]
dmaengine: cppi41: add missing bitfields
Add missing directions, residue_granularity,
srd_addr_widths and dst_addr_widths bitfields.
Without those we will see a kernel WARN()
when loading musb on am335x devices.
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Corey Minyard [Sat, 4 Apr 2015 06:54:26 +0000 (01:54 -0500)]
ipmi_ssif: Use interruptible completion for waiting in the thread
The code was using an normal completion, but that caused stuck
task errors after a while. Use an interruptible one to avoid that.
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Alistair Popple [Fri, 10 Apr 2015 07:32:20 +0000 (17:32 +1000)]
ipmi/powernv: Fix minor locking bug
If ipmi_powernv_recv(...) is called without a current message it
prints a warning and returns. However it fails to release the message
lock causing the system to dead lock during any subsequent IPMI
operations.
This error path should never normally be taken unless there are bugs
elsewhere in the system.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Corey Minyard [Fri, 3 Apr 2015 17:13:48 +0000 (12:13 -0500)]
ipmi: Handle BMCs that don't allow clearing the rcv irq bit
Some BMCs don't let you clear the receive irq bit in the global
enables. This is kind of silly, but they give an error if you
try to clear it. Compensate for this by detecting the situation
and working around it.
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Tested-by: Thomas D <whissi@whissi.de>
Reviewed-by: Thomas D <whissi@whissi.de>
Linus Torvalds [Sat, 11 Apr 2015 00:41:47 +0000 (17:41 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"This is our remaining set of three fixes for 4.0: two oops fixes(one
for cable pulls triggering oopses and the other be2iscsi specific) and
one warn on in sysfs on multipath devices using enclosures"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
Defer processing of REQ_PREEMPT requests for blocked devices
be2iscsi: Fix kernel panic when device initialization fails
enclosure: fix WARN_ON removing an adapter in multi-path devices
Linus Torvalds [Fri, 10 Apr 2015 23:56:40 +0000 (16:56 -0700)]
Merge git://git./linux/kernel/git/nab/target-pending
Pull SCSI target fixes from Nicholas Bellinger:
"Just a few small fixes:
Two from Andy, the first addresses a v4.0 target specific regression
to a user visible configfs attribute, and the second adds a set of
missing brackets around IPv6 discovery portal information within
iscsi-target.
And one from Mike that fixes an OOPs regression in traditional
iscsi-target when an iovec allocation fails, that has been present
since v3.10.y code. (CC'd to stable)"
* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
iscsi target: fix oops when adding reject pdu
iscsi-target: TargetAddress in SendTargets should bracket ipv6 addresses
target: Allow userspace to write 1 to attrib/emulate_fua_write
Mike Christie [Fri, 10 Apr 2015 07:47:27 +0000 (02:47 -0500)]
iscsi target: fix oops when adding reject pdu
This fixes a oops due to a double list add when adding a reject PDU for
iscsit_allocate_iovecs allocation failures. The cmd has already been
added to the conn_cmd_list in iscsit_setup_scsi_cmd, so this has us call
iscsit_reject_cmd.
Note that for ERL0 the reject PDU is not actually sent, so this patch
is not completely tested. Just verified we do not oops. The problem is the
add reject functions return -1 which is returned all the way up to
iscsi_target_rx_thread which for ERL0 will drop the connection.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Linus Torvalds [Fri, 10 Apr 2015 18:16:54 +0000 (11:16 -0700)]
Merge tag 'sound-4.0' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Here are fixes gathered for 4.0-final; one FireFire endian fix, two
USB-audio quirks, and three HD-audio quirks.
All relatively small and device-specific fixes, should be pretty safe
to apply"
* tag 'sound-4.0' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: usb - Creative USB X-Fi Pro SB1095 volume knob support
ALSA: hda - Fix headphone pin config for Lifebook T731
ALSA: bebob: fix to processing in big-endian machine for sending cue
ALSA: hda/realtek - Make more stable to get pin sense for ALC283
ALSA: usb-audio: don't try to get Benchmark DAC1 sample rate
ALSA: hda/realtek - Support Dell headset mode for ALC256
Linus Torvalds [Fri, 10 Apr 2015 17:51:34 +0000 (10:51 -0700)]
Merge tag 'nios2-fixes-v4.0-final' of git://git.rocketboards.org/linux-socfpga-next
Pull arch/nios2 fixes from Ley Foon Tan:
"There are 3 arch/nios2 fixes for 4.0 final:
- fix cache coherency issue when debugging with gdb
- move restart_block to struct task_struct (aligned with other
architectures)
- fix for missing registers defines for ptrace"
* tag 'nios2-fixes-v4.0-final' of git://git.rocketboards.org/linux-socfpga-next:
nios2: fix cache coherency issue when debug with gdb
nios2: add missing ptrace registers defines
nios2: signal: Move restart_block to struct task_struct
Ley Foon Tan [Fri, 10 Apr 2015 03:10:08 +0000 (11:10 +0800)]
nios2: fix cache coherency issue when debug with gdb
Remove the end address checking for flushda function. We need to flush
each address line for flushda instruction, from start to end address.
This is because flushda instruction only flush the cache if tag and line
fields are matched.
Change to use ldwio instruction (bypass cache) to load the instruction
that causing trap. Our interest is the actual instruction that executed
by the processor, this should be uncached.
Note, EA address might be an userspace cached address.
Signed-off-by: Ley Foon Tan <lftan@altera.com>
Linus Torvalds [Fri, 10 Apr 2015 00:44:27 +0000 (17:44 -0700)]
Merge tag 'pm+acpi-4.0-rc8' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management and ACPI fixes from Rafael Wysocki:
"These are stable-candidate fixes of some recently reported issues in
the cpufreq core, cpuidle core, the ACPI cpuidle driver and the
hibernate core.
Specifics:
- Revert a 3.17 hibernate commit that was supposed to fix an issue
related to e820 reserved regions, but broke resume from hibernation
on Lenovo x230 (Rafael J Wysocki).
- Prevent the ACPI cpuidle driver from overwriting the name and
description of the C0 state set by the core when the list of
C-states changes (Thomas Schlichter).
- Remove the no longer needed state_count field from struct
cpuidle_device which prevents the list of C-states shown by the
sysfs interface from becoming incorrect when the current number of
them is different from the number of C-states on boot (Bartlomiej
Zolnierkiewicz).
- The cpufreq core updates the policy object of the only online CPU
during system resume to make it reflect the current hardware state,
but it always assumes that CPU to be CPU0 which need not be the
case, so fix the code to avoid that assumption (Viresh Kumar)"
* tag 'pm+acpi-4.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Revert "PM / hibernate: avoid unsafe pages in e820 reserved regions"
cpuidle: ACPI: do not overwrite name and description of C0
cpuidle: remove state_count field from struct cpuidle_device
cpufreq: Schedule work for the first-online CPU on resume
Rafael J. Wysocki [Thu, 9 Apr 2015 21:25:23 +0000 (23:25 +0200)]
Merge branches 'pm-sleep', 'pm-cpufreq' and 'pm-cpuidle'
* pm-sleep:
Revert "PM / hibernate: avoid unsafe pages in e820 reserved regions"
* pm-cpufreq:
cpufreq: Schedule work for the first-online CPU on resume
* pm-cpuidle:
cpuidle: ACPI: do not overwrite name and description of C0
cpuidle: remove state_count field from struct cpuidle_device
Linus Torvalds [Thu, 9 Apr 2015 17:17:44 +0000 (10:17 -0700)]
Merge tag 'pci-v4.0-fixes-3' of git://git./linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
"Here are some fixes for v4.0. I apologize for how late they are. We
were hoping for some better fixes, but couldn't get them polished in
time. These fix:
- a Xen domU oops with PCI passthrough devices
- a sparc T5 boot failure
- a STM SPEAr13xx crash (use after initdata freed)
- a cpcihp hotplug driver thinko
- an AER thinko that printed stack junk
Details:
Enumeration
- Don't look for ACPI hotplug parameters if ACPI is disabled (Bjorn Helgaas)
Resource management
- Revert "sparc/PCI: Clip bridge windows to fit in upstream windows" (Bjorn Helgaas)
AER
- Avoid info leak in __print_tlp_header() (Rasmus Villemoes)
PCI device hotplug
- Add missing curly braces in cpci_configure_slot() (Dan Carpenter)
ST Microelectronics SPEAr13xx host bridge driver
- Drop __initdata from spear13xx_pcie_driver (Matwey V. Kornilov)
* tag 'pci-v4.0-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
Revert "sparc/PCI: Clip bridge windows to fit in upstream windows"
PCI: Don't look for ACPI hotplug parameters if ACPI is disabled
PCI: cpcihp: Add missing curly braces in cpci_configure_slot()
PCI/AER: Avoid info leak in __print_tlp_header()
PCI: spear: Drop __initdata from spear13xx_pcie_driver
Dmitry M. Fedin [Thu, 9 Apr 2015 14:37:03 +0000 (17:37 +0300)]
ALSA: usb - Creative USB X-Fi Pro SB1095 volume knob support
Adds an entry for Creative USB X-Fi to the rc_config array in
mixer_quirks.c to allow use of volume knob on the device.
Adds support for newer X-Fi Pro card, known as "Model No. SB1095"
with USB ID "041e:3237"
Signed-off-by: Dmitry M. Fedin <dmitry.fedin@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Al Viro [Wed, 8 Apr 2015 21:00:32 +0000 (17:00 -0400)]
ocfs2: _really_ sync the right range
"ocfs2 syncs the wrong range" had been broken; prior to it the
code was doing the wrong thing in case of O_APPEND, all right,
but _after_ it we were syncing the wrong range in 100% cases.
*ppos, aka iocb->ki_pos is incremented prior to that point,
so we are always doing sync on the area _after_ the one we'd
written to.
Spotted by Joseph Qi <joseph.qi@huawei.com> back in January;
unfortunately, I'd missed his mail back then ;-/
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Ley Foon Tan [Thu, 9 Apr 2015 10:28:05 +0000 (18:28 +0800)]
nios2: add missing ptrace registers defines
These are all register available in nios2.
Signed-off-by: Ley Foon Tan <lftan@altera.com>
Linus Torvalds [Wed, 8 Apr 2015 22:12:25 +0000 (15:12 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Final drm fixes: one core locking imbalance regression, and a bunch of
i915 baytrail s/r fixes"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm: fix drm_mode_getconnector() locking imbalance regression
drm/i915/vlv: remove wait for previous GFX clk disable request
drm/i915/chv: Remove Wait for a previous gfx force-off
drm/i915/vlv: save/restore the power context base reg
Linus Torvalds [Wed, 8 Apr 2015 21:51:56 +0000 (14:51 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/sage/ceph-client
Pull ceph revert from Sage Weil:
"This corrects a recent misadventure with __GFP_MEMALLOC and
PF_MEMALLOC; it turns out it's not a good fit for RBD and we're better
off relying on dirty page throttling"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
Revert "libceph: use memalloc flags for net IO"
Linus Torvalds [Wed, 8 Apr 2015 21:42:49 +0000 (14:42 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"Three fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: numa: disable change protection for vma(VM_HUGETLB)
include/linux/dmapool.h: declare struct device
mm: move zone lock to a different cache line than order-0 free page lists
Linus Torvalds [Tue, 7 Apr 2015 17:33:49 +0000 (10:33 -0700)]
Copy the kernel module data from user space in chunks
Unlike most (all?) other copies from user space, kernel module loading
is almost unlimited in size. So we do a potentially huge
"copy_from_user()" when we copy the module data from user space to the
kernel buffer, which can be a latency concern when preemption is
disabled (or voluntary).
Also, because 'copy_from_user()' clears the tail of the kernel buffer on
failures, even a *failed* copy can end up wasting a lot of time.
Normally neither of these are concerns in real life, but they do trigger
when doing stress-testing with trinity. Running in a VM seems to add
its own overheadm causing trinity module load testing to even trigger
the watchdog.
The simple fix is to just chunk up the module loading, so that it never
tries to copy insanely big areas in one go. That bounds the latency,
and also the amount of (unnecessarily, in this case) cleared memory for
the failure case.
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 6 Apr 2015 17:26:17 +0000 (10:26 -0700)]
x86: clean up/fix 'copy_in_user()' tail zeroing
The rule for 'copy_from_user()' is that it zeroes the remaining kernel
buffer even when the copy fails halfway, just to make sure that we don't
leave uninitialized kernel memory around. Because even if we check for
errors, some kernel buffers stay around after thge copy (think page
cache).
However, the x86-64 logic for user copies uses a copy_user_generic()
function for all the cases, that set the "zerorest" flag for any fault
on the source buffer. Which meant that it didn't just try to clear the
kernel buffer after a failure in copy_from_user(), it also tried to
clear the destination user buffer for the "copy_in_user()" case.
Not only is that pointless, it also means that the clearing code has to
worry about the tail clearing taking page faults for the user buffer
case. Which is just stupid, since that case shouldn't happen in the
first place.
Get rid of the whole "zerorest" thing entirely, and instead just check
if the destination is in kernel space or not. And then just use
memset() to clear the tail of the kernel buffer if necessary.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dave Airlie [Wed, 8 Apr 2015 20:59:50 +0000 (06:59 +1000)]
Merge tag 'drm-intel-fixes-2015-04-08' of git://anongit.freedesktop.org/drm-intel into drm-fixes
three commits, all cc: stable, to address Baytrail
suspend/resume issues.
* tag 'drm-intel-fixes-2015-04-08' of git://anongit.freedesktop.org/drm-intel:
drm/i915/vlv: remove wait for previous GFX clk disable request
drm/i915/chv: Remove Wait for a previous gfx force-off
drm/i915/vlv: save/restore the power context base reg
Al Viro [Wed, 8 Apr 2015 19:45:02 +0000 (15:45 -0400)]
ocfs2_file_write_iter: keep return value and current position update in sync
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Wed, 8 Apr 2015 19:41:17 +0000 (15:41 -0400)]
[regression] ocfs2: do *not* increment ->ki_pos twice
generic_file_direct_write() already does that. Broken by
"ocfs2: do not fallback to buffer I/O write if appending"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Takashi Iwai [Wed, 8 Apr 2015 18:47:55 +0000 (20:47 +0200)]
ALSA: hda - Fix headphone pin config for Lifebook T731
Some BIOS version of Fujitsu Lifebook T731 seems to set up the
headphone pin (0x21) without the assoc number 0x0f while it's set only
to the output on the docking port (0x1a). With the recent commit
[
03ad6a8c93b6: ALSA: hda - Fix "PCM" name being used on one DAC when
there are two DACs], this resulted in the weird mixer element
mapping where the headphone on the laptop is assigned as a shared
volume with the speaker and the docking port is assigned as an
individual headphone.
This patch improves the situation by correcting the headphone pin
config to the more appropriate value.
Reported-and-tested-by: Taylor Smock <smocktaylor@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Bart Van Assche [Wed, 4 Mar 2015 09:31:47 +0000 (10:31 +0100)]
Defer processing of REQ_PREEMPT requests for blocked devices
SCSI transport drivers and SCSI LLDs block a SCSI device if the
transport layer is not operational. This means that in this state
no requests should be processed, even if the REQ_PREEMPT flag has
been set. This patch avoids that a rescan shortly after a cable
pull sporadically triggers the following kernel oops:
BUG: unable to handle kernel paging request at
ffffc9001a6bc084
IP: [<
ffffffffa04e08f2>] mlx4_ib_post_send+0xd2/0xb30 [mlx4_ib]
Process rescan-scsi-bus (pid: 9241, threadinfo
ffff88053484a000, task
ffff880534aae100)
Call Trace:
[<
ffffffffa0718135>] srp_post_send+0x65/0x70 [ib_srp]
[<
ffffffffa071b9df>] srp_queuecommand+0x1cf/0x3e0 [ib_srp]
[<
ffffffffa0001ff1>] scsi_dispatch_cmd+0x101/0x280 [scsi_mod]
[<
ffffffffa0009ad1>] scsi_request_fn+0x411/0x4d0 [scsi_mod]
[<
ffffffff81223b37>] __blk_run_queue+0x27/0x30
[<
ffffffff8122a8d2>] blk_execute_rq_nowait+0x82/0x110
[<
ffffffff8122a9c2>] blk_execute_rq+0x62/0xf0
[<
ffffffffa000b0e8>] scsi_execute+0xe8/0x190 [scsi_mod]
[<
ffffffffa000b2f3>] scsi_execute_req+0xa3/0x130 [scsi_mod]
[<
ffffffffa000c1aa>] scsi_probe_lun+0x17a/0x450 [scsi_mod]
[<
ffffffffa000ce86>] scsi_probe_and_add_lun+0x156/0x480 [scsi_mod]
[<
ffffffffa000dc2f>] __scsi_scan_target+0xdf/0x1f0 [scsi_mod]
[<
ffffffffa000dfa3>] scsi_scan_host_selected+0x183/0x1c0 [scsi_mod]
[<
ffffffffa000edfb>] scsi_scan+0xdb/0xe0 [scsi_mod]
[<
ffffffffa000ee13>] store_scan+0x13/0x20 [scsi_mod]
[<
ffffffff811c8d9b>] sysfs_write_file+0xcb/0x160
[<
ffffffff811589de>] vfs_write+0xce/0x140
[<
ffffffff81158b53>] sys_write+0x53/0xa0
[<
ffffffff81464592>] system_call_fastpath+0x16/0x1b
[<
00007f611c9d9300>] 0x7f611c9d92ff
Reported-by: Max Gurtuvoy <maxg@mellanox.com>
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Cc: <stable@vger.kernel.org>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
John Soni Jose [Thu, 12 Feb 2015 01:15:47 +0000 (06:45 +0530)]
be2iscsi: Fix kernel panic when device initialization fails
Kernel panic was happening as iscsi_host_remove() was called on
a host which was not yet added.
Signed-off-by: John Soni Jose <sony.john-n@emulex.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Cc: <stable@vger.kernel.org>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Takashi Sakamoto [Wed, 8 Apr 2015 16:15:03 +0000 (01:15 +0900)]
ALSA: bebob: fix to processing in big-endian machine for sending cue
Some M-Audio devices require to receive bootup command just after
powering on, while codes in BeBoB driver doesn't work properly in
big-endian machine because the command should be aligned by
little-endian.
This commit fixes this bug. This fix should go to stable kernel.
Cc: Takayuki Shiroma <t.shiroma.oki@gmail.com>
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Bjorn Helgaas [Wed, 8 Apr 2015 15:04:55 +0000 (10:04 -0500)]
Revert "sparc/PCI: Clip bridge windows to fit in upstream windows"
This reverts commit
d63e2e1f3df904bf6bd150bdafb42ddbb3257ea8.
David Ahern reported that
d63e2e1f3df9 breaks booting on an 8-socket T5
sparc system. He also verified that the system boots with
d63e2e1f3df9
reverted. Yinghai has some fixes, but they need a little more polishing
than we can do before v4.0.
Link: http://lkml.kernel.org/r/5514391F.2030300@oracle.com
Link: http://lkml.kernel.org/r/1427857069-6789-1-git-send-email-yinghai@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: stable@vger.kernel.org # v3.19+
Bjorn Helgaas [Tue, 24 Mar 2015 16:12:45 +0000 (11:12 -0500)]
PCI: Don't look for ACPI hotplug parameters if ACPI is disabled
Booting a v3.18 or newer Xen domU kernel with PCI devices passed through
results in an oops (this is a 32-bit 3.13.11 dom0 with a 64-bit 4.4.0
hypervisor and 32-bit domU):
BUG: unable to handle kernel paging request at
0030303e
IP: [<
c06ed0e6>] acpi_ns_validate_handle+0x12/0x1a
Call Trace:
[<
c06eda4d>] ? acpi_evaluate_object+0x31/0x1fc
[<
c06b78e1>] ? pci_get_hp_params+0x111/0x4e0
[<
c0407bc7>] ? xen_force_evtchn_callback+0x17/0x30
[<
c04085fb>] ? xen_restore_fl_direct_reloc+0x4/0x4
[<
c0699d34>] ? pci_device_add+0x24/0x450
Don't look for ACPI configuration information if ACPI has been disabled.
I don't think this is the best fix, because we can boot plain Linux (no
Xen) with "acpi=off", and we don't need this check in pci_get_hp_params().
There should be a better fix that would make Xen domU work the same way.
The domU kernel has ACPI support but it has no AML. There should be a way
to initialize the ACPI data structures so things fail gracefully rather
than oopsing. This is an interim fix to address the regression.
Fixes:
6cd33649fa83 ("PCI: Add pci_configure_device() during enumeration")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=96301
Reported-by: Michael D Labriola <mlabriol@gdeb.com>
Tested-by: Michael D Labriola <mlabriol@gdeb.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: stable@vger.kernel.org # v3.18+
Kailang Yang [Wed, 8 Apr 2015 08:34:00 +0000 (16:34 +0800)]
ALSA: hda/realtek - Make more stable to get pin sense for ALC283
Pin sense will active when power pin is wake up.
Power pin will not wake up immediately during resume state.
Add some delay to wait for power pin activated.
Signed-off-by: Kailang Yang <kailang@realtek.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Ley Foon Tan [Wed, 8 Apr 2015 05:44:18 +0000 (13:44 +0800)]
nios2: signal: Move restart_block to struct task_struct
See https://lkml.org/lkml/2014/10/29/643 and commit
f56141e3e2d9
("all arches, signal: move restart_block to struct task_struct")
Signed-off-by: Ley Foon Tan <lftan@altera.com>
Tommi Rantala [Fri, 3 Apr 2015 07:45:29 +0000 (10:45 +0300)]
drm: fix drm_mode_getconnector() locking imbalance regression
Regression in commit
2caa80e72b57c6216aec6f6a11fcfb4fec46daa0
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Sun Feb 22 11:38:36 2015 +0100
drm: Fix deadlock due to getconnector locking changes
If the drm_connector_find() call returns NULL, we should no longer
call drm_modeset_unlock() to avoid locking imbalance.
Signed-off-by: Tommi Rantala <tt.rantala@gmail.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Andy Grover [Tue, 31 Mar 2015 17:43:18 +0000 (10:43 -0700)]
iscsi-target: TargetAddress in SendTargets should bracket ipv6 addresses
"The domainname can be specified as either a DNS host name, a
dotted-decimal IPv4 address, or a bracketed IPv6 address as specified
in [RFC2732]."
See https://bugzilla.redhat.com/show_bug.cgi?id=
1206868
Reported-by: Kyle Brantley <kyle@averageurl.com>
Signed-off-by: Andy Grover <agrover@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Linus Torvalds [Wed, 8 Apr 2015 00:38:31 +0000 (17:38 -0700)]
Merge tag 'media/v3.20-2' of git://git./linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
"A series of fixup patches for version 4.0:
- one VB2 core fixup, when stopping the stream;
- one VB2 core fixup for dma-contig memory type;
- driver fixes at rtl28xx, s5p (tv, jpeg, mfc, soc-camera, sh_veu,
cx23885, gspca"
* tag 'media/v3.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
[media] rtl28xxu: return success for unimplemented FE callback
[media] rtl2832: disable regmap register cache
[media] vb2: Fix dma_dir setting for dma-contig mem type
[media] media: s5p-mfc: fix broken pointer cast on 64bit arch
[media] media: s5p-mfc: fix mmap support for 64bit arch
[media] cx23885: fix querycap
[media] sh_veu: v4l2_dev wasn't set
[media] s5p-mfc: Fix NULL pointer dereference caused by not set q->lock
[media] s5p-jpeg: exynos3250: fix erroneous reset procedure
[media] s5p-tv: hdmi needs I2C support
[media] s5p-jpeg: Initialize cb and cr to zero
[media] media: fix gspca drivers build dependencies
[media] soc-camera: Fix devm_kfree() in soc_of_bind()
[media] media: atmel-isi: increase the burst length to improve the performance
[media] vb2: fix 'UNBALANCED' warnings when calling vb2_thread_stop()
Naoya Horiguchi [Tue, 7 Apr 2015 21:26:47 +0000 (14:26 -0700)]
mm: numa: disable change protection for vma(VM_HUGETLB)
Currently when a process accesses a hugetlb range protected with
PROTNONE, unexpected COWs are triggered, which finally puts the hugetlb
subsystem into a broken/uncontrollable state, where for example
h->resv_huge_pages is subtracted too much and wraps around to a very
large number, and the free hugepage pool is no longer maintainable.
This patch simply stops changing protection for vma(VM_HUGETLB) to fix
the problem. And this also allows us to avoid useless overhead of minor
faults.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Suggested-by: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: David Rientjes <rientjes@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mark Brown [Tue, 7 Apr 2015 21:26:44 +0000 (14:26 -0700)]
include/linux/dmapool.h: declare struct device
dmapool uses struct device in function arguments but relies on an
implicit inclusion to declare struct device causing warnings in some
configurations:
include/linux/dmapool.h:31:7: warning: 'struct device' declared inside parameter list
Fix this by adding a struct device declaration to the file.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Tue, 7 Apr 2015 21:26:41 +0000 (14:26 -0700)]
mm: move zone lock to a different cache line than order-0 free page lists
Huang Ying reported the following problem due to commit
3484b2de9499 ("mm:
rearrange zone fields into read-only, page alloc, statistics and page
reclaim lines") from the Intel performance tests
24b7e5819ad5cbef 3484b2de9499df23c4604a513b
---------------- --------------------------
%stddev %change %stddev
\ | \
152288 \261 0% -46.2% 81911 \261 0% aim7.jobs-per-min
237 \261 0% +85.6% 440 \261 0% aim7.time.elapsed_time
237 \261 0% +85.6% 440 \261 0% aim7.time.elapsed_time.max
25026 \261 0% +70.7% 42712 \261 0% aim7.time.system_time
2186645 \261 5% +32.0%
2885949 \261 4% aim7.time.voluntary_context_switches
4576561 \261 1% +24.9%
5715773 \261 0% aim7.time.involuntary_context_switches
The problem is specific to very large machines under stress. It was not
reproducible with the machines I had used to justify the original patch
because large numbers of CPUs are required. When pressure is high enough,
the cache line is bouncing between CPUs trying to acquire the lock and the
holder of the lock adjusting free lists. The intention was that the
acquirer of the lock would automatically have the cache line holding the
free lists but according to Huang, this is not a universal win.
One possibility is to move the zone lock to its own cache line but it
increases the size of the zone. This patch moves the lock to the other
end of the free lists where they do not contend under high pressure. It
does mean the page allocator paths now require more cache lines but Huang
reports that it restores performance to previous levels on large machines
%stddev %change %stddev
\ | \
84568 \261 1% +94.3% 164280 \261 1% aim7.jobs-per-min
2881944 \261 2% -35.1%
1870386 \261 8% aim7.time.voluntary_context_switches
681 \261 1% -3.4% 658 \261 0% aim7.time.user_time
5538139 \261 0% -12.1%
4867884 \261 0% aim7.time.involuntary_context_switches
44174 \261 1% -46.0% 23848 \261 1% aim7.time.system_time
426 \261 1% -48.4% 219 \261 1% aim7.time.elapsed_time
426 \261 1% -48.4% 219 \261 1% aim7.time.elapsed_time.max
468 \261 1% -43.1% 266 \261 2% uptime.boot
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reported-by: Huang Ying <ying.huang@intel.com>
Tested-by: Huang Ying <ying.huang@intel.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eduardo Valentin [Tue, 7 Apr 2015 20:42:12 +0000 (13:42 -0700)]
drivers: thermal: st: remove several sparse warnings
Simple patch to make symbols static. Symbols that are not
shared with other parts of the kernel can be made static.
This change also removes several sparse complains.
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Lee Jones <lee.jones@linaro.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Ajit Pal Singh <ajitpal.singh@st.com>
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
Fabian Frederick [Mon, 16 Mar 2015 19:17:09 +0000 (20:17 +0100)]
thermal: constify of_device_id array
of_device_id is always used as const.
(See driver.of_match_table and open firmware functions)
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
Hans de Goede [Sat, 21 Mar 2015 14:02:55 +0000 (15:02 +0100)]
thermal: Do not log an error if thermal_zone_get_temp returns -EAGAIN
Some temperature sensors only get updated every few seconds and while
waiting for the first irq reporting a (new) temperature to happen there
get_temp operand will return -EAGAIN as it does not have any data to report
yet.
Not logging an error in this case avoids messages like these from showing
up in dmesg on affected systems:
[ 1.219353] thermal thermal_zone0: failed to read out thermal zone 0
[ 2.015433] thermal thermal_zone0: failed to read out thermal zone 0
[ 2.416737] thermal thermal_zone0: failed to read out thermal zone 0
Reviewed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
Geert Uytterhoeven [Wed, 18 Mar 2015 18:42:41 +0000 (19:42 +0100)]
thermal: rcar: Fix typo in r8a73a4 SoC name
r8a73a4 is R-Mobile APE6, not AP6.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
Ilya Dryomov [Thu, 2 Apr 2015 11:40:58 +0000 (14:40 +0300)]
Revert "libceph: use memalloc flags for net IO"
This reverts commit
89baaa570ab0b476db09408d209578cfed700e9f.
Dirty page throttling should be sufficient for us in the general case
so there is no need to use __GFP_MEMALLOC - it would be needed only in
the swap-over-rbd case, which we currently don't support. (It would
probably take approximately the commit that is being reverted to add
that support, but we would also need the "swap" option to distinguish
from the general case and make sure swap ceph_client-s aren't shared
with anything else.) See ceph-devel threads [1] and [2] for the
details of why enabling pfmemalloc reserves for all cases is a bad
thing.
On top of potential system lockups related to drained emergency
reserves, this turned out to cause ceph lockups in case peers are on
the same host and communicating via loopback due to sk_filter()
dropping pfmemalloc skbs on the receiving side because the receiving
loopback socket is not tagged with SOCK_MEMALLOC.
[1] "SOCK_MEMALLOC vs loopback"
http://www.spinics.net/lists/ceph-devel/msg22998.html
[2] "[PATCH] libceph: don't set memalloc flags in loopback case"
http://www.spinics.net/lists/ceph-devel/msg23392.html
Conflicts:
net/ceph/messenger.c [ context: tcp_nodelay option ]
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Sage Weil <sage@redhat.com>
Cc: stable@vger.kernel.org # 3.18+, needs backporting
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Acked-by: Mike Christie <michaelc@cs.wisc.edu>
Acked-by: Mel Gorman <mgorman@suse.de>
Jesse Barnes [Wed, 1 Apr 2015 21:22:58 +0000 (14:22 -0700)]
drm/i915/vlv: remove wait for previous GFX clk disable request
Looks like it was introduced in:
commit
650ad970a39f8b6164fe8613edc150f585315289
Author: Imre Deak <imre.deak@intel.com>
Date: Fri Apr 18 16:35:02 2014 +0300
drm/i915: vlv: factor out vlv_force_gfx_clock and check for pending force-of
but I'm not sure why. It has caused problems for us in the past (see
85250ddff7a6 "drm/i915/chv: Remove Wait for a previous gfx force-off"
and
8d4eee9cd7a1 "drm/i915: vlv: increase timeout when forcing on the
GFX clock") and doesn't seem to be required, so let's just drop it.
References: https://bugs.freedesktop.org/show_bug.cgi?id=89611
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Tested-by: Darren Hart <dvhart@linux.intel.com>
Reviewed-by: Deepak S <deepak.s@linux.intel.com>
Cc: stable@vger.kernel.org # c9c52e24194a: drm/i915/chv: Remove Wait ...
Cc: stable@vger.kernel.org
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Deepak S [Sat, 28 Mar 2015 09:53:34 +0000 (15:23 +0530)]
drm/i915/chv: Remove Wait for a previous gfx force-off
On CHV, PUNIT team confirmed that 'VLV_GFX_CLK_STATUS_BIT' is not a
sticky bit and it will always be set. So ignore Check for previous
Gfx force off during suspend and allow the force clk as part S0ix
Sequence
Signed-off-by: Deepak S <deepak.s@linux.intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>