Omar Sandoval [Wed, 9 May 2018 09:08:51 +0000 (02:08 -0700)]
BACKPORT: block: use ktime_get_ns() instead of sched_clock() for cfq
cfq and bfq have some internal fields that use sched_clock() which can
trivially use ktime_get_ns() instead. Their timestamp fields in struct
request can also use ktime_get_ns(), which resolves the 8 year old
comment added by commit 28f4197e5d47 ("block: disable preemption before
using sched_clock()").
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry-picked from 84c7afcebed913c93d50f116b046b7f0d8ec0cdc)
[nc: only backport cfq changes]
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Harsh Shandilya <msfjarvis@gmail.com>
kdrag0n [Sun, 2 Dec 2018 05:07:00 +0000 (21:07 -0800)]
arm64: only pass -mpc-relative-literal-loads if 843419 fix is enabled
Signed-off-by: kdrag0n <dragon@khronodragon.com>
Wang Han [Sat, 9 Jun 2018 05:36:57 +0000 (22:36 -0700)]
fs: sync: Avoid calling fdget without fdput
When adding fsync support, we check fsync_enabled in several places,
and every fdget() must be balanced by an fdput(); however, the current
code checks fsync_enabled and returns directly in some paths after
calling fdget(), leaking the reference.
Fix it by checking fsync_enabled first and moving the struct fd
initialization after the check. Also remove some unnecessary
fsync_enabled checks, as this will be checked later in do_fsync().
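As a hedged sketch of the pattern being fixed (the wrapper shown is
illustrative, not the exact fs/sync.c code):

	static int do_fsync_checked(unsigned int fd, int datasync)
	{
		struct fd f;

		/* check the toggle before taking a file reference... */
		if (!fsync_enabled)
			return 0;

		/* ...so every fdget() below is matched by an fdput() */
		f = fdget(fd);
		if (!f.file)
			return -EBADF;

		/* real fsync work would go here */

		fdput(f);
		return 0;
	}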
Signed-off-by: Wang Han <416810799@qq.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: engstk <eng.stk@sapo.pt>
franciscofranco [Thu, 22 Nov 2012 07:45:56 +0000 (23:45 -0800)]
Added fsync on/off support.
Signed-off-by: Francisco Franco <franciscofranco.1990@gmail.com>
Signed-off-by: engstk <eng.stk@sapo.pt>
Masahiro Yamada [Tue, 3 Jul 2018 01:22:00 +0000 (10:22 +0900)]
arm64: add endianness option to LDFLAGS instead of LD
With the recent syntax extension, Kconfig is now able to evaluate the
compiler / toolchain capability.
However, accumulating flags to 'LD' is not compatible with the way
it works; 'LD' must be passed to Kconfig to call $(ld-option,...)
from Kconfig files. If you tweak 'LD' in arch Makefile depending on
CONFIG_CPU_BIG_ENDIAN, this would end up with circular dependency
between Makefile and Kconfig.
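A hedged sketch of the resulting arch/arm64/Makefile shape
(accumulating into LDFLAGS rather than LD; the surrounding lines are
assumptions):

	ifeq ($(CONFIG_CPU_BIG_ENDIAN), y)
	KBUILD_CPPFLAGS	+= -mbig-endian
	LDFLAGS		+= -EB
	else
	KBUILD_CPPFLAGS	+= -mlittle-endian
	LDFLAGS		+= -EL
	endif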
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Paul Kocialkowski [Mon, 2 Jul 2018 09:16:59 +0000 (11:16 +0200)]
arm64: Use aarch64elf and aarch64elfb emulation mode variants
The aarch64linux and aarch64linuxb emulation modes are not supported by
bare-metal toolchains, and using them forbids building the kernel with
those toolchains.
Since there is apparently no reason to target these emulation modes, use
the more generic elf modes instead, allowing the kernel to be built with
bare-metal toolchains as well as the already-supported ones.
Fixes: 3d6a7b99e3fa ("arm64: ensure the kernel is compiled for LP64")
Cc: stable@vger.kernel.org
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Paul Kocialkowski <contact@paulk.fr>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
ndesaulniers@google.com [Mon, 11 Feb 2019 19:30:06 +0000 (19:30 +0000)]
FROMLIST: BACKPORT: Makefile: lld: set -O2 linker flag when linking with LLD
For arm64:
0.34% size improvement with lld -O2 over lld for vmlinux.
3.3% size improvement with lld -O2 over lld for Image.lz4-dtb.
Link: https://github.com/ClangBuiltLinux/linux/issues/343
Suggested-by: Rui Ueyama <ruiu@google.com>
Suggested-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
ndesaulniers@google.com [Mon, 11 Feb 2019 19:30:05 +0000 (19:30 +0000)]
FROMLIST: BACKPORT: Makefile: lld: tell clang to use lld
This is needed because clang doesn't select which linker to use based on
$LD but rather -fuse-ld={bfd,gold,lld,<absolute path to linker>}. This
is problematic especially for cc-ldoption, which checks for linker flag
support via invoking the compiler, rather than the linker.
Select the linker via absolute path from $PATH via `which`. This allows
you to build with:
$ make LD=ld.lld
$ make LD=ld.lld-8
$ make LD=/path/to/ld.lld
Add -Qunused-arguments to KBUILD_CPPFLAGS when using LLD, as otherwise
Clang likes to complain about -fuse-ld= being unused when compiling but
not linking (-c), such as when cc-option is used.
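A hedged sketch of the kind of Makefile logic this describes (the
ld-name macro comes from a later commit in this series; the exact
conditions are assumptions):

	ifeq ($(ld-name),lld)
	KBUILD_CFLAGS	+= -fuse-ld=$(shell which $(LD))
	KBUILD_CPPFLAGS	+= -Qunused-arguments
	endif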
Link: https://github.com/ClangBuiltLinux/linux/issues/342
Link: https://github.com/ClangBuiltLinux/linux/issues/366
Link: https://github.com/ClangBuiltLinux/linux/issues/357
Suggested-by: Nathan Chancellor <natechancellor@gmail.com>
Suggested-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Danny Lin [Wed, 7 Aug 2019 12:06:50 +0000 (15:06 +0300)]
kbuild: add ld-name macro
[wight554: stripped from kdrag0n/proton_bluecross@aba5259]
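The body is terse; a minimal sketch of what such a macro could look
like (the real detection logic may differ):

	# report "lld" or "bfd" for whatever $(LD) points at
	ld-name = $(shell if $(LD) -v 2>&1 | grep -qi lld; then echo lld; else echo bfd; fi)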
Signed-off-by: Volodymyr Zhdanov <wight554@gmail.com>
Ard Biesheuvel [Mon, 3 Dec 2018 19:58:05 +0000 (20:58 +0100)]
arm64: relocatable: fix inconsistencies in linker script and options
readelf complains about the section layout of vmlinux when building
with CONFIG_RELOCATABLE=y (for KASLR):
readelf: Warning: [21]: Link field (0) should index a symtab section.
readelf: Warning: [21]: Info field (0) should index a relocatable section.
Also, it seems that our use of '-pie -shared' is contradictory, and
thus ambiguous. In general, the way KASLR is wired up at the moment
is highly tailored to how ld.bfd happens to implement (and conflate)
PIE executables and shared libraries, so given the current effort to
support other toolchains, let's fix some of these issues as well.
- Drop the -pie linker argument and just leave -shared. In ld.bfd,
the differences between them are unclear (except for the ELF type
of the produced image [0]) but lld chokes on seeing both at the
same time.
- Rename the .rela output section to .rela.dyn, as is customary for
shared libraries and PIE executables, so that it is not misidentified
by readelf as a static relocation section (producing the warnings
above).
- Pass the -z notext and -z norelro options to explicitly instruct the
linker to permit text relocations, and to omit the RELRO program
header (which requires a certain section layout that we don't adhere
to in the kernel). These are the defaults for current versions of
ld.bfd.
- Discard .eh_frame and .gnu.hash sections to prevent them from being
emitted between .head.text and .text, screwing up the section layout.
These changes only affect the ELF image, and produce the same binary
image.
[0] b9dce7f1ba01 ("arm64: kernel: force ET_DYN ELF type for ...")
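A hedged sketch of the two linker-script pieces described above (the
exact section list is an assumption):

	/DISCARD/ : {
		*(.interp .dynamic)
		*(.dynsym .dynstr .hash .gnu.hash)
		*(.eh_frame)
	}

	.rela.dyn : ALIGN(8) {
		*(.rela .rela*)
	}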
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Smith <peter.smith@linaro.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
[wight554: fix conflicts in arch/arm64/kernel/vmlinux.lds.S]
Signed-off-by: Volodymyr Zhdanov <wight554@gmail.com>
Nick Desaulniers [Fri, 27 Oct 2017 16:33:41 +0000 (09:33 -0700)]
arm64: prevent regressions in compressed kernel image size when upgrading to binutils 2.27
Upon upgrading to binutils 2.27, we found that our lz4 and gzip
compressed kernel images were significantly larger, resulting in 10ms
boot time regressions.
As noted by Rahul:
"aarch64 binaries uses RELA relocations, where each relocation entry
includes an addend value. This is similar to x86_64. On x86_64, the
addend values are also stored at the relocation offset for relative
relocations. This is an optimization: in the case where code does not
need to be relocated, the loader can simply skip processing relative
relocations. In binutils-2.25, both bfd and gold linkers did this for
x86_64, but only the gold linker did this for aarch64. The kernel build
here is using the bfd linker, which stored zeroes at the relocation
offsets for relative relocations. Since a set of zeroes compresses
better than a set of non-zero addend values, this behavior was resulting
in much better lz4 compression.
The bfd linker in binutils-2.27 is now storing the actual addend values
at the relocation offsets. The behavior is now consistent with what it
does for x86_64 and what gold linker does for both architectures. The
change happened in this upstream commit:
https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=1f56df9d0d5ad89806c24e71f296576d82344613
Since a bunch of zeroes got replaced by non-zero addend values, we see
the side effect of lz4 compressed image being a bit bigger.
To get the old behavior from the bfd linker, "--no-apply-dynamic-relocs"
flag can be used:
$ LDFLAGS="--no-apply-dynamic-relocs" make
With this flag, the compressed image size is back to what it was with
binutils-2.25.
If the kernel is using ASLR, there aren't additional runtime costs to
--no-apply-dynamic-relocs, as the relocations will need to be applied
again anyway after the kernel is relocated to a random address.
If the kernel is not using ASLR, then presumably the current default
behavior of the linker is better. Since the static linker performed the
dynamic relocs, and the kernel is not moved to a different address at
load time, it can skip applying the relocations all over again."
Some measurements:
$ ld -v
GNU ld (binutils-2.25-f3d35cf6) 2.25.51.20141117
$ ls -l vmlinux
-rwxr-x--- 1 ndesaulniers eng 300652760 Oct 26 11:57 vmlinux
$ ls -l Image.lz4-dtb
-rw-r----- 1 ndesaulniers eng 16932627 Oct 26 11:57 Image.lz4-dtb
$ ld -v
GNU ld (binutils-2.27-53dd00a1) 2.27.0.20170315
pre patch:
$ ls -l vmlinux
-rwxr-x--- 1 ndesaulniers eng 300376208 Oct 26 11:43 vmlinux
$ ls -l Image.lz4-dtb
-rw-r----- 1 ndesaulniers eng 18159474 Oct 26 11:43 Image.lz4-dtb
post patch:
$ ls -l vmlinux
-rwxr-x--- 1 ndesaulniers eng 300376208 Oct 26 12:06 vmlinux
$ ls -l Image.lz4-dtb
-rw-r----- 1 ndesaulniers eng 16932466 Oct 26 12:06 Image.lz4-dtb
By Siqi's measurement w/ gzip:
binutils 2.27 with this patch (with --no-apply-dynamic-relocs):
Image     41535488
Image.gz  13404067
binutils 2.27 without this patch (without --no-apply-dynamic-relocs):
Image     41535488
Image.gz  14125516
Any compression scheme should be able to get better results from the
longer runs of zeros, not just GZIP and LZ4.
10ms boot time savings isn't anything to get excited about, but users of
arm64+compression+bfd-2.27 should not have to pay a penalty for no
runtime improvement.
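A hedged sketch of the corresponding Makefile hunk (ld-option is the
standard Kbuild helper; exact placement is an assumption):

	ifeq ($(CONFIG_RELOCATABLE), y)
	LDFLAGS_vmlinux	+= -pie -shared -Bsymbolic \
			   $(call ld-option, --no-apply-dynamic-relocs)
	endif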
Reported-by: Gopinath Elanchezhian <gelanchezhian@google.com>
Reported-by: Sindhuri Pentyala <spentyala@google.com>
Reported-by: Wei Wang <wvw@google.com>
Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Suggested-by: Rahul Chaudhry <rahulchaudhry@google.com>
Suggested-by: Siqi Lin <siqilin@google.com>
Suggested-by: Stephen Hines <srhines@google.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[will: added comment to Makefile]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Ard Biesheuvel [Thu, 20 Oct 2016 10:12:57 +0000 (11:12 +0100)]
arm64: kernel: force ET_DYN ELF type for CONFIG_RELOCATABLE=y
GNU ld used to set the ELF file type to ET_DYN for PIE executables, which
is the same file type used for shared libraries. However, this was changed
recently, and now PIE executables are emitted as ET_EXEC instead.
The distinction is only relevant for ELF loaders, and so there is little
reason to care about the difference when building the kernel, which is
why the change has gone unnoticed until now.
However, debuggers do use the ELF binary, and expect ET_EXEC type files
to appear in memory at the exact offset described in the ELF metadata.
This means source level debugging is no longer possible when KASLR is in
effect or when executing the stub.
So add the -shared LD option when building with CONFIG_RELOCATABLE=y. This
forces the ELF file type to be set to ET_DYN (which is what you get when
building with binutils 2.24 and earlier anyway), and has no other ill
effects.
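The change itself is likely a one-word addition; a sketch (the
companion flags are assumptions about the Makefile of that era):

	ifeq ($(CONFIG_RELOCATABLE), y)
	LDFLAGS_vmlinux	+= -pie -shared -Bsymbolic	# -shared forces ET_DYN
	endif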
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Andrew Pinski [Mon, 18 Sep 2017 10:20:20 +0000 (11:20 +0100)]
arm64: ensure the kernel is compiled for LP64
The kernel needs to be compiled as a LP64 binary for ARM64, even when
using a compiler that defaults to code-generation for the ILP32 ABI.
Consequently, we need to explicitly pass '-mabi=lp64' (supported on
gcc-4.9 and newer).
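The fix is likely a pair of one-liners in arch/arm64/Makefile; a
sketch:

	KBUILD_CFLAGS	+= -mabi=lp64
	KBUILD_AFLAGS	+= -mabi=lp64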
Signed-off-by: Andrew Pinski <Andrew.Pinski@caviumnetworks.com>
Signed-off-by: Philipp Tomsich <philipp.tomsich@theobroma-systems.com>
Signed-off-by: Christoph Muellner <christoph.muellner@theobroma-systems.com>
Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
Reviewed-by: David Daney <ddaney@caviumnetworks.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Michal Marek [Tue, 30 Aug 2016 08:31:35 +0000 (10:31 +0200)]
arm64: Set UTS_MACHINE in the Makefile
The make rpm target depends on proper UTS_MACHINE definition. Also, use
the variable in arch/arm64/kernel/setup.c, so that it's not accidentally
removed in the future.
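A hedged sketch of the Makefile side (setup.c would then print
UTS_MACHINE instead of a hard-coded string):

	# arch/arm64/Makefile
	UTS_MACHINE	:= aarch64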
Reported-and-tested-by: Fabian Vogt <fvogt@suse.com>
Signed-off-by: Michal Marek <mmarek@suse.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Alexey Dobriyan [Tue, 10 Apr 2018 23:34:45 +0000 (16:34 -0700)]
seq_file: allocate seq_file from kmem_cache
For fine-grained debugging and usercopy protection.
Link: http://lkml.kernel.org/r/20180310085027.GA17121@avx2
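A hedged C sketch of the described allocation change (flag choice and
init placement are assumptions):

	static struct kmem_cache *seq_file_cache;

	void __init seq_file_init(void)
	{
		seq_file_cache = KMEM_CACHE(seq_file, SLAB_PANIC);
	}

	/* in seq_open(): allocate from the dedicated cache instead of kzalloc() */
	p = kmem_cache_zalloc(seq_file_cache, GFP_KERNEL);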
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Glauber Costa <glommer@gmail.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Park Ju Hyung [Tue, 9 Jul 2019 10:36:18 +0000 (19:36 +0900)]
selinux: reduce calls to context_struct_to_string()
context_struct_to_string() contains expensive kmalloc() calls.
In most cases, there's no purpose in calling context_struct_to_string()
on !CONFIG_AUDIT, as the logs won't be saved anyway.
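A hedged sketch of the guard this implies (the 4.4-era three-argument
signature is assumed; the surrounding function is omitted):

	/* skip the expensive string build when audit is compiled out */
	#ifdef CONFIG_AUDIT
		rc = context_struct_to_string(context, &s, &len);
		if (rc)
			return rc;
	#else
		s = NULL;
		len = 0;
	#endif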
Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Danny Lin [Sun, 4 Aug 2019 03:40:30 +0000 (03:40 +0000)]
exec: Add node tampering blacklist function
We'll be adding checks to block writes from processes which tamper with
values that we control from within the kernel, especially ones that
userspace writes to for boosting. Add a central function to perform the
process check to reduce code duplication.
This blacklists the following processes which are known to tamper with
such values:
- init
- libperfmgr (power@1.3-servi and NodeLooperThrea)
- perfd (perf@1.0-servic)
- init.qcom.post_boot.sh (init.qcom.post_)
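A hedged C sketch of such a central check (table and function names are
hypothetical; the comms come from the list above, truncated to
TASK_COMM_LEN):

	static const char *const node_tamper_comms[] = {
		"init",
		"power@1.3-servi",	/* libperfmgr */
		"NodeLooperThrea",	/* libperfmgr */
		"perf@1.0-servic",	/* perfd */
		"init.qcom.post_",	/* init.qcom.post_boot.sh */
	};

	static bool is_node_tamperer(struct task_struct *tsk)
	{
		int i;

		for (i = 0; i < ARRAY_SIZE(node_tamper_comms); i++)
			if (!strcmp(tsk->comm, node_tamper_comms[i]))
				return true;
		return false;
	}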
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
[added libperfmgr 1.2 in case some ROMs use wahoo powerhal]
Signed-off-by: Volodymyr Zhdanov <wight554@gmail.com>
Danny Lin [Fri, 29 Mar 2019 05:30:39 +0000 (22:30 -0700)]
setlocalversion: Reduce git commit hash length
This reduces the length of the git commit hash embedded in the kernel
version from 12 characters to 8 characters.
Before:
4.9.166-Proton-v17-gd503a36d7582
After:
4.9.166-Proton-v17-g47057aaa
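A hedged shell sketch of the change (setlocalversion's surrounding
context is abbreviated):

	# scripts/setlocalversion: ask git for an 8-character abbreviation
	if head=$(git rev-parse --verify --short=8 HEAD 2>/dev/null); then
		printf '%s%s' -g "$head"
	fi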
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
wloot [Sat, 29 Jun 2019 15:34:40 +0000 (23:34 +0800)]
base: make device_{online,offline} no op
Yaroslav Furman [Fri, 24 Aug 2018 09:10:33 +0000 (12:10 +0300)]
msm_serial_hs: actually check if msm_serial_hs_tx_work failed to init
Just a small typo.
Signed-off-by: Yaroslav Furman <yaro330@gmail.com>
Minchan Kim [Mon, 5 Nov 2018 14:44:42 +0000 (23:44 +0900)]
FROMLIST: zram: fix lockdep warning of free block handling
[ 254.519728] ================================
[ 254.520311] WARNING: inconsistent lock state
[ 254.520898] 4.19.0+ #390 Not tainted
[ 254.521387] --------------------------------
[ 254.521732] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[ 254.521732] zram_verify/2095 [HC0[0]:SC1[1]:HE1:SE0] takes:
[ 254.521732] 00000000b1828693 (&(&zram->bitmap_lock)->rlock){+.?.}, at: put_entry_bdev+0x1e/0x50
[ 254.521732] {SOFTIRQ-ON-W} state was registered at:
[ 254.521732] _raw_spin_lock+0x2c/0x40
[ 254.521732] zram_make_request+0x755/0xdc9
[ 254.521732] generic_make_request+0x373/0x6a0
[ 254.521732] submit_bio+0x6c/0x140
[ 254.521732] __swap_writepage+0x3a8/0x480
[ 254.521732] shrink_page_list+0x1102/0x1a60
[ 254.521732] shrink_inactive_list+0x21b/0x3f0
[ 254.521732] shrink_node_memcg.constprop.99+0x4f8/0x7e0
[ 254.521732] shrink_node+0x7d/0x2f0
[ 254.521732] do_try_to_free_pages+0xe0/0x300
[ 254.521732] try_to_free_pages+0x116/0x2b0
[ 254.521732] __alloc_pages_slowpath+0x3f4/0xf80
[ 254.521732] __alloc_pages_nodemask+0x2a2/0x2f0
[ 254.521732] __handle_mm_fault+0x42e/0xb50
[ 254.521732] handle_mm_fault+0x55/0xb0
[ 254.521732] __do_page_fault+0x235/0x4b0
[ 254.521732] page_fault+0x1e/0x30
[ 254.521732] irq event stamp: 228412
[ 254.521732] hardirqs last enabled at (228412): [<ffffffff98245846>] __slab_free+0x3e6/0x600
[ 254.521732] hardirqs last disabled at (228411): [<ffffffff98245625>] __slab_free+0x1c5/0x600
[ 254.521732] softirqs last enabled at (228396): [<ffffffff98e0031e>] __do_softirq+0x31e/0x427
[ 254.521732] softirqs last disabled at (228403): [<ffffffff98072051>] irq_exit+0xd1/0xe0
[ 254.521732]
[ 254.521732] other info that might help us debug this:
[ 254.521732] Possible unsafe locking scenario:
[ 254.521732]
[ 254.521732] CPU0
[ 254.521732] ----
[ 254.521732] lock(&(&zram->bitmap_lock)->rlock);
[ 254.521732] <Interrupt>
[ 254.521732] lock(&(&zram->bitmap_lock)->rlock);
[ 254.521732]
[ 254.521732] *** DEADLOCK ***
[ 254.521732]
[ 254.521732] no locks held by zram_verify/2095.
[ 254.521732]
[ 254.521732] stack backtrace:
[ 254.521732] CPU: 5 PID: 2095 Comm: zram_verify Not tainted 4.19.0+ #390
[ 254.521732] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 254.521732] Call Trace:
[ 254.521732] <IRQ>
[ 254.521732] dump_stack+0x67/0x9b
[ 254.521732] print_usage_bug+0x1bd/0x1d3
[ 254.521732] mark_lock+0x4aa/0x540
[ 254.521732] ? check_usage_backwards+0x160/0x160
[ 254.521732] __lock_acquire+0x51d/0x1300
[ 254.521732] ? free_debug_processing+0x24e/0x400
[ 254.521732] ? bio_endio+0x6d/0x1a0
[ 254.521732] ? lockdep_hardirqs_on+0x9b/0x180
[ 254.521732] ? lock_acquire+0x90/0x180
[ 254.521732] lock_acquire+0x90/0x180
[ 254.521732] ? put_entry_bdev+0x1e/0x50
[ 254.521732] _raw_spin_lock+0x2c/0x40
[ 254.521732] ? put_entry_bdev+0x1e/0x50
[ 254.521732] put_entry_bdev+0x1e/0x50
[ 254.521732] zram_free_page+0xf6/0x110
[ 254.521732] zram_slot_free_notify+0x42/0xa0
[ 254.521732] end_swap_bio_read+0x5b/0x170
[ 254.521732] blk_update_request+0x8f/0x340
[ 254.521732] scsi_end_request+0x2c/0x1e0
[ 254.521732] scsi_io_completion+0x98/0x650
[ 254.521732] blk_done_softirq+0x9e/0xd0
[ 254.521732] __do_softirq+0xcc/0x427
[ 254.521732] irq_exit+0xd1/0xe0
[ 254.521732] do_IRQ+0x93/0x120
[ 254.521732] common_interrupt+0xf/0xf
[ 254.521732] </IRQ>
With the writeback feature, zram_slot_free_notify can be called
in softirq context by end_swap_bio_read. However, bitmap_lock
is not aware of that, so lockdep yells out:
get_entry_bdev
 spin_lock(bitmap->lock);
 irq
 softirq
 end_swap_bio_read
  zram_slot_free_notify
   zram_slot_lock  <-- deadlock prone
   zram_free_page
    put_entry_bdev
     spin_lock(bitmap->lock); <-- deadlock prone
With akpm's suggestion (i.e. the bitmap operation is already atomic),
we can remove the bitmap lock. It might fail to find an empty slot
if serious contention happens. However, that is not a severe problem,
because huge page writeback already has the possibility of failing
under severe memory pressure. The worst case is just keeping the
incompressible page in memory, not on storage.
The other problem is zram_slot_lock in zram_slot_free_notify.
To make it safe, this patch introduces zram_slot_trylock, which
zram_slot_free_notify uses. Although contention is rare, this patch
adds a new debug stat, "miss_free", to keep monitoring how often it
happens.
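A hedged sketch of the trylock and its use (table field names follow
the zram layout this message implies, which may differ in this tree):

	static int zram_slot_trylock(struct zram *zram, u32 index)
	{
		return bit_spin_trylock(ZRAM_LOCK, &zram->table[index].value);
	}

	static void zram_slot_free_notify(struct block_device *bdev,
					  unsigned long index)
	{
		struct zram *zram = bdev->bd_disk->private_data;

		if (!zram_slot_trylock(zram, index)) {
			atomic64_inc(&zram->stats.miss_free);	/* new debug stat */
			return;
		}
		zram_free_page(zram, index);
		zram_slot_unlock(zram, index);
	}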
(am from https://lore.kernel.org/lkml/20181203024045.153534-2-minchan@kernel.org/)
Reviewed-by: Joey Pabalinas <joeypabalinas@gmail.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Bug: 117683663
Change-Id: I9030753558338e86f2bb3bdd96ec06faf4cdd305
Signed-off-by: Minchan Kim <minchan@google.com>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Tyler Nijmeh [Fri, 24 May 2019 20:49:55 +0000 (13:49 -0700)]
mm: Increase vmstat interval
Red Hat Linux documentation states that vmstat updates can be mildly
expensive. Increase the interval from 1 second to 10 seconds.
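The change is likely a one-liner in mm/vmstat.c; a sketch:

	int sysctl_stat_interval __read_mostly = 10 * HZ;	/* was HZ */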
Signed-off-by: Tyler Nijmeh <tylernij@gmail.com>
Julian Liu [Tue, 17 Sep 2019 14:01:28 +0000 (22:01 +0800)]
zram: Limit the max disksize to 2GB
Philip Cuadra [Thu, 18 Jan 2018 19:32:26 +0000 (11:32 -0800)]
msm: bus_arb: disable debug logging
This debug logging consumes 10% of all the CPU cycles in the drivers
communicating with the DSP. Disable the logging for the time being
until we add a config option.
Bug: 71867957
Change-Id: I97d418fee3d4576b077ed284ed5ae4447da5a789
Signed-off-by: Philip Cuadra <philipcuadra@google.com>
Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
kdrag0n [Sun, 23 Dec 2018 06:29:33 +0000 (22:29 -0800)]
block: disable random pool contribution by default
Signed-off-by: kdrag0n <dragon@khronodragon.com>
wloot [Thu, 6 Jun 2019 07:04:53 +0000 (15:04 +0800)]
phy: Do not compile unused drivers
Ard Biesheuvel [Wed, 11 Jan 2017 16:41:52 +0000 (16:41 +0000)]
BACKPORT: crypto: arm64/aes - add scalar implementation
This adds a scalar implementation of AES, based on the precomputed tables
that are exposed by the generic AES code. Since rotates are cheap on arm64,
this implementation only uses the 4 core tables (of 1 KB each), and avoids
the prerotated ones, reducing the D-cache footprint by 75%.
On Cortex-A57, this code manages 13.0 cycles per byte, which is ~34% faster
than the generic C code. (Note that this is still >13x slower than the code
that uses the optional ARMv8 Crypto Extensions, which manages <1 cycle per
byte.)
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit bed593c0e852f5c1efd3ca4e984fd744c51cf6ee)
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Alin Jerpelea [Sat, 1 Jul 2017 06:10:54 +0000 (08:10 +0200)]
tty: serial: msm_serial_hs: fix ipclog spam
Wrap the functions in #ifdef CONFIG_IPC_LOGGING to fix log spam when
the config is not enabled.
Signed-off-by: Alin Jerpelea <alin.jerpelea@sonymobile.com>
Signed-off-by: celtare21 <celtare21@gmail.com>
Angelo G. Del Regno [Sat, 27 May 2017 15:36:11 +0000 (17:36 +0200)]
soc: qcom: Further fixes for !IPC_LOGGING for IPA, SMD, SMEM.
Do not crash when IPC_LOGGING is not activated.
Signed-off-by: celtare21 <celtare21@gmail.com>
Alin Jerpelea [Wed, 4 Jan 2017 15:51:53 +0000 (16:51 +0100)]
soc: qcom: stop spam when IPC_LOGGING is disabled
Signed-off-by: Alin Jerpelea <alin.jerpelea@sonymobile.com>
Aaron Lu [Fri, 17 Aug 2018 22:49:14 +0000 (15:49 -0700)]
mm, page_alloc: double zone's batchsize
To improve page allocator's performance for order-0 pages, each CPU has
a Per-CPU-Pageset(PCP) per zone. Whenever an order-0 page is needed,
PCP will be checked first before asking pages from Buddy. When PCP is
used up, a batch of pages will be fetched from Buddy to improve
performance and the size of batch can affect performance.
The zone's batch size was last doubled by commit ba56e91c9401 ("mm:
page_alloc: increase size of per-cpu-pages") over ten years ago. Since
then, CPUs have evolved a lot and their cache sizes have also increased.
Dave Hansen is concerned that the current batch size doesn't fit well
with modern hardware and suggested that I do two things: first, use a
page allocator intensive benchmark, e.g. will-it-scale/page_fault1, to
find out how performance changes with different batch sizes on various
machines and then choose a new default batch size; second, see how this
new batch size works with other workloads.
In the first test, we saw performance gains on high-core-count systems
and little to no effect on older systems with more modest core counts.
From this phase's test data, two candidates, 63 and 127, were chosen.
In the second step, ebizzy, oltp, kbuild, pigz, netperf, vm-scalability
and more will-it-scale sub-tests were run to see how these two
candidates work with those workloads, and a new default was decided
according to the results.
Most test results are flat. will-it-scale/page_fault2 process mode has
a 10%-18% performance increase on 4-socket Skylake and Broadwell.
vm-scalability/lru-file-mmap-read has a 17%-47% performance increase for
4-socket servers, while for 2-socket servers it caused a 3%-8% performance
drop. Further analysis showed that, with a larger pcp->batch and thus
larger pcp->high (the relationship pcp->high = 6 * pcp->batch is
maintained in this patch), zone lock contention shifted to LRU add side
lock contention, and that caused the performance drop. This drop might
be mitigated by others' work on optimizing the LRU lock.
Another downside of increasing pcp->batch is that, when the PCP is used
up and needs to fetch a batch of pages from Buddy, the larger batch
makes that fetch take longer than before. My understanding is that this
doesn't affect the slowpath, where direct reclaim and compaction
dominate. For the fastpath, throughput is a win (according to
will-it-scale/page_fault1) but worst-case latency can be larger now.
Overall, I think doubling the batch size from 31 to 63 is relatively
safe and provides a good performance boost for high-core-count systems.
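A hedged sketch of the corresponding hunk in zone_batchsize() (the 1 MB
cap doubling the old 512 KB one; surrounding logic abbreviated):

	/* mm/page_alloc.c, zone_batchsize() */
	batch = zone->managed_pages / 1024;
	/* but no more than a meg at once - this was 512 KB before */
	if (batch * PAGE_SIZE > 1024 * 1024)
		batch = (1024 * 1024) / PAGE_SIZE;
	batch /= 4;		/* We effectively *= 4 below */
	if (batch < 1)
		batch = 1;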
The two phases' test results are listed below (all tests were done with
THP disabled).
Phase one (will-it-scale/page_fault1) test results:
Skylake-EX: increased batch size has a good effect on zone->lock
contention, though LRU contention will rise at the same time and
limited the final performance increase.
batch  score     change   zone_contention  lru_contention  total_contention
31     15345900  +0.00%   64%              8%              72%
53     17903847  +16.67%  32%              38%             70%
63     17992886  +17.25%  24%              45%             69%
73     18022825  +17.44%  10%              61%             71%
119    18023401  +17.45%  4%               66%             70%
127    18029012  +17.48%  3%               66%             69%
137    18036075  +17.53%  4%               66%             70%
165    18035964  +17.53%  2%               67%             69%
188    18101105  +17.95%  2%               67%             69%
223    18130951  +18.15%  2%               67%             69%
255    18118898  +18.07%  2%               67%             69%
267    18101559  +17.96%  2%               67%             69%
299    18160468  +18.34%  2%               68%             70%
320    18139845  +18.21%  2%               67%             69%
393    18160869  +18.34%  2%               68%             70%
424    18170999  +18.41%  2%               68%             70%
458    18144868  +18.24%  2%               68%             70%
467    18142366  +18.22%  2%               68%             70%
498    18154549  +18.30%  1%               68%             69%
511    18134525  +18.17%  1%               69%             70%
Broadwell-EX: similar pattern as Skylake-EX.
batch  score     change   zone_contention  lru_contention  total_contention
31     16703983  +0.00%   67%              7%              74%
53     18195393  +8.93%   43%              28%             71%
63     18288885  +9.49%   38%              33%             71%
73     18344329  +9.82%   35%              37%             72%
119    18535529  +10.96%  24%              46%             70%
127    18513596  +10.83%  23%              48%             71%
137    18514327  +10.84%  23%              48%             71%
165    18511840  +10.82%  22%              49%             71%
188    18593478  +11.31%  17%              53%             70%
223    18601667  +11.36%  17%              52%             69%
255    18774825  +12.40%  12%              58%             70%
267    18754781  +12.28%  9%               60%             69%
299    18892265  +13.10%  7%               63%             70%
320    18873812  +12.99%  8%               62%             70%
393    18891174  +13.09%  6%               64%             70%
424    18975108  +13.60%  6%               64%             70%
458    18932364  +13.34%  8%               62%             70%
467    18960891  +13.51%  5%               65%             70%
498    18944526  +13.41%  5%               64%             69%
511    18960839  +13.51%  5%               64%             69%
Skylake-EP: although the increased batch reduced zone->lock contention,
the effect is not as good as on EX: zone->lock contention is still as
high as 20% with a very high batch value, instead of 1% on Skylake-EX
or 5% on Broadwell-EX. Also, total_contention actually decreased with a
higher batch, but that doesn't translate to a performance increase.
batch  score     change  zone_contention  lru_contention  total_contention
31     9554867   +0.00%  66%              3%              69%
53     9855486   +3.15%  63%              3%              66%
63     9980145   +4.45%  62%              4%              66%
73     10092774  +5.63%  62%              5%              67%
119    10310061  +7.90%  45%              19%             64%
127    10342019  +8.24%  42%              19%             61%
137    10358182  +8.41%  42%              21%             63%
165    10397060  +8.81%  37%              24%             61%
188    10341808  +8.24%  34%              26%             60%
223    10349135  +8.31%  31%              27%             58%
255    10327189  +8.08%  28%              29%             57%
267    10344204  +8.26%  27%              29%             56%
299    10325043  +8.06%  25%              30%             55%
320    10310325  +7.91%  25%              31%             56%
393    10293274  +7.73%  21%              31%             52%
424    10311099  +7.91%  21%              32%             53%
458    10321375  +8.02%  21%              32%             53%
467    10303881  +7.84%  21%              32%             53%
498    10332462  +8.14%  20%              33%             53%
511    10325016  +8.06%  20%              32%             52%
Broadwell-EP: zone->lock and lru lock had an agreement to make sure
performance doesn't increase and they successfully managed to keep total
contention at 70%.
batch  score     change  zone_contention  lru_contention  total_contention
31     10121178  +0.00%  19%              50%             69%
53     10142366  +0.21%  6%               63%             69%
63     10117984  -0.03%  11%              58%             69%
73     10123330  +0.02%  7%               63%             70%
119    10108791  -0.12%  2%               67%             69%
127    10166074  +0.44%  3%               66%             69%
137    10141574  +0.20%  3%               66%             69%
165    10154499  +0.33%  2%               68%             70%
188    10124921  +0.04%  2%               67%             69%
223    10137399  +0.16%  2%               67%             69%
255    10143289  +0.22%  0%               68%             68%
267    10123535  +0.02%  1%               68%             69%
299    10140952  +0.20%  0%               68%             68%
320    10163170  +0.41%  0%               68%             68%
393    10000633  -1.19%  0%               69%             69%
424    10087998  -0.33%  0%               69%             69%
458    10187116  +0.65%  0%               69%             69%
467    10146790  +0.25%  0%               69%             69%
498    10197958  +0.76%  0%               69%             69%
511    10152326  +0.31%  0%               69%             69%
Haswell-EP: similar to Broadwell-EP.
batch  score     change  zone_contention  lru_contention  total_contention
31     10442205  +0.00%  14%              48%             62%
53     10442255  +0.00%  5%               57%             62%
63     10452059  +0.09%  6%               57%             63%
73     10482349  +0.38%  5%               59%             64%
119    10454644  +0.12%  3%               60%             63%
127    10431514  -0.10%  3%               59%             62%
137    10423785  -0.18%  3%               60%             63%
165    10481216  +0.37%  2%               61%             63%
188    10448755  +0.06%  2%               61%             63%
223    10467144  +0.24%  2%               61%             63%
255    10480215  +0.36%  2%               61%             63%
267    10484279  +0.40%  2%               61%             63%
299    10466450  +0.23%  2%               61%             63%
320    10452578  +0.10%  2%               61%             63%
393    10499678  +0.55%  1%               62%             63%
424    10481454  +0.38%  1%               62%             63%
458    10473562  +0.30%  1%               62%             63%
467    10484269  +0.40%  0%               62%             62%
498    10505599  +0.61%  0%               62%             62%
511    10483395  +0.39%  0%               62%             62%
Westmere-EP: contention is pretty small, so not interesting. Note that
too high a batch value could hurt performance.
batch  score    change  zone_contention  lru_contention  total_contention
31     4831523  +0.00%  2%               3%              5%
53     4834086  +0.05%  2%               4%              6%
63     4834262  +0.06%  2%               3%              5%
73     4832851  +0.03%  2%               4%              6%
119    4830534  -0.02%  1%               3%              4%
127    4827461  -0.08%  1%               4%              5%
137    4827459  -0.08%  1%               3%              4%
165    4820534  -0.23%  0%               4%              4%
188    4817947  -0.28%  0%               3%              3%
223    4809671  -0.45%  0%               3%              3%
255    4802463  -0.60%  0%               4%              4%
267    4801634  -0.62%  0%               3%              3%
299    4798047  -0.69%  0%               3%              3%
320    4793084  -0.80%  0%               3%              3%
393    4785877  -0.94%  0%               3%              3%
424    4782911  -1.01%  0%               3%              3%
458    4779346  -1.08%  0%               3%              3%
467    4780306  -1.06%  0%               3%              3%
498    4780589  -1.05%  0%               3%              3%
511    4773724  -1.20%  0%               3%              3%
Skylake-Desktop: similar to Westmere-EP, nothing interesting.
batch  score    change  zone_contention  lru_contention  total_contention
31     3906608  +0.00%  2%               3%              5%
53     3940164  +0.86%  2%               3%              5%
63     3937289  +0.79%  2%               3%              5%
73     3940201  +0.86%  2%               3%              5%
119    3933240  +0.68%  2%               3%              5%
127    3930514  +0.61%  2%               4%              6%
137    3938639  +0.82%  0%               3%              3%
165    3908755  +0.05%  0%               3%              3%
188    3905621  -0.03%  0%               3%              3%
223    3903015  -0.09%  0%               4%              4%
255    3889480  -0.44%  0%               3%              3%
267    3891669  -0.38%  0%               4%              4%
299    3898728  -0.20%  0%               4%              4%
320    3894547  -0.31%  0%               4%              4%
393    3875137  -0.81%  0%               4%              4%
424    3874521  -0.82%  0%               3%              3%
458    3880432  -0.67%  0%               4%              4%
467    3888715  -0.46%  0%               3%              3%
498    3888633  -0.46%  0%               4%              4%
511    3875305  -0.80%  0%               5%              5%
Haswell-Desktop: zone->lock is pretty low as other desktops, though lru
contention is higher than other desktops.
batch  score    change  zone_contention  lru_contention  total_contention
31     3511158  +0.00%  2%               5%              7%
53     3555445  +1.26%  2%               6%              8%
63     3561082  +1.42%  2%               6%              8%
73     3547218  +1.03%  2%               6%              8%
119    3571319  +1.71%  1%               7%              8%
127    3549375  +1.09%  0%               6%              6%
137    3560233  +1.40%  0%               6%              6%
165    3555176  +1.25%  2%               6%              8%
188    3551501  +1.15%  0%               8%              8%
223    3531462  +0.58%  0%               7%              7%
255    3570400  +1.69%  0%               7%              7%
267    3532235  +0.60%  1%               8%              9%
299    3562326  +1.46%  0%               6%              6%
320    3553569  +1.21%  0%               8%              8%
393    3539519  +0.81%  0%               7%              7%
424    3549271  +1.09%  0%               8%              8%
458    3528885  +0.50%  0%               8%              8%
467    3526554  +0.44%  0%               7%              7%
498    3525302  +0.40%  0%               9%              9%
511    3527556  +0.47%  0%               8%              8%
Sandybridge-Desktop: the 0% contention isn't accurate; it is caused by
the dropped fractional part. Since multiple contention paths'
contentions are all under 1% here, with arithmetic operations like
addition the final deviation could be as large as 3%.
batch  score    change  zone_contention  lru_contention  total_contention
31     1744495  +0.00%  0%               0%              0%
53     1755341  +0.62%  0%               0%              0%
63     1758469  +0.80%  0%               0%              0%
73     1759626  +0.87%  0%               0%              0%
119    1770417  +1.49%  0%               0%              0%
127    1768252  +1.36%  0%               0%              0%
137    1767848  +1.34%  0%               0%              0%
165    1765088  +1.18%  0%               0%              0%
188    1766918  +1.29%  0%               0%              0%
223    1767866  +1.34%  0%               0%              0%
255    1768074  +1.35%  0%               0%              0%
267    1763187  +1.07%  0%               0%              0%
299    1765620  +1.21%  0%               0%              0%
320    1767603  +1.32%  0%               0%              0%
393    1764612  +1.15%  0%               0%              0%
424    1758476  +0.80%  0%               0%              0%
458    1758593  +0.81%  0%               0%              0%
467    1757915  +0.77%  0%               0%              0%
498    1753363  +0.51%  0%               0%              0%
511    1755548  +0.63%  0%               0%              0%
Phase two test results:
Note: all percent change is against base (batch=31).
ebizzy.throughput (higher is better)
machine        batch=31     batch=63            batch=127
lkp-skl-4sp1   2410037±7%   2600451±2% +7.9%    2602878 +8.0%
lkp-bdw-ex1    1493328      1489243 -0.3%       1492145 -0.1%
lkp-skl-2sp2   1329674      1345891 +1.2%       1351056 +1.6%
lkp-bdw-ep2    711511       711511 0.0%         710708 -0.1%
lkp-wsm-ep2    75750        75528 -0.3%         75441 -0.4%
lkp-skl-d01    264126       262791 -0.5%        264113 +0.0%
lkp-hsw-d01    176601       176328 -0.2%        176368 -0.1%
lkp-sb02       98937        98937 +0.0%         99030 +0.1%
kbuild.buildtime (less is better)
machine batch=31 batch=63 batch=127
lkp-skl-4sp1 107.00 107.67 +0.6% 107.11 +0.1%
lkp-bdw-ex1 97.33 97.33 +0.0% 97.42 +0.1%
lkp-skl-2sp2 180.00 179.83 -0.1% 179.83 -0.1%
lkp-bdw-ep2 178.17 179.17 +0.6% 177.50 -0.4%
lkp-wsm-ep2 737.00 738.00 +0.1% 738.00 +0.1%
lkp-skl-d01 642.00 653.00 +1.7% 653.00 +1.7%
lkp-hsw-d01 1310.00 1316.00 +0.5% 1311.00 +0.1%
netperf/TCP_STREAM.Throughput_total_Mbps (higher is better)
machine batch=31 batch=63 batch=127
lkp-skl-4sp1 948790 947144 -0.2% 948333 -0.0%
lkp-bdw-ex1 904224 904366 +0.0% 904926 +0.1%
lkp-skl-2sp2 239731 239607 -0.1% 239565 -0.1%
lkp-bdw-ep2 365764 365933 +0.0% 365951 +0.1%
lkp-wsm-ep2 93736 93803 +0.1% 93808 +0.1%
lkp-skl-d01 77314 77303 -0.0% 77375 +0.1%
lkp-hsw-d01 58617 60387 +3.0% 60208 +2.7%
lkp-sb02 29990 30137 +0.5% 30103 +0.4%
oltp.transactions (higher is better)
machine        batch=31     batch=63            batch=127
lkp-bdw-ex1    9073276      9100377 +0.3%       9036344 -0.4%
lkp-skl-2sp2   8898717      8852054 -0.5%       8894459 -0.0%
lkp-bdw-ep2    13426155     13384654 -0.3%      13333637 -0.7%
lkp-hsw-ep2    13146314     13232784 +0.7%      13193163 +0.4%
lkp-wsm-ep2    5035355      5019348 -0.3%       5033418 -0.0%
lkp-skl-d01    418485       4413339 -0.1%       4419039 +0.0%
lkp-hsw-d01    3517817±5%   3396120±3% -3.5%    3455138±3% -1.8%
pigz.throughput (higher is better)
machine batch=31 batch=63 batch=127
lkp-skl-4sp1 1.513e+08 1.507e+08 -0.4% 1.511e+08 -0.2%
lkp-bdw-ex1 2.060e+08 2.052e+08 -0.4% 2.044e+08 -0.8%
lkp-skl-2sp2 8.836e+08 8.845e+08 +0.1% 8.836e+08 -0.0%
lkp-bdw-ep2 8.275e+08 8.464e+08 +2.3% 8.330e+08 +0.7%
lkp-wsm-ep2 2.224e+08 2.221e+08 -0.2% 2.218e+08 -0.3%
lkp-skl-d01 1.177e+08 1.177e+08 -0.0% 1.176e+08 -0.1%
lkp-hsw-d01 1.154e+08 1.154e+08 +0.1% 1.154e+08 -0.0%
lkp-sb02 0.633e+08 0.633e+08 +0.1% 0.633e+08 +0.0%
will-it-scale.malloc1.processes (higher is better)
machine        batch=31    batch=63          batch=127
lkp-skl-4sp1   620181      620484 +0.0%      620240 +0.0%
lkp-bdw-ex1    1403610     1401201 -0.2%     1417900 +1.0%
lkp-skl-2sp2   1288097     1284145 -0.3%     1283907 -0.3%
lkp-bdw-ep2    1427879     1427675 -0.0%     1428266 +0.0%
lkp-hsw-ep2    1362546     1353965 -0.6%     1354759 -0.6%
lkp-wsm-ep2    2099657     2107576 +0.4%     2100226 +0.0%
lkp-skl-d01    1476835     1476358 -0.0%     1474487 -0.2%
lkp-hsw-d01    1308810     1303429 -0.4%     1301299 -0.6%
lkp-sb02       589286      589284 -0.0%      588101 -0.2%
will-it-scale.malloc1.threads (higher is better)
machine batch=31 batch=63 batch=127
lkp-skl-4sp1 21289 21125 -0.8% 21241 -0.2%
lkp-bdw-ex1 28114 28089 -0.1% 28007 -0.4%
lkp-skl-2sp2 91866 91946 +0.1% 92723 +0.9%
lkp-bdw-ep2 37637 37501 -0.4% 37317 -0.9%
lkp-hsw-ep2 43673 43590 -0.2% 43754 +0.2%
lkp-wsm-ep2 28577 28298 -1.0% 28545 -0.1%
lkp-skl-d01 175277 173343 -1.1% 173082 -1.3%
lkp-hsw-d01 130303 129566 -0.6% 129250 -0.8%
lkp-sb02 113742±3% 116911 +2.8% 116417±3% +2.4%
will-it-scale.malloc2.processes (higher is better)
machine batch=31 batch=63 batch=127
lkp-skl-4sp1 1.206e+09 1.206e+09 -0.0% 1.206e+09 +0.0%
lkp-bdw-ex1 1.319e+09 1.319e+09 -0.0% 1.319e+09 +0.0%
lkp-skl-2sp2 8.000e+08 8.021e+08 +0.3% 7.995e+08 -0.1%
lkp-bdw-ep2 6.582e+08 6.634e+08 +0.8% 6.513e+08 -1.1%
lkp-hsw-ep2 6.671e+08 6.669e+08 -0.0% 6.665e+08 -0.1%
lkp-wsm-ep2 1.805e+08 1.806e+08 +0.0% 1.804e+08 -0.1%
lkp-skl-d01 1.611e+08 1.611e+08 -0.0% 1.610e+08 -0.0%
lkp-hsw-d01 1.333e+08 1.332e+08 -0.0% 1.332e+08 -0.0%
lkp-sb02 82485104 82478206 -0.0% 82473546 -0.0%
will-it-scale.malloc2.threads (higher is better)
machine batch=31 batch=63 batch=127
lkp-skl-4sp1 1.574e+09 1.574e+09 -0.0% 1.574e+09 -0.0%
lkp-bdw-ex1 1.737e+09 1.737e+09 +0.0% 1.737e+09 -0.0%
lkp-skl-2sp2 9.161e+08 9.162e+08 +0.0% 9.181e+08 +0.2%
lkp-bdw-ep2 7.856e+08 8.015e+08 +2.0% 8.113e+08 +3.3%
lkp-hsw-ep2 6.908e+08 6.904e+08 -0.1% 6.907e+08 -0.0%
lkp-wsm-ep2 2.409e+08 2.409e+08 +0.0% 2.409e+08 -0.0%
lkp-skl-d01 1.199e+08 1.199e+08 -0.0% 1.199e+08 -0.0%
lkp-hsw-d01 1.029e+08 1.029e+08 -0.0% 1.029e+08 +0.0%
lkp-sb02 68081213 68061423 -0.0% 68076037 -0.0%
will-it-scale.page_fault2.processes (higher is better)
machine        batch=31      batch=63           batch=127
lkp-skl-4sp1   14509125±4%   16472364 +13.5%    17123117 +18.0%
lkp-bdw-ex1    14736381      16196588 +9.9%     16364011 +11.0%
lkp-skl-2sp2   6354925       6435444 +1.3%      6436644 +1.3%
lkp-bdw-ep2    8749584       8834422 +1.0%      8827179 +0.9%
lkp-hsw-ep2    8762591       8845920 +1.0%      8825697 +0.7%
lkp-wsm-ep2    3036083       3030428 -0.2%      3021741 -0.5%
lkp-skl-d01    2307834       2304731 -0.1%      2286142 -0.9%
lkp-hsw-d01    1806237       1800786 -0.3%      1795943 -0.6%
lkp-sb02       842616        837844 -0.6%       833921 -1.0%
will-it-scale.page_fault2.threads
machine        batch=31    batch=63            batch=127
lkp-skl-4sp1   1623294     1615132±2% -0.5%    1656777 +2.1%
lkp-bdw-ex1    1995714     2025948 +1.5%       2113753±3% +5.9%
lkp-skl-2sp2   2346708     2415591 +2.9%       2416919 +3.0%
lkp-bdw-ep2    2342564     2344882 +0.1%       2300206 -1.8%
lkp-hsw-ep2    1820658     1831681 +0.6%       1844057 +1.3%
lkp-wsm-ep2    1725482     1733774 +0.5%       1740517 +0.9%
lkp-skl-d01    1832833     1823628 -0.5%       1806489 -1.4%
lkp-hsw-d01    1427913     1427287 -0.0%       1420226 -0.5%
lkp-sb02       750626      748615 -0.3%        746621 -0.5%
will-it-scale.page_fault3.processes (higher is better)
machine        batch=31    batch=63           batch=127
lkp-skl-4sp1   24382726    24400317 +0.1%     24668774 +1.2%
lkp-bdw-ex1    35399750    35683124 +0.8%     35829492 +1.2%
lkp-skl-2sp2   28136820    28068248 -0.2%     28147989 +0.0%
lkp-bdw-ep2    37269077    37459490 +0.5%     37373073 +0.3%
lkp-hsw-ep2    36224967    36114085 -0.3%     36104908 -0.3%
lkp-wsm-ep2    16820457    16911005 +0.5%     16968596 +0.9%
lkp-skl-d01    7721138     7725904 +0.1%      7756740 +0.5%
lkp-hsw-d01    7611979     7650928 +0.5%      7651323 +0.5%
lkp-sb02       3781546     3796502 +0.4%      3796827 +0.4%
will-it-scale.page_fault3.threads (higher is better)
machine        batch=31      batch=63            batch=127
lkp-skl-4sp1   1865820±3%    1900917±2% +1.9%    1826245±4% -2.1%
lkp-bdw-ex1    3094060       3148326 +1.8%       3150036 +1.8%
lkp-skl-2sp2   3952940       3953898 +0.0%       3989360 +0.9%
lkp-bdw-ep2    3420373±3%    3643964 +6.5%       3644910±5% +6.6%
lkp-hsw-ep2    2609635±2%    2582310±3% -1.0%    2780459 +6.5%
lkp-wsm-ep2    4395001       4417196 +0.5%       4432499 +0.9%
lkp-skl-d01    5363977       5400003 +0.7%       5411370 +0.9%
lkp-hsw-d01    5274131       5311294 +0.7%       5319359 +0.9%
lkp-sb02       2917314       2913004 -0.1%       2935286 +0.6%
will-it-scale.read1.processes (higher is better)
machine        batch=31        batch=63              batch=127
lkp-skl-4sp1   73762279±14%    69322519±10% -6.0%    69349855±13% -6.0% (result unstable)
lkp-bdw-ex1    1.701e+08       1.704e+08 +0.1%       1.705e+08 +0.2%
lkp-skl-2sp2   63111570        63113953 +0.0%        63836573 +1.1%
lkp-bdw-ep2    79247409        79424610 +0.2%        78012656 -1.6%
lkp-hsw-ep2    67677026        68308800 +0.9%        67539106 -0.2%
lkp-wsm-ep2    13339630        13939817 +4.5%        13766865 +3.2%
lkp-skl-d01    10969487        10972650 +0.0%        no data
lkp-hsw-d01    9857342±2%      10080592±2% +2.3%     10131560 +2.8%
lkp-sb02       5189076         5197473 +0.2%         5163253 -0.5%
will-it-scale.read1.threads (higher is better)
machine        batch=31        batch=63              batch=127
lkp-skl-4sp1   62468045±12%    73666726±7% +17.9%    79553123±12% +27.4% (result unstable)
lkp-bdw-ex1    1.62e+08        1.624e+08 +0.3%       1.614e+08 -0.3%
lkp-skl-2sp2   58319780        59181032 +1.5%        59821353 +2.6%
lkp-bdw-ep2    74057992        75698171 +2.2%        74990869 +1.3%
lkp-hsw-ep2    63672959        63639652 -0.1%        64387051 +1.1%
lkp-wsm-ep2    13489943        13526058 +0.3%        13259032 -1.7%
lkp-skl-d01    10297906        10338796 +0.4%        10407328 +1.1%
lkp-hsw-d01    9636721         9667376 +0.3%         9341147 -3.1%
lkp-sb02       4801938         4804496 +0.1%         4802290 +0.0%
will-it-scale.write1.processes (higher is better)
machine        batch=31      batch=63             batch=127
lkp-skl-4sp1   1.111e+08     1.104e+08±2% -0.7%   1.122e+08±2% +1.0%
lkp-bdw-ex1    1.392e+08     1.399e+08 +0.5%      1.397e+08 +0.4%
lkp-skl-2sp2   59369233      58994841 -0.6%       58715168 -1.1%
lkp-bdw-ep2    61820979      CPU throttle         63593123 +2.9%
lkp-hsw-ep2    57897587      57435605 -0.8%       56347450 -2.7%
lkp-wsm-ep2    7814203       7918017±2% +1.3%     7669068 -1.9%
lkp-skl-d01    8886557       8971422 +1.0%        8818366 -0.8%
lkp-hsw-d01    9171001±5%    9189915 +0.2%        9483909 +3.4%
lkp-sb02       4475406       4475294 -0.0%        4501756 +0.6%
will-it-scale.write1.threads (higher is better)
machine        batch=31      batch=63             batch=127
lkp-skl-4sp1   1.058e+08     1.055e+08±2% -0.2%   1.065e+08 +0.7%
lkp-bdw-ex1    1.316e+08     1.300e+08 -1.2%      1.308e+08 -0.6%
lkp-skl-2sp2   54492421      56086678 +2.9%       55975657 +2.7%
lkp-bdw-ep2    59360449      59003957 -0.6%       58101262 -2.1%
lkp-hsw-ep2    53346346±2%   52530876 -1.5%       52902487 -0.8%
lkp-wsm-ep2    7774006       7800092±2% +0.3%     7558833 -2.8%
lkp-skl-d01    8346174       8235695 -1.3%        no data
lkp-hsw-d01    8636244       8655731 +0.2%        8658868 +0.3%
lkp-sb02       4181820       4204107 +0.5%        4182992 +0.0%
vm-scalability.anon-r-rand.throughput (higher is better)
machine        batch=31      batch=63             batch=127
lkp-skl-4sp1   11933873±3%   12356544±2% +3.5%    12188624 +2.1%
lkp-bdw-ex1    7114424±2%    7330949±2% +3.0%     7392419 +3.9%
lkp-skl-2sp2   6773277±5%    6492332±8% -4.1%     6543962 -3.4%
lkp-bdw-ep2    7133846±4%    7233508 +1.4%        7013518±3% -1.7%
lkp-hsw-ep2    4576626       4527098 -1.1%        4551679 -0.5%
lkp-wsm-ep2    2583599       2592492 +0.3%        2588039 +0.2%
lkp-hsw-d01    998199±2%     1028311 +3.0%        1006460±2% +0.8%
lkp-sb02       570572        567854 -0.5%         568449 -0.4%
vm-scalability.anon-r-rand-mt.throughput (higher is better)
machine        batch=31      batch=63             batch=127
lkp-skl-4sp1   1789419       1787830 -0.1%        1788208 -0.1%
lkp-bdw-ex1    3492595±2%    3554966±2% +1.8%     3558835±3% +1.9%
lkp-skl-2sp2   3856238±2%    3975403±4% +3.1%     3994600 +3.6%
lkp-bdw-ep2    3726963±11%   3809292±6% +2.2%     3871924±4% +3.9%
lkp-hsw-ep2    2131760±3%    2033578±4% -4.6%     2130727±6% -0.0%
lkp-wsm-ep2    2369731       2368384 -0.1%        2370252 +0.0%
lkp-skl-d01    1207128       1206220 -0.1%        1205801 -0.1%
lkp-hsw-d01    964317        992329±2% +2.9%      992099±2% +2.9%
lkp-sb02       567137        567346 +0.0%         566144 -0.2%
vm-scalability.lru-file-mmap-read.throughput (higher is better)
machine        batch=31       batch=63              batch=127
lkp-skl-4sp1   19560469±6%    23018999 +17.7%       23418800 +19.7%
lkp-bdw-ex1    17769135±14%   26141676±3% +47.1%    26284723±5% +47.9%
lkp-skl-2sp2   14056512       13578884 -3.4%        13146214 -6.5%
lkp-bdw-ep2    15336542       14737654 -3.9%        14088159 -8.1%
lkp-hsw-ep2    16275498       15756296 -3.2%        15018090 -7.7%
lkp-wsm-ep2    11272160       11237231 -0.3%        11310047 +0.3%
lkp-skl-d01    7322119        7324569 +0.0%         7184148 -1.9%
lkp-hsw-d01    6449234        6404542 -0.7%         6356141 -1.4%
lkp-sb02       3517943        3520668 +0.1%         3527309 +0.3%
vm-scalability.lru-file-mmap-read-rand.throughput (higher is better)
machine        batch=31    batch=63          batch=127
lkp-skl-4sp1   1689052     1697553 +0.5%     1698726 +0.6%
lkp-bdw-ex1    1675246     1699764 +1.5%     1712226 +2.2%
lkp-skl-2sp2   1800533     1799749 -0.0%     1800581 +0.0%
lkp-bdw-ep2    1807422     1807758 +0.0%     1804932 -0.1%
lkp-hsw-ep2    1809807     1808781 -0.1%     1807811 -0.1%
lkp-wsm-ep2    1800198     1802434 +0.1%     1801236 +0.1%
lkp-skl-d01    696689      695537 -0.2%      694106 -0.4%
lkp-hsw-d01    698364      698666 +0.0%      696686 -0.2%
lkp-sb02       258939      258787 -0.1%      258199 -0.3%
Link: http://lkml.kernel.org/r/20180711055855.29072-1-aaron.lu@intel.com
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Suggested-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Kemi Wang <kemi.wang@intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: celtare21 <celtare21@gmail.com>
Ard Biesheuvel [Tue, 21 Nov 2017 13:40:17 +0000 (13:40 +0000)]
BACKPORT: crypto: arm64/aes-ce-cipher - move assembler code to .S file
Most crypto drivers involving kernel mode NEON take care to put the code
that actually touches the NEON register file in a separate compilation
unit, to prevent the compiler from reordering code that preserves or
restores the NEON context with code that may corrupt it. This is
necessary because we currently have no way to express the restrictions
imposed upon use of the NEON in kernel mode in a way that the compiler
understands.
However, in the case of aes-ce-cipher, it did not seem unreasonable to
deviate from this rule, given how it does not seem possible for the
compiler to reorder cross object function calls with asm blocks whose
in- and output constraints reflect that it reads from and writes to
memory.
Now that LTO is being proposed for the arm64 kernel, it is time to
revisit this. The link time optimization may replace the function
calls to kernel_neon_begin() and kernel_neon_end() with instantiations
of the IR that make up its implementation, allowing further reordering
with the asm block.
So let's clean this up, and move the asm() blocks into a separate .S
file.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-By: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 019cd46984d04703a39924178f503a98436ac0d7)
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Ard Biesheuvel [Mon, 24 Jul 2017 10:28:11 +0000 (11:28 +0100)]
UPSTREAM: crypto: arm64/aes-ce-cipher: add non-SIMD generic fallback
The arm64 kernel will shortly disallow nested kernel mode NEON, so
add a fallback to scalar code that can be invoked in that case.
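A hedged sketch of the fallback pattern (helper names are assumptions
based on this series):

	static void aes_cipher_encrypt(struct crypto_tfm *tfm, u8 dst[], u8 const src[])
	{
		struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm);

		if (!may_use_simd()) {
			/* scalar code from the generic-table implementation */
			__aes_arm64_encrypt(ctx->key_enc, dst, src, num_rounds(ctx));
			return;
		}

		kernel_neon_begin();
		__aes_ce_encrypt(ctx->key_enc, dst, src, num_rounds(ctx));
		kernel_neon_end();
	}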
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit b8fb993a836cd432309410eadf083bbe9c0e9e9c)
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Ard Biesheuvel [Mon, 24 Jul 2017 10:28:10 +0000 (11:28 +0100)]
UPSTREAM: crypto: arm64/aes-ce-cipher - match round key endianness with generic code
In order to be able to reuse the generic AES code as a fallback for
situations where the NEON may not be used, update the key handling
to match the byte order of the generic code: it stores round keys
as sequences of 32-bit quantities rather than streams of bytes, and
so our code needs to be updated to reflect that.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit f402e3115e20b345bd6fbfcf463a506d958c7bf6)
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Danny Lin [Thu, 11 Jul 2019 02:18:40 +0000 (19:18 -0700)]
Makefile: Use pipes rather than temporary files for intermediate steps
GCC supports the use of pipes for intermediate compilation steps (e.g.
passing the generated assembly code to the assembler) as a replacement for
temporary files. This bypasses VFS and other layers which can introduce
substantial amounts of overhead and instead redirects data directly
between processes.
The final product and generated code are unaffected. Memory usage while
compiling is slightly higher.
Tests showed a substantial reduction in build time when using GCC to
compile an x86 4.19 kernel:
Using temporary files in tmpfs: 2m41s
Using pipes: 2m36s
Similar benefits were observed with an Android arm64 4.9 kernel:
Using tmpfs: 5m34s
Using pipes: 4m33s
Enable the feature when possible (i.e. when the compiler supports it) to
speed up builds at effectively no cost for many setups, particularly
those with weaker CPUs.
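A hedged sketch of the Makefile change (cc-option is the standard
Kbuild helper, so unsupported compilers simply skip the flag):

	KBUILD_CFLAGS	+= $(call cc-option,-pipe)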
Test: kernel compiles and boots
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
celtare21 [Wed, 6 Feb 2019 15:05:13 +0000 (15:05 +0000)]
msm8998: Poll msm-thermal every 200ms
Signed-off-by: celtare21 <celtare21@gmail.com>
Park Ju Hyung [Sun, 7 Jul 2019 17:43:37 +0000 (02:43 +0900)]
adreno: hardcode for A540v2
Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
[wight554: adapted A630 change for A540v2]
Signed-off-by: Volodymyr Zhdanov <wight554@gmail.com>
kdrag0n [Tue, 13 Nov 2018 05:43:38 +0000 (21:43 -0800)]
gpu: adreno: only compile Adreno 5xx driver
We won't be using this with an Adreno 3xx/4xx GPU.
Signed-off-by: kdrag0n <dragon@khronodragon.com>
[wight554: adapted 6xx change for 5xx]
Signed-off-by: Volodymyr Zhdanov <wight554@gmail.com>
Prasad Sodagudi [Mon, 24 Sep 2018 23:25:55 +0000 (16:25 -0700)]
kernel: time: Add delay after cpu_relax() in tight loops
Tight loops around spin_lock_irqsave() and spin_unlock_irqrestore()
in timer and hrtimer are causing scheduling delays. Add a delay of a
few nanoseconds after cpu_relax() in the timer/hrtimer tight loops.
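A hedged illustration using hrtimer_cancel()'s retry loop (the delay
constant is hypothetical; the real patch may touch different loops):

	int hrtimer_cancel(struct hrtimer *timer)
	{
		for (;;) {
			int ret = hrtimer_try_to_cancel(timer);

			if (ret >= 0)
				return ret;
			cpu_relax();
			ndelay(TIMER_RETRY_DELAY_NS);	/* hypothetical: a few ns of backoff */
		}
	}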
Change-Id: Iaa0ab92da93f7b245b1d922b6edca2bebdc0fbce
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
[wight554: fix conflicts in kernel/time/{hrtimer.c,tick-internal.h}]
Signed-off-by: Volodymyr Zhdanov <wight554@gmail.com>
Liam Mark [Mon, 7 Jan 2019 20:07:25 +0000 (12:07 -0800)]
ion: ensure prefetch/drain requests are processed in order
Currently, when there is an ION prefetch or drain request, we add it
to the start of the prefetch_list list.
This can cause the requests to be processed in the wrong order; for
example, if a prefetch request is made followed by a drain request
(before the prefetch request has been processed), the drain request
will be handled first.
Ensure prefetch and drain requests are processed in order by appending
requests to the end of the prefetch_list list.
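The fix is likely a one-word change; a sketch (variable names are
illustrative):

	/* before: newest request first (LIFO) */
	list_add(&info->list, &prefetch_list);
	/* after: preserve submission order (FIFO) */
	list_add_tail(&info->list, &prefetch_list);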
Change-Id: I935a0d69a52267e888e9b19e1e8c0d9bd68e295b
Signed-off-by: Liam Mark <lmark@codeaurora.org>
Prakash Kamliya [Thu, 27 Dec 2018 13:06:08 +0000 (18:36 +0530)]
msm: kgsl: Relax adreno spin idle tight loop
Tight loop of adreno_spin_idle() causing RT throttling.
Relax the tight loop by giving chance to other thread.
Change-Id: Ic23d4551c0cc0b5f2fa7844ca73444d1412d480c
Signed-off-by: Prakash Kamliya <pkamliya@codeaurora.org>
Rick Yiu [Wed, 26 Sep 2018 08:45:50 +0000 (16:45 +0800)]
ANDROID: block/cfq-iosched: make group_idle per io cgroup tunable
If group_idle is made per io cgroup tunable, it gives more flexibility
in tuning the performance of each group. If no value is set, it will
just use the original default value.
Bug: 117857342
Bug: 132282125
Test: values could be set to each group correctly
Signed-off-by: Rick Yiu <rickyiu@google.com>
Change-Id: I9aba172419f1819f459e8305b909630fa8305978
Bart Van Assche [Tue, 7 Aug 2018 23:17:29 +0000 (16:17 -0700)]
cfq: Suppress compiler warnings about comparisons
This patch does not change any functionality but avoids gcc reporting
the following warnings when building with W=1:
block/cfq-iosched.c: In function 'cfq_back_seek_max_store':
block/cfq-iosched.c:4741:13: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits]
if (__data < (MIN)) \
^
block/cfq-iosched.c:4756:1: note: in expansion of macro 'STORE_FUNCTION'
STORE_FUNCTION(cfq_back_seek_max_store, &cfqd->cfq_back_max, 0, UINT_MAX, 0);
^~~~~~~~~~~~~~
block/cfq-iosched.c: In function 'cfq_slice_idle_store':
block/cfq-iosched.c:4741:13: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits]
if (__data < (MIN)) \
^
block/cfq-iosched.c:4759:1: note: in expansion of macro 'STORE_FUNCTION'
STORE_FUNCTION(cfq_slice_idle_store, &cfqd->cfq_slice_idle, 0, UINT_MAX, 1);
^~~~~~~~~~~~~~
block/cfq-iosched.c: In function 'cfq_group_idle_store':
block/cfq-iosched.c:4741:13: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits]
if (__data < (MIN)) \
^
block/cfq-iosched.c:4760:1: note: in expansion of macro 'STORE_FUNCTION'
STORE_FUNCTION(cfq_group_idle_store, &cfqd->cfq_group_idle, 0, UINT_MAX, 1);
^~~~~~~~~~~~~~
block/cfq-iosched.c: In function 'cfq_low_latency_store':
block/cfq-iosched.c:4741:13: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits]
if (__data < (MIN)) \
^
block/cfq-iosched.c:4765:1: note: in expansion of macro 'STORE_FUNCTION'
STORE_FUNCTION(cfq_low_latency_store, &cfqd->cfq_latency, 0, 1, 0);
^~~~~~~~~~~~~~
block/cfq-iosched.c: In function 'cfq_slice_idle_us_store':
block/cfq-iosched.c:4775:13: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits]
if (__data < (MIN)) \
^
block/cfq-iosched.c:4782:1: note: in expansion of macro 'USEC_STORE_FUNCTION'
USEC_STORE_FUNCTION(cfq_slice_idle_us_store, &cfqd->cfq_slice_idle, 0, UINT_MAX);
^~~~~~~~~~~~~~~~~~~
block/cfq-iosched.c: In function 'cfq_group_idle_us_store':
block/cfq-iosched.c:4775:13: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits]
if (__data < (MIN)) \
^
block/cfq-iosched.c:4783:1: note: in expansion of macro 'USEC_STORE_FUNCTION'
USEC_STORE_FUNCTION(cfq_group_idle_us_store, &cfqd->cfq_group_idle, 0, UINT_MAX);
^~~~~~~~~~~~~~~~~~~
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
[wight554: fix 4.4 backport conflicts]
Signed-off-by: Volodymyr Zhdanov <wight554@gmail.com>
Bart Van Assche [Tue, 7 Aug 2018 23:17:28 +0000 (16:17 -0700)]
cfq: Annotate fall-through in a switch statement
This patch avoids gcc complaining about fall-through when building
with W=1.
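A hedged sketch of the annotation style (the switch shown is
illustrative, not the exact cfq code):

	switch (ioprio_class) {
	default:
		printk(KERN_ERR "cfq: bad prio %x\n", ioprio_class);
		/* fall through */
	case IOPRIO_CLASS_NONE:
		/* no prio set, inherit CPU scheduling settings */
		ioprio = task_nice_ioprio(tsk);
		break;
	}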
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Hou Tao [Wed, 1 Mar 2017 01:02:33 +0000 (09:02 +0800)]
cfq-iosched: fix the delay of cfq_group's vdisktime under iops mode
When adding a cfq_group into the cfq service tree, we use CFQ_IDLE_DELAY
as the delay of cfq_group's vdisktime if there have been other cfq_groups
already.
When cfq is under iops mode, commit 9a7f38c42c2b ("cfq-iosched: Convert
from jiffies to nanoseconds") could result in a large iops delay and
lead to an abnormal io schedule delay for the added cfq_group. To fix
it, we just need to revert to the old CFQ_IDLE_DELAY value: HZ / 5
when iops mode is enabled.
Despite having the same value, the delay of a cfq_queue in idle class
and the delay of cfq_group are different things, so I define two new
macros for the delay of a cfq_group under time-slice mode and iops mode.
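A hedged sketch of the two macros described (names follow the message;
the nanosecond value mirrors the old HZ / 5, i.e. 200 ms):

	/* cfq_group's vdisktime delay under time-slice mode */
	#define CFQ_SLICE_MODE_GROUP_DELAY	(NSEC_PER_SEC / 5)
	/* cfq_group's vdisktime delay under iops mode, in iops */
	#define CFQ_IOPS_MODE_GROUP_DELAY	(HZ / 5)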
Fixes: 9a7f38c42c2b ("cfq-iosched: Convert from jiffies to nanoseconds")
Cc: <stable@vger.kernel.org> # 4.8+
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
Matthias Kaehlcke [Fri, 26 May 2017 21:22:37 +0000 (14:22 -0700)]
cfq-iosched: Delete unused function min_vdisktime()
This fixes the following warning when building with clang:
block/cfq-iosched.c:970:19: error: unused function 'min_vdisktime'
[-Werror,-Wunused-function]
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
Markus Elfring [Sat, 21 Jan 2017 21:44:07 +0000 (22:44 +0100)]
cfq-iosched: Adjust one function call together with a variable assignment
The script "checkpatch.pl" pointed information out like the following.
ERROR: do not use assignment in if condition
Thus fix the affected source code place.
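The shape of the fix (the call shown is illustrative):

	/* before: assignment inside the if condition */
	if ((cfqg = cfq_lookup_cfqg(cfqd, bio_blkcg(bio))) == NULL)
		goto out;

	/* after: assignment and test separated */
	cfqg = cfq_lookup_cfqg(cfqd, bio_blkcg(bio));
	if (!cfqg)
		goto out;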
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Jens Axboe <axboe@fb.com>
Alexander Potapenko [Mon, 23 Jan 2017 14:06:43 +0000 (15:06 +0100)]
block: Initialize cfqq->ioprio_class in cfq_get_queue()
KMSAN (KernelMemorySanitizer, a new error detection tool) reports use of
uninitialized memory in cfq_init_cfqq():
==================================================================
BUG: KMSAN: use of unitialized memory
...
Call Trace:
[<     inline     >] __dump_stack lib/dump_stack.c:15
[<ffffffff8202ac97>] dump_stack+0x157/0x1d0 lib/dump_stack.c:51
[<ffffffff813e9b65>] kmsan_report+0x205/0x360 ??:?
[<ffffffff813eabbb>] __msan_warning+0x5b/0xb0 ??:?
[<     inline     >] cfq_init_cfqq block/cfq-iosched.c:3754
[<ffffffff8201e110>] cfq_get_queue+0xc80/0x14d0 block/cfq-iosched.c:3857
...
origin:
[<ffffffff8103ab37>] save_stack_trace+0x27/0x50 arch/x86/kernel/stacktrace.c:67
[<ffffffff813e836b>] kmsan_internal_poison_shadow+0xab/0x150 ??:?
[<ffffffff813e88ab>] kmsan_poison_slab+0xbb/0x120 ??:?
[<     inline     >] allocate_slab mm/slub.c:1627
[<ffffffff813e533f>] new_slab+0x3af/0x4b0 mm/slub.c:1641
[<     inline     >] new_slab_objects mm/slub.c:2407
[<ffffffff813e0ef3>] ___slab_alloc+0x323/0x4a0 mm/slub.c:2564
[<     inline     >] __slab_alloc mm/slub.c:2606
[<     inline     >] slab_alloc_node mm/slub.c:2669
[<ffffffff813dfb42>] kmem_cache_alloc_node+0x1d2/0x1f0 mm/slub.c:2746
[<ffffffff8201d90d>] cfq_get_queue+0x47d/0x14d0 block/cfq-iosched.c:3850
...
==================================================================
(the line numbers are relative to 4.8-rc6, but the bug persists
upstream)
The uninitialized struct cfq_queue is created by kmem_cache_alloc_node()
and then passed to cfq_init_cfqq(), which accesses cfqq->ioprio_class
before it's initialized.
Signed-off-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
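A user-space analogue of the bug class KMSAN flagged here; like
kmem_cache_alloc_node() without __GFP_ZERO, malloc() returns memory whose
contents are indeterminate:

  #include <stdlib.h>

  struct queue {
          int ioprio_class;
  };

  int main(void)
  {
          struct queue *q = malloc(sizeof(*q));

          if (!q)
                  return 1;
          /* Reading q->ioprio_class before this store would be exactly
           * the reported bug; the fix is to initialize the field before
           * its first read. */
          q->ioprio_class = 0;
          free(q);
          return 0;
  }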
Jan Kara [Tue, 28 Jun 2016 07:04:02 +0000 (09:04 +0200)]
cfq-iosched: Charge at least 1 jiffie instead of 1 ns
Commit 9a7f38c42c2b (cfq-iosched: Convert from jiffies to nanoseconds)
could result in charging just 1 ns to a cgroup submitting IO instead of
the 1 jiffie we always charged before. It is arguable what the right
amount to charge is, but for now let's retain the old behavior of always
charging at least one jiffie.
Fixes: 9a7f38c42c2b92391d9dabaf9f51df7cfe5608e4
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
Jan Kara [Tue, 28 Jun 2016 07:04:01 +0000 (09:04 +0200)]
cfq-iosched: Fix regression in bonnie++ rewrite performance
Commit 9a7f38c42c2 (cfq-iosched: Convert from jiffies to nanoseconds)
broke the condition for detecting starved sync IO in
cfq_completed_request() because rq->start_time remained in jiffies but
we compared it with nanosecond values. This manifested as a regression
in bonnie++ rewrite performance because we always ended up considering
sync IO starved and thus never increased async IO queue depth.
Since rq->start_time is used in a lot of places, converting it to ns
values would be non-trivial. So just revert the condition in CFQ to use
comparison with jiffies. This will lead to suboptimal results if
cfq_fifo_expire[1] ever comes close to 1 jiffie, but so far we are
relatively far from that with the storage used with CFQ (the default
value is 128 ms).
Fixes: 9a7f38c42c2b92391d9dabaf9f51df7cfe5608e4
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
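A self-contained illustration of the unit mismatch described above (the
HZ value is an assumption for the example):

  #include <stdio.h>

  #define HZ 250
  #define NSEC_PER_SEC 1000000000ULL

  int main(void)
  {
          unsigned long long start_jiffies = 900;           /* jiffies */
          unsigned long long now_ns = 4ULL * NSEC_PER_SEC;  /* ns      */

          /* Mixed domains: nanoseconds dwarf any jiffies count, so a
           * "waited too long?" check is effectively always true, which
           * is how sync IO always looked starved. */
          printf("bogus delta: %llu\n", now_ns - start_jiffies);

          /* Consistent domains: convert before comparing. */
          unsigned long long now_jiffies = now_ns / (NSEC_PER_SEC / HZ);
          printf("sane delta: %llu jiffies\n", now_jiffies - start_jiffies);
          return 0;
  }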
Jan Kara [Tue, 28 Jun 2016 07:04:00 +0000 (09:04 +0200)]
cfq-iosched: Convert slice_resid from u64 to s64
slice_resid can be both positive and negative. Commit 9a7f38c42c2b
(cfq-iosched: Convert from jiffies to nanoseconds) converted it from
long to u64. Although this did not introduce any functional regression
(the operations just overflow and the result was fine), it is certainly
wrong and could cause issues in the future. So convert the type to the
more appropriate s64.
Fixes: 9a7f38c42c2b92391d9dabaf9f51df7cfe5608e4
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
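A short demonstration of why the signedness matters even though the
additions "just overflow" back to the right answer:

  #include <stdio.h>
  #include <stdint.h>

  int main(void)
  {
          uint64_t resid_u = (uint64_t)-2500;  /* negative residual in a u64 */
          int64_t resid_s = -2500;             /* the same value as an s64   */

          /* Wraparound still makes additions come out right... */
          printf("%lld\n", (long long)(10000 + resid_u));  /* 7500 */
          /* ...but comparisons misbehave: */
          printf("%d\n", resid_u > 0);  /* 1: a "negative" u64 reads as positive */
          printf("%d\n", resid_s > 0);  /* 0: correct */
          return 0;
  }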
Jan Kara [Wed, 8 Jun 2016 13:11:39 +0000 (15:11 +0200)]
cfq-iosched: Convert to use highres timers
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jan Kara <jack@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Jeff Moyer [Wed, 8 Jun 2016 13:11:38 +0000 (15:11 +0200)]
cfq-iosched: Expose microsecond interfaces
Expose interfaces to tune time slices of CFQ IO scheduler in
microseconds.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
Jeff Moyer [Wed, 8 Jun 2016 14:55:34 +0000 (08:55 -0600)]
cfq-iosched: Convert from jiffies to nanoseconds
Convert all time-keeping in CFQ IO scheduler from jiffies to nanoseconds
so that we can later make the intervals more fine-grained than jiffies.
One jiffie is several milliseconds, and even for today's rotating disks
that is a noticeable amount of time; thus we leave the disk unnecessarily
idle.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
[wight554: fix backport conflict]
Signed-off-by: Volodymyr Zhdanov <wight554@gmail.com>
Jan Kara [Tue, 12 Jan 2016 15:24:19 +0000 (16:24 +0100)]
cfq-iosched: Allow parent cgroup to preempt its child
Currently we don't allow sync workload of one cgroup to preempt sync
workload of any other cgroup. This is because we want to achieve service
separation between cgroups. However, in cases where the preempting cgroup
is an ancestor of the current cgroup, there is no need for separation and
idling introduces unnecessary overhead. This hurts, for example, the case
where the workload is isolated within a cgroup but journalling threads are
in the root cgroup. A simple way to demonstrate the issue is using:
dbench4 -c /usr/share/dbench4/client.txt -t 10 -D /mnt 1
on ext4 filesystem on plain SATA drive (mounted with barrier=0 to make
difference more visible). When all processes are in the root cgroup,
reported throughput is 153.132 MB/sec. When dbench process gets its own
blkio cgroup, reported throughput drops to 26.1006 MB/sec.
Fix the problem by making the check in cfq_should_preempt() more
benevolent, allowing preemption by an ancestor cgroup. This improves the throughput
reported by dbench4 to 48.9106 MB/sec.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
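A sketch of the relaxed check (cfqg_is_descendant() is the name assumed
here for the hierarchy test; the exact hunk may differ):

  /* Queues in different groups may still preempt one another when the
   * preempting queue's cgroup is an ancestor of the preempted one's. */
  if (new_cfqq->cfqg != cfqq->cfqg &&
      !cfqg_is_descendant(cfqq->cfqg, new_cfqq->cfqg))
          return false;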
Jan Kara [Tue, 12 Jan 2016 15:24:17 +0000 (16:24 +0100)]
cfq-iosched: Allow sync noidle workloads to preempt each other
The original idea with preemption of sync noidle queues (introduced in
commit 718eee0579b8 "cfq-iosched: fairness for sync no-idle queues") was
that we service all sync noidle queues together, we don't idle on any of
the queues individually and we idle only if there is no sync noidle
queue to be served. This intention also matches the original test:
  if (cfqd->serving_type == SYNC_NOIDLE_WORKLOAD
      && new_cfqq->service_tree == cfqq->service_tree)
          return true;
However since at that time cfqq->service_tree was not set for idling
queues, this test was unreliable and was replaced in commit
e4a229196a7c "cfq-iosched: fix no-idle preemption logic" by:
  if (cfqd->serving_type == SYNC_NOIDLE_WORKLOAD &&
      cfqq_type(new_cfqq) == SYNC_NOIDLE_WORKLOAD &&
      new_cfqq->service_tree->count == 1)
          return true;
That was a reliable test but was actually doing something different -
now we preempt sync noidle queue only if the new queue is the only one
busy in the service tree.
These days the cfq queue is kept in the service tree even if it is idling,
and thus the original check would be safe again. But since we actually check
that cfq queues are in the same cgroup, of the same priority class and
workload type (sync noidle), we know that new_cfqq is fine to preempt
cfqq. So just remove the service tree check.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Jan Kara [Tue, 12 Jan 2016 15:24:16 +0000 (16:24 +0100)]
cfq-iosched: Reorder checks in cfq_should_preempt()
Move the check for preemption by the rt class up. There is no functional change
but it makes arguing about conditions simpler since we can be sure both
cfq queues are from the same ioprio class.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Park Ju Hyung [Tue, 26 Sep 2017 01:51:24 +0000 (10:51 +0900)]
f2fs: set ioprio of GC kthread to idle
GC should run as conservatively as possible to reduce latency spikes for
the user. Setting the ioprio to the idle class allows the kernel to
schedule the GC thread's I/O so that it does not affect any other
processes' I/O requests.
Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
wloot [Wed, 10 Apr 2019 19:29:10 +0000 (03:29 +0800)]
gpu: kgsl: Make max_gpuclk no op
Tyler Nijmeh [Sun, 14 Oct 2018 18:17:44 +0000 (11:17 -0700)]
cpuidle: Make MENU cpuidle governor optional
Qualcomm devices actually use LPM-based governors (QCOM), therefore this
is a useless driver to have.
Signed-off-by: Tyler Nijmeh <tylernij@gmail.com>
wloot [Thu, 2 May 2019 15:04:36 +0000 (23:04 +0800)]
zram: Set default compressor to lz4
Park Ju Hyung [Sat, 13 Apr 2019 10:00:11 +0000 (19:00 +0900)]
dsp: asm: improve misleading logs
In the ASM_STREAM_CMD_OPEN_WRITE_COMPRESSED case, the payload size error
log might be shown even when the payload is actually correct, due to the
(payload[1] != 0) check failing. Fix this.
Also, add a missing newline and change size error logs to the debug type
where the checks are done solely to print debug messages.
Commit 979f3d57b439 ("dsp: asm: validate payload size before access")
introduced these payload size check logs.
Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Park Ju Hyung [Wed, 3 Apr 2019 04:23:31 +0000 (13:23 +0900)]
writeback: hardcode dirty_expire_centisecs=3000 (30s)
https://android-review.googlesource.com/c/platform/system/core/+/938362
Hardcode this and make /proc/sys/vm/dirty_expire_centisecs a no-op.
Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Jackie Liu [Tue, 4 Dec 2018 01:43:23 +0000 (09:43 +0800)]
arm64: crypto: add NEON accelerated XOR implementation
This is a NEON acceleration method that can improve performance by
approximately 20%. I got the following data from CentOS 7.5 on Huawei's
HISI1616 chip:
[ 93.837726] xor: measuring software checksum speed
[ 93.874039] 8regs : 7123.200 MB/sec
[ 93.914038] 32regs : 7180.300 MB/sec
[ 93.954043] arm64_neon: 9856.000 MB/sec
[ 93.954047] xor: using function: arm64_neon (9856.000 MB/sec)
I believe this code can bring some optimization for all arm64 platforms.
Thanks to Ard Biesheuvel for his suggestions.
Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Park Ju Hyung [Wed, 13 Mar 2019 05:08:35 +0000 (14:08 +0900)]
arm64: crc32: always assume ARM64_HAS_CRC32
Our alternative framework is not ready for this.
Just hardcode this in.
Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Miguel Ojeda [Thu, 24 Jan 2019 14:59:11 +0000 (15:59 +0100)]
lib/crc32.c: mark crc32_le_base/__crc32c_le_base aliases as __pure
The upcoming GCC 9 release extends the -Wmissing-attributes warnings
(enabled by -Wall) to C and aliases: it warns when particular function
attributes are missing in the aliases but not in their target.
In particular, it triggers here because crc32_le_base/__crc32c_le_base
aren't __pure while their target crc32_le/__crc32c_le are.
These aliases are used by architectures as a fallback in accelerated
versions of CRC32. See commit 9784d82db3eb ("lib/crc32: make core crc32()
routines weak so they can be overridden").
Therefore, being fallbacks, it is likely that even if the aliases
were called from C, there wouldn't be any optimizations possible.
Currently, the only user is arm64, which calls this from asm.
Still, marking the aliases as __pure makes sense and is a good idea
for documentation purposes and possible future optimizations,
which also silences the warning.
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Ard Biesheuvel [Mon, 27 Aug 2018 11:02:42 +0000 (13:02 +0200)]
lib/crc32: make core crc32() routines weak so they can be overridden
Allow architectures to drop in accelerated CRC32 routines by making
the crc32_le/__crc32c_le entry points weak, and exposing non-weak
aliases for them that may be used by the accelerated versions as
fallbacks in case the instructions they rely upon are not available.
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
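A self-contained demonstration of the weak-plus-alias pattern this
describes (symbol names invented):

  #include <stdio.h>

  /* Generic implementation: weak, so an arch can override it at link
   * time, plus a non-weak alias the override can fall back to. */
  unsigned int __attribute__((weak)) crc_step(unsigned int x)
  {
          return x * 31 + 7;  /* stand-in for the table-based code */
  }

  unsigned int crc_step_base(unsigned int x)
          __attribute__((alias("crc_step")));

  int main(void)
  {
          /* With no strong override linked in, both names reach the
           * generic code; an accelerated crc_step() would win at link
           * time yet could still call crc_step_base() on CPUs lacking
           * the needed instructions. */
          printf("%u %u\n", crc_step(1), crc_step_base(1));
          return 0;
  }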
Ard Biesheuvel [Tue, 27 Nov 2018 17:42:55 +0000 (18:42 +0100)]
arm64/lib: improve CRC32 performance for deep pipelines
Improve the performance of the crc32() asm routines by getting rid of
most of the branches and small sized loads on the common path.
Instead, use a branchless code path involving overlapping 16 byte
loads to process the first (length % 32) bytes, and process the
remainder using a loop that processes 32 bytes at a time.
Tested using the following test program:
  #include <stdlib.h>
  extern void crc32_le(unsigned short, char const*, int);
  int main(void)
  {
          static const char buf[4096];
          srand(20181126);
          for (int i = 0; i < 100 * 1000 * 1000; i++)
                  crc32_le(0, buf, rand() % 1024);
          return 0;
  }
On Cortex-A53 and Cortex-A57, the performance regresses but only very
slightly. On Cortex-A72 however, the performance improves from
$ time ./crc32
real 0m10.149s
user 0m10.149s
sys 0m0.000s
to
$ time ./crc32
real 0m7.915s
user 0m7.915s
sys 0m0.000s
Cc: Rui Sun <sunrui26@huawei.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Ard Biesheuvel [Mon, 27 Aug 2018 11:02:44 +0000 (13:02 +0200)]
arm64/lib: add accelerated crc32 routines
Unlike crc32c(), which is wired up to the crypto API internally so the
optimal driver is selected based on the platform's capabilities,
crc32_le() is implemented as a library function using a slice-by-8 table
based C implementation. Even though few of the call sites may be
bottlenecks, calling a time variant implementation with a non-negligible
D-cache footprint is a bit of a waste, given that ARMv8.1 and up mandates
support for the CRC32 instructions that were optional in ARMv8.0, but are
already widely available, even on the Cortex-A53 based Raspberry Pi.
So implement routines that use these instructions if available, and fall
back to the existing generic routines otherwise. The selection is based
on alternatives patching.
Note that this unconditionally selects CONFIG_CRC32 as a builtin. Since
CRC32 is relied upon by core functionality such as CONFIG_OF_FLATTREE,
this just codifies the status quo.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Mark Rutland [Fri, 27 Apr 2018 10:50:36 +0000 (11:50 +0100)]
arm64: avoid instrumenting atomic_ll_sc.o
Our out-of-line atomics are built with a special calling convention,
preventing pointless stack spilling, and allowing us to patch call sites
with ARMv8.1 atomic instructions.
Instrumentation inserted by the compiler may result in calls to
functions not following this special calling convention, resulting in
registers being unexpectedly clobbered, and various problems resulting
from this.
For example, if a kernel is built with KCOV and ARM64_LSE_ATOMICS, the
compiler inserts calls to __sanitizer_cov_trace_pc in the prologues of
the atomic functions. This has been observed to result in spurious
cmpxchg failures, leading to a hang early on in the boot process.
This patch avoids such issues by preventing instrumentation of our
out-of-line atomics.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Will Deacon [Fri, 9 Feb 2018 13:19:47 +0000 (13:19 +0000)]
arm64: lse: Pass -fomit-frame-pointer to out-of-line ll/sc atomics
In cases where x30 is used as a temporary in the out-of-line ll/sc atomics
(e.g. atomic_fetch_add), the compiler tends to put out a full stackframe,
which includes pointing x29 at the new frame.
Since these things aren't traceable anyway, we can pass -fomit-frame-pointer
to reduce the work when spilling. Since this is incompatible with -pg, we
also remove that from the CFLAGS for this file.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Jason A. Donenfeld [Tue, 7 Nov 2017 02:24:04 +0000 (11:24 +0900)]
arm64: make label allocation style consistent in tishift
This is entirely cosmetic, but somehow it was missed when sending
differing versions of this patch. This just makes the file a bit more
uniform.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Jason A. Donenfeld [Tue, 7 Nov 2017 01:49:54 +0000 (01:49 +0000)]
arm64: Implement __lshrti3 library function
Commit fb8722735f50 ("arm64: support __int128 on gcc 5+") added support
for the __int128 data type, but this breaks the build in some configurations
where GCC ends up emitting calls to the __lshrti3 helper in libgcc, which
results in a link error:
kernel/sched/fair.o: In function `__calc_delta':
fair.c:(.text+0xca0): undefined reference to `__lshrti3'
kernel/time/timekeeping.o: In function `timekeeping_resume':
timekeeping.c:(.text+0x3f60): undefined reference to `__lshrti3'
make: *** [vmlinux] Error 1
Fix the build by providing an implementation of __lshrti3, like we do
already for __ashlti3 and __ashrti3.
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
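What such a helper computes, as a portable two-word sketch (the in-kernel
arm64 version is assembly):

  #include <stdint.h>

  struct u128 { uint64_t lo, hi; };

  /* Logical right shift of a 128-bit value by 0..127 bits. */
  static struct u128 lshrti3(struct u128 u, unsigned int n)
  {
          struct u128 r;

          if (n == 0) {
                  r = u;
          } else if (n < 64) {
                  r.lo = (u.lo >> n) | (u.hi << (64 - n));
                  r.hi = u.hi >> n;
          } else {
                  r.lo = u.hi >> (n - 64);
                  r.hi = 0;
          }
          return r;
  }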
Jason A. Donenfeld [Fri, 3 Nov 2017 14:18:58 +0000 (15:18 +0100)]
arm64: support __int128 on gcc 5+
Versions of gcc prior to gcc 5 emitted a __multi3 function call when
dealing with TI types, resulting in failures when trying to link to
libgcc, and more generally, bad performance. However, since gcc 5,
the compiler supports actually emitting fast instructions, which means
we can at long last enable this option and receive the speedups.
The gcc commit that added proper Aarch64 support is:
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=d1ae7bb994f49316f6f63e6173f2931e837a351d
This commit appears to be part of the gcc 5 release.
There are still a few instructions, __ashlti3 and __ashrti3, which
require libgcc, which is fine. Rather than linking to libgcc, we
simply provide them ourselves, since they're not that complicated.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Julien Thierry [Fri, 13 Oct 2017 13:32:56 +0000 (14:32 +0100)]
arm64: use WFE for long delays
The current delay implementation uses the yield instruction, which is a
hint that it is beneficial to schedule another thread. As this is a hint,
it may be implemented as a NOP, causing all delays to be busy loops. This
is the case for many existing CPUs.
Taking advantage of the generic timer sending periodic events to all
cores, we can use WFE during delays to reduce power consumption. This is
beneficial only for delays longer than the period of the timer event
stream.
If the timer event stream is not enabled, delays will behave as yield/busy
loops.
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
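A sketch of the resulting delay loop, modeled directly on the description
above (helper names approximate):

  void __delay(cycles_t cycles)
  {
          cycles_t start = get_cycles();

          if (arch_timer_evtstrm_available()) {
                  const cycles_t evt_period =
                          USECS_TO_CYCLES(ARCH_TIMER_EVT_STREAM_PERIOD_US);

                  /* Sleep in wfe(), woken by the periodic event stream,
                   * for as long as a full period still fits. */
                  while ((get_cycles() - start + evt_period) < cycles)
                          wfe();
          }

          /* Busy-wait the remainder (or everything, if the event
           * stream is off and wfe() would never be woken). */
          while ((get_cycles() - start) < cycles)
                  cpu_relax();
  }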
Julien Thierry [Fri, 13 Oct 2017 13:32:55 +0000 (14:32 +0100)]
arm_arch_timer: Expose event stream status
The arch timer configuration for a CPU might get reset after suspending
said CPU.
In order to reliably use the event stream in the kernel (e.g. for delays),
we keep track of the state where we can safely consider the event stream as
properly configured. After writing to cntkctl, we issue an ISB to ensure
that subsequent delay loops can rely on the event stream being enabled.
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Marc Zyngier [Tue, 4 Apr 2017 16:05:16 +0000 (17:05 +0100)]
arm64: arch_timer: Save cntkctl_el1 as a per-cpu variable
As we're about to allow per CPU cntkctl_el1 configuration, we cannot
rely on the register value to be common when performing power
management.
Let's turn saved_cntkctl into a per-cpu variable.
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Park Ju Hyung [Wed, 13 Mar 2019 02:30:02 +0000 (11:30 +0900)]
Revert "PM / Suspend: Print wall time at suspend entry and exit"
This reverts commit b9acbfee678bf41939a9b0bbe09281e53f8ac11a.
This logging is expensive and not all that useful, as the reported times
can get skewed.
Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Park Ju Hyung [Fri, 8 Mar 2019 14:02:21 +0000 (23:02 +0900)]
blk: disable IO_STAT completely
Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
kdrag0n [Sun, 23 Dec 2018 06:36:19 +0000 (22:36 -0800)]
block: disable I/O stats accounting by default
While Android userspace (e.g. storaged) does use iostats via
/proc/diskstats, init will explicitly enable iostats for the devices on
which it is primarily used - sda and sdf. Avoid the 0.5-1% overhead for
block devices that do not need it.
Signed-off-by: kdrag0n <dragon@khronodragon.com>
Davidlohr Bueso [Fri, 8 Sep 2017 23:14:36 +0000 (16:14 -0700)]
rbtree: cache leftmost node internally
Commit cd9e61ed1eebbcd5dfad59475d41ec58d9b64b6a upstream.
Patch series "rbtree: Cache leftmost node internally", v4.
A series extending rbtrees to internally cache the leftmost node such
that we can have a fast overlap-check optimization for all interval tree
users[1]. The benefits of this series are that:
(i) Unify users that do internal leftmost node caching.
(ii) Optimize all interval tree users.
(iii) Convert at least two new users (epoll and procfs) to the new interface.
This patch (of 16):
Red-black tree semantics imply that nodes with smaller or greater (or
equal for duplicates) keys are always to the left and right,
respectively. For the kernel this is extremely evident when considering
our rb_first() semantics. Enabling lookups for the smallest node in the
tree in O(1) can save a good chunk of cycles in not having to walk down
the tree each time. To this end there are a few core users that
explicitly do this, such as the scheduler and rtmutexes. There is also
the desire for interval trees to have this optimization allowing faster
overlap checking.
This patch introduces a new 'struct rb_root_cached' which is just the
root with a cached pointer to the leftmost node. The reason why the
regular rb_root was not extended instead of adding a new structure was
that this allows the user to have the choice between memory footprint
and actual tree performance. The new wrappers on top of the regular
rb_root calls are:
- rb_first_cached(cached_root) -- which is a fast replacement
for rb_first.
- rb_insert_color_cached(node, cached_root, new)
- rb_erase_cached(node, cached_root)
In addition, augmented cached interfaces are also added for basic
insertion and deletion operations; which becomes important for the
interval tree changes.
With the exception of the inserts, which adds a bool for updating the
new leftmost, the interfaces are kept the same. To this end, porting rb
users to the cached version becomes really trivial, and keeping current
rbtree semantics for users that don't care about the optimization
requires zero overhead.
Change-Id: I17f7605dfc2f797f6a5ec24693871ffb89505de6
Link: http://lkml.kernel.org/r/20170719014603.19029-2-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Harsh Shandilya <harsh@prjkt.io>
Signed-off-by: mydongistiny <jaysonedson@gmail.com>
Signed-off-by: kdrag0n <dragon@khronodragon.com>
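A usage sketch of the cached interface listed above, following the usual
kernel rbtree insertion pattern (struct and field names invented):

  #include <linux/rbtree.h>

  struct item {
          struct rb_node node;
          unsigned long key;
  };

  static void item_insert(struct item *new, struct rb_root_cached *root)
  {
          struct rb_node **link = &root->rb_root.rb_node, *parent = NULL;
          bool leftmost = true;

          while (*link) {
                  parent = *link;
                  if (new->key < rb_entry(parent, struct item, node)->key) {
                          link = &parent->rb_left;
                  } else {
                          link = &parent->rb_right;
                          leftmost = false;  /* went right at least once */
                  }
          }
          rb_link_node(&new->node, parent, link);
          rb_insert_color_cached(&new->node, root, leftmost);
  }

rb_first_cached(root) then returns the smallest node in O(1) instead of
walking down the left spine of the tree.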
kdrag0n [Thu, 27 Sep 2018 00:21:12 +0000 (17:21 -0700)]
mm: swap: swap pages one at a time
According to Google, this is optimal.
"By default, the Linux kernel swaps in 8 pages of memory at a time. When
using ZRAM, the incremental cost of reading 1 page at a time is
negligible and may help in case the device is under extreme memory
pressure."
Source: https://source.android.com/devices/tech/perf/low-ram
Signed-off-by: kdrag0n <dragon@khronodragon.com>
Davidlohr Bueso [Fri, 8 Sep 2017 23:14:42 +0000 (16:14 -0700)]
rbtree: add some additional comments for rebalancing cases
While overall the code is very nicely commented, it might not be
immediately obvious from the diagrams what is going on. Add a very
brief summary of each case. Opposite cases where the node is the left
child are left untouched.
Link: http://lkml.kernel.org/r/20170719014603.19029-4-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 35dc67d7d922b2c9a1adb006c7a0f370eeb5c114)
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: khusika <khusikadhamar@gmail.com>
Signed-off-by: kdrag0n <dragon@khronodragon.com>
Davidlohr Bueso [Fri, 8 Sep 2017 23:14:39 +0000 (16:14 -0700)]
rbtree: optimize root-check during rebalancing loop
The only times the nil-parent (root node) condition is true is when the
node is the first in the tree, or after fixing rbtree rule #4 and the
case 1 rebalancing made the node the root. Such conditions do not apply
most of the time:
(i) The common case in an rbtree is to have more than a single node,
so this is only true for the first rb_insert().
(ii) While there is a chance only one first rotation is needed, cases
where the node's uncle is black (cases 2,3) are more common as we can
have the following scenarios during the rotation looping:
case1 only, case1+1, case2+3, case1+2+3, case3 only, etc.
This patch, therefore, adds an unlikely() optimization to this
conditional. When profiling with CONFIG_PROFILE_ANNOTATED_BRANCHES, a
kernel build shows that the incorrect rate is less than 15%, and for
insert-mostly workloads the trees tend, over time, to have an incorrect
rate of less than 2%.
Link: http://lkml.kernel.org/r/20170719014603.19029-3-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 2aadf7fc7df9e70c99786ffb8452ccdd83d49e59)
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: khusika <khusikadhamar@gmail.com>
Signed-off-by: kdrag0n <dragon@khronodragon.com>
Gaurav Jindal [Thu, 14 Jul 2016 12:04:20 +0000 (12:04 +0000)]
tick/nohz: Optimize nohz idle enter
tick_nohz_start_idle is called before checking whether the idle tick can be
stopped. If the tick cannot be stopped, calling tick_nohz_start_idle() is
pointless and just wasting CPU cycles.
Only invoke tick_nohz_start_idle() when can_stop_idle_tick() returns true. A
short one minute observation of the effect on ARM64 shows a reduction of calls
by 1.5% thus optimizing the idle entry sequence.
[tglx: Massaged changelog ]
Co-developed-by: Sanjeev Yadav <sanjeev.yadav@spreadtrum.com>
Signed-off-by: Gaurav Jindal <gaurav.jindal@spreadtrum.com>
Link: http://lkml.kernel.org/r/20160714120416.GB21099@gaurav.jindal@spreadtrum.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Francisco Franco <franciscofranco.1990@gmail.com>
Signed-off-by: kdrag0n <dragon@khronodragon.com>
Eric Dumazet [Tue, 16 May 2017 21:00:11 +0000 (14:00 -0700)]
tcp_westwood: use tcp_jiffies32 instead of tcp_time_stamp
This CC does not need 1 ms tcp_time_stamp and can use
the jiffy based 'timestamp'.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: ahmedradaideh <ahmed.radaideh@gmail.com>
Signed-off-by: Yaroslav Furman <yaro330@gmail.com>
Signed-off-by: celtare21 <celtare21@gmail.com>
Signed-off-by: kdrag0n <dragon@khronodragon.com>
Eric Dumazet [Tue, 16 May 2017 21:00:01 +0000 (14:00 -0700)]
tcp: introduce tcp_jiffies32
We abuse tcp_time_stamp for two different cases:
1) base to generate TCP Timestamp options (RFC 7323)
2) A 32bit version of jiffies, since some TCP fields
are 32bit wide to save memory.
Since we want to eventually have a 1 ms TCP TS clock,
regardless of HZ value, we want to clean things up.
tcp_jiffies32 is the truncated jiffies value,
which will be used only in places where we want a 'host'
timestamp.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: ahmedradaideh <ahmed.radaideh@gmail.com>
Signed-off-by: celtare21 <celtare21@gmail.com>
Signed-off-by: kdrag0n <dragon@khronodragon.com>
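A sketch of the resulting split (the exact definition may differ):

  /* Cheap 32-bit 'host' clock for internal fields that only need a
   * jiffies-based timestamp... */
  #define tcp_jiffies32 ((u32)jiffies)

  /* ...leaving tcp_time_stamp free to become a 1 ms clock for RFC 7323
   * timestamp options, independent of HZ. */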
Arnd Bergmann [Tue, 10 Jul 2018 15:16:27 +0000 (17:16 +0200)]
arm64: make flatmem depend on !NUMA
Building without NUMA but with FLATMEM results in a link error
because mem_map[] is not available:
aarch64-linux-ld -EB -maarch64elfb --no-undefined -X -pie -shared -Bsymbolic --no-apply-dynamic-relocs --build-id -o .tmp_vmlinux1 -T ./arch/arm64/kernel/vmlinux.lds --whole-archive built-in.a --no-whole-archive --start-group arch/arm64/lib/lib.a lib/lib.a --end-group
init/do_mounts.o: In function `mount_block_root':
do_mounts.c:(.init.text+0x1e8): undefined reference to `mem_map'
arch/arm64/kernel/vdso.o: In function `vdso_init':
vdso.c:(.init.text+0xb4): undefined reference to `mem_map'
This uses the same trick as the other architectures, making flatmem
depend on !NUMA to avoid the broken configuration.
Fixes: e7d4bac428ed ("arm64: add ARM64-specific support for flatmem")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: kdrag0n <dragon@khronodragon.com>
Nikunj Kela [Fri, 6 Jul 2018 17:47:24 +0000 (10:47 -0700)]
arm64: add ARM64-specific support for flatmem
Flatmem is useful in reducing kernel memory usage. One use case is the
kdump kernel, where we are able to save ~14M by moving to the flatmem
scheme.
Cc: xe-kernel@external.cisco.com
Cc: Nikunj Kela <nkela@cisco.com>
Signed-off-by: Nikunj Kela <nkela@cisco.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: kdrag0n <dragon@khronodragon.com>
Arnd Bergmann [Mon, 15 Jan 2018 16:07:22 +0000 (17:07 +0100)]
crypto: aes-generic - fix aes-generic regression on powerpc
My last bugfix added -Os on the command line, which unfortunately caused
a build regression on powerpc in some configurations.
I've done some more analysis of the original problem and found a slightly
different workaround that avoids this regression and also results in
better performance on gcc-7.0: -fcode-hoisting is an optimization step
that was added in gcc-7 and that, for all gcc-7 versions, causes worse
performance here.
This disables -fcode-hoisting on all compilers that understand the option.
For gcc-7.1 and 7.2 I found the same performance as my previous patch
(using -Os), in gcc-7.0 it was even better. On gcc-8 I could see no
change in performance from this patch. In theory, code hoisting should
not be able to make things better for the AES cipher, so leaving it
disabled for gcc-8 only serves to simplify the Makefile change.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Link: https://www.mail-archive.com/linux-crypto@vger.kernel.org/msg30418.html
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83356
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83651
Fixes: 148b974deea9 ("crypto: aes-generic - build with -Os on gcc-7+")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Arnd Bergmann [Wed, 3 Jan 2018 22:39:27 +0000 (23:39 +0100)]
crypto: aes-generic - build with -Os on gcc-7+
While testing other changes, I discovered that gcc-7.2.1 produces badly
optimized code for aes_encrypt/aes_decrypt. This is especially true when
CONFIG_UBSAN_SANITIZE_ALL is enabled, where it leads to extremely
large stack usage that in turn might cause kernel stack overflows:
crypto/aes_generic.c: In function 'aes_encrypt':
crypto/aes_generic.c:1371:1: warning: the frame size of 4880 bytes is larger than 2048 bytes [-Wframe-larger-than=]
crypto/aes_generic.c: In function 'aes_decrypt':
crypto/aes_generic.c:1441:1: warning: the frame size of 4864 bytes is larger than 2048 bytes [-Wframe-larger-than=]
I verified that this problem exists on all architectures that are
supported by gcc-7.2, though arm64 in particular is less affected than
the others. I also found that gcc-7.1 and gcc-8 do not show the extreme
stack usage but still produce worse code than earlier versions for this
file, apparently because of optimization passes that generally provide
a substantial improvement in object code quality but understandably fail
to find any shortcuts in the AES algorithm.
Possible workarounds include
a) disabling -ftree-pre and -ftree-sra optimizations, this was an earlier
patch I tried, which reliably fixed the stack usage, but caused a
serious performance regression in some versions, as later testing
found.
b) disabling UBSAN on this file or all ciphers, as suggested by Ard
Biesheuvel. This would lead to massively better crypto performance in
UBSAN-enabled kernels and avoid the stack usage, but there is a concern
over whether we should exclude arbitrary files from UBSAN at all.
c) Forcing the optimization level in a different way. Similar to a),
but rather than deselecting specific optimization stages,
this now uses "gcc -Os" for this file, regardless of the
CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE/SIZE option. This is a reliable
workaround for the stack consumption on all architectures, and I've
retested the performance results now on x86, cycles/byte (lower is
better) for cbc(aes-generic) with 256 bit keys:
             -O2    -Os
  gcc-6.3.1  14.9   15.1
  gcc-7.0.1  14.7   15.3
  gcc-7.1.1  15.3   14.7
  gcc-7.2.1  16.8   15.9
  gcc-8.0.0  15.5   15.6
This implements option c) by forcing -Os on all compiler versions
starting with gcc-7.1. As a workaround for PR83356, it would
only be needed for gcc-7.2+ with UBSAN enabled, but since it also shows
better performance on gcc-7.1 without UBSAN, it seems appropriate to
use the faster version here as well.
Side note: during testing, I also played with the AES code in libressl,
which had a similar performance regression from gcc-6 to gcc-7.2,
but was three times slower overall. It might be interesting to
investigate that further and possibly port the Linux implementation
into that.
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83356
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83651
Cc: Richard Biener <rguenther@suse.de>
Cc: Jakub Jelinek <jakub@gcc.gnu.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Arjan van de Ven [Sat, 29 Apr 2017 22:24:34 +0000 (22:24 +0000)]
kernel: time: reduce ntp wakeups
Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Gao Xiang [Tue, 30 Oct 2018 22:07:28 +0000 (15:07 -0700)]
lib/lz4: update LZ4 decompressor module
Update the LZ4 compression module to be based on LZ4 v1.8.3 in order for
the erofs file system to use the newest LZ4_decompress_safe_partial(),
which can now decode exactly the number of bytes requested [1], to take
the place of the open-coded hack in the erofs file system itself.
Currently, apart from the erofs file system, no other users use
LZ4_decompress_safe_partial, so there is no worry about the interface.
In addition, LZ4 v1.8.x boosts decompression speed compared to the
current code, which is based on LZ4 v1.7.3, mainly due to a shortcut
optimization for specific common LZ4 sequences [2].
lzbench testdata (tested in kirin710, 8 cores, 4 big cores
at 2189Mhz, 2GB DDR RAM at 1622Mhz, with enwik8 testdata [3]):
Compressor name   Compress.  Decompress.  Compr. size   Ratio  Filename
memcpy            5004 MB/s   4924 MB/s    100000000   100.00  enwik8
lz4hc 1.7.3 -9      12 MB/s    653 MB/s     42203253    42.20  enwik8
lz4hc 1.8.0 -9      12 MB/s    908 MB/s     42203096    42.20  enwik8
lz4hc 1.8.3 -9      11 MB/s    965 MB/s     42203094    42.20  enwik8
[1] https://github.com/lz4/lz4/issues/566
https://github.com/lz4/lz4/commit/08d347b5b217b011ff7487130b79480d8cfdaeb8
[2] v1.8.1 perf: slightly faster compression and decompression speed
https://github.com/lz4/lz4/commit/a31b7058cb97e4393da55e78a77a1c6f0c9ae038
v1.8.2 perf: slightly faster HC compression and decompression speed
https://github.com/lz4/lz4/commit/45f8603aae389d34c689d3ff7427b314071ccd2c
https://github.com/lz4/lz4/commit/1a191b3f8d26b50a7c1d41590b529ec308d768cd
[3] http://mattmahoney.net/dc/textdata.html
http://mattmahoney.net/dc/enwik8.zip
Link: http://lkml.kernel.org/r/1537181207-21932-1-git-send-email-gaoxiang25@huawei.com
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Tested-by: Guo Xuenan <guoxuenan@huawei.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Yann Collet <yann.collet.73@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Fang Wei <fangwei1@huawei.com>
Cc: Chao Yu <yuchao0@huawei.com>
Cc: Miao Xie <miaoxie@huawei.com>
Cc: Sven Schmidt <4sschmid@informatik.uni-hamburg.de>
Cc: Kyungsik Lee <kyungsik.lee@lge.com>
Cc: <weidu.du@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
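A usage sketch of the partial-decode entry point named above (buffer and
size names invented; upstream's signature takes the compressed size, the
target byte count, and the destination capacity):

  /* Decode a compressed block, stopping once `want` bytes have been
   * produced (exactly, as of v1.8.3); returns bytes written to dst or
   * a negative value on error. */
  int n = LZ4_decompress_safe_partial(src, dst, src_len, want, dst_cap);

  if (n < 0)
          return -EINVAL;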
Colin Ian King [Tue, 3 Oct 2017 23:16:01 +0000 (16:16 -0700)]
lib/lz4: make arrays static const, reduces object code size
Don't populate the read-only arrays dec32table and dec64table on the
stack, instead make them both static const. Makes the object code
smaller by over 10K bytes:
Before:
text data bss dec hex filename
31500 0 0 31500 7b0c lib/lz4/lz4_decompress.o
After:
text data bss dec hex filename
20237 176 0 20413 4fbd lib/lz4/lz4_decompress.o
(gcc version 7.2.0 x86_64)
Link: http://lkml.kernel.org/r/20170921221939.20820-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Sven Schmidt <4sschmid@informatik.uni-hamburg.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
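The change class, as a generic sketch (table values invented):

  /* Before: the table lands on the stack and is rebuilt on every call. */
  static unsigned int decode_before(unsigned int i)
  {
          const unsigned int dectable[] = { 0, 1, 2, 1, 4, 4, 4, 4 };
          return dectable[i & 7];
  }

  /* After: static const places it in .rodata once, shrinking the object
   * code that would otherwise initialize it at run time. */
  static unsigned int decode_after(unsigned int i)
  {
          static const unsigned int dectable[] = { 0, 1, 2, 1, 4, 4, 4, 4 };
          return dectable[i & 7];
  }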
Sven Schmidt [Fri, 24 Feb 2017 23:01:25 +0000 (15:01 -0800)]
lib/lz4: remove back-compat wrappers
Remove the functions introduced as wrappers for providing backwards
compatibility to the prior LZ4 version. They're not needed anymore
since there are no callers left.
Link: http://lkml.kernel.org/r/1486321748-19085-6-git-send-email-4sschmid@informatik.uni-hamburg.de
Signed-off-by: Sven Schmidt <4sschmid@informatik.uni-hamburg.de>
Cc: Bongkyu Kim <bongkyu.kim@lge.com>
Cc: Rui Salvaterra <rsalvaterra@gmail.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David S. Miller <davem@davemloft.net>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Colin Cross <ccross@android.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sven Schmidt [Fri, 24 Feb 2017 23:01:19 +0000 (15:01 -0800)]
crypto: change LZ4 modules to work with new LZ4 module version
Update the crypto modules using LZ4 compression as well as the test
cases in testmgr.h to work with the new LZ4 module version.
Link: http://lkml.kernel.org/r/1486321748-19085-4-git-send-email-4sschmid@informatik.uni-hamburg.de
Signed-off-by: Sven Schmidt <4sschmid@informatik.uni-hamburg.de>
Cc: Bongkyu Kim <bongkyu.kim@lge.com>
Cc: Rui Salvaterra <rsalvaterra@gmail.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David S. Miller <davem@davemloft.net>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Colin Cross <ccross@android.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sven Schmidt [Fri, 24 Feb 2017 23:01:16 +0000 (15:01 -0800)]
lib/decompress_unlz4: change module to work with new LZ4 module version
Update the unlz4 wrapper to work with the updated LZ4 kernel module
version.
Link: http://lkml.kernel.org/r/1486321748-19085-3-git-send-email-4sschmid@informatik.uni-hamburg.de
Signed-off-by: Sven Schmidt <4sschmid@informatik.uni-hamburg.de>
Cc: Bongkyu Kim <bongkyu.kim@lge.com>
Cc: Rui Salvaterra <rsalvaterra@gmail.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David S. Miller <davem@davemloft.net>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Colin Cross <ccross@android.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sven Schmidt [Fri, 24 Feb 2017 23:01:12 +0000 (15:01 -0800)]
lib: update LZ4 compressor module
Patch series "Update LZ4 compressor module", v7.
This patchset updates the LZ4 compression module to a version based on
LZ4 v1.7.3, allowing use of the fast compression algorithm, aka LZ4 fast,
which provides an "acceleration" parameter as a tradeoff between high
compression ratio and high compression speed.
We want to use LZ4 fast in order to support compression in lustre and
(mostly, based on that) investigate data reduction techniques on behalf
of storage systems.
Also, it will be useful for other users of LZ4 compression, as with LZ4
fast it is possible to enable applications to use fast and/or high
compression depending on the usecase. For instance, ZRAM is offering a
LZ4 backend and could benefit from an updated LZ4 in the kernel.
LZ4 homepage: http://www.lz4.org/
LZ4 source repository: https://github.com/lz4/lz4 Source version: 1.7.3
Benchmark (taken from [1], Core i5-4300U @1.9GHz):
----------------|--------------|----------------|----------
Compressor | Compression | Decompression | Ratio
----------------|--------------|----------------|----------
memcpy | 4200 MB/s | 4200 MB/s | 1.000
LZ4 fast 50 | 1080 MB/s | 2650 MB/s | 1.375
LZ4 fast 17 | 680 MB/s | 2220 MB/s | 1.607
LZ4 fast 5 | 475 MB/s | 1920 MB/s | 1.886
LZ4 default | 385 MB/s | 1850 MB/s | 2.101
[1] http://fastcompression.blogspot.de/2015/04/sampling-or-faster-lz4.html
[PATCH 1/5] lib: Update LZ4 compressor module
[PATCH 2/5] lib/decompress_unlz4: Change module to work with new LZ4 module version
[PATCH 3/5] crypto: Change LZ4 modules to work with new LZ4 module version
[PATCH 4/5] fs/pstore: fs/squashfs: Change usage of LZ4 to work with new LZ4 version
[PATCH 5/5] lib/lz4: Remove back-compat wrappers
This patch (of 5):
Update the LZ4 kernel module to LZ4 v1.7.3 by Yann Collet. The kernel
module is inspired by the previous work by Chanho Min. The updated LZ4
module will not break existing code since the patchset contains
appropriate changes.
API changes:
New method LZ4_compress_fast, which differs from the variant available in
the kernel by its new acceleration parameter, allowing compression ratio
to be traded for more compression speed and vice versa.
LZ4_decompress_fast is the respective decompression method, featuring a
very fast decoder (multiple GB/s per core), able to reach RAM speed in
multi-core systems. The decompressor can decompress data compressed with
LZ4 fast as well as the LZ4 HC (high compression) algorithm.
Also, the useful functions LZ4_decompress_safe_partial and
LZ4_compress_destsize were added. The latter reverses the logic by
trying to compress as much data as possible from source to dest while
the former aims to decompress partial blocks of data.
A bunch of streaming functions were also added which allow
compressing/decompressing data in multiple steps (so-called "streaming
mode").
The methods lz4_compress and lz4_decompress_unknownoutputsize are now
known as LZ4_compress_default and LZ4_decompress_safe, respectively. The
old methods will be removed since there are no callers left in the code.
[arnd@arndb.de: fix KERNEL_LZ4 support]
Link: http://lkml.kernel.org/r/20170208211946.2839649-1-arnd@arndb.de
[akpm@linux-foundation.org: simplify]
[akpm@linux-foundation.org: fix the simplification]
[4sschmid@informatik.uni-hamburg.de: fix performance regressions]
Link: http://lkml.kernel.org/r/1486898178-17125-2-git-send-email-4sschmid@informatik.uni-hamburg.de
[4sschmid@informatik.uni-hamburg.de: v8]
Link: http://lkml.kernel.org/r/1487182598-15351-2-git-send-email-4sschmid@informatik.uni-hamburg.de
Link: http://lkml.kernel.org/r/1486321748-19085-2-git-send-email-4sschmid@informatik.uni-hamburg.de
Signed-off-by: Sven Schmidt <4sschmid@informatik.uni-hamburg.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Bongkyu Kim <bongkyu.kim@lge.com>
Cc: Rui Salvaterra <rsalvaterra@gmail.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David S. Miller <davem@davemloft.net>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Colin Cross <ccross@android.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
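A usage sketch of the new acceleration parameter (the in-kernel variants
take a caller-provided work buffer; names here are illustrative):

  /* acceleration = 1 behaves like LZ4_compress_default(); larger values
   * trade compression ratio for speed. */
  void *wrkmem = vmalloc(LZ4_MEM_COMPRESS);
  int out_len = LZ4_compress_fast(src, dst, src_len, dst_cap,
                                  /* acceleration */ 4, wrkmem);

  if (out_len == 0)
          ;  /* input did not fit into dst_cap bytes */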
Rui Salvaterra [Sat, 9 Apr 2016 21:05:35 +0000 (22:05 +0100)]
lib: lz4: cleanup unaligned access efficiency detection
These identifiers are bogus. The interested architectures should define
HAVE_EFFICIENT_UNALIGNED_ACCESS whenever relevant to do so. If this
isn't true for some arch, it should be fixed in the arch definition.
Signed-off-by: Rui Salvaterra <rsalvaterra@gmail.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bongkyu Kim [Wed, 20 Jan 2016 23:01:08 +0000 (15:01 -0800)]
lz4: fix wrong compress buffer size for 64-bits
The current lz4 compress buffer is 16kb on 32-bit systems and 32kb on
64-bit systems, but lz4 needs only 16kb on both. On 64-bit, this wastes
cpu cycles on an additional memset during every compression.
In the case of lz4hc, the current buffer size is (256kb + 8) on 32-bit
and (512kb + 16) on 64-bit, but lz4hc needs only (256kb + 2 * pointer)
on both.
This patch fixes these wrong compress buffer sizes for 64-bit systems.
Signed-off-by: Bongkyu Kim <bongkyu.kim@lge.com>
Cc: Chanho Min <chanho.min@lge.com>
Cc: Yann Collet <yann.collet.73@gmail.com>
Cc: Kyungsik Lee <kyungsik.lee@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
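Sketched as macros, the sizes described above come out like this (shapes
assumed; the point is that neither depends on the word size beyond the
two pointers lz4hc genuinely needs):

  #define LZ4_MEM_COMPRESS    (1 << 14)                        /* 16 KB           */
  #define LZ4HC_MEM_COMPRESS  ((1 << 18) + 2 * sizeof(void *)) /* 256 KB + 2 ptrs */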