OSDN Git Service

sagit-ice-cold/kernel_xiaomi_msm8998.git
4 years agoASoC: wcd9335: use power efficient workingqueues
Julian Liu [Sun, 27 Oct 2019 14:21:36 +0000 (22:21 +0800)]
ASoC: wcd9335: use power efficient workingqueues

4 years agomm, vmstat: Add likelihood labels to quiet_vmstat conditions
Danny Lin [Fri, 10 May 2019 05:33:48 +0000 (22:33 -0700)]
mm, vmstat: Add likelihood labels to quiet_vmstat conditions

These labels are based on observations from a running system as well as
from inspecting the code:

!delayed_work_pending:
  true  = 3509732
  false = 7495535

!need_update:
  true  = 6656251
  false = 840000

Signed-off-by: Danny Lin <danny@kdrag0n.dev>
4 years agomm: vmstat: use power efficient workingqueues
Julian Liu [Sun, 27 Oct 2019 14:11:01 +0000 (22:11 +0800)]
mm: vmstat: use power efficient workingqueues

4 years agoplatform: ipa: use power efficient workingqueues
Julian Liu [Sun, 27 Oct 2019 14:09:20 +0000 (22:09 +0800)]
platform: ipa: use power efficient workingqueues

4 years agoRevert "f2fs: Tune issue discard commands up to 128MB"
Julian Liu [Sun, 27 Oct 2019 13:27:16 +0000 (21:27 +0800)]
Revert "f2fs: Tune issue discard commands up to 128MB"

* We have a good gc strategy, don't need this
This reverts commit 8439a3a642a927cfb1432b5c50c44eaaa76a4686.

4 years agomsm: kgsl: Use for_each_sg() macro where possible
Sultan Alsawaf [Fri, 25 Oct 2019 17:52:33 +0000 (10:52 -0700)]
msm: kgsl: Use for_each_sg() macro where possible

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
4 years agoqseecom: Use for_each_sg() macro where possible
Sultan Alsawaf [Fri, 25 Oct 2019 17:52:01 +0000 (10:52 -0700)]
qseecom: Use for_each_sg() macro where possible

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
4 years agocrypto: msm: Use for_each_sg() macro where possible
Sultan Alsawaf [Fri, 25 Oct 2019 17:51:18 +0000 (10:51 -0700)]
crypto: msm: Use for_each_sg() macro where possible

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
4 years agokernel: sched: Mitigate non-boosted tasks preempting boosted tasks
Miguel de Dios [Mon, 29 Apr 2019 23:09:33 +0000 (16:09 -0700)]
kernel: sched: Mitigate non-boosted tasks preempting boosted tasks

Currently when a boosted task is scheduled we use prefer_idle to try and
get it to an idle core. Once it's scheduled, there is a possibility we
can schedule a non-boosted task on the same core where the boosted task
is running on. This change aims to mitigate that possibility by checking
if the core we're targeting has a boosted task and if so, use the next
best idle core instead.

Bug: 131626264
Change-Id: I3d321e1c71f96526f55f7f3a56e32db411311aa2
Signed-off-by: Miguel de Dios <migueldedios@google.com>
Signed-off-by: celtare21 <celtare21@gmail.com>
Signed-off-by: Julian Liu <wlootlxt123@gmail.com>
4 years agof2fs: gc_wake is unlikely as true
Julian Liu [Fri, 25 Oct 2019 13:26:04 +0000 (21:26 +0800)]
f2fs: gc_wake is unlikely as true

4 years agodefconfig: Disable UNMAP_KERNEL_AT_EL0
Julian Liu [Fri, 25 Oct 2019 12:26:21 +0000 (20:26 +0800)]
defconfig: Disable UNMAP_KERNEL_AT_EL0

4 years agof2fs: Tune background GC for Android
Julian Liu [Sun, 20 Oct 2019 17:29:25 +0000 (01:29 +0800)]
f2fs: Tune background GC for Android

4 years agof2fs: Tune issue discard commands up to 128MB
Julian Liu [Thu, 24 Oct 2019 22:02:31 +0000 (06:02 +0800)]
f2fs: Tune issue discard commands up to 128MB

https://android.googlesource.com/device/google/coral/+/refs/tags/android-10.0.0_r9/init.hardware.rc#550

4 years agopage-writeback: Hardcode dirty_background_ratio to 10
Julian Liu [Thu, 24 Oct 2019 21:10:49 +0000 (05:10 +0800)]
page-writeback: Hardcode dirty_background_ratio to 10

4 years agocpufreq: schedutil: tune default rate limit from Pixel
Julian Liu [Thu, 24 Oct 2019 21:04:58 +0000 (05:04 +0800)]
cpufreq: schedutil: tune default rate limit from Pixel

4 years agoarm64: Select ARCH_HAS_FAST_MULTIPLIER
Robin Murphy [Tue, 24 Apr 2018 15:25:47 +0000 (16:25 +0100)]
arm64: Select ARCH_HAS_FAST_MULTIPLIER

It is probably safe to assume that all Armv8-A implementations have a
multiplier whose efficiency is comparable or better than a sequence of
three or so register-dependent arithmetic instructions. Select
ARCH_HAS_FAST_MULTIPLIER to get ever-so-slightly nicer codegen in the
few dusty old corners which care.

In a contrived benchmark calling hweight64() in a loop, this does indeed
turn out to be a small win overall, with no measurable impact on
Cortex-A57 but about 5% performance improvement on Cortex-A53.

Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
4 years agosdcardfs: fix wrong ENOENT when creating a file
Jaegeuk Kim [Tue, 26 Jun 2018 06:00:24 +0000 (23:00 -0700)]
sdcardfs: fix wrong ENOENT when creating a file

There is subtle race condtion where lower_dentry is null. If we retry
lookup again, we should get the correct dentry.

Bug: 110585947
Bug: 110464178
Bug: 80587794
Bug: 37231161
Bug: 110199687
Change-Id: I39b95de4649b034287776f5c8a5d197b6ebd9ada
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
4 years agoARM: dts: msm8998: Recalibrate busy costs using better data
Sultan Alsawaf [Mon, 22 Apr 2019 09:03:46 +0000 (02:03 -0700)]
ARM: dts: msm8998: Recalibrate busy costs using better data

Power and performance were measured on a production wahoo device in the
kernel in a high-precision manner. Use the resulting data to create a
better energy model that reflects the actual behavior of production
hardware.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
wloot: Adapt to 98897c5c ("ARM: dts: msm8998: Remove few bottom frequencies of low energy efficiency")

4 years agoarm64/neon: Disable -Wincompatible-pointer-types when building with Clang
Nathan Chancellor [Fri, 15 Feb 2019 01:39:59 +0000 (18:39 -0700)]
arm64/neon: Disable -Wincompatible-pointer-types when building with Clang

After commit cc9f8349cb33 ("arm64: crypto: add NEON accelerated XOR
implementation"), Clang builds for arm64 started failing with the
following error message.

arch/arm64/lib/xor-neon.c:58:28: error: incompatible pointer types
assigning to 'const unsigned long *' from 'uint64_t *' (aka 'unsigned
long long *') [-Werror,-Wincompatible-pointer-types]
                v3 = veorq_u64(vld1q_u64(dp1 +  6), vld1q_u64(dp2 + 6));
                                         ^~~~~~~~
/usr/lib/llvm-9/lib/clang/9.0.0/include/arm_neon.h:7538:47: note:
expanded from macro 'vld1q_u64'
  __ret = (uint64x2_t) __builtin_neon_vld1q_v(__p0, 51); \
                                              ^~~~

There has been quite a bit of debate and triage that has gone into
figuring out what the proper fix is, viewable at the link below, which
is still ongoing. Ard suggested disabling this warning with Clang with a
pragma so no neon code will have this type of error. While this is not
at all an ideal solution, this build error is the only thing preventing
KernelCI from having successful arm64 defconfig and allmodconfig builds
on linux-next. Getting continuous integration running is more important
so new warnings/errors or boot failures can be caught and fixed quickly.

Link: https://github.com/ClangBuiltLinux/linux/issues/283
Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Julian Liu <wlootlxt123@gmail.com>
4 years agoarm64/neon: add workaround for ambiguous C99 stdint.h types
Jackie Liu [Tue, 4 Dec 2018 01:43:22 +0000 (09:43 +0800)]
arm64/neon: add workaround for ambiguous C99 stdint.h types

In a way similar to ARM commit 09096f6a0ee2 ("ARM: 7822/1: add workaround
for ambiguous C99 stdint.h types"), this patch redefines the macros that
are used in stdint.h so its definitions of uint64_t and int64_t are
compatible with those of the kernel.

This patch comes from: https://patchwork.kernel.org/patch/3540001/
Wrote by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

We mark this file as a private file and don't have to override asm/types.h

Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
4 years agoion: fix a possible memory leak in ion_cma_allocate
Suren Baghdasaryan [Thu, 18 Apr 2019 19:42:44 +0000 (12:42 -0700)]
ion: fix a possible memory leak in ion_cma_allocate

The memory leak occurs when kmalloc() for info->table fails, info is freed
but info->cpu_addr allocation is left.
Fixes: eeeb940746de ("ion: add snapshot of ion support for MSM")

Bug: 130817249
Test: builds and boots
Change-Id: I7faf5be5129a46b2f874f4e3803470e5f5130a21
Reported-by: Mikael Magnusson <Mikael.Magnusson@sony.com>
Suggested-by: Mikael Magnusson <Mikael.Magnusson@sony.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
4 years agomm/page-writeback.c: place "not" inside of unlikely() statement in wb_domain_writeout...
Steven Rostedt (VMware) [Fri, 24 Feb 2017 22:59:24 +0000 (14:59 -0800)]
mm/page-writeback.c: place "not" inside of unlikely() statement in wb_domain_writeout_inc()

The likely/unlikely profiler noticed that the unlikely statement in
wb_domain_writeout_inc() is constantly wrong.  This is due to the "not"
(!) being outside the unlikely statement.  It is likely that
dom->period_time will be set, but unlikely that it wont be.  Move the
not into the unlikely statement.

Link: http://lkml.kernel.org/r/20170206120035.3c2e2b91@gandalf.local.home
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agolist: introduce list_for_each_entry_from_reverse helper
Jiri Pirko [Fri, 3 Feb 2017 09:29:05 +0000 (10:29 +0100)]
list: introduce list_for_each_entry_from_reverse helper

Similar to list_for_each_entry_continue and its reverse variant
list_for_each_entry_continue_reverse, introduce reverse helper for
list_for_each_entry_from.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agomm/vmalloc: rework vmap_area_lock
Uladzislau Rezki (Sony) [Tue, 22 Oct 2019 15:58:00 +0000 (17:58 +0200)]
mm/vmalloc: rework vmap_area_lock

With the new allocation approach introduced in the 5.2 kernel, it
becomes possible to get rid of one global spinlock. By doing that
we can further improve the KVA from the performance point of view.

Basically we can have two independent locks, one for allocation
part and another one for deallocation, because of two different
entities: "free data structures" and "busy data structures".

As a result, allocation/deallocation operations can still interfere
between each other in case of running simultaneously on different
CPUs, it means there is still dependency, but with two locks it
becomes lower.

Summarizing:
  - it reduces the high lock contention
  - it allows to perform operations on "free" and "busy"
    trees in parallel on different CPUs. Please note it
    does not solve scalability issue.

Test results:
In order to evaluate this patch, we can run "vmalloc test driver"
to see how many CPU cycles it takes to complete all test cases
running sequentially. All online CPUs run it so it will cause
a high lock contention.

HiKey 960, ARM64, 8xCPUs, big.LITTLE:

<snip>
    sudo ./test_vmalloc.sh sequential_test_order=1
<snip>

<default>
[  390.950557] All test took CPU0=457126382 cycles
[  391.046690] All test took CPU1=454763452 cycles
[  391.128586] All test took CPU2=454539334 cycles
[  391.222669] All test took CPU3=455649517 cycles
[  391.313946] All test took CPU4=388272196 cycles
[  391.410425] All test took CPU5=384036264 cycles
[  391.492219] All test took CPU6=387432964 cycles
[  391.578433] All test took CPU7=387201996 cycles
<default>

<patched>
[  304.721224] All test took CPU0=391521310 cycles
[  304.821219] All test took CPU1=393533002 cycles
[  304.917120] All test took CPU2=392243032 cycles
[  305.008986] All test took CPU3=392353853 cycles
[  305.108944] All test took CPU4=297630721 cycles
[  305.196406] All test took CPU5=297548736 cycles
[  305.288602] All test took CPU6=297092392 cycles
[  305.381088] All test took CPU7=297293597 cycles
<patched>

~14%-23% patched variant is better.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
4 years agomm/vmalloc: add more comments to the adjust_va_to_fit_type()
Uladzislau Rezki (Sony) [Wed, 16 Oct 2019 09:54:38 +0000 (11:54 +0200)]
mm/vmalloc: add more comments to the adjust_va_to_fit_type()

When fit type is NE_FIT_TYPE there is a need in one extra object.
Usually the "ne_fit_preload_node" per-CPU variable has it and
there is no need in GFP_NOWAIT allocation, but there are exceptions.

This commit just adds more explanations, as a result giving
answers on questions like when it can occur, how often, under
which conditions and what happens if GFP_NOWAIT gets failed.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
4 years agomm/vmalloc: respect passed gfp_mask when do preloading
Uladzislau Rezki (Sony) [Wed, 16 Oct 2019 09:54:37 +0000 (11:54 +0200)]
mm/vmalloc: respect passed gfp_mask when do preloading

alloc_vmap_area() is given a gfp_mask for the page allocator.
Let's respect that mask and consider it even in the case when
doing regular CPU preloading, i.e. where a context can sleep.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 years agomm/vmalloc: remove preempt_disable/enable when do preloading
Uladzislau Rezki (Sony) [Wed, 16 Oct 2019 09:54:36 +0000 (11:54 +0200)]
mm/vmalloc: remove preempt_disable/enable when do preloading

Some background. The preemption was disabled before to guarantee
that a preloaded object is available for a CPU, it was stored for.

The aim was to not allocate in atomic context when spinlock
is taken later, for regular vmap allocations. But that approach
conflicts with CONFIG_PREEMPT_RT philosophy. It means that
calling spin_lock() with disabled preemption is forbidden
in the CONFIG_PREEMPT_RT kernel.

Therefore, get rid of preempt_disable() and preempt_enable() when
the preload is done for splitting purpose. As a result we do not
guarantee now that a CPU is preloaded, instead we minimize the
case when it is not, with this change.

For example i run the special test case that follows the preload
pattern and path. 20 "unbind" threads run it and each does
1000000 allocations. Only 3.5 times among 1000000 a CPU was
not preloaded. So it can happen but the number is negligible.

V2 - > V3:
    - update the commit message

V1 -> V2:
  - move __this_cpu_cmpxchg check when spin_lock is taken,
    as proposed by Andrew Morton
  - add more explanation in regard of preloading
  - adjust and move some comments

Fixes: 82dd23e84be3 ("mm/vmalloc.c: preload a CPU with one object for split purpose")
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
4 years agomm: vmalloc: show number of vmalloc pages in /proc/meminfo
Roman Gushchin [Fri, 12 Jul 2019 04:00:13 +0000 (21:00 -0700)]
mm: vmalloc: show number of vmalloc pages in /proc/meminfo

Vmalloc() is getting more and more used these days (kernel stacks, bpf and
percpu allocator are new top users), and the total % of memory consumed by
vmalloc() can be pretty significant and changes dynamically.

/proc/meminfo is the best place to display this information: its top goal
is to show top consumers of the memory.

Since the VmallocUsed field in /proc/meminfo is not in use for quite a
long time (it has been defined to 0 by a5ad88ce8c7f ("mm: get rid of
'vmalloc_info' from /proc/meminfo")), let's reuse it for showing the
actual physical memory consumption of vmalloc().

Link: http://lkml.kernel.org/r/20190417194002.12369-3-guro@fb.com
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/vmalloc.c: move 'area->pages' after if statement
Austin Kim [Mon, 23 Sep 2019 22:36:42 +0000 (15:36 -0700)]
mm/vmalloc.c: move 'area->pages' after if statement

If !area->pages statement is true where memory allocation fails, area is
freed.

In this case 'area->pages = pages' should not executed.  So move
'area->pages = pages' after if statement.

[akpm@linux-foundation.org: give area->pages the same treatment]
Link: http://lkml.kernel.org/r/20190830035716.GA190684@LGEARND20B15
Signed-off-by: Austin Kim <austindh.kim@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Roman Penyaev <rpenyaev@suse.de>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/vmalloc: modify struct vmap_area to reduce its size
Pengfei Li [Mon, 23 Sep 2019 22:36:39 +0000 (15:36 -0700)]
mm/vmalloc: modify struct vmap_area to reduce its size

Objective
---------

The current implementation of struct vmap_area wasted space.

After applying this commit, sizeof(struct vmap_area) has been
reduced from 11 words to 8 words.

Description
-----------

1) Pack "subtree_max_size", "vm" and "purge_list".  This is no problem
   because

A) "subtree_max_size" is only used when vmap_area is in "free" tree

B) "vm" is only used when vmap_area is in "busy" tree

C) "purge_list" is only used when vmap_area is in vmap_purge_list

2) Eliminate "flags".

;Since only one flag VM_VM_AREA is being used, and the same thing can be
done by judging whether "vm" is NULL, then the "flags" can be eliminated.

Link: http://lkml.kernel.org/r/20190716152656.12255-3-lpf.vector@gmail.com
Signed-off-by: Pengfei Li <lpf.vector@gmail.com>
Suggested-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/vmalloc.c: avoid bogus -Wmaybe-uninitialized warning
Arnd Bergmann [Fri, 28 Jun 2019 19:07:09 +0000 (12:07 -0700)]
mm/vmalloc.c: avoid bogus -Wmaybe-uninitialized warning

gcc gets confused in pcpu_get_vm_areas() because there are too many
branches that affect whether 'lva' was initialized before it gets used:

  mm/vmalloc.c: In function 'pcpu_get_vm_areas':
  mm/vmalloc.c:991:4: error: 'lva' may be used uninitialized in this function [-Werror=maybe-uninitialized]
      insert_vmap_area_augment(lva, &va->rb_node,
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
       &free_vmap_area_root, &free_vmap_area_list);
       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  mm/vmalloc.c:916:20: note: 'lva' was declared here
    struct vmap_area *lva;
                      ^~~

Add an intialization to NULL, and check whether this has changed before
the first use.

[akpm@linux-foundation.org: tweak comments]
Link: http://lkml.kernel.org/r/20190618092650.2943749-1-arnd@arndb.de
Fixes: 68ad4a330433 ("mm/vmalloc.c: keep track of free blocks for vmap allocation")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Joel Fernandes <joelaf@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/vmalloc.c: fix percpu free VM area search criteria
Kuppuswamy Sathyanarayanan [Tue, 13 Aug 2019 22:37:31 +0000 (15:37 -0700)]
mm/vmalloc.c: fix percpu free VM area search criteria

Recent changes to the vmalloc code by commit 68ad4a330433
("mm/vmalloc.c: keep track of free blocks for vmap allocation") can
cause spurious percpu allocation failures.  These, in turn, can result
in panic()s in the slub code.  One such possible panic was reported by
Dave Hansen in following link https://lkml.org/lkml/2019/6/19/939.
Another related panic observed is,

 RIP: 0033:0x7f46f7441b9b
 Call Trace:
  dump_stack+0x61/0x80
  pcpu_alloc.cold.30+0x22/0x4f
  mem_cgroup_css_alloc+0x110/0x650
  cgroup_apply_control_enable+0x133/0x330
  cgroup_mkdir+0x41b/0x500
  kernfs_iop_mkdir+0x5a/0x90
  vfs_mkdir+0x102/0x1b0
  do_mkdirat+0x7d/0xf0
  do_syscall_64+0x5b/0x180
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

VMALLOC memory manager divides the entire VMALLOC space (VMALLOC_START
to VMALLOC_END) into multiple VM areas (struct vm_areas), and it mainly
uses two lists (vmap_area_list & free_vmap_area_list) to track the used
and free VM areas in VMALLOC space.  And pcpu_get_vm_areas(offsets[],
sizes[], nr_vms, align) function is used for allocating congruent VM
areas for percpu memory allocator.  In order to not conflict with
VMALLOC users, pcpu_get_vm_areas allocates VM areas near the end of the
VMALLOC space.  So the search for free vm_area for the given requirement
starts near VMALLOC_END and moves upwards towards VMALLOC_START.

Prior to commit 68ad4a330433, the search for free vm_area in
pcpu_get_vm_areas() involves following two main steps.

Step 1:
    Find a aligned "base" adress near VMALLOC_END.
    va = free vm area near VMALLOC_END
Step 2:
    Loop through number of requested vm_areas and check,
        Step 2.1:
           if (base < VMALLOC_START)
              1. fail with error
        Step 2.2:
           // end is offsets[area] + sizes[area]
           if (base + end > va->vm_end)
               1. Move the base downwards and repeat Step 2
        Step 2.3:
           if (base + start < va->vm_start)
              1. Move to previous free vm_area node, find aligned
                 base address and repeat Step 2

But Commit 68ad4a330433 removed Step 2.2 and modified Step 2.3 as below:

        Step 2.3:
           if (base + start < va->vm_start || base + end > va->vm_end)
              1. Move to previous free vm_area node, find aligned
                 base address and repeat Step 2

Above change is the root cause of spurious percpu memory allocation
failures.  For example, consider a case where a relatively large vm_area
(~ 30 TB) was ignored in free vm_area search because it did not pass the
base + end < vm->vm_end boundary check.  Ignoring such large free
vm_area's would lead to not finding free vm_area within boundary of
VMALLOC_start to VMALLOC_END which in turn leads to allocation failures.

So modify the search algorithm to include Step 2.2.

Link: http://lkml.kernel.org/r/20190729232139.91131-1-sathyanarayanan.kuppuswamy@linux.intel.com
Fixes: 68ad4a330433 ("mm/vmalloc.c: keep track of free blocks for vmap allocation")
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reported-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: sathyanarayanan kuppuswamy <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/vmalloc: do not keep unpurged areas in the busy tree
Uladzislau Rezki (Sony) [Tue, 16 Jul 2019 12:05:17 +0000 (14:05 +0200)]
mm/vmalloc: do not keep unpurged areas in the busy tree

The busy tree can be quite big, even though the area is freed
or unmapped it still stays there until "purge" logic removes
it.

1) Optimize and reduce the size of "busy" tree by removing a
node from it right away as soon as user triggers free paths.
It is possible to do so, because the allocation is done using
another augmented tree.

The vmalloc test driver shows the difference, for example the
"fix_size_alloc_test" is ~11% better comparing with default
configuration:

sudo ./test_vmalloc.sh performance

<default>
Summary: fix_size_alloc_test loops: 1000000 avg: 993985 usec
Summary: full_fit_alloc_test loops: 1000000 avg: 973554 usec
Summary: long_busy_list_alloc_test loops: 1000000 avg: 12617652 usec
<default>

<this patch>
Summary: fix_size_alloc_test loops: 1000000 avg: 882263 usec
Summary: full_fit_alloc_test loops: 1000000 avg: 973407 usec
Summary: long_busy_list_alloc_test loops: 1000000 avg: 12593929 usec
<this patch>

2) Since the busy tree now contains allocated areas only and does
not interfere with lazily free nodes, introduce the new function
show_purge_info() that dumps "unpurged" areas that is propagated
through "/proc/vmallocinfo".

3) Eliminate VM_LAZY_FREE flag.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
4 years agomm/vmalloc.c: switch to WARN_ON() and move it under unlink_va()
Uladzislau Rezki (Sony) [Thu, 6 Jun 2019 12:04:11 +0000 (14:04 +0200)]
mm/vmalloc.c: switch to WARN_ON() and move it under unlink_va()

Trigger a warning if an object that is about to be freed is detached.
We used to have a BUG_ON(), but even though it is considered as faulty
behaviour that is not a good reason to break a system.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
4 years agomm/vmalloc.c: get rid of one single unlink_va() when merge
Uladzislau Rezki (Sony) [Thu, 6 Jun 2019 12:04:10 +0000 (14:04 +0200)]
mm/vmalloc.c: get rid of one single unlink_va() when merge

It does not make sense to try to "unlink" the node that is definitely not
linked with a list nor tree.  On the first merge step VA just points to
the previously disconnected busy area.

On the second step, check if the node has been merged and do "unlink" if
so, because now it points to an object that must be linked.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Acked-by: Hillf Danton <hdanton@sina.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
4 years agomm/vmalloc.c: preload a CPU with one object for split purpose
Uladzislau Rezki (Sony) [Thu, 6 Jun 2019 12:04:09 +0000 (14:04 +0200)]
mm/vmalloc.c: preload a CPU with one object for split purpose

Refactor the NE_FIT_TYPE split case when it comes to an allocation of one
extra object. We need it in order to build a remaining space. The preload
is done per CPU in non-atomic context with GFP_KERNEL flags.

More permissive parameters can be beneficial for systems which are suffer
from high memory pressure or low memory condition. For example on my KVM
system(4xCPUs, no swap, 256MB RAM) i can simulate the failure of page
allocation with GFP_NOWAIT flags. Using "stress-ng" tool and starting N
workers spinning on fork() and exit(), i can trigger below trace:

<snip>
[  179.815161] stress-ng-fork: page allocation failure: order:0, mode:0x40800(GFP_NOWAIT|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
[  179.815168] CPU: 0 PID: 12612 Comm: stress-ng-fork Not tainted 5.2.0-rc3+ #1003
[  179.815170] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[  179.815171] Call Trace:
[  179.815178]  dump_stack+0x5c/0x7b
[  179.815182]  warn_alloc+0x108/0x190
[  179.815187]  __alloc_pages_slowpath+0xdc7/0xdf0
[  179.815191]  __alloc_pages_nodemask+0x2de/0x330
[  179.815194]  cache_grow_begin+0x77/0x420
[  179.815197]  fallback_alloc+0x161/0x200
[  179.815200]  kmem_cache_alloc+0x1c9/0x570
[  179.815202]  alloc_vmap_area+0x32c/0x990
[  179.815206]  __get_vm_area_node+0xb0/0x170
[  179.815208]  __vmalloc_node_range+0x6d/0x230
[  179.815211]  ? _do_fork+0xce/0x3d0
[  179.815213]  copy_process.part.46+0x850/0x1b90
[  179.815215]  ? _do_fork+0xce/0x3d0
[  179.815219]  _do_fork+0xce/0x3d0
[  179.815226]  ? __do_page_fault+0x2bf/0x4e0
[  179.815229]  do_syscall_64+0x55/0x130
[  179.815231]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  179.815234] RIP: 0033:0x7fedec4c738b
...
[  179.815237] RSP: 002b:00007ffda469d730 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
[  179.815239] RAX: ffffffffffffffda RBX: 00007ffda469d730 RCX: 00007fedec4c738b
[  179.815240] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
[  179.815241] RBP: 00007ffda469d780 R08: 00007fededd6e300 R09: 00007ffda47f50a0
[  179.815242] R10: 00007fededd6e5d0 R11: 0000000000000246 R12: 0000000000000000
[  179.815243] R13: 0000000000000020 R14: 0000000000000000 R15: 0000000000000000
[  179.815245] Mem-Info:
[  179.815249] active_anon:12686 inactive_anon:14760 isolated_anon:0
                active_file:502 inactive_file:61 isolated_file:70
                unevictable:2 dirty:0 writeback:0 unstable:0
                slab_reclaimable:2380 slab_unreclaimable:7520
                mapped:15069 shmem:14813 pagetables:10833 bounce:0
                free:1922 free_pcp:229 free_cma:0
<snip>

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
4 years agomm/vmalloc.c: remove "node" argument
Uladzislau Rezki (Sony) [Thu, 6 Jun 2019 12:04:08 +0000 (14:04 +0200)]
mm/vmalloc.c: remove "node" argument

Remove unused argument from the __alloc_vmap_area() function.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Roman Gushchin <guro@fb.com>
4 years agomm/vmalloc.c: convert vmap_lazy_nr to atomic_long_t
Uladzislau Rezki (Sony) [Tue, 14 May 2019 22:41:25 +0000 (15:41 -0700)]
mm/vmalloc.c: convert vmap_lazy_nr to atomic_long_t

vmap_lazy_nr variable has atomic_t type that is 4 bytes integer value on
both 32 and 64 bit systems.  lazy_max_pages() deals with "unsigned long"
that is 8 bytes on 64 bit system, thus vmap_lazy_nr should be 8 bytes on
64 bit as well.

Link: http://lkml.kernel.org/r/20190131162452.25879-1-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Cc: Joel Fernandes <joelaf@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/vmap: add DEBUG_AUGMENT_LOWEST_MATCH_CHECK macro
Uladzislau Rezki (Sony) [Fri, 17 May 2019 21:31:37 +0000 (14:31 -0700)]
mm/vmap: add DEBUG_AUGMENT_LOWEST_MATCH_CHECK macro

This macro adds some debug code to check that vmap allocations are
happened in ascending order.

By default this option is set to 0 and not active.  It requires
recompilation of the kernel to activate it.  Set to 1, compile the
kernel.

[urezki@gmail.com: v4]
Link: http://lkml.kernel.org/r/20190406183508.25273-4-urezki@gmail.com
Link: http://lkml.kernel.org/r/20190402162531.10888-4-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/vmap: add DEBUG_AUGMENT_PROPAGATE_CHECK macro
Uladzislau Rezki (Sony) [Fri, 17 May 2019 21:31:34 +0000 (14:31 -0700)]
mm/vmap: add DEBUG_AUGMENT_PROPAGATE_CHECK macro

This macro adds some debug code to check that the augment tree is
maintained correctly, meaning that every node contains valid
subtree_max_size value.

By default this option is set to 0 and not active.  It requires
recompilation of the kernel to activate it.  Set to 1, compile the
kernel.

[urezki@gmail.com: v4]
Link: http://lkml.kernel.org/r/20190406183508.25273-3-urezki@gmail.com
Link: http://lkml.kernel.org/r/20190402162531.10888-3-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm: vmalloc: pass proper vm_start into debugobjects
Chintan Pandya [Fri, 8 Jun 2018 00:06:53 +0000 (17:06 -0700)]
mm: vmalloc: pass proper vm_start into debugobjects

Client can call vunmap with some intermediate 'addr' which may not be
the start of the VM area.  Entire unmap code works with vm->vm_start
which is proper but debug object API is called with 'addr'.  This could
be a problem within debug objects.

Pass proper start address into debug object API.

[akpm@linux-foundation.org: fix warning]
Link: http://lkml.kernel.org/r/1523961828-9485-3-git-send-email-cpandya@codeaurora.org
Signed-off-by: Chintan Pandya <cpandya@codeaurora.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yisheng Xie <xieyisheng1@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm: vmalloc: clean up vunmap to avoid pgtable ops twice
Chintan Pandya [Fri, 8 Jun 2018 00:06:46 +0000 (17:06 -0700)]
mm: vmalloc: clean up vunmap to avoid pgtable ops twice

vunmap does page table clear operations twice in the case when
DEBUG_PAGEALLOC_ENABLE_DEFAULT is enabled.

So, clean up the code as that is unintended.

As a perf gain, we save few us.  Below ftrace data was obtained while
doing 1 MB of vmalloc/vfree on ARM64 based SoC *without* this patch
applied.  After this patch, we can save ~3 us (on 1 extra
vunmap_page_range).

  CPU  DURATION                  FUNCTION CALLS
  |     |   |                     |   |   |   |
 6)               |  __vunmap() {
 6)               |    vmap_debug_free_range() {
 6)   3.281 us    |      vunmap_page_range();
 6) + 45.468 us   |    }
 6)   2.760 us    |    vunmap_page_range();
 6) ! 505.105 us  |  }

[cpandya@codeaurora.org: v3]
Link: http://lkml.kernel.org/r/1525176960-18408-1-git-send-email-cpandya@codeaurora.org
Link: http://lkml.kernel.org/r/1523876342-10545-1-git-send-email-cpandya@codeaurora.org
Signed-off-by: Chintan Pandya <cpandya@codeaurora.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Yisheng Xie <xieyisheng1@huawei.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/vmalloc.c: use rb_entry_safe
Geliang Tang [Wed, 22 Feb 2017 23:41:54 +0000 (15:41 -0800)]
mm/vmalloc.c: use rb_entry_safe

Use rb_entry_safe() instead of open-coding it.

Link: http://lkml.kernel.org/r/81bb9820e5b9e4a1c596b3e76f88abf8c4a76cb0.1482221947.git.geliangtang@gmail.com
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/vmalloc.c: simplify /proc/vmallocinfo implementation
zijun_hu [Tue, 13 Dec 2016 00:42:17 +0000 (16:42 -0800)]
mm/vmalloc.c: simplify /proc/vmallocinfo implementation

Many seq_file helpers exist for simplifying implementation of virtual
files especially, for /proc nodes.  however, the helpers for iteration
over list_head are available but aren't adopted to implement
/proc/vmallocinfo currently.

Simplify /proc/vmallocinfo implementation by using existing seq_file
helpers.

Link: http://lkml.kernel.org/r/57FDF2E5.1000201@zoho.com
Signed-off-by: zijun_hu <zijun_hu@htc.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/vmap: keep track of free blocks for vmap allocation
Uladzislau Rezki (Sony) [Sat, 6 Apr 2019 18:35:06 +0000 (20:35 +0200)]
mm/vmap: keep track of free blocks for vmap allocation

Currently an allocation of the new vmap area is done over busy
list iteration(complexity O(n)) until a suitable hole is found
between two busy areas. Therefore each new allocation causes
the list being grown. Due to over fragmented list and different
permissive parameters an allocation can take a long time. For
example on embedded devices it is milliseconds.

This patch organizes the KVA memory layout into free areas of the
1-ULONG_MAX range. It uses an augment red-black tree that keeps
blocks sorted by their offsets in pair with linked list keeping
the free space in order of increasing addresses.

Nodes are augmented with the size of the maximum available free
block in its left or right sub-tree. Thus, that allows to take a
decision and traversal toward the block that will fit and will
have the lowest start address, i.e. it is sequential allocation.

Allocation: to allocate a new block a search is done over the
tree until a suitable lowest(left most) block is large enough
to encompass: the requested size, alignment and vstart point.
If the block is bigger than requested size - it is split.

De-allocation: when a busy vmap area is freed it can either be
merged or inserted to the tree. Red-black tree allows efficiently
find a spot whereas a linked list provides a constant-time access
to previous and next blocks to check if merging can be done. In case
of merging of de-allocated memory chunk a large coalesced area is
created.

Complexity: ~O(log(N))

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
4 years agovmalloc: show lazy-purged vma info in vmallocinfo
Yisheng Xie [Mon, 10 Jul 2017 22:48:09 +0000 (15:48 -0700)]
vmalloc: show lazy-purged vma info in vmallocinfo

When ioremap a 67112960 bytes vm_area with the vmallocinfo:
 [..]
 0xec79b000-0xec7fa000  389120 ftl_add_mtd+0x4d0/0x754 pages=94 vmalloc
 0xec800000-0xecbe1000 4067328 kbox_proc_mem_write+0x104/0x1c4 phys=8b520000 ioremap

we get the result:
 0xf1000000-0xf5001000 67112960 devm_ioremap+0x38/0x7c phys=40000000 ioremap

For the align for ioremap must be less than '1 << IOREMAP_MAX_ORDER':

if (flags & VM_IOREMAP)
align = 1ul << clamp_t(int, get_count_order_long(size),
PAGE_SHIFT, IOREMAP_MAX_ORDER);

So it makes idiot like me a litte puzzled why this was a jump the
vm_area from 0xec800000-0xecbe1000 to 0xf1000000-0xf5001000, and leaving
0xed000000-0xf1000000 as a big hole.

This patch is to show all of vm_area, including vmas which are freeing
but still in the vmap_area_list, to make it more clear about why we will
get 0xf1000000-0xf5001000 in the above case.  And we will get a
vmallocinfo like:

 [..]
 0xec79b000-0xec7fa000  389120 ftl_add_mtd+0x4d0/0x754 pages=94 vmalloc
 0xec800000-0xecbe1000 4067328 kbox_proc_mem_write+0x104/0x1c4 phys=8b520000 ioremap
 [..]
 0xece7c000-0xece7e000    8192 unpurged vm_area
 0xece7e000-0xece83000   20480 vm_map_ram
 0xf0099000-0xf00aa000   69632 vm_map_ram

after this patch.

Link: http://lkml.kernel.org/r/1496649682-20710-1-git-send-email-xieyisheng1@huawei.com
Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: zijun_hu <zijun_hu@htc.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/vmalloc.c: halve the number of comparisons performed in pcpu_get_vm_areas()
Wei Yang [Wed, 6 Sep 2017 23:24:09 +0000 (16:24 -0700)]
mm/vmalloc.c: halve the number of comparisons performed in pcpu_get_vm_areas()

In pcpu_get_vm_areas(), it checks each range is not overlapped.  To make
sure it is, only (N^2)/2 comparison is necessary, while current code
does N^2 times.  By starting from the next range, it achieves the goal
and the continue could be removed.

Also,

 - the overlap check of two ranges could be done with one clause

 - one typo in comment is fixed.

Link: http://lkml.kernel.org/r/20170803063822.48702-1-richard.weiyang@gmail.com
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoRevert "mm: Update is_vmalloc_addr to account for vmalloc savings"
Artem Labazov [Tue, 2 Apr 2019 17:32:11 +0000 (20:32 +0300)]
Revert "mm: Update is_vmalloc_addr to account for vmalloc savings"

This reverts commit 8db21e1d697814324681f0046f19ccf116b467e9.

Signed-off-by: Artem Labazov <123321artyom@gmail.com>
4 years agomm: add priority threshold to __purge_vmap_area_lazy()
Uladzislau Rezki (Sony) [Thu, 24 Jan 2019 11:56:48 +0000 (12:56 +0100)]
mm: add priority threshold to __purge_vmap_area_lazy()

commit 763b218ddfaf ("mm: add preempt points into
__purge_vmap_area_lazy()")

introduced some preempt points, one of those is making an
allocation more prioritized over lazy free of vmap areas.

Prioritizing an allocation over freeing does not work well
all the time, i.e. it should be rather a compromise.

1) Number of lazy pages directly influence on busy list length
thus on operations like: allocation, lookup, unmap, remove, etc.

2) Under heavy stress of vmalloc subsystem i run into a situation
when memory usage gets increased hitting out_of_memory -> panic
state due to completely blocking of logic that frees vmap areas
in the __purge_vmap_area_lazy() function.

Establish a threshold passing which the freeing is prioritized
back over allocation creating a balance between each other.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
4 years agomm/vmalloc.c: fix align value calculation error
zijun_hu [Fri, 7 Oct 2016 23:57:26 +0000 (16:57 -0700)]
mm/vmalloc.c: fix align value calculation error

It causes double align requirement for __get_vm_area_node() if parameter
size is power of 2 and VM_IOREMAP is set in parameter flags, for example
size=0x10000 -> fls_long(0x10000)=17 -> align=0x20000

get_count_order_long() is implemented and can be used instead of
fls_long() for fixing the bug, for example size=0x10000 ->
get_count_order_long(0x10000)=16 -> align=0x10000

[akpm@linux-foundation.org: s/get_order_long()/get_count_order_long()/]
[zijun_hu@zoho.com: fixes]
Link: http://lkml.kernel.org/r/57AABC8B.1040409@zoho.com
[akpm@linux-foundation.org: locate get_count_order_long() next to get_count_order()]
[akpm@linux-foundation.org: move get_count_order[_long] definitions to pick up fls_long()]
[zijun_hu@htc.com: move out get_count_order[_long]() from __KERNEL__ scope]
Link: http://lkml.kernel.org/r/57B2C4CE.80303@zoho.com
Link: http://lkml.kernel.org/r/fc045ecf-20fa-0722-b3ac-9a6140488fad@zoho.com
Signed-off-by: zijun_hu <zijun_hu@htc.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: zijun_hu <zijun_hu@htc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm: fix overflow in vm_map_ram()
Guillermo Julián Moreno [Fri, 3 Jun 2016 21:55:33 +0000 (14:55 -0700)]
mm: fix overflow in vm_map_ram()

When remapping pages accounting for 4G or more memory space, the
operation 'count << PAGE_SHIFT' overflows as it is performed on an
integer.  Solution: cast before doing the bitshift.

[akpm@linux-foundation.org: fix vm_unmap_ram() also]
[akpm@linux-foundation.org: fix vmap() as well, per Guillermo]
Link: http://lkml.kernel.org/r/etPan.57175fb3.7a271c6b.2bd@naudit.es
Signed-off-by: Guillermo Julián Moreno <guillermo.julian@naudit.es>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/vmalloc: use PAGE_ALIGNED() to check PAGE_SIZE alignment
Shawn Lin [Thu, 17 Mar 2016 21:20:37 +0000 (14:20 -0700)]
mm/vmalloc: use PAGE_ALIGNED() to check PAGE_SIZE alignment

We have PAGE_ALIGNED() in mm.h, so let's use it instead of IS_ALIGNED()
for checking PAGE_SIZE aligned case.

Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/vmalloc: query dynamic DEBUG_PAGEALLOC setting
Joonsoo Kim [Thu, 17 Mar 2016 21:17:49 +0000 (14:17 -0700)]
mm/vmalloc: query dynamic DEBUG_PAGEALLOC setting

As CONFIG_DEBUG_PAGEALLOC can be enabled/disabled via kernel parameters
we can optimize some cases by checking the enablement state.

This is follow-up work for Christian's Optimize CONFIG_DEBUG_PAGEALLOC:

  https://lkml.org/lkml/2016/1/27/194

Remaining work is to make sparc to be aware of this but it looks not
easy for me so I skip that in this series.

This patch (of 5):

We can disable debug_pagealloc processing even if the code is complied
with CONFIG_DEBUG_PAGEALLOC.  This patch changes the code to query
whether it is enabled or not in runtime.

[akpm@linux-foundation.org: update comment, per David.  Adjust comment to use 80 cols]
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Takashi Iwai <tiwai@suse.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/vmalloc.c: use macro IS_ALIGNED to judge the aligment
Wang Xiaoqiang [Sat, 16 Jan 2016 00:57:19 +0000 (16:57 -0800)]
mm/vmalloc.c: use macro IS_ALIGNED to judge the aligment

Just cleanup, no functional change.

Signed-off-by: Wang Xiaoqiang <wangxq10@lzu.edu.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm, vmalloc: remove VM_VPAGES
David Rientjes [Thu, 14 Jan 2016 23:19:35 +0000 (15:19 -0800)]
mm, vmalloc: remove VM_VPAGES

VM_VPAGES is unnecessary, it's easier to check is_vmalloc_addr() when
reading /proc/vmallocinfo.

[akpm@linux-foundation.org: remove VM_VPAGES reference via kvfree()]
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/vmalloc.c: use list_{next,first}_entry
Geliang Tang [Thu, 14 Jan 2016 23:19:08 +0000 (15:19 -0800)]
mm/vmalloc.c: use list_{next,first}_entry

To make the intention clearer, use list_{next,first}_entry instead of
list_entry.

Signed-off-by: Geliang Tang <geliangtang@163.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agovmalloc: allow to account vmalloc to memcg
Vladimir Davydov [Thu, 14 Jan 2016 23:18:18 +0000 (15:18 -0800)]
vmalloc: allow to account vmalloc to memcg

Make vmalloc family functions allocate vmalloc area pages with
alloc_kmem_pages so that if __GFP_ACCOUNT is set they will be accounted
to memcg.  This is needed, at least, to account alloc_fdmem allocations.

Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Greg Thelen <gthelen@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agosched/idle: Optimize the generic idle loop
Gaurav Jindal (Gaurav Jindal) [Thu, 12 May 2016 10:13:33 +0000 (10:13 +0000)]
sched/idle: Optimize the generic idle loop

Currently, smp_processor_id() is used to fetch the current CPU in
cpu_idle_loop(). Every time the idle thread runs, it fetches the
current CPU using smp_processor_id().

Since the idle thread is per CPU, the current CPU is constant, so we
can lift the load out of the loop, saving execution cycles/time in the
loop.

x86-64:

Before patch (execution in loop):
148:    0f ae e8                lfence
14b:    65 8b 04 25 00 00 00 00 mov %gs:0x0,%eax
152:    00
153:    89 c0                   mov %eax,%eax
155:    49 0f a3 04 24          bt %rax,(%r12)

After patch (execution in loop):
150:    0f ae e8                lfence
153:    4d 0f a3 34 24          bt %r14,(%r12)

ARM64:

Before patch (execution in loop):
168:    d5033d9f        dsb     ld
16c:    b9405661        ldr     w1,[x19,#84]
170:    1100fc20        add     w0,w1,#0x3f
174:    6b1f003f        cmp     w1,wzr
178:    1a81b000        csel    w0,w0,w1,lt
17c:    130c7000        asr     w0,w0,#6
180:    937d7c00        sbfiz   x0,x0,#3,#32
184:    f8606aa0        ldr     x0,[x21,x0]
188:    9ac12401        lsr     x1,x0,x1
18c:    36000e61        tbz     w1,#0,358

After patch (execution in loop):
1a8:    d50339df        dsb     ld
1ac:    f8776ac0        ldr     x0,[x22,x23]
ab0:    ea18001f        tst     x0,x24
1b4:    54000ea0        b.eq    388

Further observance on ARM64 for 4 seconds shows that cpu_idle_loop is
called 8672 times. Shifting the code will save instructions executed
in loop and eventually time as well.

Signed-off-by: Gaurav Jindal <gaurav.jindal@spreadtrum.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Sanjeev Yadav <sanjeev.yadav@spreadtrum.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160512101330.GA488@gauravjindalubtnb.del.spreadtrum.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agosched/fair: honor sync only if CPU is about to goto idle
Rick Yiu [Thu, 17 May 2018 13:40:41 +0000 (21:40 +0800)]
sched/fair: honor sync only if CPU is about to goto idle

sync is causing excessive latencies during binder replies as its causing
migration of important tasks to busy CPU. Incase the CPU has a lot of
tasks running, prevent sync from happening

bug: 78790904
Change-Id: I8e4ef0d331a92b86111882bfe1b68b93a8b5a687
Signed-off-by: Joel Fernandes <joelaf@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
4 years agozram: show lzo compressor only be enabled
Julian Liu [Tue, 22 Oct 2019 13:03:47 +0000 (21:03 +0800)]
zram: show lzo compressor only be enabled

Fixes: 85a88878 ("zram: Set default compressor to lz4")

4 years agoiommu: Sync with suntan
Julian Liu [Sun, 20 Oct 2019 19:22:13 +0000 (03:22 +0800)]
iommu: Sync with suntan

* fix possible deadlock

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
4 years agof2fs: Revert rapid GC
Julian Liu [Sun, 20 Oct 2019 23:06:22 +0000 (07:06 +0800)]
f2fs: Revert rapid GC

4 years agozram: fix re-do compression
Julian Liu [Fri, 18 Oct 2019 16:52:15 +0000 (00:52 +0800)]
zram: fix re-do compression

4 years agodefconfig: Increase LMK reclaim frequency
wloot [Thu, 6 Jun 2019 14:11:31 +0000 (22:11 +0800)]
defconfig: Increase LMK reclaim frequency

4 years agoANDROID: sched/fair: initialise util_est values to 0 on fork
Chris Redpath [Tue, 23 Oct 2018 16:43:34 +0000 (17:43 +0100)]
ANDROID: sched/fair: initialise util_est values to 0 on fork

Since "sched/fair: Align PELT windows between cfs_rq and its se" the
upstream kernel has initialised the whole content of sched_avg to zero
on fork. When util_est was backported, we missed this and so ended up
with util_est values copied from the parent task.

Add the zero initialisation which is present upstream and ensure that
util_est values always start from a known point.

Fixes: 700f1172f7a7 ("BACKPORT: sched/fair: Add util_est on top of PELT")
Reported-by: Puja Gupta <pujag@quicinc.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Abhijeet Dharmapurikar <adharmap@codeaurora.org>
Cc: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Todd Kjos <tkjos@google.com>
Cc: Saravana Kannan <skannan@codeaurora.org>
Change-Id: I06995e4320d606a52761d0e773baf28fcd1e2680
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
4 years ago{chiron,sagit}_defconfig: Re-use Simple LMK
Julian Liu [Mon, 14 Oct 2019 10:41:10 +0000 (18:41 +0800)]
{chiron,sagit}_defconfig: Re-use Simple LMK

This reverts commit 522a29d7e10203da7d9f826660b0e94fb1a03bb0.

4 years agoANDROID: arm64: vdso: unconditionally set -Wl,--hash-style=sysv
Nick Desaulniers [Tue, 5 Mar 2019 21:36:43 +0000 (13:36 -0800)]
ANDROID: arm64: vdso: unconditionally set -Wl,--hash-style=sysv

The conditional check through cc-ldoption conflicts with
cfi-clang-flags once set. This is causing failures in
VtsKernelLinuxKselftest#vDSO_kselftest_vdso_test_64bit due to GOLD
defaulting to gnu rather than sysv as the hash style.

Bug: 122343936
Bug: 122902928
Test: m kselftest_vdso_test && \
  adb push $OUT/data/nativetest64/linux-kselftest/vDSO/kselftest_vdso_test /data/local/tmp/ && \
  adb shell /data/local/tmp/kselftest_vdso_test
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Change-Id: Ice6ad2f99baa0eba2cc0813ed18486d1d9f4d612

4 years agodevfreq: msm_adreno_tz: Decrease busy time ceiling
Julian Liu [Thu, 10 Oct 2019 18:31:07 +0000 (02:31 +0800)]
devfreq: msm_adreno_tz: Decrease busy time ceiling

4 years agodevfreq: Weight stall cycles more for GPU bus DCVS
Kyle Piefer [Fri, 12 Oct 2018 20:39:54 +0000 (13:39 -0700)]
devfreq: Weight stall cycles more for GPU bus DCVS

Update GPU Bus DCVS to weight high stall cycles (as a percentage
of total bus usage) heavier in order to vote up the bus when the
GPU is stalling a lot waiting for data.

Change-Id: I8a331a48a1ab737c51f1001ea1991f09af9ef900
Signed-off-by: Kyle Piefer <kpiefer@codeaurora.org>
Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
4 years agodevfreq: Use busy cycles only for GPU bandwidth decisions
Carter Cooper [Mon, 8 Oct 2018 22:47:17 +0000 (16:47 -0600)]
devfreq: Use busy cycles only for GPU bandwidth decisions

Remove using the waiting bandwidth cycles for deciding
when to change bandwidth levels.

Change-Id: I65341c5c115684f19b5a3b2522d362e80315f2c9
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
Signed-off-by: Kyle Piefer <kpiefer@codeaurora.org>
Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
4 years agoqpnp-smb2: guard codes correctly
Julian Liu [Wed, 9 Oct 2019 09:09:40 +0000 (17:09 +0800)]
qpnp-smb2: guard codes correctly

4 years agoqdsp6v2: q6asm: Remove excess arguments
wloot [Tue, 4 Jun 2019 16:57:38 +0000 (00:57 +0800)]
qdsp6v2: q6asm: Remove excess arguments

4 years agodiag: guard codes correctly
Julian Liu [Wed, 9 Oct 2019 08:57:12 +0000 (16:57 +0800)]
diag: guard codes correctly

4 years agoMakefile: Enable LLVM Polly optimizations if available
Volodymyr Zhdanov [Wed, 5 Jun 2019 11:14:05 +0000 (11:14 +0000)]
Makefile: Enable LLVM Polly optimizations if available

* https://polly.llvm.org/

4 years agolz4: remove unused functions
Park Ju Hyung [Mon, 30 Sep 2019 13:13:16 +0000 (22:13 +0900)]
lz4: remove unused functions

Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
4 years agolz4: staticify functions
Park Ju Hyung [Mon, 30 Sep 2019 12:52:05 +0000 (21:52 +0900)]
lz4: staticify functions

Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
4 years agolz4: do not export static symbol
Linus Torvalds [Fri, 20 Sep 2019 16:06:26 +0000 (09:06 -0700)]
lz4: do not export static symbol

Kbuild now complains (rightly) about it.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoinclude/linux/lz4.h: fix spelling and copy-paste errors in documentation
Tom Levy [Tue, 16 Jul 2019 23:30:24 +0000 (16:30 -0700)]
include/linux/lz4.h: fix spelling and copy-paste errors in documentation

Fix a few spelling and grammar errors, and two places where fast/safe in
the documentation did not match the function.

Link: http://lkml.kernel.org/r/20190321014452.13297-1-tomlevy93@gmail.com
Signed-off-by: Tom Levy <tomlevy93@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Kosina <trivial@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agocrypto: lz4 - fixed decompress function to return error code
Myungho Jung [Mon, 10 Apr 2017 00:34:22 +0000 (17:34 -0700)]
crypto: lz4 - fixed decompress function to return error code

Decompress function in LZ4 library is supposed to return an error code or
negative result. But, it returns -1 when any error is detected. Return
error code when the library returns negative value.

Signed-off-by: Myungho Jung <mhjungk@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
4 years agomm/z3fold.c: claim page in the beginning of free
Vitaly Wool [Mon, 7 Oct 2019 00:58:22 +0000 (17:58 -0700)]
mm/z3fold.c: claim page in the beginning of free

There's a really hard to reproduce race in z3fold between z3fold_free()
and z3fold_reclaim_page().  z3fold_reclaim_page() can claim the page
after z3fold_free() has checked if the page was claimed and
z3fold_free() will then schedule this page for compaction which may in
turn lead to random page faults (since that page would have been
reclaimed by then).

Fix that by claiming page in the beginning of z3fold_free() and not
forgetting to clear the claim in the end.

[vitalywool@gmail.com: v2]
Link: http://lkml.kernel.org/r/20190928113456.152742cf@bigdell
Link: http://lkml.kernel.org/r/20190926104844.4f0c6efa1366b8f5741eaba9@gmail.com
Signed-off-by: Vitaly Wool <vitalywool@gmail.com>
Reported-by: Markus Linnala <markus.linnala@gmail.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Henry Burns <henrywolfeburns@gmail.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Markus Linnala <markus.linnala@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agorandom: reorder READ_ONCE() in get_random_uXX
Sebastian Andrzej Siewior [Fri, 30 Jun 2017 14:37:13 +0000 (16:37 +0200)]
random: reorder READ_ONCE() in get_random_uXX

Avoid the READ_ONCE in commit 4a072c71f49b ("random: silence compiler
warnings and fix race") if we can leave the function after
arch_get_random_XXX().

Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: celtare21 <celtare21@gmail.com>
4 years agorandom: silence compiler warnings and fix race
Jason A. Donenfeld [Wed, 14 Jun 2017 22:45:26 +0000 (00:45 +0200)]
random: silence compiler warnings and fix race

Odd versions of gcc for the sh4 architecture will actually warn about
flags being used while uninitialized, so we set them to zero. Non crazy
gccs will optimize that out again, so it doesn't make a difference.

Next, over aggressive gccs could inline the expression that defines
use_lock, which could then introduce a race resulting in a lock
imbalance. By using READ_ONCE, we prevent that fate. Finally, we make
that assignment const, so that gcc can still optimize a nice amount.

Finally, we fix a potential deadlock between primary_crng.lock and
batched_entropy_reset_lock, where they could be called in opposite
order. Moving the call to invalidate_batched_entropy to outside the lock
rectifies this issue.

Fixes: b169c13de473a85b3c859bb36216a4cb5f00a54a
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: celtare21 <celtare21@gmail.com>
4 years agorandom: invalidate batched entropy after crng init
Jason A. Donenfeld [Wed, 7 Jun 2017 23:45:31 +0000 (19:45 -0400)]
random: invalidate batched entropy after crng init

It's possible that get_random_{u32,u64} is used before the crng has
initialized, in which case, its output might not be cryptographically
secure. For this problem, directly, this patch set is introducing the
*_wait variety of functions, but even with that, there's a subtle issue:
what happens to our batched entropy that was generated before
initialization. Prior to this commit, it'd stick around, supplying bad
numbers. After this commit, we force the entropy to be re-extracted
after each phase of the crng has initialized.

In order to avoid a race condition with the position counter, we
introduce a simple rwlock for this invalidation. Since it's only during
this awkward transition period, after things are all set up, we stop
using it, so that it doesn't have an impact on performance.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org # v4.11+
Signed-off-by: celtare21 <celtare21@gmail.com>
4 years agodts: msm8998: add undervolt for GPU and CPU
freak07 [Wed, 27 Sep 2017 08:50:40 +0000 (10:50 +0200)]
dts: msm8998: add undervolt for GPU and CPU

Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com>
Signed-off-by: celtare21 <celtare21@gmail.com>
Change-Id: I767e8aa325efe1b6b9361676190f9301b3a3614d

4 years ago{chiron,sagit}_defconfig: switch zRAM to use z3fold backend
Artem Labazov [Thu, 3 Jan 2019 15:04:32 +0000 (18:04 +0300)]
{chiron,sagit}_defconfig: switch zRAM to use z3fold backend

Signed-off-by: Artem Labazov <123321artyom@gmail.com>
Change-Id: Ibc1386eb15733b97d2c38ba1d2946ea0df19fe92

4 years agoz3fold: fix memory leak in kmem cache
Vitaly Wool [Mon, 23 Sep 2019 22:36:51 +0000 (15:36 -0700)]
z3fold: fix memory leak in kmem cache

Currently there is a leak in init_z3fold_page() -- it allocates handles
from kmem cache even for headless pages, but then they are never used and
never freed, so eventually kmem cache may get exhausted.  This patch
provides a fix for that.

Link: http://lkml.kernel.org/r/20190917185352.44cf285d3ebd9e64548de5de@gmail.com
Signed-off-by: Vitaly Wool <vitalywool@gmail.com>
Reported-by: Markus Linnala <markus.linnala@gmail.com>
Tested-by: Markus Linnala <markus.linnala@gmail.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Henry Burns <henrywolfeburns@gmail.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoz3fold: fix retry mechanism in page reclaim
Vitaly Wool [Mon, 23 Sep 2019 22:33:02 +0000 (15:33 -0700)]
z3fold: fix retry mechanism in page reclaim

z3fold_page_reclaim()'s retry mechanism is broken: on a second iteration
it will have zhdr from the first one so that zhdr is no longer in line
with struct page.  That leads to crashes when the system is stressed.

Fix that by moving zhdr assignment up.

While at it, protect against using already freed handles by using own
local slots structure in z3fold_page_reclaim().

Link: http://lkml.kernel.org/r/20190908162919.830388dc7404d1e2c80f4095@gmail.com
Signed-off-by: Vitaly Wool <vitalywool@gmail.com>
Reported-by: Markus Linnala <markus.linnala@gmail.com>
Reported-by: Chris Murphy <bugzilla@colorremedies.com>
Reported-by: Agustin Dall'Alba <agustin@dallalba.com.ar>
Cc: "Maciej S. Szmigiero" <mail@maciej.szmigiero.name>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Henry Burns <henrywolfeburns@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoRevert "mm/z3fold.c: fix race between migration and destruction"
Vitaly Wool [Mon, 23 Sep 2019 22:32:56 +0000 (15:32 -0700)]
Revert "mm/z3fold.c: fix race between migration and destruction"

With the original commit applied, z3fold_zpool_destroy() may get blocked
on wait_event() for indefinite time.  Revert this commit for the time
being to get rid of this problem since the issue the original commit
addresses is less severe.

Link: http://lkml.kernel.org/r/20190910123142.7a9c8d2de4d0acbc0977c602@gmail.com
Fixes: d776aaa9895eb6eb77 ("mm/z3fold.c: fix race between migration and destruction")
Reported-by: Agustín Dall'Alba <agustin@dallalba.com.ar>
Signed-off-by: Vitaly Wool <vitalywool@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Adams <jwadams@google.com>
Cc: Henry Burns <henrywolfeburns@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/z3fold.c: fix lock/unlock imbalance in z3fold_page_isolate
Gustavo A. R. Silva [Fri, 30 Aug 2019 23:04:43 +0000 (16:04 -0700)]
mm/z3fold.c: fix lock/unlock imbalance in z3fold_page_isolate

Fix lock/unlock imbalance by unlocking *zhdr* before return.

Addresses Coverity ID 1452811 ("Missing unlock")

Link: http://lkml.kernel.org/r/20190826030634.GA4379@embeddedor
Fixes: d776aaa9895e ("mm/z3fold.c: fix race between migration and destruction")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Henry Burns <henrywolfeburns@gmail.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/z3fold.c: fix race between migration and destruction
Henry Burns [Sun, 25 Aug 2019 00:54:37 +0000 (17:54 -0700)]
mm/z3fold.c: fix race between migration and destruction

In z3fold_destroy_pool() we call destroy_workqueue(&pool->compact_wq).
However, we have no guarantee that migration isn't happening in the
background at that time.

Migration directly calls queue_work_on(pool->compact_wq), if destruction
wins that race we are using a destroyed workqueue.

Link: http://lkml.kernel.org/r/20190809213828.202833-1-henryburns@google.com
Signed-off-by: Henry Burns <henryburns@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Adams <jwadams@google.com>
Cc: Henry Burns <henrywolfeburns@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/z3fold.c: fix z3fold_destroy_pool() race condition
Henry Burns [Tue, 13 Aug 2019 22:37:25 +0000 (15:37 -0700)]
mm/z3fold.c: fix z3fold_destroy_pool() race condition

The constraint from the zpool use of z3fold_destroy_pool() is there are
no outstanding handles to memory (so no active allocations), but it is
possible for there to be outstanding work on either of the two wqs in
the pool.

Calling z3fold_deregister_migration() before the workqueues are drained
means that there can be allocated pages referencing a freed inode,
causing any thread in compaction to be able to trip over the bad pointer
in PageMovable().

Link: http://lkml.kernel.org/r/20190726224810.79660-2-henryburns@google.com
Fixes: 1f862989b04a ("mm/z3fold.c: support page migration")
Signed-off-by: Henry Burns <henryburns@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Jonathan Adams <jwadams@google.com>
Cc: Vitaly Vul <vitaly.vul@sony.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Henry Burns <henrywolfeburns@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/z3fold.c: fix z3fold_destroy_pool() ordering
Henry Burns [Tue, 13 Aug 2019 22:37:21 +0000 (15:37 -0700)]
mm/z3fold.c: fix z3fold_destroy_pool() ordering

The constraint from the zpool use of z3fold_destroy_pool() is there are
no outstanding handles to memory (so no active allocations), but it is
possible for there to be outstanding work on either of the two wqs in
the pool.

If there is work queued on pool->compact_workqueue when it is called,
z3fold_destroy_pool() will do:

   z3fold_destroy_pool()
     destroy_workqueue(pool->release_wq)
     destroy_workqueue(pool->compact_wq)
       drain_workqueue(pool->compact_wq)
         do_compact_page(zhdr)
           kref_put(&zhdr->refcount)
             __release_z3fold_page(zhdr, ...)
               queue_work_on(pool->release_wq, &pool->work) *BOOM*

So compact_wq needs to be destroyed before release_wq.

Link: http://lkml.kernel.org/r/20190726224810.79660-1-henryburns@google.com
Fixes: 5d03a6613957 ("mm/z3fold.c: use kref to prevent page free/compact race")
Signed-off-by: Henry Burns <henryburns@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Jonathan Adams <jwadams@google.com>
Cc: Vitaly Vul <vitaly.vul@sony.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Al Viro <viro@zeniv.linux.org.uk
Cc: Henry Burns <henrywolfeburns@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/z3fold.c: reinitialize zhdr structs after migration
Henry Burns [Tue, 16 Jul 2019 23:26:21 +0000 (16:26 -0700)]
mm/z3fold.c: reinitialize zhdr structs after migration

z3fold_page_migration() calls memcpy(new_zhdr, zhdr, PAGE_SIZE).
However, zhdr contains fields that can't be directly coppied over (ex:
list_head, a circular linked list).  We only need to initialize the
linked lists in new_zhdr, as z3fold_isolate_page() already ensures that
these lists are empty

Additionally it is possible that zhdr->work has been placed in a
workqueue.  In this case we shouldn't migrate the page, as zhdr->work
references zhdr as opposed to new_zhdr.

Link: http://lkml.kernel.org/r/20190716000520.230595-1-henryburns@google.com
Fixes: 1f862989b04ade61d3 ("mm/z3fold.c: support page migration")
Signed-off-by: Henry Burns <henryburns@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Vitaly Vul <vitaly.vul@sony.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Jonathan Adams <jwadams@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/z3fold.c: remove z3fold_migration trylock
Henry Burns [Tue, 16 Jul 2019 23:26:18 +0000 (16:26 -0700)]
mm/z3fold.c: remove z3fold_migration trylock

z3fold_page_migrate() will never succeed because it attempts to acquire
a lock that has already been taken by migrate.c in __unmap_and_move().

  __unmap_and_move() migrate.c
    trylock_page(oldpage)
    move_to_new_page(oldpage_newpage)
      a_ops->migrate_page(oldpage, newpage)
        z3fold_page_migrate(oldpage, newpage)
          trylock_page(oldpage)

Link: http://lkml.kernel.org/r/20190710213238.91835-1-henryburns@google.com
Fixes: 1f862989b04a ("mm/z3fold.c: support page migration")
Signed-off-by: Henry Burns <henryburns@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Vitaly Vul <vitaly.vul@sony.com>
Cc: Jonathan Adams <jwadams@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Snild Dolkow <snild@sony.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/z3fold.c: allow __GFP_HIGHMEM in z3fold_alloc
Henry Burns [Tue, 16 Jul 2019 23:26:03 +0000 (16:26 -0700)]
mm/z3fold.c: allow __GFP_HIGHMEM in z3fold_alloc

One of the gfp flags used to show that a page is movable is
__GFP_HIGHMEM.  Currently z3fold_alloc() fails when __GFP_HIGHMEM is
passed.  Now that z3fold pages are movable, we allow __GFP_HIGHMEM.  We
strip the movability related flags from the call to kmem_cache_alloc()
for our slots since it is a kernel allocation.

[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20190712222118.108192-1-henryburns@google.com
Signed-off-by: Henry Burns <henryburns@google.com>
Acked-by: Vitaly Wool <vitalywool@gmail.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoRevert "z3fold: Don't fail on zRAM allocation requests"
Julian Liu [Sat, 5 Oct 2019 03:28:17 +0000 (11:28 +0800)]
Revert "z3fold: Don't fail on zRAM allocation requests"

This reverts commit 76183825603cbdb189e689465a929c5a82124a03.

4 years agomm/z3fold: don't try to use buddy slots after free
Vitaly Wool [Tue, 16 Jul 2019 23:25:48 +0000 (16:25 -0700)]
mm/z3fold: don't try to use buddy slots after free

As reported by Henry Burns:

Running z3fold stress testing with address sanitization showed zhdr->slots
was being used after it was freed.

  z3fold_free(z3fold_pool, handle)
    free_handle(handle)
      kmem_cache_free(pool->c_handle, zhdr->slots)
    release_z3fold_page_locked_list(kref)
      __release_z3fold_page(zhdr, true)
        zhdr_to_pool(zhdr)
          slots_to_pool(zhdr->slots)  *BOOM*

To fix this, add pointer to the pool back to z3fold_header and modify
zhdr_to_pool to return zhdr->pool.

Link: http://lkml.kernel.org/r/20190708134808.e89f3bfadd9f6ffd7eff9ba9@gmail.com
Fixes: 7c2b8baa61fe  ("mm/z3fold.c: add structure for buddy handles")
Signed-off-by: Vitaly Wool <vitalywool@gmail.com>
Reported-by: Henry Burns <henryburns@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Adams <jwadams@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/z3fold.c: lock z3fold page before __SetPageMovable()
Henry Burns [Fri, 12 Jul 2019 03:52:14 +0000 (20:52 -0700)]
mm/z3fold.c: lock z3fold page before __SetPageMovable()

Following zsmalloc.c's example we call trylock_page() and unlock_page().
Also make z3fold_page_migrate() assert that newpage is passed in locked,
as per the documentation.

[akpm@linux-foundation.org: fix trylock_page return value test, per Shakeel]
Link: http://lkml.kernel.org/r/20190702005122.41036-1-henryburns@google.com
Link: http://lkml.kernel.org/r/20190702233538.52793-1-henryburns@google.com
Signed-off-by: Henry Burns <henryburns@google.com>
Suggested-by: Vitaly Wool <vitalywool@gmail.com>
Acked-by: Vitaly Wool <vitalywool@gmail.com>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Vitaly Vul <vitaly.vul@sony.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Xidong Wang <wangxidong_97@163.com>
Cc: Jonathan Adams <jwadams@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoz3fold: fix sheduling while atomic
Vitaly Wool [Sat, 1 Jun 2019 05:30:39 +0000 (22:30 -0700)]
z3fold: fix sheduling while atomic

kmem_cache_alloc() may be called from z3fold_alloc() in atomic context, so
we need to pass correct gfp flags to avoid "scheduling while atomic" bug.

Link: http://lkml.kernel.org/r/20190523153245.119dfeed55927e8755250ddd@gmail.com
Fixes: 7c2b8baa61fe5 ("mm/z3fold.c: add structure for buddy handles")
Signed-off-by: Vitaly Wool <vitaly.vul@sony.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoz3fold: don't bother with dentry_operations
David Howells [Tue, 21 May 2019 07:22:17 +0000 (08:22 +0100)]
z3fold: don't bother with dentry_operations

Don't bother with dentry_operations as no dentry is ever allocated.

Signed-off-by: David Howells <dhowells@redhat.com>