OSDN Git Service

uclinux-h8/linux.git
4 years agoMIPS: ingenic: GCW0: Update defconfig
Paul Cercueil [Mon, 13 Apr 2020 15:26:32 +0000 (17:26 +0200)]
MIPS: ingenic: GCW0: Update defconfig

Enable support for the new hardware that was added in the devicetree.

Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agoMIPS: ingenic: CI20: enable OST, PWM drivers in defconfig
Paul Cercueil [Mon, 13 Apr 2020 15:26:31 +0000 (17:26 +0200)]
MIPS: ingenic: CI20: enable OST, PWM drivers in defconfig

The OST driver provides a clocksource and sched_clock that are much more
accurate than the default ones.

The PWM driver allows to use the PWM pins on the external header of the
board.

Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agoMIPS: ingenic: DTS: Update GCW0 support
Paul Cercueil [Mon, 13 Apr 2020 15:26:30 +0000 (17:26 +0200)]
MIPS: ingenic: DTS: Update GCW0 support

Add support for the face buttons, the ACT8600 PMUC, the LCD panel
with backlight, the rumble, internal/external SD readers, and other
things.

Note that the otg-phy node was dropped in the process as it was neither
useful nor used, and was inside a non-compliant board "bus".

Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agoMIPS: ingenic: DTS: Update JZ4770 support
Paul Cercueil [Mon, 13 Apr 2020 15:26:29 +0000 (17:26 +0200)]
MIPS: ingenic: DTS: Update JZ4770 support

Add support for the RTC, AIC, CODEC, MMC 0/1/2, ADC, GPU, LCD,
USB OTG, USB PHY controllers.

Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agoMIPS: ingenic: DTS: Add nodes for the watchdog/PWM/OST
Paul Cercueil [Mon, 13 Apr 2020 15:26:28 +0000 (17:26 +0200)]
MIPS: ingenic: DTS: Add nodes for the watchdog/PWM/OST

Add the TCU nodes to the JZ4780, JZ4770 and JZ4740 devicetree files.

Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agoMIPS: ingenic: DTS: Respect cell count of common properties
Paul Cercueil [Mon, 13 Apr 2020 15:26:27 +0000 (17:26 +0200)]
MIPS: ingenic: DTS: Respect cell count of common properties

If N fields of X cells should be provided, then that's what the
devicetree should represent, instead of having one single field of
(N*X) cells.

Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agoMIPS: ingenic: DTS: Fix invalid value in #dma-cells
Paul Cercueil [Mon, 13 Apr 2020 15:26:26 +0000 (17:26 +0200)]
MIPS: ingenic: DTS: Fix invalid value in #dma-cells

The driver requires two cells and not just one.

Since these nodes are both disabled as no hardware currently use them,
this fix does not really requires a Fixes: tag.

Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agoMIPS: Loongson64: Switch the order of RS780E and LS7A
Liangliang Huang [Wed, 29 Apr 2020 09:04:17 +0000 (17:04 +0800)]
MIPS: Loongson64: Switch the order of RS780E and LS7A

Sort the members of enum in alphabetical order is better to avoid
duplicate mistakes (because the list may be grow very large), so
fix it by exchanging the order.

Signed-off-by: Liangliang Huang <huangll@lemote.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agoMIPS: Loongson: Get host bridge information
Tiezhu Yang [Fri, 3 Apr 2020 09:29:49 +0000 (17:29 +0800)]
MIPS: Loongson: Get host bridge information

Read the address of host bridge configuration space to get the vendor ID
and device ID of host bridge, and then we can distinguish various types
of host bridge such as LS7A or RS780E.

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agoMIPS: oprofile: remove unneeded semicolon in common.c
Jason Yan [Tue, 28 Apr 2020 06:32:54 +0000 (14:32 +0800)]
MIPS: oprofile: remove unneeded semicolon in common.c

Fix the following coccicheck warning:

arch/mips/oprofile/common.c:113:2-3: Unneeded semicolon

Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agoMIPS: Kernel: Identify Loongson-2K processors
Jiaxun Yang [Wed, 22 Apr 2020 14:43:44 +0000 (22:43 +0800)]
MIPS: Kernel: Identify Loongson-2K processors

Loongson-2K (Loongson64 Reduced) is a family of SoC shipped with
gs264e core.

Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agoMIPS: Loongson: Add support for perf tool
Tiezhu Yang [Sun, 26 Apr 2020 09:30:45 +0000 (17:30 +0800)]
MIPS: Loongson: Add support for perf tool

In order to use perf tool on the Loongson platform, we should enable kernel
support for various performance events provided by software and hardware,
so add CONFIG_PERF_EVENTS=y to loongson3_defconfig.

E.g. without this patch:

[loongson@localhost perf]$ ./perf list

List of pre-defined events (to be used in -e):

  duration_time                                      [Tool event]

  rNNN                                               [Raw hardware event descriptor]
  cpu/t1=v1[,t2=v2,t3 ...]/modifier                  [Raw hardware event descriptor]
   (see 'man perf-list' on how to encode it)

  mem:<addr>[/len][:access]                          [Hardware breakpoint]

With this patch:

[loongson@localhost perf]$ ./perf list

List of pre-defined events (to be used in -e):

  branch-instructions OR branches                    [Hardware event]
  branch-misses                                      [Hardware event]
  cpu-cycles OR cycles                               [Hardware event]
  instructions                                       [Hardware event]

  alignment-faults                                   [Software event]
  bpf-output                                         [Software event]
  context-switches OR cs                             [Software event]
  cpu-clock                                          [Software event]
  cpu-migrations OR migrations                       [Software event]
  dummy                                              [Software event]
  emulation-faults                                   [Software event]
  major-faults                                       [Software event]
  minor-faults                                       [Software event]
  page-faults OR faults                              [Software event]
  task-clock                                         [Software event]

  duration_time                                      [Tool event]

  L1-dcache-load-misses                              [Hardware cache event]
  L1-dcache-store-misses                             [Hardware cache event]
  L1-icache-load-misses                              [Hardware cache event]
  branch-load-misses                                 [Hardware cache event]
  branch-loads                                       [Hardware cache event]
  dTLB-load-misses                                   [Hardware cache event]
  dTLB-store-misses                                  [Hardware cache event]
  iTLB-load-misses                                   [Hardware cache event]

  rNNN                                               [Raw hardware event descriptor]
  cpu/t1=v1[,t2=v2,t3 ...]/modifier                  [Raw hardware event descriptor]
   (see 'man perf-list' on how to encode it)

  mem:<addr>[/len][:access]                          [Hardware breakpoint]

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agoMIPS: Rename the "Fill" cache ops to avoid build failure
Huacai Chen [Sun, 26 Apr 2020 11:09:52 +0000 (19:09 +0800)]
MIPS: Rename the "Fill" cache ops to avoid build failure

MIPS define a "Fill" macro as a cache operation in cacheops.h, this
will cause build failure under some special configurations because in
seq_file.c there is a "Fill" label. To avoid this failure we rename the
"Fill" macro to "Fill_I" which has the same coding style as other cache
operations in cacheops.h (we think renaming the "Fill" macro is more
reasonable than renaming the "Fill" label).

Callers of "Fill" macro is also updated.

Signed-off-by: Huacai Chen <chenhc@lemote.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agoMIPS: Clear XContext at boot time
Jiaxun Yang [Wed, 22 Apr 2020 14:45:34 +0000 (22:45 +0800)]
MIPS: Clear XContext at boot time

XContext might be dirty at boot time. We need to clear it
to ensure early stackframe is safe.

Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agoMIPS: arch_send_call_function_single_ipi() calling conventions change
Liangliang Huang [Thu, 23 Apr 2020 23:44:21 +0000 (19:44 -0400)]
MIPS: arch_send_call_function_single_ipi() calling conventions change

Use mp_ops->send_ipi_single() instead of mp_ops->send_ipi_mask() in
arch_send_call_function_single_ipi(). send_ipi_single() can send
IPI signal to a special cpu more efficiently.

Signed-off-by: Liangliang Huang <huangll@lemote.com>
Reviewed-by: Huacai Chen <chenhc@lemote.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agoMIPS: Loongson-3: Add some unaligned instructions emulation
Huacai Chen [Fri, 24 Apr 2020 10:56:46 +0000 (18:56 +0800)]
MIPS: Loongson-3: Add some unaligned instructions emulation

1, Add unaligned gslq, gssq, gslqc1, gssqc1 emulation;
2, Add unaligned gsl{h, w, d}x, gss{h, w, d}x emulation;
3, Add unaligned gslwxc1, gsswxc1, gsldxc1, gssdxc1 emulation.

Signed-off-by: Huacai Chen <chenhc@lemote.com>
Signed-off-by: Pei Huang <huangpei@loongson.cn>
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agoMIPS: Move unaligned load/store helpers to inst.h
Huacai Chen [Fri, 24 Apr 2020 10:56:45 +0000 (18:56 +0800)]
MIPS: Move unaligned load/store helpers to inst.h

Move unaligned load/store helpers from unaligned.c to inst.h, then
other parts of the kernel can use these helpers.

Use __ASSEMBLY__ to guard the definition of "LONG" in asm.h to avoid
build error on IPxx platforms.

Signed-off-by: Huacai Chen <chenhc@lemote.com>
Signed-off-by: Pei Huang <huangpei@loongson.cn>
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agoMIPS: Fix the declaration conflict of mm_isBranchInstr()
Huacai Chen [Fri, 24 Apr 2020 10:56:44 +0000 (18:56 +0800)]
MIPS: Fix the declaration conflict of mm_isBranchInstr()

mm_isBranchInstr() is declared both in branch.h and in fpu_emulator.h
but the two declarations are conflict. If both of them are included by
a same file, they will cause a build error:

./arch/mips/include/asm/branch.h:33:19: error: static declaration of 'mm_isBranchInstr' follows non-static declaration
 static inline int mm_isBranchInstr(struct pt_regs *regs,
                   ^
./arch/mips/include/asm/fpu_emulator.h:177:5: note: previous declaration of 'mm_isBranchInstr' was here
 int mm_isBranchInstr(struct pt_regs *regs, struct mm_decoded_insn dec_insn,

Fix this error by removing both isBranchInstr() and mm_isBranchInstr()
in fpu_emulator.h, and declaring both of them in branch.h.

Signed-off-by: Huacai Chen <chenhc@lemote.com>
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agoASoC: txx9: add back the hack for a too small resource_size_t
Christoph Hellwig [Tue, 21 Apr 2020 17:11:36 +0000 (19:11 +0200)]
ASoC: txx9: add back the hack for a too small resource_size_t

Looks like I misread the Kconfig magic and this driver can be compiled
into 32-bit kernels.  Add back the hack to extent the range of the
resource_size_t, and include the header with the txx9-specific ioremap
magic for that.

Fixes: acfaaf52ebfd ("ASoC: txx9: don't work around too small resource_size_t")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agoMIPS: Make sparse_init() using top-down allocation
Tiezhu Yang [Tue, 21 Apr 2020 11:59:46 +0000 (19:59 +0800)]
MIPS: Make sparse_init() using top-down allocation

In the current code, if CONFIG_SWIOTLB is set, when failed to get IO TLB
memory from the low pages by plat_swiotlb_setup(), it may lead to the boot
process failed with kernel panic.

(1) On the Loongson and SiByte platform
arch/mips/loongson64/dma.c
arch/mips/sibyte/common/dma.c
void __init plat_swiotlb_setup(void)
{
swiotlb_init(1);
}

kernel/dma/swiotlb.c
void  __init
swiotlb_init(int verbose)
{
...
vstart = memblock_alloc_low(PAGE_ALIGN(bytes), PAGE_SIZE);
if (vstart && !swiotlb_init_with_tbl(vstart, io_tlb_nslabs, verbose))
return;
...
pr_warn("Cannot allocate buffer");
no_iotlb_memory = true;
}

phys_addr_t swiotlb_tbl_map_single()
{
...
if (no_iotlb_memory)
panic("Can not allocate SWIOTLB buffer earlier ...");
...
}

(2) On the Cavium OCTEON platform
arch/mips/cavium-octeon/dma-octeon.c
void __init plat_swiotlb_setup(void)
{
...
octeon_swiotlb = memblock_alloc_low(swiotlbsize, PAGE_SIZE);
if (!octeon_swiotlb)
panic("%s: Failed to allocate %zu bytes align=%lx\n",
      __func__, swiotlbsize, PAGE_SIZE);
...
}

Because IO_TLB_DEFAULT_SIZE is 64M, if the rest size of low memory is less
than 64M when call plat_swiotlb_setup(), we can easily reproduce the panic
case.

In order to reduce the possibility of kernel panic when failed to get IO
TLB memory under CONFIG_SWIOTLB, it is better to allocate low memory as
small as possible before plat_swiotlb_setup(), so make sparse_init() using
top-down allocation.

Reported-by: Juxin Gao <gaojuxin@loongson.cn>
Co-developed-by: Juxin Gao <gaojuxin@loongson.cn>
Signed-off-by: Juxin Gao <gaojuxin@loongson.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agoMIPS: Cleanup code about plat_mem_setup()
Tiezhu Yang [Tue, 21 Apr 2020 11:59:45 +0000 (19:59 +0800)]
MIPS: Cleanup code about plat_mem_setup()

In the current code, plat_mem_setup() is called by arch_mem_init() instead
of setup_arch() and has been declared in asm/bootinfo.h, so modify the code
comment to reflect the reality and remove the useless duplicate declartion
in arch/mips/kernel/setup.c.

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agoMIPS: Do not initialise globals to 0
Tiezhu Yang [Tue, 21 Apr 2020 11:59:44 +0000 (19:59 +0800)]
MIPS: Do not initialise globals to 0

Fix the following checkpatch error:

ERROR: do not initialise globals to 0
#834: FILE: arch/mips/kernel/setup.c:834:
+int hw_coherentio = 0; /* Actual hardware supported DMA coherency setting. */

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agoMIPS: Loongson64: Mark RS780 HPET as broken
Jiaxun Yang [Mon, 20 Apr 2020 13:45:29 +0000 (21:45 +0800)]
MIPS: Loongson64: Mark RS780 HPET as broken

This driver is using some dangerous hack to set MMIO address for HPET,
which might break systems with other kinds of PCH.

Also, as Loongson-3 cpufreq driver never appeared in mainline,
this driver rarely got used.

So we temporarily mark it as broken until we find a better solution.

Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agoMIPS: DTS: Loongson64: Add ACPI Controller Node
Jiaxun Yang [Mon, 20 Apr 2020 13:45:28 +0000 (21:45 +0800)]
MIPS: DTS: Loongson64: Add ACPI Controller Node

Add ACPI Controller Node for RS780E PCH to fit newly added driver.

Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agodt-bindings: Document Loongson RS780E PCH ACPI Controller
Jiaxun Yang [Mon, 20 Apr 2020 13:45:27 +0000 (21:45 +0800)]
dt-bindings: Document Loongson RS780E PCH ACPI Controller

This controller is attached under ISA Bus and can be found
in Loongson-3 systems with RS780E PCH.

Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agoMIPS: Loongson64: Make RS780E ACPI as a platform driver
Jiaxun Yang [Mon, 20 Apr 2020 13:45:26 +0000 (21:45 +0800)]
MIPS: Loongson64: Make RS780E ACPI as a platform driver

Make RS780E ACPI as a platform driver so we can enable it
by DeviceTree selectively.

Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agoMIPS: Loongson64: Remove dead RTC code
Jiaxun Yang [Mon, 20 Apr 2020 13:45:25 +0000 (21:45 +0800)]
MIPS: Loongson64: Remove dead RTC code

RTC is now enabled by devicetree. So platform code is
no longer needed.

Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agoMIPS: alchemy: Fix build error after ioremap cleanup
Thomas Bogendoerfer [Mon, 20 Apr 2020 11:28:54 +0000 (13:28 +0200)]
MIPS: alchemy: Fix build error after ioremap cleanup

IOremap changes caused following build error:

arch/mips/alchemy/common/setup.c:99:9: error: implicit declaration of function
+‘remap_pfn_range’; did you mean ‘io_remap_pfn_range’?
+[-Werror=implicit-function-declaration]

Fixed my including linux/mm.h

Fixes: d399157283fb ("MIPS: cleanup fixup_bigphys_addr handling")
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agoMIPS: DTS: CI20: make DM9000 Ethernet controller use NVMEM to find the default MAC...
H. Nikolaus Schaller [Fri, 28 Feb 2020 16:00:53 +0000 (17:00 +0100)]
MIPS: DTS: CI20: make DM9000 Ethernet controller use NVMEM to find the default MAC address

There is a unique MAC address programmed into the eFuses
of the JZ4780 chip in the CI20 factory. By using this
for initializing the DM9000 Ethernet controller, every
CI20 board has an individual - but stable - MAC address
and DHCP can assign stable IP addresses.

Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agoMIPS: DTS: JZ4780: define node for JZ4780 efuse
PrasannaKumar Muralidharan [Fri, 28 Feb 2020 16:00:52 +0000 (17:00 +0100)]
MIPS: DTS: JZ4780: define node for JZ4780 efuse

This patch brings support for the JZ4780 efuse. Currently it only exposes
a read only access to the entire 8K bits efuse memory and the
ethernet mac address for the davicom dm9000 chip on the CI20 board.

It also changes the nemc ranges definition to give the driver
access to the efuse registers, which are in the middle of the
nemc reg range.

Tested-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: PrasannaKumar Muralidharan <prasannatsmkumar@gmail.com>
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agoMIPS: use ioremap_page_range
Christoph Hellwig [Thu, 16 Apr 2020 15:00:11 +0000 (17:00 +0200)]
MIPS: use ioremap_page_range

Use the generic ioremap_page_range helper instead of reimplementing it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agoMIPS: move ioremap_prot und iounmap out of line
Christoph Hellwig [Thu, 16 Apr 2020 15:00:10 +0000 (17:00 +0200)]
MIPS: move ioremap_prot und iounmap out of line

Neither of these interfaces is anywhere near the fast path.  Move them
out of line and avoid exposing implementation details to the drivers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agoMIPS: split out the 64-bit ioremap implementation
Christoph Hellwig [Thu, 16 Apr 2020 15:00:09 +0000 (17:00 +0200)]
MIPS: split out the 64-bit ioremap implementation

Split out the mips64 ioremap implementation entirely, as it will never use
page table based remapping.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agoMIPS: merge __ioremap_mode into ioremap_prot
Christoph Hellwig [Thu, 16 Apr 2020 15:00:08 +0000 (17:00 +0200)]
MIPS: merge __ioremap_mode into ioremap_prot

There is no reason to have two ioremap with flags interfaces.  Merge
the historic mips __ioremap_mode into ioremap_prot which is a generic
kernel interface.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agoMIPS: cleanup fixup_bigphys_addr handling
Christoph Hellwig [Thu, 16 Apr 2020 15:00:07 +0000 (17:00 +0200)]
MIPS: cleanup fixup_bigphys_addr handling

fixup_bigphys_addr is only provided by the alchemy platform.  Remove
all the stubs, and ensure we only call it if it is actually implemented.

Also don't bother implementing io_remap_pfn_range if we don't have to,
and move the remaining implementation to alchemy platform code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agoMIPS: remove cpu_has_64bit_addresses
Christoph Hellwig [Thu, 16 Apr 2020 15:00:06 +0000 (17:00 +0200)]
MIPS: remove cpu_has_64bit_addresses

This macro is identical to CONFIG_64BIT, and using a Kconfig variable
for the only places that checks them (the ioremap implementation) will
simplify later patches in this series.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agoASoC: txx9: don't work around too small resource_size_t
Christoph Hellwig [Thu, 16 Apr 2020 15:00:05 +0000 (17:00 +0200)]
ASoC: txx9: don't work around too small resource_size_t

The txx9 sound driver deends on HAS_TXX9_ACLC, which is only set for
three tx49xx SOCs, and thus always has a 64-bit phys_addr_t and
resource_size_t.  Instead of poking into ioremap internals to work
around a potentially too small resource_size_t just add a BUILD_BUG_ON
to catch such a case.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agoMIPS: Netlogic: remove unneeded semicolon in fmn_message_handler()
Jason Yan [Sat, 18 Apr 2020 08:18:34 +0000 (16:18 +0800)]
MIPS: Netlogic: remove unneeded semicolon in fmn_message_handler()

Fix the following coccicheck warning:

arch/mips/netlogic/xlr/fmn.c:106:2-3: Unneeded semicolon

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agomips: loongsoon2ef: remove private clk api
Arnd Bergmann [Thu, 9 Apr 2020 09:02:28 +0000 (11:02 +0200)]
mips: loongsoon2ef: remove private clk api

As platforms are moving to COMMON_CLK in general, loongson2ef
stuck out as something that has a private implementation but
does not actually use it except for setting the frequency of
the CPU itself from the loongson2_cpufreq driver.

Change that driver to call the register setting function directly
and remove the rest of the stub implementation.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agoMIPS: Fix typo for user_ld macro definition
bibo mao [Wed, 15 Apr 2020 09:56:43 +0000 (17:56 +0800)]
MIPS: Fix typo for user_ld macro definition

There is typo for macro user_ld if __ASSEMBLY__ is declared, this
patch fixes this issue.

Signed-off-by: bibo mao <maobibo@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agoMIPS: Kill MIPS_GIC_IRQ_BASE
Jiaxun Yang [Thu, 26 Mar 2020 06:16:58 +0000 (14:16 +0800)]
MIPS: Kill MIPS_GIC_IRQ_BASE

It never got used by any driver.

Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agoMIPS: xilfpga: Removed unused header files
bibo mao [Thu, 26 Mar 2020 07:42:47 +0000 (03:42 -0400)]
MIPS: xilfpga: Removed unused header files

Header in directory asm/mach-xilfpga is not used any more.
Remove it here, and it passes to compile with xilfpga_defconfig

Signed-off-by: bibo mao <maobibo@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agomips: define pud_index() regardless of page table folding
Mike Rapoport [Thu, 2 Apr 2020 08:16:14 +0000 (11:16 +0300)]
mips: define pud_index() regardless of page table folding

Commit 31168f033e37 ("mips: drop __pXd_offset() macros that duplicate
pXd_index() ones") is correct that pud_index() & __pud_offset() are the
same when pud_index() is actually provided, however it does not take into
account the __PAGETABLE_PUD_FOLDED case. This has broken MIPS KVM
compilation because it relied on availability of pud_index().

Define pud_index() regardless of page table folded. It will evaluate to
actual index for 4-level pagetables and to 0 for folded PUD level.

Link: https://lore.kernel.org/lkml/20200331154749.5457-1-pbonzini@redhat.com
Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Tested-by: Huacai Chen <chenhc@lemote.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agoMIPS: TXx9: Fix Kconfig warnings
YueHaibing [Fri, 3 Apr 2020 09:49:04 +0000 (17:49 +0800)]
MIPS: TXx9: Fix Kconfig warnings

If TTY and SND is not n, we got this warnings:

WARNING: unmet direct dependencies detected for HAS_TXX9_SERIAL
  Depends on [n]: TTY [=n] && HAS_IOMEM [=y]
  Selected by [y]:
  - SOC_TX3927 [=y]

WARNING: unmet direct dependencies detected for HAS_TXX9_SERIAL
  Depends on [n]: TTY [=n] && HAS_IOMEM [=y]
  Selected by [y]:
  - SOC_TX4938 [=y]

Only dependencies is enabled, they can be enabled, so
use 'imply' instead of 'select' to fix this.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agoMIPS: Loongson: Use CONFIG_NR_CPUS_DEFAULT_64 to support more CPUs
Tiezhu Yang [Tue, 31 Mar 2020 07:00:06 +0000 (15:00 +0800)]
MIPS: Loongson: Use CONFIG_NR_CPUS_DEFAULT_64 to support more CPUs

When I update the mainline kernel on the Loongson 2-way platform which
has 8 CPUs, it only shows 4 CPUs due to NR_CPUS is 4, this is obviously
wrong.

In order to support more CPUs on the Loongson platform, it is better
to use CONFIG_NR_CPUS_DEFAULT_64 instead of CONFIG_NR_CPUS_DEFAULT_4
to specify the maximum number of CPUs which the kernel will support.

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Acked-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 years agoLinux 5.7-rc1
Linus Torvalds [Sun, 12 Apr 2020 19:35:55 +0000 (12:35 -0700)]
Linux 5.7-rc1

4 years agoMAINTAINERS: sort field names for all entries
Linus Torvalds [Sun, 12 Apr 2020 18:04:58 +0000 (11:04 -0700)]
MAINTAINERS: sort field names for all entries

This sorts the actual field names too, potentially causing even more
chaos and confusion at merge time if you have edited the MAINTAINERS
file.  But the end result is a more consistent layout, and hopefully
it's a one-time pain minimized by doing this just before the -rc1
release.

This was entirely scripted:

  ./scripts/parse-maintainers.pl --input=MAINTAINERS --output=MAINTAINERS --order

Requested-by: Joe Perches <joe@perches.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoMAINTAINERS: sort entries by entry name
Linus Torvalds [Sun, 12 Apr 2020 18:03:52 +0000 (11:03 -0700)]
MAINTAINERS: sort entries by entry name

They are all supposed to be sorted, but people who add new entries don't
always know the alphabet.  Plus sometimes the entry names get edited,
and people don't then re-order the entry.

Let's see how painful this will be for merging purposes (the MAINTAINERS
file is often edited in various different trees), but Joe claims there's
relatively few patches in -next that touch this, and doing it just
before -rc1 is likely the best time.  Fingers crossed.

This was scripted with

  /scripts/parse-maintainers.pl --input=MAINTAINERS --output=MAINTAINERS

but then I also ended up manually upper-casing a few entry names that
stood out when looking at the end result.

Requested-by: Joe Perches <joe@perches.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoMerge tag 'x86-urgent-2020-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 12 Apr 2020 17:17:16 +0000 (10:17 -0700)]
Merge tag 'x86-urgent-2020-04-12' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
 "A set of three patches to fix the fallout of the newly added split
  lock detection feature.

  It addressed the case where a KVM guest triggers a split lock #AC and
  KVM reinjects it into the guest which is not prepared to handle it.

  Add proper sanity checks which prevent the unconditional injection
  into the guest and handles the #AC on the host side in the same way as
  user space detections are handled. Depending on the detection mode it
  either warns and disables detection for the task or kills the task if
  the mode is set to fatal"

* tag 'x86-urgent-2020-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  KVM: VMX: Extend VMXs #AC interceptor to handle split lock #AC in guest
  KVM: x86: Emulate split-lock access as a write in emulator
  x86/split_lock: Provide handle_guest_split_lock()

4 years agoMerge tag 'timers-urgent-2020-04-12' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 12 Apr 2020 17:13:14 +0000 (10:13 -0700)]
Merge tag 'timers-urgent-2020-04-12' of git://git./linux/kernel/git/tip/tip

Pull time(keeping) updates from Thomas Gleixner:

 - Fix the time_for_children symlink in /proc/$PID/ so it properly
   reflects that it part of the 'time' namespace

 - Add the missing userns limit for the allowed number of time
   namespaces, which was half defined but the actual array member was
   not added. This went unnoticed as the array has an exessive empty
   member at the end but introduced a user visible regression as the
   output was corrupted.

 - Prevent further silent ucount corruption by adding a BUILD_BUG_ON()
   to catch half updated data.

* tag 'timers-urgent-2020-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  ucount: Make sure ucounts in /proc/sys/user don't regress again
  time/namespace: Add max_time_namespaces ucount
  time/namespace: Fix time_for_children symlink

4 years agoMerge tag 'sched-urgent-2020-04-12' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 12 Apr 2020 17:09:19 +0000 (10:09 -0700)]
Merge tag 'sched-urgent-2020-04-12' of git://git./linux/kernel/git/tip/tip

Pull scheduler fixes/updates from Thomas Gleixner:

 - Deduplicate the average computations in the scheduler core and the
   fair class code.

 - Fix a raise between runtime distribution and assignement which can
   cause exceeding the quota by up to 70%.

 - Prevent negative results in the imbalanace calculation

 - Remove a stale warning in the workqueue code which can be triggered
   since the call site was moved out of preempt disabled code. It's a
   false positive.

 - Deduplicate the print macros for procfs

 - Add the ucmap values to the SCHED_DEBUG procfs output for completness

* tag 'sched-urgent-2020-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/debug: Add task uclamp values to SCHED_DEBUG procfs
  sched/debug: Factor out printing formats into common macros
  sched/debug: Remove redundant macro define
  sched/core: Remove unused rq::last_load_update_tick
  workqueue: Remove the warning in wq_worker_sleeping()
  sched/fair: Fix negative imbalance in imbalance calculation
  sched/fair: Fix race between runtime distribution and assignment
  sched/fair: Align rq->avg_idle and rq->avg_scan_cost

4 years agoMerge tag 'perf-urgent-2020-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 12 Apr 2020 17:05:24 +0000 (10:05 -0700)]
Merge tag 'perf-urgent-2020-04-12' of git://git./linux/kernel/git/tip/tip

Pull perf fixes from Thomas Gleixner:
 "Three fixes/updates for perf:

   - Fix the perf event cgroup tracking which tries to track the cgroup
     even for disabled events.

   - Add Ice Lake server support for uncore events

   - Disable pagefaults when retrieving the physical address in the
     sampling code"

* tag 'perf-urgent-2020-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Disable page faults when getting phys address
  perf/x86/intel/uncore: Add Ice Lake server uncore support
  perf/cgroup: Correct indirection in perf_less_group_idx()
  perf/core: Fix event cgroup tracking

4 years agoMerge tag 'locking-urgent-2020-04-12' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 12 Apr 2020 16:47:10 +0000 (09:47 -0700)]
Merge tag 'locking-urgent-2020-04-12' of git://git./linux/kernel/git/tip/tip

Pull locking fixes from Thomas Gleixner:
 "Three small fixes/updates for the locking core code:

   - Plug a task struct reference leak in the percpu rswem
     implementation.

   - Document the refcount interaction with PID_MAX_LIMIT

   - Improve the 'invalid wait context' data dump in lockdep so it
     contains all information which is required to decode the problem"

* tag 'locking-urgent-2020-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/lockdep: Improve 'invalid wait context' splat
  locking/refcount: Document interaction with PID_MAX_LIMIT
  locking/percpu-rwsem: Fix a task_struct refcount

4 years agoMerge tag '5.7-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Sun, 12 Apr 2020 16:41:01 +0000 (09:41 -0700)]
Merge tag '5.7-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
 "Ten cifs/smb fixes:

   - five RDMA (smbdirect) related fixes

   - add experimental support for swap over SMB3 mounts

   - also a fix which improves performance of signed connections"

* tag '5.7-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
  smb3: enable swap on SMB3 mounts
  smb3: change noisy error message to FYI
  smb3: smbdirect support can be configured by default
  cifs: smbd: Do not schedule work to send immediate packet on every receive
  cifs: smbd: Properly process errors on ib_post_send
  cifs: Allocate crypto structures on the fly for calculating signatures of incoming packets
  cifs: smbd: Update receive credits before sending and deal with credits roll back on failure before sending
  cifs: smbd: Check send queue size before posting a send
  cifs: smbd: Merge code to track pending packets
  cifs: ignore cached share root handle closing errors

4 years agoMerge tag 'nfs-for-5.7-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Linus Torvalds [Sun, 12 Apr 2020 16:39:47 +0000 (09:39 -0700)]
Merge tag 'nfs-for-5.7-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfix from Trond Myklebust:
 "Fix an RCU read lock leakage in pnfs_alloc_ds_commits_list()"

* tag 'nfs-for-5.7-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  pNFS: Fix RCU lock leakage

4 years agoMerge tag 'nios2-v5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan...
Linus Torvalds [Sat, 11 Apr 2020 18:38:44 +0000 (11:38 -0700)]
Merge tag 'nios2-v5.7-rc1' of git://git./linux/kernel/git/lftan/nios2

Pull nios2 updates from Ley Foon Tan:

 - Remove nios2-dev@lists.rocketboards.org from MAINTAINERS

 - remove 'resetvalue' property

 - rename 'altr,gpio-bank-width' -> 'altr,ngpio'

 - enable the common clk subsystem on Nios2

* tag 'nios2-v5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2:
  MAINTAINERS: Remove nios2-dev@lists.rocketboards.org
  arch: nios2: remove 'resetvalue' property
  arch: nios2: rename 'altr,gpio-bank-width' -> 'altr,ngpio'
  arch: nios2: Enable the common clk subsystem on Nios2

4 years agoMerge tag 'dma-mapping-5.7-1' of git://git.infradead.org/users/hch/dma-mapping
Linus Torvalds [Sat, 11 Apr 2020 18:34:36 +0000 (11:34 -0700)]
Merge tag 'dma-mapping-5.7-1' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping fixes from Christoph Hellwig:

 - fix an integer truncation in dma_direct_get_required_mask
   (Kishon Vijay Abraham)

 - fix the display of dma mapping types (Grygorii Strashko)

* tag 'dma-mapping-5.7-1' of git://git.infradead.org/users/hch/dma-mapping:
  dma-debug: fix displaying of dma allocation type
  dma-direct: fix data truncation in dma_direct_get_required_mask()

4 years agoMerge tag 'kbuild-v5.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy...
Linus Torvalds [Sat, 11 Apr 2020 16:46:12 +0000 (09:46 -0700)]
Merge tag 'kbuild-v5.7-2' of git://git./linux/kernel/git/masahiroy/linux-kbuild

Pull more Kbuild updates from Masahiro Yamada:

 - raise minimum supported binutils version to 2.23

 - remove old CONFIG_AS_* macros that we know binutils >= 2.23 supports

 - move remaining CONFIG_AS_* tests to Kconfig from Makefile

 - enable -Wtautological-compare warnings to catch more issues

 - do not support GCC plugins for GCC <= 4.7

 - fix various breakages of 'make xconfig'

 - include the linker version used for linking the kernel into
   LINUX_COMPILER, which is used for the banner, and also exposed to
   /proc/version

 - link lib-y objects to vmlinux forcibly when CONFIG_MODULES=y, which
   allows us to remove the lib-ksyms.o workaround, and to solve the last
   known issue of the LLVM linker

 - add dummy tools in scripts/dummy-tools/ to enable all compiler tests
   in Kconfig, which will be useful for distro maintainers

 - support the single switch, LLVM=1 to use Clang and all LLVM utilities
   instead of GCC and Binutils.

 - support LLVM_IAS=1 to enable the integrated assembler, which is still
   experimental

* tag 'kbuild-v5.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (36 commits)
  kbuild: fix comment about missing include guard detection
  kbuild: support LLVM=1 to switch the default tools to Clang/LLVM
  kbuild: replace AS=clang with LLVM_IAS=1
  kbuild: add dummy toolchains to enable all cc-option etc. in Kconfig
  kbuild: link lib-y objects to vmlinux forcibly when CONFIG_MODULES=y
  MIPS: fw: arc: add __weak to prom_meminit and prom_free_prom_memory
  kbuild: remove -I$(srctree)/tools/include from scripts/Makefile
  kbuild: do not pass $(KBUILD_CFLAGS) to scripts/mkcompile_h
  Documentation/llvm: fix the name of llvm-size
  kbuild: mkcompile_h: Include $LD version in /proc/version
  kconfig: qconf: Fix a few alignment issues
  kconfig: qconf: remove some old bogus TODOs
  kconfig: qconf: fix support for the split view mode
  kconfig: qconf: fix the content of the main widget
  kconfig: qconf: Change title for the item window
  kconfig: qconf: clean deprecated warnings
  gcc-plugins: drop support for GCC <= 4.7
  kbuild: Enable -Wtautological-compare
  x86: update AS_* macros to binutils >=2.23, supporting ADX and AVX2
  crypto: x86 - clean up poly1305-x86_64-cryptogams.S by 'make clean'
  ...

4 years agomailmap: Add Sedat Dilek (replacement for expired email address)
Sedat Dilek [Sat, 11 Apr 2020 13:29:43 +0000 (15:29 +0200)]
mailmap: Add Sedat Dilek (replacement for expired email address)

I do not longer work for credativ Germany.

Please, use my private email address instead.

This is for the case when people want to CC me on
patches sent from my old business email address.

Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agopNFS: Fix RCU lock leakage
Trond Myklebust [Sat, 11 Apr 2020 15:37:18 +0000 (11:37 -0400)]
pNFS: Fix RCU lock leakage

Another brown paper bag moment. pnfs_alloc_ds_commits_list() is leaking
the RCU lock.

Fixes: a9901899b649 ("pNFS: Add infrastructure for cleaning up per-layout commit structures")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
4 years agoKVM: VMX: Extend VMXs #AC interceptor to handle split lock #AC in guest
Xiaoyao Li [Fri, 10 Apr 2020 11:54:02 +0000 (13:54 +0200)]
KVM: VMX: Extend VMXs #AC interceptor to handle split lock #AC in guest

Two types of #AC can be generated in Intel CPUs:
 1. legacy alignment check #AC
 2. split lock #AC

Reflect #AC back into the guest if the guest has legacy alignment checks
enabled or if split lock detection is disabled.

If the #AC is not a legacy one and split lock detection is enabled, then
invoke handle_guest_split_lock() which will either warn and disable split
lock detection for this task or force SIGBUS on it.

[ tglx: Switch it to handle_guest_split_lock() and rename the misnamed
  helper function. ]

Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lkml.kernel.org/r/20200410115517.176308876@linutronix.de
4 years agoKVM: x86: Emulate split-lock access as a write in emulator
Xiaoyao Li [Fri, 10 Apr 2020 11:54:01 +0000 (13:54 +0200)]
KVM: x86: Emulate split-lock access as a write in emulator

Emulate split-lock accesses as writes if split lock detection is on
to avoid #AC during emulation, which will result in a panic(). This
should never occur for a well-behaved guest, but a malicious guest can
manipulate the TLB to trigger emulation of a locked instruction[1].

More discussion can be found at [2][3].

[1] https://lkml.kernel.org/r/8c5b11c9-58df-38e7-a514-dc12d687b198@redhat.com
[2] https://lkml.kernel.org/r/20200131200134.GD18946@linux.intel.com
[3] https://lkml.kernel.org/r/20200227001117.GX9940@linux.intel.com

Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lkml.kernel.org/r/20200410115517.084300242@linutronix.de
4 years agox86/split_lock: Provide handle_guest_split_lock()
Thomas Gleixner [Fri, 10 Apr 2020 11:54:00 +0000 (13:54 +0200)]
x86/split_lock: Provide handle_guest_split_lock()

Without at least minimal handling for split lock detection induced #AC,
VMX will just run into the same problem as the VMWare hypervisor, which
was reported by Kenneth.

It will inject the #AC blindly into the guest whether the guest is
prepared or not.

Provide a function for guest mode which acts depending on the host
SLD mode. If mode == sld_warn, treat it like user space, i.e. emit a
warning, disable SLD and mark the task accordingly. Otherwise force
SIGBUS.

 [ bp: Add a !CPU_SUP_INTEL stub for handle_guest_split_lock(). ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lkml.kernel.org/r/20200410115516.978037132@linutronix.de
Link: https://lkml.kernel.org/r/20200402123258.895628824@linutronix.de
4 years agokbuild: fix comment about missing include guard detection
Masahiro Yamada [Wed, 8 Apr 2020 18:29:19 +0000 (03:29 +0900)]
kbuild: fix comment about missing include guard detection

The keyword here is 'twice' to explain the trick.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
4 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Sat, 11 Apr 2020 00:57:48 +0000 (17:57 -0700)]
Merge branch 'akpm' (patches from Andrew)

Merge yet more updates from Andrew Morton:

 - Almost all of the rest of MM (memcg, slab-generic, slab, pagealloc,
   gup, hugetlb, pagemap, memremap)

 - Various other things (hfs, ocfs2, kmod, misc, seqfile)

* akpm: (34 commits)
  ipc/util.c: sysvipc_find_ipc() should increase position index
  kernel/gcov/fs.c: gcov_seq_next() should increase position index
  fs/seq_file.c: seq_read(): add info message about buggy .next functions
  drivers/dma/tegra20-apb-dma.c: fix platform_get_irq.cocci warnings
  change email address for Pali Rohár
  selftests: kmod: test disabling module autoloading
  selftests: kmod: fix handling test numbers above 9
  docs: admin-guide: document the kernel.modprobe sysctl
  fs/filesystems.c: downgrade user-reachable WARN_ONCE() to pr_warn_once()
  kmod: make request_module() return an error when autoloading is disabled
  mm/memremap: set caching mode for PCI P2PDMA memory to WC
  mm/memory_hotplug: add pgprot_t to mhp_params
  powerpc/mm: thread pgprot_t through create_section_mapping()
  x86/mm: introduce __set_memory_prot()
  x86/mm: thread pgprot_t through init_memory_mapping()
  mm/memory_hotplug: rename mhp_restrictions to mhp_params
  mm/memory_hotplug: drop the flags field from struct mhp_restrictions
  mm/special: create generic fallbacks for pte_special() and pte_mkspecial()
  mm/vma: introduce VM_ACCESS_FLAGS
  mm/vma: define a default value for VM_DATA_DEFAULT_FLAGS
  ...

4 years agoMerge tag 'docs-5.7-2' of git://git.lwn.net/linux
Linus Torvalds [Sat, 11 Apr 2020 00:53:43 +0000 (17:53 -0700)]
Merge tag 'docs-5.7-2' of git://git.lwn.net/linux

Pull Documentation fixes from Jonathan Corbet:
 "A handful of late-arriving fixes for the documentation tree"

* tag 'docs-5.7-2' of git://git.lwn.net/linux:
  Documentation: android: binderfs: add 'stats' mount option
  Documentation: driver-api/usb/writing_usb_driver.rst Updates documentation links
  docs: driver-api: address duplicate label warning
  Documentation: sysrq: fix RST formatting
  docs: kernel-parameters.txt: Fix broken references
  docs: kernel-parameters.txt: Remove nompx
  docs: filesystems: fix typo in qnx6.rst

4 years agoMerge tag 'for-linus-5.7-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubca...
Linus Torvalds [Sat, 11 Apr 2020 00:50:01 +0000 (17:50 -0700)]
Merge tag 'for-linus-5.7-ofs1' of git://git./linux/kernel/git/hubcap/linux

Pull orangefs updates from Mike Marshall:
 "A fix and two cleanups.

  Fix:

   - Christoph Hellwig noticed that some logic I added to
     orangefs_file_read_iter introduced a race condition, so he sent a
     reversion patch. I had to modify his patch since reverting at this
     point broke Orangefs.

  Cleanups:

   - Christoph Hellwig noticed that we were doing some unnecessary work
     in orangefs_flush, so he sent in a patch that removed the un-needed
     code.

   - Al Viro told me he had trouble building Orangefs. Orangefs should
     be easy to build, even for Al :-).

     I looked back at the test server build notes in orangefs.txt, just
     in case that's where the trouble really is, and found a couple of
     typos and made a couple of clarifications"

* tag 'for-linus-5.7-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
  orangefs: clarify build steps for test server in orangefs.txt
  orangefs: don't mess with I_DIRTY_TIMES in orangefs_flush
  orangefs: get rid of knob code...

4 years agoMerge tag 'xtensa-20200410' of git://github.com/jcmvbkbc/linux-xtensa
Linus Torvalds [Sat, 11 Apr 2020 00:39:20 +0000 (17:39 -0700)]
Merge tag 'xtensa-20200410' of git://github.com/jcmvbkbc/linux-xtensa

Pull xtensa updates from Max Filippov:

 - replace setup_irq() by request_irq()

 - cosmetic fixes in xtensa Kconfig and boot/Makefile

* tag 'xtensa-20200410' of git://github.com/jcmvbkbc/linux-xtensa:
  arch/xtensa: fix grammar in Kconfig help text
  xtensa: remove meaningless export ccflags-y
  xtensa: replace setup_irq() by request_irq()

4 years agoMerge tag 'for-linus-5.7-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 11 Apr 2020 00:20:06 +0000 (17:20 -0700)]
Merge tag 'for-linus-5.7-rc1b-tag' of git://git./linux/kernel/git/xen/tip

Pull more xen updates from Juergen Gross:

 - two cleanups

 - fix a boot regression introduced in this merge window

 - fix wrong use of memory allocation flags

* tag 'for-linus-5.7-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  x86/xen: fix booting 32-bit pv guest
  x86/xen: make xen_pvmmu_arch_setup() static
  xen/blkfront: fix memory allocation flags in blkfront_setup_indirect()
  xen: Use evtchn_type_t as a type for event channels

4 years agoipc/util.c: sysvipc_find_ipc() should increase position index
Vasily Averin [Fri, 10 Apr 2020 21:34:13 +0000 (14:34 -0700)]
ipc/util.c: sysvipc_find_ipc() should increase position index

If seq_file .next function does not change position index, read after
some lseek can generate unexpected output.

https://bugzilla.kernel.org/show_bug.cgi?id=206283
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Waiman Long <longman@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: NeilBrown <neilb@suse.com>
Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/b7a20945-e315-8bb0-21e6-3875c14a8494@virtuozzo.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agokernel/gcov/fs.c: gcov_seq_next() should increase position index
Vasily Averin [Fri, 10 Apr 2020 21:34:10 +0000 (14:34 -0700)]
kernel/gcov/fs.c: gcov_seq_next() should increase position index

If seq_file .next function does not change position index, read after
some lseek can generate unexpected output.

https://bugzilla.kernel.org/show_bug.cgi?id=206283
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: NeilBrown <neilb@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Waiman Long <longman@redhat.com>
Link: http://lkml.kernel.org/r/f65c6ee7-bd00-f910-2f8a-37cc67e4ff88@virtuozzo.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agofs/seq_file.c: seq_read(): add info message about buggy .next functions
Vasily Averin [Fri, 10 Apr 2020 21:34:06 +0000 (14:34 -0700)]
fs/seq_file.c: seq_read(): add info message about buggy .next functions

Patch series "seq_file .next functions should increase position index".

In Aug 2018 NeilBrown noticed commit 1f4aace60b0e ("fs/seq_file.c:
simplify seq_file iteration code and interface")

"Some ->next functions do not increment *pos when they return NULL...
Note that such ->next functions are buggy and should be fixed.  A simple
demonstration is dd if=/proc/swaps bs=1000 skip=1 Choose any block size
larger than the size of /proc/swaps.  This will always show the whole
last line of /proc/swaps"

Described problem is still actual.  If you make lseek into middle of
last output line following read will output end of last line and whole
last line once again.

  $ dd if=/proc/swaps bs=1  # usual output
  Filename Type Size Used Priority
  /dev/dm-0                             partition 4194812 97536 -2
  104+0 records in
  104+0 records out
  104 bytes copied

  $ dd if=/proc/swaps bs=40 skip=1    # last line was generated twice
  dd: /proc/swaps: cannot skip to specified offset
  v/dm-0                                partition 4194812 97536 -2
  /dev/dm-0                             partition 4194812 97536 -2
  3+1 records in
  3+1 records out
  131 bytes copied

There are lot of other affected files, I've found 30+ including
/proc/net/ip_tables_matches and /proc/sysvipc/*

I've sent patches into maillists of affected subsystems already, this
patch-set fixes the problem in files related to pstore, tracing, gcov,
sysvipc and other subsystems processed via linux-kernel@ mailing list
directly

https://bugzilla.kernel.org/show_bug.cgi?id=206283

This patch (of 4):

Add debug code to seq_read() to detect missed or out-of-tree incorrect
.next seq_file functions.

[akpm@linux-foundation.org: s/pr_info/pr_info_ratelimited/, per Qian Cai]
https://bugzilla.kernel.org/show_bug.cgi?id=206283
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: NeilBrown <neilb@suse.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
Cc: Waiman Long <longman@redhat.com>
Link: http://lkml.kernel.org/r/244674e5-760c-86bd-d08a-047042881748@virtuozzo.com
Link: http://lkml.kernel.org/r/7c24087c-e280-e580-5b0c-0cdaeb14cd18@virtuozzo.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agodrivers/dma/tegra20-apb-dma.c: fix platform_get_irq.cocci warnings
kbuild test robot [Fri, 10 Apr 2020 21:34:03 +0000 (14:34 -0700)]
drivers/dma/tegra20-apb-dma.c: fix platform_get_irq.cocci warnings

Remove dev_err() messages after platform_get_irq*() failures.
platform_get_irq() already prints an error.

Generated by: scripts/coccinelle/api/platform_get_irq.cocci

Fixes: 6c41ac96ad92 ("dmaengine: tegra-apb: Support COMPILE_TEST")
Signed-off-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Julia Lawall <julia.lawall@inria.fr>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Cc: Laxman Dewangan <ldewangan@nvidia.com>
Cc: Vinod Koul <vinod.koul@linux.intel.com>
Cc: Stephen Warren <swarren@wwwdotorg.org>
Cc: Jon Hunter <jonathanh@nvidia.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.21.2002271133450.2973@hadrien
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agochange email address for Pali Rohár
Pali Rohár [Fri, 10 Apr 2020 21:34:00 +0000 (14:34 -0700)]
change email address for Pali Rohár

For security reasons I stopped using gmail account and kernel address is
now up-to-date alias to my personal address.

People periodically send me emails to address which they found in source
code of drivers, so this change reflects state where people can contact
me.

[ Added .mailmap entry as per Joe Perches  - Linus ]
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Joe Perches <joe@perches.com>
Link: http://lkml.kernel.org/r/20200307104237.8199-1-pali@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoselftests: kmod: test disabling module autoloading
Eric Biggers [Fri, 10 Apr 2020 21:33:57 +0000 (14:33 -0700)]
selftests: kmod: test disabling module autoloading

Test that request_module() fails with -ENOENT when
/proc/sys/kernel/modprobe contains (a) a nonexistent path, and (b) an
empty path.

Case (b) is a regression test for the patch "kmod: make request_module()
return an error when autoloading is disabled".

Tested with 'kmod.sh -t 0010 && kmod.sh -t 0011', and also simply with
'kmod.sh' to run all kmod tests.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: NeilBrown <neilb@suse.com>
Link: http://lkml.kernel.org/r/20200312202552.241885-5-ebiggers@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoselftests: kmod: fix handling test numbers above 9
Eric Biggers [Fri, 10 Apr 2020 21:33:53 +0000 (14:33 -0700)]
selftests: kmod: fix handling test numbers above 9

get_test_count() and get_test_enabled() were broken for test numbers
above 9 due to awk interpreting a field specification like '$0010' as
octal rather than decimal.  Fix it by stripping the leading zeroes.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: NeilBrown <neilb@suse.com>
Link: http://lkml.kernel.org/r/20200318230515.171692-5-ebiggers@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agodocs: admin-guide: document the kernel.modprobe sysctl
Eric Biggers [Fri, 10 Apr 2020 21:33:50 +0000 (14:33 -0700)]
docs: admin-guide: document the kernel.modprobe sysctl

Document the kernel.modprobe sysctl in the same place that all the other
kernel.* sysctls are documented.  Make sure to mention how to use this
sysctl to completely disable module autoloading, and how this sysctl
relates to CONFIG_STATIC_USERMODEHELPER.

[ebiggers@google.com: v5]
Link: http://lkml.kernel.org/r/20200318230515.171692-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: NeilBrown <neilb@suse.com>
Link: http://lkml.kernel.org/r/20200312202552.241885-4-ebiggers@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agofs/filesystems.c: downgrade user-reachable WARN_ONCE() to pr_warn_once()
Eric Biggers [Fri, 10 Apr 2020 21:33:47 +0000 (14:33 -0700)]
fs/filesystems.c: downgrade user-reachable WARN_ONCE() to pr_warn_once()

After request_module(), nothing is stopping the module from being
unloaded until someone takes a reference to it via try_get_module().

The WARN_ONCE() in get_fs_type() is thus user-reachable, via userspace
running 'rmmod' concurrently.

Since WARN_ONCE() is for kernel bugs only, not for user-reachable
situations, downgrade this warning to pr_warn_once().

Keep it printed once only, since the intent of this warning is to detect
a bug in modprobe at boot time.  Printing the warning more than once
wouldn't really provide any useful extra information.

Fixes: 41124db869b7 ("fs: warn in case userspace lied about modprobe return")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jessica Yu <jeyu@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: NeilBrown <neilb@suse.com>
Cc: <stable@vger.kernel.org> [4.13+]
Link: http://lkml.kernel.org/r/20200312202552.241885-3-ebiggers@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agokmod: make request_module() return an error when autoloading is disabled
Eric Biggers [Fri, 10 Apr 2020 21:33:43 +0000 (14:33 -0700)]
kmod: make request_module() return an error when autoloading is disabled

Patch series "module autoloading fixes and cleanups", v5.

This series fixes a bug where request_module() was reporting success to
kernel code when module autoloading had been completely disabled via
'echo > /proc/sys/kernel/modprobe'.

It also addresses the issues raised on the original thread
(https://lkml.kernel.org/lkml/20200310223731.126894-1-ebiggers@kernel.org/T/#u)
bydocumenting the modprobe sysctl, adding a self-test for the empty path
case, and downgrading a user-reachable WARN_ONCE().

This patch (of 4):

It's long been possible to disable kernel module autoloading completely
(while still allowing manual module insertion) by setting
/proc/sys/kernel/modprobe to the empty string.

This can be preferable to setting it to a nonexistent file since it
avoids the overhead of an attempted execve(), avoids potential
deadlocks, and avoids the call to security_kernel_module_request() and
thus on SELinux-based systems eliminates the need to write SELinux rules
to dontaudit module_request.

However, when module autoloading is disabled in this way,
request_module() returns 0.  This is broken because callers expect 0 to
mean that the module was successfully loaded.

Apparently this was never noticed because this method of disabling
module autoloading isn't used much, and also most callers don't use the
return value of request_module() since it's always necessary to check
whether the module registered its functionality or not anyway.

But improperly returning 0 can indeed confuse a few callers, for example
get_fs_type() in fs/filesystems.c where it causes a WARNING to be hit:

if (!fs && (request_module("fs-%.*s", len, name) == 0)) {
fs = __get_fs_type(name, len);
WARN_ONCE(!fs, "request_module fs-%.*s succeeded, but still no fs?\n", len, name);
}

This is easily reproduced with:

echo > /proc/sys/kernel/modprobe
mount -t NONEXISTENT none /

It causes:

request_module fs-NONEXISTENT succeeded, but still no fs?
WARNING: CPU: 1 PID: 1106 at fs/filesystems.c:275 get_fs_type+0xd6/0xf0
[...]

This should actually use pr_warn_once() rather than WARN_ONCE(), since
it's also user-reachable if userspace immediately unloads the module.
Regardless, request_module() should correctly return an error when it
fails.  So let's make it return -ENOENT, which matches the error when
the modprobe binary doesn't exist.

I've also sent patches to document and test this case.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Jessica Yu <jeyu@kernel.org>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: Ben Hutchings <benh@debian.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200310223731.126894-1-ebiggers@kernel.org
Link: http://lkml.kernel.org/r/20200312202552.241885-1-ebiggers@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/memremap: set caching mode for PCI P2PDMA memory to WC
Logan Gunthorpe [Fri, 10 Apr 2020 21:33:39 +0000 (14:33 -0700)]
mm/memremap: set caching mode for PCI P2PDMA memory to WC

PCI BAR IO memory should never be mapped as WB, however prior to this
the PAT bits were set WB and it was typically overridden by MTRR
registers set by the firmware.

Set PCI P2PDMA memory to be UC as this is what it currently, typically,
ends up being mapped as on x86 after the MTRR registers override the
cache setting.

Future use-cases may need to generalize this by adding flags to select
the caching type, as some P2PDMA cases may not want UC.  However, those
use-cases are not upstream yet and this can be changed when they arrive.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Eric Badger <ebadger@gigaio.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Link: http://lkml.kernel.org/r/20200306170846.9333-8-logang@deltatee.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/memory_hotplug: add pgprot_t to mhp_params
Logan Gunthorpe [Fri, 10 Apr 2020 21:33:36 +0000 (14:33 -0700)]
mm/memory_hotplug: add pgprot_t to mhp_params

devm_memremap_pages() is currently used by the PCI P2PDMA code to create
struct page mappings for IO memory.  At present, these mappings are
created with PAGE_KERNEL which implies setting the PAT bits to be WB.
However, on x86, an mtrr register will typically override this and force
the cache type to be UC-.  In the case firmware doesn't set this
register it is effectively WB and will typically result in a machine
check exception when it's accessed.

Other arches are not currently likely to function correctly seeing they
don't have any MTRR registers to fall back on.

To solve this, provide a way to specify the pgprot value explicitly to
arch_add_memory().

Of the arches that support MEMORY_HOTPLUG: x86_64, and arm64 need a
simple change to pass the pgprot_t down to their respective functions
which set up the page tables.  For x86_32, set the page tables
explicitly using _set_memory_prot() (seeing they are already mapped).

For ia64, s390 and sh, reject anything but PAGE_KERNEL settings -- this
should be fine, for now, seeing these architectures don't support
ZONE_DEVICE.

A check in __add_pages() is also added to ensure the pgprot parameter
was set for all arches.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Badger <ebadger@gigaio.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Link: http://lkml.kernel.org/r/20200306170846.9333-7-logang@deltatee.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agopowerpc/mm: thread pgprot_t through create_section_mapping()
Logan Gunthorpe [Fri, 10 Apr 2020 21:33:32 +0000 (14:33 -0700)]
powerpc/mm: thread pgprot_t through create_section_mapping()

In prepartion to support a pgprot_t argument for arch_add_memory().

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Eric Badger <ebadger@gigaio.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Link: http://lkml.kernel.org/r/20200306170846.9333-6-logang@deltatee.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agox86/mm: introduce __set_memory_prot()
Logan Gunthorpe [Fri, 10 Apr 2020 21:33:28 +0000 (14:33 -0700)]
x86/mm: introduce __set_memory_prot()

For use in the 32bit arch_add_memory() to set the pgprot type of the
memory to add.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Eric Badger <ebadger@gigaio.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Will Deacon <will@kernel.org>
Link: http://lkml.kernel.org/r/20200306170846.9333-5-logang@deltatee.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agox86/mm: thread pgprot_t through init_memory_mapping()
Logan Gunthorpe [Fri, 10 Apr 2020 21:33:24 +0000 (14:33 -0700)]
x86/mm: thread pgprot_t through init_memory_mapping()

In preparation to support a pgprot_t argument for arch_add_memory().

It's required to move the prototype of init_memory_mapping() seeing the
original location came before the definition of pgprot_t.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Eric Badger <ebadger@gigaio.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Will Deacon <will@kernel.org>
Link: http://lkml.kernel.org/r/20200306170846.9333-4-logang@deltatee.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/memory_hotplug: rename mhp_restrictions to mhp_params
Logan Gunthorpe [Fri, 10 Apr 2020 21:33:21 +0000 (14:33 -0700)]
mm/memory_hotplug: rename mhp_restrictions to mhp_params

The mhp_restrictions struct really doesn't specify anything resembling a
restriction anymore so rename it to be mhp_params as it is a list of
extended parameters.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Badger <ebadger@gigaio.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Link: http://lkml.kernel.org/r/20200306170846.9333-3-logang@deltatee.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/memory_hotplug: drop the flags field from struct mhp_restrictions
Logan Gunthorpe [Fri, 10 Apr 2020 21:33:17 +0000 (14:33 -0700)]
mm/memory_hotplug: drop the flags field from struct mhp_restrictions

Patch series "Allow setting caching mode in arch_add_memory() for
P2PDMA", v4.

Currently, the page tables created using memremap_pages() are always
created with the PAGE_KERNEL cacheing mode.  However, the P2PDMA code is
creating pages for PCI BAR memory which should never be accessed through
the cache and instead use either WC or UC.  This still works in most
cases, on x86, because the MTRR registers typically override the caching
settings in the page tables for all of the IO memory to be UC-.
However, this tends not to work so well on other arches or some rare x86
machines that have firmware which does not setup the MTRR registers in
this way.

Instead of this, this series proposes a change to arch_add_memory() to
take the pgprot required by the mapping which allows us to explicitly
set pagetable entries for P2PDMA memory to UC.

This changes is pretty routine for most of the arches: x86_64, arm64 and
powerpc simply need to thread the pgprot through to where the page
tables are setup.  x86_32 unfortunately sets up the page tables at boot
so must use _set_memory_prot() to change their caching mode.  ia64, s390
and sh don't appear to have an easy way to change the page tables so,
for now at least, we just return -EINVAL on such mappings and thus they
will not support P2PDMA memory until the work for this is done.  This
should be fine as they don't yet support ZONE_DEVICE.

This patch (of 7):

This variable is not used anywhere and should therefore be removed from
the structure.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Eric Badger <ebadger@gigaio.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Link: http://lkml.kernel.org/r/20200306170846.9333-2-logang@deltatee.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/special: create generic fallbacks for pte_special() and pte_mkspecial()
Anshuman Khandual [Fri, 10 Apr 2020 21:33:13 +0000 (14:33 -0700)]
mm/special: create generic fallbacks for pte_special() and pte_mkspecial()

Currently there are many platforms that dont enable ARCH_HAS_PTE_SPECIAL
but required to define quite similar fallback stubs for special page
table entry helpers such as pte_special() and pte_mkspecial(), as they
get build in generic MM without a config check.  This creates two
generic fallback stub definitions for these helpers, eliminating much
code duplication.

mips platform has a special case where pte_special() and pte_mkspecial()
visibility is wider than what ARCH_HAS_PTE_SPECIAL enablement requires.
This restricts those symbol visibility in order to avoid redefinitions
which is now exposed through this new generic stubs and subsequent build
failure.  arm platform set_pte_at() definition needs to be moved into a
C file just to prevent a build failure.

[anshuman.khandual@arm.com: use defined(CONFIG_ARCH_HAS_PTE_SPECIAL) in mips per Thomas]
Link: http://lkml.kernel.org/r/1583851924-21603-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Guo Ren <guoren@kernel.org> [csky]
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Acked-by: Stafford Horne <shorne@gmail.com> [openrisc]
Acked-by: Helge Deller <deller@gmx.de> [parisc]
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Sam Creasey <sammy@sammy.net>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paulburton@kernel.org>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Chris Zankel <chris@zankel.net>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Link: http://lkml.kernel.org/r/1583802551-15406-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/vma: introduce VM_ACCESS_FLAGS
Anshuman Khandual [Fri, 10 Apr 2020 21:33:09 +0000 (14:33 -0700)]
mm/vma: introduce VM_ACCESS_FLAGS

There are many places where all basic VMA access flags (read, write,
exec) are initialized or checked against as a group.  One such example
is during page fault.  Existing vma_is_accessible() wrapper already
creates the notion of VMA accessibility as a group access permissions.

Hence lets just create VM_ACCESS_FLAGS (VM_READ|VM_WRITE|VM_EXEC) which
will not only reduce code duplication but also extend the VMA
accessibility concept in general.

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rob Springer <rspringer@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Link: http://lkml.kernel.org/r/1583391014-8170-3-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/vma: define a default value for VM_DATA_DEFAULT_FLAGS
Anshuman Khandual [Fri, 10 Apr 2020 21:33:05 +0000 (14:33 -0700)]
mm/vma: define a default value for VM_DATA_DEFAULT_FLAGS

There are many platforms with exact same value for VM_DATA_DEFAULT_FLAGS
This creates a default value for VM_DATA_DEFAULT_FLAGS in line with the
existing VM_STACK_DEFAULT_FLAGS.  While here, also define some more
macros with standard VMA access flag combinations that are used
frequently across many platforms.  Apart from simplification, this
reduces code duplication as well.

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Guo Ren <guoren@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paulburton@kernel.org>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Rich Felker <dalias@libc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Chris Zankel <chris@zankel.net>
Link: http://lkml.kernel.org/r/1583391014-8170-2-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/memory.c: add vm_insert_pages()
Arjun Roy [Fri, 10 Apr 2020 21:33:01 +0000 (14:33 -0700)]
mm/memory.c: add vm_insert_pages()

Add the ability to insert multiple pages at once to a user VM with lower
PTE spinlock operations.

The intention of this patch-set is to reduce atomic ops for tcp zerocopy
receives, which normally hits the same spinlock multiple times
consecutively.

[akpm@linux-foundation.org: pte_alloc() no longer takes the `addr' argument]
[arjunroy@google.com: add missing page_count() check to vm_insert_pages()]
Link: http://lkml.kernel.org/r/20200214005929.104481-1-arjunroy.kdev@gmail.com
[arjunroy@google.com: vm_insert_pages() checks if pte_index defined]
Link: http://lkml.kernel.org/r/20200228054714.204424-2-arjunroy.kdev@gmail.com
Signed-off-by: Arjun Roy <arjunroy@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: David Miller <davem@davemloft.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Link: http://lkml.kernel.org/r/20200128025958.43490-2-arjunroy.kdev@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm: define pte_index as macro for x86
Arjun Roy [Fri, 10 Apr 2020 21:32:58 +0000 (14:32 -0700)]
mm: define pte_index as macro for x86

pte_index() is either defined as a macro (e.g.  sparc64) or as an
inlined function (e.g.  x86).  vm_insert_pages() depends on pte_index
but it is not defined on all platforms (e.g.  m68k).

To fix compilation of vm_insert_pages() on architectures not providing
pte_index(), we perform the following fix:

0. For platforms where it is meaningful, and defined as a macro, no
    change is needed.
1. For platforms where it is meaningful and defined as an inlined
    function, and we want to use it with vm_insert_pages(), we define
    a degenerate macro of the form:  #define pte_index pte_index
2. vm_insert_pages() checks for the existence of a pte_index macro
   definition. If found, it implements a batched insert. If not found,
   it devolves to calling vm_insert_page() in a loop.

This patch implements step 1 for x86.

v3 of this patch fixes a compilation warning for an unused method.
v2 of this patch moved a macro definition to a more readable location.

Signed-off-by: Arjun Roy <arjunroy@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Link: http://lkml.kernel.org/r/20200228054714.204424-1-arjunroy.kdev@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm: bring sparc pte_index() semantics inline with other platforms
Arjun Roy [Fri, 10 Apr 2020 21:32:54 +0000 (14:32 -0700)]
mm: bring sparc pte_index() semantics inline with other platforms

pte_index() on platforms other than sparc return a numerical index.  On
sparc, it returns a pte_t*.  This presents an issue for
vm_insert_pages(), which relies on pte_index() to find the offset for a
pte within a pmd, for batched inserts.

This patch:
1. Modifies pte_index() for sparc to return a numerical index, like
   other platforms,
2. Defines pte_entry() for sparc which returns a pte_t*
   (as pte_index() used to),
3. Converts existing sparc callers for pte_index() to use pte_entry().

[sfr@canb.auug.org.au: remove pte_entry and just directly modified pte_offset_kernel instead]
Signed-off-by: Arjun Roy <arjunroy@google.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: David Miller <davem@davemloft.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Arjun Roy <arjunroy.kdev@gmail.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Link: http://lkml.kernel.org/r/20200227105045.6b421d9f@canb.auug.org.au
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/memory.c: refactor insert_page to prepare for batched-lock insert
Arjun Roy [Fri, 10 Apr 2020 21:32:51 +0000 (14:32 -0700)]
mm/memory.c: refactor insert_page to prepare for batched-lock insert

Add helper methods for vm_insert_page()/insert_page() to prepare for
vm_insert_pages(), which batch-inserts pages to reduce spinlock
operations when inserting multiple consecutive pages into the user page
table.

The intention of this patch-set is to reduce atomic ops for tcp zerocopy
receives, which normally hits the same spinlock multiple times
consecutively.

Signed-off-by: Arjun Roy <arjunroy@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: David Miller <davem@davemloft.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Link: http://lkml.kernel.org/r/20200128025958.43490-1-arjunroy.kdev@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/mmap.c: initialize align_offset explicitly for vm_unmapped_area
Jaewon Kim [Fri, 10 Apr 2020 21:32:48 +0000 (14:32 -0700)]
mm/mmap.c: initialize align_offset explicitly for vm_unmapped_area

On passing requirement to vm_unmapped_area, arch_get_unmapped_area and
arch_get_unmapped_area_topdown did not set align_offset.  Internally on
both unmapped_area and unmapped_area_topdown, if info->align_mask is 0,
then info->align_offset was meaningless.

But commit df529cabb7a2 ("mm: mmap: add trace point of
vm_unmapped_area") always prints info->align_offset even though it is
uninitialized.

Fix this uninitialized value issue by setting it to 0 explicitly.

Before:
  vm_unmapped_area: addr=0x755b155000 err=0 total_vm=0x15aaf0 flags=0x1 len=0x109000 lo=0x8000 hi=0x75eed48000 mask=0x0 ofs=0x4022

After:
  vm_unmapped_area: addr=0x74a4ca1000 err=0 total_vm=0x168ab1 flags=0x1 len=0x9000 lo=0x8000 hi=0x753d94b000 mask=0x0 ofs=0x0

Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michel Lespinasse <walken@google.com>
Cc: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20200409094035.19457-1-jaewon31.kim@samsung.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm: hugetlb: optionally allocate gigantic hugepages using cma
Roman Gushchin [Fri, 10 Apr 2020 21:32:45 +0000 (14:32 -0700)]
mm: hugetlb: optionally allocate gigantic hugepages using cma

Commit 944d9fec8d7a ("hugetlb: add support for gigantic page allocation
at runtime") has added the run-time allocation of gigantic pages.

However it actually works only at early stages of the system loading,
when the majority of memory is free.  After some time the memory gets
fragmented by non-movable pages, so the chances to find a contiguous 1GB
block are getting close to zero.  Even dropping caches manually doesn't
help a lot.

At large scale rebooting servers in order to allocate gigantic hugepages
is quite expensive and complex.  At the same time keeping some constant
percentage of memory in reserved hugepages even if the workload isn't
using it is a big waste: not all workloads can benefit from using 1 GB
pages.

The following solution can solve the problem:
1) On boot time a dedicated cma area* is reserved. The size is passed
   as a kernel argument.
2) Run-time allocations of gigantic hugepages are performed using the
   cma allocator and the dedicated cma area

In this case gigantic hugepages can be allocated successfully with a
high probability, however the memory isn't completely wasted if nobody
is using 1GB hugepages: it can be used for pagecache, anon memory, THPs,
etc.

* On a multi-node machine a per-node cma area is allocated on each node.
  Following gigantic hugetlb allocation are using the first available
  numa node if the mask isn't specified by a user.

Usage:
1) configure the kernel to allocate a cma area for hugetlb allocations:
   pass hugetlb_cma=10G as a kernel argument

2) allocate hugetlb pages as usual, e.g.
   echo 10 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages

If the option isn't enabled or the allocation of the cma area failed,
the current behavior of the system is preserved.

x86 and arm-64 are covered by this patch, other architectures can be
trivially added later.

The patch contains clean-ups and fixes proposed and implemented by Aslan
Bakirov and Randy Dunlap.  It also contains ideas and suggestions
proposed by Rik van Riel, Michal Hocko and Mike Kravetz.  Thanks!

Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Andreas Schaufler <andreas.schaufler@gmx.de>
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Michal Hocko <mhocko@kernel.org>
Cc: Aslan Bakirov <aslan@fb.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Link: http://lkml.kernel.org/r/20200407163840.92263-3-guro@fb.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm: cma: NUMA node interface
Aslan Bakirov [Fri, 10 Apr 2020 21:32:42 +0000 (14:32 -0700)]
mm: cma: NUMA node interface

I've noticed that there is no interface exposed by CMA which would let
me to declare contigous memory on particular NUMA node.

This patchset adds the ability to try to allocate contiguous memory on a
specific node.  It will fallback to other nodes if the specified one
doesn't work.

Implement a new method for declaring contigous memory on particular node
and keep cma_declare_contiguous() as a wrapper.

[akpm@linux-foundation.org: build fix]
Signed-off-by: Aslan Bakirov <aslan@fb.com>
Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@kernel.org>
Cc: Andreas Schaufler <andreas.schaufler@gmx.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Link: http://lkml.kernel.org/r/20200407163840.92263-2-guro@fb.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoocfs2: no need try to truncate file beyond i_size
Changwei Ge [Fri, 10 Apr 2020 21:32:38 +0000 (14:32 -0700)]
ocfs2: no need try to truncate file beyond i_size

Linux fallocate(2) with FALLOC_FL_PUNCH_HOLE mode set, its offset can
exceed the inode size.  Ocfs2 now doesn't allow that offset beyond inode
size.  This restriction is not necessary and violates fallocate(2)
semantics.

If fallocate(2) offset is beyond inode size, just return success and do
nothing further.

Otherwise, ocfs2 will crash the kernel.

  kernel BUG at fs/ocfs2//alloc.c:7264!
   ocfs2_truncate_inline+0x20f/0x360 [ocfs2]
   ocfs2_remove_inode_range+0x23c/0xcb0 [ocfs2]
   __ocfs2_change_file_space+0x4a5/0x650 [ocfs2]
   ocfs2_fallocate+0x83/0xa0 [ocfs2]
   vfs_fallocate+0x148/0x230
   SyS_fallocate+0x48/0x80
   do_syscall_64+0x79/0x170

Signed-off-by: Changwei Ge <chge@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200407082754.17565-1-chge@linux.alibaba.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/page_alloc: make pcpu_drain_mutex and pcpu_drain static
Jason Yan [Fri, 10 Apr 2020 21:32:32 +0000 (14:32 -0700)]
mm/page_alloc: make pcpu_drain_mutex and pcpu_drain static

Fix the following sparse warning:

  mm/page_alloc.c:106:1: warning: symbol 'pcpu_drain_mutex' was not declared. Should it be static?
  mm/page_alloc.c:107:1: warning: symbol '__pcpu_scope_pcpu_drain' was not declared. Should it be static?

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20200407023925.46438-1-yanaijie@huawei.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/page_alloc.c: fix kernel-doc warning
Randy Dunlap [Fri, 10 Apr 2020 21:32:29 +0000 (14:32 -0700)]
mm/page_alloc.c: fix kernel-doc warning

Add description of function parameter 'mt' to fix kernel-doc warning:

  mm/page_alloc.c:3246: warning: Function parameter or member 'mt' not described in '__putback_isolated_page'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Link: http://lkml.kernel.org/r/02998bd4-0b82-2f15-2570-f86130304d1e@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agodocs: mm: slab.h: fix a broken cross-reference
Mauro Carvalho Chehab [Fri, 10 Apr 2020 21:32:25 +0000 (14:32 -0700)]
docs: mm: slab.h: fix a broken cross-reference

There is a typo at the cross-reference link, causing this warning:

  include/linux/slab.h:11: WARNING: undefined label: memory-allocation (if the link has no caption the label must precede a section header)

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Link: http://lkml.kernel.org/r/0aeac24235d356ebd935d11e147dcc6edbb6465c.1586359676.git.mchehab+huawei@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>