OSDN Git Service

uclinux-h8/linux.git
5 years agopowerpc/eeh: Fix debugfs_simple_attr.cocci warnings
YueHaibing [Thu, 20 Dec 2018 02:42:51 +0000 (02:42 +0000)]
powerpc/eeh: Fix debugfs_simple_attr.cocci warnings

Use DEFINE_DEBUGFS_ATTRIBUTE rather than DEFINE_SIMPLE_ATTRIBUTE
for debugfs files.

Semantic patch information:
Rationale: DEFINE_SIMPLE_ATTRIBUTE + debugfs_create_file()
imposes some significant overhead as compared to
DEFINE_DEBUGFS_ATTRIBUTE + debugfs_create_file_unsafe().

Generated by: scripts/coccinelle/api/debugfs/debugfs_simple_attr.cocci

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/fsl: Add FSL_PPC_BOOK3E as supported arch for nospectre_v2 boot arg
Diana Craciun [Wed, 12 Dec 2018 14:03:10 +0000 (16:03 +0200)]
powerpc/fsl: Add FSL_PPC_BOOK3E as supported arch for nospectre_v2 boot arg

Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/fsl: Update Spectre v2 reporting
Diana Craciun [Wed, 12 Dec 2018 14:03:09 +0000 (16:03 +0200)]
powerpc/fsl: Update Spectre v2 reporting

Report branch predictor state flush as a mitigation for
Spectre variant 2.

Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/fsl: Enable runtime patching if nospectre_v2 boot arg is used
Diana Craciun [Wed, 12 Dec 2018 14:03:08 +0000 (16:03 +0200)]
powerpc/fsl: Enable runtime patching if nospectre_v2 boot arg is used

If the user choses not to use the mitigations, replace
the code sequence with nops.

Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/fsl: Flush branch predictor when entering KVM
Diana Craciun [Wed, 12 Dec 2018 14:03:07 +0000 (16:03 +0200)]
powerpc/fsl: Flush branch predictor when entering KVM

Switching from the guest to host is another place
where the speculative accesses can be exploited.
Flush the branch predictor when entering KVM.

Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/fsl: Flush the branch predictor at each kernel entry (32 bit)
Diana Craciun [Wed, 12 Dec 2018 14:03:06 +0000 (16:03 +0200)]
powerpc/fsl: Flush the branch predictor at each kernel entry (32 bit)

In order to protect against speculation attacks on
indirect branches, the branch predictor is flushed at
kernel entry to protect for the following situations:
- userspace process attacking another userspace process
- userspace process attacking the kernel
Basically when the privillege level change (i.e.the kernel
is entered), the branch predictor state is flushed.

Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/fsl: Flush the branch predictor at each kernel entry (64bit)
Diana Craciun [Wed, 12 Dec 2018 14:03:05 +0000 (16:03 +0200)]
powerpc/fsl: Flush the branch predictor at each kernel entry (64bit)

In order to protect against speculation attacks on
indirect branches, the branch predictor is flushed at
kernel entry to protect for the following situations:
- userspace process attacking another userspace process
- userspace process attacking the kernel
Basically when the privillege level change (i.e. the
kernel is entered), the branch predictor state is flushed.

Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/fsl: Add nospectre_v2 command line argument
Diana Craciun [Wed, 12 Dec 2018 14:03:04 +0000 (16:03 +0200)]
powerpc/fsl: Add nospectre_v2 command line argument

When the command line argument is present, the Spectre variant 2
mitigations are disabled.

Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/fsl: Emulate SPRN_BUCSR register
Diana Craciun [Wed, 12 Dec 2018 14:03:03 +0000 (16:03 +0200)]
powerpc/fsl: Emulate SPRN_BUCSR register

In order to flush the branch predictor the guest kernel performs
writes to the BUCSR register which is hypervisor privilleged. However,
the branch predictor is flushed at each KVM entry, so the branch
predictor has been already flushed, so just return as soon as possible
to guest.

Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
[mpe: Tweak comment formatting]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/fsl: Fix spectre_v2 mitigations reporting
Diana Craciun [Wed, 12 Dec 2018 14:03:02 +0000 (16:03 +0200)]
powerpc/fsl: Fix spectre_v2 mitigations reporting

Currently for CONFIG_PPC_FSL_BOOK3E the spectre_v2 file is incorrect:

  $ cat /sys/devices/system/cpu/vulnerabilities/spectre_v2
  "Mitigation: Software count cache flush"

Which is wrong. Fix it to report vulnerable for now.

Fixes: ee13cb249fab ("powerpc/64s: Add support for software count cache flush")
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/fsl: Add macro to flush the branch predictor
Diana Craciun [Wed, 12 Dec 2018 14:03:01 +0000 (16:03 +0200)]
powerpc/fsl: Add macro to flush the branch predictor

The BUCSR register can be used to invalidate the entries in the
branch prediction mechanisms.

Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/fsl: Add infrastructure to fixup branch predictor flush
Diana Craciun [Wed, 12 Dec 2018 14:03:00 +0000 (16:03 +0200)]
powerpc/fsl: Add infrastructure to fixup branch predictor flush

In order to protect against speculation attacks (Spectre
variant 2) on NXP PowerPC platforms, the branch predictor
should be flushed when the privillege level is changed.
This patch is adding the infrastructure to fixup at runtime
the code sections that are performing the branch predictor flush
depending on a boot arg parameter which is added later in a
separate patch.

Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/prom: move the device tree if not in declared memory.
Christophe Leroy [Mon, 17 Dec 2018 14:18:27 +0000 (14:18 +0000)]
powerpc/prom: move the device tree if not in declared memory.

If the device tree doesn't reside in the memory which is declared
inside it, it has to be moved as well as this memory will not be
mapped by the kernel.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc: Add some documentation of ISA versions
Michael Ellerman [Mon, 5 Nov 2018 09:01:01 +0000 (20:01 +1100)]
powerpc: Add some documentation of ISA versions

Add some documentation on which CPU versions map to which ISA
versions. This is all publicly available information, some of it
already in the kernel source, but it's much nicer to have it all in
one place.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/configs: Don't enable PPC_EARLY_DEBUG in defconfigs
Michael Ellerman [Fri, 14 Dec 2018 13:57:11 +0000 (00:57 +1100)]
powerpc/configs: Don't enable PPC_EARLY_DEBUG in defconfigs

This reverts the remains of commit b9ef7d6b11c1 ("powerpc: Update
default configurations").

That commit was proceeded by a commit which added a config option to
control use of BOOTX for early debug, ie. PPC_EARLY_DEBUG_BOOTX, and
then the update of the defconfigs was intended to not change behaviour
by then enabling the new config option.

However enabling PPC_EARLY_DEBUG had other consequences, notably
causing us to register the udbg console at the end of udbg_early_init().

This means on a system which doesn't have anything that BOOTX can
use (most systems), we register the udbg console very early but the
bootx code just throws everything away, meaning early boot messages
are never printed to the console.

What we want to happen is for the udbg console to only be registered
later (from setup_arch()) once we've setup udbg_putc, and then all
early boot messages will be replayed.

Fixes: b9ef7d6b11c1 ("powerpc: Update default configurations")
Reported-by: Torsten Duwe <duwe@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc: eeh_event: convert semaphore to completion
Arnd Bergmann [Mon, 10 Dec 2018 21:51:57 +0000 (22:51 +0100)]
powerpc: eeh_event: convert semaphore to completion

For this use case, completions and semaphores are equivalent,
but semaphores are an awkward interface that should generally
be avoided, so use the completion instead.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agoocxl/afu_irq: Don't include <asm/pnv-ocxl.h>
Greg Kurz [Mon, 10 Dec 2018 15:13:38 +0000 (16:13 +0100)]
ocxl/afu_irq: Don't include <asm/pnv-ocxl.h>

The AFU irq code doesn't need to reach out to the platform.

Signed-off-by: Greg Kurz <groug@kaod.org>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agoocxl: Clarify error path in setup_xsl_irq()
Greg Kurz [Mon, 10 Dec 2018 15:18:13 +0000 (16:18 +0100)]
ocxl: Clarify error path in setup_xsl_irq()

Implementing rollback with goto and labels is a common practice that
leads to prettier and more maintainable code. FWIW, this design pattern
is already being used in alloc_link() a few lines below in this file.

Do the same in setup_xsl_irq().

Signed-off-by: Greg Kurz <groug@kaod.org>
Acked-by: Frederic Barrat <fbarrat@linux.ibm.com>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/44x/bamboo: Fix PCI range
Benjamin Herrenschmidt [Tue, 11 Dec 2018 02:27:33 +0000 (13:27 +1100)]
powerpc/44x/bamboo: Fix PCI range

The bamboo dts has a bug: it uses a non-naturally aligned range
for PCI memory space. This isnt' supported by the code, thus
causing PCI to break on this system.

This is due to the fact that while the chip memory map has 1G
reserved for PCI memory, it's only 512M aligned. The code doesn't
know how to split that into 2 different PMMs and fails, so limit
the region to 512M.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/pasemi: Add Nemo board IRQ initroutine
Darren Stevens [Sun, 19 Aug 2018 20:26:28 +0000 (21:26 +0100)]
powerpc/pasemi: Add Nemo board IRQ initroutine

Add a IRQ init routine for the Nemo board which inits and attatches
the i8259 found in the SB600, and a cascade routine to dispatch the
interrupts.

Signed-off-by: Darren Stevens <darren@stevens-zone.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/pasemi: Add Nemo board device init code.
Darren Stevens [Sun, 19 Aug 2018 20:23:57 +0000 (21:23 +0100)]
powerpc/pasemi: Add Nemo board device init code.

Add routines for Nemo specific devices to init at boot time, these
being board level power-off and SB600's rtc.

Also add a run time variable to prevent these being activated
if we boot on a reference board.

Signed-off-by: Darren Stevens <darren@stevens-zone.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/pasemi: Add Nemo board IRQ initroutine
Darren Stevens [Sun, 19 Aug 2018 20:21:47 +0000 (21:21 +0100)]
powerpc/pasemi: Add Nemo board IRQ initroutine

Add a IRQ init routine for the Nemo board which inits and attatches
the i8259 found in the SB600, and a cascade routine to dispatch the
interrupts.

Signed-off-by: Darren Stevens <darren@stevens-zone.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/pasemi: Add PCI initialisation for Nemo board.
Darren Stevens [Sun, 19 Aug 2018 20:21:55 +0000 (21:21 +0100)]
powerpc/pasemi: Add PCI initialisation for Nemo board.

The A-Eon Amigaone X1000's Nemo motherboard has an AMD SB600
connected to one of the PCI-e root ports on its PaSemi
Pwrficient 1628M SoC. Normally the SB600 southbridge would be
connected to a hidden PCI-e port on the system's northbridge,
and as a result doesn't fully comply with the PCI-e spec.

Add code to relax the PCI-e detection in both the root port
and the Linux kernel allowing on board devices to be detected.

Signed-off-by: Darren Stevens <darren@stevens-zone.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/mm: Make NULL pointer deferences explicit on bad page faults.
Christophe Leroy [Fri, 14 Dec 2018 15:23:33 +0000 (15:23 +0000)]
powerpc/mm: Make NULL pointer deferences explicit on bad page faults.

As several other arches including x86, this patch makes it explicit
that a bad page fault is a NULL pointer dereference when the fault
address is lower than PAGE_SIZE

In the mean time, this page makes all bad_page_fault() messages
shorter so that they remain on one single line. And it prefixes them
by "BUG: " so that they get easily grepped.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Avoid pr_cont()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/ptrace: Combine SYSCALL_EMU & SYSCALL_TRACE handling
Dmitry V. Levin [Sun, 16 Dec 2018 17:28:28 +0000 (20:28 +0300)]
powerpc/ptrace: Combine SYSCALL_EMU & SYSCALL_TRACE handling

Combine the SYSCALL_EMU and SYSCALL_TRACE handling so that we only
call tracehook_report_syscall_entry() in one place.

Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
[mpe: Flesh out change log, s/cached_flags/flags/, reflow comments]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc: use mm zones more sensibly
Christoph Hellwig [Sun, 16 Dec 2018 16:53:49 +0000 (17:53 +0100)]
powerpc: use mm zones more sensibly

Powerpc has somewhat odd usage where ZONE_DMA is used for all memory on
common 64-bit configfs, and ZONE_DMA32 is used for 31-bit schemes.

Move to a scheme closer to what other architectures use (and I dare to
say the intent of the system):

 - ZONE_DMA: optionally for memory < 31-bit (64-bit embedded only)
 - ZONE_NORMAL: everything addressable by the kernel
 - ZONE_HIGHMEM: memory > 32-bit for 32-bit kernels

Also provide information on how ZONE_DMA is used by defining
ARCH_ZONE_DMA_BITS.

Contains various fixes from Benjamin Herrenschmidt.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agocxl: drop the dma_set_mask callback from vphb
Christoph Hellwig [Sun, 16 Dec 2018 17:19:51 +0000 (18:19 +0100)]
cxl: drop the dma_set_mask callback from vphb

The CXL code never even looks at the dma mask, so there is no good
reason for this sanity check.  Remove it because it gets in the way
of the dma ops refactoring.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/dma: split the two __dma_alloc_coherent implementations
Christoph Hellwig [Sun, 16 Dec 2018 17:19:50 +0000 (18:19 +0100)]
powerpc/dma: split the two __dma_alloc_coherent implementations

The implemementation for the CONFIG_NOT_COHERENT_CACHE case doesn't share
any code with the one for systems with coherent caches.  Split it off
and merge it with the helpers in dma-noncoherent.c that have no other
callers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/dma: remove the unused dma_iommu_ops export
Christoph Hellwig [Sun, 16 Dec 2018 17:19:49 +0000 (18:19 +0100)]
powerpc/dma: remove the unused dma_iommu_ops export

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/dma: remove the unused ISA_DMA_THRESHOLD export
Christoph Hellwig [Sun, 16 Dec 2018 17:19:48 +0000 (18:19 +0100)]
powerpc/dma: remove the unused ISA_DMA_THRESHOLD export

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/dma: remove the unused ARCH_HAS_DMA_MMAP_COHERENT define
Christoph Hellwig [Sun, 16 Dec 2018 17:19:47 +0000 (18:19 +0100)]
powerpc/dma: remove the unused ARCH_HAS_DMA_MMAP_COHERENT define

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agocrypto4xx_core: don't abuse __dma_sync_page
Christoph Hellwig [Sun, 16 Dec 2018 17:19:46 +0000 (18:19 +0100)]
crypto4xx_core: don't abuse __dma_sync_page

This function is internal to the DMA API implementation. Instead use
the DMA API to properly unmap. Note that the DMA API usage in this
driver is a disaster and urgently needs some work - it is missing all
the unmaps, seems to do a secondary map where it looks like it should
to a unmap in one place to work around cache coherency and the
directions passed in seem to be partially wrong.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Christian Lamparter <chunkeey@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/dma: properly wire up the unmap_page and unmap_sg methods
Christoph Hellwig [Sun, 16 Dec 2018 17:19:45 +0000 (18:19 +0100)]
powerpc/dma: properly wire up the unmap_page and unmap_sg methods

The unmap methods need to transfer memory ownership back from the
device to the cpu by identical means as dma_sync_*_to_cpu. I'm not
sure powerpc needs to do any work in this transfer direction, but
given that it does invalidate the caches in dma_sync_*_to_cpu already
we should make sure we also do so on unmapping.

Signed-off-by: Christoph Hellwig <hch@lst.de>
[mpe: s/dir/direction in dma_nommu_unmap_page()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc: allow NOT_COHERENT_CACHE for amigaone
Christoph Hellwig [Sun, 16 Dec 2018 17:19:44 +0000 (18:19 +0100)]
powerpc: allow NOT_COHERENT_CACHE for amigaone

AMIGAONE selects NOT_COHERENT_CACHE, so we better allow it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/prom: fix early DEBUG messages
Christophe Leroy [Fri, 14 Dec 2018 10:27:47 +0000 (10:27 +0000)]
powerpc/prom: fix early DEBUG messages

This patch fixes early DEBUG messages in prom.c:
- Use %px instead of %p to see the addresses
- Cast memblock_phys_mem_size() with (unsigned long long) to
avoid build failure when phys_addr_t is not 64 bits.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agoocxl: Fix endiannes bug in ocxl_link_update_pe()
Greg Kurz [Sun, 16 Dec 2018 21:28:50 +0000 (22:28 +0100)]
ocxl: Fix endiannes bug in ocxl_link_update_pe()

All fields in the PE are big-endian. Use cpu_to_be32() like everywhere
else something is written to the PE. Otherwise a wrong TID will be used
by the NPU. If this TID happens to point to an existing thread sharing
the same mm, it could be woken up by error. This is highly improbable
though. The likely outcome of this is the NPU not finding the target
thread and forcing the AFU into sending an interrupt, which userspace
is supposed to handle anyway.

Fixes: e948e06fc63a ("ocxl: Expose the thread_id needed for wait on POWER9")
Cc: stable@vger.kernel.org # v4.18
Signed-off-by: Greg Kurz <groug@kaod.org>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/32: Avoid unsupported flags with clang
Joel Stanley [Mon, 12 Nov 2018 05:28:06 +0000 (15:58 +1030)]
powerpc/32: Avoid unsupported flags with clang

When building for ppc32 with clang these flags are unsupported:

  -ffixed-r2 and -mmultiple

llvm's lib/Target/PowerPC/PPCRegisterInfo.cpp marks r2 as reserved on
when building for SVR4ABI and !ppc64:

  // The SVR4 ABI reserves r2 and r13
  if (Subtarget.isSVR4ABI()) {
    // We only reserve r2 if we need to use the TOC pointer. If we have no
    // explicit uses of the TOC pointer (meaning we're a leaf function with
    // no constant-pool loads, etc.) and we have no potential uses inside an
    // inline asm block, then we can treat r2 has an ordinary callee-saved
    // register.
    const PPCFunctionInfo *FuncInfo = MF.getInfo<PPCFunctionInfo>();
    if (!TM.isPPC64() || FuncInfo->usesTOCBasePtr() || MF.hasInlineAsm())
      markSuperRegs(Reserved, PPC::R2);  // System-reserved register
    markSuperRegs(Reserved, PPC::R13); // Small Data Area pointer register
  }

This means we can safely omit -ffixed-r2 when building for 32-bit
targets.

The -mmultiple/-mno-multiple flags are not supported by clang, so
platforms that might support multiple miss out on using multiple word
instructions.

We wrap these flags in cc-option so that when Clang gains support the
kernel will be able use these flags.

Clang 8 can then build a ppc44x_defconfig which boots in Qemu:

  make CC=clang-8 ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu-  ppc44x_defconfig
  ./scripts/config -e CONFIG_DEVTMPFS -d DEVTMPFS_MOUNT
  make CC=clang-8 ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu-

  qemu-system-ppc -M bamboo \
   -kernel arch/powerpc/boot/zImage \
   -dtb arch/powerpc/boot/dts/bamboo.dtb \
   -initrd ~/ppc32-440-rootfs.cpio \
   -nographic -serial stdio -monitor pty -append "console=ttyS0"

Link: https://github.com/ClangBuiltLinux/linux/issues/261
Link: https://bugs.llvm.org/show_bug.cgi?id=39556
Link: https://bugs.llvm.org/show_bug.cgi?id=39555
Signed-off-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agoraid6/ppc: Fix build for clang
Joel Stanley [Fri, 2 Nov 2018 00:44:55 +0000 (11:14 +1030)]
raid6/ppc: Fix build for clang

We cannot build these files with clang as it does not allow altivec
instructions in assembly when -msoft-float is passed.

Jinsong Ji <jji@us.ibm.com> wrote:
> We currently disable Altivec/VSX support when enabling soft-float.  So
> any usage of vector builtins will break.
>
> Enable Altivec/VSX with soft-float may need quite some clean up work, so
> I guess this is currently a limitation.
>
> Removing -msoft-float will make it work (and we are lucky that no
> floating point instructions will be generated as well).

This is a workaround until the issue is resolved in clang.

Link: https://bugs.llvm.org/show_bug.cgi?id=31177
Link: https://github.com/ClangBuiltLinux/linux/issues/239
Signed-off-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/perf: Remove l2 bus events from HW cache event array
Madhavan Srinivasan [Sun, 10 Jun 2018 14:27:02 +0000 (19:57 +0530)]
powerpc/perf: Remove l2 bus events from HW cache event array

Remove PM_L2_ST_MISS and PM_L2_ST from HW cache event array since
these are bus events. And these needs to be programmed in groups.
Hence remove them.

Fixes: f1fb60bfde65 ('powerpc/perf: Export Power9 generic and cache events to sysfs')
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/perf: Add constraints for power9 l2/l3 bus events
Madhavan Srinivasan [Sun, 10 Jun 2018 14:27:01 +0000 (19:57 +0530)]
powerpc/perf: Add constraints for power9 l2/l3 bus events

In previous generation processors, both bus events and direct
events of performance monitoring unit can be individually
programmabled and monitored in PMCs.

But in Power9, L2/L3 bus events are always available as a
"bank" of 4 events. To obtain the counts for any of the
l2/l3 bus events in a given bank, the user will have to
program PMC4 with corresponding l2/l3 bus event for that
bank.

Patch enforce two contraints incase of L2/L3 bus events.

1)Any L2/L3 event when programmed is also expected to program corresponding
PMC4 event from that group.
2)PMC4 event should always been programmed first due to group constraint
logic limitation

For ex. consider these L3 bus events

PM_L3_PF_ON_CHIP_MEM (0x460A0),
PM_L3_PF_MISS_L3 (0x160A0),
PM_L3_CO_MEM (0x260A0),
PM_L3_PF_ON_CHIP_CACHE (0x360A0),

1) This is an INVALID group for L3 Bus event monitoring,
since it is missing PMC4 event.
perf stat -e "{r160A0,r260A0,r360A0}" < >

And this is a VALID group for L3 Bus events:
perf stat -e "{r460A0,r160A0,r260A0,r360A0}" < >

2) This is an INVALID group for L3 Bus event monitoring,
since it is missing PMC4 event.
perf stat -e "{r260A0,r360A0}" < >

And this is a VALID group for L3 Bus events:
perf stat -e "{r460A0,r260A0,r360A0}" < >

3) This is an INVALID group for L3 Bus event monitoring,
since it is missing PMC4 event.
perf stat -e "{r360A0}" < >

And this is a VALID group for L3 Bus events:
perf stat -e "{r460A0,r360A0}" < >

Patch here implements group constraint logic suggested by Michael Ellerman.

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/perf: Fix unit_sel/cache_sel checks
Madhavan Srinivasan [Mon, 9 Oct 2017 14:12:40 +0000 (19:42 +0530)]
powerpc/perf: Fix unit_sel/cache_sel checks

Raw event code has couple of fields "unit" and "cache" in it, to capture
the "unit" to monitor for a given pmcxsel and cache reload qualifier to
program in MMCR1.

isa207_get_constraint() refers "unit" field to update the MMCRC (L2/L3)
Event bus control fields with "cache" bits of the raw event code.
These are power8 specific and not supported by PowerISA v3.0 pmu. So wrap
the checks to be power8 specific. Also, "cache" bit field is referred to
update MMCR1[16:17] and this check can be power8 specific.

Fixes: 7ffd948fae4cd ('powerpc/perf: factor out power8 pmu functions')
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/perf: Cleanup cache_sel bits comment
Madhavan Srinivasan [Mon, 9 Oct 2017 14:12:39 +0000 (19:42 +0530)]
powerpc/perf: Cleanup cache_sel bits comment

Update the raw event code comment in power9-pmu.c with respect to
"cache" bits, since power9 MMCRC does not support these.

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/perf: Update perf_regs structure to include SIER
Madhavan Srinivasan [Sun, 9 Dec 2018 09:25:35 +0000 (14:55 +0530)]
powerpc/perf: Update perf_regs structure to include SIER

On each sample, Sample Instruction Event Register (SIER) content
is saved in pt_regs. SIER does not have a entry as-is in the pt_regs
but instead, SIER content is saved in the "dar" register of pt_regs.

Patch adds another entry to the perf_regs structure to include the "SIER"
printing which internally maps to the "dar" of pt_regs.

It also check for the SIER availability in the platform and present
value accordingly

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/perf: Fix thresholding counter data for unknown type
Madhavan Srinivasan [Sun, 9 Dec 2018 09:18:15 +0000 (14:48 +0530)]
powerpc/perf: Fix thresholding counter data for unknown type

MMCRA[34:36] and MMCRA[38:44] expose the thresholding counter value.
Thresholding counter can be used to count latency cycles such as
load miss to reload. But threshold counter value is not relevant
when the sampled instruction type is unknown or reserved. Patch to
fix the thresholding counter value to zero when sampled instruction
type is unknown or reserved.

Fixes: 170a315f41c6('powerpc/perf: Support to export MMCRA[TEC*] field to userspace')
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/mm/hash: Handle user access of kernel address gracefully
Aneesh Kumar K.V [Mon, 26 Nov 2018 14:35:04 +0000 (20:05 +0530)]
powerpc/mm/hash: Handle user access of kernel address gracefully

In commit 2865d08dd9ea ("powerpc/mm: Move the DSISR_PROTFAULT sanity
check") we moved the protection fault access check before the vma
lookup. That means we hit that WARN_ON when user space accesses a
kernel address. Before that commit this was handled by find_vma() not
finding vma for the kernel address and considering that access as bad
area access.

Avoid the confusing WARN_ON and convert that to a ratelimited printk.

With the patch we now get:

for load:
  a.out[5997]: User access of kernel address (c00000000000dea0) - exploit attempt? (uid: 1000)
  a.out[5997]: segfault (11) at c00000000000dea0 nip 1317c0798 lr 7fff80d6441c code 1 in a.out[1317c0000+10000]
  a.out[5997]: code: 60000000 60420000 3c4c0002 38427790 4bffff20 3c4c0002 38427784 fbe1fff8
  a.out[5997]: code: f821ffc1 7c3f0b78 60000000 e9228030 <89290000993f002f 60000000 383f0040

for exec:
  a.out[6067]: User access of kernel address (c00000000000dea0) - exploit attempt? (uid: 1000)
  a.out[6067]: segfault (11) at c00000000000dea0 nip c00000000000dea0 lr 129d507b0 code 1
  a.out[6067]: Bad NIP, not dumping instructions.

Fixes: 2865d08dd9ea ("powerpc/mm: Move the DSISR_PROTFAULT sanity check")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Tested-by: Breno Leitao <leitao@debian.org>
[mpe: Don't split printk() string across lines]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/mm: add exec protection on powerpc 603
Christophe Leroy [Wed, 28 Nov 2018 17:21:10 +0000 (17:21 +0000)]
powerpc/mm: add exec protection on powerpc 603

The 603 doesn't have a HASH table, TLB misses are handled by
software. It is then possible to generate page fault when
_PAGE_EXEC is not set like in nohash/32.

There is one "reserved" PTE bit available, this patch uses
it for _PAGE_EXEC.

In order to support it, set_pte_filter() and
set_access_flags_filter() are made common, and the handling
is made dependent on MMU_FTR_HPTE_TABLE

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/mm: define an empty slice_init_new_context_exec()
Christophe Leroy [Mon, 10 Dec 2018 12:23:05 +0000 (12:23 +0000)]
powerpc/mm: define an empty slice_init_new_context_exec()

Define slice_init_new_context_exec() at all time to avoid

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/uaccess: fix warning/error with access_ok()
Christophe Leroy [Mon, 10 Dec 2018 06:50:09 +0000 (06:50 +0000)]
powerpc/uaccess: fix warning/error with access_ok()

With the following piece of code, the following compilation warning
is encountered:

if (_IOC_DIR(ioc) != _IOC_NONE) {
int verify = _IOC_DIR(ioc) & _IOC_READ ? VERIFY_WRITE : VERIFY_READ;

if (!access_ok(verify, ioarg, _IOC_SIZE(ioc))) {

drivers/platform/test/dev.c: In function 'my_ioctl':
drivers/platform/test/dev.c:219:7: warning: unused variable 'verify' [-Wunused-variable]
   int verify = _IOC_DIR(ioc) & _IOC_READ ? VERIFY_WRITE : VERIFY_READ;

This patch fixes it by referencing 'type' in the macro allthough
doing nothing with it.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc: remove remaining bits from CONFIG_APUS
Christophe Leroy [Tue, 4 Dec 2018 17:59:15 +0000 (17:59 +0000)]
powerpc: remove remaining bits from CONFIG_APUS

commit f21f49ea639a ("[POWERPC] Remove the dregs of APUS support from
arch/powerpc") removed CONFIG_APUS, but forgot to remove the logic
which adapts tophys() and tovirt() for it.

This patch removes the last stale pieces.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/xmon: fix dump_segments()
Christophe Leroy [Fri, 16 Nov 2018 17:31:08 +0000 (17:31 +0000)]
powerpc/xmon: fix dump_segments()

mfsrin() takes segment num from bits 31-28 (IBM bits 0-3).

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Clarify bit numbering]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/8xx: add exception frame marker
Christophe Leroy [Tue, 27 Nov 2018 17:38:16 +0000 (17:38 +0000)]
powerpc/8xx: add exception frame marker

This patch adds STACK_FRAME_REGS_MARKER in the stack at exception entry
in order to see interrupts in call traces as below:

[    0.013964] Call Trace:
[    0.014014] [c0745db0] [c007a9d4] tick_periodic.constprop.5+0xd8/0x104 (unreliable)
[    0.014086] [c0745dc0] [c007aa20] tick_handle_periodic+0x20/0x9c
[    0.014181] [c0745de0] [c0009cd0] timer_interrupt+0xa0/0x264
[    0.014258] [c0745e10] [c000e484] ret_from_except+0x0/0x14
[    0.014390] --- interrupt: 901 at console_unlock.part.7+0x3f4/0x528
[    0.014390]     LR = console_unlock.part.7+0x3f0/0x528
[    0.014455] [c0745ee0] [c0050334] console_unlock.part.7+0x114/0x528 (unreliable)
[    0.014542] [c0745f30] [c00524e0] register_console+0x3d8/0x44c
[    0.014625] [c0745f60] [c0675aac] cpm_uart_console_init+0x18/0x2c
[    0.014709] [c0745f70] [c06614f4] console_init+0x114/0x1cc
[    0.014795] [c0745fb0] [c0658b68] start_kernel+0x300/0x3d8
[    0.014864] [c0745ff0] [c00022cc] start_here+0x44/0x98

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/book3s/32: fix number of bats in p/v_block_mapped()
Christophe Leroy [Fri, 16 Nov 2018 17:27:42 +0000 (17:27 +0000)]
powerpc/book3s/32: fix number of bats in p/v_block_mapped()

This patch fixes the loop in p_block_mapped() and v_block_mapped()
to scan the entire bat_addrs[] array.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/mm: Eliminate not possible mmu features at compile time
Christophe Leroy [Fri, 16 Nov 2018 17:08:03 +0000 (17:08 +0000)]
powerpc/mm: Eliminate not possible mmu features at compile time

Depending on the CONFIG selected, many of the MMU features are
not possible. Lets only get the possible ones in MMU_FTRS_POSSIBLE.

This allows gcc to get rid at compile time of code related to
not possible features.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/smp: Use code patching to restore reset vector
Christophe Leroy [Fri, 9 Nov 2018 17:33:32 +0000 (17:33 +0000)]
powerpc/smp: Use code patching to restore reset vector

Instead of hardcoding reset vector restore, use patch_instruction()

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/44x: use patch_sites for TLB handlers patching
Christophe Leroy [Fri, 9 Nov 2018 17:33:30 +0000 (17:33 +0000)]
powerpc/44x: use patch_sites for TLB handlers patching

Use patch sites and associated helpers to manage TLB handlers
patching instead of hardcoding.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/signal: Use code patching instead of hardcoding
Christophe Leroy [Fri, 9 Nov 2018 17:33:28 +0000 (17:33 +0000)]
powerpc/signal: Use code patching instead of hardcoding

Instead of hardcoding code modifications, use code patching functions.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/8xx: use modify_instruction_site()
Christophe Leroy [Fri, 9 Nov 2018 17:33:26 +0000 (17:33 +0000)]
powerpc/8xx: use modify_instruction_site()

Instead of hardcoding the TLB handlers patching, use
the newly created modify_instruction_site() helper.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/book3s/32: Use patch_site to patch hash functions
Christophe Leroy [Fri, 9 Nov 2018 17:33:24 +0000 (17:33 +0000)]
powerpc/book3s/32: Use patch_site to patch hash functions

Use patch_sites and the new modify_instruction_site() function
instead of hardcoding hash functions patching.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/book3s/32: Use MMU_FTR_HPTE_TABLE in head_32.S
Christophe Leroy [Fri, 9 Nov 2018 17:33:22 +0000 (17:33 +0000)]
powerpc/book3s/32: Use MMU_FTR_HPTE_TABLE in head_32.S

Instead of manually patching a blr at hash_page() entry in
MMU_init_hw(), this patch adds a features section in head_32.S

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/32: use patch_site_addr() in machine_init()
Christophe Leroy [Fri, 9 Nov 2018 17:33:20 +0000 (17:33 +0000)]
powerpc/32: use patch_site_addr() in machine_init()

Use patch_site_addr() instead of hardcoding the
address calculation in machine_init()

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc: add modify_instruction() and modify_instruction_site()
Christophe Leroy [Fri, 9 Nov 2018 17:33:17 +0000 (17:33 +0000)]
powerpc: add modify_instruction() and modify_instruction_site()

Add two helpers to avoid hardcoding of instructions modifications.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc: simplify patch_instruction_site() and patch_branch_site()
Christophe Leroy [Fri, 9 Nov 2018 17:33:15 +0000 (17:33 +0000)]
powerpc: simplify patch_instruction_site() and patch_branch_site()

Using patch_site_addr() helper, patch_instruction_site() and
patch_branch_site() can be simplified and inlined.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/mm: remove unused variable
Christophe Leroy [Thu, 18 Oct 2018 05:22:06 +0000 (05:22 +0000)]
powerpc/mm: remove unused variable

In file included from ./include/linux/hugetlb.h:445:0,
                 from arch/powerpc/kernel/setup-common.c:37:
./arch/powerpc/include/asm/hugetlb.h: In function â€˜huge_ptep_clear_flush’:
./arch/powerpc/include/asm/hugetlb.h:154:8: error: variable â€˜pte’ set but not used [-Werror=unused-but-set-variable]

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc: implement CONFIG_DEBUG_VIRTUAL
Christophe Leroy [Wed, 19 Dec 2018 07:09:39 +0000 (07:09 +0000)]
powerpc: implement CONFIG_DEBUG_VIRTUAL

This patch implements CONFIG_DEBUG_VIRTUAL to warn about
incorrect use of virt_to_phys() and page_to_phys()

Below is the result of test_debug_virtual:

[    1.438746] WARNING: CPU: 0 PID: 1 at ./arch/powerpc/include/asm/io.h:808 test_debug_virtual_init+0x3c/0xd4
[    1.448156] CPU: 0 PID: 1 Comm: swapper Not tainted 4.20.0-rc5-00560-g6bfb52e23a00-dirty #532
[    1.457259] NIP:  c066c550 LR: c0650ccc CTR: c066c514
[    1.462257] REGS: c900bdb0 TRAP: 0700   Not tainted  (4.20.0-rc5-00560-g6bfb52e23a00-dirty)
[    1.471184] MSR:  00029032 <EE,ME,IR,DR,RI>  CR: 48000422  XER: 20000000
[    1.477811]
[    1.477811] GPR00: c0650ccc c900be60 c60d0000 00000000 006000c0 c9000000 00009032 c7fa0020
[    1.477811] GPR08: 00002400 00000001 09000000 00000000 c07b5d04 00000000 c00037d8 00000000
[    1.477811] GPR16: 00000000 00000000 00000000 00000000 c0760000 c0740000 00000092 c0685bb0
[    1.477811] GPR24: c065042c c068a734 c0685b8c 00000006 00000000 c0760000 c075c3c0 ffffffff
[    1.512711] NIP [c066c550] test_debug_virtual_init+0x3c/0xd4
[    1.518315] LR [c0650ccc] do_one_initcall+0x8c/0x1cc
[    1.523163] Call Trace:
[    1.525595] [c900be60] [c0567340] 0xc0567340 (unreliable)
[    1.530954] [c900be90] [c0650ccc] do_one_initcall+0x8c/0x1cc
[    1.536551] [c900bef0] [c0651000] kernel_init_freeable+0x1f4/0x2cc
[    1.542658] [c900bf30] [c00037ec] kernel_init+0x14/0x110
[    1.547913] [c900bf40] [c000e1d0] ret_from_kernel_thread+0x14/0x1c
[    1.553971] Instruction dump:
[    1.556909] 3ca50100 bfa10024 54a5000e 3fa0c076 7c0802a6 3d454000 813dc204 554893be
[    1.564566] 7d294010 7d294910 90010034 39290001 <0f0900007c3e0b78 955e0008 3fe0c062
[    1.572425] ---[ end trace 6f6984225b280ad6 ]---
[    1.577467] PA: 0x09000000 for VA: 0xc9000000
[    1.581799] PA: 0x061e8f50 for VA: 0xc61e8f50

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agolib: fix build failure in CONFIG_DEBUG_VIRTUAL test
Christophe Leroy [Mon, 10 Dec 2018 08:08:28 +0000 (08:08 +0000)]
lib: fix build failure in CONFIG_DEBUG_VIRTUAL test

On several arches, virt_to_phys() is in io.h

Build fails without it:

  CC      lib/test_debug_virtual.o
lib/test_debug_virtual.c: In function 'test_debug_virtual_init':
lib/test_debug_virtual.c:26:7: error: implicit declaration of function 'virt_to_phys' [-Werror=implicit-function-declaration]
  pa = virt_to_phys(va);
       ^

Fixes: e4dace361552 ("lib: add test module for CONFIG_DEBUG_VIRTUAL")
CC: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/32: Move the old 6xx -mcpu logic before the TARGET_CPU logic
Mathieu Malaterre [Wed, 5 Dec 2018 17:53:55 +0000 (18:53 +0100)]
powerpc/32: Move the old 6xx -mcpu logic before the TARGET_CPU logic

The code:

  ifdef CONFIG_6xx
  KBUILD_CFLAGS          += -mcpu=powerpc
  endif

was added in 2006 in commit f48b8296b315 ("[PATCH] powerpc32: Set cpu
explicitly in kernel compiles"). This change was acceptable since the
TARGET_CPU logic was 64-bit only.

Since commit 0e00a8c9fd92 ("powerpc: Allow CPU selection
also on PPC32") this logic is no longer acceptable after the TARGET_CPU
specific. It currently appends -mcpu=powerpc at the end of the command
line, after any TARGET_CPU specific:

  gcc -Wp,-MD,init/.do_mounts.o.d ...
    -mcpu=powerpc -mbig-endian -m32 ...
    -mcpu=e300c2 ...
    -mcpu=powerpc ...
    ../init/do_mounts.c

Fixes: 0e00a8c9fd92 ("powerpc: Allow CPU selection also on PPC32")
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/ipic: Remove unused ipic_set_priority()
Michael Ellerman [Tue, 4 Dec 2018 10:09:14 +0000 (21:09 +1100)]
powerpc/ipic: Remove unused ipic_set_priority()

ipic_set_priority() has been unused since 2006 when the last usage was
removed in commit b9f0f1bb2bca ("[POWERPC] Adapt ipic driver to new
host_ops interface, add set_irq_type to set IRQ sense").

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agoMerge branch 'fixes' into next
Michael Ellerman [Mon, 17 Dec 2018 11:11:54 +0000 (22:11 +1100)]
Merge branch 'fixes' into next

Merge our fixes branch again, this has a couple of build fixes and also
a change to do_syscall_trace_enter() that will conflict with a patch we
want to apply in next.

5 years agopowerpc/ptrace: replace ptrace_report_syscall() with a tracehook call
Elvira Khabirova [Fri, 7 Dec 2018 15:56:05 +0000 (18:56 +0300)]
powerpc/ptrace: replace ptrace_report_syscall() with a tracehook call

Arch code should use tracehook_*() helpers, as documented in
include/linux/tracehook.h, ptrace_report_syscall() is not expected to
be used outside that file.

The patch does not look very nice, but at least it is correct
and opens the way for PTRACE_GET_SYSCALL_INFO API.

Co-authored-by: Dmitry V. Levin <ldv@altlinux.org>
Fixes: 5521eb4bca2d ("powerpc/ptrace: Add support for PTRACE_SYSEMU")
Signed-off-by: Elvira Khabirova <lineprinter@altlinux.org>
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
[mpe: Take this as a minimal fix for 4.20, we'll rework it later]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/mm: Fallback to RAM if the altmap is unusable
Oliver O'Halloran [Thu, 6 Dec 2018 15:17:14 +0000 (02:17 +1100)]
powerpc/mm: Fallback to RAM if the altmap is unusable

The "altmap" is used to provide a pool of memory that is reserved for
the vmemmap backing of hot-plugged memory. This is useful when adding
large amount of ZONE_DEVICE memory to a system with a limited amount of
normal memory.

On ppc64 we use huge pages to map the vmemmap which requires the backing
storage to be contigious and aligned to the hugepage size. The altmap
implementation allows for the altmap provider to reserve a few PFNs at
the start of the range for it's own uses and when this occurs the
first chunk of the altmap is not usable for hugepage mappings. On hash
there is no sane way to fall back to a normal sized page mapping so we
fail the allocation. This results in memory hotplug failing with
ENOMEM when the new range doesn't fall into an existing vmemmap block.

This patch handles this case by falling back to using system memory
rather than failing if we cannot allocate from the altmap. This
fallback should only ever be used for the first vmemmap block so it
should not cause excess memory consumption.

Fixes: 7b73d978a5d0 ("mm: pass the vmem_altmap to vmemmap_populate")
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/papr_scm: Use ibm,unit-guid as the iset cookie
Oliver O'Halloran [Thu, 6 Dec 2018 15:17:13 +0000 (02:17 +1100)]
powerpc/papr_scm: Use ibm,unit-guid as the iset cookie

The interleave set cookie is used to determine if a label stored in the
metadata space should be applied to the current region. This is
important in the case of NVDIMMs since the firmware may change the
interleaving configuration of a DIMM which would invalidate the existing
labels. In our case the hypervisor hides those details from us so we
don't really care, but libnvdimm still requires the interleave set
cookie to be non-zero.

For our purposes we just need the set cookie to be unique and fixed for
a given PAPR SCM region and using the unit-guid (really a UUID) is fine
for this purpose.

Fixes: b5beae5e224f ("powerpc/pseries: Add driver for PAPR SCM regions")
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
[mpe: Use kernel types (u64)]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/papr_scm: Fix DIMM device registration race
Oliver O'Halloran [Thu, 6 Dec 2018 15:17:12 +0000 (02:17 +1100)]
powerpc/papr_scm: Fix DIMM device registration race

When a new nvdimm device is registered with libnvdimm via
nvdimm_create() it is added as a device on the nvdimm bus. The probe
function for the DIMM driver is potentially quite slow so actually
registering and probing the device is done in an async domain rather
than immediately after device creation. This can result in a race where
the region device (created 2nd) is probed first and fails to activate at
boot.

To fix this we use the same approach as the ACPI/NFIT driver which is to
check that all the DIMM devices registered successfully. LibNVDIMM
provides the nvdimm_bus_count_dimms() function which synchronises with
the async domain and verifies that the dimm was successfully registered
with the bus.

If either of these does not occur then we bail.

Fixes: b5beae5e224f ("powerpc/pseries: Add driver for PAPR SCM regions")
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/papr_scm: Remove endian conversions
Oliver O'Halloran [Thu, 6 Dec 2018 15:17:11 +0000 (02:17 +1100)]
powerpc/papr_scm: Remove endian conversions

The return values of a h-call are returned in the CPU registers and
written to the provided buffer by the plpar_hcall() wrapper. As a result
the values written to memory are always in the native endian and should
not be byte swapped.

The inital implementation of the H-Call interface was done in qemu and
the returned values were byte swapped unnecessarily in both the
hypervisor and in the driver so this was only noticed when bringing up
the PowerVM implementation.

Fixes: b5beae5e224f ("powerpc/pseries: Add driver for PAPR SCM regions")
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/papr_scm: Update DT properties
Oliver O'Halloran [Thu, 6 Dec 2018 15:17:10 +0000 (02:17 +1100)]
powerpc/papr_scm: Update DT properties

The ibm,unit-sizes property was originally specified as an array of two
u32s corresponding to the memory block size, and the number of blocks
available in that region. A fairly last-minute change to the SCM DT
specification was splitting that into two seperate u64 properties:
ibm,block-sizes and ibm,number-of-blocks that convey the same
information. No firmware / hypervisor that emitted the ibm,unit-size
property ever appeared in the wild.

Fixes: b5beae5e224f ("powerpc/pseries: Add driver for PAPR SCM regions")
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
[mpe: Use kernel types (u32/u64)]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/papr_scm: Fix resource end address
Oliver O'Halloran [Thu, 6 Dec 2018 15:17:09 +0000 (02:17 +1100)]
powerpc/papr_scm: Fix resource end address

Fix an off-by-one error in the memory resource range. This resource is
used to determine the address range of the memory to be hot-plugged as
ZONE_DEVICE memory. The current end address results in the kernel
attempting to map an additional memblock and the hypervisor may reject
the mapping resulting in the entire hot-plug failing.

Fixes: b5beae5e224f ("powerpc/pseries: Add driver for PAPR SCM regions")
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/papr_scm: Use depend instead of select
Oliver O'Halloran [Thu, 6 Dec 2018 15:17:08 +0000 (02:17 +1100)]
powerpc/papr_scm: Use depend instead of select

Making PAPR_SCM select LIBNVDIMM results in circular dependencies in
Kconfig when another symbol depends on it. Fix this by replacing the
select with a depends.

Fixes: b5beae5e224f ("powerpc/pseries: Add driver for PAPR SCM regions")
Reported-by: Alastair D'Silva <alastair@d-silva.org>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/bpf: Fix broken uapi for BPF_PROG_TYPE_PERF_EVENT
Sandipan Das [Thu, 6 Dec 2018 09:27:01 +0000 (14:57 +0530)]
powerpc/bpf: Fix broken uapi for BPF_PROG_TYPE_PERF_EVENT

Now that there are different variants of pt_regs for userspace and
kernel, the uapi for the BPF_PROG_TYPE_PERF_EVENT program type must be
changed by exporting the user_pt_regs structure instead of the pt_regs
structure that is in-kernel only.

Fixes: 002af9391bfb ("powerpc: Split user/kernel definitions of struct pt_regs")
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/boot: Fix build failures with -j 1
Michael Ellerman [Mon, 3 Dec 2018 08:55:55 +0000 (19:55 +1100)]
powerpc/boot: Fix build failures with -j 1

In commit 5e9dcb6188a4 ("powerpc/boot: Expose Kconfig symbols to
wrapper") we added a dependency to serial.c on autoconf.h:

  $(obj)/serial.c: $(obj)/autoconf.h

This works when building in-tree (ie. with KBUILD_OUTPUT unset)
because the obj tree is the src tree.

But when building with eg. O=build and -j 1 the build fails:

  gcc ... -I../arch/powerpc/boot -c -o arch/powerpc/boot/serial.o arch/powerpc/boot/serial.c
  gcc: error: arch/powerpc/boot/serial.c: No such file or directory

Why this is only happening with -j 1 is not clear, when building with
-j greater than 1 somehow we decide to look for serial.c in the src
tree (../), eg:

  gcc -I../arch/powerpc/boot -c -o arch/powerpc/boot/serial.o ../arch/powerpc/boot/serial.c

Regardless we shouldn't be specifying a dependency on serial.c in the
build tree, we want to add a dependency to the version in $(srctree)
so fix the rule to say that.

Fixes: 5e9dcb6188a4 ("powerpc/boot: Expose Kconfig symbols to wrapper")
Tested-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/mm: dump block address translation on book3s/32
Christophe Leroy [Mon, 3 Dec 2018 17:40:25 +0000 (17:40 +0000)]
powerpc/mm: dump block address translation on book3s/32

This patch adds a debugfs file to dump block address translation:

~# cat /sys/kernel/debug/powerpc/block_address_translation
---[ Instruction Block Address Translations ]---
0:         -
1:         -
2: 0xc0000000-0xcfffffff 0x00000000 Kernel EXEC coherent
3: 0xd0000000-0xdfffffff 0x10000000 Kernel EXEC coherent
4:         -
5:         -
6:         -
7:         -

---[ Data Block Address Translations ]---
0:         -
1:         -
2: 0xc0000000-0xcfffffff 0x00000000 Kernel RW coherent
3: 0xd0000000-0xdfffffff 0x10000000 Kernel RW coherent
4:         -
5:         -
6:         -
7:         -

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/mm: dump segment registers on book3s/32
Christophe Leroy [Fri, 16 Nov 2018 17:06:43 +0000 (17:06 +0000)]
powerpc/mm: dump segment registers on book3s/32

This patch creates a debugfs file to see content of
segment registers

  # cat /sys/kernel/debug/segment_registers
  ---[ User Segments ]---
  0x00000000-0x0fffffff Kern key 1 User key 1 VSID 0xade2b0
  0x10000000-0x1fffffff Kern key 1 User key 1 VSID 0xade3c1
  0x20000000-0x2fffffff Kern key 1 User key 1 VSID 0xade4d2
  0x30000000-0x3fffffff Kern key 1 User key 1 VSID 0xade5e3
  0x40000000-0x4fffffff Kern key 1 User key 1 VSID 0xade6f4
  0x50000000-0x5fffffff Kern key 1 User key 1 VSID 0xade805
  0x60000000-0x6fffffff Kern key 1 User key 1 VSID 0xade916
  0x70000000-0x7fffffff Kern key 1 User key 1 VSID 0xadea27
  0x80000000-0x8fffffff Kern key 1 User key 1 VSID 0xadeb38
  0x90000000-0x9fffffff Kern key 1 User key 1 VSID 0xadec49
  0xa0000000-0xafffffff Kern key 1 User key 1 VSID 0xaded5a
  0xb0000000-0xbfffffff Kern key 1 User key 1 VSID 0xadee6b

  ---[ Kernel Segments ]---
  0xc0000000-0xcfffffff Kern key 0 User key 1 VSID 0x000ccc
  0xd0000000-0xdfffffff Kern key 0 User key 1 VSID 0x000ddd
  0xe0000000-0xefffffff Kern key 0 User key 1 VSID 0x000eee
  0xf0000000-0xffffffff Kern key 0 User key 1 VSID 0x000fff

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Move it under /sys/kernel/debug/powerpc, make sr_init() __init]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/math-emu: Update macros from GCC
Joel Stanley [Mon, 3 Dec 2018 23:07:46 +0000 (09:37 +1030)]
powerpc/math-emu: Update macros from GCC

The add_ssaaaa, sub_ddmmss, umul_ppmm and udiv_qrnnd macros originate
from GCC's longlong.h which in turn was copied from GMP's longlong.h a
few decades ago.

This was found when compiling with clang:

   arch/powerpc/math-emu/fnmsub.c:46:2: error: invalid use of a cast in a
   inline asm context requiring an l-value: remove the cast or build with
   -fheinous-gnu-extensions
           FP_ADD_D(R, T, B);
           ^~~~~~~~~~~~~~~~~
   ...

   ./arch/powerpc/include/asm/sfp-machine.h:283:27: note: expanded from
   macro 'sub_ddmmss'
                  : "=r" ((USItype)(sh)),                                  \
                          ~~~~~~~~~~^~~

Segher points out: this was fixed in GCC over 16 years ago
( https://gcc.gnu.org/r56600 ), and in GMP (where it comes from)
presumably before that.

Update the add_ssaaaa, sub_ddmmss, umul_ppmm and udiv_qrnnd macros to
the latest GCC version in order to git rid of the invalid casts. These
were taken as-is from GCC's longlong in order to make future syncs
obvious. Other parts of sfp-machine.h were left as-is as the file
contains more features than present in longlong.h.

Link: https://github.com/ClangBuiltLinux/linux/issues/260
Signed-off-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/tools/checkpatch: Ignore DT_SPLIT_BINDING_PATCH
Russell Currey [Tue, 4 Dec 2018 05:11:54 +0000 (16:11 +1100)]
powerpc/tools/checkpatch: Ignore DT_SPLIT_BINDING_PATCH

From what I've seen, every time this warning comes up it's bogus,
so let's ignore it.

Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/8xx: regroup TLB handler routines
Christophe Leroy [Thu, 29 Nov 2018 14:07:26 +0000 (14:07 +0000)]
powerpc/8xx: regroup TLB handler routines

As this is running with MMU off, the CPU only does speculative
fetch for code in the same page.

Following the significant size reduction of TLB handler routines,
the side handlers can be brought back close to the main part,
ie in the same page.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/8xx: don't use r12/SPRN_SPRG_SCRATCH2 in TLB Miss handlers
Christophe Leroy [Thu, 29 Nov 2018 14:07:24 +0000 (14:07 +0000)]
powerpc/8xx: don't use r12/SPRN_SPRG_SCRATCH2 in TLB Miss handlers

This patch reworks the TLB Miss handler in order to not use r12
register, hence avoiding having to save it into SPRN_SPRG_SCRATCH2.

In the DAR Fixup code we can now use SPRN_M_TW, freeing
SPRN_SPRG_SCRATCH2.

Then SPRN_SPRG_SCRATCH2 may be used for something else in the future.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/8xx: reintroduce 16K pages with HW assistance
Christophe Leroy [Thu, 29 Nov 2018 14:07:21 +0000 (14:07 +0000)]
powerpc/8xx: reintroduce 16K pages with HW assistance

Using this HW assistance implies some constraints on the
page table structure:
- Regardless of the main page size used (4k or 16k), the
level 1 table (PGD) contains 1024 entries and each PGD entry covers
a 4Mbytes area which is managed by a level 2 table (PTE) containing
also 1024 entries each describing a 4k page.
- 16k pages require 4 identifical entries in the L2 table
- 512k pages PTE have to be spread every 128 bytes in the L2 table
- 8M pages PTE are at the address pointed by the L1 entry and each
8M page require 2 identical entries in the PGD.

In order to use hardware assistance with 16K pages, this patch does
the following modifications:
- Make PGD size independent of the main page size
- In 16k pages mode, redefine pte_t as a struct with 4 elements,
and populate those 4 elements in __set_pte_at() and pte_update()
- Adapt the size of the hugepage tables.
- Define a PTE_FRAGMENT_NB so that a 16k page contains 4 page tables.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/8xx: Enable 512k hugepage support with HW assistance
Christophe Leroy [Thu, 29 Nov 2018 14:07:19 +0000 (14:07 +0000)]
powerpc/8xx: Enable 512k hugepage support with HW assistance

For using 512k pages with hardware assistance, the PTEs have to be spread
every 128 bytes in the L2 table.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/8xx: Enable 8M hugepage support with HW assistance
Christophe Leroy [Thu, 29 Nov 2018 14:07:17 +0000 (14:07 +0000)]
powerpc/8xx: Enable 8M hugepage support with HW assistance

HW assistance naturally supports 8M huge pages without
further modifications.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/8xx: Use hardware assistance in TLB handlers
Christophe Leroy [Thu, 29 Nov 2018 14:07:15 +0000 (14:07 +0000)]
powerpc/8xx: Use hardware assistance in TLB handlers

Today, on the 8xx the TLB handlers do SW tablewalk by doing all
the calculation in ASM, in order to match with the Linux page
table structure.

The 8xx offers hardware assistance which allows significant size
reduction of the TLB handlers, hence also reduces the time spent
in the handlers.

However, using this HW assistance implies some constraints on the
page table structure:
- Regardless of the main page size used (4k or 16k), the
level 1 table (PGD) contains 1024 entries and each PGD entry covers
a 4Mbytes area which is managed by a level 2 table (PTE) containing
also 1024 entries each describing a 4k page.
- 16k pages require 4 identifical entries in the L2 table
- 512k pages PTE have to be spread every 128 bytes in the L2 table
- 8M pages PTE are at the address pointed by the L1 entry and each
8M page require 2 identical entries in the PGD.

This patch modifies the TLB handlers to use HW assistance for 4K PAGES.

Before that patch, the mean time spent in TLB miss handlers is:
- ITLB miss: 80 ticks
- DTLB miss: 62 ticks
After that patch, the mean time spent in TLB miss handlers is:
- ITLB miss: 72 ticks
- DTLB miss: 54 ticks
So the improvement is 10% for ITLB and 13% for DTLB misses

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/8xx: Temporarily disable 16k pages and hugepages
Christophe Leroy [Thu, 29 Nov 2018 14:07:13 +0000 (14:07 +0000)]
powerpc/8xx: Temporarily disable 16k pages and hugepages

In preparation of making use of hardware assistance in TLB handlers,
this patch temporarily disables 16K pages and hugepages. The reason
is that when using HW assistance in 4K pages mode, the linux model
fit with the HW model for 4K pages and 8M pages.

However for 16K pages and 512K mode some additional work is needed
to get linux model fit with HW model.
For the 8M pages, they will naturaly come back when we switch to
HW assistance, without any additional handling.
In order to keep the following patch smaller, the removal of the
current special handling for 8M pages gets removed here as well.

Therefore the 4K pages mode will be implemented first and without
support for 512k hugepages. Then the 512k hugepages will be brought
back. And the 16K pages will be implemented in the following step.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/8xx: Move SW perf counters in first 32kb of memory
Christophe Leroy [Thu, 29 Nov 2018 14:07:11 +0000 (14:07 +0000)]
powerpc/8xx: Move SW perf counters in first 32kb of memory

In order to simplify time critical exceptions handling 8xx
specific SW perf counters, this patch moves the counters into
the beginning of memory. This is possible because .text is readable
and the counters are never modified outside of the handlers.

By doing this, we avoid having to set a second register with
the upper part of the address of the counters.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/mm: remove unnecessary test in pgtable_cache_init()
Christophe Leroy [Thu, 29 Nov 2018 14:07:09 +0000 (14:07 +0000)]
powerpc/mm: remove unnecessary test in pgtable_cache_init()

pgtable_cache_add() gracefully handles the case when a cache that
size already exists by returning early with the following test:

if (PGT_CACHE(shift))
return; /* Already have a cache of this size */

It is then not needed to test the existence of the cache before.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/mm: fix a warning when a cache is common to PGD and hugepages
Christophe Leroy [Thu, 29 Nov 2018 14:07:07 +0000 (14:07 +0000)]
powerpc/mm: fix a warning when a cache is common to PGD and hugepages

While implementing TLB miss HW assistance on the 8xx, the following
warning was encountered:

[  423.732965] WARNING: CPU: 0 PID: 345 at mm/slub.c:2412 ___slab_alloc.constprop.30+0x26c/0x46c
[  423.733033] CPU: 0 PID: 345 Comm: mmap Not tainted 4.18.0-rc8-00664-g2dfff9121c55 #671
[  423.733075] NIP:  c0108f90 LR: c0109ad0 CTR: 00000004
[  423.733121] REGS: c455bba0 TRAP: 0700   Not tainted  (4.18.0-rc8-00664-g2dfff9121c55)
[  423.733147] MSR:  00021032 <ME,IR,DR,RI>  CR: 24224848  XER: 20000000
[  423.733319]
[  423.733319] GPR00: c0109ad0 c455bc50 c4521910 c60053c0 007080c0 c0011b34 c7fa41e0 c455be30
[  423.733319] GPR08: 00000001 c00103a0 c7fa41e0 c49afcc4 24282842 10018840 c079b37c 00000040
[  423.733319] GPR16: 73f00000 00210d00 00000000 00000001 c455a000 00000100 00000200 c455a000
[  423.733319] GPR24: c60053c0 c0011b34 007080c0 c455a000 c455a000 c7fa41e0 00000000 00009032
[  423.734190] NIP [c0108f90] ___slab_alloc.constprop.30+0x26c/0x46c
[  423.734257] LR [c0109ad0] kmem_cache_alloc+0x210/0x23c
[  423.734283] Call Trace:
[  423.734326] [c455bc50] [00000100] 0x100 (unreliable)
[  423.734430] [c455bcc0] [c0109ad0] kmem_cache_alloc+0x210/0x23c
[  423.734543] [c455bcf0] [c0011b34] huge_pte_alloc+0xc0/0x1dc
[  423.734633] [c455bd20] [c01044dc] hugetlb_fault+0x408/0x48c
[  423.734720] [c455bdb0] [c0104b20] follow_hugetlb_page+0x14c/0x44c
[  423.734826] [c455be10] [c00e8e54] __get_user_pages+0x1c4/0x3dc
[  423.734919] [c455be80] [c00e9924] __mm_populate+0xac/0x140
[  423.735020] [c455bec0] [c00db14c] vm_mmap_pgoff+0xb4/0xb8
[  423.735127] [c455bf00] [c00f27c0] ksys_mmap_pgoff+0xcc/0x1fc
[  423.735222] [c455bf40] [c000e0f8] ret_from_syscall+0x0/0x38
[  423.735271] Instruction dump:
[  423.735321] 7cbf482e 38fd0008 7fa6eb78 7fc4f378 4bfff5dd 7fe3fb78 4bfffe24 81370010
[  423.735536] 71280004 41a2ff88 4840c571 4bffff80 <0fe000004bfffeb8 81340010 712a0004
[  423.735757] ---[ end trace e9b222919a470790 ]---

This warning occurs when calling kmem_cache_zalloc() on a
cache having a constructor.

In this case it happens because PGD cache and 512k hugepte cache are
the same size (4k). While a cache with constructor is created for
the PGD, hugepages create cache without constructor and uses
kmem_cache_zalloc(). As both expect a cache with the same size,
the hugepages reuse the cache created for PGD, hence the conflict.

In order to avoid this conflict, this patch:
- modifies pgtable_cache_add() so that a zeroising constructor is
added for any cache size.
- replaces calls to kmem_cache_zalloc() by kmem_cache_alloc()

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/mm: replace hugetlb_cache by PGT_CACHE(PTE_T_ORDER)
Christophe Leroy [Thu, 29 Nov 2018 14:07:05 +0000 (14:07 +0000)]
powerpc/mm: replace hugetlb_cache by PGT_CACHE(PTE_T_ORDER)

Instead of opencoding cache handling for the special case
of hugepage tables having a single pte_t element, this
patch makes use of the common pgtable_cache helpers

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/mm: enable the use of page table cache of order 0
Christophe Leroy [Thu, 29 Nov 2018 14:07:03 +0000 (14:07 +0000)]
powerpc/mm: enable the use of page table cache of order 0

hugepages uses a cache of order 0. Lets allow page tables
of order 0 in the common part in order to avoid open coding
in hugetlb

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/mm: Extend pte_fragment functionality to PPC32
Christophe Leroy [Thu, 29 Nov 2018 14:07:01 +0000 (14:07 +0000)]
powerpc/mm: Extend pte_fragment functionality to PPC32

In order to allow the 8xx to handle pte_fragments, this patch
extends the use of pte_fragments to PPC32 platforms.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/mm: add helpers to get/set mm.context->pte_frag
Christophe Leroy [Thu, 29 Nov 2018 14:06:59 +0000 (14:06 +0000)]
powerpc/mm: add helpers to get/set mm.context->pte_frag

In order to handle pte_fragment functions with single fragment
without adding pte_frag in all mm_context_t, this patch creates
two helpers which do nothing on platforms using a single fragment.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/mm: Move pgtable_t into platform headers
Christophe Leroy [Thu, 29 Nov 2018 14:06:57 +0000 (14:06 +0000)]
powerpc/mm: Move pgtable_t into platform headers

This patch move pgtable_t into platform headers.

It gets rid of the CONFIG_PPC_64K_PAGES case for PPC64
as nohash/64 doesn't support CONFIG_PPC_64K_PAGES.

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/mm: move platform specific mmu-xxx.h in platform directories
Christophe Leroy [Thu, 29 Nov 2018 14:06:55 +0000 (14:06 +0000)]
powerpc/mm: move platform specific mmu-xxx.h in platform directories

The purpose of this patch is to move platform specific
mmu-xxx.h files in platform directories like pte-xxx.h files.

In the meantime this patch creates common nohash and
nohash/32 + nohash/64 mmu.h files for future common parts.

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/mm: Avoid useless lock with single page fragments
Christophe Leroy [Thu, 29 Nov 2018 14:06:53 +0000 (14:06 +0000)]
powerpc/mm: Avoid useless lock with single page fragments

There is no point in taking the page table lock as pte_frag or
pmd_frag are always NULL when we have only one fragment.

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/mm: Move pte_fragment_alloc() to a common location
Christophe Leroy [Thu, 29 Nov 2018 14:06:51 +0000 (14:06 +0000)]
powerpc/mm: Move pte_fragment_alloc() to a common location

In preparation of next patch which generalises the use of
pte_fragment_alloc() for all, this patch moves the related functions
in a place that is common to all subarches.

The 8xx will need that for supporting 16k pages, as in that mode
page tables still have a size of 4k.

Since pte_fragment with only once fragment is not different
from what is done in the general case, we can easily migrate all
subarchs to pte fragments.

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>