OSDN Git Service

uclinux-h8/linux.git
3 years agopowerpc/perf: Expose processor pipeline stage cycles using PERF_SAMPLE_WEIGHT_STRUCT
Athira Rajeev [Mon, 22 Mar 2021 14:57:23 +0000 (10:57 -0400)]
powerpc/perf: Expose processor pipeline stage cycles using PERF_SAMPLE_WEIGHT_STRUCT

Performance Monitoring Unit (PMU) registers in powerpc provides
information on cycles elapsed between different stages in the
pipeline. This can be used for application tuning. On ISA v3.1
platform, this information is exposed by sampling registers.
Patch adds kernel support to capture two of the cycle counters
as part of perf sample using the sample type:
PERF_SAMPLE_WEIGHT_STRUCT.

The power PMU function 'get_mem_weight' currently uses 64 bit weight
field of perf_sample_data to capture memory latency. But following the
introduction of PERF_SAMPLE_WEIGHT_TYPE, weight field could contain
64-bit or 32-bit value depending on the architexture support for
PERF_SAMPLE_WEIGHT_STRUCT. Patches uses WEIGHT_STRUCT to expose the
pipeline stage cycles info. Hence update the ppmu functions to work for
64-bit and 32-bit weight values.

If the sample type is PERF_SAMPLE_WEIGHT, use the 64-bit weight field.
if the sample type is PERF_SAMPLE_WEIGHT_STRUCT, memory subsystem
latency is stored in the low 32bits of perf_sample_weight structure.
Also for CPU_FTR_ARCH_31, capture the two cycle counter information in
two 16 bit fields of perf_sample_weight structure.

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1616425047-1666-2-git-send-email-atrajeev@linux.vnet.ibm.com
3 years agoDocumentation/powerpc: Add proper links for manual and tests
Haren Myneni [Sun, 18 Apr 2021 19:29:42 +0000 (12:29 -0700)]
Documentation/powerpc: Add proper links for manual and tests

The links that are mentioned in this document are no longer
valid. So changed the proper links for NXGZIP user manual and
test cases.

Reported-by: Bulent Abali <abali@us.ibm.com>
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/08511c1e92ac239f20ac88c73c59d1f8cf02e6ad.camel@linux.ibm.com
3 years agopowerpc/pseries: Set UNISOLATE on dlpar_cpu_remove() failure
Daniel Henrique Barboza [Fri, 16 Apr 2021 21:02:16 +0000 (18:02 -0300)]
powerpc/pseries: Set UNISOLATE on dlpar_cpu_remove() failure

The RTAS set-indicator call, when attempting to UNISOLATE a DRC that is
already UNISOLATED or CONFIGURED, returns RTAS_OK and does nothing else
for both QEMU and phyp. This gives us an opportunity to use this
behavior to signal the hypervisor layer when an error during device
removal happens, allowing it to do a proper error handling, while not
breaking QEMU/phyp implementations that don't have this support.

This patch introduces this idea by unisolating all CPU DRCs that failed
to be removed by dlpar_cpu_remove_by_index(), when handling the
PSERIES_HP_ELOG_ID_DRC_INDEX event. This is being done for this event
only because its the only CPU removal event QEMU uses, and there's no
need at this moment to add this mechanism for phyp only code.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210416210216.380291-3-danielhb413@gmail.com
3 years agopowerpc/pseries: Introduce dlpar_unisolate_drc()
Daniel Henrique Barboza [Fri, 16 Apr 2021 21:02:15 +0000 (18:02 -0300)]
powerpc/pseries: Introduce dlpar_unisolate_drc()

Next patch will execute a set-indicator call in hotplug-cpu.c.

Create a dlpar_unisolate_drc() helper to avoid spreading more
rtas_set_indicator() calls outside of dlpar.c.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210416210216.380291-2-danielhb413@gmail.com
3 years agopowerpc/pseries/mce: Fix a typo in error type assignment
Ganesh Goudar [Fri, 16 Apr 2021 12:57:50 +0000 (18:27 +0530)]
powerpc/pseries/mce: Fix a typo in error type assignment

The error type is ICACHE not DCACHE, for case MCE_ERROR_TYPE_ICACHE.

Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210416125750.49550-1-ganeshgr@linux.ibm.com
3 years agopowerpc/fadump: Fix compile error since trap type change
Michael Ellerman [Mon, 19 Apr 2021 12:24:32 +0000 (22:24 +1000)]
powerpc/fadump: Fix compile error since trap type change

sfr reports that the allyesconfig build fails with:

  arch/powerpc/kernel/fadump.c: In function 'crash_fadump':
  arch/powerpc/kernel/fadump.c:731:28: error: 'INTERRUPT_SYSTEM_RESET' undeclared
    731 |  if (TRAP(&(fdh->regs)) == INTERRUPT_SYSTEM_RESET) {

Add an include of interrupt.h to fix it.

Fixes: 7153d4bf0b37 ("powerpc/traps: Enhance readability for trap types")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
[mpe: Reformat change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210419191425.281dc58a@canb.auug.org.au
3 years agopowerpc/perf: Add platform specific check_attr_config
Madhavan Srinivasan [Thu, 8 Apr 2021 07:45:04 +0000 (13:15 +0530)]
powerpc/perf: Add platform specific check_attr_config

Add platform specific attr.config value checks. Patch
includes checks for both power9 and power10.

Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210408074504.248211-2-maddy@linux.ibm.com
3 years agoMerge branch 'topic/ppc-kvm' into next
Michael Ellerman [Sun, 18 Apr 2021 13:55:12 +0000 (23:55 +1000)]
Merge branch 'topic/ppc-kvm' into next

Merge some powerpc KVM patches we are keeping in a topic branch just in
case anyone else needs to merge them.

3 years agopowerpc/powernv: Enable HAIL (HV AIL) for ISA v3.1 processors
Nicholas Piggin [Fri, 2 Apr 2021 02:41:24 +0000 (12:41 +1000)]
powerpc/powernv: Enable HAIL (HV AIL) for ISA v3.1 processors

Starting with ISA v3.1, LPCR[AIL] no longer controls the interrupt
mode for HV=1 interrupts. Instead, a new LPCR[HAIL] bit is defined
which behaves like AIL=3 for HV interrupts when set.

Set HAIL on bare metal to give us mmu-on interrupts and improve
performance.

This also fixes an scv bug: we don't implement scv real mode (AIL=0)
vectors because they are at an inconvenient location, so we just
disable scv support when AIL can not be set. However powernv assumes
that LPCR[AIL] will enable AIL mode so it enables scv support despite
HV interrupts being AIL=0, which causes scv interrupts to go off into
the weeds.

Fixes: 7fa95f9adaee ("powerpc/64s: system call support for scv/rfscv instructions")
Cc: stable@vger.kernel.org # v5.9+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210402024124.545826-1-npiggin@gmail.com
3 years agopowerpc/smp: Set numa node before updating mask
Srikar Dronamraju [Thu, 1 Apr 2021 15:42:00 +0000 (21:12 +0530)]
powerpc/smp: Set numa node before updating mask

Geethika reported a trace when doing a dlpar CPU add.

------------[ cut here ]------------
WARNING: CPU: 152 PID: 1134 at kernel/sched/topology.c:2057
CPU: 152 PID: 1134 Comm: kworker/152:1 Not tainted 5.12.0-rc5-master #5
Workqueue: events cpuset_hotplug_workfn
NIP:  c0000000001cfc14 LR: c0000000001cfc10 CTR: c0000000007e3420
REGS: c0000034a08eb260 TRAP: 0700   Not tainted  (5.12.0-rc5-master+)
MSR:  8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 28828422  XER: 00000020
CFAR: c0000000001fd888 IRQMASK: 0 #012GPR00: c0000000001cfc10
c0000034a08eb500 c000000001f35400 0000000000000027 #012GPR04:
c0000035abaa8010 c0000035abb30a00 0000000000000027 c0000035abaa8018
#012GPR08: 0000000000000023 c0000035abaaef48 00000035aa540000
c0000035a49dffe8 #012GPR12: 0000000028828424 c0000035bf1a1c80
0000000000000497 0000000000000004 #012GPR16: c00000000347a258
0000000000000140 c00000000203d468 c000000001a1a490 #012GPR20:
c000000001f9c160 c0000034adf70920 c0000034aec9fd20 0000000100087bd3
#012GPR24: 0000000100087bd3 c0000035b3de09f8 0000000000000030
c0000035b3de09f8 #012GPR28: 0000000000000028 c00000000347a280
c0000034aefe0b00 c0000000010a2a68
NIP [c0000000001cfc14] build_sched_domains+0x6a4/0x1500
LR [c0000000001cfc10] build_sched_domains+0x6a0/0x1500
Call Trace:
[c0000034a08eb500] [c0000000001cfc10] build_sched_domains+0x6a0/0x1500 (unreliable)
[c0000034a08eb640] [c0000000001d1e6c] partition_sched_domains_locked+0x3ec/0x530
[c0000034a08eb6e0] [c0000000002936d4] rebuild_sched_domains_locked+0x524/0xbf0
[c0000034a08eb7e0] [c000000000296bb0] rebuild_sched_domains+0x40/0x70
[c0000034a08eb810] [c000000000296e74] cpuset_hotplug_workfn+0x294/0xe20
[c0000034a08ebc30] [c000000000178dd0] process_one_work+0x300/0x670
[c0000034a08ebd10] [c0000000001791b8] worker_thread+0x78/0x520
[c0000034a08ebda0] [c000000000185090] kthread+0x1a0/0x1b0
[c0000034a08ebe10] [c00000000000ccec] ret_from_kernel_thread+0x5c/0x70
Instruction dump:
7d2903a6 4e800421 e8410018 7f67db78 7fe6fb78 7f45d378 7f84e378 7c681b78
3c62ff1a 3863c6f8 4802dc35 60000000 <0fe000003920fff4 f9210070 e86100a0
---[ end trace 532d9066d3d4d7ec ]---

Some of the per-CPU masks use cpu_cpu_mask as a filter to limit the search
for related CPUs. On a dlpar add of a CPU, update cpu_cpu_mask before
updating the per-CPU masks. This will ensure the cpu_cpu_mask is updated
correctly before its used in setting the masks. Setting the numa_node will
ensure that when cpu_cpu_mask() gets called, the correct node number is
used. This code movement helped fix the above call trace.

Reported-by: Geetika Moolchandani <Geetika.Moolchandani1@ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210401154200.150077-1-srikar@linux.vnet.ibm.com
3 years agopowerpc/traps: Enhance readability for trap types
Xiongwei Song [Wed, 14 Apr 2021 11:00:33 +0000 (19:00 +0800)]
powerpc/traps: Enhance readability for trap types

Define macros to list ppc interrupt types in interttupt.h, replace the
reference of the trap hex values with these macros.

Referred the hex numbers in arch/powerpc/kernel/exceptions-64e.S,
arch/powerpc/kernel/exceptions-64s.S, arch/powerpc/kernel/head_*.S,
arch/powerpc/kernel/head_booke.h and arch/powerpc/include/asm/kvm_asm.h.

Signed-off-by: Xiongwei Song <sxwjean@gmail.com>
[mpe: Resolve conflicts in nmi_disables_ftrace(), fix 40x build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1618398033-13025-1-git-send-email-sxwjean@me.com
3 years agopowerpc: fix EDEADLOCK redefinition error in uapi/asm/errno.h
Tony Ambardar [Thu, 17 Sep 2020 13:54:37 +0000 (06:54 -0700)]
powerpc: fix EDEADLOCK redefinition error in uapi/asm/errno.h

A few archs like powerpc have different errno.h values for macros
EDEADLOCK and EDEADLK. In code including both libc and linux versions of
errno.h, this can result in multiple definitions of EDEADLOCK in the
include chain. Definitions to the same value (e.g. seen with mips) do
not raise warnings, but on powerpc there are redefinitions changing the
value, which raise warnings and errors (if using "-Werror").

Guard against these redefinitions to avoid build errors like the following,
first seen cross-compiling libbpf v5.8.9 for powerpc using GCC 8.4.0 with
musl 1.1.24:

  In file included from ../../arch/powerpc/include/uapi/asm/errno.h:5,
                   from ../../include/linux/err.h:8,
                   from libbpf.c:29:
  ../../include/uapi/asm-generic/errno.h:40: error: "EDEADLOCK" redefined [-Werror]
   #define EDEADLOCK EDEADLK

  In file included from toolchain-powerpc_8540_gcc-8.4.0_musl/include/errno.h:10,
                   from libbpf.c:26:
  toolchain-powerpc_8540_gcc-8.4.0_musl/include/bits/errno.h:58: note: this is the location of the previous definition
   #define EDEADLOCK       58

  cc1: all warnings being treated as errors

Cc: Stable <stable@vger.kernel.org>
Reported-by: Rosen Penev <rosenp@gmail.com>
Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200917135437.1238787-1-Tony.Ambardar@gmail.com
3 years agopowerpc/smp: Cache CPU to chip lookup
Srikar Dronamraju [Thu, 15 Apr 2021 12:09:34 +0000 (17:39 +0530)]
powerpc/smp: Cache CPU to chip lookup

On systems with large CPUs per node, even with the filtered matching of
related CPUs, there can be large number of calls to cpu_to_chip_id for
the same CPU. For example with 4096 vCPU, 1 node QEMU configuration,
with 4 threads per core, system could be see upto 1024 calls to
cpu_to_chip_id() for the same CPU. On a given system, cpu_to_chip_id()
for a given CPU would always return the same. Hence cache the result in
a lookup table for use in subsequent calls.

Since all CPUs sharing the same core will belong to the same chip, the
lookup_table has an entry for one CPU per core.  chip_id_lookup_table is
not being freed and would be used on subsequent CPU online post CPU
offline.

Reported-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210415120934.232271-4-srikar@linux.vnet.ibm.com
3 years agoRevert "powerpc/topology: Update topology_core_cpumask"
Srikar Dronamraju [Thu, 15 Apr 2021 12:09:33 +0000 (17:39 +0530)]
Revert "powerpc/topology: Update topology_core_cpumask"

Now that cpu_core_mask has been reintroduced, lets revert
commit 4bce545903fa ("powerpc/topology: Update topology_core_cpumask")

Post this commit, lscpu should reflect topologies as requested by a user
when a QEMU instance is launched with NUMA spanning multiple sockets.

Reported-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210415120934.232271-3-srikar@linux.vnet.ibm.com
3 years agopowerpc/smp: Reintroduce cpu_core_mask
Srikar Dronamraju [Thu, 15 Apr 2021 12:09:32 +0000 (17:39 +0530)]
powerpc/smp: Reintroduce cpu_core_mask

Daniel reported that with Commit 4ca234a9cbd7 ("powerpc/smp: Stop
updating cpu_core_mask") QEMU was unable to set single NUMA node SMP
topologies such as:
 -smp 8,maxcpus=8,cores=2,threads=2,sockets=2
 i.e he expected 2 sockets in one NUMA node.

The above commit helped to reduce boot time on Large Systems for
example 4096 vCPU single socket QEMU instance. PAPR is silent on
having more than one socket within a NUMA node.

cpu_core_mask and cpu_cpu_mask for any CPU would be same unless the
number of sockets is different from the number of NUMA nodes.

One option is to reintroduce cpu_core_mask but use a slightly
different method to arrive at the cpu_core_mask. Previously each CPU's
chip-id would be compared with all other CPU's chip-id to verify if
both the CPUs were related at the chip level. Now if a CPU 'A' is
found related / (unrelated) to another CPU 'B', all the thread
siblings of 'A' and thread siblings of 'B' are automatically marked as
related / (unrelated).

Also if a platform doesn't support ibm,chip-id property, i.e its
cpu_to_chip_id returns -1, cpu_core_map holds a copy of
cpu_cpu_mask().

Fixes: 4ca234a9cbd7 ("powerpc/smp: Stop updating cpu_core_mask")
Reported-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210415120934.232271-2-srikar@linux.vnet.ibm.com
3 years agopowerpc/xive: Use the "ibm, chip-id" property only under PowerNV
Cédric Le Goater [Tue, 13 Apr 2021 13:03:52 +0000 (15:03 +0200)]
powerpc/xive: Use the "ibm, chip-id" property only under PowerNV

The 'chip_id' field of the XIVE CPU structure is used to choose a
target for a source located on the same chip. For that, the XIVE
driver queries the chip identifier from the "ibm,chip-id" property
and compares it to a 'src_chip' field identifying the chip of a
source. This information is only available on the PowerNV platform,
'src_chip' being assigned to XIVE_INVALID_CHIP_ID under pSeries.

The "ibm,chip-id" property is also not available on all platforms. It
was first introduced on PowerNV and later, under QEMU for pSeries/KVM.
However, the property is not part of PAPR and does not exist under
pSeries/PowerVM.

Assign 'chip_id' to XIVE_INVALID_CHIP_ID by default and let the
PowerNV platform override the value with the "ibm,chip-id" property.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210413130352.1183267-1-clg@kaod.org
3 years agopowerpc/pseries: extract host bridge from pci_bus prior to bus removal
Tyrel Datwyler [Thu, 11 Feb 2021 18:24:35 +0000 (12:24 -0600)]
powerpc/pseries: extract host bridge from pci_bus prior to bus removal

The pci_bus->bridge reference may no longer be valid after
pci_bus_remove() resulting in passing a bad value to device_unregister()
for the associated bridge device.

Store the host_bridge reference in a separate variable prior to
pci_bus_remove().

Fixes: 7340056567e3 ("powerpc/pci: Reorder pci bus/bridge unregistration during PHB removal")
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210211182435.47968-1-tyreld@linux.ibm.com
3 years agomacintosh/via-pmu: Fix build warning
Michael Ellerman [Fri, 16 Apr 2021 11:38:04 +0000 (21:38 +1000)]
macintosh/via-pmu: Fix build warning

Now that __fake_sleep is static, we get a warning about it being unused
in some configurations:

  drivers/macintosh/via-pmu.c:190:12: warning: '__fake_sleep' defined but not used
    190 | static int __fake_sleep;

Move it inside the ifdef where it's used to avoid the warning.

Fixes: 95d143923379 ("macintosh/via-pmu: Make some symbols static")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210416114139.772236-1-mpe@ellerman.id.au
3 years agopowerpc/papr_scm: Fix build error due to wrong printf specifier
Michael Ellerman [Fri, 16 Apr 2021 11:07:06 +0000 (21:07 +1000)]
powerpc/papr_scm: Fix build error due to wrong printf specifier

When I changed the rc variable to be long rather than int64_t I
neglected to update the printk(), leading to a build break:

  arch/powerpc/platforms/pseries/papr_scm.c: In function 'papr_scm_pmem_flush':
  arch/powerpc/platforms/pseries/papr_scm.c:144:26: warning: format
    '%lld' expects argument of type 'long long int', but argument 3 has
    type 'long int' [-Wformat=]

Fixes: 75b7c05ebf90 ("powerpc/papr_scm: Implement support for H_SCM_FLUSH hcall")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210416111209.765444-2-mpe@ellerman.id.au
3 years agopowerpc/configs: Add PAPR_SCM to pseries_defconfig
Michael Ellerman [Fri, 16 Apr 2021 11:05:47 +0000 (21:05 +1000)]
powerpc/configs: Add PAPR_SCM to pseries_defconfig

This is a pseries only driver, it should be built by default as part of
pseries_defconfig to get some build coverage.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210416111209.765444-1-mpe@ellerman.id.au
3 years agopowerpc/mm/radix: Make radix__change_memory_range() static
Michael Ellerman [Tue, 13 Apr 2021 13:54:27 +0000 (23:54 +1000)]
powerpc/mm/radix: Make radix__change_memory_range() static

The lkp bot pointed out that with W=1 we get:

  arch/powerpc/mm/book3s64/radix_pgtable.c:183:6: error: no previous
  prototype for 'radix__change_memory_range'

Which is really saying that it could be static, make it so.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
3 years agopowerpc/vdso: Add support for time namespaces
Christophe Leroy [Wed, 31 Mar 2021 16:48:47 +0000 (16:48 +0000)]
powerpc/vdso: Add support for time namespaces

This patch adds the necessary glue to provide time namespaces.

Things are mainly copied from ARM64.

__arch_get_timens_vdso_data() calculates timens vdso data position
based on the vdso data position, knowing it is the next page in vvar.
This avoids having to redo the mflr/bcl/mflr/mtlr dance to locate
the page relative to running code position.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> # vDSO parts
Acked-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1a15495f80ec19a87b16cf874dbf7c3fa5ec40fe.1617209142.git.christophe.leroy@csgroup.eu
3 years agopowerpc/vdso: Separate vvar vma from vdso
Dmitry Safonov [Wed, 31 Mar 2021 16:48:46 +0000 (16:48 +0000)]
powerpc/vdso: Separate vvar vma from vdso

Since commit 511157ab641e ("powerpc/vdso: Move vdso datapage up front")
VVAR page is in front of the VDSO area. In result it breaks CRIU
(Checkpoint Restore In Userspace) [1], where CRIU expects that "[vdso]"
from /proc/../maps points at ELF/vdso image, rather than at VVAR data page.
Laurent made a patch to keep CRIU working (by reading aux vector).
But I think it still makes sence to separate two mappings into different
VMAs. It will also make ppc64 less "special" for userspace and as
a side-bonus will make VVAR page un-writable by debugger (which previously
would COW page and can be unexpected).

I opportunistically Cc stable on it: I understand that usually such
stuff isn't a stable material, but that will allow us in CRIU have
one workaround less that is needed just for one release (v5.11) on
one platform (ppc64), which we otherwise have to maintain.
I wouldn't go as far as to say that the commit 511157ab641e is ABI
regression as no other userspace got broken, but I'd really appreciate
if it gets backported to v5.11 after v5.12 is released, so as not
to complicate already non-simple CRIU-vdso code. Thanks!

[1]: https://github.com/checkpoint-restore/criu/issues/1417

Cc: stable@vger.kernel.org # v5.11
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> # vDSO parts.
Acked-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f401eb1ebc0bfc4d8f0e10dc8e525fd409eb68e2.1617209142.git.christophe.leroy@csgroup.eu
3 years agolib/vdso: Add vdso_data pointer as input to __arch_get_timens_vdso_data()
Christophe Leroy [Wed, 31 Mar 2021 16:48:45 +0000 (16:48 +0000)]
lib/vdso: Add vdso_data pointer as input to __arch_get_timens_vdso_data()

For the same reason as commit e876f0b69dc9 ("lib/vdso: Allow
architectures to provide the vdso data pointer"), powerpc wants to
avoid calculation of relative position to code.

As the timens_vdso_data is next page to vdso_data, provide
vdso_data pointer to __arch_get_timens_vdso_data() in order
to ease the calculation on powerpc in following patches.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Acked-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/539c4204b1baa77c55f758904a1ea239abbc7a5c.1617209142.git.christophe.leroy@csgroup.eu
3 years agolib/vdso: Mark do_hres_timens() and do_coarse_timens() __always_inline()
Christophe Leroy [Wed, 31 Mar 2021 16:48:44 +0000 (16:48 +0000)]
lib/vdso: Mark do_hres_timens() and do_coarse_timens() __always_inline()

In the same spirit as commit c966533f8c6c ("lib/vdso: Mark do_hres()
and do_coarse() as __always_inline"), mark do_hres_timens() and
do_coarse_timens() __always_inline.

The measurement below in on a non timens process, ie on the fastest path.

On powerpc32, without the patch:

clock-gettime-monotonic-raw:    vdso: 1155 nsec/call
clock-gettime-monotonic-coarse:    vdso: 813 nsec/call
clock-gettime-monotonic:    vdso: 1076 nsec/call

With the patch:

clock-gettime-monotonic-raw:    vdso: 1100 nsec/call
clock-gettime-monotonic-coarse:    vdso: 667 nsec/call
clock-gettime-monotonic:    vdso: 1025 nsec/call

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/90dcf45ebadfd5a07f24241551c62f619d1cb930.1617209142.git.christophe.leroy@csgroup.eu
3 years agopowerpc: move norestart trap flag to bit 0
Nicholas Piggin [Tue, 16 Mar 2021 10:42:05 +0000 (20:42 +1000)]
powerpc: move norestart trap flag to bit 0

Compact the trap flags down to use the low 4 bits of regs.trap.

A few 64e interrupt trap numbers set bit 4. Although they tended to be
trivial so it wasn't a real problem[1], it is not the right thing to do,
and confusing.

[*] E.g., 0x310 hypercall goes to unknown_exception, which prints
    regs->trap directly so 0x310 will appear fine, and only the syscall
    interrupt will test norestart, so it won't be confused by 0x310.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210316104206.407354-12-npiggin@gmail.com
3 years agopowerpc: remove partial register save logic
Nicholas Piggin [Tue, 16 Mar 2021 10:42:04 +0000 (20:42 +1000)]
powerpc: remove partial register save logic

All subarchitectures always save all GPRs to pt_regs interrupt frames
now. Remove FULL_REGS and associated bits.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210316104206.407354-11-npiggin@gmail.com
3 years agopowerpc: clean up do_page_fault
Nicholas Piggin [Tue, 16 Mar 2021 10:42:03 +0000 (20:42 +1000)]
powerpc: clean up do_page_fault

search_exception_tables + __bad_page_fault can be substituted with
bad_page_fault, do_page_fault no longer needs to return a value
to asm for any sub-architecture, and __bad_page_fault can be static.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210316104206.407354-10-npiggin@gmail.com
3 years agopowerpc/64e/interrupt: handle bad_page_fault in C
Nicholas Piggin [Tue, 16 Mar 2021 10:42:02 +0000 (20:42 +1000)]
powerpc/64e/interrupt: handle bad_page_fault in C

With non-volatile registers saved on interrupt, bad_page_fault
can now be called by do_page_fault.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210316104206.407354-9-npiggin@gmail.com
3 years agopowerpc/64e/interrupt: Use new interrupt context tracking scheme
Nicholas Piggin [Tue, 16 Mar 2021 10:42:01 +0000 (20:42 +1000)]
powerpc/64e/interrupt: Use new interrupt context tracking scheme

With the new interrupt exit code, context tracking can be managed
more precisely, so remove the last of the 64e workarounds and switch
to the new context tracking code already used by 64s.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210316104206.407354-8-npiggin@gmail.com
3 years agopowerpc/64e/interrupt: reconcile irq soft-mask state in C
Nicholas Piggin [Tue, 16 Mar 2021 10:42:00 +0000 (20:42 +1000)]
powerpc/64e/interrupt: reconcile irq soft-mask state in C

Use existing 64s interrupt entry wrapper code to reconcile irqs in C.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210316104206.407354-7-npiggin@gmail.com
3 years agopowerpc/64e/interrupt: NMI save irq soft-mask state in C
Nicholas Piggin [Tue, 16 Mar 2021 10:41:59 +0000 (20:41 +1000)]
powerpc/64e/interrupt: NMI save irq soft-mask state in C

64e non-maskable interrupts save the state of the irq soft-mask in
asm. This can be done in C in interrupt wrappers as 64s does.

I haven't been able to test this with qemu because it doesn't seem
to cause FSL bookE WDT interrupts.

This makes WatchdogException an NMI interrupt, which affects 32-bit
as well (okay, or create a new handler?)

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210316104206.407354-6-npiggin@gmail.com
3 years agopowerpc/64e/interrupt: use new interrupt return
Nicholas Piggin [Tue, 16 Mar 2021 10:41:58 +0000 (20:41 +1000)]
powerpc/64e/interrupt: use new interrupt return

Update the new C and asm interrupt return code to account for 64e
specifics, switch over to use it.

The now-unused old ret_from_except code, that was moved to 64e after the
64s conversion, is removed.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210316104206.407354-5-npiggin@gmail.com
3 years agopowerpc/interrupt: update common interrupt code for
Nicholas Piggin [Tue, 16 Mar 2021 10:41:57 +0000 (20:41 +1000)]
powerpc/interrupt: update common interrupt code for

This makes adjustments to 64-bit asm and common C interrupt return
code to be usable by the 64e subarchitecture.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210316104206.407354-4-npiggin@gmail.com
3 years agopowerpc/64e/interrupt: always save nvgprs on interrupt
Nicholas Piggin [Tue, 16 Mar 2021 10:41:56 +0000 (20:41 +1000)]
powerpc/64e/interrupt: always save nvgprs on interrupt

In order to use the C interrupt return, nvgprs must always be saved.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210316104206.407354-3-npiggin@gmail.com
3 years agopowerpc/syscall: switch user_exit_irqoff and trace_hardirqs_off order
Nicholas Piggin [Tue, 16 Mar 2021 10:41:55 +0000 (20:41 +1000)]
powerpc/syscall: switch user_exit_irqoff and trace_hardirqs_off order

user_exit_irqoff() -> __context_tracking_exit -> vtime_user_exit
warns in __seqprop_assert due to lockdep thinking preemption is enabled
because trace_hardirqs_off() has not yet been called.

Switch the order of these two calls, which matches their ordering in
interrupt_enter_prepare.

Fixes: 5f0b6ac3905f ("powerpc/64/syscall: Reconcile interrupts")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210316104206.407354-2-npiggin@gmail.com
3 years agopowerpc/perf: Infrastructure to support checking of attr.config*
Madhavan Srinivasan [Thu, 8 Apr 2021 07:45:03 +0000 (13:15 +0530)]
powerpc/perf: Infrastructure to support checking of attr.config*

Introduce code to support the checking of attr.config* for
values which are reserved for a given platform.
Performance Monitoring Unit (PMU) configuration registers
have fields that are reserved and some specific values for
bit fields are reserved. For ex., MMCRA[61:62] is
Random Sampling Mode (SM) and value of 0b11 for this field
is reserved.

Writing non-zero or invalid values in these fields will
have unknown behaviours.

Patch adds a generic call-back function "check_attr_config"
in "struct power_pmu", to be called in event_init to
check for attr.config* values for a given platform.

Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210408074504.248211-1-maddy@linux.ibm.com
3 years agopowerpc/fadump: make symbol 'rtas_fadump_set_regval' static
Pu Lehui [Thu, 8 Apr 2021 06:20:12 +0000 (14:20 +0800)]
powerpc/fadump: make symbol 'rtas_fadump_set_regval' static

Fix sparse warnings:

arch/powerpc/platforms/pseries/rtas-fadump.c:250:6: warning:
 symbol 'rtas_fadump_set_regval' was not declared. Should it be static?

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210408062012.85973-1-pulehui@huawei.com
3 years agopowerpc/mem: Use kmap_local_page() in flushing functions
Christophe Leroy [Thu, 8 Apr 2021 15:30:33 +0000 (15:30 +0000)]
powerpc/mem: Use kmap_local_page() in flushing functions

Flushing functions don't rely on preemption being disabled, so
use kmap_local_page() instead of kmap_atomic().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b6a880ea0ec7886b51edbb4979c188be549231c0.1617895813.git.christophe.leroy@csgroup.eu
3 years agopowerpc/mem: Inline flush_dcache_page()
Christophe Leroy [Thu, 8 Apr 2021 15:30:32 +0000 (15:30 +0000)]
powerpc/mem: Inline flush_dcache_page()

flush_dcache_page() is only a few lines, it is worth
inlining.

ia64, csky, mips, openrisc and riscv have a similar
flush_dcache_page() and inline it.

On pmac32_defconfig, we get a small size reduction.
On ppc64_defconfig, we get a very small size increase.

In both case that's in the noise (less than 0.1%).

text data bss dec hex filename
18991155 5934744 1497624 26423523 19330e3 vmlinux64.before
18994829 5936732 1497624 26429185 1934701 vmlinux64.after
9150963 2467502  184548 11803013  b41985 vmlinux32.before
9149689 2467302  184548 11801539  b413c3 vmlinux32.after

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/21c417488b70b7629dae316539fb7bb8bdef4fdd.1617895813.git.christophe.leroy@csgroup.eu
3 years agopowerpc/mem: Help GCC realise __flush_dcache_icache() flushes single pages
Christophe Leroy [Thu, 8 Apr 2021 15:30:31 +0000 (15:30 +0000)]
powerpc/mem: Help GCC realise __flush_dcache_icache() flushes single pages

'And' the given page address with PAGE_MASK to help GCC.

With the patch:

00000024 <__flush_dcache_icache>:
  24: 54 63 00 26  rlwinm  r3,r3,0,0,19
  28: 39 40 00 40  li      r10,64
  2c: 7c 69 1b 78  mr      r9,r3
  30: 7d 49 03 a6  mtctr   r10
  34: 7c 00 48 6c  dcbst   0,r9
  38: 39 29 00 20  addi    r9,r9,32
  3c: 7c 00 48 6c  dcbst   0,r9
  40: 39 29 00 20  addi    r9,r9,32
  44: 42 00 ff f0  bdnz    34 <__flush_dcache_icache+0x10>
  48: 7c 00 04 ac  hwsync
  4c: 39 20 00 40  li      r9,64
  50: 7d 29 03 a6  mtctr   r9
  54: 7c 00 1f ac  icbi    0,r3
  58: 38 63 00 20  addi    r3,r3,32
  5c: 7c 00 1f ac  icbi    0,r3
  60: 38 63 00 20  addi    r3,r3,32
  64: 42 00 ff f0  bdnz    54 <__flush_dcache_icache+0x30>
  68: 7c 00 04 ac  hwsync
  6c: 4c 00 01 2c  isync
  70: 4e 80 00 20  blr

Without the patch:

00000024 <__flush_dcache_icache>:
  24: 54 6a 00 34  rlwinm  r10,r3,0,0,26
  28: 39 23 10 1f  addi    r9,r3,4127
  2c: 7d 2a 48 50  subf    r9,r10,r9
  30: 55 29 d9 7f  rlwinm. r9,r9,27,5,31
  34: 41 82 00 94  beq     c8 <__flush_dcache_icache+0xa4>
  38: 71 28 00 01  andi.   r8,r9,1
  3c: 38 c9 ff ff  addi    r6,r9,-1
  40: 7d 48 53 78  mr      r8,r10
  44: 7d 27 4b 78  mr      r7,r9
  48: 40 82 00 6c  bne     b4 <__flush_dcache_icache+0x90>
  4c: 54 e7 f8 7e  rlwinm  r7,r7,31,1,31
  50: 7c e9 03 a6  mtctr   r7
  54: 7c 00 40 6c  dcbst   0,r8
  58: 39 08 00 20  addi    r8,r8,32
  5c: 7c 00 40 6c  dcbst   0,r8
  60: 39 08 00 20  addi    r8,r8,32
  64: 42 00 ff f0  bdnz    54 <__flush_dcache_icache+0x30>
  68: 7c 00 04 ac  hwsync
  6c: 71 28 00 01  andi.   r8,r9,1
  70: 39 09 ff ff  addi    r8,r9,-1
  74: 40 82 00 2c  bne     a0 <__flush_dcache_icache+0x7c>
  78: 55 29 f8 7e  rlwinm  r9,r9,31,1,31
  7c: 7d 29 03 a6  mtctr   r9
  80: 7c 00 57 ac  icbi    0,r10
  84: 39 4a 00 20  addi    r10,r10,32
  88: 7c 00 57 ac  icbi    0,r10
  8c: 39 4a 00 20  addi    r10,r10,32
  90: 42 00 ff f0  bdnz    80 <__flush_dcache_icache+0x5c>
  94: 7c 00 04 ac  hwsync
  98: 4c 00 01 2c  isync
  9c: 4e 80 00 20  blr
  a0: 7c 00 57 ac  icbi    0,r10
  a4: 2c 08 00 00  cmpwi   r8,0
  a8: 39 4a 00 20  addi    r10,r10,32
  ac: 40 82 ff cc  bne     78 <__flush_dcache_icache+0x54>
  b0: 4b ff ff e4  b       94 <__flush_dcache_icache+0x70>
  b4: 7c 00 50 6c  dcbst   0,r10
  b8: 2c 06 00 00  cmpwi   r6,0
  bc: 39 0a 00 20  addi    r8,r10,32
  c0: 40 82 ff 8c  bne     4c <__flush_dcache_icache+0x28>
  c4: 4b ff ff a4  b       68 <__flush_dcache_icache+0x44>
  c8: 7c 00 04 ac  hwsync
  cc: 7c 00 04 ac  hwsync
  d0: 4c 00 01 2c  isync
  d4: 4e 80 00 20  blr

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/23030822ea5cd0a122948b10226abe56602dc027.1617895813.git.christophe.leroy@csgroup.eu
3 years agopowerpc/mem: flush_dcache_icache_phys() is for HIGHMEM pages only
Christophe Leroy [Thu, 8 Apr 2021 15:30:30 +0000 (15:30 +0000)]
powerpc/mem: flush_dcache_icache_phys() is for HIGHMEM pages only

__flush_dcache_icache() is usable for non HIGHMEM pages on
every platform.

It is only for HIGHMEM pages that BOOKE needs kmap() and
BOOK3S needs flush_dcache_icache_phys().

So make flush_dcache_icache_phys() dependent on CONFIG_HIGHMEM and
call it only when it is a HIGHMEM page.

We could make flush_dcache_icache_phys() available at all time,
but as it is declared NOKPROBE_SYMBOL(), GCC doesn't optimise
it out when it is not used.

So define a stub for !CONFIG_HIGHMEM in order to remove the #ifdef in
flush_dcache_icache_page() and use IS_ENABLED() instead.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/79ed5d7914f497cd5fcd681ca2f4d50a91719455.1617895813.git.christophe.leroy@csgroup.eu
3 years agopowerpc/mem: Optimise flush_dcache_icache_hugepage()
Christophe Leroy [Thu, 8 Apr 2021 15:30:29 +0000 (15:30 +0000)]
powerpc/mem: Optimise flush_dcache_icache_hugepage()

flush_dcache_icache_hugepage() is a static function, with
only one caller. That caller calls it when PageCompound() is true,
so bugging on !PageCompound() is useless if we can trust the
compiler a little. Remove the BUG_ON(!PageCompound()).

The number of elements of a page won't change over time, but
GCC doesn't know about it, so it gets the value at every iteration.

To avoid that, call compound_nr() outside the loop and save it in
a local variable.

Whether the page is a HIGHMEM page or not doesn't change over time.

But GCC doesn't know it so it does the test on every iteration.

Do the test outside the loop.

When the page is not a HIGHMEM page, page_address() will fallback on
lowmem_page_address(), so call lowmem_page_address() directly and
don't suffer the call to page_address() on every iteration.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ab03712b70105fccfceef095aa03007de9295a40.1617895813.git.christophe.leroy@csgroup.eu
3 years agopowerpc/mem: Call flush_coherent_icache() at higher level
Christophe Leroy [Thu, 8 Apr 2021 15:30:28 +0000 (15:30 +0000)]
powerpc/mem: Call flush_coherent_icache() at higher level

flush_coherent_icache() doesn't need the address anymore,
so it can be called immediately when entering the public
functions and doesn't need to be disseminated among
lower level functions.

And use page_to_phys() instead of open coding the calculation
of phys address to call flush_dcache_icache_phys().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/5f063986e325d2efdd404b8f8c5f4bcbd4eb11a6.1617895813.git.christophe.leroy@csgroup.eu
3 years agopowerpc/mem: Remove address argument to flush_coherent_icache()
Christophe Leroy [Thu, 8 Apr 2021 15:30:27 +0000 (15:30 +0000)]
powerpc/mem: Remove address argument to flush_coherent_icache()

flush_coherent_icache() can use any valid address as mentionned
by the comment.

Use PAGE_OFFSET as base address. This allows removing the
user access stuff.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/742b6360ae4f344a1c6ecfadcf3b6645f443fa7a.1617895813.git.christophe.leroy@csgroup.eu
3 years agopowerpc/mem: Declare __flush_dcache_icache() static
Christophe Leroy [Thu, 8 Apr 2021 15:30:26 +0000 (15:30 +0000)]
powerpc/mem: Declare __flush_dcache_icache() static

__flush_dcache_icache() is only used in mem.c.

Move it before the functions that use it and declare it static.

And also fix the name of the parameter in the comment.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3fa903eb5a10b2bc7d99a8c559ffdaa05452d8e0.1617895813.git.christophe.leroy@csgroup.eu
3 years agopowerpc/mem: Move cache flushing functions into mm/cacheflush.c
Christophe Leroy [Thu, 8 Apr 2021 15:30:24 +0000 (15:30 +0000)]
powerpc/mem: Move cache flushing functions into mm/cacheflush.c

Cache flushing functions are in the middle of completely
unrelated stuff in mm/mem.c

Create a dedicated mm/cacheflush.c for those functions.

Also cleanup the list of included headers.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7bf6f1600acad146e541a4e220940062f2e5b03d.1617895813.git.christophe.leroy@csgroup.eu
3 years agopowerpc/powernv: make symbol 'mpipl_kobj' static
Bixuan Cui [Fri, 9 Apr 2021 06:38:55 +0000 (14:38 +0800)]
powerpc/powernv: make symbol 'mpipl_kobj' static

The sparse tool complains as follows:

arch/powerpc/platforms/powernv/opal-core.c:74:16: warning:
 symbol 'mpipl_kobj' was not declared.

This symbol is not used outside of opal-core.c, so marks it static.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Bixuan Cui <cuibixuan@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210409063855.57347-1-cuibixuan@huawei.com
3 years agopowerpc/xmon: Make symbol 'spu_inst_dump' static
Pu Lehui [Fri, 9 Apr 2021 07:01:51 +0000 (15:01 +0800)]
powerpc/xmon: Make symbol 'spu_inst_dump' static

Fix sparse warning:

arch/powerpc/xmon/xmon.c:4216:1: warning:
 symbol 'spu_inst_dump' was not declared. Should it be static?

This symbol is not used outside of xmon.c, so make it static.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210409070151.163424-1-pulehui@huawei.com
3 years agopowerpc/perf/hv-24x7: Make some symbols static
Bixuan Cui [Fri, 9 Apr 2021 09:01:24 +0000 (17:01 +0800)]
powerpc/perf/hv-24x7: Make some symbols static

The sparse tool complains as follows:

arch/powerpc/perf/hv-24x7.c:229:1: warning:
 symbol '__pcpu_scope_hv_24x7_txn_flags' was not declared. Should it be static?
arch/powerpc/perf/hv-24x7.c:230:1: warning:
 symbol '__pcpu_scope_hv_24x7_txn_err' was not declared. Should it be static?
arch/powerpc/perf/hv-24x7.c:236:1: warning:
 symbol '__pcpu_scope_hv_24x7_hw' was not declared. Should it be static?
arch/powerpc/perf/hv-24x7.c:244:1: warning:
 symbol '__pcpu_scope_hv_24x7_reqb' was not declared. Should it be static?
arch/powerpc/perf/hv-24x7.c:245:1: warning:
 symbol '__pcpu_scope_hv_24x7_resb' was not declared. Should it be static?

This symbol is not used outside of hv-24x7.c, so this
commit marks it static.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Bixuan Cui <cuibixuan@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210409090124.59492-1-cuibixuan@huawei.com
3 years agopowerpc/perf: Make symbol 'isa207_pmu_format_attr' static
Bixuan Cui [Fri, 9 Apr 2021 09:01:19 +0000 (17:01 +0800)]
powerpc/perf: Make symbol 'isa207_pmu_format_attr' static

The sparse tool complains as follows:

arch/powerpc/perf/isa207-common.c:24:18: warning:
 symbol 'isa207_pmu_format_attr' was not declared. Should it be static?

This symbol is not used outside of isa207-common.c, so this
commit marks it static.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Bixuan Cui <cuibixuan@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210409090119.59444-1-cuibixuan@huawei.com
3 years agopowerpc/pseries/pmem: Make symbol 'drc_pmem_match' static
Bixuan Cui [Fri, 9 Apr 2021 09:01:14 +0000 (17:01 +0800)]
powerpc/pseries/pmem: Make symbol 'drc_pmem_match' static

The sparse tool complains as follows:

arch/powerpc/platforms/pseries/pmem.c:142:27: warning:
 symbol 'drc_pmem_match' was not declared. Should it be static?

This symbol is not used outside of pmem.c, so this
commit marks it static.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Bixuan Cui <cuibixuan@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210409090114.59396-1-cuibixuan@huawei.com
3 years agopowerpc/pseries: Make symbol '__pcpu_scope_hcall_stats' static
Bixuan Cui [Fri, 9 Apr 2021 09:01:09 +0000 (17:01 +0800)]
powerpc/pseries: Make symbol '__pcpu_scope_hcall_stats' static

The sparse tool complains as follows:

arch/powerpc/platforms/pseries/hvCall_inst.c:29:1: warning:
 symbol '__pcpu_scope_hcall_stats' was not declared. Should it be static?

This symbol is not used outside of hvCall_inst.c, so this
commit marks it static.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Bixuan Cui <cuibixuan@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210409090109.59347-1-cuibixuan@huawei.com
3 years agopowerpc/iommu: Enable remaining IOMMU Pagesizes present in LoPAR
Leonardo Bras [Thu, 8 Apr 2021 20:19:16 +0000 (17:19 -0300)]
powerpc/iommu: Enable remaining IOMMU Pagesizes present in LoPAR

According to LoPAR, ibm,query-pe-dma-window output named "IO Page Sizes"
will let the OS know all possible pagesizes that can be used for creating a
new DDW.

Currently Linux will only try using 3 of the 8 available options:
4K, 64K and 16M. According to LoPAR, Hypervisor may also offer 32M, 64M,
128M, 256M and 16G.

Enabling bigger pages would be interesting for direct mapping systems
with a lot of RAM, while using less TCE entries.

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210408201915.174217-1-leobras.c@gmail.com
3 years agopowerpc/syscalls: switch to generic syscallhdr.sh
Masahiro Yamada [Mon, 1 Mar 2021 15:30:19 +0000 (00:30 +0900)]
powerpc/syscalls: switch to generic syscallhdr.sh

Many architectures duplicate similar shell scripts.

This commit converts powerpc to use scripts/syscallhdr.sh.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210301153019.362742-2-masahiroy@kernel.org
3 years agopowerpc/syscalls: switch to generic syscalltbl.sh
Masahiro Yamada [Mon, 1 Mar 2021 15:30:18 +0000 (00:30 +0900)]
powerpc/syscalls: switch to generic syscalltbl.sh

Many architectures duplicate similar shell scripts.

This commit converts powerpc to use scripts/syscalltbl.sh. This also
unifies syscall_table_32.h and syscall_table_c32.h.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210301153019.362742-1-masahiroy@kernel.org
3 years agopowerpc/rtas: rename RTAS_RMOBUF_MAX to RTAS_USER_REGION_SIZE
Nathan Lynch [Thu, 8 Apr 2021 14:06:30 +0000 (09:06 -0500)]
powerpc/rtas: rename RTAS_RMOBUF_MAX to RTAS_USER_REGION_SIZE

RTAS_RMOBUF_MAX doesn't actually describe a "maximum" value in any
sense. It represents the size of an area of memory set aside for user
space to use as work areas for certain RTAS calls.

Rename it to RTAS_USER_REGION_SIZE.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210408140630.205502-6-nathanl@linux.ibm.com
3 years agopowerpc/rtas: move syscall filter setup into separate function
Nathan Lynch [Thu, 8 Apr 2021 14:06:29 +0000 (09:06 -0500)]
powerpc/rtas: move syscall filter setup into separate function

Reduce conditionally compiled sections within rtas_initialize() by
moving the filter table initialization into its own function already
guarded by CONFIG_PPC_RTAS_FILTER. No behavior change intended.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210408140630.205502-5-nathanl@linux.ibm.com
3 years agopowerpc/rtas: remove ibm_suspend_me_token
Nathan Lynch [Thu, 8 Apr 2021 14:06:28 +0000 (09:06 -0500)]
powerpc/rtas: remove ibm_suspend_me_token

There's not a compelling reason to cache the value of the token for
the ibm,suspend-me function. Just look it up when needed in the RTAS
syscall's special case for it.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210408140630.205502-4-nathanl@linux.ibm.com
3 years agopowerpc/rtas-proc: remove unused RMO_READ_BUF_MAX
Nathan Lynch [Thu, 8 Apr 2021 14:06:27 +0000 (09:06 -0500)]
powerpc/rtas-proc: remove unused RMO_READ_BUF_MAX

This constant is unused.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210408140630.205502-3-nathanl@linux.ibm.com
3 years agopowerpc/rtas: improve ppc_rtas_rmo_buf_show documentation
Nathan Lynch [Thu, 8 Apr 2021 14:06:26 +0000 (09:06 -0500)]
powerpc/rtas: improve ppc_rtas_rmo_buf_show documentation

Add kerneldoc for ppc_rtas_rmo_buf_show(), the callback for
/proc/powerpc/rtas/rmo_buffer, explaining its expected use.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210408140630.205502-2-nathanl@linux.ibm.com
3 years agopowerpc/eeh: Fix EEH handling for hugepages in ioremap space.
Mahesh Salgaonkar [Mon, 12 Apr 2021 07:52:50 +0000 (13:22 +0530)]
powerpc/eeh: Fix EEH handling for hugepages in ioremap space.

During the EEH MMIO error checking, the current implementation fails to map
the (virtual) MMIO address back to the pci device on radix with hugepage
mappings for I/O. This results into failure to dispatch EEH event with no
recovery even when EEH capability has been enabled on the device.

eeh_check_failure(token) # token = virtual MMIO address
  addr = eeh_token_to_phys(token);
  edev = eeh_addr_cache_get_dev(addr);
  if (!edev)
return 0;
  eeh_dev_check_failure(edev); <= Dispatch the EEH event

In case of hugepage mappings, eeh_token_to_phys() has a bug in virt -> phys
translation that results in wrong physical address, which is then passed to
eeh_addr_cache_get_dev() to match it against cached pci I/O address ranges
to get to a PCI device. Hence, it fails to find a match and the EEH event
never gets dispatched leaving the device in failed state.

The commit 33439620680be ("powerpc/eeh: Handle hugepages in ioremap space")
introduced following logic to translate virt to phys for hugepage mappings:

eeh_token_to_phys():
+ pa = pte_pfn(*ptep);
+
+ /* On radix we can do hugepage mappings for io, so handle that */
+       if (hugepage_shift) {
+               pa <<= hugepage_shift; <= This is wrong
+               pa |= token & ((1ul << hugepage_shift) - 1);
+       }

This patch fixes the virt -> phys translation in eeh_token_to_phys()
function.

  $ cat /sys/kernel/debug/powerpc/eeh_address_cache
  mem addr range [0x0000040080000000-0x00000400807fffff]: 0030:01:00.1
  mem addr range [0x0000040080800000-0x0000040080ffffff]: 0030:01:00.1
  mem addr range [0x0000040081000000-0x00000400817fffff]: 0030:01:00.0
  mem addr range [0x0000040081800000-0x0000040081ffffff]: 0030:01:00.0
  mem addr range [0x0000040082000000-0x000004008207ffff]: 0030:01:00.1
  mem addr range [0x0000040082080000-0x00000400820fffff]: 0030:01:00.0
  mem addr range [0x0000040082100000-0x000004008210ffff]: 0030:01:00.1
  mem addr range [0x0000040082110000-0x000004008211ffff]: 0030:01:00.0

Above is the list of cached io address ranges of pci 0030:01:00.<fn>.

Before this patch:

Tracing 'arg1' of function eeh_addr_cache_get_dev() during error injection
clearly shows that 'addr=' contains wrong physical address:

   kworker/u16:0-7       [001] ....   108.883775: eeh_addr_cache_get_dev:
   (eeh_addr_cache_get_dev+0xc/0xf0) addr=0x80103000a510

dmesg shows no EEH recovery messages:

  [  108.563768] bnx2x: [bnx2x_timer:5801(eth2)]MFW seems hanged: drv_pulse (0x9ae) != mcp_pulse (0x7fff)
  [  108.563788] bnx2x: [bnx2x_hw_stats_update:870(eth2)]NIG timer max (4294967295)
  [  108.883788] bnx2x: [bnx2x_acquire_hw_lock:2013(eth1)]lock_status 0xffffffff  resource_bit 0x1
  [  108.884407] bnx2x 0030:01:00.0 eth1: MDC/MDIO access timeout
  [  108.884976] bnx2x 0030:01:00.0 eth1: MDC/MDIO access timeout
  <..>

After this patch:

eeh_addr_cache_get_dev() trace shows correct physical address:

  <idle>-0       [001] ..s.  1043.123828: eeh_addr_cache_get_dev:
  (eeh_addr_cache_get_dev+0xc/0xf0) addr=0x40080bc7cd8

dmesg logs shows EEH recovery getting triggerred:

  [  964.323980] bnx2x: [bnx2x_timer:5801(eth2)]MFW seems hanged: drv_pulse (0x746f) != mcp_pulse (0x7fff)
  [  964.323991] EEH: Recovering PHB#30-PE#10000
  [  964.324002] EEH: PE location: N/A, PHB location: N/A
  [  964.324006] EEH: Frozen PHB#30-PE#10000 detected
  <..>

Fixes: 33439620680b ("powerpc/eeh: Handle hugepages in ioremap space")
Cc: stable@vger.kernel.org # v5.3+
Reported-by: Dominic DeMarco <ddemarc@us.ibm.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/161821396263.48361.2796709239866588652.stgit@jupiter
3 years agopowerpc/xive: Modernize XIVE-IPI domain with an 'alloc' handler
Cédric Le Goater [Wed, 31 Mar 2021 14:45:14 +0000 (16:45 +0200)]
powerpc/xive: Modernize XIVE-IPI domain with an 'alloc' handler

Instead of calling irq_create_mapping() to map the IPI for a node,
introduce an 'alloc' handler. This is usually an extension to support
hierarchy irq_domains which is not exactly the case for XIVE-IPI
domain. However, we can now use the irq_domain_alloc_irqs() routine
which allocates the IRQ descriptor on the specified node, even better
for cache performance on multi node machines.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210331144514.892250-10-clg@kaod.org
3 years agopowerpc/xive: Map one IPI interrupt per node
Cédric Le Goater [Wed, 31 Mar 2021 14:45:13 +0000 (16:45 +0200)]
powerpc/xive: Map one IPI interrupt per node

ipistorm [*] can be used to benchmark the raw interrupt rate of an
interrupt controller by measuring the number of IPIs a system can
sustain. When applied to the XIVE interrupt controller of POWER9 and
POWER10 systems, a significant drop of the interrupt rate can be
observed when crossing the second node boundary.

This is due to the fact that a single IPI interrupt is used for all
CPUs of the system. The structure is shared and the cache line updates
impact greatly the traffic between nodes and the overall IPI
performance.

As a workaround, the impact can be reduced by deactivating the IRQ
lockup detector ("noirqdebug") which does a lot of accounting in the
Linux IRQ descriptor structure and is responsible for most of the
performance penalty.

As a fix, this proposal allocates an IPI interrupt per node, to be
shared by all CPUs of that node. It solves the scaling issue, the IRQ
lockup detector still has an impact but the XIVE interrupt rate scales
linearly. It also improves the "noirqdebug" case as showed in the
tables below.

 * P9 DD2.2 - 2s * 64 threads

                                               "noirqdebug"
                        Mint/s                    Mint/s
 chips  cpus      IPI/sys   IPI/chip       IPI/chip    IPI/sys
 --------------------------------------------------------------
 1      0-15     4.984023   4.875405       4.996536   5.048892
        0-31    10.879164  10.544040      10.757632  11.037859
        0-47    15.345301  14.688764      14.926520  15.310053
        0-63    17.064907  17.066812      17.613416  17.874511
 2      0-79    11.768764  21.650749      22.689120  22.566508
        0-95    10.616812  26.878789      28.434703  28.320324
        0-111   10.151693  31.397803      31.771773  32.388122
        0-127    9.948502  33.139336      34.875716  35.224548

 * P10 DD1 - 4s (not homogeneous) 352 threads

                                               "noirqdebug"
                        Mint/s                    Mint/s
 chips  cpus      IPI/sys   IPI/chip       IPI/chip    IPI/sys
 --------------------------------------------------------------
 1      0-15     2.409402   2.364108       2.383303   2.395091
        0-31     6.028325   6.046075       6.089999   6.073750
        0-47     8.655178   8.644531       8.712830   8.724702
        0-63    11.629652  11.735953      12.088203  12.055979
        0-79    14.392321  14.729959      14.986701  14.973073
        0-95    12.604158  13.004034      17.528748  17.568095
 2      0-111    9.767753  13.719831      19.968606  20.024218
        0-127    6.744566  16.418854      22.898066  22.995110
        0-143    6.005699  19.174421      25.425622  25.417541
        0-159    5.649719  21.938836      27.952662  28.059603
        0-175    5.441410  24.109484      31.133915  31.127996
 3      0-191    5.318341  24.405322      33.999221  33.775354
        0-207    5.191382  26.449769      36.050161  35.867307
        0-223    5.102790  29.356943      39.544135  39.508169
        0-239    5.035295  31.933051      42.135075  42.071975
        0-255    4.969209  34.477367      44.655395  44.757074
 4      0-271    4.907652  35.887016      47.080545  47.318537
        0-287    4.839581  38.076137      50.464307  50.636219
        0-303    4.786031  40.881319      53.478684  53.310759
        0-319    4.743750  43.448424      56.388102  55.973969
        0-335    4.709936  45.623532      59.400930  58.926857
        0-351    4.681413  45.646151      62.035804  61.830057

[*] https://github.com/antonblanchard/ipistorm

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210331144514.892250-9-clg@kaod.org
3 years agopowerpc/xive: Fix xmon command "dxi"
Cédric Le Goater [Wed, 31 Mar 2021 14:45:12 +0000 (16:45 +0200)]
powerpc/xive: Fix xmon command "dxi"

When under xmon, the "dxi" command dumps the state of the XIVE
interrupts. If an interrupt number is specified, only the state of
the associated XIVE interrupt is dumped. This form of the command
lacks an irq_data parameter which is nevertheless used by
xmon_xive_get_irq_config(), leading to an xmon crash.

Fix that by doing a lookup in the system IRQ mapping to query the IRQ
descriptor data. Invalid interrupt numbers, or not belonging to the
XIVE IRQ domain, OPAL event interrupt number for instance, should be
caught by the previous query done at the firmware level.

Fixes: 97ef27507793 ("powerpc/xive: Fix xmon support on the PowerNV platform")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210331144514.892250-8-clg@kaod.org
3 years agopowerpc/xive: Simplify the dump of XIVE interrupts under xmon
Cédric Le Goater [Wed, 31 Mar 2021 14:45:11 +0000 (16:45 +0200)]
powerpc/xive: Simplify the dump of XIVE interrupts under xmon

Move the xmon routine under XIVE subsystem and rework the loop on the
interrupts taking into account the xive_irq_domain to filter out IPIs.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210331144514.892250-7-clg@kaod.org
3 years agopowerpc/xive: Drop check on irq_data in xive_core_debug_show()
Cédric Le Goater [Wed, 31 Mar 2021 14:45:10 +0000 (16:45 +0200)]
powerpc/xive: Drop check on irq_data in xive_core_debug_show()

When looping on IRQ descriptor, irq_data is always valid.

Fixes: 930914b7d528 ("powerpc/xive: Add a debugfs file to dump internal XIVE state")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210331144514.892250-6-clg@kaod.org
3 years agopowerpc/xive: Simplify xive_core_debug_show()
Cédric Le Goater [Wed, 31 Mar 2021 14:45:09 +0000 (16:45 +0200)]
powerpc/xive: Simplify xive_core_debug_show()

Now that the IPI interrupt has its own domain, the checks on the HW
interrupt number XIVE_IPI_HW_IRQ and on the chip can be replaced by a
check on the domain.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210331144514.892250-5-clg@kaod.org
3 years agopowerpc/xive: Remove useless check on XIVE_IPI_HW_IRQ
Cédric Le Goater [Wed, 31 Mar 2021 14:45:08 +0000 (16:45 +0200)]
powerpc/xive: Remove useless check on XIVE_IPI_HW_IRQ

The IPI interrupt has its own domain now. Testing the HW interrupt
number is not needed anymore.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210331144514.892250-4-clg@kaod.org
3 years agopowerpc/xive: Introduce an IPI interrupt domain
Cédric Le Goater [Wed, 31 Mar 2021 14:45:07 +0000 (16:45 +0200)]
powerpc/xive: Introduce an IPI interrupt domain

The IPI interrupt is a special case of the XIVE IRQ domain. When
mapping and unmapping the interrupts in the Linux interrupt number
space, the HW interrupt number 0 (XIVE_IPI_HW_IRQ) is checked to
distinguish the IPI interrupt from other interrupts of the system.

Simplify the XIVE interrupt domain by introducing a specific domain
for the IPI.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210331144514.892250-3-clg@kaod.org
3 years agopowerpc/smp: Make some symbols static
Yu Kuai [Wed, 7 Apr 2021 12:59:03 +0000 (20:59 +0800)]
powerpc/smp: Make some symbols static

The sparse tool complains as follows:

arch/powerpc/kernel/smp.c:86:1: warning:
 symbol '__pcpu_scope_cpu_coregroup_map' was not declared. Should it be static?
arch/powerpc/kernel/smp.c:125:1: warning:
 symbol '__pcpu_scope_thread_group_l1_cache_map' was not declared. Should it be static?
arch/powerpc/kernel/smp.c:132:1: warning:
 symbol '__pcpu_scope_thread_group_l2_cache_map' was not declared. Should it be static?

These symbols are not used outside of smp.c, so this
commit marks them static.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210407125903.4139663-1-yukuai3@huawei.com
3 years agomacintosh/via-pmu: Make some symbols static
Yu Kuai [Wed, 7 Apr 2021 12:58:03 +0000 (20:58 +0800)]
macintosh/via-pmu: Make some symbols static

The sparse tool complains as follows:

drivers/macintosh/via-pmu.c:183:5: warning:
 symbol 'pmu_cur_battery' was not declared. Should it be static?
drivers/macintosh/via-pmu.c:190:5: warning:
 symbol '__fake_sleep' was not declared. Should it be static?

These symbols are not used outside of via-pmu.c, so this
commit marks them static.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210407125803.4138837-1-yukuai3@huawei.com
3 years agowindfarm: make symbol 'wf_thread' static
Yu Kuai [Wed, 7 Apr 2021 12:57:38 +0000 (20:57 +0800)]
windfarm: make symbol 'wf_thread' static

The sparse tool complains as follows:

drivers/macintosh/windfarm_core.c:59:20: warning:
 symbol 'wf_thread' was not declared. Should it be static?

This symbol is not used outside of windfarm_core.c, so this
commit marks it static.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210407125738.4138480-1-yukuai3@huawei.com
3 years agomacintosh/windfarm: Make symbol 'pm121_sys_state' static
Yu Kuai [Wed, 7 Apr 2021 12:57:12 +0000 (20:57 +0800)]
macintosh/windfarm: Make symbol 'pm121_sys_state' static

The sparse tool complains as follows:

drivers/macintosh/windfarm_pm121.c:436:24: warning:
 symbol 'pm121_sys_state' was not declared. Should it be static?

This symbol is not used outside of windfarm_pm121.c, so this
commit marks it static.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210407125712.4138033-1-yukuai3@huawei.com
3 years agopowerpc/mce: Make symbol 'mce_ue_event_work' static
Li Huafei [Thu, 8 Apr 2021 03:58:02 +0000 (11:58 +0800)]
powerpc/mce: Make symbol 'mce_ue_event_work' static

The sparse tool complains as follows:

arch/powerpc/kernel/mce.c:43:1: warning:
 symbol 'mce_ue_event_work' was not declared. Should it be static?

This symbol is not used outside of mce.c, so this commit marks it
static.

Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210408035802.31853-1-lihuafei1@huawei.com
3 years agopowerpc/security: Make symbol 'stf_barrier' static
Li Huafei [Thu, 8 Apr 2021 03:39:51 +0000 (11:39 +0800)]
powerpc/security: Make symbol 'stf_barrier' static

The sparse tool complains as follows:

arch/powerpc/kernel/security.c:253:6: warning:
 symbol 'stf_barrier' was not declared. Should it be static?

This symbol is not used outside of security.c, so this commit marks it
static.

Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210408033951.28369-1-lihuafei1@huawei.com
3 years agopowerpc/32s: Define a MODULE area below kernel text all the time
Christophe Leroy [Thu, 1 Apr 2021 13:30:43 +0000 (13:30 +0000)]
powerpc/32s: Define a MODULE area below kernel text all the time

On book3s/32, the segment below kernel text is used for module
allocation when CONFIG_STRICT_KERNEL_RWX is defined.

In order to benefit from the powerpc specific module_alloc()
function which allocate modules with 32 Mbytes from
end of kernel text, use that segment below PAGE_OFFSET at all time.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a46dcdd39a9e80b012d86c294c4e5cd8d31665f3.1617283827.git.christophe.leroy@csgroup.eu
3 years agopowerpc/8xx: Define a MODULE area below kernel text
Christophe Leroy [Thu, 1 Apr 2021 13:30:42 +0000 (13:30 +0000)]
powerpc/8xx: Define a MODULE area below kernel text

On the 8xx, TASK_SIZE is 0x80000000. The space between TASK_SIZE
and PAGE_OFFSET is not used.

In order to benefit from the powerpc specific module_alloc()
function which allocate modules with 32 Mbytes from
end of kernel text, define MODULES_VADDR and MODULES_END.

Set a 256Mb area just below PAGE_OFFSET, like book3s/32.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a225606d5b3a8bc53fe612ad52c855c60b0a0a58.1617283827.git.christophe.leroy@csgroup.eu
3 years agopowerpc/modules: Load modules closer to kernel text
Christophe Leroy [Thu, 1 Apr 2021 13:30:41 +0000 (13:30 +0000)]
powerpc/modules: Load modules closer to kernel text

On book3s/32, when STRICT_KERNEL_RWX is selected, modules are
allocated on the segment just before kernel text, ie on the
0xb0000000-0xbfffffff when PAGE_OFFSET is 0xc0000000.

On the 8xx, TASK_SIZE is 0x80000000. The space between TASK_SIZE and
PAGE_OFFSET is not used and could be used for modules.

The idea comes from ARM architecture.

Having modules just below PAGE_OFFSET offers an opportunity to
minimise the distance between kernel text and modules and avoid
trampolines in modules to access kernel functions or other module
functions.

When MODULES_VADDR is defined, powerpc has it's own module_alloc()
function. In that function, first try to allocate the module
above the limit defined by '_etext - 32M'. Then if the allocation
fails, fallback to the entire MODULES area.

DEBUG logs in module_32.c without the patch:

[ 1572.588822] module_32: Applying ADD relocate section 13 to 12
[ 1572.588891] module_32: Doing plt for call to 0xc00671a4 at 0xcae04024
[ 1572.588964] module_32: Initialized plt for 0xc00671a4 at cae04000
[ 1572.589037] module_32: REL24 value = CAE04000. location = CAE04024
[ 1572.589110] module_32: Location before: 48000001.
[ 1572.589171] module_32: Location after: 4BFFFFDD.
[ 1572.589231] module_32: ie. jump to 03FFFFDC+CAE04024 = CEE04000
[ 1572.589317] module_32: Applying ADD relocate section 15 to 14
[ 1572.589386] module_32: Doing plt for call to 0xc00671a4 at 0xcadfc018
[ 1572.589457] module_32: Initialized plt for 0xc00671a4 at cadfc000
[ 1572.589529] module_32: REL24 value = CADFC000. location = CADFC018
[ 1572.589601] module_32: Location before: 48000000.
[ 1572.589661] module_32: Location after: 4BFFFFE8.
[ 1572.589723] module_32: ie. jump to 03FFFFE8+CADFC018 = CEDFC000

With the patch:

[  279.404671] module_32: Applying ADD relocate section 13 to 12
[  279.404741] module_32: REL24 value = C00671B4. location = BF808024
[  279.404814] module_32: Location before: 48000001.
[  279.404874] module_32: Location after: 4885F191.
[  279.404933] module_32: ie. jump to 0085F190+BF808024 = C00671B4
[  279.405016] module_32: Applying ADD relocate section 15 to 14
[  279.405085] module_32: REL24 value = C00671B4. location = BF800018
[  279.405156] module_32: Location before: 48000000.
[  279.405215] module_32: Location after: 4886719C.
[  279.405275] module_32: ie. jump to 0086719C+BF800018 = C00671B4

We see that with the patch, no plt entries are set.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0c3d5cb8a4dfdf6ca1b8aeb385c01470d6628d55.1617283827.git.christophe.leroy@csgroup.eu
3 years agopowerpc/mm: Add cond_resched() while removing hpte mappings
Vaibhav Jain [Sun, 4 Apr 2021 16:31:48 +0000 (22:01 +0530)]
powerpc/mm: Add cond_resched() while removing hpte mappings

While removing large number of mappings from hash page tables for
large memory systems as soft-lockup is reported because of the time
spent inside htap_remove_mapping() like one below:

 watchdog: BUG: soft lockup - CPU#8 stuck for 23s!
 <snip>
 NIP plpar_hcall+0x38/0x58
 LR  pSeries_lpar_hpte_invalidate+0x68/0xb0
 Call Trace:
  0x1fffffffffff000 (unreliable)
  pSeries_lpar_hpte_removebolted+0x9c/0x230
  hash__remove_section_mapping+0xec/0x1c0
  remove_section_mapping+0x28/0x3c
  arch_remove_memory+0xfc/0x150
  devm_memremap_pages_release+0x180/0x2f0
  devm_action_release+0x30/0x50
  release_nodes+0x28c/0x300
  device_release_driver_internal+0x16c/0x280
  unbind_store+0x124/0x170
  drv_attr_store+0x44/0x60
  sysfs_kf_write+0x64/0x90
  kernfs_fop_write+0x1b0/0x290
  __vfs_write+0x3c/0x70
  vfs_write+0xd4/0x270
  ksys_write+0xdc/0x130
  system_call+0x5c/0x70

Fix this by adding a cond_resched() to the loop in
htap_remove_mapping() that issues hcall to remove hpte mapping. The
call to cond_resched() is issued every HZ jiffies which should prevent
the soft-lockup from being reported.

Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210404163148.321346-1-vaibhav@linux.ibm.com
3 years agopowerpc/papr_scm: Implement support for H_SCM_FLUSH hcall
Shivaprasad G Bhat [Mon, 29 Mar 2021 17:36:43 +0000 (13:36 -0400)]
powerpc/papr_scm: Implement support for H_SCM_FLUSH hcall

Add support for ND_REGION_ASYNC capability if the device tree
indicates 'ibm,hcall-flush-required' property in the NVDIMM node.
Flush is done by issuing H_SCM_FLUSH hcall to the hypervisor.

If the flush request failed, the hypervisor is expected to
to reflect the problem in the subsequent nvdimm H_SCM_HEALTH call.

This patch prevents mmap of namespaces with MAP_SYNC flag if the
nvdimm requires an explicit flush[1].

References:
[1] https://github.com/avocado-framework-tests/avocado-misc-tests/blob/master/memory/ndctl.py.data/map_sync.c

Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
[mpe: Use unsigned long / long instead of uint64_t/int64_t]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/161703936121.36.7260632399582101498.stgit@e1fbed493c87
3 years agopowerpc/signal32: Fix build failure with CONFIG_SPE
Christophe Leroy [Sun, 11 Apr 2021 16:39:53 +0000 (16:39 +0000)]
powerpc/signal32: Fix build failure with CONFIG_SPE

Add missing fault exit label in unsafe_copy_from_user() in order to
avoid following build failure with CONFIG_SPE

  CC      arch/powerpc/kernel/signal_32.o
arch/powerpc/kernel/signal_32.c: In function 'restore_user_regs':
arch/powerpc/kernel/signal_32.c:565:36: error: macro "unsafe_copy_from_user" requires 4 arguments, but only 3 given
  565 |           ELF_NEVRREG * sizeof(u32));
      |                                    ^
In file included from ./include/linux/uaccess.h:11,
                 from ./include/linux/sched/task.h:11,
                 from ./include/linux/sched/signal.h:9,
                 from ./include/linux/rcuwait.h:6,
                 from ./include/linux/percpu-rwsem.h:7,
                 from ./include/linux/fs.h:33,
                 from ./include/linux/huge_mm.h:8,
                 from ./include/linux/mm.h:707,
                 from arch/powerpc/kernel/signal_32.c:17:
./arch/powerpc/include/asm/uaccess.h:428: note: macro "unsafe_copy_from_user" defined here
  428 | #define unsafe_copy_from_user(d, s, l, e) \
      |
arch/powerpc/kernel/signal_32.c:564:3: error: 'unsafe_copy_from_user' undeclared (first use in this function); did you mean 'raw_copy_from_user'?
  564 |   unsafe_copy_from_user(current->thread.evr, &sr->mc_vregs,
      |   ^~~~~~~~~~~~~~~~~~~~~
      |   raw_copy_from_user
arch/powerpc/kernel/signal_32.c:564:3: note: each undeclared identifier is reported only once for each function it appears in
make[3]: *** [arch/powerpc/kernel/signal_32.o] Error 1

Fixes: 627b72bee84d ("powerpc/signal32: Convert restore_[tm]_user_regs() to user access block")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/aad2cb1801a3cc99bc27081022925b9fc18a0dfb.1618159169.git.christophe.leroy@csgroup.eu
3 years agoKVM: PPC: Book3S HV: Ensure MSR[HV] is always clear in guest MSR
Nicholas Piggin [Mon, 12 Apr 2021 01:48:45 +0000 (11:48 +1000)]
KVM: PPC: Book3S HV: Ensure MSR[HV] is always clear in guest MSR

Rather than clear the HV bit from the MSR at guest entry, make it clear
that the hypervisor does not allow the guest to set the bit.

The HV clear is kept in guest entry for now, but a future patch will
warn if it is set.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210412014845.1517916-13-npiggin@gmail.com
3 years agoKVM: PPC: Book3S HV: Ensure MSR[ME] is always set in guest MSR
Nicholas Piggin [Mon, 12 Apr 2021 01:48:44 +0000 (11:48 +1000)]
KVM: PPC: Book3S HV: Ensure MSR[ME] is always set in guest MSR

Rather than add the ME bit to the MSR at guest entry, make it clear
that the hypervisor does not allow the guest to clear the bit.

The ME set is kept in guest entry for now, but a future patch will
warn if it's not present.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210412014845.1517916-12-npiggin@gmail.com
3 years agopowerpc/64s: remove KVM SKIP test from instruction breakpoint handler
Nicholas Piggin [Mon, 12 Apr 2021 01:48:43 +0000 (11:48 +1000)]
powerpc/64s: remove KVM SKIP test from instruction breakpoint handler

The code being executed in KVM_GUEST_MODE_SKIP is hypervisor code with
MSR[IR]=0, so the faults of concern are the d-side ones caused by access
to guest context by the hypervisor.

Instruction breakpoint interrupts are not a concern here. It's unlikely
any good would come of causing breaks in this code, but skipping the
instruction that caused it won't help matters (e.g., skip the mtmsr that
sets MSR[DR]=0 or clears KVM_GUEST_MODE_SKIP).

 [Paul notes: "the 0x1300 interrupt was dropped from the architecture a
  long time ago and is not generated by P7, P8, P9 or P10." So add a
  comment about this in the handler code while we're here. ]

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210412014845.1517916-11-npiggin@gmail.com
3 years agopowerpc/64s: Remove KVM handler support from CBE_RAS interrupts
Nicholas Piggin [Mon, 12 Apr 2021 01:48:42 +0000 (11:48 +1000)]
powerpc/64s: Remove KVM handler support from CBE_RAS interrupts

Cell does not support KVM.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210412014845.1517916-10-npiggin@gmail.com
3 years agoKVM: PPC: Book3S HV: Fix CONFIG_SPAPR_TCE_IOMMU=n default hcalls
Nicholas Piggin [Mon, 12 Apr 2021 01:48:41 +0000 (11:48 +1000)]
KVM: PPC: Book3S HV: Fix CONFIG_SPAPR_TCE_IOMMU=n default hcalls

This config option causes the warning in init_default_hcalls to fire
because the TCE handlers are in the default hcall list but not
implemented.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210412014845.1517916-9-npiggin@gmail.com
3 years agoKVM: PPC: Book3S HV: remove unused kvmppc_h_protect argument
Nicholas Piggin [Mon, 12 Apr 2021 01:48:40 +0000 (11:48 +1000)]
KVM: PPC: Book3S HV: remove unused kvmppc_h_protect argument

The va argument is not used in the function or set by its asm caller,
so remove it to be safe.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210412014845.1517916-8-npiggin@gmail.com
3 years agoKVM: PPC: Book3S HV: Remove redundant mtspr PSPB
Nicholas Piggin [Mon, 12 Apr 2021 01:48:39 +0000 (11:48 +1000)]
KVM: PPC: Book3S HV: Remove redundant mtspr PSPB

This SPR is set to 0 twice when exiting the guest.

Suggested-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210412014845.1517916-7-npiggin@gmail.com
3 years agoKVM: PPC: Book3S HV: Prevent radix guests setting LPCR[TC]
Nicholas Piggin [Mon, 12 Apr 2021 01:48:38 +0000 (11:48 +1000)]
KVM: PPC: Book3S HV: Prevent radix guests setting LPCR[TC]

Prevent radix guests setting LPCR[TC]. This bit only applies to hash
partitions.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210412014845.1517916-6-npiggin@gmail.com
3 years agoKVM: PPC: Book3S HV: Disallow LPCR[AIL] to be set to 1 or 2
Nicholas Piggin [Mon, 12 Apr 2021 01:48:37 +0000 (11:48 +1000)]
KVM: PPC: Book3S HV: Disallow LPCR[AIL] to be set to 1 or 2

These are already disallowed by H_SET_MODE from the guest, also disallow
these by updating LPCR directly.

AIL modes can affect the host interrupt behaviour while the guest LPCR
value is set, so filter it here too.

Suggested-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210412014845.1517916-5-npiggin@gmail.com
3 years agoKVM: PPC: Book3S HV: Add a function to filter guest LPCR bits
Nicholas Piggin [Mon, 12 Apr 2021 01:48:36 +0000 (11:48 +1000)]
KVM: PPC: Book3S HV: Add a function to filter guest LPCR bits

Guest LPCR depends on hardware type, and future changes will add
restrictions based on errata and guest MMU mode. Move this logic
to a common function and use it for the cases where the guest
wants to update its LPCR (or the LPCR of a nested guest).

This also adds a warning in other places that set or update LPCR
if we try to set something that would have been disallowed by
the filter, as a sanity check.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210412014845.1517916-4-npiggin@gmail.com
3 years agoKVM: PPC: Book3S HV: Nested move LPCR sanitising to sanitise_hv_regs
Nicholas Piggin [Mon, 12 Apr 2021 01:48:35 +0000 (11:48 +1000)]
KVM: PPC: Book3S HV: Nested move LPCR sanitising to sanitise_hv_regs

This will get a bit more complicated in future patches. Move it
into the helper function.

This change allows the L1 hypervisor to determine some of the LPCR
bits that the L0 is using to run it, which could be a privilege
violation (LPCR is HV-privileged), although the same problem exists
now for HFSCR for example. Discussion of the HV privilege issue is
ongoing and can be resolved with a later change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210412014845.1517916-3-npiggin@gmail.com
3 years agoKVM: PPC: Book3S HV P9: Restore host CTRL SPR after guest exit
Nicholas Piggin [Mon, 12 Apr 2021 01:48:34 +0000 (11:48 +1000)]
KVM: PPC: Book3S HV P9: Restore host CTRL SPR after guest exit

The host CTRL (runlatch) value is not restored after guest exit. The
host CTRL should always be 1 except in CPU idle code, so this can result
in the host running with runlatch clear, and potentially switching to
a different vCPU which then runs with runlatch clear as well.

This has little effect on P9 machines, CTRL is only responsible for some
PMU counter logic in the host and so other than corner cases of software
relying on that, or explicitly reading the runlatch value (Linux does
not appear to be affected but it's possible non-Linux guests could be),
there should be no execution correctness problem, though it could be
used as a covert channel between guests.

There may be microcontrollers, firmware or monitoring tools that sample
the runlatch value out-of-band, however since the register is writable
by guests, these values would (should) not be relied upon for correct
operation of the host, so suboptimal performance or incorrect reporting
should be the worst problem.

Fixes: 95a6432ce9038 ("KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210412014845.1517916-2-npiggin@gmail.com
3 years agopowerpc/32: Remove powerpc specific definition of 'ptrdiff_t'
Christophe Leroy [Mon, 5 Apr 2021 09:57:27 +0000 (09:57 +0000)]
powerpc/32: Remove powerpc specific definition of 'ptrdiff_t'

For unknown reason, old commit d27dfd388715 ("Import pre2.0.8")
changed 'ptrdiff_t' from 'int' to 'long'.

GCC expects it as 'int' really, and this leads to the following
warning when building KFENCE:

  CC      mm/kfence/report.o
In file included from ./include/linux/printk.h:7,
                 from ./include/linux/kernel.h:16,
                 from mm/kfence/report.c:10:
mm/kfence/report.c: In function 'kfence_report_error':
./include/linux/kern_levels.h:5:18: warning: format '%td' expects argument of type 'ptrdiff_t', but argument 6 has type 'long int' [-Wformat=]
    5 | #define KERN_SOH "\001"  /* ASCII Start Of Header */
      |                  ^~~~~~
./include/linux/kern_levels.h:11:18: note: in expansion of macro 'KERN_SOH'
   11 | #define KERN_ERR KERN_SOH "3" /* error conditions */
      |                  ^~~~~~~~
./include/linux/printk.h:343:9: note: in expansion of macro 'KERN_ERR'
  343 |  printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
      |         ^~~~~~~~
mm/kfence/report.c:213:3: note: in expansion of macro 'pr_err'
  213 |   pr_err("Out-of-bounds %s at 0x%p (%luB %s of kfence-#%td):\n",
      |   ^~~~~~

<asm-generic/uapi/posix-types.h> defines it as 'int', and
defines 'size_t' and 'ssize_t' exactly as powerpc do, so
remove the powerpc specific definitions and fallback on
generic ones.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e43d133bf52fa19e577f64f3a3a38cedc570377d.1617616601.git.christophe.leroy@csgroup.eu
3 years agopowerpc: iommu: fix build when neither PCI or IBMVIO is set
Randy Dunlap [Sun, 4 Apr 2021 19:26:23 +0000 (12:26 -0700)]
powerpc: iommu: fix build when neither PCI or IBMVIO is set

When neither CONFIG_PCI nor CONFIG_IBMVIO is set/enabled, iommu.c has a
build error. The fault injection code is not useful in that kernel config,
so make the FAIL_IOMMU option depend on PCI || IBMVIO.

Prevents this build error (warning escalated to error):
../arch/powerpc/kernel/iommu.c:178:30: error: 'fail_iommu_bus_notifier' defined but not used [-Werror=unused-variable]
  178 | static struct notifier_block fail_iommu_bus_notifier = {

Fixes: d6b9a81b2a45 ("powerpc: IOMMU fault injection")
Reported-by: kernel test robot <lkp@intel.com>
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210404192623.10697-1-rdunlap@infradead.org
3 years agopowerpc/pseries: remove unneeded semicolon
Yang Li [Tue, 6 Apr 2021 01:33:05 +0000 (09:33 +0800)]
powerpc/pseries: remove unneeded semicolon

Eliminate the following coccicheck warning:
./arch/powerpc/platforms/pseries/lpar.c:1633:2-3: Unneeded semicolon

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1617672785-81372-1-git-send-email-yang.lee@linux.alibaba.com
3 years agopowerpc/64s: power4 nap fixup in C
Nicholas Piggin [Tue, 6 Apr 2021 02:55:08 +0000 (12:55 +1000)]
powerpc/64s: power4 nap fixup in C

There is no need for this to be in asm, use the new intrrupt entry wrapper.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210406025508.821718-1-npiggin@gmail.com
3 years agopowerpc/perf: Fix PMU constraint check for EBB events
Athira Rajeev [Tue, 6 Apr 2021 16:16:01 +0000 (12:16 -0400)]
powerpc/perf: Fix PMU constraint check for EBB events

The power PMU group constraints includes check for EBB events to make
sure all events in a group must agree on EBB. This will prevent
scheduling EBB and non-EBB events together. But in the existing check,
settings for constraint mask and value is interchanged. Patch fixes the
same.

Before the patch, PMU selftest "cpu_event_pinned_vs_ebb_test" fails with
below in dmesg logs. This happens because EBB event gets enabled along
with a non-EBB cpu event.

  [35600.453346] cpu_event_pinne[41326]: illegal instruction (4)
  at 10004a18 nip 10004a18 lr 100049f8 code 1 in
  cpu_event_pinned_vs_ebb_test[10000000+10000]

Test results after the patch:

  $ ./pmu/ebb/cpu_event_pinned_vs_ebb_test
  test: cpu_event_pinned_vs_ebb
  tags: git_version:v5.12-rc5-93-gf28c3125acd3-dirty
  Binding to cpu 8
  EBB Handler is at 0x100050c8
  read error on event 0x7fffe6bd4040!
  PM_RUN_INST_CMPL: result 9872 running/enabled 37930432
  success: cpu_event_pinned_vs_ebb

This bug was hidden by other logic until commit 1908dc911792 (perf:
Tweak perf_event_attr::exclusive semantics).

Fixes: 4df489991182 ("powerpc/perf: Add power8 EBB support")
Reported-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
[mpe: Mention commit 1908dc911792]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1617725761-1464-1-git-send-email-atrajeev@linux.vnet.ibm.com
3 years agoselftests/powerpc: Suggest memtrace instead of /dev/mem for ci memory
Jordan Niethe [Thu, 25 Feb 2021 03:21:07 +0000 (14:21 +1100)]
selftests/powerpc: Suggest memtrace instead of /dev/mem for ci memory

The suggested alternative for getting cache-inhibited memory with 'mem='
and /dev/mem is pretty hacky. Also, PAPR guests do not allow system
memory to be mapped cache-inhibited so despite /dev/mem being available
this will not work which can cause confusion.  Instead recommend using
the memtrace buffers. memtrace is only available on powernv so there
will not be any chance of trying to do this in a guest.

Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210225032108.1458352-2-jniethe5@gmail.com