OSDN Git Service

uclinux-h8/linux.git
3 years agopowerpc/32s: Allow deselecting CONFIG_PPC_FPU on mpc832x
Christophe Leroy [Tue, 18 Aug 2020 17:19:18 +0000 (17:19 +0000)]
powerpc/32s: Allow deselecting CONFIG_PPC_FPU on mpc832x

The e300c2 core which is embedded in mpc832x CPU doesn't have
an FPU.

Make it possible to not select CONFIG_PPC_FPU when building a
kernel dedicated to that target.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/fcdc60d85baf80eaa0a7f3261d9d889282068216.1597770847.git.christophe.leroy@csgroup.eu
3 years agopowerpc/signal: Don't manage floating point regs when no FPU
Christophe Leroy [Tue, 18 Aug 2020 17:19:17 +0000 (17:19 +0000)]
powerpc/signal: Don't manage floating point regs when no FPU

There is no point in copying floating point regs when there
is no FPU and MATH_EMULATION is not selected.

Create a new CONFIG_PPC_FPU_REGS bool that is selected by
CONFIG_MATH_EMULATION and CONFIG_PPC_FPU, and use it to
opt out everything related to fp_state in thread_struct.

The asm const used only by fpu.S are opted out with CONFIG_PPC_FPU
as fpu.S build is conditionnal to CONFIG_PPC_FPU.

The following app spends approx 8.1 seconds system time on an 8xx
without the patch, and 7.0 seconds with the patch (13.5% reduction).

On an 832x, it spends approx 2.6 seconds system time without
the patch and 2.1 seconds with the patch (19% reduction).

void sigusr1(int sig) { }

int main(int argc, char **argv)
{
int i = 100000;

signal(SIGUSR1, sigusr1);
for (;i--;)
raise(SIGUSR1);
exit(0);
}

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7569070083e6cd5b279bb5023da601aba3c06f3c.1597770847.git.christophe.leroy@csgroup.eu
3 years agopowerpc/ptrace: Create ptrace_get_fpr() and ptrace_put_fpr()
Christophe Leroy [Tue, 18 Aug 2020 17:19:16 +0000 (17:19 +0000)]
powerpc/ptrace: Create ptrace_get_fpr() and ptrace_put_fpr()

On the same model as ptrace_get_reg() and ptrace_put_reg(),
create ptrace_get_fpr() and ptrace_put_fpr() to get/set
the floating points registers.

We move the boundary checkings in them.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/24a1baedea7f7ae7b6bf27be98bab6d01b5ca2c1.1597770847.git.christophe.leroy@csgroup.eu
3 years agopowerpc/ptrace: Consolidate reg index calculation
Christophe Leroy [Tue, 18 Aug 2020 17:19:15 +0000 (17:19 +0000)]
powerpc/ptrace: Consolidate reg index calculation

Today we have:

#ifdef CONFIG_PPC32
index = addr >> 2;
if ((addr & 3) || child->thread.regs == NULL)
#else
index = addr >> 3;
if ((addr & 7))
#endif

sizeof(long) has value 4 for PPC32 and value 8 for PPC64.

Dividing by 4 is equivalent to >> 2 and dividing by 8 is equivalent
to >> 3.

And 3 and 7 are respectively (sizeof(long) - 1).

Use sizeof(long) to get rid of the #ifdef CONFIG_PPC32 and consolidate
the calculation and checking.

thread.regs have to be not NULL on both PPC32 and PPC64 so adding
that test on PPC64 is harmless.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3cd1e284e93c60db981659585e18d1f6bb73ed2f.1597770847.git.christophe.leroy@csgroup.eu
3 years agopowerpc/ptrace: Move declaration of ptrace_get_reg() and ptrace_set_reg()
Christophe Leroy [Tue, 18 Aug 2020 17:19:14 +0000 (17:19 +0000)]
powerpc/ptrace: Move declaration of ptrace_get_reg() and ptrace_set_reg()

ptrace_get_reg() and ptrace_set_reg() are only used internally by
ptrace.

Move them in arch/powerpc/kernel/ptrace/ptrace-decl.h

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/376c258267aeae54a4423bc4a2e107a9611f0039.1597770847.git.christophe.leroy@csgroup.eu
3 years agopowerpc/signal: Move inline functions in signal.h
Christophe Leroy [Tue, 18 Aug 2020 17:19:13 +0000 (17:19 +0000)]
powerpc/signal: Move inline functions in signal.h

To really be inlined, the functions need to be defined in the
same C file as the caller, or in an included header.

Move functions defined inline from signal .c in signal.h

Fixes: 3dd4eb83a9c0 ("powerpc: move common register copy functions from signal_32.c to signal.c")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/35b1bd44a1a66f5bcf9b457a1c480ac8d5ef50b2.1597770847.git.christophe.leroy@csgroup.eu
3 years agopowerpc/vdso: Provide __kernel_clock_gettime64() on vdso32
Christophe Leroy [Thu, 26 Nov 2020 13:10:06 +0000 (00:10 +1100)]
powerpc/vdso: Provide __kernel_clock_gettime64() on vdso32

Provides __kernel_clock_gettime64() on vdso32. This is the
64 bits version of __kernel_clock_gettime() which is
y2038 compliant.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201126131006.2431205-9-mpe@ellerman.id.au
3 years agopowerpc/vdso: Switch VDSO to generic C implementation.
Christophe Leroy [Thu, 26 Nov 2020 13:10:05 +0000 (00:10 +1100)]
powerpc/vdso: Switch VDSO to generic C implementation.

With the C VDSO, the performance is slightly lower, but it is worth
it as it will ease maintenance and evolution, and also brings clocks
that are not supported with the ASM VDSO.

On an 8xx at 132 MHz, vdsotest with the ASM VDSO:
  gettimeofday:       vdso:  828 nsec/call
  clock-getres-realtime-coarse:   vdso:  391 nsec/call
  clock-gettime-realtime-coarse:  vdso:  614 nsec/call
  clock-getres-realtime:       vdso:  460 nsec/call
  clock-gettime-realtime:       vdso:  876 nsec/call
  clock-getres-monotonic-coarse:  vdso:  399 nsec/call
  clock-gettime-monotonic-coarse: vdso:  691 nsec/call
  clock-getres-monotonic:       vdso:  460 nsec/call
  clock-gettime-monotonic:       vdso: 1026 nsec/call

On an 8xx at 132 MHz, vdsotest with the C VDSO:
  gettimeofday:       vdso:  955 nsec/call
  clock-getres-realtime-coarse:   vdso:  545 nsec/call
  clock-gettime-realtime-coarse:  vdso:  592 nsec/call
  clock-getres-realtime:          vdso:  545 nsec/call
  clock-gettime-realtime:       vdso:  941 nsec/call
  clock-getres-monotonic-coarse:  vdso:  545 nsec/call
  clock-gettime-monotonic-coarse: vdso:  591 nsec/call
  clock-getres-monotonic:         vdso:  545 nsec/call
  clock-gettime-monotonic:        vdso:  940 nsec/call

It is even better for gettime with monotonic clocks.

Unsupported clocks with ASM VDSO:
  clock-gettime-boottime:         vdso: 3851 nsec/call
  clock-gettime-tai:         vdso: 3852 nsec/call
  clock-gettime-monotonic-raw:    vdso: 3396 nsec/call

Same clocks with C VDSO:
  clock-gettime-tai:              vdso:  941 nsec/call
  clock-gettime-monotonic-raw:    vdso: 1001 nsec/call
  clock-gettime-monotonic-coarse: vdso:  591 nsec/call

On an 8321E at 333 MHz, vdsotest with the ASM VDSO:
  gettimeofday:        vdso: 220 nsec/call
  clock-getres-realtime-coarse:   vdso: 102 nsec/call
  clock-gettime-realtime-coarse:  vdso: 178 nsec/call
  clock-getres-realtime:          vdso: 129 nsec/call
  clock-gettime-realtime:       vdso: 235 nsec/call
  clock-getres-monotonic-coarse:  vdso: 105 nsec/call
  clock-gettime-monotonic-coarse: vdso: 208 nsec/call
  clock-getres-monotonic:         vdso: 129 nsec/call
  clock-gettime-monotonic:        vdso: 274 nsec/call

On an 8321E at 333 MHz, vdsotest with the C VDSO:
  gettimeofday:       vdso: 272 nsec/call
  clock-getres-realtime-coarse:   vdso: 160 nsec/call
  clock-gettime-realtime-coarse:  vdso: 184 nsec/call
  clock-getres-realtime:          vdso: 166 nsec/call
  clock-gettime-realtime:         vdso: 281 nsec/call
  clock-getres-monotonic-coarse:  vdso: 160 nsec/call
  clock-gettime-monotonic-coarse: vdso: 184 nsec/call
  clock-getres-monotonic:         vdso: 169 nsec/call
  clock-gettime-monotonic:        vdso: 275 nsec/call

On a Power9 Nimbus DD2.2 at 3.8GHz, with the ASM VDSO:
  clock-gettime-monotonic:       vdso:  35 nsec/call
  clock-getres-monotonic:       vdso:  16 nsec/call
  clock-gettime-monotonic-coarse: vdso:  18 nsec/call
  clock-getres-monotonic-coarse:  vdso: 522 nsec/call
  clock-gettime-monotonic-raw:    vdso: 598 nsec/call
  clock-getres-monotonic-raw:     vdso: 520 nsec/call
  clock-gettime-realtime:       vdso:  34 nsec/call
  clock-getres-realtime:       vdso:  16 nsec/call
  clock-gettime-realtime-coarse:  vdso:  18 nsec/call
  clock-getres-realtime-coarse:   vdso: 517 nsec/call
  getcpu:       vdso:   8 nsec/call
  gettimeofday:       vdso:  25 nsec/call

And with the C VDSO:
  clock-gettime-monotonic:       vdso:  37 nsec/call
  clock-getres-monotonic:       vdso:  20 nsec/call
  clock-gettime-monotonic-coarse: vdso:  21 nsec/call
  clock-getres-monotonic-coarse:  vdso:  19 nsec/call
  clock-gettime-monotonic-raw:    vdso:  38 nsec/call
  clock-getres-monotonic-raw:     vdso:  20 nsec/call
  clock-gettime-realtime:       vdso:  37 nsec/call
  clock-getres-realtime:       vdso:  20 nsec/call
  clock-gettime-realtime-coarse:  vdso:  20 nsec/call
  clock-getres-realtime-coarse:   vdso:  19 nsec/call
  getcpu:       vdso:   8 nsec/call
  gettimeofday:       vdso:  28 nsec/call

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201126131006.2431205-8-mpe@ellerman.id.au
3 years agopowerpc/vdso: Save and restore TOC pointer on PPC64
Christophe Leroy [Thu, 26 Nov 2020 13:10:04 +0000 (00:10 +1100)]
powerpc/vdso: Save and restore TOC pointer on PPC64

On PPC64, the TOC pointer needs to be saved and restored.

Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201126131006.2431205-7-mpe@ellerman.id.au
3 years agopowerpc/vdso: Prepare for switching VDSO to generic C implementation.
Christophe Leroy [Thu, 26 Nov 2020 13:10:03 +0000 (00:10 +1100)]
powerpc/vdso: Prepare for switching VDSO to generic C implementation.

Prepare for switching VDSO to generic C implementation in following
patch. Here, we:
- Prepare the helpers to call the C VDSO functions
- Prepare the required callbacks for the C VDSO functions
- Prepare the clocksource.h files to define VDSO_ARCH_CLOCKMODES
- Add the C trampolines to the generic C VDSO functions

powerpc is a bit special for VDSO as well as system calls in the
way that it requires setting CR SO bit which cannot be done in C.
Therefore, entry/exit needs to be performed in ASM.

Implementing __arch_get_vdso_data() would clobber the link register,
requiring the caller to save it. As the ASM calling function already
has to set a stack frame and saves the link register before calling
the C vdso function, retriving the vdso data pointer there is lighter.

Implement __arch_vdso_capable() and always return true.

Provide vdso_shift_ns(), as the generic x >> s gives the following
bad result:

  18: 35 25 ff e0  addic.  r9,r5,-32
  1c: 41 80 00 10  blt     2c <shift+0x14>
  20: 7c 64 4c 30  srw     r4,r3,r9
  24: 38 60 00 00  li      r3,0
  ...
  2c: 54 69 08 3c  rlwinm  r9,r3,1,0,30
  30: 21 45 00 1f  subfic  r10,r5,31
  34: 7c 84 2c 30  srw     r4,r4,r5
  38: 7d 29 50 30  slw     r9,r9,r10
  3c: 7c 63 2c 30  srw     r3,r3,r5
  40: 7d 24 23 78  or      r4,r9,r4

In our case the shift is always <= 32. In addition,  the upper 32 bits
of the result are likely nul. Lets GCC know it, it also optimises the
following calculations.

With the patch, we get:
   0: 21 25 00 20  subfic  r9,r5,32
   4: 7c 69 48 30  slw     r9,r3,r9
   8: 7c 84 2c 30  srw     r4,r4,r5
   c: 7d 24 23 78  or      r4,r9,r4
  10: 7c 63 2c 30  srw     r3,r3,r5

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201126131006.2431205-6-mpe@ellerman.id.au
3 years agopowerpc/barrier: Use CONFIG_PPC64 for barrier selection
Michael Ellerman [Thu, 26 Nov 2020 13:10:02 +0000 (00:10 +1100)]
powerpc/barrier: Use CONFIG_PPC64 for barrier selection

Currently we use ifdef __powerpc64__ in barrier.h to decide if we
should use lwsync or eieio for SMPWMB which is then used by
__smp_wmb().

That means when we are building the compat VDSO we will use eieio,
because it's 32-bit code, even though we're building a 64-bit kernel
for a 64-bit CPU.

Although eieio should work, it would be cleaner if we always used the
same barrier, even for the 32-bit VDSO.

So change the ifdef to CONFIG_PPC64, so that the selection is made
based on the bitness of the kernel we're building for, not the current
compilation unit.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201126131006.2431205-5-mpe@ellerman.id.au
3 years agopowerpc/time: Fix mftb()/get_tb() for use with the compat VDSO
Michael Ellerman [Thu, 26 Nov 2020 13:10:01 +0000 (00:10 +1100)]
powerpc/time: Fix mftb()/get_tb() for use with the compat VDSO

When we're building the compat VDSO we are building 32-bit code but in
the context of a 64-bit kernel configuration.

To make this work we need to be careful in some places when using
ifdefs to differentiate between CONFIG_PPC64 and __powerpc64__.

CONFIG_PPC64 indicates the kernel we're building is 64-bit, but it
doesn't tell us that we're currently building 64-bit code - we could
be building 32-bit code for the compat VDSO.

On the other hand __powerpc64__ tells us that we are currently
building 64-bit code (and therefore we must also be building a 64-bit
kernel).

In the case of get_tb() we want to use the 32-bit code sequence
regardless of whether the kernel we're building for is 64-bit or
32-bit, what matters is the word size of the current object. So we
need to check __powerpc64__ to decide if we use mftb() or the
mftbu()/mftb() sequence.

For mftb() the logic for CPU_FTR_CELL_TB_BUG only makes sense if we're
building 64-bit code, so guard that with a __powerpc64__ check.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201126131006.2431205-4-mpe@ellerman.id.au
3 years agopowerpc/time: Move timebase functions into new asm/vdso/timebase.h
Christophe Leroy [Thu, 26 Nov 2020 13:10:00 +0000 (00:10 +1100)]
powerpc/time: Move timebase functions into new asm/vdso/timebase.h

In order to easily use get_tb() from C VDSO, move timebase
functions into a new header named asm/vdso/timebase.h

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201126131006.2431205-3-mpe@ellerman.id.au
3 years agopowerpc/processor: Move cpu_relax() into asm/vdso/processor.h
Christophe Leroy [Thu, 26 Nov 2020 13:09:59 +0000 (00:09 +1100)]
powerpc/processor: Move cpu_relax() into asm/vdso/processor.h

cpu_relax() need to be in asm/vdso/processor.h to be used by
the C VDSO generic library.

Move it there.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201126131006.2431205-2-mpe@ellerman.id.au
3 years agopowerpc/feature: Use CONFIG_PPC64 instead of __powerpc64__ to define possible features
Christophe Leroy [Thu, 26 Nov 2020 13:09:58 +0000 (00:09 +1100)]
powerpc/feature: Use CONFIG_PPC64 instead of __powerpc64__ to define possible features

In order to build VDSO32 for PPC64, we need to have CPU_FTRS_POSSIBLE
and CPU_FTRS_ALWAYS independant of whether we are building the
32 bits VDSO or the 64 bits VDSO.

Use #ifdef CONFIG_PPC64 instead of #ifdef __powerpc64__

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201126131006.2431205-1-mpe@ellerman.id.au
3 years agopowerpc: Update NUMA Kconfig description & help text
Michael Ellerman [Tue, 24 Nov 2020 12:05:47 +0000 (23:05 +1100)]
powerpc: Update NUMA Kconfig description & help text

Update the NUMA Kconfig description to match other architectures, and
add some help text. Shamelessly borrowed from x86/arm64.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20201124120547.1940635-3-mpe@ellerman.id.au
3 years agopowerpc: Make NUMA default y for powernv
Michael Ellerman [Tue, 24 Nov 2020 12:05:46 +0000 (23:05 +1100)]
powerpc: Make NUMA default y for powernv

Our NUMA option is default y for pseries, but not powernv. The bulk of
powernv systems are NUMA, so make NUMA default y for powernv also.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/20201124120547.1940635-2-mpe@ellerman.id.au
3 years agopowerpc: Make NUMA depend on SMP
Michael Ellerman [Tue, 24 Nov 2020 12:05:45 +0000 (23:05 +1100)]
powerpc: Make NUMA depend on SMP

Our Kconfig allows NUMA to be enabled without SMP, but none of
our defconfigs use that combination. This means it can easily be
broken inadvertently by code changes, which has happened recently.

Although it's theoretically possible to have a machine with a single
CPU and multiple memory nodes, I can't think of any real systems where
that's the case. Even so if such a system exists, it can just run an
SMP kernel anyway.

So to avoid the need to add extra #ifdefs and/or build breaks, make
NUMA depend on SMP.

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20201124120547.1940635-1-mpe@ellerman.id.au
3 years agopowerpc: inline iomap accessors
Christophe Leroy [Sat, 21 Nov 2020 17:59:19 +0000 (17:59 +0000)]
powerpc: inline iomap accessors

ioreadXX()/ioreadXXbe() accessors are equivalent to ppc
in_leXX()/in_be16() accessors but they are not inlined.

Since commit 0eb573682872 ("powerpc/kerenl: Enable EEH for IO
accessors"), the 'le' versions are equivalent to the ones
defined in asm-generic/io.h, allthough the ones there are inlined.

Include asm-generic/io.h to get them. Keep ppc versions of the
'be' ones as they are optimised, but make them inline in ppc io.h.

This reduces the size of ppc64e_defconfig build by 3 kbytes:

   text    data     bss     dec     hex filename
10160733 4343422  562972 15067127  e5e7f7 vmlinux.before
10159239 4341590  562972 15063801  e5daf9 vmlinux.after

A typical function using ioread and iowrite before the change:

c00000000066a3c4 <.ata_bmdma_stop>:
c00000000066a3c4: 7c 08 02 a6  mflr    r0
c00000000066a3c8: fb c1 ff f0  std     r30,-16(r1)
c00000000066a3cc: f8 01 00 10  std     r0,16(r1)
c00000000066a3d0: fb e1 ff f8  std     r31,-8(r1)
c00000000066a3d4: f8 21 ff 81  stdu    r1,-128(r1)
c00000000066a3d8: eb e3 00 00  ld      r31,0(r3)
c00000000066a3dc: eb df 00 98  ld      r30,152(r31)
c00000000066a3e0: 7f c3 f3 78  mr      r3,r30
c00000000066a3e4: 4b 9b 6f 7d  bl      c000000000021360 <.ioread8>
c00000000066a3e8: 60 00 00 00  nop
c00000000066a3ec: 7f c4 f3 78  mr      r4,r30
c00000000066a3f0: 54 63 06 3c  rlwinm  r3,r3,0,24,30
c00000000066a3f4: 4b 9b 70 4d  bl      c000000000021440 <.iowrite8>
c00000000066a3f8: 60 00 00 00  nop
c00000000066a3fc: 7f e3 fb 78  mr      r3,r31
c00000000066a400: 38 21 00 80  addi    r1,r1,128
c00000000066a404: e8 01 00 10  ld      r0,16(r1)
c00000000066a408: eb c1 ff f0  ld      r30,-16(r1)
c00000000066a40c: 7c 08 03 a6  mtlr    r0
c00000000066a410: eb e1 ff f8  ld      r31,-8(r1)
c00000000066a414: 4b ff ff 8c  b       c00000000066a3a0 <.ata_sff_dma_pause>

The same function with this patch:

c000000000669cb4 <.ata_bmdma_stop>:
c000000000669cb4: e8 63 00 00  ld      r3,0(r3)
c000000000669cb8: e9 43 00 98  ld      r10,152(r3)
c000000000669cbc: 7c 00 04 ac  hwsync
c000000000669cc0: 89 2a 00 00  lbz     r9,0(r10)
c000000000669cc4: 0c 09 00 00  twi     0,r9,0
c000000000669cc8: 4c 00 01 2c  isync
c000000000669ccc: 55 29 06 3c  rlwinm  r9,r9,0,24,30
c000000000669cd0: 7c 00 04 ac  hwsync
c000000000669cd4: 99 2a 00 00  stb     r9,0(r10)
c000000000669cd8: a1 4d 06 f0  lhz     r10,1776(r13)
c000000000669cdc: 2c 2a 00 00  cmpdi   r10,0
c000000000669ce0: 41 c2 00 08  beq-    c000000000669ce8 <.ata_bmdma_stop+0x34>
c000000000669ce4: b1 4d 06 f2  sth     r10,1778(r13)
c000000000669ce8: 4b ff ff a8  b       c000000000669c90 <.ata_sff_dma_pause>

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/18b357d68c4cde149f75c7a1031c850925cd8128.1605981539.git.christophe.leroy@csgroup.eu
3 years agopowerpc/perf: Fix crash with is_sier_available when pmu is not set
Athira Rajeev [Tue, 24 Nov 2020 02:40:40 +0000 (21:40 -0500)]
powerpc/perf: Fix crash with is_sier_available when pmu is not set

On systems without any specific PMU driver support registered, running
'perf record' with â€”intr-regs  will crash ( perf record -I <workload> ).

The relevant portion from crash logs and Call Trace:

Unable to handle kernel paging request for data at address 0x00000068
Faulting instruction address: 0xc00000000013eb18
Oops: Kernel access of bad area, sig: 11 [#1]
CPU: 2 PID: 13435 Comm: kill Kdump: loaded Not tainted 4.18.0-193.el8.ppc64le #1
NIP:  c00000000013eb18 LR: c000000000139f2c CTR: c000000000393d80
REGS: c0000004a07ab4f0 TRAP: 0300   Not tainted  (4.18.0-193.el8.ppc64le)
NIP [c00000000013eb18] is_sier_available+0x18/0x30
LR [c000000000139f2c] perf_reg_value+0x6c/0xb0
Call Trace:
[c0000004a07ab770] [c0000004a07ab7c8] 0xc0000004a07ab7c8 (unreliable)
[c0000004a07ab7a0] [c0000000003aa77c] perf_output_sample+0x60c/0xac0
[c0000004a07ab840] [c0000000003ab3f0] perf_event_output_forward+0x70/0xb0
[c0000004a07ab8c0] [c00000000039e208] __perf_event_overflow+0x88/0x1a0
[c0000004a07ab910] [c00000000039e42c] perf_swevent_hrtimer+0x10c/0x1d0
[c0000004a07abc50] [c000000000228b9c] __hrtimer_run_queues+0x17c/0x480
[c0000004a07abcf0] [c00000000022aaf4] hrtimer_interrupt+0x144/0x520
[c0000004a07abdd0] [c00000000002a864] timer_interrupt+0x104/0x2f0
[c0000004a07abe30] [c0000000000091c4] decrementer_common+0x114/0x120

When perf record session is started with "-I" option, capturing registers
on each sample calls is_sier_available() to check for the
SIER (Sample Instruction Event Register) availability in the platform.
This function in core-book3s accesses 'ppmu->flags'. If a platform specific
PMU driver is not registered, ppmu is set to NULL and accessing its
members results in a crash. Fix the crash by returning false in
is_sier_available() if ppmu is not set.

Fixes: 333804dc3b7a ("powerpc/perf: Update perf_regs structure to include SIER")
Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1606185640-1720-1-git-send-email-atrajeev@linux.vnet.ibm.com
3 years agopowerpc/boot: Make use of REL16 relocs in powerpc/boot/util.S
Alan Modra [Fri, 27 Nov 2020 00:48:42 +0000 (11:48 +1100)]
powerpc/boot: Make use of REL16 relocs in powerpc/boot/util.S

Use bcl 20,31,0f rather than plain bl to avoid unbalancing the link
stack.

Update the code to use REL16 relocs, available for ppc64 in 2009 (and
ppc32 in 2005).

Signed-off-by: Alan Modra <amodra@gmail.com>
[mpe: Incorporate more detail into the change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
3 years agopowerpc: Work around inline asm issues in alternate feature sections
Bill Wendling [Fri, 20 Nov 2020 22:40:34 +0000 (14:40 -0800)]
powerpc: Work around inline asm issues in alternate feature sections

The clang toolchain treats inline assembly a bit differently than
straight assembly code. In particular, inline assembly doesn't have
the complete context available to resolve expressions. This is
intentional to avoid divergence in the resulting assembly code.

We can work around this issue by borrowing a workaround done for ARM,
i.e. not directly testing the labels themselves, but by moving the
current output pointer by a value that should always be zero. If this
value is not null, then we will trigger a backward move, which is
explicitly forbidden.

Signed-off-by: Bill Wendling <morbo@google.com>
[mpe: Put it in a macro and only do the workaround for clang]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201120224034.191382-4-morbo@google.com
3 years agopowerpc/boot: Use clang when CC is clang
Bill Wendling [Fri, 20 Nov 2020 22:40:33 +0000 (14:40 -0800)]
powerpc/boot: Use clang when CC is clang

The gcc compiler may not be available if CC is clang.

Signed-off-by: Bill Wendling <morbo@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201120224034.191382-3-morbo@google.com
3 years agopowerpc/boot/wrapper: Add "-z notext" flag to disable diagnostic
Bill Wendling [Fri, 20 Nov 2020 22:40:32 +0000 (14:40 -0800)]
powerpc/boot/wrapper: Add "-z notext" flag to disable diagnostic

The "-z notext" flag disables reporting an error if DT_TEXTREL is set.

  ld.lld: error: can't create dynamic relocation R_PPC64_ADDR64 against
    symbol: _start in readonly segment; recompile object files with
    -fPIC or pass '-Wl,-z,notext' to allow text relocations in the
    output
  >>> defined in
  >>> referenced by crt0.o:(.text+0x8) in archive arch/powerpc/boot/wrapper.a

The BFD linker disables this by default (though it's configurable in
current versions). LLD enables this by default. So we add the flag to
keep LLD from emitting the error.

Signed-off-by: Bill Wendling <morbo@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201120224034.191382-2-morbo@google.com
3 years agopowerpc/boot/wrapper: Add "-z rodynamic" when using LLD
Bill Wendling [Wed, 18 Nov 2020 22:39:10 +0000 (14:39 -0800)]
powerpc/boot/wrapper: Add "-z rodynamic" when using LLD

Normally all read-only sections precede SHF_WRITE sections. .dynamic
and .got have the SHF_WRITE flag; .dynamic probably because of
DT_DEBUG. LLD emits an error when this happens, so use "-z rodynamic"
to mark .dynamic as read-only.

Signed-off-by: Bill Wendling <morbo@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201118223910.2711337-1-morbo@google.com
3 years agopowerpc/boot: Move the .got section to after the .dynamic section
Bill Wendling [Sat, 17 Oct 2020 00:01:51 +0000 (17:01 -0700)]
powerpc/boot: Move the .got section to after the .dynamic section

Both .dynamic and .got are RELRO sections and should be placed
together, and LLD emits an error:

  ld.lld: error: section: .got is not contiguous with other relro sections

Place them together to avoid this.

Signed-off-by: Bill Wendling <morbo@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201017000151.150788-1-morbo@google.com
3 years agopowerpc/ptrace: Hard wire PT_SOFTE value to 1 in gpr_get() too
Oleg Nesterov [Thu, 19 Nov 2020 16:02:47 +0000 (17:02 +0100)]
powerpc/ptrace: Hard wire PT_SOFTE value to 1 in gpr_get() too

The commit a8a4b03ab95f ("powerpc: Hard wire PT_SOFTE value to 1 in
ptrace & signals") changed ptrace_get_reg(PT_SOFTE) to report 0x1,
but PTRACE_GETREGS still copies pt_regs->softe as is.

This is not consistent and this breaks the user-regs-peekpoke test
from https://sourceware.org/systemtap/wiki/utrace/tests/

Reported-by: Jan Kratochvil <jan.kratochvil@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201119160247.GB5188@redhat.com
3 years agopowerpc/ptrace: Simplify gpr_get()/tm_cgpr_get()
Oleg Nesterov [Thu, 19 Nov 2020 16:02:21 +0000 (17:02 +0100)]
powerpc/ptrace: Simplify gpr_get()/tm_cgpr_get()

gpr_get() does membuf_write() twice to override pt_regs->msr in
between. We can call membuf_write() once and change ->msr in the
kernel buffer, this simplifies the code and the next fix.

The patch adds a new simple helper, membuf_at(offs), it returns the
new membuf which can be safely used after membuf_write().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
[mpe: Fixup some minor whitespace issues noticed by Christophe]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201119160221.GA5188@redhat.com
3 years agoMerge branch 'fixes' into next
Michael Ellerman [Wed, 25 Nov 2020 12:17:31 +0000 (23:17 +1100)]
Merge branch 'fixes' into next

Merge our fixes branch, in particular to bring in the changes for the
entry/uaccess flush.

3 years agopowerpc/64s: Fix allnoconfig build since uaccess flush
Stephen Rothwell [Mon, 23 Nov 2020 07:40:16 +0000 (18:40 +1100)]
powerpc/64s: Fix allnoconfig build since uaccess flush

Using DECLARE_STATIC_KEY_FALSE needs linux/jump_table.h.

Otherwise the build fails with eg:

  arch/powerpc/include/asm/book3s/64/kup-radix.h:66:1: warning: data definition has no type or storage class
     66 | DECLARE_STATIC_KEY_FALSE(uaccess_flush_key);

Fixes: 9a32a7e78bd0 ("powerpc/64s: flush L1D after user accesses")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
[mpe: Massage change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201123184016.693fe464@canb.auug.org.au
3 years agoMerge tag 'powerpc-cve-2020-4788' into fixes
Michael Ellerman [Mon, 23 Nov 2020 10:16:27 +0000 (21:16 +1100)]
Merge tag 'powerpc-cve-2020-4788' into fixes

From Daniel's cover letter:

IBM Power9 processors can speculatively operate on data in the L1 cache
before it has been completely validated, via a way-prediction mechanism. It
is not possible for an attacker to determine the contents of impermissible
memory using this method, since these systems implement a combination of
hardware and software security measures to prevent scenarios where
protected data could be leaked.

However these measures don't address the scenario where an attacker induces
the operating system to speculatively execute instructions using data that
the attacker controls. This can be used for example to speculatively bypass
"kernel user access prevention" techniques, as discovered by Anthony
Steinhauser of Google's Safeside Project. This is not an attack by itself,
but there is a possibility it could be used in conjunction with
side-channels or other weaknesses in the privileged code to construct an
attack.

This issue can be mitigated by flushing the L1 cache between privilege
boundaries of concern.

This patch series flushes the L1 cache on kernel entry (patch 2) and after the
kernel performs any user accesses (patch 3). It also adds a self-test and
performs some related cleanups.

3 years agopowerpc/64s: rename pnv|pseries_setup_rfi_flush to _setup_security_mitigations
Daniel Axtens [Tue, 17 Nov 2020 05:59:16 +0000 (16:59 +1100)]
powerpc/64s: rename pnv|pseries_setup_rfi_flush to _setup_security_mitigations

pseries|pnv_setup_rfi_flush already does the count cache flush setup, and
we just added entry and uaccess flushes. So the name is not very accurate
any more. In both platforms we then also immediately setup the STF flush.

Rename them to _setup_security_mitigations and fold the STF flush in.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
3 years agoselftests/powerpc: refactor entry and rfi_flush tests
Daniel Axtens [Tue, 17 Nov 2020 05:59:15 +0000 (16:59 +1100)]
selftests/powerpc: refactor entry and rfi_flush tests

For simplicity in backporting, the original entry_flush test contained
a lot of duplicated code from the rfi_flush test. De-duplicate that code.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
3 years agoselftests/powerpc: entry flush test
Daniel Axtens [Tue, 17 Nov 2020 05:59:14 +0000 (16:59 +1100)]
selftests/powerpc: entry flush test

Add a test modelled on the RFI flush test which counts the number
of L1D misses doing a simple syscall with the entry flush on and off.

For simplicity of backporting, this test duplicates a lot of code from
rfi_flush. We clean that up in the next patch.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
3 years agopowerpc: Only include kup-radix.h for 64-bit Book3S
Michael Ellerman [Thu, 19 Nov 2020 12:43:53 +0000 (23:43 +1100)]
powerpc: Only include kup-radix.h for 64-bit Book3S

In kup.h we currently include kup-radix.h for all 64-bit builds, which
includes Book3S and Book3E. The latter doesn't make sense, Book3E
never uses the Radix MMU.

This has worked up until now, but almost by accident, and the recent
uaccess flush changes introduced a build breakage on Book3E because of
the bad structure of the code.

So disentangle things so that we only use kup-radix.h for Book3S. This
requires some more stubs in kup.h and fixing an include in
syscall_64.c.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
3 years agopowerpc/64s: flush L1D after user accesses
Nicholas Piggin [Tue, 17 Nov 2020 05:59:13 +0000 (16:59 +1100)]
powerpc/64s: flush L1D after user accesses

IBM Power9 processors can speculatively operate on data in the L1 cache
before it has been completely validated, via a way-prediction mechanism. It
is not possible for an attacker to determine the contents of impermissible
memory using this method, since these systems implement a combination of
hardware and software security measures to prevent scenarios where
protected data could be leaked.

However these measures don't address the scenario where an attacker induces
the operating system to speculatively execute instructions using data that
the attacker controls. This can be used for example to speculatively bypass
"kernel user access prevention" techniques, as discovered by Anthony
Steinhauser of Google's Safeside Project. This is not an attack by itself,
but there is a possibility it could be used in conjunction with
side-channels or other weaknesses in the privileged code to construct an
attack.

This issue can be mitigated by flushing the L1 cache between privilege
boundaries of concern. This patch flushes the L1 cache after user accesses.

This is part of the fix for CVE-2020-4788.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
3 years agopowerpc/64s: flush L1D on kernel entry
Nicholas Piggin [Tue, 17 Nov 2020 05:59:12 +0000 (16:59 +1100)]
powerpc/64s: flush L1D on kernel entry

IBM Power9 processors can speculatively operate on data in the L1 cache
before it has been completely validated, via a way-prediction mechanism. It
is not possible for an attacker to determine the contents of impermissible
memory using this method, since these systems implement a combination of
hardware and software security measures to prevent scenarios where
protected data could be leaked.

However these measures don't address the scenario where an attacker induces
the operating system to speculatively execute instructions using data that
the attacker controls. This can be used for example to speculatively bypass
"kernel user access prevention" techniques, as discovered by Anthony
Steinhauser of Google's Safeside Project. This is not an attack by itself,
but there is a possibility it could be used in conjunction with
side-channels or other weaknesses in the privileged code to construct an
attack.

This issue can be mitigated by flushing the L1 cache between privilege
boundaries of concern. This patch flushes the L1 cache on kernel entry.

This is part of the fix for CVE-2020-4788.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
3 years agoselftests/powerpc: rfi_flush: disable entry flush if present
Russell Currey [Tue, 17 Nov 2020 05:59:11 +0000 (16:59 +1100)]
selftests/powerpc: rfi_flush: disable entry flush if present

We are about to add an entry flush. The rfi (exit) flush test measures
the number of L1D flushes over a syscall with the RFI flush enabled and
disabled. But if the entry flush is also enabled, the effect of enabling
and disabling the RFI flush is masked.

If there is a debugfs entry for the entry flush, disable it during the RFI
flush and restore it later.

Reported-by: Spoorthy S <spoorts2@in.ibm.com>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
3 years agopowernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations
David Hildenbrand [Wed, 11 Nov 2020 14:53:22 +0000 (15:53 +0100)]
powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations

Let's use alloc_contig_pages() for allocating memory and remove the
linear mapping manually via arch_remove_linear_mapping(). Mark all pages
PG_offline, such that they will definitely not get touched - e.g.,
when hibernating. When freeing memory, try to revert what we did.

The original idea was discussed in:
 https://lkml.kernel.org/r/48340e96-7e6b-736f-9e23-d3111b915b6e@redhat.com

This is similar to CONFIG_DEBUG_PAGEALLOC handling on other
architectures, whereby only single pages are unmapped from the linear
mapping. Let's mimic what memory hot(un)plug would do with the linear
mapping.

We now need MEMORY_HOTPLUG and CONTIG_ALLOC as dependencies. Add a TODO
that we want to use __GFP_ZERO for clearing once alloc_contig_pages()
understands that.

Tested with in QEMU/TCG with 10 GiB of main memory:
  [root@localhost ~]# echo 0x40000000 > /sys/kernel/debug/powerpc/memtrace/enable
  [  105.903043][ T1080] memtrace: Allocated trace memory on node 0 at 0x0000000080000000
  [root@localhost ~]# echo 0x40000000 > /sys/kernel/debug/powerpc/memtrace/enable
  [  145.042493][ T1080] radix-mmu: Mapped 0x0000000080000000-0x00000000c0000000 with 64.0 KiB pages
  [  145.049019][ T1080] memtrace: Freed trace memory back on node 0
  [  145.333960][ T1080] memtrace: Allocated trace memory on node 0 at 0x0000000080000000
  [root@localhost ~]# echo 0x80000000 > /sys/kernel/debug/powerpc/memtrace/enable
  [  213.606916][ T1080] radix-mmu: Mapped 0x0000000080000000-0x00000000c0000000 with 64.0 KiB pages
  [  213.613855][ T1080] memtrace: Freed trace memory back on node 0
  [  214.185094][ T1080] memtrace: Allocated trace memory on node 0 at 0x0000000080000000
  [root@localhost ~]# echo 0x100000000 > /sys/kernel/debug/powerpc/memtrace/enable
  [  234.874872][ T1080] radix-mmu: Mapped 0x0000000080000000-0x0000000100000000 with 64.0 KiB pages
  [  234.886974][ T1080] memtrace: Freed trace memory back on node 0
  [  234.890153][ T1080] memtrace: Failed to allocate trace memory on node 0
  [root@localhost ~]# echo 0x40000000 > /sys/kernel/debug/powerpc/memtrace/enable
  [  259.490196][ T1080] memtrace: Allocated trace memory on node 0 at 0x0000000080000000

I also made sure allocated memory is properly zeroed.

Note 1: We currently won't be allocating from ZONE_MOVABLE - because our
pages are not movable. However, as we don't run with any memory
hot(un)plug mechanism around, we could make an exception to
increase the chance of allocations succeeding.

Note 2: PG_reserved isn't sufficient. E.g., kernel_page_present() used
along PG_reserved in hibernation code will always return "true"
on powerpc, resulting in the pages getting touched. It's too
generic - e.g., indicates boot allocations.

Note 3: For now, we keep using memory_block_size_bytes() as minimum
granularity.

Suggested-by: Michal Hocko <mhocko@kernel.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201111145322.15793-9-david@redhat.com
3 years agopowerpc/mm: remove linear mapping if __add_pages() fails in arch_add_memory()
David Hildenbrand [Wed, 11 Nov 2020 14:53:21 +0000 (15:53 +0100)]
powerpc/mm: remove linear mapping if __add_pages() fails in arch_add_memory()

Let's revert what we did in case something goes wrong and we return an
error - as already done on arm64 and s390x.

Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201111145322.15793-8-david@redhat.com
3 years agopowerpc/book3s64/hash: Drop WARN_ON in hash__remove_section_mapping()
David Hildenbrand [Wed, 11 Nov 2020 14:53:20 +0000 (15:53 +0100)]
powerpc/book3s64/hash: Drop WARN_ON in hash__remove_section_mapping()

The single caller (arch_remove_linear_mapping()) prints a proper
warning when this function fails. No need to eventually crash the
kernel - let's drop this WARN_ON.

Suggested-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201111145322.15793-7-david@redhat.com
3 years agopowerpc/mm: print warning in arch_remove_linear_mapping()
David Hildenbrand [Wed, 11 Nov 2020 14:53:19 +0000 (15:53 +0100)]
powerpc/mm: print warning in arch_remove_linear_mapping()

Let's print a warning similar to in arch_add_linear_mapping() instead of
WARN_ON_ONCE() and eventually crashing the kernel.

Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201111145322.15793-6-david@redhat.com
3 years agopowerpc/mm: protect linear mapping modifications by a mutex
David Hildenbrand [Wed, 11 Nov 2020 14:53:18 +0000 (15:53 +0100)]
powerpc/mm: protect linear mapping modifications by a mutex

This code currently relies on mem_hotplug_begin()/mem_hotplug_done() -
create_section_mapping()/remove_section_mapping() implementations
cannot tollerate getting called concurrently.

Let's prepare for callers (memtrace) not holding any such locks (and
don't force them to mess with memory hotplug locks).

Other parts in these functions don't seem to rely on external locking.

Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201111145322.15793-5-david@redhat.com
3 years agopowerpc/mm: factor out creating/removing linear mapping
David Hildenbrand [Wed, 11 Nov 2020 14:53:17 +0000 (15:53 +0100)]
powerpc/mm: factor out creating/removing linear mapping

We want to stop abusing memory hotplug infrastructure in memtrace code
to perform allocations and remove the linear mapping. Instead we will use
alloc_contig_pages() and remove the linear mapping manually.

Let's factor out creating/removing the linear mapping into
arch_create_linear_mapping() / arch_remove_linear_mapping() - so in the
future, we might be able to have whole arch_add_memory() /
arch_remove_memory() be implemented in common code.

Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201111145322.15793-4-david@redhat.com
3 years agopowerpc/powernv/memtrace: Fix crashing the kernel when enabling concurrently
David Hildenbrand [Wed, 11 Nov 2020 14:53:16 +0000 (15:53 +0100)]
powerpc/powernv/memtrace: Fix crashing the kernel when enabling concurrently

It's very easy to crash the kernel right now by simply trying to
enable memtrace concurrently, hammering on the "enable" interface

loop.sh:
  #!/bin/bash

  dmesg --console-off

  while true; do
          echo 0x40000000 > /sys/kernel/debug/powerpc/memtrace/enable
  done

[root@localhost ~]# loop.sh &
[root@localhost ~]# loop.sh &

Resulting quickly in a kernel crash. Let's properly protect using a
mutex.

Fixes: 9d5171a8f248 ("powerpc/powernv: Enable removal of memory for in memory tracing")
Cc: stable@vger.kernel.org# v4.14+
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201111145322.15793-3-david@redhat.com
3 years agopowerpc/powernv/memtrace: Don't leak kernel memory to user space
David Hildenbrand [Wed, 11 Nov 2020 14:53:15 +0000 (15:53 +0100)]
powerpc/powernv/memtrace: Don't leak kernel memory to user space

We currently leak kernel memory to user space, because memory
offlining doesn't do any implicit clearing of memory and we are
missing explicit clearing of memory.

Let's keep it simple and clear pages before removing the linear
mapping.

Reproduced in QEMU/TCG with 10 GiB of main memory:
  [root@localhost ~]# dd obs=9G if=/dev/urandom of=/dev/null
  [... wait until "free -m" used counter no longer changes and cancel]
  19665802+0 records in
  1+0 records out
  9663676416 bytes (9.7 GB, 9.0 GiB) copied, 135.548 s, 71.3 MB/s
  [root@localhost ~]# cat /sys/devices/system/memory/block_size_bytes
  40000000
  [root@localhost ~]# echo 0x40000000 > /sys/kernel/debug/powerpc/memtrace/enable
  [  402.978663][ T1086] page:000000001bc4bc74 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24900
  [  402.980063][ T1086] flags: 0x7ffff000001000(reserved)
  [  402.980415][ T1086] raw: 007ffff000001000 c00c000000924008 c00c000000924008 0000000000000000
  [  402.980627][ T1086] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
  [  402.980845][ T1086] page dumped because: unmovable page
  [  402.989608][ T1086] Offlined Pages 16384
  [  403.324155][ T1086] memtrace: Allocated trace memory on node 0 at 0x0000000200000000

Before this patch:
  [root@localhost ~]# hexdump -C /sys/kernel/debug/powerpc/memtrace/00000000/trace  | head
  00000000  c8 25 72 51 4d 26 36 c5  5c c2 56 15 d5 1a cd 10  |.%rQM&6.\.V.....|
  00000010  19 b9 50 b2 cb e3 60 b8  ec 0a f3 ec 4b 3c 39 f0  |..P...`.....K<9.|$
  00000020  4e 5a 4c cf bd 26 19 ff  37 79 13 67 24 b7 b8 57  |NZL..&..7y.g$..W|$
  00000030  98 3e f5 be 6f 14 6a bd  a4 52 bc 6e e9 e0 c1 5d  |.>..o.j..R.n...]|$
  00000040  76 b3 ae b5 88 d7 da e3  64 23 85 2c 10 88 07 b6  |v.......d#.,....|$
  00000050  9a d8 91 de f7 50 27 69  2e 64 9c 6f d3 19 45 79  |.....P'i.d.o..Ey|$
  00000060  6a 6f 8a 61 71 19 1f c7  f1 df 28 26 ca 0f 84 55  |jo.aq.....(&...U|$
  00000070  01 3f be e4 e2 e1 da ff  7b 8c 8e 32 37 b4 24 53  |.?......{..27.$S|$
  00000080  1b 70 30 45 56 e6 8c c4  0e b5 4c fb 9f dd 88 06  |.p0EV.....L.....|$
  00000090  ef c4 18 79 f1 60 b1 5c  79 59 4d f4 36 d7 4a 5c  |...y.`.\yYM.6.J\|$

After this patch:
  [root@localhost ~]# hexdump -C /sys/kernel/debug/powerpc/memtrace/00000000/trace  | head
  00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
  *
  40000000

Fixes: 9d5171a8f248 ("powerpc/powernv: Enable removal of memory for in memory tracing")
Cc: stable@vger.kernel.org # v4.14+
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201111145322.15793-2-david@redhat.com
3 years agopowerpc/perf: Use regs->nip when SIAR is zero
Madhavan Srinivasan [Wed, 21 Oct 2020 08:53:29 +0000 (14:23 +0530)]
powerpc/perf: Use regs->nip when SIAR is zero

In power10 DD1, there is an issue where the SIAR (Sampled Instruction
Address Register) is not latching to the sampled address during random
sampling. This results in value of 0s in the SIAR. Add a check to use
regs->nip when SIAR is zero.

Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201021085329.384535-5-maddy@linux.ibm.com
3 years agopowerpc/perf: Use the address from SIAR register to set cpumode flags
Athira Rajeev [Wed, 21 Oct 2020 08:53:27 +0000 (14:23 +0530)]
powerpc/perf: Use the address from SIAR register to set cpumode flags

While setting the processor mode for any sample, perf_get_misc_flags()
expects the privilege level to differentiate the userspace and kernel
address. On power10 DD1, there is an issue that causes MSR_HV MSR_PR
bits of Sampled Instruction Event Register (SIER) not to be set for
marked events. Hence add a check to use the address in SIAR (Sampled
Instruction Address Register) to identify the privilege level.

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201021085329.384535-3-maddy@linux.ibm.com
3 years agopowerpc/perf: Drop the check for SIAR_VALID
Athira Rajeev [Wed, 21 Oct 2020 08:53:26 +0000 (14:23 +0530)]
powerpc/perf: Drop the check for SIAR_VALID

In power10 DD1, there is an issue that causes the SIAR_VALID bit of
the SIER (Sampled Instruction Event Register) to not be set. But the
SIAR_VALID bit is used for fetching the instruction address from the
SIAR (Sampled Instruction Address Register), and marked events are
sampled only if the SIAR_VALID bit is set. So drop the check for
SIAR_VALID and return true always incase of power10 DD1.

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201021085329.384535-2-maddy@linux.ibm.com
3 years agopowerpc/perf: Add new power PMU flag "PPMU_P10_DD1" for power10 DD1
Athira Rajeev [Wed, 21 Oct 2020 08:53:25 +0000 (14:23 +0530)]
powerpc/perf: Add new power PMU flag "PPMU_P10_DD1" for power10 DD1

Add a new power PMU flag "PPMU_P10_DD1" which can be used to
conditionally add any code path for power10 DD1 processor version.
Also modify power10 PMU driver code to set this flag only for DD1,
based on the Processor Version Register (PVR) value.

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201021085329.384535-1-maddy@linux.ibm.com
3 years agopowerpc/mm: Fix comparing pointer to 0 warning
Kaixu Xia [Tue, 10 Nov 2020 02:56:01 +0000 (10:56 +0800)]
powerpc/mm: Fix comparing pointer to 0 warning

Fixes coccicheck warning:

./arch/powerpc/mm/pgtable_32.c:87:11-12: WARNING comparing pointer to 0

Avoid pointer type value compared to 0.

Reported-by: Tosk Robot <tencent_os_robot@tencent.com>
Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1604976961-20441-1-git-send-email-kaixuxia@tencent.com
3 years agopowerpc: Remove RFI macro
Christophe Leroy [Sun, 8 Nov 2020 16:57:37 +0000 (16:57 +0000)]
powerpc: Remove RFI macro

RFI macro is just there to add an infinite loop past
rfi in order to avoid prefetch on 40x in half a dozen
of places in entry_32 and head_32.

Those places are already full of #ifdefs, so just add a
few more to explicitely show those loops and remove RFI.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f7e9cb9e9240feec63cb330abf40b67d1aad852f.1604854583.git.christophe.leroy@csgroup.eu
3 years agopowerpc: Replace RFI by rfi on book3s/32 and booke
Christophe Leroy [Sun, 8 Nov 2020 16:57:36 +0000 (16:57 +0000)]
powerpc: Replace RFI by rfi on book3s/32 and booke

For book3s/32 and for booke, RFI is just an rfi.
Only 40x has a non trivial RFI.
CONFIG_PPC_RTAS is never selected by 40x platforms.

Make it more explicit by replacing RFI by rfi wherever possible.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b901ddfdeb8a0a3b7cb59999599cdfde1bbfe834.1604854583.git.christophe.leroy@csgroup.eu
3 years agopowerpc/64s: Replace RFI by RFI_TO_KERNEL and remove RFI
Christophe Leroy [Sun, 8 Nov 2020 16:57:35 +0000 (16:57 +0000)]
powerpc/64s: Replace RFI by RFI_TO_KERNEL and remove RFI

In head_64.S, we have two places using RFI to return to
kernel. Use RFI_TO_KERNEL instead.

They are the two only places using RFI on book3s/64, so
the RFI macro can go away.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7719261b0a0d2787772339484c33eb809723bca7.1604854583.git.christophe.leroy@csgroup.eu
3 years agopowerpc/powernv/sriov: fix unsigned int win compared to less than zero
Kaixu Xia [Tue, 10 Nov 2020 11:19:30 +0000 (19:19 +0800)]
powerpc/powernv/sriov: fix unsigned int win compared to less than zero

Fix coccicheck warning:

  arch/powerpc/platforms/powernv/pci-sriov.c:443:7-10:
  WARNING: Unsigned expression compared with zero: win < 0

  arch/powerpc/platforms/powernv/pci-sriov.c:462:7-10:
  WARNING: Unsigned expression compared with zero: win < 0

Fixes: 39efc03e3ee8 ("powerpc/powernv/sriov: Move M64 BAR allocation into a helper")
Reported-by: Tosk Robot <tencent_os_robot@tencent.com>
Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1605007170-22171-1-git-send-email-kaixuxia@tencent.com
3 years agoRevert "powerpc/pseries/hotplug-cpu: Remove double free in error path"
Zhang Xiaoxu [Wed, 11 Nov 2020 02:07:52 +0000 (21:07 -0500)]
Revert "powerpc/pseries/hotplug-cpu: Remove double free in error path"

This reverts commit a0ff72f9f5a780341e7ff5e9ba50a0dad5fa1980.

Since the commit b015f6bc9547 ("powerpc/pseries: Add cpu DLPAR
support for drc-info property"), the 'cpu_drcs' wouldn't be double
freed when the 'cpus' node not found.

So we needn't apply this patch, otherwise, the memory will be leaked.

Fixes: a0ff72f9f5a7 ("powerpc/pseries/hotplug-cpu: Remove double free in error path")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
[mpe: Caused by me applying a patch to a function that had changed in the interim]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201111020752.1686139-1-zhangxiaoxu5@huawei.com
3 years agopowerpc/64s/perf: perf interrupt does not have to get_user_pages to access user memory
Nicholas Piggin [Wed, 11 Nov 2020 12:01:51 +0000 (22:01 +1000)]
powerpc/64s/perf: perf interrupt does not have to get_user_pages to access user memory

read_user_stack_slow that walks user address translation by hand is
only required on hash, because a hash fault can not be serviced from
"NMI" context (to avoid re-entering the hash code) so the user stack
can be mapped into Linux page tables but not accessible by the CPU.

Radix MMU mode does not have this restriction. A page fault failure
would indicate the page is not accessible via get_user_pages either,
so avoid this on radix.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201111120151.3150658-1-npiggin@gmail.com
3 years agopowerpc: Use the common INIT_DATA_SECTION macro in vmlinux.lds.S
Youling Tang [Wed, 4 Nov 2020 10:59:10 +0000 (18:59 +0800)]
powerpc: Use the common INIT_DATA_SECTION macro in vmlinux.lds.S

Use the common INIT_DATA_SECTION rule for the linker script in an effort
to regularize the linker script.

Signed-off-by: Youling Tang <tangyouling@loongson.cn>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1604487550-20040-1-git-send-email-tangyouling@loongson.cn
3 years agopowerpc/feature: Fix CPU_FTRS_ALWAYS by removing CPU_FTRS_GENERIC_32
Christophe Leroy [Tue, 3 Nov 2020 18:07:12 +0000 (18:07 +0000)]
powerpc/feature: Fix CPU_FTRS_ALWAYS by removing CPU_FTRS_GENERIC_32

On 8xx, we get the following features:

[    0.000000] cpu_features      = 0x0000000000000100
[    0.000000]   possible        = 0x0000000000000120
[    0.000000]   always          = 0x0000000000000000

This is not correct. As CONFIG_PPC_8xx is mutually exclusive with all
other configurations, the three lines should be equal.

The problem is due to CPU_FTRS_GENERIC_32 which is taken when
CONFIG_BOOK3S_32 is NOT selected. This CPU_FTRS_GENERIC_32 is
pointless because there is no generic configuration supporting
all 32 bits but book3s/32.

Remove this pointless generic features definition to unbreak the
calculation of 'possible' features and 'always' features.

Fixes: 76bc080ef5a3 ("[POWERPC] Make default cputable entries reflect selected CPU family")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/76a85f30bf981d1aeaae00df99321235494da254.1604426550.git.christophe.leroy@csgroup.eu
3 years agopowerpc/mm: Update tlbiel loop on POWER10
Aneesh Kumar K.V [Wed, 7 Oct 2020 05:33:05 +0000 (11:03 +0530)]
powerpc/mm: Update tlbiel loop on POWER10

With POWER10, single tlbiel instruction invalidates all the congruence
class of the TLB and hence we need to issue only one tlbiel with SET=0.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201007053305.232879-1-aneesh.kumar@linux.ibm.com
3 years agopowerpc: Avoid broken GCC __attribute__((optimize))
Ard Biesheuvel [Wed, 28 Oct 2020 08:04:33 +0000 (09:04 +0100)]
powerpc: Avoid broken GCC __attribute__((optimize))

Commit 7053f80d9696 ("powerpc/64: Prevent stack protection in early
boot") introduced a couple of uses of __attribute__((optimize)) with
function scope, to disable the stack protector in some early boot
code.

Unfortunately, and this is documented in the GCC man pages [0],
overriding function attributes for optimization is broken, and is only
supported for debug scenarios, not for production: the problem appears
to be that setting GCC -f flags using this method will cause it to
forget about some or all other optimization settings that have been
applied.

So the only safe way to disable the stack protector is to disable it
for the entire source file.

[0] https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html

Fixes: 7053f80d9696 ("powerpc/64: Prevent stack protection in early boot")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
[mpe: Drop one remaining use of __nostackprotector, reported by snowpatch]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201028080433.26799-1-ardb@kernel.org
3 years agopowerpc: sysdev: add missing iounmap() on error in mpic_msgr_probe()
Qinglang Miao [Wed, 28 Oct 2020 09:15:51 +0000 (17:15 +0800)]
powerpc: sysdev: add missing iounmap() on error in mpic_msgr_probe()

I noticed that iounmap() of msgr_block_addr before return from
mpic_msgr_probe() in the error handling case is missing. So use
devm_ioremap() instead of just ioremap() when remapping the message
register block, so the mapping will be automatically released on
probe failure.

Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201028091551.136400-1-miaoqinglang@huawei.com
3 years agoselftests/powerpc/eeh: disable kselftest timeout setting for eeh-basic
Po-Hsu Lin [Fri, 23 Oct 2020 02:45:39 +0000 (10:45 +0800)]
selftests/powerpc/eeh: disable kselftest timeout setting for eeh-basic

The eeh-basic test got its own 60 seconds timeout (defined in commit
414f50434aa2 "selftests/eeh: Bump EEH wait time to 60s") per breakable
device.

And we have discovered that the number of breakable devices varies
on different hardware. The device recovery time ranges from 0 to 35
seconds. In our test pool it will take about 30 seconds to run on a
Power8 system that with 5 breakable devices, 60 seconds to run on a
Power9 system that with 4 breakable devices.

Extend the timeout setting in the kselftest framework to 5 minutes
to give it a chance to finish.

Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201023024539.9512-1-po-hsu.lin@canonical.com
3 years agopowerpc/ps3: Drop unused DBG macro
Michael Ellerman [Fri, 23 Oct 2020 03:13:05 +0000 (14:13 +1100)]
powerpc/ps3: Drop unused DBG macro

This DBG macro is unused, and has been unused since the file was
originally merged into mainline. Just drop it.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201023031305.3284819-1-mpe@ellerman.id.au
3 years agopowerpc/85xx: Fix declaration made after definition
Michael Ellerman [Fri, 23 Oct 2020 02:08:38 +0000 (13:08 +1100)]
powerpc/85xx: Fix declaration made after definition

Currently the clang build of corenet64_smp_defconfig fails with:

  arch/powerpc/platforms/85xx/corenet_generic.c:210:1: error:
  attribute declaration must precede definition
  machine_arch_initcall(corenet_generic, corenet_gen_publish_devices);

Fix it by moving the initcall definition prior to the machine
definition, and directly below the function it calls, which is the
usual style anyway.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201023020838.3274226-1-mpe@ellerman.id.au
3 years agopowerpc/mm: Move setting PTE specific flags to pfn_pmd()
Aneesh Kumar K.V [Thu, 22 Oct 2020 09:11:15 +0000 (14:41 +0530)]
powerpc/mm: Move setting PTE specific flags to pfn_pmd()

powerpc used to set the PTE specific flags in set_pte_at(). That is
different from other architectures. To be consistent with other
architectures powerpc updated pfn_pte() to set _PAGE_PTE in commit
379c926d6334 ("powerpc/mm: move setting pte specific flags to
pfn_pte")

That commit didn't do the same for pfn_pmd() because we expect
pmd_mkhuge() to do that. But as per Linus that is a bad rule:

  The rule that you must use "pmd_mkhuge()" seems _completely_ wrong.
  The only valid use to ever make a pmd out of a pfn is to make a
  huge-page.

Hence update pfn_pmd() to set _PAGE_PTE.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201022091115.39568-1-aneesh.kumar@linux.ibm.com
3 years agopowerpc/bitops: Fix possible undefined behaviour with fls() and fls64()
Christophe Leroy [Thu, 22 Oct 2020 14:05:46 +0000 (14:05 +0000)]
powerpc/bitops: Fix possible undefined behaviour with fls() and fls64()

fls() and fls64() are using __builtin_ctz() and _builtin_ctzll().
On powerpc, those builtins trivially use ctlzw and ctlzd power
instructions.

Allthough those instructions provide the expected result with
input argument 0, __builtin_ctz() and __builtin_ctzll() are
documented as undefined for value 0.

The easiest fix would be to use fls() and fls64() functions
defined in include/asm-generic/bitops/builtin-fls.h and
include/asm-generic/bitops/fls64.h, but GCC output is not optimal:

00000388 <testfls>:
 388:   2c 03 00 00     cmpwi   r3,0
 38c:   41 82 00 10     beq     39c <testfls+0x14>
 390:   7c 63 00 34     cntlzw  r3,r3
 394:   20 63 00 20     subfic  r3,r3,32
 398:   4e 80 00 20     blr
 39c:   38 60 00 00     li      r3,0
 3a0:   4e 80 00 20     blr

000003b0 <testfls64>:
 3b0:   2c 03 00 00     cmpwi   r3,0
 3b4:   40 82 00 1c     bne     3d0 <testfls64+0x20>
 3b8:   2f 84 00 00     cmpwi   cr7,r4,0
 3bc:   38 60 00 00     li      r3,0
 3c0:   4d 9e 00 20     beqlr   cr7
 3c4:   7c 83 00 34     cntlzw  r3,r4
 3c8:   20 63 00 20     subfic  r3,r3,32
 3cc:   4e 80 00 20     blr
 3d0:   7c 63 00 34     cntlzw  r3,r3
 3d4:   20 63 00 40     subfic  r3,r3,64
 3d8:   4e 80 00 20     blr

When the input of fls(x) is a constant, just check x for nullity and
return either 0 or __builtin_clz(x). Otherwise, use cntlzw instruction
directly.

For fls64() on PPC64, do the same but with __builtin_clzll() and
cntlzd instruction. On PPC32, lets take the generic fls64() which
will use our fls(). The result is as expected:

00000388 <testfls>:
 388:   7c 63 00 34     cntlzw  r3,r3
 38c:   20 63 00 20     subfic  r3,r3,32
 390:   4e 80 00 20     blr

000003a0 <testfls64>:
 3a0:   2c 03 00 00     cmpwi   r3,0
 3a4:   40 82 00 10     bne     3b4 <testfls64+0x14>
 3a8:   7c 83 00 34     cntlzw  r3,r4
 3ac:   20 63 00 20     subfic  r3,r3,32
 3b0:   4e 80 00 20     blr
 3b4:   7c 63 00 34     cntlzw  r3,r3
 3b8:   20 63 00 40     subfic  r3,r3,64
 3bc:   4e 80 00 20     blr

Fixes: 2fcff790dcb4 ("powerpc: Use builtin functions for fls()/__fls()/fls64()")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/348c2d3f19ffcff8abe50d52513f989c4581d000.1603375524.git.christophe.leroy@csgroup.eu
3 years agopowerpc/64s: Convert some cpu_setup() and cpu_restore() functions to C
Jordan Niethe [Wed, 14 Oct 2020 07:28:37 +0000 (18:28 +1100)]
powerpc/64s: Convert some cpu_setup() and cpu_restore() functions to C

The only thing keeping the cpu_setup() and cpu_restore() functions
used in the cputable entries for Power7, Power8, Power9 and Power10 in
assembly was cpu_restore() being called before there was a stack in
generic_secondary_smp_init(). Commit ("powerpc/64: Set up a kernel
stack for secondaries before cpu_restore()") means that it is now
possible to use C.

Rewrite the functions in C so they are a little bit easier to read.
This is not changing their functionality.

Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
[mpe: Tweak copyright and authorship notes]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201014072837.24539-2-jniethe5@gmail.com
3 years agopowerpc/64s/exception: KVM Fix for host DSI being taken in HPT guest MMU context
Nicholas Piggin [Tue, 17 Nov 2020 13:56:17 +0000 (23:56 +1000)]
powerpc/64s/exception: KVM Fix for host DSI being taken in HPT guest MMU context

Commit 2284ffea8f0c ("powerpc/64s/exception: Only test KVM in SRR
interrupts when PR KVM is supported") removed KVM guest tests from
interrupts that do not set HV=1, when PR-KVM is not configured.

This is wrong for HV-KVM HPT guest MMIO emulation case which attempts
to load the faulting instruction word with MSR[DR]=1 and MSR[HV]=1 with
the guest MMU context loaded. This can cause host DSI, DSLB interrupts
which must test for KVM guest. Restore this and add a comment.

Fixes: 2284ffea8f0c ("powerpc/64s/exception: Only test KVM in SRR interrupts when PR KVM is supported")
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201117135617.3521127-1-npiggin@gmail.com
3 years agopowerpc: Drop -me200 addition to build flags
Michael Ellerman [Mon, 16 Nov 2020 12:09:13 +0000 (23:09 +1100)]
powerpc: Drop -me200 addition to build flags

Currently a build with CONFIG_E200=y will fail with:

  Error: invalid switch -me200
  Error: unrecognized option -me200

Upstream binutils has never supported an -me200 option. Presumably it
was supported at some point by either a fork or Freescale internal
binutils.

We can't support code that we can't even build test, so drop the
addition of -me200 to the build flags, so we can at least build with
CONFIG_E200=y.

Reported-by: NĂ©meth MĂ¡rton <nm127@freemail.hu>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Scott Wood <oss@buserror.net>
Link: https://lore.kernel.org/r/20201116120913.165317-1-mpe@ellerman.id.au
3 years agoKVM: PPC: Book3S HV: XIVE: Fix possible oops when accessing ESB page
CĂ©dric Le Goater [Thu, 5 Nov 2020 13:47:13 +0000 (14:47 +0100)]
KVM: PPC: Book3S HV: XIVE: Fix possible oops when accessing ESB page

When accessing the ESB page of a source interrupt, the fault handler
will retrieve the page address from the XIVE interrupt 'xive_irq_data'
structure. If the associated KVM XIVE interrupt is not valid, that is
not allocated at the HW level for some reason, the fault handler will
dereference a NULL pointer leading to the oops below :

  WARNING: CPU: 40 PID: 59101 at arch/powerpc/kvm/book3s_xive_native.c:259 xive_native_esb_fault+0xe4/0x240 [kvm]
  CPU: 40 PID: 59101 Comm: qemu-system-ppc Kdump: loaded Tainted: G        W        --------- -  - 4.18.0-240.el8.ppc64le #1
  NIP:  c00800000e949fac LR: c00000000044b164 CTR: c00800000e949ec8
  REGS: c000001f69617840 TRAP: 0700   Tainted: G        W        --------- -  -  (4.18.0-240.el8.ppc64le)
  MSR:  9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 44044282  XER: 00000000
  CFAR: c00000000044b160 IRQMASK: 0
  GPR00: c00000000044b164 c000001f69617ac0 c00800000e96e000 c000001f69617c10
  GPR04: 05faa2b21e000080 0000000000000000 0000000000000005 ffffffffffffffff
  GPR08: 0000000000000000 0000000000000001 0000000000000000 0000000000000001
  GPR12: c00800000e949ec8 c000001ffffd3400 0000000000000000 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: 0000000000000000 0000000000000000 c000001f5c065160 c000000001c76f90
  GPR24: c000001f06f20000 c000001f5c065100 0000000000000008 c000001f0eb98c78
  GPR28: c000001dcab40000 c000001dcab403d8 c000001f69617c10 0000000000000011
  NIP [c00800000e949fac] xive_native_esb_fault+0xe4/0x240 [kvm]
  LR [c00000000044b164] __do_fault+0x64/0x220
  Call Trace:
  [c000001f69617ac0] [0000000137a5dc20] 0x137a5dc20 (unreliable)
  [c000001f69617b50] [c00000000044b164] __do_fault+0x64/0x220
  [c000001f69617b90] [c000000000453838] do_fault+0x218/0x930
  [c000001f69617bf0] [c000000000456f50] __handle_mm_fault+0x350/0xdf0
  [c000001f69617cd0] [c000000000457b1c] handle_mm_fault+0x12c/0x310
  [c000001f69617d10] [c00000000007ef44] __do_page_fault+0x264/0xbb0
  [c000001f69617df0] [c00000000007f8c8] do_page_fault+0x38/0xd0
  [c000001f69617e30] [c00000000000a714] handle_page_fault+0x18/0x38
  Instruction dump:
  40c2fff0 7c2004ac 2fa90000 409e0118 73e90001 41820080 e8bd0008 7c2004ac
  7ca90074 39400000 915c0000 7929d182 <0b0900002fa50000 419e0080 e89e0018
  ---[ end trace 66c6ff034c53f64f ]---
  xive-kvm: xive_native_esb_fault: accessing invalid ESB page for source 8 !

Fix that by checking the validity of the KVM XIVE interrupt structure.

Fixes: 6520ca64cde7 ("KVM: PPC: Book3S HV: XIVE: Add a mapping for the source ESB pages")
Cc: stable@vger.kernel.org # v5.2+
Reported-by: Greg Kurz <groug@kaod.org>
Signed-off-by: CĂ©dric Le Goater <clg@kaod.org>
Tested-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201105134713.656160-1-clg@kaod.org
3 years agopowerpc/64s: Fix KVM system reset handling when CONFIG_PPC_PSERIES=y
Nicholas Piggin [Sat, 14 Nov 2020 11:47:43 +0000 (21:47 +1000)]
powerpc/64s: Fix KVM system reset handling when CONFIG_PPC_PSERIES=y

pseries guest kernels have a FWNMI handler for SRESET and MCE NMIs,
which is basically the same as the regular handlers for those
interrupts.

The system reset FWNMI handler did not have a KVM guest test in it,
although it probably should have because the guest can itself run
guests.

Commit 4f50541f6703b ("powerpc/64s/exception: Move all interrupt
handlers to new style code gen macros") convert the handler faithfully
to avoid a KVM test with a "clever" trick to modify the IKVM_REAL
setting to 0 when the fwnmi handler is to be generated (PPC_PSERIES=y).
This worked when the KVM test was generated in the interrupt entry
handlers, but a later patch moved the KVM test to the common handler,
and the common handler macro is expanded below the fwnmi entry. This
prevents the KVM test from being generated even for the 0x100 entry
point as well.

The result is NMI IPIs in the host kernel when a guest is running will
use gest registers. This goes particularly badly when an HPT guest is
running and the MMU is set to guest mode.

Remove this trickery and just generate the test always.

Fixes: 9600f261acaa ("powerpc/64s/exception: Move KVM test to common code")
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201114114743.3306283-1-npiggin@gmail.com
4 years agopowerpc/32s: Use relocation offset when setting early hash table
Christophe Leroy [Sat, 7 Nov 2020 09:07:40 +0000 (09:07 +0000)]
powerpc/32s: Use relocation offset when setting early hash table

When calling early_hash_table(), the kernel hasn't been yet
relocated to its linking address, so data must be addressed
with relocation offset.

Add relocation offset to write into Hash in early_hash_table().

Fixes: 69a1593abdbc ("powerpc/32s: Setup the early hash table at all time.")
Reported-by: Erhard Furtner <erhard_f@mailbox.org>
Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Serge Belyshev <belyshev@depni.sinp.msu.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/9e225a856a8b22e0e77587ee22ab7a2f5bca8753.1604740029.git.christophe.leroy@csgroup.eu
4 years agopowerpc/numa: Fix build when CONFIG_NUMA=n
Scott Cheloha [Thu, 5 Nov 2020 22:30:40 +0000 (16:30 -0600)]
powerpc/numa: Fix build when CONFIG_NUMA=n

Add a non-NUMA definition for of_drconf_to_nid_single() to topology.h
so we have one even if powerpc/mm/numa.c is not compiled. On a
non-NUMA kernel the appropriate node id is always first_online_node.

Fixes: 72cdd117c449 ("pseries/hotplug-memory: hot-add: skip redundant LMB lookup")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Scott Cheloha <cheloha@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201105223040.3612663-1-cheloha@linux.ibm.com
4 years agopowerpc/8xx: Manage _PAGE_ACCESSED through APG bits in L1 entry
Christophe Leroy [Mon, 12 Oct 2020 08:54:33 +0000 (08:54 +0000)]
powerpc/8xx: Manage _PAGE_ACCESSED through APG bits in L1 entry

When _PAGE_ACCESSED is not set, a minor fault is expected.
To do this, TLB miss exception ANDs _PAGE_PRESENT and _PAGE_ACCESSED
into the L2 entry valid bit.

To simplify the processing and reduce the number of instructions in
TLB miss exceptions, manage it as an APG bit and get it next to
_PAGE_GUARDED bit to allow a copy in one go. Then declare the
corresponding groups as handling all accesses as user accesses.
As the PP bits always define user as No Access, it will generate
a fault.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/80f488db230c6b0e7b3b990d72bd94a8a069e93e.1602492856.git.christophe.leroy@csgroup.eu
4 years agopowerpc/8xx: Always fault when _PAGE_ACCESSED is not set
Christophe Leroy [Mon, 12 Oct 2020 08:54:31 +0000 (08:54 +0000)]
powerpc/8xx: Always fault when _PAGE_ACCESSED is not set

The kernel expects pte_young() to work regardless of CONFIG_SWAP.

Make sure a minor fault is taken to set _PAGE_ACCESSED when it
is not already set, regardless of the selection of CONFIG_SWAP.

This adds at least 3 instructions to the TLB miss exception
handlers fast path. Following patch will reduce this overhead.

Also update the rotation instruction to the correct number of bits
to reflect all changes done to _PAGE_ACCESSED over time.

Fixes: d069cb4373fe ("powerpc/8xx: Don't touch ACCESSED when no SWAP.")
Fixes: 5f356497c384 ("powerpc/8xx: remove unused _PAGE_WRITETHRU")
Fixes: e0a8e0d90a9f ("powerpc/8xx: Handle PAGE_USER via APG bits")
Fixes: 5b2753fc3e8a ("powerpc/8xx: Implementation of PAGE_EXEC")
Fixes: a891c43b97d3 ("powerpc/8xx: Prepare handlers for _PAGE_HUGE for 512k pages.")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/af834e8a0f1fa97bfae65664950f0984a70c4750.1602492856.git.christophe.leroy@csgroup.eu
4 years agopowerpc/40x: Always fault when _PAGE_ACCESSED is not set
Christophe Leroy [Sat, 10 Oct 2020 15:14:29 +0000 (15:14 +0000)]
powerpc/40x: Always fault when _PAGE_ACCESSED is not set

The kernel expects pte_young() to work regardless of CONFIG_SWAP.

Make sure a minor fault is taken to set _PAGE_ACCESSED when it
is not already set, regardless of the selection of CONFIG_SWAP.

Fixes: 2c74e2586bb9 ("powerpc/40x: Rework 40x PTE access and TLB miss")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b02ca2ed2d3676a096219b48c0f69ec982a75bcf.1602342801.git.christophe.leroy@csgroup.eu
4 years agopowerpc/603: Always fault when _PAGE_ACCESSED is not set
Christophe Leroy [Sat, 10 Oct 2020 15:14:30 +0000 (15:14 +0000)]
powerpc/603: Always fault when _PAGE_ACCESSED is not set

The kernel expects pte_young() to work regardless of CONFIG_SWAP.

Make sure a minor fault is taken to set _PAGE_ACCESSED when it
is not already set, regardless of the selection of CONFIG_SWAP.

Fixes: 84de6ab0e904 ("powerpc/603: don't handle PAGE_ACCESSED in TLB miss handlers.")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a44367744de54e2315b2f1a8cbbd7f88488072e0.1602342806.git.christophe.leroy@csgroup.eu
4 years agopowerpc: Use asm_goto_volatile for put_user()
Michael Ellerman [Wed, 4 Nov 2020 11:17:42 +0000 (22:17 +1100)]
powerpc: Use asm_goto_volatile for put_user()

Andreas reported that commit ee0a49a6870e ("powerpc/uaccess: Switch
__put_user_size_allowed() to __put_user_asm_goto()") broke
CLONE_CHILD_SETTID.

Further inspection showed that the put_user() in schedule_tail() was
missing entirely, the store not emitted by the compiler.

  <.schedule_tail>:
    mflr    r0
    std     r0,16(r1)
    stdu    r1,-112(r1)
    bl      <.finish_task_switch>
    ld      r9,2496(r3)
    cmpdi   cr7,r9,0
    bne     cr7,<.schedule_tail+0x60>
    ld      r3,392(r13)
    ld      r9,1392(r3)
    cmpdi   cr7,r9,0
    beq     cr7,<.schedule_tail+0x3c>
    li      r4,0
    li      r5,0
    bl      <.__task_pid_nr_ns>
    nop
    bl      <.calculate_sigpending>
    nop
    addi    r1,r1,112
    ld      r0,16(r1)
    mtlr    r0
    blr
    nop
    nop
    nop
    bl      <.__balance_callback>
    b       <.schedule_tail+0x1c>

Notice there are no stores other than to the stack. There should be a
stw in there for the store to current->set_child_tid.

This is only seen with GCC 4.9 era compilers (tested with 4.9.3 and
4.9.4), and only when CONFIG_PPC_KUAP is disabled.

When CONFIG_PPC_KUAP=y, the inline asm that's part of the isync()
and mtspr() inlined via allow_user_access() seems to be enough to
avoid the bug.

We already have a macro to work around this (or a similar bug), called
asm_volatile_goto which includes an empty asm block to tickle the
compiler into generating the right code. So use that.

With this applied the code generation looks more like it will work:

  <.schedule_tail>:
    mflr    r0
    std     r31,-8(r1)
    std     r0,16(r1)
    stdu    r1,-144(r1)
    std     r3,112(r1)
    bl      <._mcount>
    nop
    ld      r3,112(r1)
    bl      <.finish_task_switch>
    ld      r9,2624(r3)
    cmpdi   cr7,r9,0
    bne     cr7,<.schedule_tail+0xa0>
    ld      r3,2408(r13)
    ld      r31,1856(r3)
    cmpdi   cr7,r31,0
    beq     cr7,<.schedule_tail+0x80>
    li      r4,0
    li      r5,0
    bl      <.__task_pid_nr_ns>
    nop
    li      r9,-1
    clrldi  r9,r9,12
    cmpld   cr7,r31,r9
    bgt     cr7,<.schedule_tail+0x80>
    lis     r9,16
    rldicr  r9,r9,32,31
    subf    r9,r31,r9
    cmpldi  cr7,r9,3
    ble     cr7,<.schedule_tail+0x80>
    li      r9,0
    stw     r3,0(r31) <-- stw
    nop
    bl      <.calculate_sigpending>
    nop
    addi    r1,r1,144
    ld      r0,16(r1)
    ld      r31,-8(r1)
    mtlr    r0
    blr
    nop
    bl      <.__balance_callback>
    b       <.schedule_tail+0x30>

Fixes: ee0a49a6870e ("powerpc/uaccess: Switch __put_user_size_allowed() to __put_user_asm_goto()")
Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Tested-by: Andreas Schwab <schwab@linux-m68k.org>
Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201104111742.672142-1-mpe@ellerman.id.au
4 years agopowerpc/64: Set up a kernel stack for secondaries before cpu_restore()
Jordan Niethe [Wed, 14 Oct 2020 07:28:36 +0000 (18:28 +1100)]
powerpc/64: Set up a kernel stack for secondaries before cpu_restore()

Currently in generic_secondary_smp_init(), cur_cpu_spec->cpu_restore()
is called before a stack has been set up in r1. This was previously fine
as the cpu_restore() functions were implemented in assembly and did not
use a stack. However commit 5a61ef74f269 ("powerpc/64s: Support new
device tree binding for discovering CPU features") used
__restore_cpu_cpufeatures() as the cpu_restore() function for a
device-tree features based cputable entry. This is a C function and
hence uses a stack in r1.

generic_secondary_smp_init() is entered on the secondary cpus via the
primary cpu using the OPAL call opal_start_cpu(). In OPAL, each hardware
thread has its own stack. The OPAL call is ran in the primary's hardware
thread. During the call, a job is scheduled on a secondary cpu that will
start executing at the address of generic_secondary_smp_init().  Hence
the value that will be left in r1 when the secondary cpu enters the
kernel is part of that secondary cpu's individual OPAL stack. This means
that __restore_cpu_cpufeatures() will write to that OPAL stack. This is
not horribly bad as each hardware thread has its own stack and the call
that enters the kernel from OPAL never returns, but it is still wrong
and should be corrected.

Create the temp kernel stack before calling cpu_restore().

As noted by mpe, for a kexec boot, the secondary CPUs are released from
the spin loop at address 0x60 by smp_release_cpus() and then jump to
generic_secondary_smp_init(). The call to smp_release_cpus() is in
setup_arch(), and it comes before the call to emergency_stack_init().
emergency_stack_init() allocates an emergency stack in the PACA for each
CPU.  This address in the PACA is what is used to set up the temp kernel
stack in generic_secondary_smp_init(). Move releasing the secondary CPUs
to after the PACAs have been allocated an emergency stack, otherwise the
PACA stack pointer will contain garbage and hence the temp kernel stack
created from it will be broken.

Fixes: 5a61ef74f269 ("powerpc/64s: Support new device tree binding for discovering CPU features")
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201014072837.24539-1-jniethe5@gmail.com
4 years agopowerpc/smp: Call rcu_cpu_starting() earlier
Qian Cai [Wed, 28 Oct 2020 18:23:34 +0000 (14:23 -0400)]
powerpc/smp: Call rcu_cpu_starting() earlier

The call to rcu_cpu_starting() in start_secondary() is not early
enough in the CPU-hotplug onlining process, which results in lockdep
splats as follows (with CONFIG_PROVE_RCU_LIST=y):

  WARNING: suspicious RCU usage
  -----------------------------
  kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!!

  other info that might help us debug this:

  RCU used illegally from offline CPU!
  rcu_scheduler_active = 1, debug_locks = 1
  no locks held by swapper/1/0.

  Call Trace:
  dump_stack+0xec/0x144 (unreliable)
  lockdep_rcu_suspicious+0x128/0x14c
  __lock_acquire+0x1060/0x1c60
  lock_acquire+0x140/0x5f0
  _raw_spin_lock_irqsave+0x64/0xb0
  clockevents_register_device+0x74/0x270
  register_decrementer_clockevent+0x94/0x110
  start_secondary+0x134/0x800
  start_secondary_prolog+0x10/0x14

This is avoided by adding a call to rcu_cpu_starting() near the
beginning of the start_secondary() function. Note that the
raw_smp_processor_id() is required in order to avoid calling into
lockdep before RCU has declared the CPU to be watched for readers.

It's safe to call rcu_cpu_starting() in the arch code as well as later
in generic code, as explained by Paul:

  It uses a per-CPU variable so that RCU pays attention only to the
  first call to rcu_cpu_starting() if there is more than one of them.
  This is even intentional, due to there being a generic
  arch-independent call to rcu_cpu_starting() in
  notify_cpu_starting().

  So multiple calls to rcu_cpu_starting() are fine by design.

Fixes: 4d004099a668 ("lockdep: Fix lockdep recursion")
Signed-off-by: Qian Cai <cai@redhat.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
[mpe: Add Fixes tag, reword slightly & expand change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201028182334.13466-1-cai@redhat.com
4 years agopowerpc/eeh_cache: Fix a possible debugfs deadlock
Qian Cai [Wed, 28 Oct 2020 15:27:17 +0000 (11:27 -0400)]
powerpc/eeh_cache: Fix a possible debugfs deadlock

Lockdep complains that a possible deadlock below in
eeh_addr_cache_show() because it is acquiring a lock with IRQ enabled,
but eeh_addr_cache_insert_dev() needs to acquire the same lock with IRQ
disabled. Let's just make eeh_addr_cache_show() acquire the lock with
IRQ disabled as well.

        CPU0                    CPU1
        ----                    ----
   lock(&pci_io_addr_cache_root.piar_lock);
                                local_irq_disable();
                                lock(&tp->lock);
                                lock(&pci_io_addr_cache_root.piar_lock);
   <Interrupt>
     lock(&tp->lock);

  *** DEADLOCK ***

  lock_acquire+0x140/0x5f0
  _raw_spin_lock_irqsave+0x64/0xb0
  eeh_addr_cache_insert_dev+0x48/0x390
  eeh_probe_device+0xb8/0x1a0
  pnv_pcibios_bus_add_device+0x3c/0x80
  pcibios_bus_add_device+0x118/0x290
  pci_bus_add_device+0x28/0xe0
  pci_bus_add_devices+0x54/0xb0
  pcibios_init+0xc4/0x124
  do_one_initcall+0xac/0x528
  kernel_init_freeable+0x35c/0x3fc
  kernel_init+0x24/0x148
  ret_from_kernel_thread+0x5c/0x80

  lock_acquire+0x140/0x5f0
  _raw_spin_lock+0x4c/0x70
  eeh_addr_cache_show+0x38/0x110
  seq_read+0x1a0/0x660
  vfs_read+0xc8/0x1f0
  ksys_read+0x74/0x130
  system_call_exception+0xf8/0x1d0
  system_call_common+0xe8/0x218

Fixes: 5ca85ae6318d ("powerpc/eeh_cache: Add a way to dump the EEH address cache")
Signed-off-by: Qian Cai <cai@redhat.com>
Reviewed-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201028152717.8967-1-cai@redhat.com
4 years agoLinux 5.10-rc2
Linus Torvalds [Sun, 1 Nov 2020 22:43:51 +0000 (14:43 -0800)]
Linux 5.10-rc2

4 years agoMerge tag 'x86-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 1 Nov 2020 19:21:26 +0000 (11:21 -0800)]
Merge tag 'x86-urgent-2020-11-01' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
 "Three fixes all related to #DB:

   - Handle the BTF bit correctly so it doesn't get lost due to a kernel
     #DB

   - Only clear and set the virtual DR6 value used by ptrace on user
     space triggered #DB. A kernel #DB must leave it alone to ensure
     data consistency for ptrace.

   - Make the bitmasking of the virtual DR6 storage correct so it does
     not lose DR_STEP"

* tag 'x86-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/debug: Fix DR_STEP vs ptrace_get_debugreg(6)
  x86/debug: Only clear/set ->virtual_dr6 for userspace #DB
  x86/debug: Fix BTF handling

4 years agoMerge tag 'timers-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 1 Nov 2020 19:13:45 +0000 (11:13 -0800)]
Merge tag 'timers-urgent-2020-11-01' of git://git./linux/kernel/git/tip/tip

Pull timer fixes from Thomas Gleixner:
 "A few fixes for timers/timekeeping:

   - Prevent undefined behaviour in the timespec64_to_ns() conversion
     which is used for converting user supplied time input to
     nanoseconds. It lacked overflow protection.

   - Mark sched_clock_read_begin/retry() to prevent recursion in the
     tracer

   - Remove unused debug functions in the hrtimer and timerlist code"

* tag 'timers-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  time: Prevent undefined behaviour in timespec64_to_ns()
  timers: Remove unused inline funtion debug_timer_free()
  hrtimer: Remove unused inline function debug_hrtimer_free()
  time/sched_clock: Mark sched_clock_read_begin/retry() as notrace

4 years agoMerge tag 'smp-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 1 Nov 2020 19:11:38 +0000 (11:11 -0800)]
Merge tag 'smp-urgent-2020-11-01' of git://git./linux/kernel/git/tip/tip

Pull smp fix from Thomas Gleixner:
 "A single fix for stop machine.

  Mark functions no trace to prevent a crash caused by recursion when
  enabling or disabling a tracer on RISC-V (probably all architectures
  which patch through stop machine)"

* tag 'smp-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  stop_machine, rcu: Mark functions as notrace

4 years agoMerge tag 'locking-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 1 Nov 2020 19:08:17 +0000 (11:08 -0800)]
Merge tag 'locking-urgent-2020-11-01' of git://git./linux/kernel/git/tip/tip

Pull locking fixes from Thomas Gleixner:
 "A couple of locking fixes:

   - Fix incorrect failure injection handling in the fuxtex code

   - Prevent a preemption warning in lockdep when tracking
     local_irq_enable() and interrupts are already enabled

   - Remove more raw_cpu_read() usage from lockdep which causes state
     corruption on !X86 architectures.

   - Make the nr_unused_locks accounting in lockdep correct again"

* tag 'locking-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  lockdep: Fix nr_unused_locks accounting
  locking/lockdep: Remove more raw_cpu_read() usage
  futex: Fix incorrect should_fail_futex() handling
  lockdep: Fix preemption WARN for spurious IRQ-enable

4 years agoMerge tag 'char-misc-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregk...
Linus Torvalds [Sun, 1 Nov 2020 18:05:16 +0000 (10:05 -0800)]
Merge tag 'char-misc-5.10-rc2' of git://git./linux/kernel/git/gregkh/char-misc

Pull char/misc fixes/removals from Greg KH:
 "Here's some small fixes for 5.10-rc2 and a big driver removal.

  The fixes are for some reported issues in the interconnect and
  coresight drivers, nothing major.

  The "big" driver removal is the MIC drivers have been asked to be
  removed as the hardware never shipped and Intel no longer wants to
  maintain something that no one can use. This is welcomed by many as
  the DMA usage of these drivers was "interesting" and the security
  people were starting to question some issues that were starting to be
  found in the codebase.

  Note, one of the subsystems for this driver, the "VOP" code, will
  probably come back in future kernel versions as it was looking to
  potentially solve some PCIe virtualization issues that a number of
  other vendors were wanting to solve. But as-is, this codebase didn't
  work for anyone else so no actual functionality is being removed.

  All of these have been in linux-next with no reported issues"

* tag 'char-misc-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  coresight: cti: Initialize dynamic sysfs attributes
  coresight: Fix uninitialised pointer bug in etm_setup_aux()
  coresight: add module license
  misc: mic: remove the MIC drivers
  interconnect: qcom: use icc_sync state for sm8[12]50
  interconnect: qcom: Ensure that the floor bandwidth value is enforced
  interconnect: qcom: sc7180: Init BCMs before creating the nodes
  interconnect: qcom: sdm845: Init BCMs before creating the nodes
  interconnect: Aggregate before setting initial bandwidth
  interconnect: qcom: sdm845: Enable keepalive for the MM1 BCM

4 years agoMerge tag 'driver-core-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 1 Nov 2020 17:59:13 +0000 (09:59 -0800)]
Merge tag 'driver-core-5.10-rc2' of git://git./linux/kernel/git/gregkh/driver-core

Pull driver core and documentation fixes from Greg KH:
 "Here is one tiny debugfs change to fix up an API where the last user
  was successfully fixed up in 5.10-rc1 (so it couldn't be merged
  earlier), and a much larger Documentation/ABI/ update to the files so
  they can be automatically parsed by our tools.

  The Documentation/ABI/ updates are just formatting issues, small ones
  to bring the files into parsable format, and have been acked by
  numerous subsystem maintainers and the documentation maintainer. I
  figured it was good to get this into 5.10-rc2 to help wih the merge
  issues that would arise if these were to stick in linux-next until
  5.11-rc1.

  The debugfs change has been in linux-next for a long time, and the
  Documentation updates only for the last linux-next release"

* tag 'driver-core-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (40 commits)
  scripts: get_abi.pl: assume ReST format by default
  docs: ABI: sysfs-class-led-trigger-pattern: remove hw_pattern duplication
  docs: ABI: sysfs-class-backlight: unify ABI documentation
  docs: ABI: sysfs-c2port: remove a duplicated entry
  docs: ABI: sysfs-class-power: unify duplicated properties
  docs: ABI: unify /sys/class/leds/<led>/brightness documentation
  docs: ABI: stable: remove a duplicated documentation
  docs: ABI: change read/write attributes
  docs: ABI: cleanup several ABI documents
  docs: ABI: sysfs-bus-nvdimm: use the right format for ABI
  docs: ABI: vdso: use the right format for ABI
  docs: ABI: fix syntax to be parsed using ReST notation
  docs: ABI: convert testing/configfs-acpi to ReST
  docs: Kconfig/Makefile: add a check for broken ABI files
  docs: abi-testing.rst: enable --rst-sources when building docs
  docs: ABI: don't escape ReST-incompatible chars from obsolete and removed
  docs: ABI: create a 2-depth index for ABI
  docs: ABI: make it parse ABI/stable as ReST-compatible files
  docs: ABI: sysfs-uevent: make it compatible with ReST output
  docs: ABI: testing: make the files compatible with ReST output
  ...

4 years agoMerge tag 'staging-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh...
Linus Torvalds [Sun, 1 Nov 2020 17:57:24 +0000 (09:57 -0800)]
Merge tag 'staging-5.10-rc2' of git://git./linux/kernel/git/gregkh/staging

Pull staging driver fixes from Greg KH:
 "Here are some small staging driver fixes for issues that have been
  reported in 5.10-rc1:

   - octeon driver fixes

   - wfx driver fixes

   - memory leak fix in vchiq driver

   - fieldbus driver bugfix

   - comedi driver bugfix

  All of these have been in linux-next with no reported issues"

* tag 'staging-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: fieldbus: anybuss: jump to correct label in an error path
  staging: wfx: fix test on return value of gpiod_get_value()
  staging: wfx: fix use of uninitialized pointer
  staging: mmal-vchiq: Fix memory leak for vchiq_instance
  staging: comedi: cb_pcidas: Allow 2-channel commands for AO subdevice
  staging: octeon: Drop on uncorrectable alignment or FCS error
  staging: octeon: repair "fixed-link" support

4 years agoMerge tag 'tty-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Linus Torvalds [Sun, 1 Nov 2020 17:55:36 +0000 (09:55 -0800)]
Merge tag 'tty-5.10-rc2' of git://git./linux/kernel/git/gregkh/tty

Pull tty/serial fixes from Greg KH:
 "Here are some small TTY and Serial driver fixes for reported issues
  for 5.10-rc2. They include:

   - vt ioctl bugfix for reported problems

   - fsl_lpuart serial driver fix

   - 21285 serial driver bugfix

  All have been in linux-next with no reported issues"

* tag 'tty-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  vt_ioctl: fix GIO_UNIMAP regression
  vt: keyboard, extend func_buf_lock to readers
  vt: keyboard, simplify vt_kdgkbsent
  tty: serial: fsl_lpuart: LS1021A has a FIFO size of 16 words, like LS1028A
  tty: serial: 21285: fix lockup on open

4 years agoMerge tag 'usb-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Linus Torvalds [Sun, 1 Nov 2020 17:53:38 +0000 (09:53 -0800)]
Merge tag 'usb-5.10-rc2' of git://git./linux/kernel/git/gregkh/usb

Pull USB driver fixes from Greg KH:
 "Here are a number of small bugfixes for reported issues in some USB
  drivers. They include:

   - typec bugfixes

   - xhci bugfixes and lockdep warning fixes

   - cdc-acm driver regression fix

   - kernel doc fixes

   - cdns3 driver bugfixes for a bunch of reported issues

   - other tiny USB driver fixes

  All have been in linux-next with no reported issues"

* tag 'usb-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: cdns3: gadget: own the lock wrongly at the suspend routine
  usb: cdns3: Fix on-chip memory overflow issue
  usb: cdns3: gadget: suspicious implicit sign extension
  xhci: Don't create stream debugfs files with spinlock held.
  usb: xhci: Workaround for S3 issue on AMD SNPS 3.0 xHC
  xhci: Fix sizeof() mismatch
  usb: typec: stusb160x: fix signedness comparison issue with enum variables
  usb: typec: add missing MODULE_DEVICE_TABLE() to stusb160x
  USB: apple-mfi-fastcharge: don't probe unhandled devices
  usbcore: Check both id_table and match() when both available
  usb: host: ehci-tegra: Fix error handling in tegra_ehci_probe()
  usb: typec: stusb160x: fix an IS_ERR() vs NULL check in probe
  usb: typec: tcpm: reset hard_reset_count for any disconnect
  usb: cdc-acm: fix cooldown mechanism
  usb: host: fsl-mph-dr-of: check return of dma_set_mask()
  usb: fix kernel-doc markups
  usb: typec: stusb160x: fix some signedness bugs
  usb: cdns3: Variable 'length' set but not used

4 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Sun, 1 Nov 2020 17:43:32 +0000 (09:43 -0800)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "ARM:
   - selftest fix
   - force PTE mapping on device pages provided via VFIO
   - fix detection of cacheable mapping at S2
   - fallback to PMD/PTE mappings for composite huge pages
   - fix accounting of Stage-2 PGD allocation
   - fix AArch32 handling of some of the debug registers
   - simplify host HYP entry
   - fix stray pointer conversion on nVHE TLB invalidation
   - fix initialization of the nVHE code
   - simplify handling of capabilities exposed to HYP
   - nuke VCPUs caught using a forbidden AArch32 EL0

  x86:
   - new nested virtualization selftest
   - miscellaneous fixes
   - make W=1 fixes
   - reserve new CPUID bit in the KVM leaves"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: vmx: remove unused variable
  KVM: selftests: Don't require THP to run tests
  KVM: VMX: eVMCS: make evmcs_sanitize_exec_ctrls() work again
  KVM: selftests: test behavior of unmapped L2 APIC-access address
  KVM: x86: Fix NULL dereference at kvm_msr_ignored_check()
  KVM: x86: replace static const variables with macros
  KVM: arm64: Handle Asymmetric AArch32 systems
  arm64: cpufeature: upgrade hyp caps to final
  arm64: cpufeature: reorder cpus_have_{const, final}_cap()
  KVM: arm64: Factor out is_{vhe,nvhe}_hyp_code()
  KVM: arm64: Force PTE mapping on fault resulting in a device mapping
  KVM: arm64: Use fallback mapping sizes for contiguous huge page sizes
  KVM: arm64: Fix masks in stage2_pte_cacheable()
  KVM: arm64: Fix AArch32 handling of DBGD{CCINT,SCRext} and DBGVCR
  KVM: arm64: Allocate stage-2 pgd pages with GFP_KERNEL_ACCOUNT
  KVM: arm64: Drop useless PAN setting on host EL1 to EL2 transition
  KVM: arm64: Remove leftover kern_hyp_va() in nVHE TLB invalidation
  KVM: arm64: Don't corrupt tpidr_el2 on failed HVC call
  x86/kvm: Reserve KVM_FEATURE_MSI_EXT_DEST_ID

4 years agoMerge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Linus Torvalds [Sat, 31 Oct 2020 21:41:48 +0000 (14:41 -0700)]
Merge tag 'for_linus' of git://git./linux/kernel/git/mst/vhost

Pull vhost fixes from Michael Tsirkin:
 "Fixes all over the place.

  A new UAPI is borderline: can also be considered a new feature but
  also seems to be the only way we could come up with to fix addressing
  for userspace - and it seems important to switch to it now before
  userspace making assumptions about addressing ability of devices is
  set in stone"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vdpasim: allow to assign a MAC address
  vdpasim: fix MAC address configuration
  vdpa: handle irq bypass register failure case
  vdpa_sim: Fix DMA mask
  Revert "vhost-vdpa: fix page pinning leakage in error path"
  vdpa/mlx5: Fix error return in map_direct_mr()
  vhost_vdpa: Return -EFAULT if copy_from_user() fails
  vdpa_sim: implement get_iova_range()
  vhost: vdpa: report iova range
  vdpa: introduce config op to get valid iova range

4 years agoMerge tag 'flexible-array-conversions-5.10-rc2' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Sat, 31 Oct 2020 21:31:28 +0000 (14:31 -0700)]
Merge tag 'flexible-array-conversions-5.10-rc2' of git://git./linux/kernel/git/gustavoars/linux

Pull more flexible-array member conversions from Gustavo A. R. Silva:
 "Replace zero-length arrays with flexible-array members"

* tag 'flexible-array-conversions-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
  printk: ringbuffer: Replace zero-length array with flexible-array member
  net/smc: Replace zero-length array with flexible-array member
  net/mlx5: Replace zero-length array with flexible-array member
  mei: hw: Replace zero-length array with flexible-array member
  gve: Replace zero-length array with flexible-array member
  Bluetooth: btintel: Replace zero-length array with flexible-array member
  scsi: target: tcmu: Replace zero-length array with flexible-array member
  ima: Replace zero-length array with flexible-array member
  enetc: Replace zero-length array with flexible-array member
  fs: Replace zero-length array with flexible-array member
  Bluetooth: Replace zero-length array with flexible-array member
  params: Replace zero-length array with flexible-array member
  tracepoint: Replace zero-length array with flexible-array member
  platform/chrome: cros_ec_proto: Replace zero-length array with flexible-array member
  platform/chrome: cros_ec_commands: Replace zero-length array with flexible-array member
  mailbox: zynqmp-ipi-message: Replace zero-length array with flexible-array member
  dmaengine: ti-cppi5: Replace zero-length array with flexible-array member

4 years agoMerge tag 'dma-mapping-5.10-2' of git://git.infradead.org/users/hch/dma-mapping
Linus Torvalds [Sat, 31 Oct 2020 19:25:58 +0000 (12:25 -0700)]
Merge tag 'dma-mapping-5.10-2' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping fix from Christoph Hellwig:
 "Fix an integer overflow on 32-bit platforms in the new DMA range code
  (Geert Uytterhoeven)"

* tag 'dma-mapping-5.10-2' of git://git.infradead.org/users/hch/dma-mapping:
  dma-mapping: fix 32-bit overflow with CONFIG_ARM_LPAE=n

4 years agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Sat, 31 Oct 2020 19:21:04 +0000 (12:21 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Four driver fixes and one core fix.

  The core fix closes a race window where we could kick off a second
  asynchronous scan because the test and set of the variable preventing
  it isn't atomic"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: hisi_sas: Stop using queue #0 always for v2 hw
  scsi: ibmvscsi: Fix potential race after loss of transport
  scsi: mptfusion: Fix null pointer dereferences in mptscsih_remove()
  scsi: qla2xxx: Return EBUSY on fcport deletion
  scsi: core: Don't start concurrent async scan on same host

4 years agoKVM: vmx: remove unused variable
Paolo Bonzini [Sat, 31 Oct 2020 15:38:09 +0000 (11:38 -0400)]
KVM: vmx: remove unused variable

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: selftests: Don't require THP to run tests
Andrew Jones [Thu, 29 Oct 2020 20:17:00 +0000 (21:17 +0100)]
KVM: selftests: Don't require THP to run tests

Unless we want to test with THP, then we shouldn't require it to be
configured by the host kernel. Unfortunately, even advising with
MADV_NOHUGEPAGE does require it, so check for THP first in order
to avoid madvise failing with EINVAL.

Signed-off-by: Andrew Jones <drjones@redhat.com>
Message-Id: <20201029201703.102716-2-drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: VMX: eVMCS: make evmcs_sanitize_exec_ctrls() work again
Vitaly Kuznetsov [Wed, 14 Oct 2020 14:33:46 +0000 (16:33 +0200)]
KVM: VMX: eVMCS: make evmcs_sanitize_exec_ctrls() work again

It was noticed that evmcs_sanitize_exec_ctrls() is not being executed
nowadays despite the code checking 'enable_evmcs' static key looking
correct. Turns out, static key magic doesn't work in '__init' section
(and it is unclear when things changed) but setup_vmcs_config() is called
only once per CPU so we don't really need it to. Switch to checking
'enlightened_vmcs' instead, it is supposed to be in sync with
'enable_evmcs'.

Opportunistically make evmcs_sanitize_exec_ctrls '__init' and drop unneeded
extra newline from it.

Reported-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20201014143346.2430936-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>