OSDN Git Service

tomoyo/tomoyo-test1.git
4 years agoperf tools: Propagate CFLAGS to libperf
Jiri Olsa [Fri, 11 Oct 2019 12:21:55 +0000 (14:21 +0200)]
perf tools: Propagate CFLAGS to libperf

Andi reported that 'make DEBUG=1' does not propagate to the libbperf
code. It's true also for the other flags. Changing the code to propagate
the global build flags to libperf compilation.

Reported-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191011122155.15738-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agolibperf: Adopt perf_evlist__filter_pollfd() from tools/perf
Jiri Olsa [Mon, 7 Oct 2019 12:53:34 +0000 (14:53 +0200)]
libperf: Adopt perf_evlist__filter_pollfd() from tools/perf

Introduce the perf_evlist__filter_pollfd function and export it in the
perf/evlist.h header, so that libperf users can check if the descriptor
is still alive.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191007125344.14268-27-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agolibperf: Introduce perf_evlist__purge()
Jiri Olsa [Mon, 7 Oct 2019 12:53:33 +0000 (14:53 +0200)]
libperf: Introduce perf_evlist__purge()

Add a static perf_evlist__purge() function to purge evsels from a evlist.

Add also perf_evlist__for_each_entry_safe() which is used by
perf_evlist__purge().

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191007125344.14268-26-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agolibperf: Introduce perf_evlist__exit()
Jiri Olsa [Mon, 7 Oct 2019 12:53:32 +0000 (14:53 +0200)]
libperf: Introduce perf_evlist__exit()

Add the perf_evlist__exit() function, so far it's not exported and added
only for internal use for perf and libperf.

USe it to release cpus/threads and pollfd array.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191007125344.14268-25-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agolibperf: Move the pollfd allocation from tools/perf to libperf
Jiri Olsa [Mon, 7 Oct 2019 12:53:31 +0000 (14:53 +0200)]
libperf: Move the pollfd allocation from tools/perf to libperf

It's needed in libperf only, so move it to the perf_evlist__mmap_ops()
function.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191007125344.14268-24-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agolibperf: Centralize map refcnt setting
Jiri Olsa [Mon, 7 Oct 2019 12:53:30 +0000 (14:53 +0200)]
libperf: Centralize map refcnt setting

Currently when a new map is mmapped we set its refcnt to 2 in the
perf_evlist_mmap_ops::mmap callback.

Every mmap gets its refcnt set to 2 when it's first mmaped:

  - 1 for the current user, which will be taken out by a call to
    perf_evlist__munmap_filtered(), where we find out there's
    no more data comming from kernel to this mmap.

  - 1 for the drain code where in perf_mmap__consume() the mmap
    is released if it is empty.

Move this common setup into libperf's generic code before the mmap
callback is called.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191007125344.14268-23-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf evlist: Switch to libperf's mmap interface
Jiri Olsa [Mon, 7 Oct 2019 12:53:29 +0000 (14:53 +0200)]
perf evlist: Switch to libperf's mmap interface

Switch to the libperf mmap interface by calling directly
perf_evlist__mmap_ops() and removing perf's evlist__mmap_per_*
functions.

By switching to libperf perf_evlist__mmap() we need to operate over
'struct perf_mmap' in evlist__add_pollfd, so make the related changes
there.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191007125344.14268-22-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf evlist: Introduce perf_evlist__mmap_cb_mmap()
Jiri Olsa [Mon, 7 Oct 2019 12:53:28 +0000 (14:53 +0200)]
perf evlist: Introduce perf_evlist__mmap_cb_mmap()

Add the perf_evlist__mmap_cb_mmap() function to call perf specific
mmap__mmap() function during perf_evlist__mmap_ops() call.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191007125344.14268-21-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf evlist: Introduce perf_evlist__mmap_cb_get()
Jiri Olsa [Mon, 7 Oct 2019 12:53:27 +0000 (14:53 +0200)]
perf evlist: Introduce perf_evlist__mmap_cb_get()

Add the perf_evlist__mmap_cb_get() function to return 'struct perf_mmap'
object during perf_evlist__mmap_ops() call.

The array of 'struct mmap' is allocated via evlist__alloc_mmap(), in
this callback we simply returns pointer to the base object.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191007125344.14268-20-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf tools: Introduce perf_evlist__mmap_cb_idx()
Jiri Olsa [Mon, 7 Oct 2019 12:53:26 +0000 (14:53 +0200)]
perf tools: Introduce perf_evlist__mmap_cb_idx()

Add perf_evlist__mmap_cb_idx function to call auxtrace_mmap_params__set_idx()
on each new index during perf_evlist__mmap_ops call.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191007125344.14268-19-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agolibperf: Introduce perf_evlist_mmap_ops::mmap callback
Jiri Olsa [Mon, 7 Oct 2019 12:53:25 +0000 (14:53 +0200)]
libperf: Introduce perf_evlist_mmap_ops::mmap callback

Add the perf_evlist_mmap_ops::mmap callback to be called in
mmap_per_evsel() to actually mmap the map.

Add libperf's perf_evlist__mmap_cb_mmap() function as libperf's mmap
callback.

New mmaped map gets refcount set to 2 in mmap__mmap(), we follow that in
mmap callback. We will move this to common place after we switch to
perf_evlist__mmap().

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191007125344.14268-18-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agolibperf: Add perf_evlist_mmap_ops::get callback
Jiri Olsa [Mon, 7 Oct 2019 12:53:24 +0000 (14:53 +0200)]
libperf: Add perf_evlist_mmap_ops::get callback

Add the perf_evlist_mmap_ops::get callback to be called in
mmap_per_evsel() to get/allocate the 'struct perf_mmap' object.

Add the libperf's perf_evlist__mmap_cb_get() function as libperf's get
callback.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191007125344.14268-17-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agolibperf: Introduce perf_evlist_mmap_ops::idx callback
Jiri Olsa [Mon, 7 Oct 2019 12:53:23 +0000 (14:53 +0200)]
libperf: Introduce perf_evlist_mmap_ops::idx callback

Add the perf_evlist_mmap_ops::idx callback to be called in
mmap_per_cpu() and mmap_per_thread() with current cpu and thread
indexes.

It's used by current aux code, so perf will use this callback to set the
aux index.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191007125344.14268-16-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agolibperf: Introduce perf_evlist__mmap_ops()
Jiri Olsa [Mon, 7 Oct 2019 12:53:22 +0000 (14:53 +0200)]
libperf: Introduce perf_evlist__mmap_ops()

To be able to pass specific callbacks to evlist's mmap.

There will be a specific call to this function from perf's
evlist__mmap() and libperf's perf_evlist__mmap() functions in following
changes.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191007125344.14268-15-jolsa@kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
4 years agolibperf: Adopt perf_evlist__mmap()/munmap() from tools/perf
Jiri Olsa [Mon, 7 Oct 2019 12:53:21 +0000 (14:53 +0200)]
libperf: Adopt perf_evlist__mmap()/munmap() from tools/perf

Add libperf's version of perf_evlist__mmap()/munmap() functions and
exporting them in the perf/evlist.h header.

It's the backbone of what we have in perf code. The following changes
will add needed callbacks and then we'll finally switch the perf code to
use libperf's version.

Add mmap/mmap_ovw 'struct perf_mmap' object arrays to hold maps for
libperf's evlist.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191007125344.14268-14-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agolibperf: Adopt perf_mmap__read_event() from tools/perf
Jiri Olsa [Mon, 7 Oct 2019 12:53:20 +0000 (14:53 +0200)]
libperf: Adopt perf_mmap__read_event() from tools/perf

Move perf_mmap__read_event() from tools/perf to libperf and export it in
the perf/mmap.h header.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191007125344.14268-13-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agolibperf: Adopt perf_mmap__read_done() from tools/perf
Jiri Olsa [Mon, 7 Oct 2019 12:53:19 +0000 (14:53 +0200)]
libperf: Adopt perf_mmap__read_done() from tools/perf

Move perf_mmap__read_init() from tools/perf to libperf and export it in
the perf/mmap.h header.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191007125344.14268-12-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agolibperf: Adopt perf_mmap__read_init() from tools/perf
Jiri Olsa [Mon, 7 Oct 2019 12:53:18 +0000 (14:53 +0200)]
libperf: Adopt perf_mmap__read_init() from tools/perf

Move perf_mmap__read_init() from tools/perf to libperf and export it in
perf/mmap.h header.

And add pr_debug2()/pr_debug3() macros support, because the code is
using them.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191007125344.14268-11-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agolibperf: Adopt perf_mmap__consume() function from tools/perf
Jiri Olsa [Mon, 7 Oct 2019 12:53:17 +0000 (14:53 +0200)]
libperf: Adopt perf_mmap__consume() function from tools/perf

Move perf_mmap__consume() vrom tools/perf to libperf and export it in
the perf/mmap.h header.

Move also the needed helpers perf_mmap__write_tail(),
perf_mmap__read_head() and perf_mmap__empty().

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191007125344.14268-10-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf tools: Use perf_mmap way to detect aux mmap
Jiri Olsa [Mon, 7 Oct 2019 12:53:16 +0000 (14:53 +0200)]
perf tools: Use perf_mmap way to detect aux mmap

We will move this code to libperf shortly, so we need to free it of
'struct auxtrace_mmap' usage, because it won't be available in libperf
(for now).

The perf_event_mmap_page::aux_size is set when the aux mmap is mapped,
so the check is equivalent.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191007125344.14268-9-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agolibperf: Adopt perf_mmap__put() function from tools/perf
Jiri Olsa [Mon, 7 Oct 2019 12:53:15 +0000 (14:53 +0200)]
libperf: Adopt perf_mmap__put() function from tools/perf

Move perf_mmap__put() from tools/perf to libperf.

Once perf_mmap__put() is moved, we need a way to call application
related unmap code (AIO and aux related code for eprf), when the map
goes away.

Add the perf_mmap::unmap callback to do that.

The unmap path from perf is:

  perf_mmap__put                           (libperf)
    perf_mmap__munmap                      (libperf)
      map->unmap_cb -> perf_mmap__unmap_cb (perf)
        mmap__munmap                       (perf)

Committer notes:

Add missing linux/kernel.h to tools/perf/lib/mmap.c to get the BUG_ON
definition.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191007125344.14268-8-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agolibperf: Adopt perf_mmap__unmap() function from tools/perf
Jiri Olsa [Mon, 7 Oct 2019 12:53:14 +0000 (14:53 +0200)]
libperf: Adopt perf_mmap__unmap() function from tools/perf

Move perf_mmap__unmap() from tools/perf to libperf, to internal header
internal/mmap.h. It will be used in the following patches. And rename
the existing perf's function to mmap__munmap().

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191007125344.14268-7-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agolibperf: Adopt perf_mmap__get() function from tools/perf
Jiri Olsa [Mon, 7 Oct 2019 12:53:13 +0000 (14:53 +0200)]
libperf: Adopt perf_mmap__get() function from tools/perf

Move perf_mmap__get() from tools/perf to libperf in the internal header
internal/mmap.h.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191007125344.14268-6-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agolibperf: Adopt perf_mmap__mmap() function from tools/perf
Jiri Olsa [Mon, 7 Oct 2019 12:53:12 +0000 (14:53 +0200)]
libperf: Adopt perf_mmap__mmap() function from tools/perf

Move perf_mmap__mmap() from tools/perf to libperf, it will be used in
the following patches. And rename the existing perf's function to
mmap__mmap().

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191007125344.14268-5-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agolibperf: Adopt perf_mmap__mmap_len() function from tools/perf
Jiri Olsa [Mon, 7 Oct 2019 12:53:11 +0000 (14:53 +0200)]
libperf: Adopt perf_mmap__mmap_len() function from tools/perf

Move perf_mmap__mmap_len() from tools/perf wto libperf, it will be used
in the following patches. And rename the existing perf's function to
mmap__mmap_len().

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191007125344.14268-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agolibperf: Add 'struct perf_mmap_param'
Jiri Olsa [Mon, 7 Oct 2019 12:53:10 +0000 (14:53 +0200)]
libperf: Add 'struct perf_mmap_param'

Add libperf's version of mmap params 'struct perf_mmap_param' object
with the basics: 'prot' and 'mask'.  Encapsulate it in the current
'struct mmap_params' object.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191007125344.14268-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agolibperf: Add perf_mmap__init() function
Jiri Olsa [Mon, 7 Oct 2019 12:53:09 +0000 (14:53 +0200)]
libperf: Add perf_mmap__init() function

Add perf_mmap__init() function to initialize 'struct perf_mmap' objects.

Add it to a new mmap.c source file, that will carry all the mmap related
functions.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191007125344.14268-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoMAINTAINERS: Add entry for perf tool arm64 pmu-events files
John Garry [Wed, 9 Oct 2019 08:54:33 +0000 (16:54 +0800)]
MAINTAINERS: Add entry for perf tool arm64 pmu-events files

Will and I have an interest in reviewing the pmu-events changes related
to arm64, so add a specific entry for this.

Signed-off-by: John Garry <john.garry@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: linuxarm@huawei.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Will Deacon <will@kernel.org>
Link: http://lore.kernel.org/lkml/1570611273-108281-1-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf tools: Avoid 'sample_reg_masks' being const + weak
Ian Rogers [Tue, 1 Oct 2019 00:36:23 +0000 (17:36 -0700)]
perf tools: Avoid 'sample_reg_masks' being const + weak

Being const + weak breaks with some compilers that constant-propagate
from the weak symbol. This behavior is outside of the specification, but
in LLVM is chosen to match GCC's behavior.

LLVM's implementation was set in this patch:

  https://github.com/llvm/llvm-project/commit/f49573d1eedcf1e44893d5a062ac1b72c8419646

A const + weak symbol is set to be weak_odr:

  https://llvm.org/docs/LangRef.html

ODR is one definition rule, and given there is one constant definition
constant-propagation is possible. It is possible to get this code to
miscompile with LLVM when applying link time optimization. As compilers
become more aggressive, this is likely to break in more instances.

Move the definition of sample_reg_masks to the conditional part of
perf_regs.h and guard usage with HAVE_PERF_REGS_SUPPORT. This avoids the
weak symbol.

Fix an issue when HAVE_PERF_REGS_SUPPORT isn't defined from patch v1.
In v3, add perf_regs.c for architectures that HAVE_PERF_REGS_SUPPORT but
don't declare sample_regs_masks.

Further notes:

Jiri asked:

  "Is this just a precaution or you actualy saw some breakage?"

Ian answered:

  "We saw a breakage with clang with thinlto enabled for linking. Our
   compiler team had recently seen, and were surprised by, a similar issue
   and were able to dig out the weak ODR issue."

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: clang-built-linux@googlegroups.com
Cc: Guo Ren <guoren@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: linux-riscv@lists.infradead.org
Cc: Mao Han <han_mao@c-sky.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20191001003623.255186-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf beauty: Introduce strtoul() for x86 MSRs
Arnaldo Carvalho de Melo [Wed, 9 Oct 2019 19:25:02 +0000 (16:25 -0300)]
perf beauty: Introduce strtoul() for x86 MSRs

Continuing from the previous cset comment, now that filter expression
works:

  # perf trace -e msr:* --filter="msr!=FS_BASE && msr != IA32_TSC_DEADLINE && msr != 0x830 && msr != 0x83f && msr !=IA32_SPEC_CTRL" --filter-pids 3750
     0.000 Timer/5033 msr:write_msr(msr: SYSCALL_MASK, val: 292608)
     0.009 Timer/5033 msr:write_msr(msr: LSTAR, val: -1398800368)
     0.010 Timer/5033 msr:write_msr(msr: TSC_AUX, val: 4)
     0.050 :0/0 msr:read_msr(msr: IA32_TSC_ADJUST)
    45.661 gnome-terminal/12595 msr:write_msr(msr: SYSCALL_MASK, val: 292608)
    45.672 gnome-terminal/12595 msr:write_msr(msr: LSTAR, val: -1398800368)
    45.675 gnome-terminal/12595 msr:write_msr(msr: TSC_AUX, val: 3)
    54.852 :0/0 msr:read_msr(msr: IA32_TSC_ADJUST)
   130.508 Timer/4050 msr:write_msr(msr: SYSCALL_MASK, val: 292608)
   130.527 Timer/4050 msr:write_msr(msr: LSTAR, val: -1398800368)
   130.531 Timer/4050 msr:write_msr(msr: TSC_AUX, val: 3)
   140.924 :0/0 msr:read_msr(msr: IA32_TSC_ADJUST)
   164.738 :0/0 msr:read_msr(msr: IA32_TSC_ADJUST)
   603.578 :0/0 msr:read_msr(msr: IA32_TSC_ADJUST)
   620.809 :0/0 msr:read_msr(msr: IA32_TSC_ADJUST)
   690.115 JS Watchdog/4259 msr:write_msr(msr: SYSCALL_MASK, val: 292608)
   690.136 JS Watchdog/4259 msr:write_msr(msr: LSTAR, val: -1398800368)
   690.141 JS Watchdog/4259 msr:write_msr(msr: TSC_AUX, val: 3)
   690.186 :0/0 msr:read_msr(msr: IA32_TSC_ADJUST)
   759.016 :0/0 msr:read_msr(msr: IA32_TSC_ADJUST)
^C[root@quaco ~]#

Or look at the first 3 write_msr events for that IA32_TSC_DEADLINE to learn why
it happens so often:

  # perf trace --max-events=3 --max-stack=8 -e msr:* --filter="msr==IA32_TSC_DEADLINE" --filter-pids 3750
     0.000 :0/0 msr:write_msr(msr: IA32_TSC_DEADLINE, val: 19296732550862)
                                       do_trace_write_msr ([kernel.kallsyms])
                                       do_trace_write_msr ([kernel.kallsyms])
                                       lapic_next_deadline ([kernel.kallsyms])
                                       clockevents_program_event ([kernel.kallsyms])
                                       hrtimer_interrupt ([kernel.kallsyms])
                                       smp_apic_timer_interrupt ([kernel.kallsyms])
                                       apic_timer_interrupt ([kernel.kallsyms])
                                       cpuidle_enter_state ([kernel.kallsyms])
    32.646 :0/0 msr:write_msr(msr: IA32_TSC_DEADLINE, val: 19296800134158)
                                       do_trace_write_msr ([kernel.kallsyms])
                                       do_trace_write_msr ([kernel.kallsyms])
                                       lapic_next_deadline ([kernel.kallsyms])
                                       clockevents_program_event ([kernel.kallsyms])
                                       hrtimer_start_range_ns ([kernel.kallsyms])
                                       tick_nohz_restart_sched_tick ([kernel.kallsyms])
                                       tick_nohz_idle_exit ([kernel.kallsyms])
                                       do_idle ([kernel.kallsyms])
    32.802 :0/0 msr:write_msr(msr: IA32_TSC_DEADLINE, val: 19297507436922)
                                       do_trace_write_msr ([kernel.kallsyms])
                                       do_trace_write_msr ([kernel.kallsyms])
                                       lapic_next_deadline ([kernel.kallsyms])
                                       clockevents_program_event ([kernel.kallsyms])
                                       hrtimer_try_to_cancel ([kernel.kallsyms])
                                       hrtimer_cancel ([kernel.kallsyms])
                                       tick_nohz_restart_sched_tick ([kernel.kallsyms])
                                       tick_nohz_idle_exit ([kernel.kallsyms])
  #

And if some of the strings can't be found:

  # trace -e msr:* --filter="msr!=SPECULATIVE_EXECUTION_PROBLEMS_SOLUTION && msr != IA32_TSC_DEADLINE && msr != 0x830 && msr != 0x83f && msr !=IA32_SPEC_CTRL" --filter-pids 3750
  "SPECULATIVE_EXECUTION_PROBLEMS_SOLUTION" not found for "msr" in "msr:read_msr", can't set filter "(msr!=SPECULATIVE_EXECUTION_PROBLEMS_SOLUTION && msr != IA32_TSC_DEADLINE && msr != 0x830 && msr != 0x83f && msr !=IA32_SPEC_CTRL) && (common_pid != 28131 && common_pid != 3750)"
  #

Next step is to automatically wire up the pre-existing strarrays, which there
are quite a few.

The strtoul() methods will be further enhanced to allow for looking at other
arguments in a syscall/tracepoint, just like going from integer to string
(scnprintf methods), so that those "val" lines for the msr tracepoints can be
properly formatted or even resolved into some string.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-4qaai5iqjgefd11k4ddm7qg8@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf trace: Expand strings in filters to integers
Arnaldo Carvalho de Melo [Wed, 9 Oct 2019 19:22:16 +0000 (16:22 -0300)]
perf trace: Expand strings in filters to integers

So that one can try things like:

  # perf trace -e msr:* --filter="msr!=FS_BASE && msr != IA32_TSC_DEADLINE && msr != 0x830 && msr != 0x83f && msr !=IA32_SPEC_CTRL" --filter-pids 3750

That, at this point in the patchset, without any strtoul in place for
tracepoint arguments, will result in:

  No resolver (strtoul) for "msr" in "msr:read_msr", can't set filter "(msr!=FS_BASE && msr != IA32_TSC_DEADLINE && msr != 0x830 && msr != 0x83f && msr !=IA32_SPEC_CTRL) && (common_pid != 25407 && common_pid != 3750)"
  #

See you in the next cset!

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-dx5j70fv2rgkeezd1cb3hv2p@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf trace: Introduce a strtoul() method for 'struct strarrays'
Arnaldo Carvalho de Melo [Wed, 9 Oct 2019 19:11:36 +0000 (16:11 -0300)]
perf trace: Introduce a strtoul() method for 'struct strarrays'

And also for 'struct strarray', since its needed to implement
strarrays__strtoul(). This just traverses the entries and when finding a
match, returns (offset + index), i.e. the value associated with the
searched string.

E.g. "EFER" (MSR_EFER) returns:

  # grep -w EFER -B2 /tmp/build/perf/trace/beauty/generated/x86_arch_MSRs_array.c
  #define x86_64_specific_MSRs_offset 0xc0000080
  static const char *x86_64_specific_MSRs[] = {
[0xc0000080 - x86_64_specific_MSRs_offset] = "EFER",
  #

  0xc0000080

This will be auto-attached to 'struct syscall_arg_fmt' entries
associated with strarrays as soon as we add a ->strarray and ->strarrays
to 'struct syscall_arg_fmt'.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-r2hpaahf8lishyb1owko9vs1@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf trace: Add a strtoul() method to 'struct syscall_arg_fmt'
Arnaldo Carvalho de Melo [Wed, 9 Oct 2019 19:06:43 +0000 (16:06 -0300)]
perf trace: Add a strtoul() method to 'struct syscall_arg_fmt'

This will go from a string to a number, so that filter expressions can
be constructed with strings and then, before applying the tracepoint
filters (or eBPF, in the future) we can map those strings to numbers.

The first one will be for 'msr' tracepoint arguments, but real quickly
we will be able to reuse all strarrays for that.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-wgqq48agcgr95b8dmn6fygtr@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf trace: Introduce --filter for tracepoint events
Arnaldo Carvalho de Melo [Tue, 8 Oct 2019 10:33:08 +0000 (07:33 -0300)]
perf trace: Introduce --filter for tracepoint events

Similar to what is in 'perf record', works just like there:

  # perf trace -e msr:*
   328.297 :0/0 msr:write_msr(msr: FS_BASE, val: 140240388381888)
   328.302 :0/0 msr:write_msr(msr: FS_BASE, val: 140240388381888)
   328.306 :0/0 msr:write_msr(msr: FS_BASE, val: 140240388381888)
   328.317 :0/0 msr:write_msr(msr: FS_BASE, val: 140240388381888)
   328.322 :0/0 msr:write_msr(msr: FS_BASE, val: 140240388381888)
   328.327 :0/0 msr:write_msr(msr: FS_BASE, val: 140240388381888)
   328.331 :0/0 msr:write_msr(msr: FS_BASE, val: 140240388381888)
   328.336 :0/0 msr:write_msr(msr: FS_BASE, val: 140240388381888)
   328.340 :0/0 ^Cmsr:write_msr(msr: FS_BASE, val: 140240388381888)
  #

So, for a system wide trace session looking at the write_msr tracepoint
we see a flood of MSR_FS_BASE, we need to get the number for that:

  # grep FS_BASE /tmp/build/perf/trace/beauty/generated/x86_arch_MSRs_array.c
[0xc0000100 - x86_64_specific_MSRs_offset] = "FS_BASE",
  #

And then use it in a filter:

  # perf trace -e msr:* --filter="msr!=0xc0000100"
  <SNIP>
   942.177 :0/0 msr:write_msr(msr: IA32_TSC_DEADLINE, val: 3056931068232)
   942.199 :0/0 msr:write_msr(msr: IA32_TSC_DEADLINE, val: 3057135655252)
   942.203 :0/0 msr:write_msr(msr: IA32_TSC_DEADLINE, val: 3056931068222)
   942.231 :0/0 msr:write_msr(msr: IA32_TSC_DEADLINE, val: 3056998373022)
   942.241 :0/0 msr:write_msr(msr: IA32_TSC_DEADLINE, val: 3056931068236)
  <SNIP>
  #

Ok, lets filter that too, too noisy:

  # grep TSC_DEADLINE /tmp/build/perf/trace/beauty/generated/x86_arch_MSRs_array.c
[0x000006E0] = "IA32_TSC_DEADLINE",
  #

  # perf trace -e msr:* --filter="msr!=0xc0000100 && msr!=0x6e0" -a sleep 0.1
     0.000 :0/0 msr:read_msr(msr: IA32_TSC_ADJUST)
     0.066 CPU 0/KVM/4895 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6)
     0.070 CPU 0/KVM/4895 msr:write_msr(msr: 0x830, val: 34359740667)
     0.099 CPU 0/KVM/4895 msr:read_msr(msr: IA32_SYSENTER_ESP, val: -2199021993472)
     0.100 CPU 0/KVM/4895 msr:read_msr(msr: IA32_APICBASE, val: 4276096000)
     0.101 CPU 0/KVM/4895 msr:read_msr(msr: IA32_DEBUGCTLMSR)
     0.109 :0/0 msr:write_msr(msr: IA32_SPEC_CTRL)
     1.000 :0/0 msr:write_msr(msr: 0x830, val: 17179871485)
    18.893 :0/0 msr:write_msr(msr: 0x83f, val: 246)
    28.810 :0/0 msr:write_msr(msr: 0x830, val: 68719479037)
    40.117 CPU 0/KVM/4895 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6)
    40.127 CPU 0/KVM/4895 msr:read_msr(msr: IA32_DEBUGCTLMSR)
    40.139 CPU 0/KVM/4895 msr:write_msr(msr: LSTAR, val: -2130661312)
    40.141 CPU 0/KVM/4895 msr:write_msr(msr: SYSCALL_MASK, val: 14080)
    40.142 CPU 0/KVM/4895 msr:write_msr(msr: TSC_AUX)
    40.144 CPU 0/KVM/4895 msr:write_msr(msr: KERNEL_GS_BASE)
    40.147 CPU 0/KVM/4895 msr:write_msr(msr: IA32_SPEC_CTRL)
    40.148 CPU 0/KVM/4895 msr:write_msr(msr: IA32_FLUSH_CMD, val: 1)
    40.151 CPU 0/KVM/4895 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6)
  ^C
  #

One can combine that with filtering pids as well:

  # perf trace -e msr:* --filter="msr!=0xc0000100 && msr!=0x6e0" --filter-pids 4895 -a sleep 0.09
     0.000 :0/0 msr:write_msr(msr: 0x830, val: 4294969597)
     0.291 gnome-terminal/2790 msr:write_msr(msr: SYSCALL_MASK, val: 292608)
     0.294 gnome-terminal/2790 msr:write_msr(msr: LSTAR, val: -1935671280)
     0.295 gnome-terminal/2790 msr:write_msr(msr: TSC_AUX, val: 6)
    10.940 gnome-terminal/2790 msr:write_msr(msr: 0x830, val: 4294969597)
    15.943 gnome-shell/2096 msr:write_msr(msr: 0x830, val: 4294969597)
    16.975 :0/0 msr:write_msr(msr: 0x830, val: 4294969597)
    19.560 :0/0 msr:write_msr(msr: 0x83f, val: 246)
    25.162 :0/0 msr:read_msr(msr: IA32_TSC_ADJUST)
    25.807 JS Watchdog/3635 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6)
    25.820 :0/0 msr:write_msr(msr: IA32_SPEC_CTRL)
    25.941 gnome-terminal/2790 msr:write_msr(msr: 0x830, val: 4294969597)
    26.941 gnome-terminal/2790 msr:write_msr(msr: 0x830, val: 4294969597)
    29.942 gnome-terminal/2790 msr:write_msr(msr: 0x830, val: 4294969597)
    45.313 :0/0 msr:write_msr(msr: 0x83f, val: 246)
    56.945 gnome-terminal/2790 msr:write_msr(msr: 0x830, val: 4294969597)
    60.946 gnome-terminal/2790 msr:write_msr(msr: 0x830, val: 4294969597)
    74.096 JS Watchdog/8971 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6)
    74.130 :0/0 msr:write_msr(msr: IA32_SPEC_CTRL)
    79.673 :0/0 msr:write_msr(msr: 0x83f, val: 246)
    79.947 gnome-terminal/2790 msr:write_msr(msr: 0x830, val: 17179871485)
  #

Or for just a pid, with callchains:

  # grep SYSCALL_MAS /tmp/build/perf/trace/beauty/generated/x86_arch_MSRs_array.c
[0xc0000084 - x86_64_specific_MSRs_offset] = "SYSCALL_MASK",
  # perf trace -e msr:* --filter="msr==0xc0000084" --pid 2790 --call-graph=dwarf

     0.000 gnome-terminal/2790 msr:write_msr(msr: SYSCALL_MASK, val: 292608)
                                       do_trace_write_msr ([kernel.kallsyms])
                                       do_trace_write_msr ([kernel.kallsyms])
                                       kvm_on_user_return ([kvm])
                                       fire_user_return_notifiers ([kernel.kallsyms])
                                       exit_to_usermode_loop ([kernel.kallsyms])
                                       do_syscall_64 ([kernel.kallsyms])
                                       entry_SYSCALL_64 ([kernel.kallsyms])
                                       __GI___poll (inlined)
  9299.073 gnome-terminal/2790 msr:write_msr(msr: SYSCALL_MASK, val: 292608)
                                       do_trace_write_msr ([kernel.kallsyms])
                                       do_trace_write_msr ([kernel.kallsyms])
                                       kvm_on_user_return ([kvm])
                                       fire_user_return_notifiers ([kernel.kallsyms])
                                       exit_to_usermode_loop ([kernel.kallsyms])
                                       do_syscall_64 ([kernel.kallsyms])
                                       entry_SYSCALL_64 ([kernel.kallsyms])
                                       __GI___poll (inlined)
  9348.374 gnome-terminal/2790 msr:write_msr(msr: SYSCALL_MASK, val: 292608)
                                       do_trace_write_msr ([kernel.kallsyms])
                                       do_trace_write_msr ([kernel.kallsyms])
                                       kvm_on_user_return ([kvm])
                                       fire_user_return_notifiers ([kernel.kallsyms])
                                       exit_to_usermode_loop ([kernel.kallsyms])
                                       do_syscall_64 ([kernel.kallsyms])
                                       entry_SYSCALL_64 ([kernel.kallsyms])
                                       __GI___poll (inlined)
  <SNIP>
  #

Ok, just another form of KVM to emit MSRs :-)

Next step: elliminate those greps by getting the filter expression,
looking for arg names, then for the arrays associated with it to do a
reverse lookup.

Also allow those filters to be associated with strace-like syscall
names.

After that: augment the 'val' arg for 'msr:write_msr' based on the first
arg, 'msr'.

Then, do that with eBPF too, not just with tracepoint filters.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-95bfe5d4tzy5f66bx49d05rj@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf evlist: Introduce append_tp_filter_pid() and append_tp_filter_pids()
Arnaldo Carvalho de Melo [Mon, 7 Oct 2019 20:00:34 +0000 (17:00 -0300)]
perf evlist: Introduce append_tp_filter_pid() and append_tp_filter_pids()

We'll need this to support 'perf trace e tracepoint --filter=expr', as
the command line tracepoint filter is attache to the preceding evsel,
just like in 'perf record' and when we go to set pid filters, which we
do at the minimum to filter 'perf trace' own syscalls, we need to
append, not set the tp filter.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-daynpknni44ywuzi8iua57nn@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf evlist: Introduce append_tp_filter() method
Arnaldo Carvalho de Melo [Mon, 7 Oct 2019 19:52:17 +0000 (16:52 -0300)]
perf evlist: Introduce append_tp_filter() method

Will be used by 'perf trace' to support 'perf trace --filter', we need
to append to any pre-existing filter.

When parse_filter() gets invoked to process --filter, it'll set the
filter to that specified on the command line, later on, when we filter
out 'perf trace' own pid to avoid an event feedback loop, we need to
preserve the command line filter put in place by parse_filter().

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-h9rot08qmxlnfmte0holt68x@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf evlist: Factor out asprintf routine to build a tracepoint pid filter
Arnaldo Carvalho de Melo [Mon, 7 Oct 2019 19:43:03 +0000 (16:43 -0300)]
perf evlist: Factor out asprintf routine to build a tracepoint pid filter

Will be used to append such lists to existing filters.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-798vlyqfqw938ehoe8etivx1@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf trace: Associate the "msr" tracepoint arg name with x86_MSR__scnprintf()
Arnaldo Carvalho de Melo [Mon, 7 Oct 2019 18:54:51 +0000 (15:54 -0300)]
perf trace: Associate the "msr" tracepoint arg name with x86_MSR__scnprintf()

So that we can go from:

  # perf trace -e msr:write_msr --max-stack=16 sleep 1
       0.000 sleep/6740 msr:write_msr(msr: 3221225728, val: 139636317451648)
                                         do_trace_write_msr ([kernel.kallsyms])
                                         do_trace_write_msr ([kernel.kallsyms])
                                         do_arch_prctl_64 ([kernel.kallsyms])
                                         __x64_sys_arch_prctl ([kernel.kallsyms])
                                         do_syscall_64 ([kernel.kallsyms])
                                         entry_SYSCALL_64 ([kernel.kallsyms])
                                         init_tls (/usr/lib64/ld-2.29.so)
                                         dl_main (/usr/lib64/ld-2.29.so)
                                         _dl_sysdep_start (/usr/lib64/ld-2.29.so)
                                         _dl_start (/usr/lib64/ld-2.29.so)
  #

To:

  # perf trace -e msr:write_msr --max-stack=16 sleep 1
     0.000 sleep/8519 msr:write_msr(msr: FS_BASE, val: 139878031705472)
                                       do_trace_write_msr ([kernel.kallsyms])
                                       do_trace_write_msr ([kernel.kallsyms])
                                       do_arch_prctl_64 ([kernel.kallsyms])
                                       __x64_sys_arch_prctl ([kernel.kallsyms])
                                       do_syscall_64 ([kernel.kallsyms])
                                       entry_SYSCALL_64 ([kernel.kallsyms])
                                       init_tls (/usr/lib64/ld-2.29.so)
                                       dl_main (/usr/lib64/ld-2.29.so)
                                       _dl_sysdep_start (/usr/lib64/ld-2.29.so)
                                       _dl_start (/usr/lib64/ld-2.29.so)
  #

This, in reverse, will allow for symbolic system call/tracepoint
filtering.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-q1q4unmqja5ex7dy0kb5cjaa@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf trace beauty: Add the glue for the autogenerated MSR arrays
Arnaldo Carvalho de Melo [Mon, 7 Oct 2019 18:52:19 +0000 (15:52 -0300)]
perf trace beauty: Add the glue for the autogenerated MSR arrays

We need to wrap those autogenerated string arrays with the
strarrays__scnprintf() formatter, do it.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-wqjz4kwi4a0ot6lsis3kc65j@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf trace: Allow associating scnprintf routines with well known arg names
Arnaldo Carvalho de Melo [Mon, 7 Oct 2019 18:50:15 +0000 (15:50 -0300)]
perf trace: Allow associating scnprintf routines with well known arg names

For instance 'msr' appears in several tracepoints, so we can associate
it with a single scnprintf() routine auto-generated from kernel headers,
as will be done in followup patches.

Start with an empty array of associations.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-89ptht6s5fez82lykuwq1eyb@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf beauty: Hook up the x86 MSR table generator
Arnaldo Carvalho de Melo [Thu, 26 Sep 2019 18:47:16 +0000 (15:47 -0300)]
perf beauty: Hook up the x86 MSR table generator

This way we generate the source with the table for later use by plugins,
etc.

I.e. after running:

  $ make -C tools/perf O=/tmp/build/perf

We end up with:

  $ head /tmp/build/perf/trace/beauty/generated/x86_arch_MSRs_array.c
  static const char *x86_MSRs[] = {
   [0x00000000] = "IA32_P5_MC_ADDR",
   [0x00000001] = "IA32_P5_MC_TYPE",
   [0x00000010] = "IA32_TSC",
   [0x00000017] = "IA32_PLATFORM_ID",
   [0x0000001b] = "IA32_APICBASE",
   [0x00000020] = "KNC_PERFCTR0",
   [0x00000021] = "KNC_PERFCTR1",
   [0x00000028] = "KNC_EVNTSEL0",
   [0x00000029] = "KNC_EVNTSEL1",
  $

Now its just a matter of using it, first in a libtracevent plugin.

At some point we should move tools/perf/trace/beauty to tools/beauty/,
so that it can be used more generally and even made available externally
like libbpf, libperf, libtraevent, etc.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-b3rmutg4igcohx6kpo67qh4j@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf trace beauty: Add a x86 MSR cmd id->str table generator
Arnaldo Carvalho de Melo [Thu, 26 Sep 2019 18:28:02 +0000 (15:28 -0300)]
perf trace beauty: Add a x86 MSR cmd id->str table generator

Without parameters it'll parse tools/arch/x86/include/asm/msr-index.h
and output a table usable by tools, that will be wired up later to a
libtraceevent plugin registered from perf's glue code:

  $ tools/perf/trace/beauty/tracepoints/x86_msr.sh
  static const char *x86_MSRs[] = {
 <SNIP>
   [0x00000034] = "SMI_COUNT",
   [0x0000003a] = "IA32_FEATURE_CONTROL",
   [0x0000003b] = "IA32_TSC_ADJUST",
   [0x00000040] = "LBR_CORE_FROM",
   [0x00000048] = "IA32_SPEC_CTRL",
   [0x00000049] = "IA32_PRED_CMD",
 <SNIP>
   [0x0000010b] = "IA32_FLUSH_CMD",
   [0x0000010F] = "TSX_FORCE_ABORT",
 <SNIP>
   [0x00000198] = "IA32_PERF_STATUS",
   [0x00000199] = "IA32_PERF_CTL",
  <SNIP>
   [0x00000da0] = "IA32_XSS",
   [0x00000dc0] = "LBR_INFO_0",
   [0x00000ffc] = "IA32_BNDCFGS_RSVD",
  };

  #define x86_64_specific_MSRs_offset 0xc0000080
  static const char *x86_64_specific_MSRs[] = {
   [0xc0000080 - x86_64_specific_MSRs_offset] = "EFER",
   [0xc0000081 - x86_64_specific_MSRs_offset] = "STAR",
   [0xc0000082 - x86_64_specific_MSRs_offset] = "LSTAR",
   [0xc0000083 - x86_64_specific_MSRs_offset] = "CSTAR",
   [0xc0000084 - x86_64_specific_MSRs_offset] = "SYSCALL_MASK",
  <SNIP>
   [0xc0000103 - x86_64_specific_MSRs_offset] = "TSC_AUX",
   [0xc0000104 - x86_64_specific_MSRs_offset] = "AMD64_TSC_RATIO",
  };

  #define x86_AMD_V_KVM_MSRs_offset 0xc0010000
  static const char *x86_AMD_V_KVM_MSRs[] = {
   [0xc0010000 - x86_AMD_V_KVM_MSRs_offset] = "K7_EVNTSEL0",
  <SNIP>
   [0xc0010114 - x86_AMD_V_KVM_MSRs_offset] = "VM_CR",
   [0xc0010115 - x86_AMD_V_KVM_MSRs_offset] = "VM_IGNNE",
   [0xc0010117 - x86_AMD_V_KVM_MSRs_offset] = "VM_HSAVE_PA",
  <SNIP>
   [0xc0010240 - x86_AMD_V_KVM_MSRs_offset] = "F15H_NB_PERF_CTL",
   [0xc0010241 - x86_AMD_V_KVM_MSRs_offset] = "F15H_NB_PERF_CTR",
   [0xc0010280 - x86_AMD_V_KVM_MSRs_offset] = "F15H_PTSC",
  };

Then these will in turn be hooked up in a follow up patch to be used by
strarrays__scnprintf().

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-ja080xawx08kedez855usnon@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf beauty: Make strarray's offset be u64
Arnaldo Carvalho de Melo [Wed, 9 Oct 2019 14:22:43 +0000 (11:22 -0300)]
perf beauty: Make strarray's offset be u64

We need it for things like MSRs that are sparse and go over MAXINT.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-g8t2d0jr0mg3yimg2qrjkvlt@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agotools arch x86: Grab a copy of the file containing the MSR numbers
Arnaldo Carvalho de Melo [Thu, 26 Sep 2019 18:26:39 +0000 (15:26 -0300)]
tools arch x86: Grab a copy of the file containing the MSR numbers

We'll use it to generate a table and then convert the
msr:{read,write}_msr 'msr' option in things like perf trace, script,
etc.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-y1f4s0y1s43d4drh7pd2huzn@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf trace: Allow choosing how to augment the tracepoint arguments
Arnaldo Carvalho de Melo [Fri, 4 Oct 2019 18:28:13 +0000 (15:28 -0300)]
perf trace: Allow choosing how to augment the tracepoint arguments

So far we used the libtraceevent printing routines when showing
tracepoint arguments, but since 'perf trace' has a lot of beautifiers
for syscall arguments, and since some of those can be used to augment
tracepoint arguments, add a routine to make use of those beautifiers
and allow the user to choose which one to use.

The default now is to use the same beautifiers used for the strace-like
sys_enter+sys_exit lines, but the user can choose the libtraceevent ones
by either using the:

    perf trace --libtraceevent_print

command line option, or by setting:

  # cat ~/.perfconfig
  [trace]
tracepoint_beautifiers = libtraceevent

For instance, here are some examples:

  # perf trace -e sched:*switch,*sleep,sched:*wakeup,exit*,sched:*exit sleep 1
       0.000 sched:sched_wakeup(comm: "perf", pid: 5273 (perf), prio: 120, success: 1, target_cpu: 6)
       0.621 nanosleep(rqtp: 0x7ffdd06d1140, rmtp: NULL) ...
       0.628 sched:sched_switch(prev_comm: "sleep", prev_pid: 5273 (sleep), prev_prio: 120, prev_state: 1, next_comm: "swapper/6", next_pid: 0, next_prio: 120)
    1000.879 sched:sched_wakeup(comm: "sleep", pid: 5273 (sleep), prio: 120, success: 1, target_cpu: 6)
       0.621  ... [continued]: nanosleep())          = 0
    1001.026 exit_group(error_code: 0)               = ?
    1001.216 sched:sched_process_exit(comm: "sleep", pid: 5273 (sleep), prio: 120)
  #

And then using libtraceevent, as before:

  # perf trace --libtraceevent_print -e sched:*switch,*sleep,sched:*wakeup,exit*,sched:*exit sleep 1
       0.000 sched:sched_wakeup(comm=perf pid=5288 prio=120 target_cpu=001)
       0.739 nanosleep(rqtp: 0x7ffeba6c2f40, rmtp: NULL) ...
       0.747 sched:sched_switch(prev_comm=sleep prev_pid=5288 prev_prio=120 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=120)
    1000.902 sched:sched_wakeup(comm=sleep pid=5288 prio=120 target_cpu=001)
       0.739  ... [continued]: nanosleep())          = 0
    1001.012 exit_group(error_code: 0)               = ?
  #

The new default allocates an array of 'struct syscall_arg_fmt' for the
tracepoint arguments and, just like with syscall arguments, tries to
find suitable syscall_arg__scnprintf_NAME() routines to augment those
tracepoint arguments based on their type (as in the tracefs "format"
file), or even in their name + type, for instance arguntents with names
ending in "fd" with type "int" get the fd scnprintf beautifier attached,
etc.

Soon this will take advantage of the kernel BTF information to augment
enumerations based on the tracefs "format" type info.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-o8qdluotkcb3b1x2gjqrejcl@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf trace: Enclose all events argument lists with ()
Arnaldo Carvalho de Melo [Fri, 4 Oct 2019 18:01:30 +0000 (15:01 -0300)]
perf trace: Enclose all events argument lists with ()

So that they look a bit like normal strace-like syscall enter+exit
lines.

They will look even more when we switch from using libtraceevent's
tep_print_event() routine in favour of using all the perf beautifiers
used by the strace-like syscall enter+exit lines.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-y4fcej6v6u1m644nbxd2r4pg@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf trace: Add array of chars scnprintf beautifier
Arnaldo Carvalho de Melo [Fri, 4 Oct 2019 17:56:40 +0000 (14:56 -0300)]
perf trace: Add array of chars scnprintf beautifier

Needed for sched's traceoints prev/next comm, where, unlike with
syscalls, we are not dealing with an integer or pointer, but an array
straight out from the ring buffer.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-rlll7tmcqe1g4odtaifil5re@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf trace: Add the syscall_arg_fmt pointer to syscall_arg
Arnaldo Carvalho de Melo [Fri, 4 Oct 2019 17:52:30 +0000 (14:52 -0300)]
perf trace: Add the syscall_arg_fmt pointer to syscall_arg

So that the scnprintf beautifiers can access it, as will be the case
with the char array one in the following csets, that needs to know
the number of elements in an array.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-01qmjqv6cb1nj1qy4khdexce@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf trace: Move some scnprintf methods from syscall to syscall_arg_fmt
Arnaldo Carvalho de Melo [Fri, 4 Oct 2019 14:30:41 +0000 (11:30 -0300)]
perf trace: Move some scnprintf methods from syscall to syscall_arg_fmt

Since all they operate on is on a syscall_arg_fmt instance, so move them
to allow use it from the upcoming tracepoint fprintf routine.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-ynttrs1l75f0x9tk67spd7jd@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf trace: Allocate an array of beautifiers for tracepoint args
Arnaldo Carvalho de Melo [Thu, 3 Oct 2019 19:18:22 +0000 (16:18 -0300)]
perf trace: Allocate an array of beautifiers for tracepoint args

This will work similar to the syscall args, we'll allocate an array
of 'struct syscall_arg_fmt' for the tracepoint args and then init them
using the same algorithm used for the defaults for syscall args, i.e.
using its types and sometimes names as hints to find the right scnprintf
routine to beautify them from numbers into strings.

Next step is to stop using libtracevent to printf tracepoints, as we'll
have more beautifiers than int provides, modulo perhaps some plugins.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-dcl135relxvf6ljisjg13aqg@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf trace: Factor out the initialization of syscal_arg_fmt->scnprintf
Arnaldo Carvalho de Melo [Thu, 3 Oct 2019 18:57:42 +0000 (15:57 -0300)]
perf trace: Factor out the initialization of syscal_arg_fmt->scnprintf

We set the default scnprint routines for the syscall args based on its
type or on heuristics based on its names, now we'll use this for
tracepoints as well, so move it out of syscall__set_arg_fmts() and into
a routine that receive just an array of syscall_arg_fmt entries + the
tracepoint format fields list.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-xs3x0zzyes06c7scdsjn01ty@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf script: Allow --time with --reltime
Andi Kleen [Wed, 2 Oct 2019 16:46:42 +0000 (09:46 -0700)]
perf script: Allow --time with --reltime

The original --reltime patch forbid --time with --reltime.

But it turns out --time doesn't really care about --reltime, because the
relative time is only used at final output, while the time filtering
always works earlier on absolute time.

So just remove the check and allow combining the two options.

Fixes: 90b10f47c0ee ("perf script: Support relative time")
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lore.kernel.org/lkml/20191002164642.1719-1-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agosamples/bpf: fix build by setting HAVE_ATTR_TEST to zero
Björn Töpel [Tue, 1 Oct 2019 11:33:07 +0000 (13:33 +0200)]
samples/bpf: fix build by setting HAVE_ATTR_TEST to zero

To remove that test_attr__{enabled/open} are used by perf-sys.h, we
set HAVE_ATTR_TEST to zero.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: KP Singh <kpsingh@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org
Link: http://lore.kernel.org/lkml/20191001113307.27796-3-bjorn.topel@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf tools: Make usage of test_attr__* optional for perf-sys.h
Björn Töpel [Tue, 1 Oct 2019 11:33:06 +0000 (13:33 +0200)]
perf tools: Make usage of test_attr__* optional for perf-sys.h

For users of perf-sys.h outside perf, e.g. samples/bpf/bpf_load.c, it's
convenient not to depend on test_attr__*.

After commit 91854f9a077e ("perf tools: Move everything related to
sys_perf_event_open() to perf-sys.h"), all users of perf-sys.h will
depend on test_attr__enabled and test_attr__open.

This commit enables a user to define HAVE_ATTR_TEST to zero in order
to omit the test dependency.

Fixes: 91854f9a077e ("perf tools: Move everything related to sys_perf_event_open() to perf-sys.h")
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org
Link: http://lore.kernel.org/lkml/20191001113307.27796-2-bjorn.topel@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf scripts python: exported-sql-viewer.py: Add Time chart by CPU
Adrian Hunter [Wed, 21 Aug 2019 08:32:16 +0000 (11:32 +0300)]
perf scripts python: exported-sql-viewer.py: Add Time chart by CPU

Add a time chart based on context switch information.

Context switch information was added to the database export fairly
recently, so the chart menu option will only appear if context switch
information is in the database.

Refer to the Exported SQL Viewer Help option for more information about
the chart.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lore.kernel.org/lkml/20190821083216.1340-7-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf scripts python: exported-sql-viewer.py: Add ability for Call tree to open at...
Adrian Hunter [Wed, 21 Aug 2019 08:32:15 +0000 (11:32 +0300)]
perf scripts python: exported-sql-viewer.py: Add ability for Call tree to open at a specified task and time

Add ability for Call tree to open at a specified task and time.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lore.kernel.org/lkml/20190821083216.1340-6-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf scripts python: exported-sql-viewer.py: Tidy up Call tree call_time
Adrian Hunter [Wed, 21 Aug 2019 08:32:14 +0000 (11:32 +0300)]
perf scripts python: exported-sql-viewer.py: Tidy up Call tree call_time

Record call_time on tree nodes and re-name the misnamed "count" parameter.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lore.kernel.org/lkml/20190821083216.1340-5-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf scripts python: exported-sql-viewer.py: Add global time range calculations
Adrian Hunter [Wed, 21 Aug 2019 08:32:13 +0000 (11:32 +0300)]
perf scripts python: exported-sql-viewer.py: Add global time range calculations

Add calculations to determine a time range that encompasses all data.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lore.kernel.org/lkml/20190821083216.1340-4-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf scripts python: exported-sql-viewer.py: Add HBoxLayout and VBoxLayout
Adrian Hunter [Wed, 21 Aug 2019 08:32:12 +0000 (11:32 +0300)]
perf scripts python: exported-sql-viewer.py: Add HBoxLayout and VBoxLayout

Add layout classes HBoxLayout and VBoxLayout.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lore.kernel.org/lkml/20190821083216.1340-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf scripts python: exported-sql-viewer.py: Add LookupModel()
Adrian Hunter [Wed, 21 Aug 2019 08:32:11 +0000 (11:32 +0300)]
perf scripts python: exported-sql-viewer.py: Add LookupModel()

Add LookupModel() to find a model in the model cache without creating it.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lore.kernel.org/lkml/20190821083216.1340-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf trace augmented_syscalls: Do not show syscalls when none was asked for
Arnaldo Carvalho de Melo [Wed, 2 Oct 2019 15:54:07 +0000 (12:54 -0300)]
perf trace augmented_syscalls: Do not show syscalls when none was asked for

When not using augmented syscalls, i.e. not passing thru the command
line a eBPF source or object file event that provides the
__augmented_syscalls__ BPF_MAP_TYPE_PERF_EVENT_ARRAY, etc, as with:

   perf trace -e tools/perf/examples/bpf/augmented_raw_syscalls.c

or passing that augmented eBPF source/object via the trace.add_events in
.perfconfig file, we were assuming that syscalls were asked for,
differing from when not using augmented syscalls at all.

This is confusing when using .perfconfig to hide the fact we're using
the augmenter, i.e. using:

 # perf trace -e sched:* sleep 1

Will show both the scheduler tracepoints and the syscalls, where what we
want is to show just the scheduler tracepoints.

To see the scheduler tracepoints and some specific syscall strace-like
formatting, one has to use:

  # perf trace -e sched:*,nanosleep sleep 1

Or, if wanting all the syscalls:

  # perf trace -e sched:* --syscalls sleep 1

This way 'perf trace' can be used to trace just a set of tracepoints
while allowing for mixing with strace-like when desired, by simply
adding to the mix the name of the syscalls to show in addition to the
tracepoints.

Fix it so that the behaviour using the eBPF based syscall augmenter is
the same as when not using one.

Testing:

Before this patch, with this ~/.perfconfig:

  # egrep -B1 ^[[:space:]]+add_events ~/.perfconfig
  [trace]
   add_events = /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
  #

That points to this pre-compiled eBPF syscall augmenter:

  # file /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
  /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o: ELF 64-bit LSB relocatable, eBPF, version 1 (SYSV), with debug_info, not stripped

And when asking for _only_ sched:sched_switch and sched:sched_wakeup we
were unconditionally getting all the syscalls formatted strace-like:

  # perf trace -e sched:*switch,sched:*wakeup sleep 1 |& tail
     0.633 fstat(3, 0x7fe11d030ac0)                = 0
     0.635 mmap(NULL, 217750512, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fe10fec5000
     0.643 close(3)                                = 0
     0.668 nanosleep(0x7fff649a3a90, NULL)      ...
     0.672 sched:sched_switch:prev_comm=sleep prev_pid=4417 prev_prio=120 prev_state=S ==> next_comm=swapper/6 next_pid=0 next_prio=120
  1000.822 sched:sched_wakeup:comm=sleep pid=4417 prio=120 target_cpu=006
     0.668  ... [continued]: nanosleep())          = 0
  1000.923 close(1)                                = 0
  1000.941 close(2)                                = 0
  1000.974 exit_group(0)                           = ?
  #

After the patch:

  # perf trace -e sched:*switch,sched:*wakeup sleep 1
     0.000 sched:sched_wakeup:comm=perf pid=5529 prio=120 target_cpu=005
     1.186 sched:sched_switch:prev_comm=sleep prev_pid=5529 prev_prio=120 prev_state=S ==> next_comm=swapper/5 next_pid=0 next_prio=120
  1001.573 sched:sched_wakeup:comm=sleep pid=5529 prio=120 target_cpu=005
  #

If we add the "open*" syscalls to the mix then the eBPF augmented _will_
be used and these syscalls will be traced together with the specified
sched tracepoints:

  # cd /sys/kernel/debug/tracing/events/syscalls/
  # ls -1d sys_enter_open*
  sys_enter_open
  sys_enter_openat
  sys_enter_open_by_handle_at
  sys_enter_open_tree
  #

  # perf trace -e open*,sched:*switch,sched:*wakeup sleep 1
       0.000 sched:sched_wakeup:comm=perf pid=5580 prio=120 target_cpu=005
       0.590 openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
       0.616 openat(AT_FDCWD, "/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
       0.846 openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
       0.891 sched:sched_switch:prev_comm=sleep prev_pid=5580 prev_prio=120 prev_state=S ==> next_comm=swapper/5 next_pid=0 next_prio=120
    1001.005 sched:sched_wakeup:comm=sleep pid=5580 prio=120 target_cpu=005
  #

And as we can see, the pathnames were collected via the eBPF augmenters.

If we don't specify anything it'll trace all syscalls:

  # perf trace sleep 1 |& tail
       0.299 brk(0x5597543a3000)                     = 0x5597543a3000
       0.302 brk(NULL)                               = 0x5597543a3000
       0.307 openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
       0.313 fstat(3, 0x7feece50cac0)                = 0
       0.315 mmap(NULL, 217750512, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7feec13a1000
       0.323 close(3)                                = 0
       0.354 nanosleep(0x7ffe338856e0, NULL)         = 0
    1000.641 close(1)                                = 0
    1000.655 close(2)                                = 0
    1000.673 exit_group(0)                           = ?
  #

Ditto if we don't use .perfconfig's trace.add_events but instead pass
just the augmenter as a command line event:

  # vim ~/.perfconfig
  # egrep -B1 ^[[:space:]]+add_events ~/.perfconfig
  # perf trace -e /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o sleep 1 |& tail
       0.294 brk(0x55ae08ec3000)                     = 0x55ae08ec3000
       0.297 brk(NULL)                               = 0x55ae08ec3000
       0.302 openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
       0.309 fstat(3, 0x7f726488fac0)                = 0
       0.311 mmap(NULL, 217750512, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7257724000
       0.319 close(3)                                = 0
       0.347 nanosleep(0x7ffe23643a70, NULL)         = 0
    1000.560 close(1)                                = 0
    1000.575 close(2)                                = 0
    1000.593 exit_group(0)                           = ?
  #

As well as that + some syscall names for strace-like formatting:

  # perf trace -e socket,connect,/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o ssh localhost
       0.000 socket(PF_LOCAL, SOCK_STREAM|CLOEXEC|NONBLOCK, 0) = 3
       0.021 connect(3, { .family: PF_LOCAL, path: /var/run/nscd/socket }, 110) = -1 ENOENT (No such file or directory)
       0.034 socket(PF_LOCAL, SOCK_STREAM|CLOEXEC|NONBLOCK, 0) = 3
       0.041 connect(3, { .family: PF_LOCAL, path: /var/run/nscd/socket }, 110) = -1 ENOENT (No such file or directory)
       0.163 socket(PF_LOCAL, SOCK_STREAM, 0)        = 4
       0.185 connect(4, { .family: PF_LOCAL, path: /var/lib/sss/pipes/nss }, 110) = 0
       0.670 socket(PF_LOCAL, SOCK_STREAM|CLOEXEC|NONBLOCK, 0) = 7
       0.684 connect(7, { .family: PF_LOCAL, path: /var/run/nscd/socket }, 110) = -1 ENOENT (No such file or directory)
       0.694 socket(PF_LOCAL, SOCK_STREAM|CLOEXEC|NONBLOCK, 0) = 7
       0.701 connect(7, { .family: PF_LOCAL, path: /var/run/nscd/socket }, 110) = -1 ENOENT (No such file or directory)
       0.994 socket(PF_LOCAL, SOCK_STREAM|CLOEXEC|NONBLOCK, 0) = 5
       1.006 connect(5, { .family: PF_LOCAL, path: /var/run/nscd/socket }, 110) = -1 ENOENT (No such file or directory)
       1.014 socket(PF_LOCAL, SOCK_STREAM|CLOEXEC|NONBLOCK, 0) = 5
       1.022 connect(5, { .family: PF_LOCAL, path: /var/run/nscd/socket }, 110) = -1 ENOENT (No such file or directory)
       1.068 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 5
       1.087 connect(5, { .family: PF_INET, port: 22, addr: 127.0.0.1 }, 16) = 0
      24.299 socket(PF_LOCAL, SOCK_STREAM, 0)        = 6
      24.337 connect(6, { .family: PF_LOCAL, path: /var/run/.heim_org.h5l.kcm-socket }, 110) = 0
      28.441 socket(PF_LOCAL, SOCK_STREAM, 0)        = 6
      28.516 connect(6, { .family: PF_LOCAL, path: /var/run/.heim_org.h5l.kcm-socket }, 110) = 0
  root@localhost's password:^C
  #

Everything works without augmenters:

  # egrep -B1 ^[[:space:]]+add_events ~/.perfconfig
  # perf trace sleep 1 |& tail
       0.261 brk(0x5635068ac000)                     = 0x5635068ac000
       0.264 brk(NULL)                               = 0x5635068ac000
       0.268 openat(AT_FDCWD, 0xdce642a0, O_RDONLY|O_CLOEXEC) = 3
       0.275 fstat(3, 0x7f3fdce97ac0)                = 0
       0.277 mmap(NULL, 217750512, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f3fcfd2c000
       0.284 close(3)                                = 0
       0.310 nanosleep(0x7ffdaea6ecd0, NULL)         = 0
    1000.552 close(1)                                = 0
    1000.565 close(2)                                = 0
    1000.580 exit_group(0)                           = ?
  #

  # perf trace -e connect ssh localhost
       0.000 connect(3, 0x58266930, 110)             = -1 ENOENT (No such file or directory)
       0.022 connect(3, 0x58266af0, 110)             = -1 ENOENT (No such file or directory)
       0.150 connect(4, 0x58266b00, 110)             = 0
       0.490 connect(7, 0x58264150, 110)             = -1 ENOENT (No such file or directory)
       0.505 connect(7, 0x58264300, 110)             = -1 ENOENT (No such file or directory)
       0.832 connect(5, 0x58266220, 110)             = -1 ENOENT (No such file or directory)
       0.847 connect(5, 0x582663e0, 110)             = -1 ENOENT (No such file or directory)
       0.899 connect(5, 0x95ba0630, 16)              = 0
      25.619 connect(6, 0x58266360, 110)             = 0
      40.564 connect(6, 0x58266330, 110)             = 0
  root@localhost's password: ^C
  #

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-624f6jxic04031tnt40va4dd@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf trace: Postpone parsing .perfconfig trace.add_events to after --verbose is processed
Arnaldo Carvalho de Melo [Tue, 1 Oct 2019 18:44:44 +0000 (15:44 -0300)]
perf trace: Postpone parsing .perfconfig trace.add_events to after --verbose is processed

When we add events via the '[trace]' section in perfconfig the command
line options are not yet processed, so when something goes wrong with
parsing those events and using --verbose is advised, we end up not
getting any more verbosity by doing so.

So just copy the trace.add_events string for later processing, after we
processed --verbose and the other command line options.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-d6wbnz85ftqljdll6ynjyjd8@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf trace: Generalize the syscall_fmt find routines
Arnaldo Carvalho de Melo [Tue, 1 Oct 2019 18:27:55 +0000 (15:27 -0300)]
perf trace: Generalize the syscall_fmt find routines

To allow them to be used with other stuff, such as tracepoints.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-od3gzg77ppqgnnrxqv40fvgx@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf trace: Separate 'struct syscall_fmt' definition from syscall_fmts variable
Arnaldo Carvalho de Melo [Tue, 1 Oct 2019 18:16:33 +0000 (15:16 -0300)]
perf trace: Separate 'struct syscall_fmt' definition from syscall_fmts variable

As this has all the things needed to format tracepoints events, not just
syscalls, that, after all, are just tracepoints with a set in stone ABI,
i.e. order and number of parameters.

For tracepoints we'll create a

  static struct syscall_fmt tracepoint_fmts[]

array and will fill the ->arg[] entries with the beautifier for each
positional argument and record the name, then, when we need it, we'll
just check that the position has the same name, maybe even type, so that
we can do some check that the tracepoint hasn't changed, if it has, we
can even reorder things.

Keep calling it syscall_fmt but use it as well for tracepoints, do it
this way to minimize changes and reuse what is in place for syscalls,
we'll see.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-2x1jgiev13zt4njaanlnne0d@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf trace: Make evlist__set_evsel_handler() affect just entries without a handler
Arnaldo Carvalho de Melo [Tue, 1 Oct 2019 14:31:59 +0000 (11:31 -0300)]
perf trace: Make evlist__set_evsel_handler() affect just entries without a handler

Renaming it to evlist__set_default_evsel_handler(), to better reflect
what we want to do, which is to set a default handler for events we
still haven't set a custom handler, like the ones for "msr:write_msr",
etc that are coming soon.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-e1bit7upnpmtsayh8039kfuw@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf evlist: Adopt __set_tracepoint_handlers method from perf_session
Arnaldo Carvalho de Melo [Tue, 1 Oct 2019 14:14:26 +0000 (11:14 -0300)]
perf evlist: Adopt __set_tracepoint_handlers method from perf_session

It all operates on the evsels in the session's evlist, so move it to the
evlist layer to make it useful to tools not using perf_session, just
evlists, like 'perf trace' in live mode.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-9oc53gnfi53vg82fvolkm85g@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf top: Initialize perf_env->cpuid, needed by the per arch annotation init routine
Arnaldo Carvalho de Melo [Mon, 30 Sep 2019 14:53:00 +0000 (11:53 -0300)]
perf top: Initialize perf_env->cpuid, needed by the per arch annotation init routine

Just read it so that later on the per arch init routine can use it,
e.g. x86__annotate_init().

When using a perf.data file this is obtained from a header that was put
there by 'perf record', and then it may be for another machine, another
arch.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-4t4n3o8l8s0tc2b1pq53hyr4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf env: Add routine to read the env->cpuid from the running machine
Arnaldo Carvalho de Melo [Mon, 30 Sep 2019 14:50:15 +0000 (11:50 -0300)]
perf env: Add routine to read the env->cpuid from the running machine

In 'perf top' we use that cpuid when initializing the per arch
annotation init routines (e.g. x86__annotate_init()) and in that case
(live mode, 'perf top') we need to obtain it from the running machine,
not from a perf.data file header.

Provide a means to do that. Will be used by 'perf top' in a followup
patch.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-h2wb3sx7u7znx6lqfezrh7ca@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf/core: Fix inheritance of aux_output groups
Alexander Shishkin [Fri, 4 Oct 2019 12:57:29 +0000 (15:57 +0300)]
perf/core: Fix inheritance of aux_output groups

Commit:

  ab43762ef010 ("perf: Allow normal events to output AUX data")

forgets to configure aux_output relation in the inherited groups, which
results in child PEBS events forever failing to schedule.

Fix this by setting up the AUX output link in the inheritance path.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191004125729.32397-1-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agoMerge tag 'perf-urgent-for-mingo-5.4-20191001' of git://git.kernel.org/pub/scm/linux...
Ingo Molnar [Mon, 7 Oct 2019 13:15:24 +0000 (15:15 +0200)]
Merge tag 'perf-urgent-for-mingo-5.4-20191001' of git://git./linux/kernel/git/acme/linux into perf/urgent

Pull perf/urgent fixes from Arnaldo Carvalho de Melo:

perf script:

  Andi Kleen:

    - Fix recovery from LBR/binary mismatch in the "brstackinsn" --field.

perf annotate:

  Arnaldo Carvalho de Melo:

  - Propagate errors so that meaningful messages can be presented to the
    user in case of problems.

perf map:

  Steve MacLean:

  - Fix handling of maps partially overlapped, resolving symbols in the
    ranges not replaced by new mmaps.

perf tests:

  Ian Rogers:

  - Use raise() instead of NULL derefs to avoid causing a SIGILL rather than a
    SIGSEGV for optimized builds that turn NULL derefs into ud2 instructions.

perf LLVM:

  Ian Rogers:

  - Don't access out-of-scope array.

perf inject:

  Steve MacLean:

  - Fix JIT_CODE_MOVE filename, that was having a u64 truncaded into a 32-bit
    snprintf format and also a missing ".so" suffix in another case.

libsubcmd:

  Ian Rogers:

  - Make _FORTIFY_SOURCE defines dependent on the feature, avoiding
    false positives with with memory sanitizers such as LLVM's ASan.

Vendor specific events:

Intel:

  Andi Kleen:

  - Fix period for Intel fixed counters.

s390:

  Thomas Richter (2):

  - Fix some event details transaction for machine type 8561.

tools headers UAPI:

  Arnaldo Carvalho de Melo:

  - Sync headers with the kernel, catching new usbdevfs ioctls and
    madvise behaviours to properly decode in 'perf trace' output.

Documentation:

  Steve MacLean:

  - Correct and clarify jitdump spec.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agoLinux 5.4-rc2 v5.4-rc2
Linus Torvalds [Sun, 6 Oct 2019 21:27:30 +0000 (14:27 -0700)]
Linux 5.4-rc2

4 years agoelf: don't use MAP_FIXED_NOREPLACE for elf executable mappings
Linus Torvalds [Sun, 6 Oct 2019 20:53:27 +0000 (13:53 -0700)]
elf: don't use MAP_FIXED_NOREPLACE for elf executable mappings

In commit 4ed28639519c ("fs, elf: drop MAP_FIXED usage from elf_map") we
changed elf to use MAP_FIXED_NOREPLACE instead of MAP_FIXED for the
executable mappings.

Then, people reported that it broke some binaries that had overlapping
segments from the same file, and commit ad55eac74f20 ("elf: enforce
MAP_FIXED on overlaying elf segments") re-instated MAP_FIXED for some
overlaying elf segment cases.  But only some - despite the summary line
of that commit, it only did it when it also does a temporary brk vma for
one obvious overlapping case.

Now Russell King reports another overlapping case with old 32-bit x86
binaries, which doesn't trigger that limited case.  End result: we had
better just drop MAP_FIXED_NOREPLACE entirely, and go back to MAP_FIXED.

Yes, it's a sign of old binaries generated with old tool-chains, but we
do pride ourselves on not breaking existing setups.

This still leaves MAP_FIXED_NOREPLACE in place for the load_elf_interp()
and the old load_elf_library() use-cases, because nobody has reported
breakage for those. Yet.

Note that in all the cases seen so far, the overlapping elf sections
seem to be just re-mapping of the same executable with different section
attributes.  We could possibly introduce a new MAP_FIXED_NOFILECHANGE
flag or similar, which acts like NOREPLACE, but allows just remapping
the same executable file using different protection flags.

It's not clear that would make a huge difference to anything, but if
people really hate that "elf remaps over previous maps" behavior, maybe
at least a more limited form of remapping would alleviate some concerns.

Alternatively, we should take a look at our elf_map() logic to see if we
end up not mapping things properly the first time.

In the meantime, this is the minimal "don't do that then" patch while
people hopefully think about it more.

Reported-by: Russell King <linux@armlinux.org.uk>
Fixes: 4ed28639519c ("fs, elf: drop MAP_FIXED usage from elf_map")
Fixes: ad55eac74f20 ("elf: enforce  MAP_FIXED on overlaying elf segments")
Cc: Michal Hocko <mhocko@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoMerge tag 'dma-mapping-5.4-1' of git://git.infradead.org/users/hch/dma-mapping
Linus Torvalds [Sun, 6 Oct 2019 18:10:15 +0000 (11:10 -0700)]
Merge tag 'dma-mapping-5.4-1' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping regression fix from Christoph Hellwig:
 "Revert an incorret hunk from a patch that caused problems on various
  arm boards (Andrey Smirnov)"

* tag 'dma-mapping-5.4-1' of git://git.infradead.org/users/hch/dma-mapping:
  dma-mapping: fix false positive warnings in dma_common_free_remap()

4 years agoMerge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Linus Torvalds [Sun, 6 Oct 2019 00:18:43 +0000 (17:18 -0700)]
Merge tag 'armsoc-fixes' of git://git./linux/kernel/git/soc/soc

Pull ARM SoC fixes from Olof Johansson:
 "A few fixes this time around:

   - Fixup of some clock specifications for DRA7 (device-tree fix)

   - Removal of some dead/legacy CPU OPP/PM code for OMAP that throws
     warnings at boot

   - A few more minor fixups for OMAPs, most around display

   - Enable STM32 QSPI as =y since their rootfs sometimes comes from
     there

   - Switch CONFIG_REMOTEPROC to =y since it went from tristate to bool

   - Fix of thermal zone definition for ux500 (5.4 regression)"

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  ARM: multi_v7_defconfig: Fix SPI_STM32_QSPI support
  ARM: dts: ux500: Fix up the CPU thermal zone
  arm64/ARM: configs: Change CONFIG_REMOTEPROC from m to y
  ARM: dts: am4372: Set memory bandwidth limit for DISPC
  ARM: OMAP2+: Fix warnings with broken omap2_set_init_voltage()
  ARM: OMAP2+: Add missing LCDC midlemode for am335x
  ARM: OMAP2+: Fix missing reset done flag for am3 and am43
  ARM: dts: Fix gpio0 flags for am335x-icev2
  ARM: omap2plus_defconfig: Enable more droid4 devices as loadable modules
  ARM: omap2plus_defconfig: Enable DRM_TI_TFP410
  DTS: ARM: gta04: introduce legacy spi-cs-high to make display work again
  ARM: dts: Fix wrong clocks for dra7 mcasp
  clk: ti: dra7: Fix mcasp8 clock bits

4 years agoMerge tag 'kbuild-fixes-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/masahi...
Linus Torvalds [Sat, 5 Oct 2019 19:56:59 +0000 (12:56 -0700)]
Merge tag 'kbuild-fixes-v5.4' of git://git./linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

 - remove unneeded ar-option and KBUILD_ARFLAGS

 - remove long-deprecated SUBDIRS

 - fix modpost to suppress false-positive warnings for UML builds

 - fix namespace.pl to handle relative paths to ${objtree}, ${srctree}

 - make setlocalversion work for /bin/sh

 - make header archive reproducible

 - fix some Makefiles and documents

* tag 'kbuild-fixes-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kheaders: make headers archive reproducible
  kbuild: update compile-test header list for v5.4-rc2
  kbuild: two minor updates for Documentation/kbuild/modules.rst
  scripts/setlocalversion: clear local variable to make it work for sh
  namespace: fix namespace.pl script to support relative paths
  video/logo: do not generate unneeded logo C files
  video/logo: remove unneeded *.o pattern from clean-files
  integrity: remove pointless subdir-$(CONFIG_...)
  integrity: remove unneeded, broken attempt to add -fshort-wchar
  modpost: fix static EXPORT_SYMBOL warnings for UML build
  kbuild: correct formatting of header in kbuild module docs
  kbuild: remove SUBDIRS support
  kbuild: remove ar-option and KBUILD_ARFLAGS

4 years agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Sat, 5 Oct 2019 19:53:27 +0000 (12:53 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Twelve patches mostly small but obvious fixes or cosmetic but small
  updates"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: qla2xxx: Fix Nport ID display value
  scsi: qla2xxx: Fix N2N link up fail
  scsi: qla2xxx: Fix N2N link reset
  scsi: qla2xxx: Optimize NPIV tear down process
  scsi: qla2xxx: Fix stale mem access on driver unload
  scsi: qla2xxx: Fix unbound sleep in fcport delete path.
  scsi: qla2xxx: Silence fwdump template message
  scsi: hisi_sas: Make three functions static
  scsi: megaraid: disable device when probe failed after enabled device
  scsi: storvsc: setup 1:1 mapping between hardware queue and CPU queue
  scsi: qedf: Remove always false 'tmp_prio < 0' statement
  scsi: ufs: skip shutdown if hba is not powered
  scsi: bnx2fc: Handle scope bits when array returns BUSY or TSF

4 years agoMerge branch 'readdir' (readdir speedup and sanity checking)
Linus Torvalds [Sat, 5 Oct 2019 19:03:27 +0000 (12:03 -0700)]
Merge branch 'readdir' (readdir speedup and sanity checking)

This makes getdents() and getdents64() do sanity checking on the
pathname that it gives to user space.  And to mitigate the performance
impact of that, it first cleans up the way it does the user copying, so
that the code avoids doing the SMAP/PAN updates between each part of the
dirent structure write.

I really wanted to do this during the merge window, but didn't have
time.  The conversion of filldir to unsafe_put_user() is something I've
had around for years now in a private branch, but the extra pathname
checking finally made me clean it up to the point where it is mergable.

It's worth noting that the filename validity checking really should be a
bit smarter: it would be much better to delay the error reporting until
the end of the readdir, so that non-corrupted filenames are still
returned.  But that involves bigger changes, so let's see if anybody
actually hits the corrupt directory entry case before worrying about it
further.

* branch 'readdir':
  Make filldir[64]() verify the directory entry filename is valid
  Convert filldir[64]() from __put_user() to unsafe_put_user()

4 years agoMake filldir[64]() verify the directory entry filename is valid
Linus Torvalds [Sat, 5 Oct 2019 18:32:52 +0000 (11:32 -0700)]
Make filldir[64]() verify the directory entry filename is valid

This has been discussed several times, and now filesystem people are
talking about doing it individually at the filesystem layer, so head
that off at the pass and just do it in getdents{64}().

This is partially based on a patch by Jann Horn, but checks for NUL
bytes as well, and somewhat simplified.

There's also commentary about how it might be better if invalid names
due to filesystem corruption don't cause an immediate failure, but only
an error at the end of the readdir(), so that people can still see the
filenames that are ok.

There's also been discussion about just how much POSIX strictly speaking
requires this since it's about filesystem corruption.  It's really more
"protect user space from bad behavior" as pointed out by Jann.  But
since Eric Biederman looked up the POSIX wording, here it is for context:

 "From readdir:

   The readdir() function shall return a pointer to a structure
   representing the directory entry at the current position in the
   directory stream specified by the argument dirp, and position the
   directory stream at the next entry. It shall return a null pointer
   upon reaching the end of the directory stream. The structure dirent
   defined in the <dirent.h> header describes a directory entry.

  From definitions:

   3.129 Directory Entry (or Link)

   An object that associates a filename with a file. Several directory
   entries can associate names with the same file.

  ...

   3.169 Filename

   A name consisting of 1 to {NAME_MAX} bytes used to name a file. The
   characters composing the name may be selected from the set of all
   character values excluding the slash character and the null byte. The
   filenames dot and dot-dot have special meaning. A filename is
   sometimes referred to as a 'pathname component'."

Note that I didn't bother adding the checks to any legacy interfaces
that nobody uses.

Also note that if this ends up being noticeable as a performance
regression, we can fix that to do a much more optimized model that
checks for both NUL and '/' at the same time one word at a time.

We haven't really tended to optimize 'memchr()', and it only checks for
one pattern at a time anyway, and we really _should_ check for NUL too
(but see the comment about "soft errors" in the code about why it
currently only checks for '/')

See the CONFIG_DCACHE_WORD_ACCESS case of hash_name() for how the name
lookup code looks for pathname terminating characters in parallel.

Link: https://lore.kernel.org/lkml/20190118161440.220134-2-jannh@google.com/
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jann Horn <jannh@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoConvert filldir[64]() from __put_user() to unsafe_put_user()
Linus Torvalds [Sun, 22 May 2016 04:59:07 +0000 (21:59 -0700)]
Convert filldir[64]() from __put_user() to unsafe_put_user()

We really should avoid the "__{get,put}_user()" functions entirely,
because they can easily be mis-used and the original intent of being
used for simple direct user accesses no longer holds in a post-SMAP/PAN
world.

Manually optimizing away the user access range check makes no sense any
more, when the range check is generally much cheaper than the "enable
user accesses" code that the __{get,put}_user() functions still need.

So instead of __put_user(), use the unsafe_put_user() interface with
user_access_{begin,end}() that really does generate better code these
days, and which is generally a nicer interface.  Under some loads, the
multiple user writes that filldir() does are actually quite noticeable.

This also makes the dirent name copy use unsafe_put_user() with a couple
of macros.  We do not want to make function calls with SMAP/PAN
disabled, and the code this generates is quite good when the
architecture uses "asm goto" for unsafe_put_user() like x86 does.

Note that this doesn't bother with the legacy cases.  Nobody should use
them anyway, so performance doesn't really matter there.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Sat, 5 Oct 2019 15:50:15 +0000 (08:50 -0700)]
Merge git://git./linux/kernel/git/netdev/net

Pull networking fixes from David Miller:

 1) Fix ieeeu02154 atusb driver use-after-free, from Johan Hovold.

 2) Need to validate TCA_CBQ_WRROPT netlink attributes, from Eric
    Dumazet.

 3) txq null deref in mac80211, from Miaoqing Pan.

 4) ionic driver needs to select NET_DEVLINK, from Arnd Bergmann.

 5) Need to disable bh during nft_connlimit GC, from Pablo Neira Ayuso.

 6) Avoid division by zero in taprio scheduler, from Vladimir Oltean.

 7) Various xgmac fixes in stmmac driver from Jose Abreu.

 8) Avoid 64-bit division in mlx5 leading to link errors on 32-bit from
    Michal Kubecek.

 9) Fix bad VLAN check in rtl8366 DSA driver, from Linus Walleij.

10) Fix sleep while atomic in sja1105, from Vladimir Oltean.

11) Suspend/resume deadlock in stmmac, from Thierry Reding.

12) Various UDP GSO fixes from Josh Hunt.

13) Fix slab out of bounds access in tcp_zerocopy_receive(), from Eric
    Dumazet.

14) Fix OOPS in __ipv6_ifa_notify(), from David Ahern.

15) Memory leak in NFC's llcp_sock_bind, from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (72 commits)
  selftests/net: add nettest to .gitignore
  net: qlogic: Fix memory leak in ql_alloc_large_buffers
  nfc: fix memory leak in llcp_sock_bind()
  sch_dsmark: fix potential NULL deref in dsmark_init()
  net: phy: at803x: use operating parameters from PHY-specific status
  net: phy: extract pause mode
  net: phy: extract link partner advertisement reading
  net: phy: fix write to mii-ctrl1000 register
  ipv6: Handle missing host route in __ipv6_ifa_notify
  net: phy: allow for reset line to be tied to a sleepy GPIO controller
  net: ipv4: avoid mixed n_redirects and rate_tokens usage
  r8152: Set macpassthru in reset_resume callback
  cxgb4:Fix out-of-bounds MSI-X info array access
  Revert "ipv6: Handle race in addrconf_dad_work"
  net: make sock_prot_memory_pressure() return "const char *"
  rxrpc: Fix rxrpc_recvmsg tracepoint
  qmi_wwan: add support for Cinterion CLS8 devices
  tcp: fix slab-out-of-bounds in tcp_zerocopy_receive()
  lib: textsearch: fix escapes in example code
  udp: only do GSO if # of segs > 1
  ...

4 years agoMerge tag 's390-5.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Linus Torvalds [Sat, 5 Oct 2019 15:44:02 +0000 (08:44 -0700)]
Merge tag 's390-5.4-3' of git://git./linux/kernel/git/s390/linux

Pull s390 fixes from Vasily Gorbik:

 - defconfig updates

 - Fix build errors with CC_OPTIMIZE_FOR_SIZE due to usage of "i"
   constraint for function arguments. Two kvm changes acked-by Christian
   Borntraeger.

 - Fix -Wunused-but-set-variable warnings in mm code.

 - Avoid a constant misuse in qdio.

 - Handle a case when cpumf is temporarily unavailable.

* tag 's390-5.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  KVM: s390: mark __insn32_query() as __always_inline
  KVM: s390: fix __insn32_query() inline assembly
  s390: update defconfigs
  s390/pci: mark function(s) __always_inline
  s390/mm: mark function(s) __always_inline
  s390/jump_label: mark function(s) __always_inline
  s390/cpu_mf: mark function(s) __always_inline
  s390/atomic,bitops: mark function(s) __always_inline
  s390/mm: fix -Wunused-but-set-variable warnings
  s390: mark __cpacf_query() as __always_inline
  s390/qdio: clarify size of the QIB parm area
  s390/cpumf: Fix indentation in sampling device driver
  s390/cpumsf: Check for CPU Measurement sampling
  s390/cpumf: Use consistant debug print format

4 years agoKVM: s390: mark __insn32_query() as __always_inline
Heiko Carstens [Wed, 2 Oct 2019 12:34:37 +0000 (14:34 +0200)]
KVM: s390: mark __insn32_query() as __always_inline

__insn32_query() will not compile if the compiler decides to not
inline it, since it contains an inline assembly with an "i" constraint
with variable contents.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
4 years agoKVM: s390: fix __insn32_query() inline assembly
Heiko Carstens [Wed, 2 Oct 2019 12:24:47 +0000 (14:24 +0200)]
KVM: s390: fix __insn32_query() inline assembly

The inline assembly constraints of __insn32_query() tell the compiler
that only the first byte of "query" is being written to. Intended was
probably that 32 bytes are written to.

Fix and simplify the code and just use a "memory" clobber.

Fixes: d668139718a9 ("KVM: s390: provide query function for instructions returning 32 byte")
Cc: stable@vger.kernel.org # v5.2+
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
4 years agodma-mapping: fix false positivse warnings in dma_common_free_remap()
Andrey Smirnov [Sat, 5 Oct 2019 08:23:30 +0000 (10:23 +0200)]
dma-mapping: fix false positivse warnings in dma_common_free_remap()

Commit 5cf4537975bb ("dma-mapping: introduce a dma_common_find_pages
helper") changed invalid input check in dma_common_free_remap() from:

    if (!area || !area->flags != VM_DMA_COHERENT)

to

    if (!area || !area->flags != VM_DMA_COHERENT || !area->pages)

which seem to produce false positives for memory obtained via
dma_common_contiguous_remap()

This triggers the following warning message when doing "reboot" on ZII
VF610 Dev Board Rev B:

WARNING: CPU: 0 PID: 1 at kernel/dma/remap.c:112 dma_common_free_remap+0x88/0x8c
trying to free invalid coherent area: 9ef82980
Modules linked in:
CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 5.3.0-rc6-next-20190820 #119
Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree)
Backtrace:
[<8010d1ec>] (dump_backtrace) from [<8010d588>] (show_stack+0x20/0x24)
 r7:8015ed78 r6:00000009 r5:00000000 r4:9f4d9b14
[<8010d568>] (show_stack) from [<8077e3f0>] (dump_stack+0x24/0x28)
[<8077e3cc>] (dump_stack) from [<801197a0>] (__warn.part.3+0xcc/0xe4)
[<801196d4>] (__warn.part.3) from [<80119830>] (warn_slowpath_fmt+0x78/0x94)
 r6:00000070 r5:808e540c r4:81c03048
[<801197bc>] (warn_slowpath_fmt) from [<8015ed78>] (dma_common_free_remap+0x88/0x8c)
 r3:9ef82980 r2:808e53e0
 r7:00001000 r6:a0b1e000 r5:a0b1e000 r4:00001000
[<8015ecf0>] (dma_common_free_remap) from [<8010fa9c>] (remap_allocator_free+0x60/0x68)
 r5:81c03048 r4:9f4d9b78
[<8010fa3c>] (remap_allocator_free) from [<801100d0>] (__arm_dma_free.constprop.3+0xf8/0x148)
 r5:81c03048 r4:9ef82900
[<8010ffd8>] (__arm_dma_free.constprop.3) from [<80110144>] (arm_dma_free+0x24/0x2c)
 r5:9f563410 r4:80110120
[<80110120>] (arm_dma_free) from [<8015d80c>] (dma_free_attrs+0xa0/0xdc)
[<8015d76c>] (dma_free_attrs) from [<8020f3e4>] (dma_pool_destroy+0xc0/0x154)
 r8:9efa8860 r7:808f02f0 r6:808f02d0 r5:9ef82880 r4:9ef82780
[<8020f324>] (dma_pool_destroy) from [<805525d0>] (ehci_mem_cleanup+0x6c/0x150)
 r7:9f563410 r6:9efa8810 r5:00000000 r4:9efd0148
[<80552564>] (ehci_mem_cleanup) from [<80558e0c>] (ehci_stop+0xac/0xc0)
 r5:9efd0148 r4:9efd0000
[<80558d60>] (ehci_stop) from [<8053c4bc>] (usb_remove_hcd+0xf4/0x1b0)
 r7:9f563410 r6:9efd0074 r5:81c03048 r4:9efd0000
[<8053c3c8>] (usb_remove_hcd) from [<8056361c>] (host_stop+0x48/0xb8)
 r7:9f563410 r6:9efd0000 r5:9f5f4040 r4:9f5f5040
[<805635d4>] (host_stop) from [<80563d0c>] (ci_hdrc_host_destroy+0x34/0x38)
 r7:9f563410 r6:9f5f5040 r5:9efa8800 r4:9f5f4040
[<80563cd8>] (ci_hdrc_host_destroy) from [<8055ef18>] (ci_hdrc_remove+0x50/0x10c)
[<8055eec8>] (ci_hdrc_remove) from [<804a2ed8>] (platform_drv_remove+0x34/0x4c)
 r7:9f563410 r6:81c4f99c r5:9efa8810 r4:9efa8810
[<804a2ea4>] (platform_drv_remove) from [<804a18a8>] (device_release_driver_internal+0xec/0x19c)
 r5:00000000 r4:9efa8810
[<804a17bc>] (device_release_driver_internal) from [<804a1978>] (device_release_driver+0x20/0x24)
 r7:9f563410 r6:81c41ed0 r5:9efa8810 r4:9f4a1dac
[<804a1958>] (device_release_driver) from [<804a01b8>] (bus_remove_device+0xdc/0x108)
[<804a00dc>] (bus_remove_device) from [<8049c204>] (device_del+0x150/0x36c)
 r7:9f563410 r6:81c03048 r5:9efa8854 r4:9efa8810
[<8049c0b4>] (device_del) from [<804a3368>] (platform_device_del.part.2+0x20/0x84)
 r10:9f563414 r9:809177e0 r8:81cb07dc r7:81c78320 r6:9f563454 r5:9efa8800
 r4:9efa8800
[<804a3348>] (platform_device_del.part.2) from [<804a3420>] (platform_device_unregister+0x28/0x34)
 r5:9f563400 r4:9efa8800
[<804a33f8>] (platform_device_unregister) from [<8055dce0>] (ci_hdrc_remove_device+0x1c/0x30)
 r5:9f563400 r4:00000001
[<8055dcc4>] (ci_hdrc_remove_device) from [<805652ac>] (ci_hdrc_imx_remove+0x38/0x118)
 r7:81c78320 r6:9f563454 r5:9f563410 r4:9f541010
[<8056538c>] (ci_hdrc_imx_shutdown) from [<804a2970>] (platform_drv_shutdown+0x2c/0x30)
[<804a2944>] (platform_drv_shutdown) from [<8049e4fc>] (device_shutdown+0x158/0x1f0)
[<8049e3a4>] (device_shutdown) from [<8013ac80>] (kernel_restart_prepare+0x44/0x48)
 r10:00000058 r9:9f4d8000 r8:fee1dead r7:379ce700 r6:81c0b280 r5:81c03048
 r4:00000000
[<8013ac3c>] (kernel_restart_prepare) from [<8013ad14>] (kernel_restart+0x1c/0x60)
[<8013acf8>] (kernel_restart) from [<8013af84>] (__do_sys_reboot+0xe0/0x1d8)
 r5:81c03048 r4:00000000
[<8013aea4>] (__do_sys_reboot) from [<8013b0ec>] (sys_reboot+0x18/0x1c)
 r8:80101204 r7:00000058 r6:00000000 r5:00000000 r4:00000000
[<8013b0d4>] (sys_reboot) from [<80101000>] (ret_fast_syscall+0x0/0x54)
Exception stack(0x9f4d9fa8 to 0x9f4d9ff0)
9fa0:                   00000000 00000000 fee1dead 28121969 01234567 379ce700
9fc0: 00000000 00000000 00000000 00000058 00000000 00000000 00000000 00016d04
9fe0: 00028e0c 7ec87c64 000135ec 76c1f410

Restore original invalid input check in dma_common_free_remap() to
avoid this problem.

Fixes: 5cf4537975bb ("dma-mapping: introduce a dma_common_find_pages helper")
Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com>
[hch: just revert the offending hunk instead of creating a new helper]
Signed-off-by: Christoph Hellwig <hch@lst.de>
4 years agokheaders: make headers archive reproducible
Dmitry Goldin [Fri, 4 Oct 2019 10:40:07 +0000 (10:40 +0000)]
kheaders: make headers archive reproducible

In commit 43d8ce9d65a5 ("Provide in-kernel headers to make
extending kernel easier") a new mechanism was introduced, for kernels
>=5.2, which embeds the kernel headers in the kernel image or a module
and exposes them in procfs for use by userland tools.

The archive containing the header files has nondeterminism caused by
header files metadata. This patch normalizes the metadata and utilizes
KBUILD_BUILD_TIMESTAMP if provided and otherwise falls back to the
default behaviour.

In commit f7b101d33046 ("kheaders: Move from proc to sysfs") it was
modified to use sysfs and the script for generation of the archive was
renamed to what is being patched.

Signed-off-by: Dmitry Goldin <dgoldin+lkml@protonmail.ch>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
4 years agokbuild: update compile-test header list for v5.4-rc2
Masahiro Yamada [Thu, 3 Oct 2019 02:36:29 +0000 (11:36 +0900)]
kbuild: update compile-test header list for v5.4-rc2

Commit 6dc280ebeed2 ("coda: remove uapi/linux/coda_psdev.h") removed
a header in question. Some more build errors were fixed. Add more
headers into the test coverage.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
4 years agokbuild: two minor updates for Documentation/kbuild/modules.rst
Masahiro Yamada [Thu, 3 Oct 2019 10:29:12 +0000 (19:29 +0900)]
kbuild: two minor updates for Documentation/kbuild/modules.rst

Capitalize the first word in the sentence.

Use obj-m instead of obj-y. obj-y still works, but we have no built-in
objects in external module builds. So, obj-m is better IMHO.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
4 years agoscripts/setlocalversion: clear local variable to make it work for sh
Masahiro Yamada [Tue, 1 Oct 2019 12:17:24 +0000 (21:17 +0900)]
scripts/setlocalversion: clear local variable to make it work for sh

Geert Uytterhoeven reports a strange side-effect of commit 858805b336be
("kbuild: add $(BASH) to run scripts with bash-extension"), which
inserts the contents of a localversion file in the build directory twice.

[Steps to Reproduce]
  $ echo bar > localversion
  $ mkdir build
  $ cd build/
  $ echo foo > localversion
  $ make -s -f ../Makefile defconfig include/config/kernel.release
  $ cat include/config/kernel.release
  5.4.0-rc1foofoobar

This comes down to the behavior change of local variables.

The 'man sh' on my Ubuntu machine, where sh is an alias to dash,
explains as follows:
  When a variable is made local, it inherits the initial value and
  exported and readonly flags from the variable with the same name
  in the surrounding scope, if there is one. Otherwise, the variable
  is initially unset.

[Test Code]

  foo ()
  {
          local res
          echo "res: $res"
  }

  res=1
  foo

[Result]

  $ sh test.sh
  res: 1
  $ bash test.sh
  res:

So, scripts/setlocalversion correctly works only for bash in spite of
its hashbang being #!/bin/sh. Nobody had noticed it before because
CONFIG_SHELL was previously set to bash almost all the time.

Now that CONFIG_SHELL is set to sh, we must write portable and correct
code. I gave the Fixes tag to the commit that uncovered the issue.

Clear the variable 'res' in collect_files() to make it work for sh
(and it also works on distributions where sh is an alias to bash).

Fixes: 858805b336be ("kbuild: add $(BASH) to run scripts with bash-extension")
Reported-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
4 years agonamespace: fix namespace.pl script to support relative paths
Jacob Keller [Fri, 27 Sep 2019 23:30:27 +0000 (16:30 -0700)]
namespace: fix namespace.pl script to support relative paths

The namespace.pl script does not work properly if objtree is not set to
an absolute path. The do_nm function is run from within the find
function, which changes directories.

Because of this, appending objtree, $File::Find::dir, and $source, will
return a path which is not valid from the current directory.

This used to work when objtree was set to an absolute path when using
"make namespacecheck". It appears to have not worked when calling
./scripts/namespace.pl directly.

This behavior was changed in 7e1c04779efd ("kbuild: Use relative path
for $(objtree)", 2014-05-14)

Rather than fixing the Makefile to set objtree to an absolute path, just
fix namespace.pl to work when srctree and objtree are relative. Also fix
the script to use an absolute path for these by default.

Use the File::Spec module for this purpose. It's been part of perl
5 since 5.005.

The curdir() function is used to get the current directory when the
objtree and srctree aren't set in the environment.

rel2abs() is used to convert possibly relative objtree and srctree
environment variables to absolute paths.

Finally, the catfile() function is used instead of string appending
paths together, since this is more robust when joining paths together.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
4 years agovideo/logo: do not generate unneeded logo C files
Masahiro Yamada [Wed, 21 Aug 2019 04:12:35 +0000 (13:12 +0900)]
video/logo: do not generate unneeded logo C files

Currently, all the logo C files are generated irrespective of the
CONFIG options. Adding them to extra-y is wrong. What we need to do
here is to add them to 'targets' so that if_changed works properly.

Files listed in 'targets' are cleaned, so clean-files is unneeded.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
4 years agovideo/logo: remove unneeded *.o pattern from clean-files
Masahiro Yamada [Wed, 21 Aug 2019 04:12:34 +0000 (13:12 +0900)]
video/logo: remove unneeded *.o pattern from clean-files

The pattern *.o is cleaned up globally by the top Makefile.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
4 years agointegrity: remove pointless subdir-$(CONFIG_...)
Masahiro Yamada [Fri, 26 Jul 2019 02:10:55 +0000 (11:10 +0900)]
integrity: remove pointless subdir-$(CONFIG_...)

The ima/ and evm/ sub-directories contain built-in objects, so
obj-$(CONFIG_...) is the correct way to descend into them.

subdir-$(CONFIG_...) is redundant.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
4 years agointegrity: remove unneeded, broken attempt to add -fshort-wchar
Masahiro Yamada [Fri, 26 Jul 2019 02:10:54 +0000 (11:10 +0900)]
integrity: remove unneeded, broken attempt to add -fshort-wchar

I guess commit 15ea0e1e3e18 ("efi: Import certificates from UEFI Secure
Boot") attempted to add -fshort-wchar for building load_uefi.o, but it
has never worked as intended.

load_uefi.o is created in the platform_certs/ sub-directory. If you
really want to add -fshort-wchar, the correct code is:

  $(obj)/platform_certs/load_uefi.o: KBUILD_CFLAGS += -fshort-wchar

But, you do not need to fix it.

Commit 8c97023cf051 ("Kbuild: use -fshort-wchar globally") had already
added -fshort-wchar globally. This code was unneeded in the first place.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
4 years agoselftests/net: add nettest to .gitignore
Jakub Kicinski [Sat, 5 Oct 2019 00:36:50 +0000 (17:36 -0700)]
selftests/net: add nettest to .gitignore

nettest is missing from gitignore.

Fixes: acda655fefae ("selftests: Add nettest")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: qlogic: Fix memory leak in ql_alloc_large_buffers
Navid Emamdoost [Fri, 4 Oct 2019 20:24:39 +0000 (15:24 -0500)]
net: qlogic: Fix memory leak in ql_alloc_large_buffers

In ql_alloc_large_buffers, a new skb is allocated via netdev_alloc_skb.
This skb should be released if pci_dma_mapping_error fails.

Fixes: 0f8ab89e825f ("qla3xxx: Check return code from pci_map_single() in ql_release_to_lrg_buf_free_list(), ql_populate_free_queue(), ql_alloc_large_buffers(), and ql3xxx_send()")
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonfc: fix memory leak in llcp_sock_bind()
Eric Dumazet [Fri, 4 Oct 2019 18:08:34 +0000 (11:08 -0700)]
nfc: fix memory leak in llcp_sock_bind()

sysbot reported a memory leak after a bind() has failed.

While we are at it, abort the operation if kmemdup() has failed.

BUG: memory leak
unreferenced object 0xffff888105d83ec0 (size 32):
  comm "syz-executor067", pid 7207, jiffies 4294956228 (age 19.430s)
  hex dump (first 32 bytes):
    00 69 6c 65 20 72 65 61 64 00 6e 65 74 3a 5b 34  .ile read.net:[4
    30 32 36 35 33 33 30 39 37 5d 00 00 00 00 00 00  026533097]......
  backtrace:
    [<0000000036bac473>] kmemleak_alloc_recursive /./include/linux/kmemleak.h:43 [inline]
    [<0000000036bac473>] slab_post_alloc_hook /mm/slab.h:522 [inline]
    [<0000000036bac473>] slab_alloc /mm/slab.c:3319 [inline]
    [<0000000036bac473>] __do_kmalloc /mm/slab.c:3653 [inline]
    [<0000000036bac473>] __kmalloc_track_caller+0x169/0x2d0 /mm/slab.c:3670
    [<000000000cd39d07>] kmemdup+0x27/0x60 /mm/util.c:120
    [<000000008e57e5fc>] kmemdup /./include/linux/string.h:432 [inline]
    [<000000008e57e5fc>] llcp_sock_bind+0x1b3/0x230 /net/nfc/llcp_sock.c:107
    [<000000009cb0b5d3>] __sys_bind+0x11c/0x140 /net/socket.c:1647
    [<00000000492c3bbc>] __do_sys_bind /net/socket.c:1658 [inline]
    [<00000000492c3bbc>] __se_sys_bind /net/socket.c:1656 [inline]
    [<00000000492c3bbc>] __x64_sys_bind+0x1e/0x30 /net/socket.c:1656
    [<0000000008704b2a>] do_syscall_64+0x76/0x1a0 /arch/x86/entry/common.c:296
    [<000000009f4c57a4>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 30cc4587659e ("NFC: Move LLCP code to the NFC top level diirectory")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agosch_dsmark: fix potential NULL deref in dsmark_init()
Eric Dumazet [Fri, 4 Oct 2019 17:34:45 +0000 (10:34 -0700)]
sch_dsmark: fix potential NULL deref in dsmark_init()

Make sure TCA_DSMARK_INDICES was provided by the user.

syzbot reported :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 8799 Comm: syz-executor235 Not tainted 5.3.0+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:nla_get_u16 include/net/netlink.h:1501 [inline]
RIP: 0010:dsmark_init net/sched/sch_dsmark.c:364 [inline]
RIP: 0010:dsmark_init+0x193/0x640 net/sched/sch_dsmark.c:339
Code: 85 db 58 0f 88 7d 03 00 00 e8 e9 1a ac fb 48 8b 9d 70 ff ff ff 48 b8 00 00 00 00 00 fc ff df 48 8d 7b 04 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 ca
RSP: 0018:ffff88809426f3b8 EFLAGS: 00010247
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff85c6eb09
RDX: 0000000000000000 RSI: ffffffff85c6eb17 RDI: 0000000000000004
RBP: ffff88809426f4b0 R08: ffff88808c4085c0 R09: ffffed1015d26159
R10: ffffed1015d26158 R11: ffff8880ae930ac7 R12: ffff8880a7e96940
R13: dffffc0000000000 R14: ffff88809426f8c0 R15: 0000000000000000
FS:  0000000001292880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000080 CR3: 000000008ca1b000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 qdisc_create+0x4ee/0x1210 net/sched/sch_api.c:1237
 tc_modify_qdisc+0x524/0x1c50 net/sched/sch_api.c:1653
 rtnetlink_rcv_msg+0x463/0xb00 net/core/rtnetlink.c:5223
 netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
 rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5241
 netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
 netlink_unicast+0x531/0x710 net/netlink/af_netlink.c:1328
 netlink_sendmsg+0x8a5/0xd60 net/netlink/af_netlink.c:1917
 sock_sendmsg_nosec net/socket.c:637 [inline]
 sock_sendmsg+0xd7/0x130 net/socket.c:657
 ___sys_sendmsg+0x803/0x920 net/socket.c:2311
 __sys_sendmsg+0x105/0x1d0 net/socket.c:2356
 __do_sys_sendmsg net/socket.c:2365 [inline]
 __se_sys_sendmsg net/socket.c:2363 [inline]
 __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2363
 do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x440369

Fixes: 758cc43c6d73 ("[PKT_SCHED]: Fix dsmark to apply changes consistent")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'Fix-regression-with-AR8035-speed-downgrade'
David S. Miller [Sat, 5 Oct 2019 01:11:08 +0000 (18:11 -0700)]
Merge branch 'Fix-regression-with-AR8035-speed-downgrade'

Russell King says:

====================
Fix regression with AR8035 speed downgrade

The following series attempts to address an issue spotted by tinywrkb
with the AR8035 on the Cubox-i2 in a situation where the PHY downgrades
the negotiated link.

This is version 2, not much has changed other than rebasing on the
current net tree.  Changes have happend to patch 2 due to conflicts,
so I dropped Andrew's reviewed-by.  Minor context changes to patch 4
which I don't consider important enough to warrant dropping the
reviewed-by.

Before commit 5502b218e001 ("net: phy: use phy_resolve_aneg_linkmode in
genphy_read_status"), we would read not only the link partner's
advertisement, but also our own advertisement from the PHY registers,
and use both to derive the PHYs current link mode.  This works when the
AR8035 downgrades the speed, because it appears that the AR8035 clears
link mode bits in the advertisement registers as part of the downgrade.

Commentary: what is not yet known is whether the AR8035 restores the
            advertisement register when the link goes down to the
    previous state.

However, since the above referenced commit, we no longer use the PHYs
advertisement registers, instead converting the link partner's
advertisement to the ethtool link mode array, and combine that with
phylib's cached version of our advertisement - which is not updated on
speed downgrade.

This results in phylib disagreeing with the actual operating mode of
the PHY.

Commentary: I wonder how many more PHY drivers are broken by this
    commit, but have yet to be discovered.

The obvious way to address this would be to disable the downgrade
feature, and indeed this does fix the problem in tinywrkb's case - his
link partner instead downgrades the speed by reducing its
advertisement, resulting in phylib correctly evaluating a slower speed.

However, it has a serious drawback - the gigabit control register (MII
register 9) appears to become read only.  It seems the only way to
update the register is to re-enable the downgrade feature, reset the
PHY, changing register 9, disable the downgrade feature, and reset the
PHY again.

This series attempts to address the problem using a different approach,
similar to the approach taken with Marvell PHYs.  The AR8031, AR8033
and AR8035 have a PHY-Specific Status register which reports the
actual operating mode of the PHY - both speed and duplex.  This
register correctly reports the operating mode irrespective of whether
autoneg is enabled or not.  We use this register to fill in phylib's
speed and duplex parameters.

In detail:

Patch 1 fixes a bug where writing to register 9 does not update
phylib's advertisement mask in the same way that writing register 4
does; this looks like an omission from when gigabit PHY support came
into being.

Patch 2 seperates the generic phylib code which reads the link partners
advertisement from the PHY, so that we can re-use this in the Atheros
PHY driver.

Patch 3 seperates the generic phylib pause mode; phylib provides no
help for MAC drivers to ascertain the negotiated pause mode, it merely
copies the link partner's pause mode bits into its own variables.

Commentary: Both the aforementioned Atheros PHYs and Marvell PHYs
            provide the resolved pause modes in terms of whether
    we should transmit pause frames, or whether we should
    allow reception of pause frames.  Surely the resolution
    of this should be in phylib?

Patch 4 provides the Atheros PHY driver with a private "read_status"
implementation that fills in phylib's speed and duplex settings
depending on the PHY-Specific status register.  This ensures that
phylib and the MAC driver match the operating mode that the PHY has
decided to use.  Since the register also gives us MDIX status, we
can trivially fill that status in as well.

Note that, although the bits mentioned in this patch for this register
match those in th Marvell PHY driver, and it is located at the same
address, the meaning of other register bits varies between the PHYs.
Therefore, I do not feel that it would be appropriate to make this some
kind of generic function.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: phy: at803x: use operating parameters from PHY-specific status
Russell King [Fri, 4 Oct 2019 16:06:14 +0000 (17:06 +0100)]
net: phy: at803x: use operating parameters from PHY-specific status

Read the PHY-specific status register for the current operating mode
(speed and duplex) of the PHY.  This register reflects the actual
mode that the PHY has resolved depending on either the advertisements
of autoneg is enabled, or the forced mode if autoneg is disabled.

This ensures that phylib's software state always tracks the hardware
state.

It seems both AR8033 (which uses the AR8031 ID) and AR8035 support
this status register.  AR8030 is not known at the present time.

This patch depends on "net: phy: extract pause mode" and "net: phy:
extract link partner advertisement reading".

Reported-by: tinywrkb <tinywrkb@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: tinywrkb <tinywrkb@gmail.com>
Fixes: 5502b218e001 ("net: phy: use phy_resolve_aneg_linkmode in genphy_read_status")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: phy: extract pause mode
Russell King [Fri, 4 Oct 2019 16:06:09 +0000 (17:06 +0100)]
net: phy: extract pause mode

Extract the update of phylib's software pause mode state from
genphy_read_status(), so that we can re-use this functionality with
PHYs that have alternative ways to read the negotiation results.

Tested-by: tinywrkb <tinywrkb@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>