Change Intel PT's branch stack support to use thread stacks. The
advantages of using branch stack support from the thread-stack are:
1. the branches are accumulated separately for each thread
2. the branch stack is cleared only in between continuous traces
This helps pave the way for adding branch stacks to regular events, not
just synthesized events as at present.
While the 2 approaches are not identical, in simple cases the results
can be identical e.g.
Before:
# perf record --kcore -e intel_pt// uname
# perf script --itrace=i10usl -F+brstacksym,+addr,+flags > cmp1.txt
After:
# perf script --itrace=i10usl -F+brstacksym,+addr,+flags > cmp2.txt
# diff -s cmp1.txt cmp2.txt
Files cmp1.txt and cmp2.txt are identical
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lore.kernel.org/lkml/20200429150751.12570-4-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Intel PT already has support for creating branch stacks for each context
(per-cpu or per-thread). In the more common per-cpu case, the branch stack
is not separated for different threads, instead being cleared in between
each sample.
That approach will not work very well for adding branch stacks to
regular events. The branch stacks really need to be accumulated
separately for each thread.
As a start to accomplishing that, this patch adds support for putting
branch stack support into the thread-stack. The advantages are:
1. the branches are accumulated separately for each thread
2. the branch stack is cleared only in between continuous traces
This helps pave the way for adding branch stacks to regular events, not
just synthesized events as at present.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lore.kernel.org/lkml/20200429150751.12570-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Trying to disentangle this a bit further, unfortunately it uses
parse_events(), its interesting to have it separated anyway, so do it.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Patch enhances current metric infrastructure to handle "?" in the metric
expression. The "?" can be use for parameters whose value not known
while creating metric events and which can be replace later at runtime
to the proper value. It also add flexibility to create multiple events
out of single metric event added in JSON file.
Patch adds function 'arch_get_runtimeparam' which is a arch specific
function, returns the count of metric events need to be created. By
default it return 1.
This infrastructure needed for hv_24x7 socket/chip level events.
"hv_24x7" chip level events needs specific chip-id to which the data is
requested. Function 'arch_get_runtimeparam' implemented in header.c
which extract number of sockets from sysfs file "sockets" under
"/sys/devices/hv_24x7/interface/".
With this patch basically we are trying to create as many metric events
as define by runtime_param.
For that one loop is added in function 'metricgroup__add_metric', which
create multiple events at run time depend on return value of
'arch_get_runtimeparam' and merge that event in 'group_list'.
To achieve that we are actually passing this parameter value as part of
`expr__find_other` function and changing "?" present in metric
expression with this value.
As in our JSON file, there gonna be single metric event, and out of
which we are creating multiple events.
To understand which data count belongs to which parameter value,
we also printing param value in generic_metric function.
For example,
command:# ./perf stat -M PowerBUS_Frequency -C 0 -I 1000
1.000101867 9,356,933 hv_24x7/pm_pb_cyc,chip=0/ # 2.3 GHz PowerBUS_Frequency_0
1.000101867 9,366,134 hv_24x7/pm_pb_cyc,chip=1/ # 2.3 GHz PowerBUS_Frequency_1
2.000314878 9,365,868 hv_24x7/pm_pb_cyc,chip=0/ # 2.3 GHz PowerBUS_Frequency_0
2.000314878 9,366,092 hv_24x7/pm_pb_cyc,chip=1/ # 2.3 GHz PowerBUS_Frequency_1
So, here _0 and _1 after PowerBUS_Frequency specify parameter value.
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Mamatha Inamdar <mamatha4@linux.vnet.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lore.kernel.org/lkml/20200401203340.31402-5-kjain@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The synthesize benchmark, run on a single process and thread, shows
perf_event__synthesize_mmap_events as the hottest function with fgets
and sscanf taking the majority of execution time.
fscanf performs similarly well. Replace the scanf call with manual
reading of each field of the /proc/pid/maps line, and remove some
unnecessary buffering.
This change also addresses potential, but unlikely, buffer overruns for
the string values read by scanf.
Performance before is:
$ sudo perf bench internals synthesize -m 16 -M 16 -s -t
\# Running 'internals/synthesize' benchmark:
Computing performance of single threaded perf event synthesis by
synthesizing events on the perf process itself:
Average synthesis took: 102.810 usec (+- 0.027 usec)
Average num. events: 17.000 (+- 0.000)
Average time per event 6.048 usec
Average data synthesis took: 106.325 usec (+- 0.018 usec)
Average num. events: 89.000 (+- 0.000)
Average time per event 1.195 usec
Computing performance of multi threaded perf event synthesis by
synthesizing events on CPU 0:
Number of synthesis threads: 16
Average synthesis took: 68103.100 usec (+- 441.234 usec)
Average num. events: 30703.000 (+- 0.730)
Average time per event 2.218 usec
And after is:
$ sudo perf bench internals synthesize -m 16 -M 16 -s -t
\# Running 'internals/synthesize' benchmark:
Computing performance of single threaded perf event synthesis by
synthesizing events on the perf process itself:
Average synthesis took: 50.388 usec (+- 0.031 usec)
Average num. events: 17.000 (+- 0.000)
Average time per event 2.964 usec
Average data synthesis took: 52.693 usec (+- 0.020 usec)
Average num. events: 89.000 (+- 0.000)
Average time per event 0.592 usec
Computing performance of multi threaded perf event synthesis by
synthesizing events on CPU 0:
Number of synthesis threads: 16
Average synthesis took: 45022.400 usec (+- 552.740 usec)
Average num. events: 30624.200 (+- 10.037)
Average time per event 1.470 usec
On a Intel Xeon 6154 compiling with Debian gcc 9.2.1.
Committer testing:
On a AMD Ryzen 5 3600X 6-Core Processor:
Before:
# perf bench internals synthesize --min-threads 12 --max-threads 12 --st --mt
# Running 'internals/synthesize' benchmark:
Computing performance of single threaded perf event synthesis by
synthesizing events on the perf process itself:
Average synthesis took: 267.491 usec (+- 0.176 usec)
Average num. events: 56.000 (+- 0.000)
Average time per event 4.777 usec
Average data synthesis took: 277.257 usec (+- 0.169 usec)
Average num. events: 287.000 (+- 0.000)
Average time per event 0.966 usec
Computing performance of multi threaded perf event synthesis by
synthesizing events on CPU 0:
Number of synthesis threads: 12
Average synthesis took: 81599.500 usec (+- 346.315 usec)
Average num. events: 36096.100 (+- 2.523)
Average time per event 2.261 usec
#
After:
# perf bench internals synthesize --min-threads 12 --max-threads 12 --st --mt
# Running 'internals/synthesize' benchmark:
Computing performance of single threaded perf event synthesis by
synthesizing events on the perf process itself:
Average synthesis took: 110.125 usec (+- 0.080 usec)
Average num. events: 56.000 (+- 0.000)
Average time per event 1.967 usec
Average data synthesis took: 118.518 usec (+- 0.057 usec)
Average num. events: 287.000 (+- 0.000)
Average time per event 0.413 usec
Computing performance of multi threaded perf event synthesis by
synthesizing events on CPU 0:
Number of synthesis threads: 12
Average synthesis took: 43490.700 usec (+- 284.527 usec)
Average num. events: 37028.500 (+- 0.563)
Average time per event 1.175 usec
#
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrey Zhizhikin <andrey.z@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lore.kernel.org/lkml/20200415054050.31645-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
By default this isn't run as it reads /proc and may not have access.
For consistency, modify the single threaded benchmark to compute an
average time per event.
Committer testing:
$ grep -m1 "model name" /proc/cpuinfo
model name : Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
$ grep "model name" /proc/cpuinfo | wc -l
8
$
$ perf bench internals synthesize -h
# Running 'internals/synthesize' benchmark:
Usage: perf bench internals synthesize <options>
-I, --multi-iterations <n>
Number of iterations used to compute multi-threaded average
-i, --single-iterations <n>
Number of iterations used to compute single-threaded average
-M, --max-threads <n>
Maximum number of threads in multithreaded bench
-m, --min-threads <n>
Minimum number of threads in multithreaded bench
-s, --st Run single threaded benchmark
-t, --mt Run multi-threaded benchmark
$
$ perf bench internals synthesize -t
# Running 'internals/synthesize' benchmark:
Computing performance of multi threaded perf event synthesis by
synthesizing events on CPU 0:
Number of synthesis threads: 1
Average synthesis took: 65449.000 usec (+- 586.442 usec)
Average num. events: 9405.400 (+- 0.306)
Average time per event 6.959 usec
Number of synthesis threads: 2
Average synthesis took: 37838.300 usec (+- 130.259 usec)
Average num. events: 9501.800 (+- 20.469)
Average time per event 3.982 usec
Number of synthesis threads: 3
Average synthesis took: 48551.400 usec (+- 225.686 usec)
Average num. events: 9544.000 (+- 0.000)
Average time per event 5.087 usec
Number of synthesis threads: 4
Average synthesis took: 29632.500 usec (+- 50.808 usec)
Average num. events: 9544.000 (+- 0.000)
Average time per event 3.105 usec
Number of synthesis threads: 5
Average synthesis took: 33920.400 usec (+- 284.509 usec)
Average num. events: 9544.000 (+- 0.000)
Average time per event 3.554 usec
Number of synthesis threads: 6
Average synthesis took: 27604.100 usec (+- 72.344 usec)
Average num. events: 9548.000 (+- 0.000)
Average time per event 2.891 usec
Number of synthesis threads: 7
Average synthesis took: 25406.300 usec (+- 933.371 usec)
Average num. events: 9545.500 (+- 0.167)
Average time per event 2.662 usec
Number of synthesis threads: 8
Average synthesis took: 24110.400 usec (+- 73.229 usec)
Average num. events: 9551.000 (+- 0.000)
Average time per event 2.524 usec
$
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrey Zhizhikin <andrey.z@gmail.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lore.kernel.org/lkml/20200415054050.31645-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To control degree of parallelism of the synthesize_mmap() code which
is scanning /proc/PID/task/PID/maps and can be time consuming.
Mimic perf top way of handling the option.
If not specified will default to 1 thread, i.e. default behavior before
this option.
On a desktop computer the processing of /proc/PID/task/PID/maps isn't
slow enough to warrant parallel processing and the thread creation has
some cost - hence the default of 1. On a loaded server with
>100 cores it is possible to see synthesis times in the order of
seconds and in this case having the option is desirable.
As the processing is a synchronization point, it is legitimate to worry if
Amdahl's law will apply to this patch. Profiling with this patch in
place:
https://lore.kernel.org/lkml/20200415054050.31645-4-irogers@google.com/
shows:
...
- 32.59% __perf_event__synthesize_threads
- 32.54% __event__synthesize_thread
+ 22.13% perf_event__synthesize_mmap_events
+ 6.68% perf_event__get_comm_ids.constprop.0
+ 1.49% process_synthesized_event
+ 1.29% __GI___readdir64
+ 0.60% __opendir
...
That is the processing is 1.49% of execution time and there is plenty to
make parallel. This is shown in the benchmark in this patch:
https://lore.kernel.org/lkml/20200415054050.31645-2-irogers@google.com/
Computing performance of multi threaded perf event synthesis by
synthesizing events on CPU 0:
Number of synthesis threads: 1
Average synthesis took: 127729.000 usec (+- 3372.880 usec)
Average num. events: 21548.600 (+- 0.306)
Average time per event 5.927 usec
Number of synthesis threads: 2
Average synthesis took: 88863.500 usec (+- 385.168 usec)
Average num. events: 21552.800 (+- 0.327)
Average time per event 4.123 usec
Number of synthesis threads: 3
Average synthesis took: 83257.400 usec (+- 348.617 usec)
Average num. events: 21553.200 (+- 0.327)
Average time per event 3.863 usec
Number of synthesis threads: 4
Average synthesis took: 75093.000 usec (+- 422.978 usec)
Average num. events: 21554.200 (+- 0.200)
Average time per event 3.484 usec
Number of synthesis threads: 5
Average synthesis took: 64896.600 usec (+- 353.348 usec)
Average num. events: 21558.000 (+- 0.000)
Average time per event 3.010 usec
Number of synthesis threads: 6
Average synthesis took: 59210.200 usec (+- 342.890 usec)
Average num. events: 21560.000 (+- 0.000)
Average time per event 2.746 usec
Number of synthesis threads: 7
Average synthesis took: 54093.900 usec (+- 306.247 usec)
Average num. events: 21562.000 (+- 0.000)
Average time per event 2.509 usec
Number of synthesis threads: 8
Average synthesis took: 48938.700 usec (+- 341.732 usec)
Average num. events: 21564.000 (+- 0.000)
Average time per event 2.269 usec
Where average time per synthesized event goes from 5.927 usec with 1
thread to 2.269 usec with 8. This isn't a linear speed up as not all of
synthesize code has been made parallel. If the synthesis time was about
10 seconds then using 8 threads may bring this down to less than 4.
Signed-off-by: Stephane Eranian <eranian@google.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tony Jones <tonyj@suse.de>
Cc: yuzhoujian <yuzhoujian@didichuxing.com>
Link: http://lore.kernel.org/lkml/20200422155038.9380-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Commit 2d4f27999b ("perf data: Add global path holder") missed path
conversion in tests/topology.c, causing the "Session topology" testcase
to "hang" (waits forever for input from stdin) when doing "ssh $VM perf
test".
Can be reproduced by running "cat | perf test topo", and crashed by
replacing cat with true:
$ true | perf test -v topo
40: Session topology :
--- start ---
test child forked, pid 3638
templ file: /tmp/perf-test-QPvAch
incompatible file format
incompatible file format (rerun with -v to learn more)
free(): invalid pointer
test child interrupted
---- end ----
Session topology: FAILED!
Committer testing:
Reproduced the above result before the patch and after it is back
working:
# true | perf test -v topo
41: Session topology :
--- start ---
test child forked, pid 19374
templ file: /tmp/perf-test-YOTEQg
CPU 0, core 0, socket 0
CPU 1, core 1, socket 0
CPU 2, core 2, socket 0
CPU 3, core 3, socket 0
CPU 4, core 0, socket 0
CPU 5, core 1, socket 0
CPU 6, core 2, socket 0
CPU 7, core 3, socket 0
test child finished with 0
---- end ----
Session topology: Ok
#
Fixes: 2d4f27999b ("perf data: Add global path holder")
Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mamatha Inamdar <mamatha4@linux.vnet.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Link: http://lore.kernel.org/lkml/20200423115341.562782-1-tommi.t.rantala@nokia.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
For interval mode, the metric is printed after the '#' character if it
exists. But it's not calculated by the counts generated in this
interval.
See the following examples:
root@kbl-ppc:~# perf stat -M CPI -I1000 --interval-count 2
# time counts unit events
1.000422803 764,809 inst_retired.any # 2.9 CPI
1.000422803 2,234,932 cycles
2.001464585 1,960,061 inst_retired.any # 1.6 CPI
2.001464585 4,022,591 cycles
The second CPI should not be 1.6 (4,022,591/1,960,061 is 2.1)
root@kbl-ppc:~# perf stat -e cycles,instructions -I1000 --interval-count 2
# time counts unit events
1.000429493 2,869,311 cycles
1.000429493 816,875 instructions # 0.28 insn per cycle
2.001516426 9,260,973 cycles
2.001516426 5,250,634 instructions # 0.87 insn per cycle
The second 'insn per cycle' should not be 0.87 (5,250,634/9,260,973 is
0.57).
The current code uses a global variable 'rt_stat' for tracking and
updating the std dev of runtime stat. Unlike the counts, 'rt_stat' is not
reset for interval. While the counts are reset for interval.
perf_stat_process_counter()
{
if (config->interval)
init_stats(ps->res_stats);
}
So for interval mode, the 'rt_stat' variable should be reset too.
This patch resets 'rt_stat' before read_counters(), so the runtime stat
is only calculated by the counts generated in this interval.
With this patch:
root@kbl-ppc:~# perf stat -M CPI -I1000 --interval-count 2
# time counts unit events
1.000420924 2,408,818 inst_retired.any # 2.1 CPI
1.000420924 5,010,111 cycles
2.001448579 2,798,407 inst_retired.any # 1.6 CPI
2.001448579 4,599,861 cycles
root@kbl-ppc:~# perf stat -e cycles,instructions -I1000 --interval-count 2
# time counts unit events
1.000428555 2,769,714 cycles
1.000428555 774,462 instructions # 0.28 insn per cycle
2.001471562 3,595,904 cycles
2.001471562 1,243,703 instructions # 0.35 insn per cycle
Now the second 'insn per cycle' and CPI are calculated by the counts
generated in this interval.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Tested-By: Kajol Jain <kjain@linux.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20200420145417.6864-1-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Do not bother with close() if fd is not valid, just to silence valgrind:
$ valgrind ./perf script
==59169== Memcheck, a memory error detector
==59169== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==59169== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==59169== Command: ./perf script
==59169==
==59169== Warning: invalid file descriptor -1 in syscall close()
==59169== Warning: invalid file descriptor -1 in syscall close()
==59169== Warning: invalid file descriptor -1 in syscall close()
==59169== Warning: invalid file descriptor -1 in syscall close()
==59169== Warning: invalid file descriptor -1 in syscall close()
==59169== Warning: invalid file descriptor -1 in syscall close()
==59169== Warning: invalid file descriptor -1 in syscall close()
==59169== Warning: invalid file descriptor -1 in syscall close()
Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20200417132330.119407-1-tommi.t.rantala@nokia.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull perf/core fixes and improvements from Arnaldo Carvalho de Melo:
kernel + tools/perf:
Alexey Budankov:
- Introduce CAP_PERFMON to kernel and user space.
callchains:
Adrian Hunter:
- Allow using Intel PT to synthesize callchains for regular events.
Kan Liang:
- Stitch LBR records from multiple samples to get deeper backtraces,
there are caveats, see the csets for details.
perf script:
Andreas Gerstmayr:
- Add flamegraph.py script
BPF:
Jiri Olsa:
- Synthesize bpf_trampoline/dispatcher ksymbol events.
perf stat:
Arnaldo Carvalho de Melo:
- Honour --timeout for forked workloads.
Stephane Eranian:
- Force error in fallback on :k events, to avoid counting nothing when
the user asks for kernel events but is not allowed to.
perf bench:
Ian Rogers:
- Add event synthesis benchmark.
tools api fs:
Stephane Eranian:
- Make xxx__mountpoint() more scalable
libtraceevent:
He Zhe:
- Handle return value of asprintf.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Merge misc fixes from Andrew Morton:
"15 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
tools/vm: fix cross-compile build
coredump: fix null pointer dereference on coredump
mm: shmem: disable interrupt when acquiring info->lock in userfaultfd_copy path
shmem: fix possible deadlocks on shmlock_user_lock
vmalloc: fix remap_vmalloc_range() bounds checks
mm/shmem: fix build without THP
mm/ksm: fix NULL pointer dereference when KSM zero page is enabled
tools/build: tweak unused value workaround
checkpatch: fix a typo in the regex for $allocFunctions
mm, gup: return EINTR when gup is interrupted by fatal signals
mm/hugetlb: fix a addressing exception caused by huge_pte_offset
MAINTAINERS: add an entry for kfifo
mm/userfaultfd: disable userfaultfd-wp on x86_32
slub: avoid redzone when choosing freepointer location
sh: fix build error in mm/init.c
Pull kvm fixes from Paolo Bonzini:
"Bugfixes, and a few cleanups to the newly-introduced assembly language
vmentry code for AMD"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: PPC: Book3S HV: Handle non-present PTEs in page fault functions
kvm: Disable objtool frame pointer checking for vmenter.S
MAINTAINERS: add a reviewer for KVM/s390
KVM: s390: Fix PV check in deliverable_irqs()
kvm: Handle reads of SandyBridge RAPL PMU MSRs rather than injecting #GP
KVM: Remove CREATE_IRQCHIP/SET_PIT2 race
KVM: SVM: Fix __svm_vcpu_run declaration.
KVM: SVM: Do not setup frame pointer in __svm_vcpu_run
KVM: SVM: Fix build error due to missing release_pages() include
KVM: SVM: Do not mark svm_vcpu_run with STACK_FRAME_NON_STANDARD
kvm: nVMX: match comment with return type for nested_vmx_exit_reflected
kvm: nVMX: reflect MTF VM-exits if injected by L1
KVM: s390: Return last valid slot if approx index is out-of-bounds
KVM: Check validity of resolved slot when searching memslots
KVM: VMX: Enable machine check support for 32bit targets
KVM: SVM: move more vmentry code to assembly
KVM: SVM: fix compilation with modular PSP and non-modular KVM
Pull virtio fixes and cleanups from Michael Tsirkin:
- Some bug fixes
- Cleanup a couple of issues that surfaced meanwhile
- Disable vhost on ARM with OABI for now - to be fixed fully later in
the cycle or in the next release.
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (24 commits)
vhost: disable for OABI
virtio: drop vringh.h dependency
virtio_blk: add a missing include
virtio-balloon: Avoid using the word 'report' when referring to free page hinting
virtio-balloon: make virtballoon_free_page_report() static
vdpa: fix comment of vdpa_register_device()
vdpa: make vhost, virtio depend on menu
vdpa: allow a 32 bit vq alignment
drm/virtio: fix up for include file changes
remoteproc: pull in slab.h
rpmsg: pull in slab.h
virtio_input: pull in slab.h
remoteproc: pull in slab.h
virtio-rng: pull in slab.h
virtgpu: pull in uaccess.h
tools/virtio: make asm/barrier.h self contained
tools/virtio: define aligned attribute
virtio/test: fix up after IOTLB changes
vhost: Create accessors for virtqueues private_data
vdpasim: Return status in vdpasim_get_status
...
Pull tpm fixes from Jarkko Sakkinen:
"A few bug fixes"
* tag 'tpmdd-next-20200421' of git://git.infradead.org/users/jjs/linux-tpmdd:
tpm/tpm_tis: Free IRQ if probing fails
tpm: fix wrong return value in tpm_pcr_extend
tpm: ibmvtpm: retry on H_CLOSED in tpm_ibmvtpm_send()
tpm: Export tpm2_get_cc_attrs_tbl for ibmvtpm driver as module
Pull clang-format fixlets from Miguel Ojeda:
"Two trivial clang-format changes:
- Don't indent C++ namespaces (Ian Rogers)
- The usual clang-format macro list update (Miguel Ojeda)"
* tag 'clang-format-for-linus-v5.7-rc3' of git://github.com/ojeda/linux:
clang-format: Update with the latest for_each macro list
clang-format: don't indent namespaces