After copying Arm64's perf archive with object files and perf.data file
to x86 laptop, the x86's perf kernel symbol resolution fails. It
outputs 'unknown' for all symbols parsing.
This issue is root caused by the function elf__needs_adjust_symbols(),
x86 perf tool uses one weak version, Arm64 (and powerpc) has rewritten
their own version. elf__needs_adjust_symbols() decides if need to parse
symbols with the relative offset address; but x86 building uses the weak
function which misses to check for the elf type 'ET_DYN', so that it
cannot parse symbols in Arm DSOs due to the wrong result from
elf__needs_adjust_symbols().
The DSO parsing should not depend on any specific architecture perf
building; e.g. x86 perf tool can parse Arm and Arm64 DSOs, vice versa.
And confirmed by Naveen N. Rao that powerpc64 kernels are not being
built as ET_DYN anymore and change to ET_EXEC.
This patch removes the arch specific functions for Arm64 and powerpc and
changes elf__needs_adjust_symbols() as a common function.
In the common elf__needs_adjust_symbols(), it checks an extra condition
'ET_DYN' for elf header type. With this fixing, the Arm64 DSO can be
parsed properly with x86's perf tool.
Before:
# perf script
main 3258 1 branches: 0 [unknown] ([unknown]) => ffff800010c4665c [unknown] ([kernel.kallsyms])
main 3258 1 branches: ffff800010c46670 [unknown] ([kernel.kallsyms]) => ffff800010c4eaec [unknown] ([kernel.kallsyms])
main 3258 1 branches: ffff800010c4eaec [unknown] ([kernel.kallsyms]) => ffff800010c4eb00 [unknown] ([kernel.kallsyms])
main 3258 1 branches: ffff800010c4eb08 [unknown] ([kernel.kallsyms]) => ffff800010c4e780 [unknown] ([kernel.kallsyms])
main 3258 1 branches: ffff800010c4e7a0 [unknown] ([kernel.kallsyms]) => ffff800010c4eeac [unknown] ([kernel.kallsyms])
main 3258 1 branches: ffff800010c4eebc [unknown] ([kernel.kallsyms]) => ffff800010c4ed80 [unknown] ([kernel.kallsyms])
After:
# perf script
main 3258 1 branches: 0 [unknown] ([unknown]) => ffff800010c4665c coresight_timeout+0x54 ([kernel.kallsyms])
main 3258 1 branches: ffff800010c46670 coresight_timeout+0x68 ([kernel.kallsyms]) => ffff800010c4eaec etm4_enable_hw+0x3cc ([kernel.kallsyms])
main 3258 1 branches: ffff800010c4eaec etm4_enable_hw+0x3cc ([kernel.kallsyms]) => ffff800010c4eb00 etm4_enable_hw+0x3e0 ([kernel.kallsyms])
main 3258 1 branches: ffff800010c4eb08 etm4_enable_hw+0x3e8 ([kernel.kallsyms]) => ffff800010c4e780 etm4_enable_hw+0x60 ([kernel.kallsyms])
main 3258 1 branches: ffff800010c4e7a0 etm4_enable_hw+0x80 ([kernel.kallsyms]) => ffff800010c4eeac etm4_enable+0x2d4 ([kernel.kallsyms])
main 3258 1 branches: ffff800010c4eebc etm4_enable+0x2e4 ([kernel.kallsyms]) => ffff800010c4ed80 etm4_enable+0x1a8 ([kernel.kallsyms])
v3: Changed to check for ET_DYN across all architectures.
v2: Fixed Arm64 and powerpc native building.
Reported-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Allison Randal <allison@lohutok.net>
Cc: Enrico Weigelt <info@metux.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
Link: http://lore.kernel.org/lkml/20200306015759.10084-1-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
perf record:
Alexey Budankov:
- Fix binding of AIO user space buffers to nodes
maps:
Dominik b. Czarnota:
- Fix off by one in strncpy() size argument.
Arnaldo Carvalho de Melo:
- Use strstarts() to look for Android libraries.
Ian Rogers:
- Give synthetic mmap events an inode generation.
man pages:
Ian Rogers:
- Set man page date to last git commit.
perf test:
Ian Rogers:
- Print if shell directory isn't present.
perf report:
Jin Yao:
- Fix no branch type statistics report issue.
perf expr:
Jiri Olsa:
- Fix copy/paste mistake
vendor events:
Kan Liang:
- Support metric constraints.
vendor events intel:
Kan Liang:
- Add NO_NMI_WATCHDOG metric constraint.
vendor events s390:
Thomas Richter:
- Add new deflate counters for IBM z15.
ARM cs-etm:
Leo Yan:
- Last branch improvements.
intel-pt:
Adrian Hunter:
- Update intel-pt.txt file with new location of the documentation.
- Add Intel PT man page references.
- Rename intel-pt.txt and put it in man page format.
perl scripting:
Michael Petlan:
- Add common_callchain to fix argument order.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Conflicts:
tools/perf/util/map.c
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
perf probe:
Masami Hiramatsu:
- Fix deletion of multiple probe events.
- Fix userspace libraries handling by not depending on dwfl_module_addrsym().
Event parsing:
Ian Rogers:
- Fix reading of invalid memory in event parsing.
python binding:
Ilie Halip:
- Fix clang detection when using CC=clang-version.
build:
Masami Hiramatsu:
- Fix O= use with relative paths.
Android:
Dominik b. Czarnota:
- Fix off by one in strncpy() size argument when handling Android
libraries.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Previously we could get the report of branch type statistics.
For example:
# perf record -j any,save_type ...
# t perf report --stdio
#
# Branch Statistics:
#
COND_FWD: 40.6%
COND_BWD: 4.1%
CROSS_4K: 24.7%
CROSS_2M: 12.3%
COND: 44.7%
UNCOND: 0.0%
IND: 6.1%
CALL: 24.5%
RET: 24.7%
But now for the recent perf, it can't report the branch type statistics.
It's a regression issue caused by commit 40c39e3046 ("perf report: Fix
a no annotate browser displayed issue"), which only counts the branch
type statistics for browser mode.
This patch moves the branch_type_count() outside of ui__has_annotation()
checking, then branch type statistics can work for stdio mode.
Fixes: 40c39e3046 ("perf report: Fix a no annotate browser displayed issue")
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20200313134607.12873-1-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When mmap2 events are synthesized the ino_generation field isn't being
set leading to uninitialized memory being compared.
Caught with clang's -fsanitize=memory:
==124733==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x55a96a6a65cc in __dso_id__cmp tools/perf/util/dsos.c:23:6
#1 0x55a96a6a81d5 in dso_id__cmp tools/perf/util/dsos.c:38:9
#2 0x55a96a6a717f in __dso__cmp_long_name tools/perf/util/dsos.c:74:15
#3 0x55a96a6a6c4c in __dsos__findnew_link_by_longname_id tools/perf/util/dsos.c:106:12
#4 0x55a96a6a851e in __dsos__findnew_by_longname_id tools/perf/util/dsos.c:178:9
#5 0x55a96a6a7798 in __dsos__find_id tools/perf/util/dsos.c:191:9
#6 0x55a96a6a7b57 in __dsos__findnew_id tools/perf/util/dsos.c:251:20
#7 0x55a96a6a7a57 in dsos__findnew_id tools/perf/util/dsos.c:259:17
#8 0x55a96a7776ae in machine__findnew_dso_id tools/perf/util/machine.c:2709:9
#9 0x55a96a77dfcf in map__new tools/perf/util/map.c:193:10
#10 0x55a96a77240a in machine__process_mmap2_event tools/perf/util/machine.c:1670:8
#11 0x55a96a7741a3 in machine__process_event tools/perf/util/machine.c:1882:9
#12 0x55a96a6aee39 in perf_event__process tools/perf/util/event.c:454:9
#13 0x55a96a87d633 in perf_tool__process_synth_event tools/perf/util/synthetic-events.c:63:9
#14 0x55a96a87f131 in perf_event__synthesize_mmap_events tools/perf/util/synthetic-events.c:403:7
#15 0x55a96a8815d6 in __event__synthesize_thread tools/perf/util/synthetic-events.c:548:9
#16 0x55a96a882bff in __perf_event__synthesize_threads tools/perf/util/synthetic-events.c:681:3
#17 0x55a96a881ec2 in perf_event__synthesize_threads tools/perf/util/synthetic-events.c:750:9
#18 0x55a96a562b26 in synth_all tools/perf/tests/mmap-thread-lookup.c:136:9
#19 0x55a96a5623b1 in mmap_events tools/perf/tests/mmap-thread-lookup.c:174:8
#20 0x55a96a561fa0 in test__mmap_thread_lookup tools/perf/tests/mmap-thread-lookup.c:230:2
#21 0x55a96a52c182 in run_test tools/perf/tests/builtin-test.c:378:9
#22 0x55a96a52afc1 in test_and_print tools/perf/tests/builtin-test.c:408:9
#23 0x55a96a52966e in __cmd_test tools/perf/tests/builtin-test.c:603:4
#24 0x55a96a52855d in cmd_test tools/perf/tests/builtin-test.c:747:9
#25 0x55a96a2844d4 in run_builtin tools/perf/perf.c:312:11
#26 0x55a96a282bd0 in handle_internal_command tools/perf/perf.c:364:8
#27 0x55a96a284097 in run_argv tools/perf/perf.c:408:2
#28 0x55a96a282223 in main tools/perf/perf.c:538:3
Uninitialized value was stored to memory at
#1 0x55a96a6a18f7 in dso__new_id tools/perf/util/dso.c:1230:14
#2 0x55a96a6a78ee in __dsos__addnew_id tools/perf/util/dsos.c:233:20
#3 0x55a96a6a7bcc in __dsos__findnew_id tools/perf/util/dsos.c:252:21
#4 0x55a96a6a7a57 in dsos__findnew_id tools/perf/util/dsos.c:259:17
#5 0x55a96a7776ae in machine__findnew_dso_id tools/perf/util/machine.c:2709:9
#6 0x55a96a77dfcf in map__new tools/perf/util/map.c:193:10
#7 0x55a96a77240a in machine__process_mmap2_event tools/perf/util/machine.c:1670:8
#8 0x55a96a7741a3 in machine__process_event tools/perf/util/machine.c:1882:9
#9 0x55a96a6aee39 in perf_event__process tools/perf/util/event.c:454:9
#10 0x55a96a87d633 in perf_tool__process_synth_event tools/perf/util/synthetic-events.c:63:9
#11 0x55a96a87f131 in perf_event__synthesize_mmap_events tools/perf/util/synthetic-events.c:403:7
#12 0x55a96a8815d6 in __event__synthesize_thread tools/perf/util/synthetic-events.c:548:9
#13 0x55a96a882bff in __perf_event__synthesize_threads tools/perf/util/synthetic-events.c:681:3
#14 0x55a96a881ec2 in perf_event__synthesize_threads tools/perf/util/synthetic-events.c:750:9
#15 0x55a96a562b26 in synth_all tools/perf/tests/mmap-thread-lookup.c:136:9
#16 0x55a96a5623b1 in mmap_events tools/perf/tests/mmap-thread-lookup.c:174:8
#17 0x55a96a561fa0 in test__mmap_thread_lookup tools/perf/tests/mmap-thread-lookup.c:230:2
#18 0x55a96a52c182 in run_test tools/perf/tests/builtin-test.c:378:9
#19 0x55a96a52afc1 in test_and_print tools/perf/tests/builtin-test.c:408:9
Uninitialized value was stored to memory at
#0 0x55a96a7725af in machine__process_mmap2_event tools/perf/util/machine.c:1646:25
#1 0x55a96a7741a3 in machine__process_event tools/perf/util/machine.c:1882:9
#2 0x55a96a6aee39 in perf_event__process tools/perf/util/event.c:454:9
#3 0x55a96a87d633 in perf_tool__process_synth_event tools/perf/util/synthetic-events.c:63:9
#4 0x55a96a87f131 in perf_event__synthesize_mmap_events tools/perf/util/synthetic-events.c:403:7
#5 0x55a96a8815d6 in __event__synthesize_thread tools/perf/util/synthetic-events.c:548:9
#6 0x55a96a882bff in __perf_event__synthesize_threads tools/perf/util/synthetic-events.c:681:3
#7 0x55a96a881ec2 in perf_event__synthesize_threads tools/perf/util/synthetic-events.c:750:9
#8 0x55a96a562b26 in synth_all tools/perf/tests/mmap-thread-lookup.c:136:9
#9 0x55a96a5623b1 in mmap_events tools/perf/tests/mmap-thread-lookup.c:174:8
#10 0x55a96a561fa0 in test__mmap_thread_lookup tools/perf/tests/mmap-thread-lookup.c:230:2
#11 0x55a96a52c182 in run_test tools/perf/tests/builtin-test.c:378:9
#12 0x55a96a52afc1 in test_and_print tools/perf/tests/builtin-test.c:408:9
#13 0x55a96a52966e in __cmd_test tools/perf/tests/builtin-test.c:603:4
#14 0x55a96a52855d in cmd_test tools/perf/tests/builtin-test.c:747:9
#15 0x55a96a2844d4 in run_builtin tools/perf/perf.c:312:11
#16 0x55a96a282bd0 in handle_internal_command tools/perf/perf.c:364:8
#17 0x55a96a284097 in run_argv tools/perf/perf.c:408:2
#18 0x55a96a282223 in main tools/perf/perf.c:538:3
Uninitialized value was created by a heap allocation
#0 0x55a96a22f60d in malloc llvm/llvm-project/compiler-rt/lib/msan/msan_interceptors.cpp:925:3
#1 0x55a96a882948 in __perf_event__synthesize_threads tools/perf/util/synthetic-events.c:655:15
#2 0x55a96a881ec2 in perf_event__synthesize_threads tools/perf/util/synthetic-events.c:750:9
#3 0x55a96a562b26 in synth_all tools/perf/tests/mmap-thread-lookup.c:136:9
#4 0x55a96a5623b1 in mmap_events tools/perf/tests/mmap-thread-lookup.c:174:8
#5 0x55a96a561fa0 in test__mmap_thread_lookup tools/perf/tests/mmap-thread-lookup.c:230:2
#6 0x55a96a52c182 in run_test tools/perf/tests/builtin-test.c:378:9
#7 0x55a96a52afc1 in test_and_print tools/perf/tests/builtin-test.c:408:9
#8 0x55a96a52966e in __cmd_test tools/perf/tests/builtin-test.c:603:4
#9 0x55a96a52855d in cmd_test tools/perf/tests/builtin-test.c:747:9
#10 0x55a96a2844d4 in run_builtin tools/perf/perf.c:312:11
#11 0x55a96a282bd0 in handle_internal_command tools/perf/perf.c:364:8
#12 0x55a96a284097 in run_argv tools/perf/perf.c:408:2
#13 0x55a96a282223 in main tools/perf/perf.c:538:3
SUMMARY: MemorySanitizer: use-of-uninitialized-value tools/perf/util/dsos.c:23:6 in __dso_id__cmp
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: clang-built-linux@googlegroups.com
Link: http://lore.kernel.org/lkml/20200313053129.131264-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull perf fixes from Thomas Gleixner:
"A pile of perf fixes:
Kernel side:
- AMD uncore driver: Replace the open coded sanity check with the
core variant, which provides the correct error code and also leaves
a hint in dmesg
Tooling:
- Fix the stdio input handling with glibc versions >= 2.28
- Unbreak the futex-wake benchmark which was reduced to 0 test
threads due to the conversion to cpumaps
- Initialize sigaction structs before invoking sys_sigactio()
- Plug the mapfile memory leak in perf jevents
- Fix off by one relative directory includes
- Fix an undefined string comparison in perf diff"
* tag 'perf-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/amd/uncore: Replace manual sampling check with CAP_NO_INTERRUPT flag
tools: Fix off-by 1 relative directory includes
perf jevents: Fix leak of mapfile memory
perf bench: Clear struct sigaction before sigaction() syscall
perf bench futex-wake: Restore thread count default to online CPU count
perf top: Fix stdio interface input handling with glibc 2.28+
perf diff: Fix undefined string comparision spotted by clang's -Wstring-compare
perf symbols: Don't try to find a vmlinux file when looking for kernel modules
perf bench: Share some global variables to fix build with gcc 10
perf parse-events: Use asprintf() instead of strncpy() to read tracepoint files
perf env: Do not return pointers to local variables
perf tests bp_account: Make global variable static
Pull power management fix from Rafael Wysocki:
"Fix cpupower utility build failures with -fno-common enabled (Mike
Gilbert)"
* tag 'pm-5.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpupower: avoid multiple definition with gcc -fno-common
Pull networking fixes from David Miller:
"It looks like a decent sized set of fixes, but a lot of these are one
liner off-by-one and similar type changes:
1) Fix netlink header pointer to calcular bad attribute offset
reported to user. From Pablo Neira Ayuso.
2) Don't double clear PHY interrupts when ->did_interrupt is set,
from Heiner Kallweit.
3) Add missing validation of various (devlink, nl802154, fib, etc.)
attributes, from Jakub Kicinski.
4) Missing *pos increments in various netfilter seq_next ops, from
Vasily Averin.
5) Missing break in of_mdiobus_register() loop, from Dajun Jin.
6) Don't double bump tx_dropped in veth driver, from Jiang Lidong.
7) Work around FMAN erratum A050385, from Madalin Bucur.
8) Make sure ARP header is pulled early enough in bonding driver,
from Eric Dumazet.
9) Do a cond_resched() during multicast processing of ipvlan and
macvlan, from Mahesh Bandewar.
10) Don't attach cgroups to unrelated sockets when in interrupt
context, from Shakeel Butt.
11) Fix tpacket ring state management when encountering unknown GSO
types. From Willem de Bruijn.
12) Fix MDIO bus PHY resume by checking mdio_bus_phy_may_suspend()
only in the suspend context. From Heiner Kallweit"
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (112 commits)
net: systemport: fix index check to avoid an array out of bounds access
tc-testing: add ETS scheduler to tdc build configuration
net: phy: fix MDIO bus PM PHY resuming
net: hns3: clear port base VLAN when unload PF
net: hns3: fix RMW issue for VLAN filter switch
net: hns3: fix VF VLAN table entries inconsistent issue
net: hns3: fix "tc qdisc del" failed issue
taprio: Fix sending packets without dequeueing them
net: mvmdio: avoid error message for optional IRQ
net: dsa: mv88e6xxx: Add missing mask of ATU occupancy register
net: memcg: fix lockdep splat in inet_csk_accept()
s390/qeth: implement smarter resizing of the RX buffer pool
s390/qeth: refactor buffer pool code
s390/qeth: use page pointers to manage RX buffer pool
seg6: fix SRv6 L2 tunnels to use IANA-assigned protocol number
net: dsa: Don't instantiate phylink for CPU/DSA ports unless needed
net/packet: tpacket_rcv: do not increment ring index on drop
sxgbe: Fix off by one in samsung driver strncpy size arg
net: caif: Add lockdep expression to RCU traversal primitive
MAINTAINERS: remove Sathya Perla as Emulex NIC maintainer
...
add CONFIG_NET_SCH_ETS to 'config', otherwise test suites using this file
to perform a full tdc run will encounter the following warning:
ok 645 e90e - Add ETS qdisc using bands # skipped - "-----> teardown stage" did not complete successfully
Fixes: 82c664b69c ("selftests: qdiscs: Add test coverage for ETS Qdisc")
Reported-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since common_callchain has been added to the argument array, we need to
reflect it in perl-based scripts, because otherwise the following args
would be shifted and thus incorrect. E.g. rw-by-pid and calculation of
read and written bytes:
Before:
read counts by pid:
pid comm # reads bytes_requested bytes_read
------ -------------------- ----------- ---------- ----------
19301 dd 4 424510450039736 0
After:
read counts by pid:
pid comm # reads bytes_requested bytes_read
------ -------------------- ----------- ---------- ----------
19301 dd 4 9536 4341
Committer testing:
To see before after first do:
# perf script record rw-by-pid
^C
Now you'll have a perf.data file to report on, then do before and after
using:
# perf script report rw-by-pid
Anbd notice the bytes_request/bytes_read, as above.
Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Benjamin Salon <bsalon@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
LPU-Reference: 20200311132836.12693-1-mpetlan@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently the man page dates reflect the date the man pages were built.
This patch adjusts the date so that the date is when then man page
last had a commit against it. The date is generated using 'git log'.
Committer testing:
$ git log -1 --pretty="format:%cd" --date=short tools/perf/Documentation/perf-top.txt
2020-01-14
Before:
rm -rf /tmp/build/perf
mkdir -p /tmp/build/perf
make -C tools/perf O=/tmp/build/perf/ install
$ date
Wed 11 Mar 2020 10:21:19 AM -03
$ man perf-top | tail -1
perf 03/11/2020 PERF-TOP(1)
$
After:
rm -rf /tmp/build/perf
mkdir -p /tmp/build/perf
make -C tools/perf O=/tmp/build/perf/ install
$ date
$ date
Wed 11 Mar 2020 10:24:06 AM -03
$ man perf-top | tail -1
perf 2020-01-14 PERF-TOP(1)
$
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masanari Iida <standby24x7@gmail.com>
Cc: Mukesh Ojha <mojha@codeaurora.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lore.kernel.org/lkml/20200311052110.23132-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When 'etm->instructions_sample_period' is less than
'tidq->period_instructions', the function cs_etm__sample() cannot handle
this case properly with its logic.
Let's see below flow as an example:
- If we set itrace option '--itrace=i4', then function cs_etm__sample()
has variables with initialized values:
tidq->period_instructions = 0
etm->instructions_sample_period = 4
- When the first packet is coming:
packet->instr_count = 10; the number of instructions executed in this
packet is 10, thus update period_instructions as below:
tidq->period_instructions = 0 + 10 = 10
instrs_over = 10 - 4 = 6
offset = 10 - 6 - 1 = 3
tidq->period_instructions = instrs_over = 6
- When the second packet is coming:
packet->instr_count = 10; in the second pass, assume 10 instructions
in the trace sample again:
tidq->period_instructions = 6 + 10 = 16
instrs_over = 16 - 4 = 12
offset = 10 - 12 - 1 = -3 -> the negative value
tidq->period_instructions = instrs_over = 12
So after handle these two packets, there have below issues:
The first issue is that cs_etm__instr_addr() returns the address within
the current trace sample of the instruction related to offset, so the
offset is supposed to be always unsigned value. But in fact, function
cs_etm__sample() might calculate a negative offset value (in handling
the second packet, the offset is -3) and pass to cs_etm__instr_addr()
with u64 type with a big positive integer.
The second issue is it only synthesizes 2 samples for sample period = 4.
In theory, every packet has 10 instructions so the two packets have
total 20 instructions, 20 instructions should generate 5 samples
(4 x 5 = 20). This is because cs_etm__sample() only calls once
cs_etm__synth_instruction_sample() to generate instruction sample per
range packet.
This patch fixes the logic in function cs_etm__sample(); the basic
idea for handling coming packet is:
- To synthesize the first instruction sample, it combines the left
instructions from the previous packet and the head of the new
packet; then generate continuous samples with sample period;
- At the tail of the new packet, if it has the rest instructions,
these instructions will be left for the sequential sample.
Suggested-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Walker <robert.walker@arm.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: coresight ml <coresight@lists.linaro.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lore.kernel.org/lkml/20200219021811.20067-4-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Every time synthesize instruction sample, the last branch recording will
be reset. This is fine if the instruction period is big enough, for
example if use the option '--itrace=i100000', the last branch array is
reset for every sample with 100000 instructions per period; before
generate the next instruction sample, there has the sufficient packets
coming to fill the last branch array.
On the other hand, if set a very small period, the packets will be
significantly reduced between two continuous instruction samples, thus
the last branch array is almost empty for new instruction sample by
frequently resetting.
To allow the last branches to work properly for any instruction periods,
this patch avoids to reset the last branch for every instruction sample
and only reset it when flush the trace data. The last branches will be
reset only for two cases, one is for trace starting, another case is for
discontinuous trace; other cases can keep recording last branches for
continuous instruction samples.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Walker <robert.walker@arm.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: coresight ml <coresight@lists.linaro.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lore.kernel.org/lkml/20200219021811.20067-3-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Some metric groups have metric constraints. A metric group can be
scheduled as a group only when some constraints are applied. For
example, Page_Walks_Utilization has a metric constraint,
"NO_NMI_WATCHDOG".
When NMI watchdog is disabled, the metric group can be scheduled as a
group. Otherwise, splitting the metric group into standalone metrics.
Add a new function, metricgroup__has_constraint(), to check whether all
constraints are applied. If not, splitting the metric group into
standalone metrics.
Currently, only one constraint, "NO_NMI_WATCHDOG", is checked. Print a
warning for the metric group with the constraint, when NMI WATCHDOG is
enabled.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Link: http://lore.kernel.org/lkml/1582581564-184429-5-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add support for new deflate counters:
- Counter 247: cycles CPU spent obtaining access to Deflate unit
- Counter 252: cycles CPU is using Deflate unit
- Counter 264: Increments by one for every DEFLATE CONVERSION CALL
instruction executed.
- Counter 265: Increments by one for every DEFLATE CONVERSION CALL
instruction executed that ended in Condition Codes
0, 1 or 2.
Also adjust the some crypto counter description to latest documentation.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: http://lore.kernel.org/lkml/20200310142937.32045-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull cpupower utility fix for v5.6 from Shuah Khan:
"This cpupower update for Linux 5.6-rc6 consists of a fix from
Mike Gilbert for build failures when -fno-common is enabled.
-fno-common will be default in gcc v10."
* tag 'linux-cpupower-5.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux:
cpupower: avoid multiple definition with gcc -fno-common
A new branch sample type PERF_SAMPLE_BRANCH_HW_INDEX has been introduced
in latest kernel.
Enable HW_INDEX by default in LBR call stack mode.
If kernel doesn't support the sample type, switching it off.
Add HW_INDEX in attr_fprintf as well. User can check whether the branch
sample type is set via debug information or header.
Committer testing:
First collect some samples with LBR callchains, system wide, for a few
seconds:
# perf record --call-graph lbr -a sleep 5
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.625 MB perf.data (224 samples) ]
#
Now lets use 'perf evlist -v' to look at the branch_sample_type:
# perf evlist -v
cycles: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CALLCHAIN|CPU|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: USER|CALL_STACK|NO_FLAGS|NO_CYCLES|HW_INDEX
#
So the machine has the kernel feature, and it was correctly added to
perf_event_attr.branch_sample_type, for the default 'cycles' event.
If we do it in another machine, where the kernel lacks the HW_INDEX
feature, we get:
# perf record --call-graph lbr -a sleep 2s
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.690 MB perf.data (499 samples) ]
# perf evlist -v
cycles: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CALLCHAIN|CPU|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: USER|CALL_STACK|NO_FLAGS|NO_CYCLES
#
No HW_INDEX in attr.branch_sample_type.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Pavel Gerasimov <pavel.gerasimov@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com>
Link: http://lore.kernel.org/lkml/20200228163011.19358-3-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The low level index of raw branch records for the most recent branch can
be recorded in a sample with PERF_SAMPLE_BRANCH_HW_INDEX
branch_sample_type. Extend struct branch_stack to support it.
However, if the PERF_SAMPLE_BRANCH_HW_INDEX is not applied, only nr and
entries[] will be output by kernel. The pointer of entries[] could be
wrong, since the output format is different with new struct
branch_stack. Add a variable no_hw_idx in struct perf_sample to
indicate whether the hw_idx is output. Add get_branch_entry() to return
corresponding pointer of entries[0].
To make dummy branch sample consistent as new branch sample, add hw_idx
in struct dummy_branch_stack for cs-etm and intel-pt.
Apply the new struct branch_stack for synthetic events as well.
Extend test case sample-parsing to support new struct branch_stack.
Committer notes:
Renamed get_branch_entries() to perf_sample__branch_entries() to have
proper namespacing and pave the way for this to be moved to libperf,
eventually.
Add 'static' to that inline as it is in a header.
Add 'hw_idx' to 'struct dummy_branch_stack' in cs-etm.c to fix the build
on arm64.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Pavel Gerasimov <pavel.gerasimov@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com>
Link: http://lore.kernel.org/lkml/20200228163011.19358-2-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull Ktest fixes and clean ups from Steven Rostedt:
- Make the default option oldconfig instead of randconfig (one too many
times I lost my config because I left the build type out)
- Add timeout to ssh sync to sync before reboot (prevents test hangs)
- A couple of spelling fix patches
* tag 'ktest-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest:
ktest: Fix typos in ktest.pl
ktest: Add timeout for ssh sync testing
ktest: Make default build option oldconfig not randconfig
ktest: Fix some typos in sample.conf
Before rebooting the box, a "ssh sync" is called to the test machine to see
if it is alive or not. But if the test machine is in a partial state, that
ssh may never actually finish, and the ktest test hangs.
Add a 10 second timeout to the sync test, which will fail after 10 seconds
and then cause the test to reboot the test machine.
Cc: stable@vger.kernel.org
Fixes: 6474ace999 ("ktest.pl: Powercycle the box on reboot if no connection can be made")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
For the last time, I screwed up my ktest config file, and the build went
into the default "randconfig", blowing away the .config that I had set up.
The reason for the default randconfig was because when this was first
written, I wanted to do a bunch of randconfigs. But as time progressed,
ktest isn't about randconfig anymore, and because randconfig destroys the
config in the build directory, it's a dangerous default to have. Use
oldconfig as the default.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>