Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
New features:
- Allow running 'perf test' entries in the same process, not forking to
test each testcase, useful for debugging (Jiri Olsa)
- Show number of samples in the stdio annotate header (Peter Zijlstra)
Documentation changes:
- Add documentation for perf.data on disk format (Andi Kleen)
Build fixes:
- Fix 'perf trace' build on old systems wrt missing SCHED_RESET_ON_FORK and
eventfd.h (Arnaldo Carvalho de Melo)
Infrastructure changes:
- Utility function to fetch arch from evsel/evlist (Ravi Bangoria)
Trivial changes:
- Fix spelling mistake: "missmatch" -> "mismatch" in libbpf (Colin Ian King)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Staring at annotations of large functions is useless if there's only a
few samples in them. Report the number of samples in the header to make
this easier to determine.
Committer note:
The change amounts to:
- Percent | Source code & Disassembly of perf-vdso.so for cycles:u
------------------------------------------------------------------
+ Percent | Source code & Disassembly of perf-vdso.so for cycles:u (3278 samples)
+--------------------------------------------------------------------------------
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20160630082955.GA30921@twins.programming.kicks-ass.net
[ split from a larger patch ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
New features:
- Generate comm, fork and exit events when converting perf.data files to CTF (Wang Nan)
Infrastructure changes:
- Add libbabeltrace to build-test (Wang Nan)
- 'perf record' prep work to support multiple evlists (Wang Nan)
- Remove unused hist_entry__annotate function (Ravi Bangoria)
- Add more toolchain triplets (Ravi Bangoria)
- Update message for slang devel packages on Ubuntu (Neeraj Badlani)
- Generalize handling of 'ret' instructions in the annotate TUI (Naveen N. Rao)
- Use proper dso name for is_regular_file, fixing device file handling (Jiri Olsa)
Build Fixes:
- Add missing config.h include, fixing the build with libbabeltrace (Jiri Olsa)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Following commits are going to allow 'perf data convert' to collect not
only samples, but also non-sample events like comm and fork. In this
patch we count non-sample events using c.non_sample_count, and prepare
to print number of both type of events like:
# ~/perf data convert --all --to-ctf ./out.ctf
[ perf data convert: Converted 'perf.data' into CTF data './out.ctf' ]
[ perf data convert: Converted and wrote 0.846 MB (6508 samples, 686 non-samples) ]
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1466767332-114472-5-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Marc reported use of uninitialized memory:
> In commit "403567217d3f perf symbols: Do not read symbols/data from
> device files" a check to uninitialzied memory was added. This leads to
> the following valgrind output:
>
> ==24515== Syscall param stat(file_name) points to uninitialised byte(s)
> ==24515== at 0x75B26D5: _xstat (in /lib/x86_64-linux-gnu/libc-2.22.so)
> ==24515== by 0x4E548D: stat (stat.h:454)
> ==24515== by 0x4E548D: is_regular_file (util.c:687)
> ==24515== by 0x4A5BEE: dso__load (symbol.c:1435)
> ==24515== by 0x4BB1AE: map__load (map.c:289)
> ==24515== by 0x4BB1AE: map__find_symbol (map.c:333)
> ==24515== by 0x4835B3: thread__find_addr_location (event.c:1300)
> ==24515== by 0x4B5342: add_callchain_ip (machine.c:1652)
> ==24515== by 0x4B5342: thread__resolve_callchain_sample (machine.c:1906)
> ==24515== by 0x4B9E7D: thread__resolve_callchain (machine.c:1958)
> ==24515== by 0x441B3E: process_event (builtin-script.c:795)
> ==24515== by 0x441B3E: process_sample_event (builtin-script.c:920)
> ==24515== by 0x4BEE29: perf_evlist__deliver_sample (session.c:1192)
> ==24515== by 0x4BEE29: machines__deliver_event (session.c:1229)
> ==24515== by 0x4BF770: perf_session__deliver_event (session.c:1286)
> ==24515== by 0x4BF770: ordered_events__deliver_event (session.c:114)
> ==24515== by 0x4C1D17: __ordered_events__flush (ordered-events.c:207)
> ==24515== by 0x4C1D17: ordered_events__flush.part.3 (ordered-events.c:274)
> ==24515== by 0x4BF44C: perf_session__process_user_event (session.c:1325)
> ==24515== by 0x4BF44C: perf_session__process_event (session.c:1451)
> ==24515== Address 0x807c6a0 is 0 bytes inside a block of size 4,096 alloc'd
> ==24515== at 0x4C29C0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==24515== by 0x4A5BCB: dso__load (symbol.c:1421)
> ==24515== by 0x4BB1AE: map__load (map.c:289)
> ==24515== by 0x4BB1AE: map__find_symbol (map.c:333)
> ==24515== by 0x4835B3: thread__find_addr_location (event.c:1300)
> ==24515== by 0x4B5342: add_callchain_ip (machine.c:1652)
> ==24515== by 0x4B5342: thread__resolve_callchain_sample (machine.c:1906)
> ==24515== by 0x4B9E7D: thread__resolve_callchain (machine.c:1958)
> ==24515== by 0x441B3E: process_event (builtin-script.c:795)
> ==24515== by 0x441B3E: process_sample_event (builtin-script.c:920)
> ==24515== by 0x4BEE29: perf_evlist__deliver_sample (session.c:1192)
> ==24515== by 0x4BEE29: machines__deliver_event (session.c:1229)
> ==24515== by 0x4BF770: perf_session__deliver_event (session.c:1286)
> ==24515== by 0x4BF770: ordered_events__deliver_event (session.c:114)
> ==24515== by 0x4C1D17: __ordered_events__flush (ordered-events.c:207)
> ==24515== by 0x4C1D17: ordered_events__flush.part.3 (ordered-events.c:274)
> ==24515== by 0x4BF44C: perf_session__process_user_event (session.c:1325)
> ==24515== by 0x4BF44C: perf_session__process_event (session.c:1451)
> ==24515== by 0x4C0EAC: __perf_session__process_events (session.c:1804)
> ==24515== by 0x4C0EAC: perf_session__process_events (session.c:1858)
The reason was a typo that passed global 'name' variable as the
is_regular_file argument instead dso->long_name.
Reported-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Fixes: 403567217d ("perf symbols: Do not read symbols/data from device files")
Link: http://lkml.kernel.org/r/1466772025-17471-2-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add few more triplets based on Fedora and Ubuntu binutils (cross tools).
Before applying patch on x86:
( Install binutils-powerpc64-linux-gnu.x86_64 )
$ perf report -i perf.data.powerpc --vmlinux vmlinux.powerpc \
--objdump powerpc64-linux-gnu-objdump
After applying patch on x86:
$ perf report -i perf.data.powerpc --vmlinux vmlinux.powerpc
I.e. it will find the right objdump from the environment data recorded
in the perf.data file + these triplets.
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Link: http://lkml.kernel.org/r/1466769240-12376-7-git-send-email-ravi.bangoria@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add quirk for context switch to save/restore the value of
MSR_LAST_BRANCH_FROM_x when LBR is enabled and there is potential for
kernel addresses to be in the lbr_from register.
To test this patch, use a perf tool and kernel with the patch
next in this series. That patch removes the work around that masked
the hw bug:
$ ./lbr_perf record --call-graph lbr -e cycles:k sleep 1
where lbr_perf is the patched perf tool, that allows to specify :k
on lbr mode. The above command will trigger a #GPF :
WARNING: CPU: 28 PID: 14096 at arch/x86/mm/extable.c:65 ex_handler_wrmsr_unsafe+0x70/0x80
unchecked MSR access error: WRMSR to 0x681 (tried to write 0x1fffffff81010794)
...
Call Trace:
[<ffffffff8167af49>] dump_stack+0x4d/0x63
[<ffffffff810b9b15>] __warn+0xe5/0x100
[<ffffffff810b9be9>] warn_slowpath_fmt+0x49/0x50
[<ffffffff810abb40>] ex_handler_wrmsr_unsafe+0x70/0x80
[<ffffffff810abc42>] fixup_exception+0x42/0x50
[<ffffffff81079d1a>] do_general_protection+0x8a/0x160
[<ffffffff81684ec2>] general_protection+0x22/0x30
[<ffffffff810101b9>] ? intel_pmu_lbr_sched_task+0xc9/0x380
[<ffffffff81009d7c>] intel_pmu_sched_task+0x3c/0x60
[<ffffffff81003a2b>] x86_pmu_sched_task+0x1b/0x20
[<ffffffff81192a5b>] perf_pmu_sched_task+0x6b/0xb0
[<ffffffff8119746d>] __perf_event_task_sched_in+0x7d/0x150
[<ffffffff810dd9dc>] finish_task_switch+0x15c/0x200
[<ffffffff8167f894>] __schedule+0x274/0x6cc
[<ffffffff8167fdd9>] schedule+0x39/0x90
[<ffffffff81675398>] exit_to_usermode_loop+0x39/0x89
[<ffffffff810028ce>] prepare_exit_to_usermode+0x2e/0x30
[<ffffffff81683c1b>] retint_user+0x8/0x10
Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Stephane Eranian <eranian@google.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1466533874-52003-5-git-send-email-davidcc@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Intel's SDM states that bits 61:62 in MSR_LAST_BRANCH_FROM_x are the
TSX flags for formats with LBR_TSX flags (i.e. LBR_FORMAT_EIP_EFLAGS2).
However, when the CPU has TSX support deactivated, bits 61:62 actually
behave as follows:
- For wrmsr(), bits 61:62 are considered part of the sign extension.
- When capturing branches, the LBR hw will always clear bits 61:62.
regardless of the sign extension.
Therefore, if:
1) LBR has TSX format.
2) CPU has no TSX support enabled.
... then any value passed to wrmsr() must be sign extended to 63 bits
and any value from rdmsr() must be converted to have a sign extension
of 61 bits, ignoring the values at TSX flags.
This bug was masked by the work-around to the Intel's CPU bug:
BJ94. "LBR May Contain Incorrect Information When Using FREEZE_LBRS_ON_PMI"
in Document Number: 324643-037US.
The aforementioned work-around uses hw flags to filter out all kernel
branches, limiting LBR callstack to user level execution only.
Since user addresses are not sign extended, they do not trigger the wrmsr()
bug in MSR_LAST_BRANCH_FROM_x when saved/restored at context switch.
To verify the hw bug:
$ perf record -b -e cycles sleep 1
$ rdmsr -p 0 0x680
0x1fffffffb0b9b0cc
$ wrmsr -p 0 0x680 0x1fffffffb0b9b0cc
write(): Input/output error
The quirk for LBR_FROM_ MSRs is required before calls to wrmsrl() and
after rdmsrl().
This patch introduces it for wrmsrl()'s done for testing LBR support.
Future patch in series adds the quirk for context switch, that would
be required if LBR callstack is to be enabled for ring 0.
Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Stephane Eranian <eranian@google.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1466533874-52003-3-git-send-email-davidcc@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull SCSI fixes from James Bottomley:
"Two straightforward fixes.
One is a concurrency issue only affecting SAS connected SATA drives,
but which could hang the storage subsystem if it triggers (because the
outstanding command count on error never goes back to zero) and the
other is a NO_TAG fallout from the switch to hostwide tags which
causes the system to crash on module insertion (we've checked
carefully and only the 53c700 family of drivers is vulnerable to this
issue)"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
53c700: fix BUG on untagged commands
scsi: fix race between simultaneous decrements of ->host_failed
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
New features:
- Add 'callindent' option to 'perf script -F', to indent the Intel PT
call stack, making this output more ftrace-like (Adrian Hunter, Andi Kleen)
User visible changes:
- Enlarge 'pid' column width, to cope with large pids (Jiri Olsa)
Infrastructure changes:
- Fix cross platform unwind (He Kuang)
- Make destructors accept NULL, behaving like free() (Arnaldo Carvalho de Melo)
- Remove reference to perl interpreted in the recently added 'perf script'
stackcollapse python script (Arnaldo Carvalho de Melo)
- Rename CLASS__for_each() macros to CLASS__for_each_entry(), to use the
list_for_each_entry() semantics, as most of these class specific loop helpers
are list_for_each_entry*() wrappers (Arnaldo Carvalho de Melo)
- Expose the hist_browser code, will be used with data structures other
than perf_evsel (Jiri Olsa)
- Refactor 'perf config' (Taeung Song)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull btrfs fixes part 2 from Chris Mason:
"This has one patch from Omar to bring iterate_shared back to btrfs.
We have a tree of work we queue up for directory items and it doesn't
lend itself well to shared access. While we're cleaning it up, Omar
has changed things to use an exclusive lock when there are delayed
items"
* 'for-linus-4.7-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix ->iterate_shared() by upgrading i_rwsem for delayed nodes
Pull btrfs fixes from Chris Mason:
"I have a two part pull this time because one of the patches Dave
Sterba collected needed to be against v4.7-rc2 or higher (we used
rc4). I try to make my for-linus-xx branch testable on top of the
last major so we can hand fixes to people on the list more easily, so
I've split this pull in two.
This first part has some fixes and two performance improvements that
we've been testing for some time.
Josef's two performance fixes are most notable. The transid tracking
patch makes a big improvement on pretty much every workload"
* 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: Force stripesize to the value of sectorsize
btrfs: fix disk_i_size update bug when fallocate() fails
Btrfs: fix error handling in map_private_extent_buffer
Btrfs: fix error return code in btrfs_init_test_fs()
Btrfs: don't do nocow check unless we have to
btrfs: fix deadlock in delayed_ref_async_start
Btrfs: track transid for delayed ref flushing
Pull sound fixes from Takashi Iwai:
"Again pretty calm weeks: we've had only a few trivial / stable
HD-audio fixes in addition to a possible race fix for snd-dummy driver
spotted by syzkaller"
* tag 'sound-4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: dummy: Fix a use-after-free at closing
ALSA: hda / realtek - add two more Thinkpad IDs (5050,5053) for tpt460 fixup
ALSA: hda - Fix the headset mic jack detection on Dell machine
ALSA: hda/tegra: iomem fixups for sparse warnings
ALSA: hdac_regmap - fix the register access for runtime PM
Pull x86 kprobe fix from Thomas Gleixner:
"A single fix clearing the TF bit when a fault is single stepped"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
kprobes/x86: Clear TF bit in fault on single-stepping
Pull scheduler fixes from Thomas Gleixner:
"A couple of scheduler fixes:
- force watchdog reset while processing sysrq-w
- fix a deadlock when enabling trace events in the scheduler
- fixes to the throttled next buddy logic
- fixes for the average accounting (missing serialization and
underflow handling)
- allow kernel threads for fallback to online but not active cpus"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/core: Allow kthreads to fall back to online && !active cpus
sched/fair: Do not announce throttled next buddy in dequeue_task_fair()
sched/fair: Initialize throttle_count for new task-groups lazily
sched/fair: Fix cfs_rq avg tracking underflow
kernel/sysrq, watchdog, sched/core: Reset watchdog on all CPUs while processing sysrq-w
sched/debug: Fix deadlock when enabling sched events
sched/fair: Fix post_init_entity_util_avg() serialization
Commit fe742fd4f9 ("Revert "btrfs: switch to ->iterate_shared()"")
backed out the conversion to ->iterate_shared() for Btrfs because the
delayed inode handling in btrfs_real_readdir() is racy. However, we can
still do readdir in parallel if there are no delayed nodes.
This is a temporary fix which upgrades the shared inode lock to an
exclusive lock only when we have delayed items until we come up with a
more complete solution. While we're here, rename the
btrfs_{get,put}_delayed_items functions to make it very clear that
they're just for readdir.
Tested with xfstests and by doing a parallel kernel build:
while make tinyconfig && make -j4 && git clean dqfx; do
:
done
along with a bunch of parallel finds in another shell:
while true; do
for ((i=0; i<4; i++)); do
find . >/dev/null &
done
wait
done
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Pull locking fix from Thomas Gleixner:
"A single fix to address a race in the static key logic"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/static_key: Fix concurrent static_key_slow_inc()
Pull irq fix from Thomas Gleixner:
"A single fix for the fallout from the conversion of MIPS GIC to irq
domains"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/mips-gic: Fix IRQs in gic_dev_domain
Pull powerpc fixes from Michael Ellerman:
"mm/radix (Aneesh Kumar K.V):
- Update to tlb functions ric argument
- Flush page walk cache when freeing page table
- Update Radix tree size as per ISA 3.0
mm/hash (Aneesh Kumar K.V):
- Use the correct PPP mask when updating HPTE
- Don't add memory coherence if cache inhibited is set
eeh (Gavin Shan):
- Fix invalid cached PE primary bus
bpf/jit (Naveen N. Rao):
- Disable classic BPF JIT on ppc64le
.. and fix faults caused by radix patching of SLB miss handler
(Michael Ellerman)"
* tag 'powerpc-4.7-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/bpf/jit: Disable classic BPF JIT on ppc64le
powerpc: Fix faults caused by radix patching of SLB miss handler
powerpc/eeh: Fix invalid cached PE primary bus
powerpc/mm/radix: Update Radix tree size as per ISA 3.0
powerpc/mm/hash: Don't add memory coherence if cache inhibited is set
powerpc/mm/hash: Use the correct PPP mask when updating HPTE
powerpc/mm/radix: Flush page walk cache when freeing page table
powerpc/mm/radix: Update to tlb functions ric argument
Commit b235beea9e ("Clarify naming of thread info/stack allocators")
breaks the build on some powerpc configs, where THREAD_SIZE < PAGE_SIZE:
kernel/fork.c:235:2: error: implicit declaration of function 'free_thread_stack'
kernel/fork.c:355:8: error: assignment from incompatible pointer type
stack = alloc_thread_stack_node(tsk, node);
^
Fix it by renaming free_stack() to free_thread_stack(), and updating the
return type of alloc_thread_stack_node().
Fixes: b235beea9e ("Clarify naming of thread info/stack allocators")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge misc fixes from Andrew Morton:
"Two weeks worth of fixes here"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (41 commits)
init/main.c: fix initcall_blacklisted on ia64, ppc64 and parisc64
autofs: don't get stuck in a loop if vfs_write() returns an error
mm/page_owner: avoid null pointer dereference
tools/vm/slabinfo: fix spelling mistake: "Ocurrences" -> "Occurrences"
fs/nilfs2: fix potential underflow in call to crc32_le
oom, suspend: fix oom_reaper vs. oom_killer_disable race
ocfs2: disable BUG assertions in reading blocks
mm, compaction: abort free scanner if split fails
mm: prevent KASAN false positives in kmemleak
mm/hugetlb: clear compound_mapcount when freeing gigantic pages
mm/swap.c: flush lru pvecs on compound page arrival
memcg: css_alloc should return an ERR_PTR value on error
memcg: mem_cgroup_migrate() may be called with irq disabled
hugetlb: fix nr_pmds accounting with shared page tables
Revert "mm: disable fault around on emulated access bit architecture"
Revert "mm: make faultaround produce old ptes"
mailmap: add Boris Brezillon's email
mailmap: add Antoine Tenart's email
mm, sl[au]b: add __GFP_ATOMIC to the GFP reclaim mask
mm: mempool: kasan: don't poot mempool objects in quarantine
...