Even when it is not used to actually reorder events, some of its fields
are used, like session->ordered_events->tool, to shorten function
signatures where tool, for instance, was being passed, as the tool is
needed for the ordered_events code, we need it there and might as well
use it for other perf_session needs.
This fixes a problem where 'perf script' had some condition that made
session->ordered_events not to be initialized even with its
script->tool ordered_events related flags asking for it to be, which
looks like another bug and needs to be investigated further.
Always initializing session->ordered_events at least leaves the current
assumptions in place, so do it now.
Reported-by: David Ahern <dsahern@gmail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Tested-by: David Ahern <dsahern@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-b1xxk0rwkz2a0gip1uufmjqg@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
On Broadwell INST_RETIRED.ALL cannot be used with any period
that doesn't have the lowest 6 bits cleared. And the period
should not be smaller than 128.
This is erratum BDM11 and BDM55:
http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/5th-gen-core-family-spec-update.pdf
BDM11: When using a period < 100; we may get incorrect PEBS/PMI
interrupts and/or an invalid counter state.
BDM55: When bit0-5 of the period are !0 we may get redundant PEBS
records on overflow.
Add a new callback to enforce this, and set it for Broadwell.
How does this handle the case when an app requests a specific
period with some of the bottom bits set?
Short answer:
Any useful instruction sampling period needs to be 4-6 orders
of magnitude larger than 128, as an PMI every 128 instructions
would instantly overwhelm the system and be throttled.
So the +-64 error from this is really small compared to the
period, much smaller than normal system jitter.
Long answer (by Peterz):
IFF we guarantee perf_event_attr::sample_period >= 128.
Suppose we start out with sample_period=192; then we'll set period_left
to 192, we'll end up with left = 128 (we truncate the lower bits). We
get an interrupt, find that period_left = 64 (>0 so we return 0 and
don't get an overflow handler), up that to 128. Then we trigger again,
at n=256. Then we find period_left = -64 (<=0 so we return 1 and do get
an overflow). We increment with sample_period so we get left = 128. We
fire again, at n=384, period_left = 0 (<=0 so we return 1 and get an
overflow). And on and on.
So while the individual interrupts are 'wrong' we get then with
interval=256,128 in exactly the right ratio to average out at 192. And
this works for everything >=128.
So the num_samples*fixed_period thing is still entirely correct +- 127,
which is good enough I'd say, as you already have that error anyhow.
So no need to 'fix' the tools, al we need to do is refuse to create
INST_RETIRED:ALL events with sample_period < 128.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
[ Updated comments and changelog a bit. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1424225886-18652-3-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add Broadwell support for Broadwell to perf.
The basic support is very similar to Haswell. We use the new cache
event list added for Haswell earlier. The only differences
are a few bits related to remote nodes. To avoid an extra,
mostly identical, table these are patched up in the initialization code.
The constraint list has one new event that needs to be handled over Haswell.
Includes code and testing from Kan Liang.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1424225886-18652-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Haswell offcore events are quite different from Sandy Bridge.
Add a new table to handle Haswell properly.
Note that the offcore bits listed in the SDM are not quite correct
(this is currently being fixed). An uptodate list of bits is
in the patch.
The basic setup is similar to Sandy Bridge. The prefetch columns
have been removed, as prefetch counting is not very reliable
on Haswell. One L1 event that is not in the event list anymore
has been also removed.
- data reads do not include code reads (comparable to earlier Sandy Bridge tables)
- data counts include speculative execution (except L1 write, dtlb, bpu)
- remote node access includes both remote memory, remote cache, remote mmio.
- prefetches are not included in the counts for consistency
(different from Sandy Bridge, which includes prefetches in the remote node)
Signed-off-by: Andi Kleen <ak@linux.intel.com>
[ Removed the HSM30 comments; we don't have them for SNB/IVB either. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1424225886-18652-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
User visible changes:
- Show the first event with an invalid filter (David Ahern, Arnaldo Carvalho de Melo)
- Fix garbage output when intermixing syscalls from different threads in 'perf trace' (Arnaldo Carvalho de Melo)
- Fix 'perf timechart' SIBGUS error on sparc64 (David Ahern)
Infrastructure changes:
- Set JOBS based on CPU or processor, making it work on SPARC, where
/proc/cpuinfo has "CPU", not "processor" (David Ahern)
- Zero should not be considered "not found" in libtraceevent's eval_flag() (Steven Rostedt)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Guilherme Cox found that:
There is, however, a potential bug if there is an item with code zero
that is not the first one in the symbol list, since eval_flag(..)
returns 0 when it doesn't find anything.
That is, if you have the following enums:
enum {
FOO_START = 0,
FOO_GO = 1,
FOO_END = 2
}
and then have:
__print_symbolic(foo, FOO_GO, "go", FOO_START, "start",
FOO_END, "end")
If none of the enums are known to pevent, then eval_flag() will return
zero, and it will match it to the first item in the list, which would be
FOO_GO, which is not zero.
Luckily, in most cases, the first element would be zero, and the parsing
would match out of sheer luck.
Reported-by: Guilherme Cox <cox@computer.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20150324145813.0bfe95ba@gandalf.local.home
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
commit e596663ebb
Author: Arnaldo Carvalho de Melo <acme@redhat.com>
Date: Fri Feb 13 13:22:21 2015 -0300
perf trace: Handle multiple threads better wrt syscalls being intermixed
Introduced a bug where it considered the number of bytes output directly
to the output file when formatting the syscall entry buffer that is
stored to be finally printed at syscall exit, ending up leaving garbage
at the start of syscalls that appeared while another syscall was being
processed, in another thread. Fix it.
Example of garbage in the output before this patch:
4280.102 ( 0.000 ms): lsmd/763 ... [continued]: select()) = 0 Timeout
4280.107 (275.250 ms): tuned/852 select(tvp: 0x7f41f7ffde50 ) ...
4280.109 ( 0.002 ms): lsmd/763 Xl�� ) = -10
4639.197 ( 0.000 ms): systemd-journa/542 ... [continued]: epoll_wait()) = 1
4639.202 (359.088 ms): lsmd/763 select(n: 6, inp: 0x7ffff21daad0, tvp: 0x7ffff21daac0) ...
4639.207 ( 0.005 ms): systemd-journa/542 Hn�� ) = 106
4639.221 ( 0.002 ms): systemd-journa/542 uname(name: 0x7ffdbaed8e00) = 0
4639.271 ( 0.008 ms): systemd-journa/542 ftruncate(fd: 11</run/log/journal/60cd52417cf440a4a80107518bbd3c20/system.journal>, length: 50331648) = 0
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-9ckfe8mvsedgkg6y80gz1ul8@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Use of a bad filter currently generates the message:
Error: failed to set filter with 22 (Invalid argument)
Add the event name to make it clear to which event the filter
failed to apply:
Error: Failed to set filter "foo" on event sched:sg_lb_stats: 22: Invalid argument
To test it use something like:
# perf record -e sched:sched_switch -e sched:*fork --filter parent_pid==1 -e sched:*wait* --filter bla usleep 1
Error: failed to set filter "bla" on event sched:sched_stat_iowait with 22 (Invalid argument)
#
Based-on-a-patch-by: David Ahern <dsahern@gmail.com>
Acked-by: David Ahern <dsahern@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-d7gq2fjvaecozp9o2i0siifu@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf timechart -T on sparc64 is terminating due to SIGBUS. Backtrace:
Program received signal SIGBUS, Bus error.
0x0000000000173d7c in perf_evsel__intval (evsel=<value optimized out>, sample=0x7feffffda28, name=0x289b28 "prev_state")
at util/evsel.c:1918
1918 util/evsel.c: No such file or directory.
in util/evsel.c
Missing separate debuginfos, use: debuginfo-install audit-libs-2.3.7-1.0.1.el6.sparc64 bzip2-libs-1.0.5-7.el6_0.sparc64 elfutils-libelf-0.155-2.0.3.el6.sparc64 elfutils-libs-0.155-2.0.3.el6.sparc64 glibc-2.12-1.132.0.8.el6_5.sparc64 numactl-2.0.7-8.el6.sparc64 python-libs-2.6.6-52.0.2.el6.sparc64 slang-2.2.1-1.el6.sparc64 xz-libs-4.999.9-0.3.beta.20091007git.el6.sparc64 zlib-1.2.3-29.el6.sparc64
(gdb) bt
0 0x0000000000173d7c in perf_evsel__intval (evsel=<value optimized out>, sample=0x7feffffda28,
name=0x289b28 "prev_state") at util/evsel.c:1918
1 0x0000000000123b94 in process_sample_sched_switch (tchart=0x7feffffe040, evsel=0x4ca850, sample=0x7feffffda28,
backtrace=0xc39010 "") at builtin-timechart.c:627
2 0x0000000000122828 in process_sample_event (tool=0x7feffffe040, event=<value optimized out>, sample=0x7feffffda28,
evsel=0x4ca850, machine=0x4c9c88) at builtin-timechart.c:569
Another extended load on unaligned pointer. As before fix by copying to
a temporary variable using memcpy.
Signed-off-by: David Ahern <david.ahern@oracle.com>
Link: http://lkml.kernel.org/r/1427228049-51893-1-git-send-email-david.ahern@oracle.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
User visible changes:
- Improve support of compressed kernel modules (Jiri Olsa)
- Add --kallsyms option to 'perf diff' (David Ahern)
- Add pid/tid filtering to 'report' and 'script' commands (David Ahern)
- Add support for __print_array() in libtraceevent (Javi Merino)
- Save DSO loading errno to better report errors (Arnaldo Carvalho de Melo)
- Fix 'probe' to get ummapped symbol address on kernel (Masami Hiramatsu)
- Print big numbers using thousands' group in 'kmem' (Namhyung Kim)
- Remove (null) value of "Sort order" for perf mem report (Yunlong Song)
Infrastructure changes:
- Handle NULL comm name in libtracevent (Josef Bacik)
- Libtraceevent synchronization with trace-cmd repo (Steven Rostedt)
- Work around lack of sched_getcpu() in glibc < 2.6. (Vinson Lee)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When a plugin option is defined, by default it is a boolean (true or false).
If the option is something else, then it needs to set its "value" field to
a default string other than NULL (can be just "").
If the value is not set then the option is considered boolean, and the
updating of the option value will be handled accordingly.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20150324135923.308372986@goodmis.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
There is a pevent_data_comm_from_pid() that returns the cmdline stored for
a given pid in order for users to map pids to comms, but there's no method
to convert a comm back to a pid. This is useful for filters that specify
a comm instead of a PID (it's faster than searching each individual event).
Add a way to retrieve a comm from a pid. Since there can be more than one
pid associated to a comm, it returns a data structure that lets the user
iterate over all the saved comms for a given pid.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20150324135923.001103479@goodmis.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The pevent->trace_clock should not be a direct pointer to what was
given. It should be copied and freed.
Note, valgrind pointed this out when a caller passed in a pointer that
needed to be freed and it never was. Ideally, pevent should copy it
(which this change does), and free the copy. It's up to the caller to
free the clock string passed in.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20150324135922.695906738@goodmis.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Before, when some problem happened while trying to load the kernel
symtab, 'perf top' would show:
┌─Warning:───────────────────────────┐
│The vmlinux file can't be used. │
│Kernel samples will not be resolved.│
│ │
│ │
│Press any key... │
└────────────────────────────────────┘
Now, it reports:
# perf top --vmlinux /dev/null
┌─Warning:───────────────────────────────────────────┐
│The /tmp/passwd file can't be used: Invalid ELF file│
│Kernel samples will not be resolved. │
│ │
│ │
│Press any key... │
└────────────────────────────────────────────────────┘
This is possible because we now register the reason for not being able
to load the symtab in the dso->load_errno member, and provide a
dso__strerror_load() routine to format this error into a strerror like
string with a short reason for the error while loading.
That can be just forwarding the dso__strerror_load() call to
strerror_r(), or, for a separate errno range providing a custom message.
Reported-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-u5rb5uq63xqhkfb8uv2lxd5u@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fix to get correctly unmapped symbol address on kernel. This allows us
to probe on syscall symbols which are aliases of SyS_ functions with
using debuginfo.
Without this fix:
----
# ./perf probe -a sys_write
Failed to find debug information for address 3b0100
Probe point 'sys_write' not found.
Error: Failed to add events.
----
The address 0x3b0100 is a mapped address, and not usable
in debuginfo.
With this fix:
----
# ./perf probe -a sys_write
Added new event:
probe:sys_write (on sys_write)
You can now use it in all perf tools, such as:
perf record -e probe:sys_write -aR sleep 1
----
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20150322114022.32639.19096.stgit@localhost.localdomain
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When '--sort' is not set, 'perf mem report" will print a null pointer as
the output value of sort order, so fix it.
Example:
Before this patch:
$ perf mem report
# To display the perf.data header info, please use --header/--header-only options.
#
# Samples: 18 of event 'cpu/mem-loads/pp'
# Total weight : 188
# Sort order : (null)
#
...
After this patch:
$ perf mem report
# To display the perf.data header info, please use --header/--header-only options.
#
# Samples: 18 of event 'cpu/mem-loads/pp'
# Total weight : 188
# Sort order : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked
#
...
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1427082605-12881-1-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Vince reported a watchdog lockup like:
[<ffffffff8115e114>] perf_tp_event+0xc4/0x210
[<ffffffff810b4f8a>] perf_trace_lock+0x12a/0x160
[<ffffffff810b7f10>] lock_release+0x130/0x260
[<ffffffff816c7474>] _raw_spin_unlock_irqrestore+0x24/0x40
[<ffffffff8107bb4d>] do_send_sig_info+0x5d/0x80
[<ffffffff811f69df>] send_sigio_to_task+0x12f/0x1a0
[<ffffffff811f71ce>] send_sigio+0xae/0x100
[<ffffffff811f72b7>] kill_fasync+0x97/0xf0
[<ffffffff8115d0b4>] perf_event_wakeup+0xd4/0xf0
[<ffffffff8115d103>] perf_pending_event+0x33/0x60
[<ffffffff8114e3fc>] irq_work_run_list+0x4c/0x80
[<ffffffff8114e448>] irq_work_run+0x18/0x40
[<ffffffff810196af>] smp_trace_irq_work_interrupt+0x3f/0xc0
[<ffffffff816c99bd>] trace_irq_work_interrupt+0x6d/0x80
Which is caused by an irq_work generating new irq_work and therefore
not allowing forward progress.
This happens because processing the perf irq_work triggers another
perf event (tracepoint stuff) which in turn generates an irq_work ad
infinitum.
Avoid this by raising the recursion counter in the irq_work -- which
effectively disables all software events (including tracepoints) from
actually triggering again.
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20150219170311.GH21418@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull bugfix for md from Neil Brown:
"One fix for md in 4.0-rc4
Regression in recent patch causes crash on error path"
* tag 'md/4.0-rc4-fix' of git://neil.brown.name/md:
md: fix problems with freeing private data after ->run failure.