The perf tools have a symbol resolver that resolves kernel symbols using
either kallsyms or ELF symtabs. They also use libtraceevent to format
trace event fields, including via subsystem-specific plugins such as the
"timer" one.
To resolve fields like the "function" field of "timer:hrtimer_start",
libtraceevent needs a way to map the field's value to a function name
and address.
This patch provides a way for tools that already have symbol resolving
facilities to ask libtraceevent to use them when it needs to resolve
kernel symbols.
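The idea above can be sketched in plain C as a callback that the tool
registers and the event-formatting side calls when it meets an address.
This is only an illustration under stated assumptions: every type and
function name below (tool_sym, event_parser, parser_set_function_resolver,
and so on) is hypothetical and is not the libtraceevent API.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical symbol table owned by the tool, not by the library. */
struct tool_sym {
	unsigned long long start;
	unsigned long long end;		/* exclusive */
	const char *name;
};

struct tool_symtab {
	const struct tool_sym *syms;
	size_t nr;
};

/*
 * Hypothetical callback type: return the name of the function covering
 * *addrp and rewrite *addrp to that function's start address, or return
 * NULL when the address is unknown.
 */
typedef const char *(*func_resolver_t)(void *priv, unsigned long long *addrp);

/* Hypothetical stand-in for the library-side handle. */
struct event_parser {
	func_resolver_t resolver;
	void *priv;
};

/* Tool asks the library to use the tool's own resolver. */
static void parser_set_function_resolver(struct event_parser *p,
					 func_resolver_t fn, void *priv)
{
	p->resolver = fn;
	p->priv = priv;
}

/* Tool-side resolver: linear search over the tool's symbol table. */
static const char *tool_resolve(void *priv, unsigned long long *addrp)
{
	const struct tool_symtab *tab = priv;
	size_t i;

	for (i = 0; i < tab->nr; i++) {
		if (*addrp >= tab->syms[i].start && *addrp < tab->syms[i].end) {
			*addrp = tab->syms[i].start;
			return tab->syms[i].name;
		}
	}
	return NULL;
}

/* Library side: format a "function" field as name+offset, or raw hex. */
static void parser_print_function(struct event_parser *p,
				  unsigned long long addr,
				  char *buf, size_t len)
{
	unsigned long long start = addr;
	const char *name = p->resolver ? p->resolver(p->priv, &start) : NULL;

	if (name)
		snprintf(buf, len, "%s+0x%llx", name, addr - start);
	else
		snprintf(buf, len, "0x%llx", addr);
}
```

With no resolver registered the sketch falls back to printing the raw
address, so the formatting side does not need a symbol table of its own.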
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: David Ahern <dsahern@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-fdx1fazols17w5py26ia3bwh@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
New features:
- Allow filtering out of perf's PID via 'perf record --exclude-perf'. (Wang Nan)
- 'perf trace' now supports syscall groups, like strace, i.e:
$ trace -e file touch file
Will expand 'file' into multiple, file related, syscalls. More work needed to
add extra groups for other syscall groups, and also to complement what was
added for the 'file' group, included as a proof of concept. (Arnaldo Carvalho de Melo)
- Add lock_pi stresser to 'perf bench futex', to test the kernel code
related to FUTEX_(UN)LOCK_PI. (Davidlohr Bueso)
User visible fixes:
- Apply --filter to all events in a glob matching, not just the last one. (Wang Nan)
Documentation changes:
- Document setting '-e pmu/period=N/' in the 'perf record' man page. (Kan Liang)
Infrastructure changes:
- 'perf probe' code simplifications and movements to separate files. (Masami Hiramatsu)
- Fix makefile generation under 'dash'. (Sergei Trofimovich)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add a way to measure the low-level kernel implementation of FUTEX_LOCK_PI
and FUTEX_UNLOCK_PI.
The program comes in two flavors:
(i) single futex (default): all threads contend on the same uaddr. For the
sake of the benchmark, we call into kernel space even when the lock is
uncontended. The kernel will set it to the TID; any waiters that come in and
contend for the pi futex will be handled accordingly by the kernel.
(ii) -M option for multiple futexes: each thread deals with its own futex. This
is a trivial scenario and only measures kernel handling of the 0->TID transition.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Mel Gorman <mgorman@suse.de>
Link: http://lkml.kernel.org/r/1436259353.12255.78.camel@stgolabs.net
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch allows 'perf record' to exclude events issued by perf itself
via the '--exclude-perf' option.
Before this patch, when doing something like:
# perf record -a -e syscalls:sys_enter_write <cmd>
One could easily get a result like this:
# /tmp/perf report --stdio
...
# Overhead Command Shared Object Symbol
# ........ ....... .................. ....................
#
99.99% perf libpthread-2.18.so [.] __write_nocancel
0.01% ls libc-2.18.so [.] write
0.01% sshd libc-2.18.so [.] write
...
Where most events are generated by perf itself.
A shell trick can be done to filter perf itself out:
# cat << EOF > ./tmp
> #!/bin/sh
> exec perf record -e ... --filter="common_pid != \$\$" -a sleep 10
> EOF
# chmod a+x ./tmp
# ./tmp
However, doing so is user unfriendly.
This patch extracts the evsel iteration framework introduced by patch 'perf
record: Apply filter to all events in a glob matching' into
foreach_evsel_in_last_glob(), and makes the exclude_perf() function append
a new filter expression to each evsel selected by a '-e' selector.
To avoid losing filters if the user passes '--filter' after '--exclude-perf',
this patch uses perf_evsel__append_filter() in both cases, instead of
perf_evsel__set_filter(), which removes the old filter. As a side effect, it
is now possible to use multiple '--filter' options for one selector. They
are combined with '&&'.
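A minimal sketch of the append-instead-of-set behaviour described above,
assuming a hypothetical 'struct sketch_evsel' that carries only a filter
string. The helper name and the parenthesised join format are assumptions
of this sketch, not a copy of perf_evsel__append_filter():

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an event selector; only the filter matters. */
struct sketch_evsel {
	char *filter;	/* heap-allocated, NULL when no filter is set */
};

/*
 * Append new_filter to whatever filter is already set, joining the two
 * expressions with "&&". Returns 0 on success, -1 on allocation failure
 * (in which case the old filter is left untouched).
 */
static int sketch_evsel_append_filter(struct sketch_evsel *evsel,
				      const char *new_filter)
{
	char *combined;
	size_t len;

	if (evsel->filter == NULL) {
		/* First filter: just take a private copy. */
		len = strlen(new_filter) + 1;
		combined = malloc(len);
		if (combined == NULL)
			return -1;
		memcpy(combined, new_filter, len);
		evsel->filter = combined;
		return 0;
	}

	/* sizeof() covers the fixed "() && ()" characters plus the NUL. */
	len = strlen(evsel->filter) + strlen(new_filter) + sizeof("() && ()");
	combined = malloc(len);
	if (combined == NULL)
		return -1;

	snprintf(combined, len, "(%s) && (%s)", evsel->filter, new_filter);
	free(evsel->filter);
	evsel->filter = combined;
	return 0;
}
```

Because nothing is ever overwritten, the order in which '--exclude-perf'
and '--filter' appear on the command line no longer decides which
expression survives.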
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1436513770-8896-2-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull s390 fix from Martin Schwidefsky:
"Fast path fix for the thread_struct breakage"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: adapt entry.S to the move of thread_struct
Pull AVR32 update from Hans-Christian Egtvedt.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/egtvedt/linux-avr32:
AVR32/time: Migrate to new 'set-state' interface
There is an old problem in how perf applies filters, first reported in
Sep. 2014 at https://lkml.org/lkml/2014/9/9/944: when multiple events are
passed in a glob matching expression on the command line and '--filter' is
added after them, the filter is applied only to the last one.
For example:
# dd if=/dev/zero of=/dev/null &
[1] 464
# perf record -a -e 'syscalls:sys_*_read' --filter 'common_pid != 464' sleep 0.1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.239 MB perf.data (2094 samples) ]
# perf report --stdio | tee
...
# Samples: 2K of event 'syscalls:sys_enter_read'
# Event count (approx.): 2092
...
# Samples: 2 of event 'syscalls:sys_exit_read'
# Event count (approx.): 2
...
In this example, the filter is only applied to 'syscalls:sys_exit_read', and
there's no way to set a filter for 'syscalls:sys_enter_read'.
This patch adds a 'cmdline_group_boundary' to 'struct evsel', and
applies the filter to all events between two boundary marks.
After applying this patch:
# perf record -a -e 'syscalls:sys_*_read' --filter 'common_pid != 464' sleep 0.1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.031 MB perf.data (3 samples) ]
# perf report --stdio | tee
...
# Samples: 1 of event 'syscalls:sys_enter_read'
# Event count (approx.): 1
...
# Samples: 2 of event 'syscalls:sys_exit_read'
# Event count (approx.): 2
...
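The boundary-mark approach described above can be sketched as a backward
walk over the events in command-line order. This is an illustration only:
'struct glob_evsel' and apply_filter_to_last_glob() are hypothetical, and
placing the mark on the first event of each '-e' term is an assumption of
this sketch rather than a statement about where perf puts it.

```c
#include <stddef.h>

/* Hypothetical, simplified event selector. */
struct glob_evsel {
	const char *name;
	const char *filter;		/* NULL when no filter is set */
	int cmdline_group_boundary;	/* nonzero on the first event of a '-e' term */
};

/*
 * Apply 'filter' to every event produced by the most recent '-e' term:
 * walk backwards from the last event and stop after the event that
 * carries the boundary mark. Returns how many events got the filter.
 */
static size_t apply_filter_to_last_glob(struct glob_evsel *evsels, size_t nr,
					const char *filter)
{
	size_t applied = 0;
	size_t i = nr;

	while (i > 0) {
		i--;
		evsels[i].filter = filter;
		applied++;
		if (evsels[i].cmdline_group_boundary)
			break;
	}
	return applied;
}
```

With 'syscalls:sys_*_read' expanding into two events, both of them receive
the filter, while events from an earlier '-e' term are left alone.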
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Reported-by: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1436513770-8896-1-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
So, if we have an strlist equal to:
"file,close"
And we call it as:
struct strlist_config config = { .dirname = "~/strace/groups", };
struct strlist *slist = strlist__new("file,close", &config);
And we have:
$ cat ~/strace/groups/file
access
open
openat
statfs
Then the resulting strlist will have these contents:
[ "access", "open", "openat", "statfs", "close" ]
This will be used to implement strace syscall groups in 'perf trace',
but can be used in some other tool, thus being implemented in 'strlist'.
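The expansion described above can be sketched as follows: each
comma-separated entry that names a readable file under the configured
directory is replaced by that file's lines, and any other entry is kept
literally. All names here (sketch_strlist, sketch_strlist_parse, the
fixed-size limits) are hypothetical and unrelated to the real 'strlist'
code, and the sketch expects an already expanded directory path, since
fopen() does not expand "~".

```c
#include <stdio.h>
#include <string.h>

#define SKETCH_MAX_ENTRIES 64
#define SKETCH_MAX_LEN     64

/* Hypothetical fixed-size string list, enough for an illustration. */
struct sketch_strlist {
	char entries[SKETCH_MAX_ENTRIES][SKETCH_MAX_LEN];
	size_t nr;
};

static int sketch_add(struct sketch_strlist *sl, const char *s)
{
	if (sl->nr == SKETCH_MAX_ENTRIES || strlen(s) >= SKETCH_MAX_LEN)
		return -1;
	strcpy(sl->entries[sl->nr++], s);
	return 0;
}

/*
 * If dirname/name can be opened, add one entry per non-empty line and
 * return 1. Return 0 when there is no such file, -1 when the list is full.
 */
static int sketch_add_from_file(struct sketch_strlist *sl,
				const char *dirname, const char *name)
{
	char path[512], line[SKETCH_MAX_LEN];
	FILE *fp;
	int ret = 1;

	if (dirname == NULL)
		return 0;
	if (snprintf(path, sizeof(path), "%s/%s", dirname, name) >= (int)sizeof(path))
		return 0;
	fp = fopen(path, "r");
	if (fp == NULL)
		return 0;

	while (fgets(line, sizeof(line), fp)) {
		line[strcspn(line, "\n")] = '\0';	/* strip the newline */
		if (line[0] != '\0' && sketch_add(sl, line) < 0) {
			ret = -1;
			break;
		}
	}
	fclose(fp);
	return ret;
}

/* Parse "a,b,c", expanding entries that name a file under dirname. */
static int sketch_strlist_parse(struct sketch_strlist *sl, const char *list,
				const char *dirname)
{
	char copy[512], *tok;

	sl->nr = 0;
	if (strlen(list) >= sizeof(copy))
		return -1;
	strcpy(copy, list);

	for (tok = strtok(copy, ","); tok; tok = strtok(NULL, ",")) {
		int ret = sketch_add_from_file(sl, dirname, tok);

		if (ret < 0 || (ret == 0 && sketch_add(sl, tok) < 0))
			return -1;
	}
	return 0;
}
```

Keeping the file lookup behind a directory setting means callers that pass
no directory get the plain comma-separated behaviour unchanged.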
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-wi6l6qtomqlywwr6005jvs05@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
git commit 0c8c0f03e3
"x86/fpu, sched: Dynamically allocate 'struct fpu'"
moved the thread_struct to the end of the task_struct.
This causes some of the offsets used in entry.S to overflow their
instruction operand field. To fix this use aghi to create a
dedicated pointer for the thread_struct.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Migrate the avr32 driver to the new 'set-state' interface provided by the
clockevents core; the earlier 'set-mode' interface is now marked
obsolete.
This also enables us to implement callbacks for new states of clockevent
devices, for example: ONESHOT_STOPPED.
We want to call cpu_idle_poll_ctrl() in shutdown only if we were in
oneshot or resume state earlier. Create another variable to save this
information and check that in shutdown callback.
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Pull SCSI fixes from James Bottomley:
"Two fairly simple fixes: one is a change that causes us to have a very
low queue depth leading to performance issues and the other is a null
deref occasionally in tapes thanks to use after put"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: fix host max depth checking for the 'queue_depth' sysfs interface
st: null pointer dereference panic caused by use after kref_put by st_open
Pull MIPS fixes from Ralf Baechle:
"Another round of MIPS fixes for 4.2.
Things are looking quite decent at this stage but the recent work on
the FPU support took its toll:
- fix an incorrect overly restrictive ifdef
- select O32 64-bit FP support for O32 binary compatibility
- remove workarounds for Sibyte SB1250 Pass1 parts. These parts are rare, so
fixing the workarounds is not worth the effort.
- patch up an outdated and now incorrect comment"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
MIPS: fpu.h: Allow 64-bit FPU on a 64-bit MIPS R6 CPU
MIPS: SB1: Remove support for Pass 1 parts.
MIPS: Require O32 FP64 support for MIPS64 with O32 compat
MIPS: asm-offset.c: Patch up various comments refering to the old filename.
Pull parisc fix from Helge Deller:
"A memory leak fix from Christophe Jaillet which was introduced with
kernel 4.0 and which leads to kernel crashes on parisc after 1-3 days"
* 'parisc-4.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: mm: Fix a memory leak related to pmd not attached to the pgd
Pull ARM SoC fixes from Olof Johansson:
"By far most of the fixes here are updates to DTS files to deal with
some mostly minor bugs.
There's also a fix to deal with non-PM kernel configs on i.MX, a
regression fix for ethernet on PXA platforms and a dependency fix for
OMAP"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: keystone: dts: rename pcie nodes to help override status
ARM: keystone: dts: fix dt bindings for PCIe
ARM: pxa: fix dm9000 platform data regression
ARM: dts: Correct audio input route & set mic bias for am335x-pepper
ARM: OMAP2+: Add HAVE_ARM_SCU for AM43XX
MAINTAINERS: digicolor: add dts files
ARM: ux500: fix MMC/SD card regression
ARM: ux500: define serial port aliases
ARM: dts: OMAP5: Add #iommu-cells property to IOMMUs
ARM: dts: OMAP4: Add #iommu-cells property to IOMMUs
ARM: dts: Fix frequency scaling on Gumstix Pepper
ARM: dts: configure regulators for Gumstix Pepper
ARM: dts: omap3: overo: Update LCD panel names
ARM: dts: cros-ec-keyboard: Add support for some Japanese keys
ARM: imx6: gpc: always enable PU domain if CONFIG_PM is not set
ARM: dts: imx53-qsb: fix TVE entry
ARM: dts: mx23: fix iio-hwmon support
ARM: dts: imx27: Adjust the GPT compatible string
ARM: socfpga: dts: Fix entries order
ARM: socfpga: dts: Fix adxl34x formating and compatible string
Commit 0e0da48dee ("parisc: mm: don't count preallocated pmds")
introduced a memory leak.
After this commit, the 'return' statement in pmd_free is executed in all
cases, even for pmds that are not attached to the pgd. So 'free_pages'
can never be called anymore, leading to a memory leak.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # v4.0+
Signed-off-by: Helge Deller <deller@gmx.de>
Merge "pxa fixes for v4.2" from Robert Jarzmik:
ARM: pxa: fixes for v4.2-rc2
This single fix reenables ethernet cards for several pxa boards,
broken by regulator addition to dm9000 driver.
* tag 'pxa-fixes-v4.2-rc2' of https://github.com/rjarzmik/linux:
ARM: pxa: fix dm9000 platform data regression
Pull ARM fixes from Russell King:
"A small set of ARM fixes for -rc3, most of them not far off
one-liners, with the exception of fixing the V7 cache invalidation for
incoming SMP processors which was causing problems for SoCFPGA
devices"
* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
ARM: fix __virt_to_idmap build error on !MMU
ARM: invalidate L1 before enabling coherency
ARM: 8404/1: dma-mapping: fix off-by-one error in bitmap size check
ARM: 8402/1: perf: Don't use of_node after putting it
ARM: 8400/1: use virt_to_idmap to get phys_reset address
Pull x86 fixes from Ingo Molnar:
"Two families of fixes:
- Fix an FPU context related boot crash on newer x86 hardware with
larger context sizes than what most people test. To fix this
without ugly kludges or extensive reverts we had to touch core task
allocator, to allow x86 to determine the task size dynamically, at
boot time.
I've tested it on a number of x86 platforms, and I cross-built it
to a handful of architectures:
(warns) (warns)
testing x86-64: -git: pass ( 0), -tip: pass ( 0)
testing x86-32: -git: pass ( 0), -tip: pass ( 0)
testing arm: -git: pass ( 1359), -tip: pass ( 1359)
testing cris: -git: pass ( 1031), -tip: pass ( 1031)
testing m32r: -git: pass ( 1135), -tip: pass ( 1135)
testing m68k: -git: pass ( 1471), -tip: pass ( 1471)
testing mips: -git: pass ( 1162), -tip: pass ( 1162)
testing mn10300: -git: pass ( 1058), -tip: pass ( 1058)
testing parisc: -git: pass ( 1846), -tip: pass ( 1846)
testing sparc: -git: pass ( 1185), -tip: pass ( 1185)
... so I hope the cross-arch impact is 'none', as intended.
(by Dave Hansen)
- Fix various NMI handling related bugs unearthed by the big asm code
rewrite and generally make the NMI code more robust and more
maintainable while at it. These changes are a bit late in the
cycle, I hope they are still acceptable.
(by Andy Lutomirski)"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fpu, sched: Introduce CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT and use it on x86
x86/fpu, sched: Dynamically allocate 'struct fpu'
x86/entry/64, x86/nmi/64: Add CONFIG_DEBUG_ENTRY NMI testing code
x86/nmi/64: Make the "NMI executing" variable more consistent
x86/nmi/64: Minor asm simplification
x86/nmi/64: Use DF to avoid userspace RSP confusing nested NMI detection
x86/nmi/64: Reorder nested NMI checks
x86/nmi/64: Improve nested NMI comments
x86/nmi/64: Switch stacks on userspace NMI entry
x86/nmi/64: Remove asm code that saves CR2
x86/nmi: Enable nested do_nmi() handling for 64-bit kernels
Pull timer fix from Ingo Molnar:
"Fix for a misplaced export that can cause build failures in certain
(rare) Kconfig situations"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick: Move the export of tick_broadcast_oneshot_control to the proper place
Pull scheduler fix from Ingo Molnar:
"A oneliner rq throttling fix"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Test list head instead of list entry in throttle_cfs_rq()
Pull perf fixes from Ingo Molnar:
"Mostly tooling fixes, plus a static key fix fixing /sys/devices/cpu/rdpmc"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf tools: Really allow to specify custom CC, AR or LD
perf auxtrace: Fix misplaced check for HAVE_SYNC_COMPARE_AND_SWAP_SUPPORT
perf hists browser: Take the --comm, --dsos, etc filters into account
perf symbols: Store if there is a filter in place
x86, perf: Fix static_key bug in load_mm_cr4()
tools: Copy lib/hweight.c from the kernel sources
perf tools: Fix the detached tarball wrt rbtree copy
perf thread_map: Fix the sizeof() calculation for map entries
tools lib: Improve clean target
perf stat: Fix shadow declaration of close
perf tools: Fix lockup using 32-bit compat vdso
Pull irq fixes from Ingo Molnar:
"Misc irq fixes:
- two driver fixes
- a Xen regression fix
- a nested irq thread crash fix"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gicv3-its: Fix mapping of LPIs to collections
genirq: Prevent resend to interrupts marked IRQ_NESTED_THREAD
genirq: Revert sparse irq locking around __cpu_up() and move it to x86 for now
gpio/davinci: Fix race in installing chained irq handler
Merge fixes from Andrew Morton:
"25 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (25 commits)
lib/decompress: set the compressor name to NULL on error
mm/cma_debug: correct size input to bitmap function
mm/cma_debug: fix debugging alloc/free interface
mm/page_owner: set correct gfp_mask on page_owner
mm/page_owner: fix possible access violation
fsnotify: fix oops in fsnotify_clear_marks_by_group_flags()
/proc/$PID/cmdline: fixup empty ARGV case
dma-debug: skip debug_dma_assert_idle() when disabled
hexdump: fix for non-aligned buffers
checkpatch: fix long line messages about patch context
mm: clean up per architecture MM hook header files
MAINTAINERS: uclinux-h8-devel is moderated for non-subscribers
mailmap: update Sudeep Holla's email id
Update Viresh Kumar's email address
mm, meminit: suppress unused memory variable warning
configfs: fix kernel infoleak through user-controlled format string
include, lib: add __printf attributes to several function prototypes
s390/hugetlb: add hugepages_supported define
mm: hugetlb: allow hugepages_supported to be architecture specific
revert "s390/mm: make hugepages_supported a boot time decision"
...
Pull btrfs fixes from Chris Mason:
"These are all from Filipe, and cover a few problems we've had reported
on the list recently (along with ones he found on his own)"
* 'for-linus-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix file corruption after cloning inline extents
Btrfs: fix order by which delayed references are run
Btrfs: fix list transaction->pending_ordered corruption
Btrfs: fix memory leak in the extent_same ioctl
Btrfs: fix shrinking truncate when the no_holes feature is enabled
Pull rtc fixes from Alexandre Belloni:
"A few fixes for the RTC subsystem for 4.2.
The mt6397 driver was introduced in 4.2 so it is worth fixing before
the final release. I thought the compilation warning for armada38x was
fixed by akpm in commit f98b733e93 ("rtc-armada38x.c: remove unused
local `flags'") but he actually missed some occurrences of the
variable. Since I received 4 patches for that, I think we can
include it now.
Summary:
- fix mt6397 wakealarm creation
- remove a compilation warning for armada38x that was forgotten"
* tag 'rtc-v4.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
rtc: armada38x: Remove unused variable from armada38x_rtc_set_time()
rtc: mt6397: enable wakeup before registering rtc device
Pull device mapper fixes from Mike Snitzer:
- revert a request-based DM core change that caused IO latency to
increase and adversely impact both throughput and system load
- fix for a use after free bug in DM core's device cleanup
- a couple DM btree removal fixes (used by dm-thinp)
- a DM thinp fix for order-5 allocation failure
- a DM thinp fix to not degrade to read-only metadata mode when in
out-of-data-space mode for longer than the 'no_space_timeout'
- fix a long-standing oversight in both dm-thinp and dm-cache by now
exporting 'needs_check' in status if it was set in metadata
- fix an embarrassing dm-cache busy-loop that caused worker threads to
eat cpu even if no IO was actively being issued to the cache device
* tag 'dm-4.2-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm cache: avoid calls to prealloc_free_structs() if possible
dm cache: avoid preallocation if no work in writeback_some_dirty_blocks()
dm cache: do not wake_worker() in free_migration()
dm cache: display 'needs_check' in status if it is set
dm thin: display 'needs_check' in status if it is set
dm thin: stay in out-of-data-space mode once no_space_timeout expires
dm: fix use after free crash due to incorrect cleanup sequence
Revert "dm: only run the queue on completion if congested or no requests pending"
dm btree: silence lockdep lock inversion in dm_btree_del()
dm thin: allocate the cell_sort_array dynamically
dm btree remove: fix bug in redistribute3
Without this we end up using the previous name of the compressor in the
loop in unpack_rootfs. For example we get errors like "compression
method gzip not configured" even when we have CONFIG_DECOMPRESS_GZIP
enabled.
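The problem described above can be illustrated with a small detection
helper that reports the compressor name through an out parameter. This is a
sketch under stated assumptions: the table, the function names, and the
choice to clear the name on every failure path are hypothetical and are not
the code in lib/decompress.c.

```c
#include <stddef.h>
#include <string.h>

typedef int (*sketch_decompress_fn)(const unsigned char *in, long len);

/* Hypothetical placeholder for a configured decompressor. */
static int sketch_gunzip(const unsigned char *in, long len)
{
	(void)in;
	(void)len;
	return 0;
}

struct sketch_format {
	unsigned char magic[2];
	const char *name;
	sketch_decompress_fn fn;	/* NULL: recognised but not configured */
};

static const struct sketch_format sketch_formats[] = {
	{ { 0x1f, 0x8b }, "gzip",  sketch_gunzip },
	{ { 0x42, 0x5a }, "bzip2", NULL },
};

/*
 * Detect the compressor from the first two bytes. On any failure the
 * name is reset to NULL, so a caller looping over several buffers never
 * sees the name left over from an earlier, successful call.
 */
static sketch_decompress_fn
sketch_decompress_method(const unsigned char *in, long len, const char **name)
{
	size_t i;

	if (len < 2) {
		if (name)
			*name = NULL;	/* too short to identify anything */
		return NULL;
	}

	for (i = 0; i < sizeof(sketch_formats) / sizeof(sketch_formats[0]); i++) {
		if (memcmp(in, sketch_formats[i].magic, 2) == 0) {
			if (name)
				*name = sketch_formats[i].name;
			return sketch_formats[i].fn;
		}
	}

	if (name)
		*name = NULL;		/* unknown magic */
	return NULL;
}
```

A caller can then tell "known format, support not configured" (name set,
function NULL) apart from "nothing recognised" (both NULL), instead of
reporting a stale name.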
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
CMA has an alloc/free interface for debugging. It is intended that
alloc/free occurs in a specific CMA region but, due to a bug, the
alloc/free interface currently lives in the root dir, so we can't select
the CMA region where alloc/free happens.
This patch fixes this problem by making the alloc/free interface per CMA
region.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Stefan Strogin <stefan.strogin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, we set the wrong gfp_mask in the page_owner info for freepages
isolated by compaction and for split pages. This causes an incorrect mixed
pageblock report in '/proc/pagetypeinfo'. This metric is really useful for
measuring the effect of fragmentation, so it should be accurate.
This patch fixes it by setting the correct information.
Without this patch, after a kernel build workload finishes, the number of
mixed pageblocks is 112 among roughly 210 movable pageblocks.
With this fix, the output shows just 57 mixed pageblocks.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>