When trying to capture perf data on a system running spejbb2013, perf
hung for about 15 minutes. This is because it took that long to gather
about 10,000 thread maps and process them.
I don't think a user wants to wait that long.
Instead, recognize that thread maps are roughly equivalent to pid maps
and just quickly copy those instead.
To do this, I synthesize 'fork' events, this eventually calls
thread__fork() and copies the maps over.
The overhead goes from 15 minutes down to about a few seconds.
--
V2: based on Jiri's comments, moved malloc up a level
and made sure the memory was freed
Signed-off-by: Don Zickus <dzickus@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Link: http://lkml.kernel.org/r/1394808224-113774-1-git-send-email-dzickus@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
devm_ioremap_resource() returns a pointer to the remapped memory or
an ERR_PTR() encoded error code on failure. Fix the check inside
ahci_platform_get_resources() accordingly.
Also while at it remove a needless line break.
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
devm_ioremap_resource() returns a pointer to the remapped memory or
an ERR_PTR() encoded error code on failure. Fix the check inside
pata_imx_probe() accordingly.
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
struct ahci_platform_data is deprecated (please see comments in
<linux/ahci_platform.h> for details). Convert ahci_st driver to
use custom ->host_stop method instead.
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
* The config option for ahci_st driver was renamed from
CONFIG_SATA_AHCI_ST to CONFIG_AHCI_ST but Makefile was
not updated. Fix it (also while at it move the ahci_st
driver entry below ahci_imx and ahci_sunxi ones).
* Fix a few build issues in the ahci_st driver itself.
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Rather than have separate hugetlb and transparent huge page pmd
manipulation functions, re-wire our thp functions to simply call the
pte equivalents.
This allows THP to take advantage of the new PTE_WRITE logic introduced
in:
c2c93e5 arm64: mm: Introduce PTE_WRITE
To represent splitting THPs we use the PTE_SPECIAL bit as this is not
used for pmds.
Signed-off-by: Steve Capper <steve.capper@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
asm-generic offers an atomic-add based rwsem implementation, which
can avoid the need for heavier, spinlock-based synchronisation on the
fast path.
This patch makes use of the optimised implementation for arm64 CPUs.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
asm-generic/rwsem.h used to live under arch/powerpc. During its
liberation to common code, a few references to its former home where
preserved, in particular the definition of RWSEM_ACTIVE_MASK is
predicated on CONFIG_PPC64.
This patch updates the ifdefs and comments to architecturally neutral
versions.
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Richard Kuo <rkuo@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This enables support for the generic CPU feature modalias implementation that
wires up optional CPU features to udev based module autoprobing.
A file <asm/cpufeature.h> is provided that maps CPU feature numbers to
elf_hwcap bits, which is the standard way on arm64 to advertise optional CPU
features both internally and to user space.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[catalin.marinas@arm.com: removed unnecessary "!!"]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The for_each_bench() macro must check that the "benchmarks" field of a
collection is not NULL before dereferencing it because the "all"
collection in particular has a NULL "benchmarks" field (signifying that
it has no benchmarks to iterate over).
This fixes this NULL pointer dereference when running "perf bench all":
[root@ssdandy ~]# perf bench all
<SNIP>
# Running mem/memset benchmark...
# Copying 1MB Bytes ...
2.453675 GB/Sec
12.056327 GB/Sec (with prefault)
Segmentation fault (core dumped)
[root@ssdandy ~]#
Signed-off-by: Patrick Palka <patrick@parcs.ath.cx>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1394664051-6037-1-git-send-email-patrick@parcs.ath.cx
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Now that we can directly get the ACPI device conterpart of the physical
ATA transport device, the odd_can_poweroff can be eliminated.
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
If ACPI handle for an ATA device is NULL, we shouldn't call
ata_dev_get_GTF as that function will use handle to do some ACPI
evaluation.
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
ZPODD is built on top of runtime PM functionality, it doesn't make sense
to have it in a kernel that doesn't have CONFIG_PM_RUNTIME set.
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
It has been reported that there is a new hardware version of the G27
in the 'wild'. This patch add's this new revision so that it can be
sent the command to switch to native mode.
Reported-by: "Ivan Baldo" <ibaldo@adinet.com.uy>
Tested-by: "evilcow" <evilcow93@yahoo.com>
Signed-off-by: Simon Wood <simon@mungewell.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
The SC1200 is a SoC based on the Geode GX1 32-bit x86 processor, so
its drivers are only needed on this architecture, except for build
testing purpose.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
Currently if a process creates a bunch of threads using pthread_create
and then perf is run in system_wide mode, the mmaps for those threads
are not captured with a synthesized mmap event.
The reason is those threads are not visible when walking the /proc/
directory looking for /proc/<pid>/maps files. Instead they are
discovered using the /proc/<pid>/tasks file (which the synthesized comm
event uses).
This causes problems when a program is trying to map a data address to a
tid. Because the tid has no maps, the event is dropped. Changing the
program to look up using the pid instead of the tid, finds the correct
maps but creates ugly hacks in the program to carry the correct tid
around.
Fix this by moving the walking of the /proc/<pid>/tasks up a level (out
of the comm function) based on Arnaldo's suggestion.
Tweaked things a bit to special case the 'full' bit and 'guest' check.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1393429527-167840-2-git-send-email-dzickus@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Block a bunch of threads on a futex and requeue them on another, N at a
time.
This program is particularly useful to measure the latency of nthread
requeues without waking up any tasks -- thus mimicking a regular
futex_wait.
An example run:
$ perf bench futex requeue -r 100 -t 64
Run summary [PID 151011]: Requeuing 64 threads (from 0x7d15c4 to 0x7d15c8), 1 at a time.
[Run 1]: Requeued 64 of 64 threads in 0.0400 ms
[Run 2]: Requeued 64 of 64 threads in 0.0390 ms
[Run 3]: Requeued 64 of 64 threads in 0.0400 ms
...
[Run 100]: Requeued 64 of 64 threads in 0.0390 ms
Requeued 64 of 64 threads in 0.0399 ms (+-0.37%)
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Acked-by: Darren Hart <dvhart@linux.intel.com>
Cc: Aswin Chandramouleeswaran <aswin@hp.com>
Cc: Darren Hart <dvhart@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <Waiman.Long@hp.com>
Link: http://lkml.kernel.org/r/1387081917-9102-4-git-send-email-davidlohr@hp.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Block a bunch of threads on a futex and wake them up, N at a time.
This program is particularly useful to measure the latency of nthread
wakeups in non-error situations: all waiters are queued and all wake
calls wakeup one or more tasks.
An example run:
$ perf bench futex wake -t 512 -r 100
Run summary [PID 27823]: blocking on 512 threads (at futex 0x7e10d4), waking up 1 at a time.
[Run 1]: Wokeup 512 of 512 threads in 6.0080 ms
[Run 2]: Wokeup 512 of 512 threads in 5.2280 ms
[Run 3]: Wokeup 512 of 512 threads in 4.8300 ms
...
[Run 100]: Wokeup 512 of 512 threads in 5.0100 ms
Wokeup 512 of 512 threads in 5.0109 ms (+-2.25%)
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Acked-by: Darren Hart <dvhart@linux.intel.com>
Cc: Aswin Chandramouleeswaran <aswin@hp.com>
Cc: Darren Hart <dvhart@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <Waiman.Long@hp.com>
Link: http://lkml.kernel.org/r/1387081917-9102-3-git-send-email-davidlohr@hp.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Introduce futexes to perf-bench and add a program that stresses and
measures the kernel's implementation of the hash table.
This is a multi-threaded program that simply measures the amount of
failed futex wait calls - we only want to deal with the hashing
overhead, so a negative return of futex_wait_setup() is enough to do the
trick.
An example run:
$ perf bench futex hash -t 32
Run summary [PID 10989]: 32 threads, each operating on 1024 [private] futexes for 10 secs.
[thread 0] futexes: 0x19d9b10 ... 0x19dab0c [ 418713 ops/sec ]
[thread 1] futexes: 0x19daca0 ... 0x19dbc9c [ 469913 ops/sec ]
[thread 2] futexes: 0x19dbe30 ... 0x19dce2c [ 479744 ops/sec ]
...
[thread 31] futexes: 0x19fbb80 ... 0x19fcb7c [ 464179 ops/sec ]
Averaged 454310 operations/sec (+- 0.84%), total secs = 10
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Acked-by: Darren Hart <dvhart@linux.intel.com>
Cc: Aswin Chandramouleeswaran <aswin@hp.com>
Cc: Darren Hart <dvhart@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <Waiman.Long@hp.com>
Link: http://lkml.kernel.org/r/1387081917-9102-2-git-send-email-davidlohr@hp.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This is where we disable IRQs on suspend and update the internal
power state during suspend/resume.
Suggested-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
This device is designed specifically to run on ST-Microelectronics'
hardware. To ensure no attempts are made to run on anything incompatible
we add a dependency on ST architecture
Suggested-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
ata_platform_remove_one() allows us to specify our own exit function
via platform data then goes off and removes ATA Host and Port in
preparation for device removal.
Suggested-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Other devices have adopted similar naming conventions which have been
accepted as the standard. This patch brings any mention of the the ST
AHCI driver into line with them.
Suggested-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
If we call just:
perf bench numa mem
it will present the same output as:
perf bench numa mem -h
i.e. ask for instructions about what to run.
While that is kinda ok, using 'run all tests' as the default, i.e.
making 'no parms' be equivalent to:
perf bench numa mem -a
Will allow:
perf bench numa all
to actually do what is asked: i.e. run all the 'bench' tests, instead of
responding to that by asking what to do.
That, in turn, allows:
perf bench all
to actually complete, for the same reasons.
And after that, the tests that come after that, and that at some point
hit a NULL deref, will run, allowing me to reproduce a recently reported
problem.
That when you have the needed numa libraries, which wasn't the case for
the reporter, making me a bit confused after trying to reproduce his
report.
So make no parms mean -a.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Patrick Palka <patrick@parcs.ath.cx>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-x7h0ghx4pef4n0brywg21krk@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The Allwinner A10 compatibles were following a slightly different compatible
patterns than the rest of the SoCs for historical reasons. Change the compatibles
to match the other pattern in the irq controller driver for consistency.
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The PTRACE_SINGLEBLOCK option is used to get control whenever
the inferior has executed a successful branch. The PER option to
implement block stepping is successful-branching event, bit 32
in the PER-event mask.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
For systems with multiple servers and routed fabric, all
northbridges get assigned to the first server. Fix this by also
using the node reported from the PCI bus. For single-fabric
systems, the northbriges are on PCI bus 0 by definition, which
are on NUMA node 0 by definition, so this is invarient on most
systems.
Tested on fam10h and fam15h single and multi-fabric systems and
candidate for stable.
Signed-off-by: Daniel J Blueman <daniel@numascale.com>
Acked-by: Steffen Persvold <sp@numascale.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/1394710981-3596-1-git-send-email-daniel@numascale.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>