One should first enqueue to the waitqueue and then check for the
condition. If the condition gets true after mutex_unlock() but before
poll_wait() then we lose it and would have wait for another wakeup.
This has been like this since v2.6.31-rc1 commit c7138f37f9 ("perf_counter:
fix perf_poll()"). Before that it was slightly worse. I guess we get enough
wakeups so if we miss here one it doesn't really matter. It is still a
bad example.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1407159068-1478-1-git-send-email-bigeasy@linutronix.de
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
HSW-EP has a larger offcore mask than the client Haswell CPUs.
It is the same mask as on Sandy/IvyBridge-EP. All of
Haswell was using the client mask, so some bits were missing.
On the client parts some bits were also missing compared
to Sandy/IvyBridge, in particular the bits to match on a L4
cache hit.
The Haswell core in both client and server incarnations
accepts the same bits (but some are nops), so we can use
the same mask.
So use the snbep extended mask, which is a superset of the
client and the server, for all of Haswell.
This allows specifying a number of extra offcore events, like
for example for HSW-EP.
% perf stat -e cpu/event=0xb7,umask=0x1,offcore_rsp=0x3fffc00100,name=offcore_response_pf_l3_rfo_l3_miss_any_response/ true
which were <not supported> before.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: eranian@google.com
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: http://lkml.kernel.org/r/1406840722-25416-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In cases when the owner task exits before the workload and the
workload made some forks, all the events stay in until the last
workload process exits. Thats' because each child event holds
parent reference.
We want to release all children events once the parent is gone,
because at that time there's no process to read them anyway, so
they're just eating resources.
This removal races with process exit, which removes all events
and fork, which clone events. To be clear of those two, adding
work queue to remove orphaned child for context in case such
event is detected.
Using delayed work queue (with delay == 1), because we queue this
work under perf scheduler callbacks. Normal work queue tries to wake
up the queue process, which deadlocks on rq->lock in this place.
Also preventing clones from abandoned parent event.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1406896382-18404-4-git-send-email-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
User visible fixes and changes:
* Show better error message in case we fail to open counters due to EBUSY error,
for instance, when oprofile is running. (Jiri Olsa)
* Honour -w in the report tools (report, top), allowing to specify the widths
for the histogram entries columns. (Namhyung Kim)
* Don't run workload if not told to, as happens when the user has no
permission for profiling and even then the specified workload ends
up running (Arnaldo Carvalho de Melo)
* Do not ignore mmap events in 'perf kmem report'. This tool was using
the kernel mmaps in the running machine instead of processing the mmap
records from the perf.data file. (Namhyung Kim)
* Properly show submicrosecond times in 'perf kvm stat' (Christian Borntraeger)
* Honour existing 'perf record' --time/-T command line option (Andi Kleen)
* Make sure --symfs usage includes the path separator (Arnaldo Carvalho de Melo)
Development infrastructure fixes and changes:
* Fix arm64 build error (Mark Salter)
* Fix make PYTHON override (Namhyung Kim)
* Rename ordered_samples to ordered_events and allow setting a queue
size for ordering events (Jiri Olsa)
* Default to python version 2 (Thomas Ilsche)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
According to PEP 394 recommendation [1], it's more portable to use
python2 rather than plain python to refer python binary version 2.
Since there're distros using python3 by default like Arch, and we don't
support python3 (yet), it'd be better using python2 explicitly.
But older versions (prior to 2.7) seem not to provide python2 but just
python. Given that it's only old version, try python2 first and then
fallback to python. It'll ensure that it always points to python 2.x.
I tested (compiles and perf script runs) with the combinations:
1) python -> python2.x, python-config -> python2.x-config
python2 N/A, python2-config N/A
2) python -> python3.x, python-config -> python3.x-config
python2 -> python2.x, python2-config -> python2.x-config
3) python -> python2.x, python-config -> python2.x-config
python2 -> python2.x, python2-config -> python2.x-config
4) python -> python2.x, python-config -> python2.x-config
python2 -> python2.x, python2-config N/A
Based on / replaces the patch 2/2 by Namhyung Kim.
[1] https://www.python.org/dev/peps/pep-0394
Based-on-patch-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Thomas Ilsche <thomas.ilsche@tu-dresden.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/53DF8493.6070206@tu-dresden.de
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Showing better error message in case we fail to open counters due to the
EBUSY error. If we detect oprofile daemon process running, we now
display following message for EBUSY error:
$ perf record ls
Error:
The PMU counters are busy/taken by another profiler.
We found oprofile daemon running, please stop it and try again.
In case oprofiled was not detected the current error message stays:
$ perf record ls
Error:
The sys_perf_event_open() syscall returned with 16 (Device or resource busy) for event (cycles).
/bin/dmesg may provide additional information.
No CONFIG_PERF_EVENTS=y kernel support configured?
Also changing PERF_FLAG_FD_CLOEXEC detection code not to display error
in case of EBUSY error, as it currently does:
$ perf record ls
Error:
perf_event_open(..., PERF_FLAG_FD_CLOEXEC) failed with unexpected error 16 (Device or resource busy)
perf_event_open(..., 0) failed unexpectedly with error 16 (Device or resource busy)
The PMU counters are busy/taken by another profiler.
We found oprofile daemon running, please stop it and try again.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: William Cohen <wcohen@redhat.com>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Link: http://lkml.kernel.org/r/1406908014-8312-1-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull x86 vdso updates from Ingo Molnar:
"Further simplifications and improvements to the VDSO code, by Andy
Lutomirski"
* 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86_64/vsyscall: Fix warn_bad_vsyscall log output
x86/vdso: Set VM_MAYREAD for the vvar vma
x86, vdso: Get rid of the fake section mechanism
x86, vdso: Move the vvar area before the vdso text
Pull x86 UV TLB update from Ingo Molnar:
"UV TLB shootdown logic updates for version of the UV architecture"
* 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/uv: Update the UV3 TLB shootdown logic
Pull RAS updates from Ingo Molnar:
"The main changes in this cycle are:
- RAS tracing/events infrastructure, by Gong Chen.
- Various generalizations of the APEI code to make it available to
non-x86 architectures, by Tomasz Nowicki"
* 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/ras: Fix build warnings in <linux/aer.h>
acpi, apei, ghes: Factor out ioremap virtual memory for IRQ and NMI context.
acpi, apei, ghes: Make NMI error notification to be GHES architecture extension.
apei, mce: Factor out APEI architecture specific MCE calls.
RAS, extlog: Adjust init flow
trace, eMCA: Add a knob to adjust where to save event log
trace, RAS: Add eMCA trace event interface
RAS, debugfs: Add debugfs interface for RAS subsystem
CPER: Adjust code flow of some functions
x86, MCE: Robustify mcheck_init_device
trace, AER: Move trace into unified interface
trace, RAS: Add basic RAS trace event
x86, MCE: Kill CPU_POST_DEAD
Pull x86 platform updates from Ingo Molnar:
"The main changes in this cycle are:
- Intel SOC driver updates, by Aubrey Li.
- TS5500 platform updates, by Vivien Didelot"
* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/pmc_atom: Silence shift wrapping warnings in pmc_sleep_tmr_show()
x86/pmc_atom: Expose PMC device state and platform sleep state
x86/pmc_atom: Eisable a few S0ix wake up events for S0ix residency
x86/platform: New Intel Atom SOC power management controller driver
x86/platform/ts5500: Add support for TS-5400 boards
x86/platform/ts5500: Add a 'name' sysfs attribute
x86/platform/ts5500: Use the DEVICE_ATTR_RO() macro
Pull x86 mm changes from Ingo Molnar:
"The main change in this cycle is the rework of the TLB range flushing
code, to simplify, fix and consolidate the code. By Dave Hansen"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Set TLB flush tunable to sane value (33)
x86/mm: New tunable for single vs full TLB flush
x86/mm: Add tracepoints for TLB flushes
x86/mm: Unify remote INVLPG code
x86/mm: Fix missed global TLB flush stat
x86/mm: Rip out complicated, out-of-date, buggy TLB flushing
x86/mm: Clean up the TLB flushing code
x86/smep: Be more informative when signalling an SMEP fault
Pull EFI changes from Ingo Molnar:
"Main changes in this cycle are:
- arm64 efi stub fixes, preservation of FP/SIMD registers across
firmware calls, and conversion of the EFI stub code into a static
library - Ard Biesheuvel
- Xen EFI support - Daniel Kiper
- Support for autoloading the efivars driver - Lee, Chun-Yi
- Use the PE/COFF headers in the x86 EFI boot stub to request that
the stub be loaded with CONFIG_PHYSICAL_ALIGN alignment - Michael
Brown
- Consolidate all the x86 EFI quirks into one file - Saurabh Tangri
- Additional error logging in x86 EFI boot stub - Ulf Winkelvos
- Support loading initrd above 4G in EFI boot stub - Yinghai Lu
- EFI reboot patches for ACPI hardware reduced platforms"
* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
efi/arm64: Handle missing virtual mapping for UEFI System Table
arch/x86/xen: Silence compiler warnings
xen: Silence compiler warnings
x86/efi: Request desired alignment via the PE/COFF headers
x86/efi: Add better error logging to EFI boot stub
efi: Autoload efivars
efi: Update stale locking comment for struct efivars
arch/x86: Remove efi_set_rtc_mmss()
arch/x86: Replace plain strings with constants
xen: Put EFI machinery in place
xen: Define EFI related stuff
arch/x86: Remove redundant set_bit(EFI_MEMMAP) call
arch/x86: Remove redundant set_bit(EFI_SYSTEM_TABLES) call
efi: Introduce EFI_PARAVIRT flag
arch/x86: Do not access EFI memory map if it is not available
efi: Use early_mem*() instead of early_io*()
arch/ia64: Define early_memunmap()
x86/reboot: Add EFI reboot quirk for ACPI Hardware Reduced flag
efi/reboot: Allow powering off machines using EFI
efi/reboot: Add generic wrapper around EfiResetSystem()
...
Pull x86 cpufeature updates from Ingo Molnar:
"The main changes in this cycle were:
- Continued cleanups of CPU bugs mis-marked as 'missing features', by
Borislav Petkov.
- Detect the xsaves/xrstors feature and releated cleanup, by Fenghua
Yu"
* 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, cpu: Kill cpu_has_mp
x86, amd: Cleanup init_amd
x86/cpufeature: Add bug flags to /proc/cpuinfo
x86, cpufeature: Convert more "features" to bugs
x86/xsaves: Detect xsaves/xrstors feature
x86/cpufeature.h: Reformat x86 feature macros
Pull x86 build/cleanup/debug updates from Ingo Molnar:
"Robustify the build process with a quirk to avoid GCC reordering
related bugs.
Two code cleanups.
Simplify entry_64.S CFI annotations, by Jan Beulich"
* 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, build: Change code16gcc.h from a C header to an assembly header
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Simplify __HAVE_ARCH_CMPXCHG tests
x86/tsc: Get rid of custom DIV_ROUND() macro
* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/debug: Drop several unnecessary CFI annotations
Pull scheduler updates from Ingo Molnar:
- Move the nohz kick code out of the scheduler tick to a dedicated IPI,
from Frederic Weisbecker.
This necessiated quite some background infrastructure rework,
including:
* Clean up some irq-work internals
* Implement remote irq-work
* Implement nohz kick on top of remote irq-work
* Move full dynticks timer enqueue notification to new kick
* Move multi-task notification to new kick
* Remove unecessary barriers on multi-task notification
- Remove proliferation of wait_on_bit() action functions and allow
wait_on_bit_action() functions to support a timeout. (Neil Brown)
- Another round of sched/numa improvements, cleanups and fixes. (Rik
van Riel)
- Implement fast idling of CPUs when the system is partially loaded,
for better scalability. (Tim Chen)
- Restructure and fix the CPU hotplug handling code that may leave
cfs_rq and rt_rq's throttled when tasks are migrated away from a dead
cpu. (Kirill Tkhai)
- Robustify the sched topology setup code. (Peterz Zijlstra)
- Improve sched_feat() handling wrt. static_keys (Jason Baron)
- Misc fixes.
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
sched/fair: Fix 'make xmldocs' warning caused by missing description
sched: Use macro for magic number of -1 for setparam
sched: Robustify topology setup
sched: Fix sched_setparam() policy == -1 logic
sched: Allow wait_on_bit_action() functions to support a timeout
sched: Remove proliferation of wait_on_bit() action functions
sched/numa: Revert "Use effective_load() to balance NUMA loads"
sched: Fix static_key race with sched_feat()
sched: Remove extra static_key*() function indirection
sched/rt: Fix replenish_dl_entity() comments to match the current upstream code
sched: Transform resched_task() into resched_curr()
sched/deadline: Kill task_struct->pi_top_task
sched: Rework check_for_tasks()
sched/rt: Enqueue just unthrottled rt_rq back on the stack in __disable_runtime()
sched/fair: Disable runtime_enabled on dying rq
sched/numa: Change scan period code to match intent
sched/numa: Rework best node setting in task_numa_migrate()
sched/numa: Examine a task move when examining a task swap
sched/numa: Simplify task_numa_compare()
sched/numa: Use effective_load() to balance NUMA loads
...