Pull x86 debugging documentation updates from Ingo Molnar:
"Documentation updates about x86 kernel stacks"
* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/Documentation: Adapt Ingo's explanation on printing backtraces
x86/Documentation: Remove STACKFAULT_STACK bulletpoint
x86/Documentation: Move kernel-stacks doc one level up
Pull x86 CPU features from Ingo Molnar:
"Various CPU feature support related changes: in particular the
/proc/cpuinfo model name sanitization change should be monitored, it
has a chance to break stuff. (but really shouldn't and there are no
regression reports)"
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu/amd: Give access to the number of nodes in a physical package
x86/cpu: Trim model ID whitespace
x86/cpu: Strip any /proc/cpuinfo model name field whitespace
x86/cpu/amd: Set X86_FEATURE_EXTD_APICID for future processors
x86/gart: Check for GART support before accessing GART registers
Pull x86 cleanups from Ingo Molnar:
"Misc cleanups"
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Clean up types in xlate_dev_mem_ptr() some more
x86: Deinline dma_free_attrs()
x86: Deinline dma_alloc_attrs()
x86: Remove unused TI_cpu
x86: Merge common 32-bit values in asm-offsets.c
Pull scheduler updates from Ingo Molnar:
"The main changes are:
- lockless wakeup support for futexes and IPC message queues
(Davidlohr Bueso, Peter Zijlstra)
- Replace spinlocks with atomics in thread_group_cputimer(), to
improve scalability (Jason Low)
- NUMA balancing improvements (Rik van Riel)
- SCHED_DEADLINE improvements (Wanpeng Li)
- clean up and reorganize preemption helpers (Frederic Weisbecker)
- decouple page fault disabling machinery from the preemption
counter, to improve debuggability and robustness (David
Hildenbrand)
- SCHED_DEADLINE documentation updates (Luca Abeni)
- topology CPU masks cleanups (Bartosz Golaszewski)
- /proc/sched_debug improvements (Srikar Dronamraju)"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (79 commits)
sched/deadline: Remove needless parameter in dl_runtime_exceeded()
sched: Remove superfluous resetting of the p->dl_throttled flag
sched/deadline: Drop duplicate init_sched_dl_class() declaration
sched/deadline: Reduce rq lock contention by eliminating locking of non-feasible target
sched/deadline: Make init_sched_dl_class() __init
sched/deadline: Optimize pull_dl_task()
sched/preempt: Add static_key() to preempt_notifiers
sched/preempt: Fix preempt notifiers documentation about hlist_del() within unsafe iteration
sched/stop_machine: Fix deadlock between multiple stop_two_cpus()
sched/debug: Add sum_sleep_runtime to /proc/<pid>/sched
sched/debug: Replace vruntime with wait_sum in /proc/sched_debug
sched/debug: Properly format runnable tasks in /proc/sched_debug
sched/numa: Only consider less busy nodes as numa balancing destinations
Revert 095bebf61a ("sched/numa: Do not move past the balance point if unbalanced")
sched/fair: Prevent throttling in early pick_next_task_fair()
preempt: Reorganize the notrace definitions a bit
preempt: Use preempt_schedule_context() as the official tracing preemption point
sched: Make preempt_schedule_context() function-tracing safe
x86: Remove cpu_sibling_mask() and cpu_core_mask()
x86: Replace cpu_**_mask() with topology_**_cpumask()
...
Pull perf fixes from Ingo Molnar:
"These are the left over fixes from the v4.1 cycle"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf tools: Fix build breakage if prefix= is specified
perf/x86: Honor the architectural performance monitoring version
perf/x86/intel: Fix PMI handling for Intel PT
perf/x86/intel/bts: Fix DS area sharing with x86_pmu events
perf/x86: Add more Broadwell model numbers
perf: Fix ring_buffer_attach() RCU sync, again
Pull perf updates from Ingo Molnar:
"Kernel side changes mostly consist of work on x86 PMU drivers:
- x86 Intel PT (hardware CPU tracer) improvements (Alexander
Shishkin)
- x86 Intel CQM (cache quality monitoring) improvements (Thomas
Gleixner)
- x86 Intel PEBSv3 support (Peter Zijlstra)
- x86 Intel PEBS interrupt batching support for lower overhead
sampling (Zheng Yan, Kan Liang)
- x86 PMU scheduler fixes and improvements (Peter Zijlstra)
There's too many tooling improvements to list them all - here are a
few select highlights:
'perf bench':
- Introduce new 'perf bench futex' benchmark: 'wake-parallel', to
measure parallel waker threads generating contention for kernel
locks (hb->lock). (Davidlohr Bueso)
'perf top', 'perf report':
- Allow disabling/enabling events dynamicaly in 'perf top':
a 'perf top' session can instantly become a 'perf report'
one, i.e. going from dynamic analysis to a static one,
returning to a dynamic one is possible, to toogle the
modes, just press 'f' to 'freeze/unfreeze' the sampling. (Arnaldo Carvalho de Melo)
- Make Ctrl-C stop processing on TUI, allowing interrupting the load of big
perf.data files (Namhyung Kim)
'perf probe': (Masami Hiramatsu)
- Support glob wildcards for function name
- Support $params special probe argument: Collect all function arguments
- Make --line checks validate C-style function name.
- Add --no-inlines option to avoid searching inline functions
- Greatly speed up 'perf probe --list' by caching debuginfo.
- Improve --filter support for 'perf probe', allowing using its arguments
on other commands, as --add, --del, etc.
'perf sched':
- Add option in 'perf sched' to merge like comms to lat output (Josef Bacik)
Plus tons of infrastructure work - in particular preparation for
upcoming threaded perf report support, but also lots of other work -
and fixes and other improvements. See (much) more details in the
shortlog and in the git log"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (305 commits)
perf tools: Configurable per thread proc map processing time out
perf tools: Add time out to force stop proc map processing
perf report: Fix sort__sym_cmp to also compare end of symbol
perf hists browser: React to unassigned hotkey pressing
perf top: Tell the user how to unfreeze events after pressing 'f'
perf hists browser: Honour the help line provided by builtin-{top,report}.c
perf hists browser: Do not exit when 'f' is pressed in 'report' mode
perf top: Replace CTRL+z with 'f' as hotkey for enable/disable events
perf annotate: Rename source_line_percent to source_line_samples
perf annotate: Display total number of samples with --show-total-period
perf tools: Ensure thread-stack is flushed
perf top: Allow disabling/enabling events dynamicly
perf evlist: Add toggle_enable() method
perf trace: Fix race condition at the end of started workloads
perf probe: Speed up perf probe --list by caching debuginfo
perf probe: Show usage even if the last event is skipped
perf tools: Move libtraceevent dynamic list to separated LDFLAGS variable
perf tools: Fix a problem when opening old perf.data with different byte order
perf tools: Ignore .config-detected in .gitignore
perf probe: Fix to return error if no probe is added
...
Pull locking updates from Ingo Molnar:
"The main changes are:
- 'qspinlock' support, enabled on x86: queued spinlocks - these are
now the spinlock variant used by x86 as they outperform ticket
spinlocks in every category. (Waiman Long)
- 'pvqspinlock' support on x86: paravirtualized variant of queued
spinlocks. (Waiman Long, Peter Zijlstra)
- 'qrwlock' support, enabled on x86: queued rwlocks. Similar to
queued spinlocks, they are now the variant used by x86:
CONFIG_ARCH_USE_QUEUED_SPINLOCKS=y
CONFIG_QUEUED_SPINLOCKS=y
CONFIG_ARCH_USE_QUEUED_RWLOCKS=y
CONFIG_QUEUED_RWLOCKS=y
- various lockdep fixlets
- various locking primitives cleanups, further WRITE_ONCE()
propagation"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
locking/lockdep: Remove hard coded array size dependency
locking/qrwlock: Don't contend with readers when setting _QW_WAITING
lockdep: Do not break user-visible string
locking/arch: Rename set_mb() to smp_store_mb()
locking/arch: Add WRITE_ONCE() to set_mb()
rtmutex: Warn if trylock is called from hard/softirq context
arch: Remove __ARCH_HAVE_CMPXCHG
locking/rtmutex: Drop usage of __HAVE_ARCH_CMPXCHG
locking/qrwlock: Rename QUEUE_RWLOCK to QUEUED_RWLOCKS
locking/pvqspinlock: Rename QUEUED_SPINLOCK to QUEUED_SPINLOCKS
locking/pvqspinlock: Replace xchg() by the more descriptive set_mb()
locking/pvqspinlock, x86: Enable PV qspinlock for Xen
locking/pvqspinlock, x86: Enable PV qspinlock for KVM
locking/pvqspinlock, x86: Implement the paravirt qspinlock call patching
locking/pvqspinlock: Implement simple paravirt support for the qspinlock
locking/qspinlock: Revert to test-and-set on hypervisors
locking/qspinlock: Use a simple write to grab the lock
locking/qspinlock: Optimize for smaller NR_CPUS
locking/qspinlock: Extract out code snippets for the next patch
locking/qspinlock: Add pending bit
...
Pull RCU updates from Ingo Molnar:
- Continued initialization/Kconfig updates: hide most Kconfig options
from unsuspecting users.
There's now a single high level configuration option:
*
* RCU Subsystem
*
Make expert-level adjustments to RCU configuration (RCU_EXPERT) [N/y/?] (NEW)
Which if answered in the negative, leaves us with a single
interactive configuration option:
Offload RCU callback processing from boot-selected CPUs (RCU_NOCB_CPU) [N/y/?] (NEW)
All the rest of the RCU options are configured automatically. Later
on we'll remove this single leftover configuration option as well.
- Remove all uses of RCU-protected array indexes: replace the
rcu_[access|dereference]_index_check() APIs with READ_ONCE() and
rcu_lockdep_assert()
- RCU CPU-hotplug cleanups
- Updates to Tiny RCU: a race fix and further code shrinkage.
- RCU torture-testing updates: fixes, speedups, cleanups and
documentation updates.
- Miscellaneous fixes
- Documentation updates
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
rcutorture: Allow repetition factors in Kconfig-fragment lists
rcutorture: Display "make oldconfig" errors
rcutorture: Update TREE_RCU-kconfig.txt
rcutorture: Make rcutorture scripts force RCU_EXPERT
rcutorture: Update configuration fragments for rcutree.rcu_fanout_exact
rcutorture: TASKS_RCU set directly, so don't explicitly set it
rcutorture: Test SRCU cleanup code path
rcutorture: Replace barriers with smp_store_release() and smp_load_acquire()
locktorture: Change longdelay_us to longdelay_ms
rcutorture: Allow negative values of nreaders to oversubscribe
rcutorture: Exchange TREE03 and TREE08 NR_CPUS, speed up CPU hotplug
rcutorture: Exchange TREE03 and TREE04 geometries
locktorture: fix deadlock in 'rw_lock_irq' type
rcu: Correctly handle non-empty Tiny RCU callback list with none ready
rcutorture: Test both RCU-sched and RCU-bh for Tiny RCU
rcu: Further shrink Tiny RCU by making empty functions static inlines
rcu: Conditionally compile RCU's eqs warnings
rcu: Remove prompt for RCU implementation
rcu: Make RCU able to tolerate undefined CONFIG_RCU_KTHREAD_PRIO
rcu: Make RCU able to tolerate undefined CONFIG_RCU_FANOUT_LEAF
...
Pull vfs updates from Al Viro:
"In this pile: pathname resolution rewrite.
- recursion in link_path_walk() is gone.
- nesting limits on symlinks are gone (the only limit remaining is
that the total amount of symlinks is no more than 40, no matter how
nested).
- "fast" (inline) symlinks are handled without leaving rcuwalk mode.
- stack footprint (independent of the nesting) is below kilobyte now,
about on par with what it used to be with one level of nested
symlinks and ~2.8 times lower than it used to be in the worst case.
- struct nameidata is entirely private to fs/namei.c now (not even
opaque pointers are being passed around).
- ->follow_link() and ->put_link() calling conventions had been
changed; all in-tree filesystems converted, out-of-tree should be
able to follow reasonably easily.
For out-of-tree conversions, see Documentation/filesystems/porting
for details (and in-tree filesystems for examples of conversion).
That has sat in -next since mid-May, seems to survive all testing
without regressions and merges clean with v4.1"
* 'for-linus-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (131 commits)
turn user_{path_at,path,lpath,path_dir}() into static inlines
namei: move saved_nd pointer into struct nameidata
inline user_path_create()
inline user_path_parent()
namei: trim do_last() arguments
namei: stash dfd and name into nameidata
namei: fold path_cleanup() into terminate_walk()
namei: saner calling conventions for filename_parentat()
namei: saner calling conventions for filename_create()
namei: shift nameidata down into filename_parentat()
namei: make filename_lookup() reject ERR_PTR() passed as name
namei: shift nameidata inside filename_lookup()
namei: move putname() call into filename_lookup()
namei: pass the struct path to store the result down into path_lookupat()
namei: uninline set_root{,_rcu}()
namei: be careful with mountpoint crossings in follow_dotdot_rcu()
Documentation: remove outdated information from automount-support.txt
get rid of assorted nameidata-related debris
lustre: kill unused helper
lustre: kill unused macro (LOOKUP_CONTINUE)
...
Pull scsi target fixes from Nicholas Bellinger:
"Apologies for the late pull request.
Here are the outstanding target-pending fixes for v4.1 code.
The series contains three patches from Sagi + Co that address a few
iser-target issues that have been uncovered during recent testing at
Mellanox.
Patch #1 has a v3.16+ stable tag, and #2-3 have v3.10+ stable tags"
* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
iser-target: Fix possible use-after-free
iser-target: release stale iser connections
iser-target: Fix variable-length response error completion
Pull drm fixes from Dave Airlie:
"A smattering of fixes,
mgag200:
don't accept modes that aren't aligned properly as hw can't do it
i915:
two regression fixes
radeon:
one query to allow userspace fixes
one oops fixer for older hw with new options enabled"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/radeon: don't probe MST on hw we don't support it on
drm/radeon: Add RADEON_INFO_VA_UNMAP_WORKING query
drm/mgag200: Reject non-character-cell-aligned mode widths
Revert "drm/i915: Don't skip request retirement if the active list is empty"
drm/i915: Always reset vma->ggtt_view.pages cache on unbinding
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
User visible changes:
- Replace CTRL+z with 'f' as hotkey for enable/disable events (Arnaldo Carvalho de Melo)
- Do not exit when 'f' is pressed in 'report' mode (Arnaldo Carvalho de Melo)
- Tell the user how to unfreeze events after pressing 'f' in 'perf top' (Arnaldo Carvalho de Melo)
- React to unassigned hotkey pressing in 'top/report' (Arnaldo Carvalho de Melo)
- Display total number of samples with --show-total-period in 'annotate' (Martin Liška)
- Add timeout to make procfs mmap processing more robust (Kan Liang)
- Fix sort__sym_cmp to also compare end of symbol (Yannick Brosseau)
Infrastructure changes:
- Ensure thread-stack is flushed (Adrian Hunter)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull perf/urgent fix from Arnaldo Carvalho de Melo:
- Fix build breakage if prefix= is specified, this introduced
a regression for a build idiom used by the Debian "linux-tools"
package, that does:
MAKE_PERF := $(MAKE) prefix=/usr V=1 ARCH=$(KERNEL_ARCH_PERF) ...
Fix it. (Lukas Wunner)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
System wide sampling like 'perf top' or 'perf record -a' read all
threads /proc/xxx/maps before sampling. If there are any threads which
generating a keeping growing huge maps, perf will do infinite loop
during synthesizing. Nothing will be sampled.
This patch fixes this issue by adding per-thread timeout to force stop
this kind of endless proc map processing.
PERF_RECORD_MISC_PROC_MAP_PARSE_TIME_OUT is introduced to indicate that
the mmap record are truncated by time out. User will get warning
notification when truncated mmap records are detected.
Reported-by: Ying Huang <ying.huang@intel.com>
Signed-off-by: Kan Liang <kan.liang@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ying Huang <ying.huang@intel.com>
Link: http://lkml.kernel.org/r/1434549071-25611-1-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The thread-stack represents a thread's current stack. When a thread
exits there can still be many functions on the stack e.g. exit() can be
called many levels deep, so all the callers will never return. To get
that information output, the thread-stack must be flushed.
Previously it was assumed the thread-stack would be flushed when the
struct thread was deleted. With thread ref-counting it is no longer
clear when that will be, if ever. So instead explicitly flush all the
thread-stacks at the end of a session.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1432906425-9911-3-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull clk fixes from Michael Turquette:
"Very late clk regression fixes for the ARM-based AT91 platform.
These went unnoticed by me until recently, hence the late pull
request"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: at91: fix h32mx prototype inclusion in pmc header
clk: at91: trivial: typo in peripheral clock description
clk: at91: fix PERIPHERAL_MAX_SHIFT definition
clk: at91: pll: fix input range validity check
Pull sound fixes from Takashi Iwai:
"Nothing looks scary, just a few usual HD-audio regression fixes and
fixup, in addition to a minor Kconfig dependency fix for the old MIPS
drivers"
* tag 'sound-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda - Fix unused label skip_i915
ALSA: hda - Fix noisy outputs on Dell XPS13 (2015 model)
ALSA: mips: let SND_SGI_O2 select SND_PCM
ALSA: hda - Fix audio crackles on Dell Latitude E7x40
ALSA: hda - adding a DAC/pin preference map for a HP Envy TS machine
Fix the PERIPHERAL_MAX_SHIFT definition (3 instead of 4) and adapt the
round_rate and set_rate logic accordingly.
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Reported-by: "Wu, Songjun" <Songjun.Wu@atmel.com>
The PLL impose a certain input range to work correctly, but it appears that
this input range does not apply on the input clock (or parent clock) but
on the input clock after it has passed the PLL divisor.
Fix the implementation accordingly.
Cc: <stable@vger.kernel.org> # v3.14+
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Reported-by: Jonas Andersson <jonas@microbit.se>
Jiri reported a machine stuck in multi_cpu_stop() with
migrate_swap_stop() as function and with the following src,dst cpu
pairs: {11, 4} {13, 11} { 4, 13}
4 11 13
cpuM: queue(4 ,13)
*Ma
cpuN: queue(13,11)
*N Na
*M Mb
cpuO: queue(11, 4)
*O Oa
*Nb
*Ob
Where *X denotes the cpu running the queueing of cpu-X and X[ab] denotes
the first/second queued work.
You'll observe the top of the workqueue for each cpu: 4,11,13 to be work
from cpus: M, O, N resp. IOW. deadlock.
Do away with the queueing trickery and introduce lg_double_lock() to
lock both CPUs and fully serialize the stop_two_cpus() callers instead
of the partial (and buggy) serialization we have now.
Reported-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150605153023.GH19282@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
An apparent oversight left a hardcoded '4' in place when
LOCKSTAT_POINTS was introduced.
The contention_point[] and contending_point[] arrays in the
structs lock_class and lock_class_stats need to be the same
size for the loops in lock_stats() to be correct.
This patch allows LOCKSTAT_POINTS to be changed without
affecting the correctness of the code.
Signed-off-by: George Beshers <gbeshers@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The current cmpxchg() loop in setting the _QW_WAITING flag for writers
in queue_write_lock_slowpath() will contend with incoming readers
causing possibly extra cmpxchg() operations that are wasteful. This
patch changes the code to do a byte cmpxchg() to eliminate contention
with new readers.
A multithreaded microbenchmark running 5M read_lock/write_lock loop
on a 8-socket 80-core Westmere-EX machine running 4.0 based kernel
with the qspinlock patch have the following execution times (in ms)
with and without the patch:
With R:W ratio = 5:1
Threads w/o patch with patch % change
------- --------- ---------- --------
2 990 895 -9.6%
3 2136 1912 -10.5%
4 3166 2830 -10.6%
5 3953 3629 -8.2%
6 4628 4405 -4.8%
7 5344 5197 -2.8%
8 6065 6004 -1.0%
9 6826 6811 -0.2%
10 7599 7599 0.0%
15 9757 9766 +0.1%
20 13767 13817 +0.4%
With small number of contending threads, this patch can improve
locking performance by up to 10%. With more contending threads,
however, the gain diminishes.
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Douglas Hatch <doug.hatch@hp.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1433863153-30722-3-git-send-email-Waiman.Long@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Intel PT is a separate PMU and it is not using any of the x86_pmu
code paths, which means in particular that the active_events counter
remains intact when new PT events are created.
However, PT uses the generic x86_pmu PMI handler for its PMI handling needs.
The problem here is that the latter checks active_events and in case of it
being zero, exits without calling the actual x86_pmu.handle_nmi(), which
results in unknown NMI errors and massive data loss for PT.
The effect is not visible if there are other perf events in the system
at the same time that keep active_events counter non-zero, for instance
if the NMI watchdog is running, so one needs to disable it to reproduce
the problem.
At the same time, the active_events counter besides doing what the name
suggests also implicitly serves as a PMC hardware and DS area reference
counter.
This patch adds a separate reference counter for the PMC hardware, leaving
active_events for actually counting the events and makes sure it also
counts PT and BTS events.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Link: http://lkml.kernel.org/r/87k2v92t0s.fsf@ashishki-desk.ger.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>