When ordering events, we use preallocated buffers to store separate
events. Those buffers currently don't have their own struct, but since
they are basically an array of 'struct ordered_event' objects, we use
the first event to hold buffers data - list head, that holds all buffers
together:
struct ordered_events {
...
struct ordered_event *buffer;
...
};
struct ordered_event {
u64 timestamp;
u64 file_offset;
union perf_event *event;
struct list_head list;
};
This is quite convoluted and error prone as demonstrated by free-ing
issue discovered and fixed by Stephane in here [1].
This patch adds the 'struct ordered_events_buffer' object, that holds
the buffer data and frees it up properly.
[1] - https://marc.info/?l=linux-kernel&m=153376761329335&w=2
Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Stephane Eranian <eranian@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180907102455.7030-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It has been pointed out to me many times that it is useful to be able to
switch off AUX records to save the bandwidth for records that actually
matter, for example, in AUX overwrite mode.
The usefulness of PERF_RECORD_AUX is in some of its flags, like the
TRUNCATED flag that tells the decoder where exactly gaps in the trace
are. The OVERWRITE flag, on the other hand will be set on every single
record in overwrite mode. However, a PERF_RECORD_AUX[flags=OVERWRITE] is
generated on every target task's sched_out, which over time adds up to a
lot of useless information.
If any folks out there have userspace that depends on a constant stream
of OVERWRITE records for a good reason, they'll have to let us know.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Markus T Metzger <markus.t.metzger@intel.com>
Link: http://lkml.kernel.org/r/20180404145323.28651-1-alexander.shishkin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
- Fix finding a symbol by name when multiple maps use the same backing DSO,
so we must first see if that symbol name is in the DSO, then see if it is
inside the range of addresses for that specific map (Adrian Hunter)
- Update the tools copies of UAPI headers, which silences the warnings
emitted when building the tools and in some cases, like for the new
KVM ioctls, results in 'perf trace' being able to translate that
ioctl number to a string (Arnaldo Carvalho de Melo)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit 1c5aae7710 ("perf machine: Create maps for x86 PTI entry
trampolines") revealed a problem with maps__find_symbol_by_name() that
resulted in probes not being found e.g.
$ sudo perf probe xsk_mmap
xsk_mmap is out of .text, skip it.
Probe point 'xsk_mmap' not found.
Error: Failed to add events.
maps__find_symbol_by_name() can optionally return the map of the found
symbol. It can get the map wrong because, in fact, the symbol is found
on the map's dso, not allowing for the possibility that the dso has more
than one map. Fix by always checking the map contains the symbol.
Reported-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Björn Töpel <bjorn.topel@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 1c5aae7710 ("perf machine: Create maps for x86 PTI entry trampolines")
Link: http://lkml.kernel.org/r/20180907085116.25782-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To get the changes in:
c48300c92a ("vhost: fix VHOST_GET_BACKEND_FEATURES ioctl request definition")
This makes 'perf trace' and other tools in the future using its
beautifiers in a libbeauty.so library be able to translate these new
ioctl to strings:
$ tools/perf/trace/beauty/vhost_virtio_ioctl.sh > /tmp/after
$ diff -u /tmp/before /tmp/after
--- /tmp/before 2018-09-11 13:10:57.923038244 -0300
+++ /tmp/after 2018-09-11 13:11:20.329012685 -0300
@@ -15,6 +15,7 @@
[0x22] = "SET_VRING_ERR",
[0x23] = "SET_VRING_BUSYLOOP_TIMEOUT",
[0x24] = "GET_VRING_BUSYLOOP_TIMEOUT",
+ [0x25] = "SET_BACKEND_FEATURES",
[0x30] = "NET_SET_BACKEND",
[0x40] = "SCSI_SET_ENDPOINT",
[0x41] = "SCSI_CLEAR_ENDPOINT",
@@ -27,4 +28,5 @@
static const char *vhost_virtio_ioctl_read_cmds[] = {
[0x00] = "GET_FEATURES",
[0x12] = "GET_VRING_BASE",
+ [0x26] = "GET_BACKEND_FEATURES",
};
$
We'll also use this to be able to express syscall filters using symbolic
these symbolic names, something like:
# perf trace --all-cpus -e ioctl(cmd=*GET_FEATURES)
This silences the following warning during perf's build:
Warning: Kernel ABI header at 'tools/include/uapi/linux/vhost.h' differs from latest version at 'include/uapi/linux/vhost.h'
diff -u tools/include/uapi/linux/vhost.h include/uapi/linux/vhost.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Gleb Fotengauer-Malinovskiy <glebfm@altlinux.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-35x71oei2hdui9u0tarpimbq@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Problem: perf did not show branch predicted/mispredicted bit in brstack.
Output of perf -F brstack for profile collected
Before:
0x4fdbcd/0x4fdc03/-/-/-/0
0x45f4c1/0x4fdba0/-/-/-/0
0x45f544/0x45f4bb/-/-/-/0
0x45f555/0x45f53c/-/-/-/0
0x7f66901cc24b/0x45f555/-/-/-/0
0x7f66901cc22e/0x7f66901cc23d/-/-/-/0
0x7f66901cc1ff/0x7f66901cc20f/-/-/-/0
0x7f66901cc1e8/0x7f66901cc1fc/-/-/-/0
After:
0x4fdbcd/0x4fdc03/P/-/-/0
0x45f4c1/0x4fdba0/P/-/-/0
0x45f544/0x45f4bb/P/-/-/0
0x45f555/0x45f53c/P/-/-/0
0x7f66901cc24b/0x45f555/P/-/-/0
0x7f66901cc22e/0x7f66901cc23d/P/-/-/0
0x7f66901cc1ff/0x7f66901cc20f/P/-/-/0
0x7f66901cc1e8/0x7f66901cc1fc/P/-/-/0
Cause:
As mentioned in Software Development Manual vol 3, 17.4.8.1,
IA32_PERF_CAPABILITIES[5:0] indicates the format of the address that is
stored in the LBR stack. Knights Landing reports 1 (LBR_FORMAT_LIP) as
its format. Despite that, registers containing FROM address of the branch,
do have MISPREDICT bit but because of the format indicated in
IA32_PERF_CAPABILITIES[5:0], LBR did not read MISPREDICT bit.
Solution:
Teach LBR about above Knights Landing quirk and make it read MISPREDICT bit.
Signed-off-by: Jacek Tomaka <jacek.tomaka@poczta.fm>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180802013830.10600-1-jacekt@dugeo.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
Kernel:
- Modify breakpoint fixes (Jiri Olsa)
perf annotate:
- Fix parsing aarch64 branch instructions after objdump update (Kim Phillips)
- Fix parsing indirect calls in 'perf annotate' (Martin Liška)
perf probe:
- Ignore SyS symbols irrespective of endianness on PowerPC (Sandipan Das)
perf trace:
- Fix include path for asm-generic/unistd.h on arm64 (Kim Phillips)
Core libraries:
- Fix potential null pointer dereference in perf_evsel__new_idx() (Hisao Tanabe)
- Use fixed size string for comms instead of scanf("%m"), that is
not present in the bionic libc and leads to a crash (Chris Phlipot)
- Fix bad memory access in trace info on 32-bit systems, we were reading
8 bytes from a 4-byte long variable when saving the command line in the
perf.data file. (Chris Phlipot)
Build system:
- Streamline bpf examples and headers installation, clarifying
some install messages. (Arnaldo Carvalho de Melo)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 fixes from Thomas Gleixner:
"A set of fixes for x86:
- Prevent multiplication result truncation on 32bit. Introduced with
the early timestamp reworrk.
- Ensure microcode revision storage to be consistent under all
circumstances
- Prevent write tearing of PTEs
- Prevent confusion of user and kernel reegisters when dumping fatal
signals verbosely
- Make an error return value in a failure path of the vector
allocation negative. Returning EINVAL might the caller assume
success and causes further wreckage.
- A trivial kernel doc warning fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Use WRITE_ONCE() when setting PTEs
x86/apic/vector: Make error return value negative
x86/process: Don't mix user/kernel regs in 64bit __show_regs()
x86/tsc: Prevent result truncation on 32bit
x86: Fix kernel-doc atomic.h warnings
x86/microcode: Update the new microcode revision unconditionally
x86/microcode: Make sure boot_cpu_data.microcode is up-to-date
Pull timekeeping fixes from Thomas Gleixner:
"Two fixes for timekeeping:
- Revert to the previous kthread based update, which is unfortunately
required due to lock ordering issues. The removal caused boot
failures on old Core2 machines. Add a proper comment why the thread
needs to stay to prevent accidental removal in the future.
- Fix a silly typo in a function declaration"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource: Revert "Remove kthread"
timekeeping: Fix declaration of read_persistent_wall_and_boot_offset()
Pull irqchip fix from Thomas Gleixner:
"A single fix to prevent allocating excessive memory in the GIC/ITS
driver.
While the subject of the patch might suggest otherwise this is a real
fix as some SoCs exceed the memory allocation limits and fail to boot"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic-v3-its: Cap lpi_id_bits to reduce memory footprint
Pull cpu hotplug fixes from Thomas Gleixner:
"Two fixes for the hotplug state machine code:
- Move the misplaces smb() in the hotplug thread function to the
proper place, otherwise a half update control struct could be
observed
- Prevent state corruption on error rollback, which causes the state
to advance by one and as a consequence skip it in the bringup
sequence"
* 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
cpu/hotplug: Prevent state corruption on error rollback
cpu/hotplug: Adjust misplaced smb() in cpuhp_thread_fun()
Pull random driver fix from Ted Ts'o:
"Fix things so the choice of whether or not to trust RDRAND to
initialize the CRNG is configurable via the boot option
random.trust_cpu={on,off}"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
random: make CPU trust a boot parameter
Pull Kbuild fixes from Masahiro Yamada:
- make setlocalversion more robust about -dirty check
- loosen the pkg-config requirement for Kconfig
- change missing depmod to a warning from an error
- warn modules_install when System.map is missing
* tag 'kbuild-fixes-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: modules_install: warn when missing System.map file
kbuild: make missing $DEPMOD a Warning instead of an Error
kconfig: do not require pkg-config on make {menu,n}config
kconfig: remove a spurious self-assignment
scripts/setlocalversion: git: Make -dirty check more robust
If there is no System.map file for "make modules_install",
scripts/depmod.sh will silently exit with success, having done
nothing. Since this is an unexpected situation, change it to
report a Warning for the missing file. The behavior is not
changed except for the Warning message.
The (previous) silent success and new Warning can be reproduced
by:
$ make mrproper; make defconfig
$ make modules; make modules_install
and since System.map is produced by "make vmlinux", the steps
above omit producing the System.map file.
Reported-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Pull KVM fixes from Radim Krčmář:
"ARM:
- Fix a VFP corruption in 32-bit guest
- Add missing cache invalidation for CoW pages
- Two small cleanups
s390:
- Fallout from the hugetlbfs support: pfmf interpretion and locking
- VSIE: fix keywrapping for nested guests
PPC:
- Fix a bug where pages might not get marked dirty, causing guest
memory corruption on migration
- Fix a bug causing reads from guest memory to use the wrong guest
real address for very large HPT guests (>256G of memory), leading
to failures in instruction emulation.
x86:
- Fix out of bound access from malicious pv ipi hypercalls
(introduced in rc1)
- Fix delivery of pending interrupts when entering a nested guest,
preventing arbitrarily late injection
- Sanitize kvm_stat output after destroying a guest
- Fix infinite loop when emulating a nested guest page fault and
improve the surrounding emulation code
- Two minor cleanups"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits)
KVM: LAPIC: Fix pv ipis out-of-bounds access
KVM: nVMX: Fix loss of pending IRQ/NMI before entering L2
arm64: KVM: Remove pgd_lock
KVM: Remove obsolete kvm_unmap_hva notifier backend
arm64: KVM: Only force FPEXC32_EL2.EN if trapping FPSIMD
KVM: arm/arm64: Clean dcache to PoC when changing PTE due to CoW
KVM: s390: Properly lock mm context allow_gmap_hpage_1m setting
KVM: s390: vsie: copy wrapping keys to right place
KVM: s390: Fix pfmf and conditional skey emulation
tools/kvm_stat: re-animate display of dead guests
tools/kvm_stat: indicate dead guests as such
tools/kvm_stat: handle guest removals more gracefully
tools/kvm_stat: don't reset stats when setting PID filter for debugfs
tools/kvm_stat: fix updates for dead guests
tools/kvm_stat: fix handling of invalid paths in debugfs provider
tools/kvm_stat: fix python3 issues
KVM: x86: Unexport x86_emulate_instruction()
KVM: x86: Rename emulate_instruction() to kvm_emulate_instruction()
KVM: x86: Do not re-{try,execute} after failed emulation in L2
KVM: x86: Default to not allowing emulation retry in kvm_mmu_page_fault
...
Pull ARM SoC fixes from Olof Johansson:
"A few more fixes who have trickled in:
- MMC bus width fixup for some Allwinner platforms
- Fix for NULL deref in ti-aemif when no platform data is passed in
- Fix div by 0 in SCMI code
- Add a missing module alias in a new RPi driver"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
memory: ti-aemif: fix a potential NULL-pointer dereference
firmware: arm_scmi: fix divide by zero when sustained_perf_level is zero
hwmon: rpi: add module alias to raspberrypi-hwmon
arm64: allwinner: dts: h6: fix Pine H64 MMC bus width
Allwinner fixes for 4.19
Just one fix for H6 mmc on the Pine H64: the mmc bus width was missing
from the device tree. This was added in 4.19-rc1.
* tag 'sunxi-fixes-for-4.19' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
arm64: allwinner: dts: h6: fix Pine H64 MMC bus width
Signed-off-by: Olof Johansson <olof@lixom.net>
When page-table entries are set, the compiler might optimize their
assignment by using multiple instructions to set the PTE. This might
turn into a security hazard if the user somehow manages to use the
interim PTE. L1TF does not make our lives easier, making even an interim
non-present PTE a security hazard.
Using WRITE_ONCE() to set PTEs and friends should prevent this potential
security hazard.
I skimmed the differences in the binary with and without this patch. The
differences are (obviously) greater when CONFIG_PARAVIRT=n as more
code optimizations are possible. For better and worse, the impact on the
binary with this patch is pretty small. Skimming the code did not cause
anything to jump out as a security hazard, but it seems that at least
move_soft_dirty_pte() caused set_pte_at() to use multiple writes.
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180902181451.80520-1-namit@vmware.com
activate_managed() returns EINVAL instead of -EINVAL in case of
error. While this is unlikely to happen, the positive return value would
cause further malfunction at the call site.
Fixes: 2db1f959d9 ("x86/vector: Handle managed interrupts proper")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Pull i2c fixes from Wolfram Sang:
- bugfixes for uniphier, i801, and xiic drivers
- ID removal (never produced) for imx
- one MAINTAINER addition
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: xiic: Record xilinx i2c with Zynq fragment
i2c: xiic: Make the start and the byte count write atomic
i2c: i801: fix DNV's SMBCTRL register offset
i2c: imx-lpi2c: Remove mx8dv compatible entry
dt-bindings: imx-lpi2c: Remove mx8dv compatible entry
i2c: uniphier-f: issue STOP only for last message or I2C_M_STOP
i2c: uniphier: issue STOP only for last message or I2C_M_STOP
Fix the cell specification mechanism to allow cells to be pre-created
without having to specify at least one address (the addresses will be
upcalled for).
This allows the cell information preload service to avoid the need to issue
loads of DNS lookups during boot to get the addresses for each cell (500+
lookups for the 'standard' cell list[*]). The lookups can be done later as
each cell is accessed through the filesystem.
Also remove the print statement that prints a line every time a new cell is
added.
[*] There are 144 cells in the list. Each cell is first looked up for an
SRV record, and if that fails, for an AFSDB record. These get a list
of server names, each of which then has to be looked up to get the
addresses for that server. E.g.:
dig srv _afs3-vlserver._udp.grand.central.org
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull MD fixes from Shaohua Li:
- Fix a locking issue for md-cluster (Guoqing)
- Fix a sync crash for raid10 (Ni)
- Fix a reshape bug with raid5 cache enabled (me)
* tag 'md/4.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
md-cluster: release RESYNC lock after the last resync message
RAID10 BUG_ON in raise_barrier when force is true and conf->barrier is 0
md/raid5-cache: disable reshape completely
Pull ceph fixes from Ilya Dryomov:
"Two rbd patches to complete support for images within namespaces that
went into -rc1 and a use-after-free fix.
The rbd changes have been sitting in a branch for quite a while but
couldn't be included into the -rc1 pull request because of a pending
wire protocol backwards compatibility fixup that only got committed
early this week"
* tag 'ceph-for-4.19-rc3' of https://github.com/ceph/ceph-client:
rbd: support cloning across namespaces
rbd: factor out get_parent_info()
ceph: avoid a use-after-free in ceph_destroy_options()
Pull fsnotify fix from Jan Kara:
"A small fsnotify fix from Amir"
* tag 'for_v4.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fsnotify: fix ignore mask logic in fsnotify()
Pull arm64 fix from Will Deacon:
"Just one small fix here, preventing a VM_WARN_ON when a !present
PMD/PUD is "freed" as part of a huge ioremap() operation.
The correct behaviour is to skip the free silently in this case, which
is a little weird (the function is a bit of a misnomer), but it
follows the x86 implementation"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: fix erroneous warnings in page freeing functions
Pull ACPI fixes from Rafael Wysocki:
"These fix a regression from the 4.18 cycle in the ACPI driver for
Intel SoCs (LPSS) and prevent dmi_check_system() from being called on
non-x86 systems in the ACPI core.
Specifics:
- Fix a power management regression in the ACPI driver for Intel SoCs
(LPSS) introduced by a system-wide suspend/resume fix during the
4.18 cycle (Zhang Rui).
- Prevent dmi_check_system() from being called on non-x86 systems in
the ACPI core (Jean Delvare)"
* tag 'acpi-4.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / LPSS: Force LPSS quirks on boot
ACPI / bus: Only call dmi_check_system() on X86