When --overwrite and --max-size options of perf record are used
together, a segmentation fault occurs. The following is an example:
# perf record -e sched:sched* --overwrite --max-size 1K -a -- sleep 1
[ perf record: Woken up 1 times to write data ]
perf: Segmentation fault
Obtained 12 stack frames.
./perf/perf(+0x197673) [0x55f99710b673]
/lib/x86_64-linux-gnu/libc.so.6(+0x3ef0f) [0x7fa45f3cff0f]
./perf/perf(+0x8eb40) [0x55f997002b40]
./perf/perf(+0x1f6882) [0x55f99716a882]
./perf/perf(+0x794c2) [0x55f996fed4c2]
./perf/perf(+0x7b7c7) [0x55f996fef7c7]
./perf/perf(+0x9074b) [0x55f99700474b]
./perf/perf(+0x12e23c) [0x55f9970a223c]
./perf/perf(+0x12e54a) [0x55f9970a254a]
./perf/perf(+0x7db60) [0x55f996ff1b60]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe6) [0x7fa45f3b2c86]
./perf/perf(+0x7dfe9) [0x55f996ff1fe9]
Segmentation fault (core dumped)
backtrace of the core file is as follows:
(gdb) bt
#0 record__bytes_written (rec=0x55f99755a200 <record>) at builtin-record.c:234
#1 record__output_max_size_exceeded (rec=0x55f99755a200 <record>) at builtin-record.c:242
#2 record__write (map=0x0, size=12816, bf=0x55f9978da2e0, rec=0x55f99755a200 <record>) at builtin-record.c:263
#3 process_synthesized_event (tool=tool@entry=0x55f99755a200 <record>, event=event@entry=0x55f9978da2e0, sample=sample@entry=0x0, machine=machine@entry=0x55f997893658) at builtin-record.c:618
#4 0x000055f99716a883 in __perf_event__synthesize_id_index (tool=tool@entry=0x55f99755a200 <record>, process=process@entry=0x55f997002aa0 <process_synthesized_event>, evlist=0x55f9978928b0, machine=machine@entry=0x55f997893658,
from=from@entry=0) at util/synthetic-events.c:1895
#5 0x000055f99716a91f in perf_event__synthesize_id_index (tool=tool@entry=0x55f99755a200 <record>, process=process@entry=0x55f997002aa0 <process_synthesized_event>, evlist=<optimized out>, machine=machine@entry=0x55f997893658)
at util/synthetic-events.c:1905
#6 0x000055f996fed4c3 in record__synthesize (tail=tail@entry=true, rec=0x55f99755a200 <record>) at builtin-record.c:1997
#7 0x000055f996fef7c8 in __cmd_record (argc=argc@entry=2, argv=argv@entry=0x7ffc67551260, rec=0x55f99755a200 <record>) at builtin-record.c:2802
#8 0x000055f99700474c in cmd_record (argc=<optimized out>, argv=0x7ffc67551260) at builtin-record.c:4258
#9 0x000055f9970a223d in run_builtin (p=0x55f997564d88 <commands+264>, argc=10, argv=0x7ffc67551260) at perf.c:330
#10 0x000055f9970a254b in handle_internal_command (argc=10, argv=0x7ffc67551260) at perf.c:384
#11 0x000055f996ff1b61 in run_argv (argcp=<synthetic pointer>, argv=<synthetic pointer>) at perf.c:428
#12 main (argc=<optimized out>, argv=0x7ffc67551260) at perf.c:562
The reason is that record__bytes_written accesses the freed memory rec->thread_data,
The process is as follows:
__cmd_record
-> record__free_thread_data
-> zfree(&rec->thread_data) // free rec->thread_data
-> record__synthesize
-> perf_event__synthesize_id_index
-> process_synthesized_event
-> record__write
-> record__bytes_written // access rec->thread_data
We add a member variable "thread_bytes_written" in the struct "record"
to save the data size written by the threads.
Fixes: 6d57581659 ("perf record: Add support for limit perf output file size")
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jiwei Sun <jiwei.sun@windriver.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/CAM9d7ci_TRrqBQVQNW8=GwakUr7SsZpYxaaty-S4bxF8zJWyqw@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
I have downloaded linux-next and build the perf tool using
# make LIBPFM4=1
to have libpfm4 support built into perf. The build fails:
# make LIBPFM4=1
....
INSTALL libbpf_headers
CC util/pfm.o
util/pfm.c: In function ‘print_libpfm_event’:
util/pfm.c:189:9: error: too many arguments to function ‘print_cb->print_event’
189 | print_cb->print_event(print_state,
| ^~~~~~~~
util/pfm.c:220:25: error: too many arguments to function ‘print_cb->print_event’
220 | print_cb->print_event(print_state,
The build error is caused by commit d9dc8874d6 ("perf pmu-events:
Remove now unused event and metric variables") which changes the
function prototype of
struct print_callbacks {
...
void (*print_event)(...); --> last two parameters removed.
};
but does not adjust the usage of this function prototype in util/pfm.c.
In file util/pfm.c function print_event() is still invoked with 13
parameters instead of 11. The compile fails.
When I adjust the file util/pfm.c as in this patch, the build works file.
Please check this patch for correctness, I have just fixed the compile
issue.
Fixes: d9dc8874d6 ("perf pmu-events: Remove now unused event and metric variables")
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Ian Rogers <irogers@google.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: egorenar@linux.ibm.com
Cc: linux-kernel-next@vger.kernel.org
Link: https://lore.kernel.org/r/20230207140447.1827741-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When there're many lock contentions in the system, people sometimes want
to know who caused the contention, IOW who's the owner of the locks.
The -o/--lock-owner option tries to follow the lock owners for the
contended mutexes and rwsems from BPF, and then attributes the
contention time to the owner instead of the waiter. It's a best effort
approach to get the owner info at the time of the contention and doesn't
guarantee to have the precise tracking of owners if it's changing over
time.
Currently it only handles mutex and rwsem that have owner field in their
struct and it basically points to a task_struct that owns the lock at
the moment.
Technically its type is atomic_long_t and it comes with some LSB bits
used for other meanings. So it needs to clear them when casting it to a
pointer to task_struct.
Also the atomic_long_t is a typedef of the atomic 32 or 64 bit types
depending on arch which is a wrapper struct for the counter value. I'm
not aware of proper ways to access those kernel atomic types from BPF so
I just read the internal counter value directly. Please let me know if
there's a better way.
When -o/--lock-owner option is used, it goes to the task aggregation
mode like -t/--threads option does. However it cannot get the owner for
other lock types like spinlock and sometimes even for mutex.
$ sudo ./perf lock con -abo -- ./perf bench sched pipe
# Running 'sched/pipe' benchmark:
# Executed 1000000 pipe operations between two processes
Total time: 4.766 [sec]
4.766540 usecs/op
209795 ops/sec
contended total wait max wait avg wait pid owner
403 565.32 us 26.81 us 1.40 us -1 Unknown
4 27.99 us 8.57 us 7.00 us 1583145 sched-pipe
1 8.25 us 8.25 us 8.25 us 1583144 sched-pipe
1 2.03 us 2.03 us 2.03 us 5068 chrome
As you can see, the owner is unknown for the most cases. But if we
filter only for the mutex locks, it'd more likely get the onwers.
$ sudo ./perf lock con -abo -Y mutex -- ./perf bench sched pipe
# Running 'sched/pipe' benchmark:
# Executed 1000000 pipe operations between two processes
Total time: 4.910 [sec]
4.910435 usecs/op
203647 ops/sec
contended total wait max wait avg wait pid owner
2 15.50 us 8.29 us 7.75 us 1582852 sched-pipe
7 7.20 us 2.47 us 1.03 us -1 Unknown
1 6.74 us 6.74 us 6.74 us 1582851 sched-pipe
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hao Luo <haoluo@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230207002403.63590-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Perf BPF filter test fails in environment where "kernel-debuginfo"
is not installed.
Test failure logs:
<<>>
42: BPF filter :
42.1: Basic BPF filtering : Ok
42.2: BPF pinning : Ok
42.3: BPF prologue generation : FAILED!
<<>>
Enabling verbose option provided debug logs, which says debuginfo
needs to be installed. Snippet of verbose logs:
<<>>
42.3: BPF prologue generation :
--- start ---
test child forked, pid 28218
<<>>
Rebuild with CONFIG_DEBUG_INFO=y, or install an appropriate debuginfo
package.
bpf_probe: failed to convert perf probe events
Failed to add events selected by BPF
test child finished with -1
---- end ----
BPF filter subtest 3: FAILED!
<<>>
Here the subtest "BPF prologue generation" failed and logs shows
debuginfo is needed. After installing kernel-debuginfo package, testcase
passes.
The "BPF prologue generation" subtest failed because, the do_test()
returns TEST_FAIL without checking the error type returned by
parse_events_load_bpf_obj().
parse_events_load_bpf_obj() can also return error of type -ENODATA
incase kernel-debuginfo package is not installed. Fix this by adding
check for -ENODATA error.
Test result after the patch changes:
Test failure logs:
<<>>
42: BPF filter :
42.1: Basic BPF filtering : Ok
42.2: BPF pinning : Ok
42.3: BPF prologue generation : Skip (clang/debuginfo isn't installed or environment missing BPF support)
<<>>
Fixes: ba1fae431e ("perf test: Add 'perf test BPF'")
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Disha Goel <disgoel@linux.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nageswara R Sastry <rnsastry@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lore.kernel.org/linux-perf-users/Y7bIk77mdE4j8Jyi@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The "bpf" tests fails in environment with missing libtraceevent support
as below:
# ./perf test 36
36: BPF filter :
36.1: Basic BPF filtering : FAILED!
36.2: BPF pinning : FAILED!
36.3: BPF prologue generation : FAILED!
The environment has clang but missing the libtraceevent devel. Hence
perf is compiled without libtraceevent support.
Detailed logs:
./perf test -v "Basic BPF filtering"
Failed to add BPF event syscalls:sys_enter_epoll_pwait
bpf: tracepoint call back failed, stop iterate
Failed to add events selected by BPF
The bpf tests tris to add probe event which fails at
"parse_events_add_tracepoint" function due to missing libtraceevent. Add
check for "HAVE_LIBTRACEEVENT" in the "tests/bpf.c" before proceeding
with the test.
With the change,
# ./perf test 36
36: BPF filter :
36.1: Basic BPF filtering : Skip (not compiled in or missing libtraceevent support)
36.2: BPF pinning : Skip (not compiled in or missing libtraceevent support)
36.3: BPF prologue generation : Skip (not compiled in or missing libtraceevent support)
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Disha Goel <disgoel@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nageswara R Sastry <rnsastry@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20230131135001.54578-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull ELF fix from Al Viro:
"One of the many equivalent build warning fixes for !CONFIG_ELF_CORE
configs. Geert's is the earliest one I've been able to find"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
coredump: Move dump_emit_page() to kill unused warning
Pull USB fixes from Greg KH:
"Here are some small USB fixes that resolve some reported problems.
These include:
- gadget driver fixes
- dwc3 driver fix
- typec driver fix
- MAINTAINERS file update.
All of these have been in linux-next with no reported problems"
* tag 'usb-6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: typec: ucsi: Don't attempt to resume the ports before they exist
usb: gadget: udc: do not clear gadget driver.bus
usb: gadget: f_uac2: Fix incorrect increment of bNumEndpoints
usb: gadget: f_fs: Fix unbalanced spinlock in __ffs_ep0_queue_wait
usb: dwc3: qcom: enable vbus override when in OTG dr-mode
MAINTAINERS: Add myself as UVC Gadget Maintainer
Pull tty/serial driver fixes from Greg KH:
"Here are some small serial and vt fixes. These include:
- 8250 driver fixes relating to dma issues
- stm32 serial driver fix for threaded irqs
- vc_screen bugfix for reported problems.
All have been in linux-next for a while with no reported problems"
* tag 'tty-6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
vc_screen: move load of struct vc_data pointer in vcs_read() to avoid UAF
serial: 8250_dma: Fix DMA Rx rearm race
serial: 8250_dma: Fix DMA Rx completion race
serial: stm32: Merge hard IRQ and threaded IRQ handling into single IRQ handler
Pull char/misc driver fixes from Greg KH:
"Here are a number of small char/misc/whatever driver fixes. They
include:
- IIO driver fixes for some reported problems
- nvmem driver fixes
- fpga driver fixes
- debugfs memory leak fix in the hv_balloon and irqdomain code
(irqdomain change was acked by the maintainer)
All have been in linux-next with no reported problems"
* tag 'char-misc-6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (33 commits)
kernel/irq/irqdomain.c: fix memory leak with using debugfs_lookup()
HV: hv_balloon: fix memory leak with using debugfs_lookup()
nvmem: qcom-spmi-sdam: fix module autoloading
nvmem: core: fix return value
nvmem: core: fix cell removal on error
nvmem: core: fix device node refcounting
nvmem: core: fix registration vs use race
nvmem: core: fix cleanup after dev_set_name()
nvmem: core: remove nvmem_config wp_gpio
nvmem: core: initialise nvmem->id early
nvmem: sunxi_sid: Always use 32-bit MMIO reads
nvmem: brcm_nvram: Add check for kzalloc
iio: imu: fxos8700: fix MAGN sensor scale and unit
iio: imu: fxos8700: remove definition FXOS8700_CTRL_ODR_MIN
iio: imu: fxos8700: fix failed initialization ODR mode assignment
iio: imu: fxos8700: fix incorrect ODR mode readback
iio: light: cm32181: Fix PM support on system with 2 I2C resources
iio: hid: fix the retval in gyro_3d_capture_sample
iio: hid: fix the retval in accel_3d_capture_sample
iio: imu: st_lsm6dsx: fix build when CONFIG_IIO_TRIGGERED_BUFFER=m
...
Pull fbdev fixes from Helge Deller:
- fix fbcon to prevent fonts bigger than 32x32 pixels to avoid
overflows reported by syzbot
- switch omapfb to use kstrtobool()
- switch some fbdev drivers to use the backlight helpers
* tag 'fbdev-for-6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
fbcon: Check font dimension limits
fbdev: omapfb: Use kstrtobool() instead of strtobool()
fbdev: fbmon: fix function name in kernel-doc
fbdev: atmel_lcdfb: Rework backlight status updates
fbdev: riva: Use backlight helper
fbdev: omapfb: panel-dsi-cm: Use backlight helper
fbdev: nvidia: Use backlight helper
fbdev: mx3fb: Use backlight helper
fbdev: radeon: Use backlight helper
fbdev: atyfb: Use backlight helper
fbdev: aty128fb: Use backlight helper
Pull x86 fix from Borislav Petkov:
- Prevent the compiler from reordering accesses to debug regs which
could cause a #VC exception in SEV-ES guests at the wrong place in
the NMI handling path
* tag 'x86_urgent_for_v6.2_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/debug: Fix stack recursion caused by wrongly ordered DR7 accesses
Pull perf fix from Borislav Petkov:
- Lock the proper critical section when dealing with perf event context
* tag 'perf_urgent_for_v6.2_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Fix perf_event_pmu_context serialization
Pull powerpc fixes from Michael Ellerman:
"It's a bit of a big batch for rc6, but just because I didn't send any
fixes the last week or two while I was on vacation, next week should
be quieter:
- Fix a few objtool warnings since we recently enabled objtool.
- Fix a deadlock with the hash MMU vs perf record.
- Fix perf profiling of asynchronous interrupt handlers.
- Revert the IMC PMU nest_init_lock to being a mutex.
- Two commits fixing problems with the kexec_file FDT size
estimation.
- Two commits fixing problems with strict RWX vs kernels running at
non-zero.
- Reconnect tlb_flush() to hash__tlb_flush()
Thanks to Kajol Jain, Nicholas Piggin, Sachin Sant Sathvika Vasireddy,
and Sourabh Jain"
* tag 'powerpc-6.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s: Reconnect tlb_flush() to hash__tlb_flush()
powerpc/kexec_file: Count hot-pluggable memory in FDT estimate
powerpc/64s/radix: Fix RWX mapping with relocated kernel
powerpc/64s/radix: Fix crash with unaligned relocated kernel
powerpc/kexec_file: Fix division by zero in extra size estimation
powerpc/imc-pmu: Revert nest_init_lock to being a mutex
powerpc/64: Fix perf profiling asynchronous interrupt handlers
powerpc/64s: Fix local irq disable when PMIs are disabled
powerpc/kvm: Fix unannotated intra-function call warning
powerpc/85xx: Fix unannotated intra-function call warning
Pull RTC fixes from Alexandre Belloni:
"Here are a few fixes for 6.2. The EFI one is the most important as it
allows some RTCs to actually work. The other two are warnings that are
worth fixing.
- efi: make WAKEUP services optional
- sunplus: fix format string warning"
* tag 'rtc-6.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
rtc: sunplus: fix format string for printing resource
dt-bindings: rtc: qcom-pm8xxx: allow 'wakeup-source' property
rtc: efi: Enable SET/GET WAKEUP services as optional
Pull Kbuild fixes from Masahiro Yamada:
- Fix two bugs (for building and for signing) when MODULE_SIG_KEY
contains a PKCS#11 URI
* tag 'kbuild-fixes-v6.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: modinst: Fix build error when CONFIG_MODULE_SIG_KEY is a PKCS#11 URI
certs: Fix build error when PKCS#11 URI contains semicolon
Pull kvm fixes from Paolo Bonzini:
"ARM64:
- Yet another fix for non-CPU accesses to the memory backing the
VGICv3 subsystem
- A set of fixes for the setlftest checking for the S1PTW behaviour
after the fix that went in ealier in the cycle"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: selftests: aarch64: Test read-only PT memory regions
KVM: selftests: aarch64: Fix check of dirty log PT write
KVM: selftests: aarch64: Do not default to dirty PTE pages on all S1PTWs
KVM: selftests: aarch64: Relax userfaultfd read vs. write checks
KVM: arm64: Allow no running vcpu on saving vgic3 pending table
KVM: arm64: Allow no running vcpu on restoring vgic3 LPI pending status
KVM: arm64: Add helper vgic_write_guest_lock()
Pull parisc architecture fixes from Helge Deller:
- Fix PTRACE_GETREGS/PTRACE_SETREGS for 32-bit userspace on a 64-bit
kernel
- pdc_iodc_print() dropped chars for newline in strings
- Drop constants in favour of PRIV_USER
- use safer strscpy() function in pdc_stable driver
* tag 'parisc-for-6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Wire up PTRACE_GETREGS/PTRACE_SETREGS for compat case
parisc: Replace hardcoded value with PRIV_USER constant in ptrace.c
parisc: Fix return code of pdc_iodc_print()
parisc: pdc_stable: use strscpy() to instead of strncpy()
Pull OpenRISC mailing list update from Stafford Horne:
"The old mailing list for OpenRISC died due to some infrastructure
issues and the people in charge decided not to keep it running. We
have migrated this and the users over to kernel.org infrastructure.
Sending this out now to avoid kernel developers getting lots of
bounced mails for using the old list"
* tag 'for-linus' of https://github.com/openrisc/linux:
MAINTAINERS: Update OpenRISC mailing list
KVM/arm64 fixes for 6.2, take #3
- Yet another fix for non-CPU accesses to the memory backing
the VGICv3 subsystem
- A set of fixes for the setlftest checking for the S1PTW
behaviour after the fix that went in ealier in the cycle
The Retire Latency field is added in the var3_w of the
PERF_SAMPLE_WEIGHT_STRUCT. The Retire Latency reports the number of
elapsed core clocks between the retirement of the instruction indicated
by the Instruction Pointer field of the PEBS record and the retirement
of the prior instruction. That's quite useful to display the information
with perf script.
Add a new field retire_lat for the Retire Latency information.
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20230104201349.1451191-9-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The Retire Latency field is added in the var3_w of the
PERF_SAMPLE_WEIGHT_STRUCT. The Retire Latency reports pipeline stall of
this instruction compared to the previous instruction in cycles. That's
quite useful to display the information with perf mem report.
The p_stage_cyc for Power is also from the var3_w. Union the p_stage_cyc
and retire_lat to share the code.
Implement X86 specific codes to display the X86 specific header.
Add a new sort key retire_lat for the Retire Latency.
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20230104201349.1451191-8-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It'd be useful to filter other than the current aggregation mode. For
example, users may want to see callstacks for specific locks only. Or
they may want tasks from a certain callstack.
The tracepoints already collected the information but it needs to check
the condition again when processing the event. And it needs to change
BPF to allow the key combinations.
The lock contentions on 'rcu_state' spinlock can be monitored:
$ sudo perf lock con -abv -L rcu_state sleep 1
...
contended total wait max wait avg wait type caller
4 151.39 us 62.57 us 37.85 us spinlock rcu_core+0xcb
0xffffffff81fd1666 _raw_spin_lock_irqsave+0x46
0xffffffff8172d76b rcu_core+0xcb
0xffffffff822000eb __softirqentry_text_start+0xeb
0xffffffff816a0ba9 __irq_exit_rcu+0xc9
0xffffffff81fc0112 sysvec_apic_timer_interrupt+0xa2
0xffffffff82000e46 asm_sysvec_apic_timer_interrupt+0x16
0xffffffff81d49f78 cpuidle_enter_state+0xd8
0xffffffff81d4a259 cpuidle_enter+0x29
1 30.21 us 30.21 us 30.21 us spinlock rcu_core+0xcb
0xffffffff81fd1666 _raw_spin_lock_irqsave+0x46
0xffffffff8172d76b rcu_core+0xcb
0xffffffff822000eb __softirqentry_text_start+0xeb
0xffffffff816a0ba9 __irq_exit_rcu+0xc9
0xffffffff81fc00c4 sysvec_apic_timer_interrupt+0x54
0xffffffff82000e46 asm_sysvec_apic_timer_interrupt+0x16
1 28.84 us 28.84 us 28.84 us spinlock rcu_accelerate_cbs_unlocked+0x40
0xffffffff81fd1c60 _raw_spin_lock+0x30
0xffffffff81728cf0 rcu_accelerate_cbs_unlocked+0x40
0xffffffff8172da82 rcu_core+0x3e2
0xffffffff822000eb __softirqentry_text_start+0xeb
0xffffffff816a0ba9 __irq_exit_rcu+0xc9
0xffffffff81fc0112 sysvec_apic_timer_interrupt+0xa2
0xffffffff82000e46 asm_sysvec_apic_timer_interrupt+0x16
0xffffffff81d49f78 cpuidle_enter_state+0xd8
...
To see tasks calling 'rcu_core' function:
$ sudo perf lock con -abt -S rcu_core sleep 1
contended total wait max wait avg wait pid comm
19 23.46 us 2.21 us 1.23 us 0 swapper
2 18.37 us 17.01 us 9.19 us 2061859 ThreadPoolForeg
3 5.76 us 1.97 us 1.92 us 3909 pipewire-pulse
1 2.26 us 2.26 us 2.26 us 1809271 MediaSu~isor #2
1 1.97 us 1.97 us 1.97 us 1514882 Chrome_ChildIOT
1 987 ns 987 ns 987 ns 3740 pipewire-pulse
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230203021324.143540-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull block fixes from Jens Axboe:
"A bit bigger than I'd like at this point, but mostly a bunch of little
fixes. In detail:
- NVMe pull request via Christoph:
- Fix a missing queue put in nvmet_fc_ls_create_association
(Amit Engel)
- Clear queue pointers on tag_set initialization failure
(Maurizio Lombardi)
- Use workqueue dedicated to authentication (Shin'ichiro
Kawasaki)
- Fix for an overflow in ublk (Liu)
- Fix for leaking a queue reference in block cgroups (Ming)
- Fix for a use-after-free in BFQ (Yu)"
* tag 'block-6.2-2023-02-03' of git://git.kernel.dk/linux:
blk-cgroup: don't update io stat for root cgroup
nvme-auth: use workqueue dedicated to authentication
nvme: clear the request_queue pointers on failure in nvme_alloc_io_tag_set
nvme: clear the request_queue pointers on failure in nvme_alloc_admin_tag_set
nvme-fc: fix a missing queue put in nvmet_fc_ls_create_association
block: Fix the blk_mq_destroy_queue() documentation
block: ublk: extending queue_size to fix overflow
block, bfq: fix uaf for bfqq in bic_set_bfqq()
Pull ceph fix from Ilya Dryomov:
"A safeguard to prevent the kernel client from further damaging the
filesystem after running into a case of an invalid snap trace.
The root cause of this metadata corruption is still being investigated
but it appears to be stemming from the MDS. As such, this is the best
we can do for now"
* tag 'ceph-for-6.2-rc7' of https://github.com/ceph/ceph-client:
ceph: blocklist the kclient when receiving corrupted snap trace
ceph: move mount state enum to super.h
Pull RISC-V fixes from Palmer Dabbelt:
- A build fix to avoid static branches in cpu_relax(), which greatly
inflates the jump tables and breaks at least
CONFIG_CC_OPTIMIZE_FOR_SIZE=y.
- A fix for a kernel panic when probing impossible instruction
positions.
- A fix to disable unwind tables, which are enabled by default for
GCC-13 and result in unhandled relocations in modules.
* tag 'riscv-for-linus-6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: disable generation of unwind tables
riscv: kprobe: Fixup kernel panic when probing an illegal position
riscv: Fix build with CONFIG_CC_OPTIMIZE_FOR_SIZE=y
Pull drm fixes from Dave Airlie:
"A few more fixes this week, a bit more spread out though.
We have a bunch of nouveau regression and stabilisation fixes, along
with usual amdgpu, and i915. Otherwise just some minor misc ones:
dma-fence:
- fix signaling bit for private fences
panel:
- boe-tv101wum-nl6 disable fix
nouveau:
- gm20b acr regression fix
- tu102 scrub status fix
- tu102 wait for firmware fix
i915:
- Fixes for potential use-after-free and double-free
- GuC locking and refcount fixes
- Display's reference clock value fix
amdgpu:
- GC11 fixes
- DCN 3.1.4 fixes
- NBIO 4.3 fix
- DCN 3.2 fixes
- Properly handle additional cases where DCN is not supported
- SMU13 fixes
vc4:
- fix CEC adapter names
ssd130x:
- fix display init regression"
* tag 'drm-fixes-2023-02-03' of git://anongit.freedesktop.org/drm/drm: (23 commits)
drm/amd/display: Properly handle additional cases where DCN is not supported
drm/amdgpu: Enable vclk dclk node for gc11.0.3
drm/amd: Fix initialization for nbio 4.3.0
drm/amdgpu: enable HDP SD for gfx 11.0.3
drm/amd/pm: drop unneeded dpm features disablement for SMU 13.0.4/11
drm/amd/display: Reset DMUB mailbox SW state after HW reset
drm/amd/display: Unassign does_plane_fit_in_mall function from dcn3.2
drm/amd/display: Adjust downscaling limits for dcn314
drm/amd/display: Add missing brackets in calculation
drm/amdgpu: update wave data type to 3 for gfx11
drm/panel: boe-tv101wum-nl6: Ensure DSI writes succeed during disable
drm/nouveau/acr/gm20b: regression fixes
drm/nouveau/fb/tu102-: fix register used to determine scrub status
drm/nouveau/devinit/tu102-: wait for GFW_BOOT_PROGRESS == COMPLETED
drm/i915/adlp: Fix typo for reference clock
drm/i915: Fix potential bit_17 double-free
drm/i915: Fix up locking around dumping requests lists
drm/i915: Fix request ref counting during error capture & debugfs dump
drm/i915/guc: Fix locking when searching for a hung request
drm/i915: Avoid potential vm use-after-free
...
Pull misc fixes from Andrew Morton:
"25 hotfixes, mainly for MM. 13 are cc:stable"
* tag 'mm-hotfixes-stable-2023-02-02-19-24-2' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (26 commits)
mm: memcg: fix NULL pointer in mem_cgroup_track_foreign_dirty_slowpath()
Kconfig.debug: fix the help description in SCHED_DEBUG
mm/swapfile: add cond_resched() in get_swap_pages()
mm: use stack_depot_early_init for kmemleak
Squashfs: fix handling and sanity checking of xattr_ids count
sh: define RUNTIME_DISCARD_EXIT
highmem: round down the address passed to kunmap_flush_on_unmap()
migrate: hugetlb: check for hugetlb shared PMD in node migration
mm: hugetlb: proc: check for hugetlb shared PMD in /proc/PID/smaps
mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups
Revert "mm: kmemleak: alloc gray object for reserved region with direct map"
freevxfs: Kconfig: fix spelling
maple_tree: should get pivots boundary by type
.mailmap: update e-mail address for Eugen Hristev
mm, mremap: fix mremap() expanding for vma's with vm_ops->close()
squashfs: harden sanity check in squashfs_read_xattr_id_table
ia64: fix build error due to switch case label appearing next to declaration
mm: multi-gen LRU: fix crash during cgroup migration
Revert "mm: add nodes= arg to memory.reclaim"
zsmalloc: fix a race with deferred_handles storing
...