Pull RISC-V fixes from Palmer Dabbelt:
- a workaround for a compiler surprise related to the "r" inline
assembly that allows LLVM to boot.
- a fix to avoid WX-only mappings, which the ISA does not allow. While
this probably manifests in many ways, the bug was found in stress-ng.
- a missing lock in set_direct_map_*(), which due to a recent lockdep
change started asserting.
* tag 'riscv-for-linus-5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
RISC-V: Acquire mmap lock before invoking walk_page_range
RISC-V: Don't allow write+exec only page mapping request in mmap
riscv/atomic: Fix sign extension for RV64I
Pull kselftest cleanups from Shuah Khan:
- ftrace "requires:" list for simplifying and unifying requirement
checks for each test case, adding "requires:" line instead of
checking required ftrace interfaces in each test case.
- a minor spelling correction patch
* tag 'linux-kselftest-5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/ftrace: Support ":README" suffix for requires
selftests/ftrace: Support ":tracer" suffix for requires
selftests/ftrace: Convert check_filter_file() with requires list
selftests/ftrace: Convert required interface checks into requires list
selftests/ftrace: Add "requires:" list support
selftests/ftrace: Return unsupported for the unconfigured features
selftests/ftrace: Allow ":" in description
tools: testing: ftrace: trigger: fix spelling mistake
The fileserver probe timer, net->fs_probe_timer, isn't cancelled when
the kafs module is being removed and so the count it holds on
net->servers_outstanding doesn't get dropped..
This causes rmmod to wait forever. The hung process shows a stack like:
afs_purge_servers+0x1b5/0x23c [kafs]
afs_net_exit+0x44/0x6e [kafs]
ops_exit_list+0x72/0x93
unregister_pernet_operations+0x14c/0x1ba
unregister_pernet_subsys+0x1d/0x2a
afs_exit+0x29/0x6f [kafs]
__do_sys_delete_module.isra.0+0x1a2/0x24b
do_syscall_64+0x51/0x95
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fix this by:
(1) Attempting to cancel the probe timer and, if successful, drop the
count that the timer was holding.
(2) Make the timer function just drop the count and not schedule the
prober if the afs portion of net namespace is being destroyed.
Also, whilst we're at it, make the following changes:
(3) Initialise net->servers_outstanding to 1 and decrement it before
waiting on it so that it doesn't generate wake up events by being
decremented to 0 until we're cleaning up.
(4) Switch the atomic_dec() on ->servers_outstanding for ->fs_timer in
afs_purge_servers() to use the helper function for that.
Fixes: f6cbb368bc ("afs: Actively poll fileservers to maintain NAT or firewall openings")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix afs_do_lookup()'s fallback case for when FS.InlineBulkStatus isn't
supported by the server.
In the fallback, it calls FS.FetchStatus for the specific vnode it's
meant to be looking up. Commit b6489a49f7 broke this by renaming one
of the two identically-named afs_fetch_status_operation descriptors to
something else so that one of them could be made non-static. The site
that used the renamed one, however, wasn't renamed and didn't produce
any warning because the other was declared in a header.
Fix this by making afs_do_lookup() use the renamed variant.
Note that there are two variants of the success method because one is
called from ->lookup() where we may or may not have an inode, but can't
call iget until after we've talked to the server - whereas the other is
called from within iget where we have an inode, but it may or may not be
initialised.
The latter variant expects there to be an inode, but because it's being
called from there former case, there might not be - resulting in an oops
like the following:
BUG: kernel NULL pointer dereference, address: 00000000000000b0
...
RIP: 0010:afs_fetch_status_success+0x27/0x7e
...
Call Trace:
afs_wait_for_operation+0xda/0x234
afs_do_lookup+0x2fe/0x3c1
afs_lookup+0x3c5/0x4bd
__lookup_slow+0xcd/0x10f
walk_component+0xa2/0x10c
path_lookupat.isra.0+0x80/0x110
filename_lookup+0x81/0x104
vfs_statx+0x76/0x109
__do_sys_newlstat+0x39/0x6b
do_syscall_64+0x4c/0x78
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: b6489a49f7 ("afs: Fix silly rename")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull io_uring fixes from Jens Axboe:
- Catch a case where io_sq_thread() didn't do proper mm acquire
- Ensure poll completions are reaped on shutdown
- Async cancelation and run fixes (Pavel)
- io-poll race fixes (Xiaoguang)
- Request cleanup race fix (Xiaoguang)
* tag 'io_uring-5.8-2020-06-19' of git://git.kernel.dk/linux-block:
io_uring: fix possible race condition against REQ_F_NEED_CLEANUP
io_uring: reap poll completions while waiting for refs to drop on exit
io_uring: acquire 'mm' for task_work for SQPOLL
io_uring: add memory barrier to synchronize io_kiocb's result and iopoll_completed
io_uring: don't fail links for EAGAIN error in IOPOLL mode
io_uring: cancel by ->task not pid
io_uring: lazy get task
io_uring: batch cancel in io_uring_cancel_files()
io_uring: cancel all task's requests on exit
io-wq: add an option to cancel all matched reqs
io-wq: reorder cancellation pending -> running
io_uring: fix lazy work init
Pull block fixes from Jens Axboe:
- Use import_uuid() where appropriate (Andy)
- bcache fixes (Coly, Mauricio, Zhiqiang)
- blktrace sparse warnings fix (Jan)
- blktrace concurrent setup fix (Luis)
- blkdev_get use-after-free fix (Jason)
- Ensure all blk-mq maps are updated (Weiping)
- Loop invalidate bdev fix (Zheng)
* tag 'block-5.8-2020-06-19' of git://git.kernel.dk/linux-block:
block: make function 'kill_bdev' static
loop: replace kill_bdev with invalidate_bdev
partitions/ldm: Replace uuid_copy() with import_uuid() where it makes sense
block: update hctx map when use multiple maps
blktrace: Avoid sparse warnings when assigning q->blk_trace
blktrace: break out of blktrace setup on concurrent calls
block: Fix use-after-free in blkdev_get()
trace/events/block.h: drop kernel-doc for dropped function parameter
blk-mq: Remove redundant 'return' statement
bcache: pr_info() format clean up in bcache_device_init()
bcache: use delayed kworker fo asynchronous devices registration
bcache: check and adjust logical block size for backing devices
bcache: fix potential deadlock problem in btree_gc_coalesce
Pull libata fixes from Jens Axboe:
"A few minor changes that should go into this release"
* tag 'libata-5.8-2020-06-19' of git://git.kernel.dk/linux-block:
libata: Use per port sync for detach
ata/libata: Fix usage of page address by page_address in ata_scsi_mode_select_xlat function
sata_rcar: handle pm_runtime_get_sync failure cases
Pull drm fixes from Dave Airlie:
"Just i915 and amd here.
i915 has some workaround movement so they get applied at the right
times, and a timeslicing fix, along with some display fixes.
AMD has a few display floating point fix and a devcgroup fix for
amdkfd.
i915:
- Fix for timeslicing and virtual engines/unpremptable requests (+ 1
dependency patch)
- Fixes into TypeC register programming and interrupt storm detecting
- Disable DIP on MST ports with the transcoder clock still on
- Avoid missing GT workarounds at reset for HSW and older gens
- Fix for unwinding multiple requests missing force restore
- Fix encoder type check for DDI vswing sequence
- Build warning fixes
amdgpu:
- Fix kvfree/kfree mixup
- Fix hawaii device id in powertune configuration
- Display FP fixes
- Documentation fixes
amdkfd:
- devcgroup check fix"
* tag 'drm-fixes-2020-06-19' of git://anongit.freedesktop.org/drm/drm: (23 commits)
drm/amdgpu: fix documentation around busy_percentage
drm/amdgpu/pm: update comment to clarify Overdrive interfaces
drm/amdkfd: Use correct major in devcgroup check
drm/i915/display: Fix the encoder type check
drm/i915/icl+: Fix hotplug interrupt disabling after storm detection
drm/i915/gt: Move gen4 GT workarounds from init_clock_gating to workarounds
drm/i915/gt: Move ilk GT workarounds from init_clock_gating to workarounds
drm/i915/gt: Move snb GT workarounds from init_clock_gating to workarounds
drm/i915/gt: Move vlv GT workarounds from init_clock_gating to workarounds
drm/i915/gt: Move ivb GT workarounds from init_clock_gating to workarounds
drm/i915/gt: Move hsw GT workarounds from init_clock_gating to workarounds
drm/i915/icl: Disable DIP on MST ports with the transcoder clock still on
drm/i915/gt: Incrementally check for rewinding
drm/i915/tc: fix the reset of ln0
drm/i915/gt: Prevent timeslicing into unpreemptable requests
drm/i915/selftests: Restore to default heartbeat
drm/i915: work around false-positive maybe-uninitialized warning
drm/i915/pmu: avoid an maybe-uninitialized warning
drm/i915/gt: Incorporate the virtual engine into timeslicing
drm/amd/display: Rework dsc to isolate FPU operations
...
Pull ceph fixes from Ilya Dryomov:
"An important follow-up for replica reads support that went into -rc1
and two target_copy() fixups"
* tag 'ceph-for-5.8-rc2' of git://github.com/ceph/ceph-client:
libceph: don't omit used_replica in target_copy()
libceph: don't omit recovery_deletes in target_copy()
libceph: move away from global osd_req_flags
Pull arm64 fixes from Will Deacon:
"Unfortunately, we still have a number of outstanding issues so there
will be more fixes to come, but this lot are a good start.
- Fix handling of watchpoints triggered by uaccess routines
- Fix initialisation of gigantic pages for CMA buffers
- Raise minimum clang version for BTI to avoid miscompilation
- Fix data race in SVE vector length configuration code
- Ensure address tags are ignored in kern_addr_valid()
- Dump register state on fatal BTI exception
- kexec_file() cleanup to use struct_size() macro"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: hw_breakpoint: Don't invoke overflow handler on uaccess watchpoints
arm64: kexec_file: Use struct_size() in kmalloc()
arm64: mm: reserve hugetlb CMA after numa_init
arm64: bti: Require clang >= 10.0.1 for in-kernel BTI support
arm64: sve: Fix build failure when ARM64_SVE=y and SYSCTL=n
arm64: pgtable: Clear the GP bit for non-executable kernel pages
arm64: mm: reset address tag set by kasan sw tagging
arm64: traps: Dump registers prior to panic() in bad_mode()
arm64/sve: Eliminate data races on sve_default_vl
docs/arm64: Fix typo'd #define in sve.rst
arm64: remove TEXT_OFFSET randomization
Pull flex-array size helper from Kees Cook:
"During the treewide clean-ups of zero-length "flexible arrays", the
struct_size() helper was heavily used, but it was noticed that many
times it would have been nice to have an additional helper to get the
size of just the flexible array itself.
This need appears to be even more common when cleaning up the 1-byte
array "flexible arrays", so Gustavo implemented it.
I'd love to get this landed early so it can be used during the v5.9
dev cycle to ease the 1-byte array cleanups."
* tag 'overflow-v5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
overflow.h: Add flex_array_size() helper
Pull perf tooling fixes from Arnaldo Carvalho de Melo:
- Update various UAPI headers, some automatically adding support for a
new MSR and the faccess2 syscall.
- Fix corner case NULL deref in the histograms code.
- Fix corner case NULL deref in 'perf stat' aggregation code.
- Fix array pointer deref and old style declaration in the parsing of
events.
- Fix segfault when processing ZSTD compressed perf.data files in 'perf
script' due to lack of initialization of the ZSTD library.
- Handle __attribute__((user)) in libtraceevent fixing the parsing of
syscall tracepoints with user buffers.
- Make libtraevent aware of __builtin_expect() appearing in tracepoint
fields.
- Make the BPF prologue generation use bpf_probe_read_{user,kernel}().
- Fix the '@user' attribute parsing in kprobes variables in 'perf
probe'.
- Fix error message when asking for -fsanitize=address without required
libraries.
* tag 'perf-tools-fixes-2020-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (22 commits)
perf build: Fix error message when asking for -fsanitize=address without required libraries
tools lib traceevent: Add handler for __builtin_expect()
tools lib traceevent: Handle __attribute__((user)) in field names
tools lib traceevent: Add append() function helper for appending strings
tools headers UAPI: Sync linux/fs.h with the kernel sources
tools include UAPI: Sync linux/vhost.h with the kernel sources
tools arch x86: Sync the msr-index.h copy with the kernel sources
perf script: Initialize zstd_data
perf pmu: Remove unused declaration
perf parse-events: Fix an old style declaration
perf parse-events: Fix an incompatible pointer
perf bpf: Fix bpf prologue generation
perf probe: Fix user attribute access in kprobes
perf stat: Fix NULL pointer dereference
perf report: Fix NULL pointer dereference in hists__fprintf_nr_sample_events()
tools headers UAPI: Sync kvm.h headers with the kernel sources
tools headers UAPI: Sync drm/i915_drm.h with the kernel sources
tools headers UAPI: Sync linux/fscrypt.h with the kernel sources
perf beauty: Add support to STATX_MNT_ID in the 'statx' syscall 'mask' argument
tools headers uapi: Sync linux/stat.h with the kernel sources
...
As per the table 4.4 of version "20190608-Priv-MSU-Ratified" of the
RISC-V instruction set manual[0], the PTE permission bit combination of
"write+exec only" is reserved for future use. Hence, don't allow such
mapping request in mmap call.
An issue is been reported by David Abdurachmanov, that while running
stress-ng with "sysbadaddr" argument, RCU stalls are observed on RISC-V
specific kernel.
This issue arises when the stress-sysbadaddr request for pages with
"write+exec only" permission bits and then passes the address obtain
from this mmap call to various system call. For the riscv kernel, the
mmap call should fail for this particular combination of permission bits
since it's not valid.
[0]: http://dabbelt.com/~palmer/keep/riscv-isa-manual/riscv-privileged-20190608-1.pdf
Signed-off-by: Yash Shah <yash.shah@sifive.com>
Reported-by: David Abdurachmanov <david.abdurachmanov@gmail.com>
[Palmer: Refer to the latest ISA specification at the only link I could
find, and update the terminology.]
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
- Fix for timeslicing and virtual engines/unpremptable requests
(+ 1 dependency patch)
- Fixes into TypeC register programming and interrupt storm detecting
- Disable DIP on MST ports with the transcoder clock still on
- Avoid missing GT workarounds at reset for HSW and older gens
- Fix for unwinding multiple requests missing force restore
- Fix encoder type check for DDI vswing sequence
- Build warning fixes
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200618124659.GA12342@jlahtine-desk.ger.corp.intel.com
Merge non-faulting memory access cleanups from Christoph Hellwig:
"Andrew and I decided to drop the patches implementing your suggested
rename of the probe_kernel_* and probe_user_* helpers from -mm as
there were way to many conflicts.
After -rc1 might be a good time for this as all the conflicts are
resolved now"
This also adds a type safety checking patch on top of the renaming
series to make the subtle behavioral difference between 'get_user()' and
'get_kernel_nofault()' less potentially dangerous and surprising.
* emailed patches from Christoph Hellwig <hch@lst.de>:
maccess: make get_kernel_nofault() check for minimal type compatibility
maccess: rename probe_kernel_address to get_kernel_nofault
maccess: rename probe_user_{read,write} to copy_{from,to}_user_nofault
maccess: rename probe_kernel_{read,write} to copy_{from,to}_kernel_nofault
Now that we've renamed probe_kernel_address() to get_kernel_nofault()
and made it look and behave more in line with get_user(), some of the
subtle type behavior differences end up being more obvious and possibly
dangerous.
When you do
get_user(val, user_ptr);
the type of the access comes from the "user_ptr" part, and the above
basically acts as
val = *user_ptr;
by design (except, of course, for the fact that the actual dereference
is done with a user access).
Note how in the above case, the type of the end result comes from the
pointer argument, and then the value is cast to the type of 'val' as
part of the assignment.
So the type of the pointer is ultimately the more important type both
for the access itself.
But 'get_kernel_nofault()' may now _look_ similar, but it behaves very
differently. When you do
get_kernel_nofault(val, kernel_ptr);
it behaves like
val = *(typeof(val) *)kernel_ptr;
except, of course, for the fact that the actual dereference is done with
exception handling so that a faulting access is suppressed and returned
as the error code.
But note how different the casting behavior of the two superficially
similar accesses are: one does the actual access in the size of the type
the pointer points to, while the other does the access in the size of
the target, and ignores the pointer type entirely.
Actually changing get_kernel_nofault() to act like get_user() is almost
certainly the right thing to do eventually, but in the meantime this
patch adds logit to at least verify that the pointer type is compatible
with the type of the result.
In many cases, this involves just casting the pointer to 'void *' to
make it obvious that the type of the pointer is not the important part.
It's not how 'get_user()' acts, but at least the behavioral difference
is now obvious and explicit.
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Better describe what this helper does, and match the naming of
copy_from_kernel_nofault.
Also switch the argument order around, so that it acts and looks
like get_user().
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, address spaces in warnings are displayed as '<asn:X>' with
'X' being the address space's arbitrary number.
But since sparse v0.6.0-rc1 (late December 2018), sparse allows you to
define the address spaces using an identifier instead of a number. This
identifier is then directly used in the warnings.
So, use the identifiers '__user', '__iomem', '__percpu' & '__rcu' for
the corresponding address spaces. The default address space, __kernel,
being not displayed in warnings, stays defined as '0'.
With this change, warnings that used to be displayed as:
cast removes address space '<asn:1>' of expression
... void [noderef] <asn:2> *
will now be displayed as:
cast removes address space '__user' of expression
... void [noderef] __iomem *
This also moves the __kernel annotation to be the first one, since it is
quite different from the others because it's the default one, and so:
- it's never displayed
- it's normally not needed, nor in type annotations, nor in cast
between address spaces. The only time it's needed is when it's
combined with a typeof to express "the same type as this one but
without the address space"
- it can't be defined with a name, '0' must be used.
So, it seemed strange to me to have it in the middle of the other
ones.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When a filesystem is mounted on a loop device and on a loop ioctl
LOOP_SET_STATUS64, because of kill_bdev, buffer_head mappings are getting
destroyed.
kill_bdev
truncate_inode_pages
truncate_inode_pages_range
do_invalidatepage
block_invalidatepage
discard_buffer -->clear BH_Mapped flag
sb_bread
__bread_gfp
bh = __getblk_gfp
-->discard_buffer clear BH_Mapped flag
__bread_slow
submit_bh
submit_bh_wbc
BUG_ON(!buffer_mapped(bh)) --> hit this BUG_ON
Fixes: 5db470e229 ("loop: drop caches if offset or block_size are changed")
Signed-off-by: Zheng Bin <zhengbin13@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Commit 130f4caf14 ("libata: Ensure ata_port probe has completed before
detach") may cause system freeze during suspend.
Using async_synchronize_full() in PM callbacks is wrong, since async
callbacks that are already scheduled may wait for not-yet-scheduled
callbacks, causes a circular dependency.
Instead of using big hammer like async_synchronize_full(), use async
cookie to make sure port probe are synced, without affecting other
scheduled PM callbacks.
Fixes: 130f4caf14 ("libata: Ensure ata_port probe has completed before detach")
Suggested-by: John Garry <john.garry@huawei.com>
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: John Garry <john.garry@huawei.com>
BugLink: https://bugs.launchpad.net/bugs/1867983
Signed-off-by: Jens Axboe <axboe@kernel.dk>
There is a specific API to treat raw data as UUID, i.e. import_uuid().
Use it instead of uuid_copy() with explicit casting.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In io_read() or io_write(), when io request is submitted successfully,
it'll go through the below sequence:
kfree(iovec);
req->flags &= ~REQ_F_NEED_CLEANUP;
return ret;
But clearing REQ_F_NEED_CLEANUP might be unsafe. The io request may
already have been completed, and then io_complete_rw_iopoll()
and io_complete_rw() will be called, both of which will also modify
req->flags if needed. This causes a race condition, with concurrent
non-atomic modification of req->flags.
To eliminate this race, in io_read() or io_write(), if io request is
submitted successfully, we don't remove REQ_F_NEED_CLEANUP flag. If
REQ_F_NEED_CLEANUP is set, we'll leave __io_req_aux_free() to the
iovec cleanup work correspondingly.
Cc: stable@vger.kernel.org
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When build perf with ASan or UBSan, if libasan or libubsan can not find,
the feature-glibc is 0 and there exists the following error log which is
wrong, because we can find gnu/libc-version.h in /usr/include,
glibc-devel is also installed.
[yangtiezhu@linux perf]$ make DEBUG=1 EXTRA_CFLAGS='-fno-omit-frame-pointer -fsanitize=address'
BUILD: Doing 'make -j4' parallel build
HOSTCC fixdep.o
HOSTLD fixdep-in.o
LINK fixdep
<stdin>:1:0: warning: -fsanitize=address and -fsanitize=kernel-address are not supported for this target
<stdin>:1:0: warning: -fsanitize=address not supported for this target
Auto-detecting system features:
... dwarf: [ OFF ]
... dwarf_getlocations: [ OFF ]
... glibc: [ OFF ]
... gtk2: [ OFF ]
... libaudit: [ OFF ]
... libbfd: [ OFF ]
... libcap: [ OFF ]
... libelf: [ OFF ]
... libnuma: [ OFF ]
... numa_num_possible_cpus: [ OFF ]
... libperl: [ OFF ]
... libpython: [ OFF ]
... libcrypto: [ OFF ]
... libunwind: [ OFF ]
... libdw-dwarf-unwind: [ OFF ]
... zlib: [ OFF ]
... lzma: [ OFF ]
... get_cpuid: [ OFF ]
... bpf: [ OFF ]
... libaio: [ OFF ]
... libzstd: [ OFF ]
... disassembler-four-args: [ OFF ]
Makefile.config:393: *** No gnu/libc-version.h found, please install glibc-dev[el]. Stop.
Makefile.perf:224: recipe for target 'sub-make' failed
make[1]: *** [sub-make] Error 2
Makefile:69: recipe for target 'all' failed
make: *** [all] Error 2
[yangtiezhu@linux perf]$ ls /usr/include/gnu/libc-version.h
/usr/include/gnu/libc-version.h
After install libasan and libubsan, the feature-glibc is 1 and the build
process is success, so the cause is related with libasan or libubsan, we
should check them and print an error log to reflect the reality.
Committer testing:
$ rm -rf /tmp/build/perf ; mkdir -p /tmp/build/perf
$ make DEBUG=1 EXTRA_CFLAGS='-fno-omit-frame-pointer -fsanitize=address' O=/tmp/build/perf -C tools/perf/ install-bin
make: Entering directory '/home/acme/git/perf/tools/perf'
BUILD: Doing 'make -j12' parallel build
HOSTCC /tmp/build/perf/fixdep.o
HOSTLD /tmp/build/perf/fixdep-in.o
LINK /tmp/build/perf/fixdep
Auto-detecting system features:
... dwarf: [ OFF ]
... dwarf_getlocations: [ OFF ]
... glibc: [ OFF ]
... gtk2: [ OFF ]
... libbfd: [ OFF ]
... libcap: [ OFF ]
... libelf: [ OFF ]
... libnuma: [ OFF ]
... numa_num_possible_cpus: [ OFF ]
... libperl: [ OFF ]
... libpython: [ OFF ]
... libcrypto: [ OFF ]
... libunwind: [ OFF ]
... libdw-dwarf-unwind: [ OFF ]
... zlib: [ OFF ]
... lzma: [ OFF ]
... get_cpuid: [ OFF ]
... bpf: [ OFF ]
... libaio: [ OFF ]
... libzstd: [ OFF ]
... disassembler-four-args: [ OFF ]
Makefile.config:401: *** No libasan found, please install libasan. Stop.
make[1]: *** [Makefile.perf:231: sub-make] Error 2
make: *** [Makefile:70: all] Error 2
make: Leaving directory '/home/acme/git/perf/tools/perf'
$
$
$ sudo dnf install libasan
<SNIP>
Installed:
libasan-9.3.1-2.fc31.x86_64
$
$
$ make DEBUG=1 EXTRA_CFLAGS='-fno-omit-frame-pointer -fsanitize=address' O=/tmp/build/perf -C tools/perf/ install-bin
make: Entering directory '/home/acme/git/perf/tools/perf'
BUILD: Doing 'make -j12' parallel build
Auto-detecting system features:
... dwarf: [ on ]
... dwarf_getlocations: [ on ]
... glibc: [ on ]
... gtk2: [ on ]
... libbfd: [ on ]
... libcap: [ on ]
... libelf: [ on ]
... libnuma: [ on ]
... numa_num_possible_cpus: [ on ]
... libperl: [ on ]
... libpython: [ on ]
... libcrypto: [ on ]
... libunwind: [ on ]
... libdw-dwarf-unwind: [ on ]
... zlib: [ on ]
... lzma: [ on ]
... get_cpuid: [ on ]
... bpf: [ on ]
... libaio: [ on ]
... libzstd: [ on ]
... disassembler-four-args: [ on ]
<SNIP>
CC /tmp/build/perf/util/pmu-flex.o
FLEX /tmp/build/perf/util/expr-flex.c
CC /tmp/build/perf/util/expr-bison.o
CC /tmp/build/perf/util/expr.o
CC /tmp/build/perf/util/expr-flex.o
CC /tmp/build/perf/util/parse-events-flex.o
CC /tmp/build/perf/util/parse-events.o
LD /tmp/build/perf/util/intel-pt-decoder/perf-in.o
LD /tmp/build/perf/util/perf-in.o
LD /tmp/build/perf/perf-in.o
LINK /tmp/build/perf/perf
<SNIP>
INSTALL python-scripts
INSTALL perf_completion-script
INSTALL perf-tip
make: Leaving directory '/home/acme/git/perf/tools/perf'
$ ldd ~/bin/perf | grep asan
libasan.so.5 => /lib64/libasan.so.5 (0x00007f0904164000)
$
And if we rebuild without -fsanitize-address:
$ rm -rf /tmp/build/perf ; mkdir -p /tmp/build/perf
$ make O=/tmp/build/perf -C tools/perf/ install-bin
make: Entering directory '/home/acme/git/perf/tools/perf'
BUILD: Doing 'make -j12' parallel build
HOSTCC /tmp/build/perf/fixdep.o
HOSTLD /tmp/build/perf/fixdep-in.o
LINK /tmp/build/perf/fixdep
Auto-detecting system features:
... dwarf: [ on ]
... dwarf_getlocations: [ on ]
... glibc: [ on ]
... gtk2: [ on ]
... libbfd: [ on ]
... libcap: [ on ]
... libelf: [ on ]
... libnuma: [ on ]
... numa_num_possible_cpus: [ on ]
... libperl: [ on ]
... libpython: [ on ]
... libcrypto: [ on ]
... libunwind: [ on ]
... libdw-dwarf-unwind: [ on ]
... zlib: [ on ]
... lzma: [ on ]
... get_cpuid: [ on ]
... bpf: [ on ]
... libaio: [ on ]
... libzstd: [ on ]
... disassembler-four-args: [ on ]
GEN /tmp/build/perf/common-cmds.h
CC /tmp/build/perf/exec-cmd.o
<SNIP>
INSTALL perf_completion-script
INSTALL perf-tip
make: Leaving directory '/home/acme/git/perf/tools/perf'
$ ldd ~/bin/perf | grep asan
$
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: tiezhu yang <yangtiezhu@loongson.cn>
Cc: xuefeng li <lixuefeng@loongson.cn>
Link: http://lore.kernel.org/lkml/1592445961-28044-1-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Unprivileged memory accesses generated by the so-called "translated"
instructions (e.g. STTR) at EL1 can cause EL0 watchpoints to fire
unexpectedly if kernel debugging is enabled. In such cases, the
hw_breakpoint logic will invoke the user overflow handler which will
typically raise a SIGTRAP back to the current task. This is futile when
returning back to the kernel because (a) the signal won't have been
delivered and (b) userspace can't handle the thing anyway.
Avoid invoking the user overflow handler for watchpoints triggered by
kernel uaccess routines, and instead single-step over the faulting
instruction as we would if no overflow handler had been installed.
(Fixes tag identifies the introduction of unprivileged memory accesses,
which exposed this latent bug in the hw_breakpoint code)
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Fixes: 57f4959bad ("arm64: kernel: Add support for User Access Override")
Reported-by: Luis Machado <luis.machado@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
There is an issue when tune the number for read and write queues,
if the total queue count was not changed. The hctx->type cannot
be updated, since __blk_mq_update_nr_hw_queues will return directly
if the total queue count has not been changed.
Reproduce:
dmesg | grep "default/read/poll"
[ 2.607459] nvme nvme0: 48/0/0 default/read/poll queues
cat /sys/kernel/debug/block/nvme0n1/hctx*/type | sort | uniq -c
48 default
tune the write queues to 24:
echo 24 > /sys/module/nvme/parameters/write_queues
echo 1 > /sys/block/nvme0n1/device/reset_controller
dmesg | grep "default/read/poll"
[ 433.547235] nvme nvme0: 24/24/0 default/read/poll queues
cat /sys/kernel/debug/block/nvme0n1/hctx*/type | sort | uniq -c
48 default
The driver's hardware queue mapping is not same as block layer.
Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add rename the gpu busy percentage for consistency and
add the mem busy percentage documentation.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The existing code used the major version number of the DRM driver
instead of the device major number of the DRM subsystem for
validating access for a devices cgroup.
This meant that accesses allowed by the devices cgroup weren't
permitted and certain accesses denied by the devices cgroup were
permitted (if they matched the wrong major device number).
Signed-off-by: Lorenz Brun <lorenz@brun.one>
Fixes: 6b855f7b83 ("drm/amdkfd: Check against device cgroup")
Reviewed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
If we're doing polled IO and end up having requests being submitted
async, then completions can come in while we're waiting for refs to
drop. We need to reap these manually, as nobody else will be looking
for them.
Break the wait into 1/20th of a second time waits, and check for done
poll completions if we time out. Otherwise we can have done poll
completions sitting in ctx->poll_list, which needs us to reap them but
we're just waiting for them.
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If we're unlucky with timing, we could be running task_work after
having dropped the memory context in the sq thread. Since dropping
the context requires a runnable task state, we cannot reliably drop
it as part of our check-for-work loop in io_sq_thread(). Instead,
abstract out the mm acquire for the sq thread into a helper, and call
it from the async task work handler.
Cc: stable@vger.kernel.org # v5.7
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In io_complete_rw_iopoll(), stores to io_kiocb's result and iopoll
completed are two independent store operations, to ensure that once
iopoll_completed is ture and then req->result must been perceived by
the cpu executing io_do_iopoll(), proper memory barrier should be used.
And in io_do_iopoll(), we check whether req->result is EAGAIN, if it is,
we'll need to issue this io request using io-wq again. In order to just
issue a single smp_rmb() on the completion side, move the re-submit work
to io_iopoll_complete().
Cc: stable@vger.kernel.org
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
[axboe: don't set ->iopoll_completed for -EAGAIN retry]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In IOPOLL mode, for EAGAIN error, we'll try to submit io request
again using io-wq, so don't fail rest of links if this io request
has links.
Cc: stable@vger.kernel.org
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull dma-mapping fixes from Christoph Hellwig:
"Fixes for the SEV atomic pool (Geert Uytterhoeven and David Rientjes)"
* tag 'dma-mapping-5.8-3' of git://git.infradead.org/users/hch/dma-mapping:
dma-pool: decouple DMA_REMAP from DMA_COHERENT_POOL
dma-pool: fix too large DMA pools on medium memory size systems
To pick the changes from:
b383a73f2b ("fs/ext4: Introduce DAX inode flag")
And silence this perf build warning:
Warning: Kernel ABI header at 'tools/include/uapi/linux/fs.h' differs from latest version at 'include/uapi/linux/fs.h'
diff -u tools/include/uapi/linux/fs.h include/uapi/linux/fs.h
It causes various beautifiers for things like fspick, fsmount, etc (see
below) to get rebuilt, but this specific change doesn't make 'perf
trace' be capable of decoding anything new, as we still don't decode
what comes from ioctls, just its cmds.
Details about the update:
$ cp include/uapi/linux/fs.h tools/include/uapi/linux/fs.h
$ git diff
diff --git a/tools/include/uapi/linux/fs.h b/tools/include/uapi/linux/fs.h
index 379a612f8f1d..f44eb0a04afd 100644
--- a/tools/include/uapi/linux/fs.h
+++ b/tools/include/uapi/linux/fs.h
@@ -262,6 +262,7 @@ struct fsxattr {
#define FS_EA_INODE_FL 0x00200000 /* Inode used for large EA */
#define FS_EOFBLOCKS_FL 0x00400000 /* Reserved for ext4 */
#define FS_NOCOW_FL 0x00800000 /* Do not cow file */
+#define FS_DAX_FL 0x02000000 /* Inode is DAX */
#define FS_INLINE_DATA_FL 0x10000000 /* Reserved for ext4 */
#define FS_PROJINHERIT_FL 0x20000000 /* Create with parents projid */
#define FS_CASEFOLD_FL 0x40000000 /* Folder is case insensitive */
$ m
make: Entering directory '/home/acme/git/perf/tools/perf'
BUILD: Doing 'make -j8' parallel build
INSTALL GTK UI
CC /tmp/build/perf/builtin-trace.o
DESCEND plugins
CC /tmp/build/perf/trace/beauty/fsmount.o
CC /tmp/build/perf/trace/beauty/fspick.o
CC /tmp/build/perf/trace/beauty/mount_flags.o
CC /tmp/build/perf/trace/beauty/move_mount.o
CC /tmp/build/perf/trace/beauty/renameat.o
CC /tmp/build/perf/trace/beauty/sync_file_range.o
INSTALL trace_plugins
LD /tmp/build/perf/trace/beauty/perf-in.o
LD /tmp/build/perf/perf-in.o
LINK /tmp/build/perf/perf
<SNIP>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick up the changes in:
7e5b3c267d ("x86/speculation: Add Special Register Buffer Data Sampling (SRBDS) mitigation")
Addressing these tools/perf build warnings:
Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h'
diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h'
diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h
With this one will be able to use these new AMD MSRs in filters, by
name, e.g.:
# perf trace -e msr:* --filter "msr==IA32_MCU_OPT_CTRL"
^C#
Using -v we can see how it sets up the tracepoint filters, converting
from the string in the filter to the numeric value:
# perf trace -v -e msr:* --filter "msr==IA32_MCU_OPT_CTRL"
Using CPUID GenuineIntel-6-8E-A
0x123
New filter for msr:read_msr: (msr==0x123) && (common_pid != 335 && common_pid != 30344)
0x123
New filter for msr:write_msr: (msr==0x123) && (common_pid != 335 && common_pid != 30344)
0x123
New filter for msr:rdpmc: (msr==0x123) && (common_pid != 335 && common_pid != 30344)
mmap size 528384B
^C#
The updating process shows how this affects tooling in more detail:
$ diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
--- tools/arch/x86/include/asm/msr-index.h 2020-06-03 10:36:09.959910238 -0300
+++ arch/x86/include/asm/msr-index.h 2020-06-17 10:04:20.235052901 -0300
@@ -128,6 +128,10 @@
#define TSX_CTRL_RTM_DISABLE BIT(0) /* Disable RTM feature */
#define TSX_CTRL_CPUID_CLEAR BIT(1) /* Disable TSX enumeration */
+/* SRBDS support */
+#define MSR_IA32_MCU_OPT_CTRL 0x00000123
+#define RNGDS_MITG_DIS BIT(0)
+
#define MSR_IA32_SYSENTER_CS 0x00000174
#define MSR_IA32_SYSENTER_ESP 0x00000175
#define MSR_IA32_SYSENTER_EIP 0x00000176
$ set -o vi
$ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before
$ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
$ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after
$ diff -u before after
--- before 2020-06-17 10:05:49.653114752 -0300
+++ after 2020-06-17 10:06:01.777258731 -0300
@@ -51,6 +51,7 @@
[0x0000011e] = "IA32_BBL_CR_CTL3",
[0x00000120] = "IDT_MCR_CTRL",
[0x00000122] = "IA32_TSX_CTRL",
+ [0x00000123] = "IA32_MCU_OPT_CTRL",
[0x00000140] = "MISC_FEATURES_ENABLES",
[0x00000174] = "IA32_SYSENTER_CS",
[0x00000175] = "IA32_SYSENTER_ESP",
$
The related change to cpu-features.h affects this:
CC /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
CC /tmp/build/perf/bench/mem-memset-x86-64-asm.o
This shouldn't be affecting that 'perf bench' entry:
$ find tools/perf/ -type f | xargs grep SRBDS
$
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Gross <mgross@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To get some newer headers that got out of sync with the copies in tools/
so that we can try to have the tools/perf/ build clean for v5.8 with
fewer pull requests.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fixes segmentation fault when trying to interpret zstd-compressed data
with perf script:
```
$ perf record -z ls
...
[ perf record: Captured and wrote 0,010 MB perf.data, compressed (original 0,001 MB, ratio is 2,190) ]
$ memcheck perf script
...
==67911== Invalid read of size 4
==67911== at 0x5568188: ZSTD_decompressStream (in /usr/lib/libzstd.so.1.4.5)
==67911== by 0x6E726B: zstd_decompress_stream (zstd.c:100)
==67911== by 0x65729C: perf_session__process_compressed_event (session.c:72)
==67911== by 0x6598E8: perf_session__process_user_event (session.c:1583)
==67911== by 0x65BA59: reader__process_events (session.c:2177)
==67911== by 0x65BA59: __perf_session__process_events (session.c:2234)
==67911== by 0x65BA59: perf_session__process_events (session.c:2267)
==67911== by 0x5A7397: __cmd_script (builtin-script.c:2447)
==67911== by 0x5A7397: cmd_script (builtin-script.c:3840)
==67911== by 0x5FE9D2: run_builtin (perf.c:312)
==67911== by 0x711627: handle_internal_command (perf.c:364)
==67911== by 0x711627: run_argv (perf.c:408)
==67911== by 0x711627: main (perf.c:538)
==67911== Address 0x71d8 is not stack'd, malloc'd or (recently) free'd
```
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Acked-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
LPU-Reference: 20200612230333.72140-1-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Mostly for historical reasons, q->blk_trace is assigned through xchg()
and cmpxchg() atomic operations. Although this is correct, sparse
complains about this because it violates rcu annotations since commit
c780e86dd4 ("blktrace: Protect q->blk_trace with RCU") which started
to use rcu for accessing q->blk_trace. Furthermore there's no real need
for atomic operations anymore since all changes to q->blk_trace happen
under q->blk_trace_mutex and since it also makes more sense to check if
q->blk_trace is set with the mutex held earlier.
So let's just replace xchg() with rcu_replace_pointer() and cmpxchg()
with explicit check and rcu_assign_pointer(). This makes the code more
efficient and sparse happy.
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We use one blktrace per request_queue, that means one per the entire
disk. So we cannot run one blktrace on say /dev/vda and then /dev/vda1,
or just two calls on /dev/vda.
We check for concurrent setup only at the very end of the blktrace setup though.
If we try to run two concurrent blktraces on the same block device the
second one will fail, and the first one seems to go on. However when
one tries to kill the first one one will see things like this:
The kernel will show these:
```
debugfs: File 'dropped' in directory 'nvme1n1' already present!
debugfs: File 'msg' in directory 'nvme1n1' already present!
debugfs: File 'trace0' in directory 'nvme1n1' already present!
``
And userspace just sees this error message for the second call:
```
blktrace /dev/nvme1n1
BLKTRACESETUP(2) /dev/nvme1n1 failed: 5/Input/output error
```
The first userspace process #1 will also claim that the files
were taken underneath their nose as well. The files are taken
away form the first process given that when the second blktrace
fails, it will follow up with a BLKTRACESTOP and BLKTRACETEARDOWN.
This means that even if go-happy process #1 is waiting for blktrace
data, we *have* been asked to take teardown the blktrace.
This can easily be reproduced with break-blktrace [0] run_0005.sh test.
Just break out early if we know we're already going to fail, this will
prevent trying to create the files all over again, which we know still
exist.
[0] https://github.com/mcgrof/break-blktrace
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>