Although commit 9dd4b06544 ("drm/i915/gt: Move pm debug files into a
gt aware debugfs") says it was moving debug files to gt/, the
i915_frequency_info file was left behind and its implementation copied
into drivers/gpu/drm/i915/gt/debugfs_gt_pm.c. Over time we had several
patches having to change both places to keep them in sync (and some
patches failing to do so). The initial idea was to remove
i915_frequency_info, but there are user space tools using it. From a
quick code search there are other scripts and test tools besides igt, so
it's not simply updating igt to get rid of the older file.
Here we export a function using drm_printer as parameter and make
both show() implementations to call this same function. Aside from a few
variable name differences, for i915_frequency_info this brings a few
lines that were not previously printed: RP UP EI, RP UP THRESHOLD, RP
DOWN THRESHOLD and RP DOWN EI. These came in as part of
commit 9c878557b1 ("drm/i915/gt: Use the RPM config register to
determine clk frequencies"), which didn't change both places.
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210918025754.1254705-4-lucas.demarchi@intel.com
i915 will soon gain an eviction path that trylock a whole lot of locks
for eviction, getting dmesg failures like below:
BUG: MAX_LOCK_DEPTH too low!
turning off the locking correctness validator.
depth: 48 max: 48!
48 locks held by i915_selftest/5776:
#0: ffff888101a79240 (&dev->mutex){....}-{3:3}, at: __driver_attach+0x88/0x160
#1: ffffc900009778c0 (reservation_ww_class_acquire){+.+.}-{0:0}, at: i915_vma_pin.constprop.63+0x39/0x1b0 [i915]
#2: ffff88800cf74de8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_vma_pin.constprop.63+0x5f/0x1b0 [i915]
#3: ffff88810c7f9e38 (&vm->mutex/1){+.+.}-{3:3}, at: i915_vma_pin_ww+0x1c4/0x9d0 [i915]
#4: ffff88810bad5768 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_gem_evict_something+0x110/0x860 [i915]
#5: ffff88810bad60e8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_gem_evict_something+0x110/0x860 [i915]
...
#46: ffff88811964d768 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_gem_evict_something+0x110/0x860 [i915]
#47: ffff88811964e0e8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_gem_evict_something+0x110/0x860 [i915]
INFO: lockdep is turned off.
Fixing eviction to nest into ww_class_acquire is a high priority, but
it requires a rework of the entire driver, which can only be done one
step at a time.
As an intermediate solution, add an acquire context to
ww_mutex_trylock, which allows us to do proper nesting annotations on
the trylocks, making the above lockdep splat disappear.
This is also useful in regulator_lock_nested, which may avoid dropping
regulator_nesting_mutex in the uncontended path, so use it there.
TTM may be another user for this, where we could lock a buffer in a
fastpath with list locks held, without dropping all locks we hold.
[peterz: rework actual ww_mutex_trylock() implementations]
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/YUBGPdDDjKlxAuXJ@hirez.programming.kicks-ass.net
GPU wedged flag now set on driver unregister to prevent from further
using the GPU can be then cleared unintentionally when calling
__intel_gt_unset_wedged() still before the flag is finally marked
unrecoverable. We need to have it marked unrecoverable earlier.
Implement that by replacing a call to intel_gt_set_wedged() in
intel_gt_driver_unregister() with intel_gt_set_wedged_on_fini().
With the above in place, intel_gt_set_wedged_on_fini() is now called
twice on driver remove, second time from __intel_gt_disable(). This
seems harmless, while dropping intel_gt_set_wedged_on_fini() from
__intel_gt_disable() proved to break some driver probe error unwind
paths as well as mock selftest exit path.
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210903142837.216978-1-janusz.krzysztofik@linux.intel.com
Close the divergence which has caused patches not to apply and
have a solid baseline for the PXP patches that Rodrigo will send
a topic branch PR for.
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Clang warns:
drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c:127:13: warning:
variable 'err' is used uninitialized whenever 'if' condition is false
[-Wsometimes-uninitialized]
} else if (PTR_ERR(import) != -EOPNOTSUPP) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c:138:9: note:
uninitialized use occurs here
return err;
^~~
drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c:127:9: note: remove
the 'if' if its condition is always true
} else if (PTR_ERR(import) != -EOPNOTSUPP) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c:95:9: note:
initialize the variable 'err' to silence this warning
int err;
^
= 0
The test is expected to pass if i915_gem_prime_import() returns
-EOPNOTSUPP so initialize err to zero in this case.
Fixes: cdb35d1ed6 ("drm/i915/gem: Migrate to system at dma-buf attach time (v7)")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210824225427.2065517-3-nathan@kernel.org
Clang warns a couple of times:
drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c:63:6: warning:
variable 'import_obj' is used uninitialized whenever 'if' condition is
true [-Wsometimes-uninitialized]
if (import != &obj->base) {
^~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c:80:22: note:
uninitialized use occurs here
i915_gem_object_put(import_obj);
^~~~~~~~~~
drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c:63:2: note: remove
the 'if' if its condition is always false
if (import != &obj->base) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c:38:46: note:
initialize the variable 'import_obj' to silence this warning
struct drm_i915_gem_object *obj, *import_obj;
^
= NULL
Shuffle the import_obj initialization above these if statements so that
it is not used uninitialized.
Fixes: d7b2cb380b ("drm/i915/gem: Correct the locking and pin pattern for dma-buf (v8)")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210824225427.2065517-2-nathan@kernel.org
Rework and simplify the locking with GuC subission. Drop
sched_state_no_lock and move all fields under the guc_state.sched_state
and protect all these fields with guc_state.lock . This requires
changing the locking hierarchy from guc_state.lock -> sched_engine.lock
to sched_engine.lock -> guc_state.lock.
v2:
(Daniele)
- Don't check fields outside of lock during sched disable, check less
fields within lock as some of the outside are no longer needed
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-18-matthew.brost@intel.com
A subsequent patch will flip the locking hierarchy from
ce->guc_state.lock -> sched_engine->lock to sched_engine->lock ->
ce->guc_state.lock. As such we need to release the submit fence for a
request from an IRQ to break a lock inversion - i.e. the fence must be
release went holding ce->guc_state.lock and the releasing of the can
acquire sched_engine->lock.
v2:
(Daniele)
- Delete request from list before calling irq_work_queue
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-16-matthew.brost@intel.com
While debugging an issue with full GT resets I went down a rabbit hole
thinking the scrubbing of lost G2H wasn't working correctly. This proved
to be incorrect as this was working just fine but this chase inspired me
to write a selftest to prove that this works. This simple selftest
injects errors dropping various G2H and then issues a full GT reset
proving that the scrubbing of these G2H doesn't blow up.
v2:
(Daniel Vetter)
- Use ifdef instead of macros for selftests
v3:
(Checkpatch)
- A space after 'switch' statement
v4:
(Daniele)
- A comment saying GT won't idle if G2H are lost
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-12-matthew.brost@intel.com
When the GuC does a media reset, it copies a golden context state back
into the corrupted context's state. The address of the golden context
and the size of the engine state restore are passed in via the GuC ADS.
The i915 had a bug where it passed in the whole size of the golden
context, not the size of the engine state to restore resulting in a
memory corruption.
Also copy the entire golden context on init rather than just the engine
state that is restored.
v2 (Daniele): use defines to avoid duplicated const variables (John).
Fixes: 481d458cae ("drm/i915/guc: Add golden context to GuC ADS")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-11-matthew.brost@intel.com
Propagating errors to dependent fences is broken and can lead to errors
from one client ending up in another. In commit 3761baae90 ("Revert
"drm/i915: Propagate errors on awaiting already signaled fences""), we
attempted to get rid of fence error propagation but missed the case
added in commit 8e9f84cf5c ("drm/i915/gt: Propagate change in error
status to children on unhold"). Revert that one too. This error was
found by an up-and-coming selftest which triggers a reset during
request cancellation and verifies that subsequent requests complete
successfully.
v2:
(Daniel Vetter)
- Use revert
v3:
(Jason)
- Update commit message
v4 (Daniele):
- fix checkpatch error in commit message.
References: '3761baae908a ("Revert "drm/i915: Propagate errors on awaiting already signaled fences"")'
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-8-matthew.brost@intel.com
If the context is reset as a result of the request cancellation the
context reset G2H is received after schedule disable done G2H which is
the wrong order. The schedule disable done G2H release the waiting
request cancellation code which resubmits the context. This races
with the context reset G2H which also wants to resubmit the context but
in this case it really should be a NOP as request cancellation code owns
the resubmit. Use some clever tricks of checking the context state to
seal this race until the GuC firmware is fixed.
v2:
(Checkpatch)
- Fix typos
v3:
(Daniele)
- State that is a bug in the GuC firmware
Fixes: 62eaf0ae21 ("drm/i915/guc: Support request cancellation")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-7-matthew.brost@intel.com
A small race that could result in incorrect accounting of the number
of outstanding G2H. Basically prior to this patch we did not increment
the number of outstanding G2H if we encoutered a GT reset while sending
a H2G. This was incorrect as the context state had already been updated
to anticipate a G2H response thus the counter should be incremented.
As part of this change we remove a legacy (now unused) path that was the
last caller requiring a G2H response that was not guaranteed to loop.
This allows us to simplify the accounting as we don't need to handle the
case where the send fails due to the channel being busy.
Also always use helper when decrementing this value.
v2 (Daniele): update GEM_BUG_ON check, pull in dead code removal from
later patch, remove loop param from context_deregister.
Fixes: f4eb1f3fe9 ("drm/i915/guc: Ensure G2H response has space in buffer")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-3-matthew.brost@intel.com
Pull more perf tools updates from Arnaldo Carvalho de Melo:
- Add missing fields and remove some duplicate fields when printing a
perf_event_attr.
- Fix hybrid config terms list corruption.
- Update kernel header copies, some resulted in new kernel features
being automagically added to 'perf trace' syscall/tracepoint argument
id->string translators.
- Add a file generated during the documentation build to .gitignore.
- Add an option to build without libbfd, as some distros, like Debian
consider its ABI unstable.
- Add support to print a textual representation of IBS raw sample data
in 'perf report'.
- Fix bpf 'perf test' sample mismatch reporting
- Fix passing arguments to stackcollapse report in a 'perf script'
python script.
- Allow build-id with trailing zeros.
- Look for ImageBase in PE file to compute .text offset.
* tag 'perf-tools-for-v5.15-2021-09-11' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (25 commits)
tools headers UAPI: Update tools's copy of drm.h headers
tools headers UAPI: Sync drm/i915_drm.h with the kernel sources
tools headers UAPI: Sync linux/fs.h with the kernel sources
tools headers UAPI: Sync linux/in.h copy with the kernel sources
perf tools: Add an option to build without libbfd
perf tools: Allow build-id with trailing zeros
perf tools: Fix hybrid config terms list corruption
perf tools: Factor out copy_config_terms() and free_config_terms()
perf tools: Fix perf_event_attr__fprintf() missing/dupl. fields
perf tools: Ignore Documentation dependency file
perf bpf: Provide a weak btf__load_from_kernel_by_id() for older libbpf versions
tools include UAPI: Update linux/mount.h copy
perf beauty: Cover more flags in the move_mount syscall argument beautifier
tools headers UAPI: Sync linux/prctl.h with the kernel sources
tools include UAPI: Sync sound/asound.h copy with the kernel sources
tools headers UAPI: Sync linux/kvm.h with the kernel sources
tools headers UAPI: Sync x86's asm/kvm.h with the kernel sources
perf report: Add support to print a textual representation of IBS raw sample data
perf report: Add tools/arch/x86/include/asm/amd-ibs.h
perf env: Add perf_env__cpuid, perf_env__{nr_}pmu_mappings
...
Pull compiler attributes updates from Miguel Ojeda:
- Fix __has_attribute(__no_sanitize_coverage__) for GCC 4 (Marco Elver)
- Add Nick as Reviewer for compiler_attributes.h (Nick Desaulniers)
- Move __compiletime_{error|warning} (Nick Desaulniers)
* tag 'compiler-attributes-for-linus-v5.15-rc1-v2' of git://github.com/ojeda/linux:
compiler_attributes.h: move __compiletime_{error|warning}
MAINTAINERS: add Nick as Reviewer for compiler_attributes.h
Compiler Attributes: fix __has_attribute(__no_sanitize_coverage__) for GCC 4
Pull auxdisplay updates from Miguel Ojeda:
"An assortment of improvements for auxdisplay:
- Replace symbolic permissions with octal permissions (Jinchao Wang)
- ks0108: Switch to use module_parport_driver() (Andy Shevchenko)
- charlcd: Drop unneeded initializers and switch to C99 style (Andy
Shevchenko)
- hd44780: Fix oops on module unloading (Lars Poeschel)
- Add I2C gpio expander example (Ralf Schlatterbeck)"
* tag 'auxdisplay-for-linus-v5.15-rc1' of git://github.com/ojeda/linux:
auxdisplay: Replace symbolic permissions with octal permissions
auxdisplay: ks0108: Switch to use module_parport_driver()
auxdisplay: charlcd: Drop unneeded initializers and switch to C99 style
auxdisplay: hd44780: Fix oops on module unloading
auxdisplay: Add I2C gpio expander example
Pull CPU hotplug updates from Thomas Gleixner:
"Updates for the SMP and CPU hotplug:
- Remove DEFINE_SMP_CALL_CACHE_FUNCTION() which is a left over of the
original hotplug code and now causing trouble with the ARM64 cache
topology setup due to the pointless SMP function call.
It's not longer required as the hotplug callbacks are guaranteed to
be invoked on the upcoming CPU.
- Remove the deprecated and now unused CPU hotplug functions
- Rewrite the CPU hotplug API documentation"
* tag 'smp-urgent-2021-09-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Documentation: core-api/cpuhotplug: Rewrite the API section
cpu/hotplug: Remove deprecated CPU-hotplug functions.
thermal: Replace deprecated CPU-hotplug functions.
drivers: base: cacheinfo: Get rid of DEFINE_SMP_CALL_CACHE_FUNCTION()
Pull misc driver fix from Greg KH:
"Here is a single patch for 5.15-rc1, for the lkdtm misc driver.
It resolves a build issue that many people were hitting with your
current tree, and Kees and others felt would be good to get merged
before -rc1 comes out, to prevent them from having to constantly hit
it as many development trees restart on -rc1, not older -rc releases.
It has NOT been in linux-next, but has passed 0-day testing and looks
'obviously correct' when reviewing it locally :)"
* tag 'char-misc-5.15-rc1-lkdtm' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
lkdtm: Use init_uts_ns.name instead of macros
Pull IPMI updates from Corey Minyard:
"A couple of very minor fixes for style and rate limiting.
Nothing big, but probably needs to go in"
* tag 'for-linus-5.15-1' of git://github.com/cminyard/linux-ipmi:
char: ipmi: use DEVICE_ATTR helper macro
ipmi: rate limit ipmi smi_event failure message
Pull scheduler fixes from Borislav Petkov:
- Make sure the idle timer expires in hardirq context, on PREEMPT_RT
- Make sure the run-queue balance callback is invoked only on the
outgoing CPU
* tag 'sched_urgent_for_v5.15_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Prevent balance_push() on remote runqueues
sched/idle: Make the idle timer expire in hard interrupt context
Pull locking fixes from Borislav Petkov:
- Fix the futex PI requeue machinery to not return to userspace in
inconsistent state
- Avoid a potential null pointer dereference in the ww_mutex deadlock
check
- Other smaller cleanups and optimizations
* tag 'locking_urgent_for_v5.15_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/rtmutex: Fix ww_mutex deadlock check
futex: Remove unused variable 'vpid' in futex_proxy_trylock_atomic()
futex: Avoid redundant task lookup
futex: Clarify comment for requeue_pi_wake_futex()
futex: Prevent inconsistent state and exit race
futex: Return error code instead of assigning it without effect
locking/rwsem: Add missing __init_rwsem() for PREEMPT_RT