This patch re-introduces support for GuC v69 in parallel to v70. As this
is a quick fix, v69 has been re-introduced as the single "fallback" guc
version in case v70 is not available on disk and only for platforms that
are out of force_probe and require the GuC by default. All v69 specific
code has been labeled as such for easy identification, and the same was
done for all v70 functions for which there is a separate v69 version,
to avoid accidentally calling the wrong version via the unlabeled name.
When the fallback mode kicks in, a drm_notice message is printed in
dmesg to inform the user of the required update. The existing
logging of the fetch function has also been updated so that we no
longer complain immediately if we can't find a fw and we only throw an
error if the fetch of both the base and fallback blobs fails.
The plan is to follow this up with a more complex rework to allow for
multiple different GuC versions to be supported at the same time.
v2: reduce the fallback to platform that require it, switch to
firmware_request_nowarn(), improve logs.
Fixes: 2584b3549f ("drm/i915/guc: Update to GuC version 70.1.1")
Link: https://lists.freedesktop.org/archives/intel-gfx/2022-July/301640.html
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Dave Airlie <airlied@gmail.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220718230732.1409641-1-daniele.ceraolospurio@intel.com
When resuming after hibernate sometimes we see hangs in unrelated kernel
subsystems. These hangs often result in the following i915 trace:
i915 0000:00:02.0: [drm] *ERROR* \
intel_gt_reset_global timed out, cancelling all in-flight rendering
implying our reset task has been starved by the hanging kernel subsystem,
causing us to inappropiately declare the system as wedged beyond recovery.
The trace would be caused by our synchronize_srcu_expedited() taking more
than the allowed 5s due to the unrelated kernel hang. But we neither need
to perform that synchronisation inside the reset watchdog, nor do we need
such a short timeout before declaring the device as unrecoverable.
v2: Restore watchdog timeout to the previous 5 seconds (Ashutosh)
Bug: https://gitlab.freedesktop.org/drm/intel/-/issues/3575
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220630043959.5708-1-ashutosh.dixit@intel.com
For testing purposes, support forcing the lmem_bar_size through a new
modparam. In CI we only have a limited number of configurations for DG2,
but we still need to be reasonably sure we get a usable device (also
verifying we report the correct values for things like
probed_cpu_visible_size etc) with all the potential lmem_bar sizes that
we might expect see in the wild.
v2: Update commit message and a minor modification.(Matt)
v3: Optimised lmem bar size code and modified code to resize
bar maximum upto lmem_size instead of maximum supported size.(Nirmoy)
v4: Optimised lmem bar size code.(Nirmoy)
Signed-off-by: Priyanka Dandamudi <priyanka.dandamudi@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220713130209.2573233-3-priyanka.dandamudi@intel.com
One impact of commit 047a1b877e ("dma-buf & drm/amdgpu: remove
dma_resv workaround") is that it stores many, many more fences. Whereas
adding an exclusive fence used to remove the shared fence list, that
list is now preserved and the write fences included into the list. Not
just a single write fence, but now a write/read fence per context. That
causes us to have to track more fences than before (albeit half of those
are redundant), and we trigger more interrupts for multi-engine
workloads.
As part of reducing the impact from handling more signaling, we observe
we only need to kick the signal worker after adding a fence iff we have
good cause to believe that there is work to be done in processing the
fence i.e. we either need to enable the interrupt or the request is
already complete but we don't know if we saw the interrupt and so need
to check signaling.
References: 047a1b877e ("dma-buf & drm/amdgpu: remove dma_resv workaround")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Karolina Drobnik <karolina.drobnik@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/d7b953c7a4ba747c8196a164e2f8c5aef468d048.1657289332.git.karolina.drobnik@intel.com
We employ a "waitboost" heuristic to detect when userspace is stalled
waiting for results from earlier execution. Under latency sensitive work
mixed between the gpu/cpu, the GPU is typically under-utilised and so
RPS sees that low utilisation as a reason to downclock the frequency,
causing longer stalls and lower throughput. The user left waiting for
the results is not impressed.
On applying commit 047a1b877e ("dma-buf & drm/amdgpu: remove dma_resv
workaround") it was observed that deinterlacing h264 on Haswell
performance dropped by 2-5x. The reason being that the natural workload
was not intense enough to trigger RPS (using HW evaluation intervals) to
upclock, and so it was depending on waitboosting for the throughput.
Commit 047a1b877e ("dma-buf & drm/amdgpu: remove dma_resv workaround")
changes the composition of dma-resv from keeping a single write fence +
multiple read fences, to a single array of multiple write and read
fences (a maximum of one pair of write/read fences per context). The
iteration order was also changed implicitly from all-read fences then
the single write fence, to a mix of write fences followed by read
fences. It is that ordering change that belied the fragility of
waitboosting.
Currently, a waitboost is inspected at the point of waiting on an
outstanding fence. If the GPU is backlogged such that we haven't yet
stated the request we need to wait on, we force the GPU to upclock until
the completion of that request. By changing the order in which we waited
upon requests, we ended up waiting on those requests in sequence and as
such we saw that each request was already started and so not a suitable
candidate for waitboosting.
Instead of asking whether to boost each fence in turn, we can look at
whether boosting is required for the dma-resv ensemble prior to waiting
on any fence, making the heuristic more robust to the order in which
fences are stored in the dma-resv.
Reported-by: Thomas Voegtle <tv@lio96.de>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6284
Fixes: 047a1b877e ("dma-buf & drm/amdgpu: remove dma_resv workaround")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Karolina Drobnik <karolina.drobnik@intel.com>
Tested-by: Thomas Voegtle <tv@lio96.de>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/07e05518d9f6620d20cc1101ec1849203fe973f9.1657289332.git.karolina.drobnik@intel.com
Although all DSS belong to a single pool on Xe_HP platforms (i.e.,
they're not organized into slices from a topology point of view), we do
still need to pass 'group' and 'instance' targets when steering register
accesses to a specific instance of a per-DSS multicast register. The
rules for how to determine group and instance IDs (which previously used
legacy terms "slice" and "subslice") varies by platform. Some platforms
determine steering by gslice membership, some platforms by cslice
membership, and future platforms may have other rules.
Since looping over each DSS and performing steered unicast register
accesses is a relatively common pattern, let's add a dedicated iteration
macro to handle this (and replace the platform-specific "instdone" loop
we were using previously. This will avoid the calling code needing to
figure out the details about how to obtain steering IDs for a specific
DSS.
Most of the places where we use this new loop are in the GPU errorstate
code at the moment, but we do have some additional features coming in
the future that will also need to loop over each DSS and steer some
register accesses accordingly.
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220701232006.1016135-1-matthew.d.roper@intel.com
The global sseu config is applicable only to gen11 platforms where
concurrent media, render and OA use cases may cause some subslices to be
turned off and hence lose NOA configuration. Ideally we want to return
ENODEV for non-gen11 platforms, however, this has shipped with gfx12, so
disable only for gfx12.50+.
v2: gfx12 is already shipped with this, disable for gfx12.50+ (Lionel)
v3: (Matt)
- Update commit message and replace "12.5" with "12.50"
- Replace DRM_DEBUG() with driver specific drm_dbg()
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220707193002.2859653-2-umesh.nerlige.ramappa@intel.com
Both error-capture and relay-logging mechanism use the GuC
log infrastructure. That means the KMD must send a log flush
complete notification back to GuC after reading the data out.
This call is currently being sent synchronously.
However, synchronous H2Gs cause problems when the system is
backed up. There is no need for this to be synchronous. The
KMD wasn't even looking at the return status from it. So make
it asynchronous and then there is no issue about time outs.
Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220607002314.1451656-2-alan.previn.teres.alexis@intel.com
VM_BIND and related uapi definitions
v2: Reduce the scope to simple Mesa use case.
v3: Expand VM_UNBIND documentation and add
I915_GEM_VM_BIND/UNBIND_FENCE_VALID
and I915_GEM_VM_BIND_TLB_FLUSH flags.
v4: Remove I915_GEM_VM_BIND_TLB_FLUSH flag and add additional
documentation for vm_bind/unbind.
v5: Remove TLB flush requirement on VM_UNBIND.
Add version support to stage implementation.
v6: Define and use drm_i915_gem_timeline_fence structure for
all timeline fences.
v7: Rename I915_PARAM_HAS_VM_BIND to I915_PARAM_VM_BIND_VERSION.
Update documentation on async vm_bind/unbind and versioning.
Remove redundant vm_bind/unbind FENCE_VALID flag, execbuf3
batch_count field and I915_EXEC3_SECURE flag.
v8: Remove I915_GEM_VM_BIND_READONLY and minor documentation
updates.
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220701003110.24843-4-niranjana.vishwanathapura@intel.com
If the move or clear operation somehow fails, and the memory underneath
is not cleared, like when moving to lmem, then we currently fallback to
memcpy or memset. However with small-BAR systems this fallback might no
longer be possible. For now we use the set_wedged sledgehammer if we
ever encounter such a scenario, and mark the object as borked to plug
any holes where access to the memory underneath can happen. Add some
basic selftests to exercise this.
v2:
- In the selftests make sure we grab the runtime pm around the reset.
Also make sure we grab the reset lock before checking if the device
is wedged, since the wedge might still be in-progress and hence the
bit might not be set yet.
- Don't wedge or put the object into an unknown state, if the request
construction fails (or similar). Just returning an error and
skipping the fallback should be safe here.
- Make sure we wedge each gt. (Thomas)
- Peek at the unknown_state in io_reserve, that way we don't have to
export or hand roll the fault_wait_for_idle. (Thomas)
- Add the missing read-side barriers for the unknown_state. (Thomas)
- Some kernel-doc fixes. (Thomas)
v3:
- Tweak the ordering of the set_wedged, also add FIXME.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220629174350.384910-11-matthew.auld@intel.com
On small BAR configurations, when dealing with I915_MEMORY_CLASS_DEVICE
allocations, we assume that by default, all userspace allocations should
be placed in the non-CPU visible portion. Note that dumb buffers are
not included here, since these are not "GPU accelerated" and likely need
CPU access. We choose to just always set GPU_ONLY, and let the backend
figure out if that should be ignored or not, for example on full BAR
systems.
In a later patch userspace will be able to provide a hint if CPU access
to the buffer is needed.
v2(Thomas)
- Apply GPU_ONLY on all discrete devices, but only if the BO can be
placed in LMEM. Down in the depths this should be turned into a noop,
where required, and as an annotation it still make some sense. If we
apply it regardless of the placements then we end up needing to check
the placements during exec capture. Also it's slightly inconsistent
since the NEEDS_CPU_ACCESS can only be applied on objects that can be
placed in LMEM. The other annoyance would be gem_create_ext vs plain
gem_create, if we were to always apply GPU_ONLY.
Testcase: igt@gem-create@create-ext-cpu-access-sanity-check
Testcase: igt@gem-create@create-ext-cpu-access-big
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220629174350.384910-5-matthew.auld@intel.com
This test will validate we can achieve actual frequency of RP0. Pcode
grants frequencies based on what GuC is requesting. However, thermal
throttling can limit what is being granted. Add a test to request for
max, but don't fail the test if RP0 is not granted due to throttle
reasons.
Also optimize the selftest by using a common run_test function to avoid
code duplication. Rename the "clamp" tests to vary_max_freq and
vary_min_freq.
v2: Fix compile warning
v3: Review comments (Ashutosh). Added a FIXME for the media RP0 case.
v4: Checkpatch (strict) fixes, remove FIXME and other comments (Ashutosh)
Fixes commit 8ee2c22782 ("drm/i915/guc/slpc: Add SLPC selftest")
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220627230346.27720-1-vinay.belgaumkar@intel.com