Gen12 can improve bandwidth efficiency by pairing up memory requests
with similar addresses. We need to program the BW_BUDDY1 and BW_BUDDY2
registers according to the memory configuration during display
initialization to take advantage of this capability.
The magic numbers we program here feel like something that could
definitely change on future platforms, so let's use a table-based
programming scheme to make this easy to extend in the future.
v2:
- Add separate table for Wa_1409767108. (Stan)
- Reorder structure reduce size by a word. Page mask can still be up
to 28 bits (even though current values are small) so we should keep
it as a u32, but just using a u8 for DRAM type instead of the actual
enum type saves space. (Lucas, Ville)
- Rename function to tgl_bw_buddy_init() to be more precise about what
it does. (Lucas)
Bspec: 49189
Bspec: 49218
Bspec: 52890
Cc: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191205224848.76712-1-matthew.d.roper@intel.com
Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
In order to avoid confusing the HW, we must never submit an empty ring
during lite-restore, that is we should always advance the RING_TAIL
before submitting to stay ahead of the RING_HEAD.
Normally this is prevented by keeping a couple of spare NOPs in the
request->wa_tail so that on resubmission we can advance the tail. This
relies on the request only being resubmitted once, which is the normal
condition as it is seen once for ELSP[1] and then later in ELSP[0]. On
preemption, the requests are unwound and the tail reset back to the
normal end point (as we know the request is incomplete and therefore its
RING_HEAD is even earlier).
However, if this w/a should fail we would try and resubmit the request
with the RING_TAIL already set to the location of this request's wa_tail
potentially causing a GPU hang. We can spot when we do try and
incorrectly resubmit without advancing the RING_TAIL and spare any
embarrassment by forcing the context restore.
In the case of preempt-to-busy, we leave the requests running on the HW
while we unwind. As the ring is still live, we cannot rewind our
rq->tail without forcing a reload so leave it set to rq->wa_tail and
only force a reload if we resubmit after a lite-restore. (Normally, the
forced reload will be a part of the preemption event.)
Fixes: 22b7a426bb ("drm/i915/execlists: Preempt-to-busy")
Closes: https://gitlab.freedesktop.org/drm/intel/issues/673
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: stable@kernel.vger.org
Link: https://patchwork.freedesktop.org/patch/msgid/20191209023215.3519970-1-chris@chris-wilson.co.uk
Instead of relying on the workqueue, the upcoming reworked GuC
submission flow will offer the host driver indipendent control over
the execution status of each context submitted to GuC. As part of this,
the doorbell usage model has been reworked, with each doorbell being
paired to a single lrc and a doorbell ring representing new work
available for that specific context. This mechanism, however, limits
the number of contexts that can be registered with GuC to the number of
doorbells, which is an undesired limitation. To avoid this limitation,
we requested the GuC team to also provide a H2G that will allow the host
to notify the GuC of work available for a specified lrc, so we can use
that mechanism instead of relying on the doorbells. We can therefore drop
the doorbell code we currently have, also given the fact that in the
unlikely case we'd want to switch back to using doorbells we'd have to
heavily rework it.
The workqueue will still have a use in the new interface to pass special
commands, so that code has been retained for now.
With the doorbells gone and the GuC client becoming even simpler, the
existing GuC selftests don't give us any meaningful coverage so we can
remove them as well. Some selftests might come with the new code, but
they will look different from what we have now so if doesn't seem worth
it to keep the file around in the meantime.
v2: fix comments and commit message (John)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191205220243.27403-3-daniele.ceraolospurio@intel.com
On glk+ the hardware gets confused if we disable FBC while
it's recompressing and we perform a plane update during the
same frame. The result is that top of the screen gets corrupted.
We can avoid that by giving the hardware enough time to finish
the FBC disable before we touch the plane registers. Ie. we need
an extra vblank wait after FBC disable.
v2: Don't do the vblank wait if we never activated FBC in hw
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191128150338.12490-1-ville.syrjala@linux.intel.com
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
The hardware automagically nukes the cfb on flip. We can use
that whenever the plane/crtc configuration doesn't change too
much. Let's hook that up.
We'll need this for glk+ since we need to introduce an extra
vblank wait after FBC disable. As we're currently disabling
FBC around all plane updates we'd slow them down by an extra
frame. Not a great user experience when your fps is always
capped at vrefres/2. With flip nuke we don't need the extra
vblank wait.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191127201222.16669-12-ville.syrjala@linux.intel.com
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
We don't want to use the FBC hardware render tracking so let's not
enable it. To use the hw tracking properly we'd anyway need to
integrate this into the command submissing path as the register is
context saved, and if rendering happens via the ppgtt we'd have
to configure it with the ppgtt address instead of the ggtt address.
Easier to use software tracking instead.
Note that on pre-ilk we can't actually disable render tracking.
However we can't rely on it because it requires that DSPSURF to
match the render target address, and since we play tricks
with DSPSURF that may not be the case. Hence we shall rely on
software render tracking on all platforms.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191127201222.16669-5-ville.syrjala@linux.intel.com
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
All pinning must be done prior to i915_request_create, to avoid
timeline->mutex inversions.
Here we slightly abuse the context_barrier_task stages to utilise the
'skip' decision as an opportunity to acquire the pin on the new ppgtt.
Consider it s/skip/prepare/. At the moment, we only have on user of
context_barrier_task, so it might be worth breaking it down for the
specific task of set-vm and refactor it later if we find a second
purpose.
<4> [402.377487] WARNING: possible circular locking dependency detected
<4> [402.377493] 5.4.0-rc8-CI-CI_DRM_7491+ #1 Tainted: G U
<4> [402.377497] ------------------------------------------------------
<4> [402.377502] gem_exec_parall/2506 is trying to acquire lock:
<4> [402.377507] ffff888403cdac70 (&kernel#2){+.+.}, at: i915_request_create+0x16/0x1c0 [i915]
<4> [402.377593]
but task is already holding lock:
<4> [402.377597] ffff88835efad550 (&ppgtt->pin_mutex){+.+.}, at: gen6_ppgtt_pin+0x4d/0x110 [i915]
<4> [402.377660]
which lock already depends on the new lock.
<4> [402.377664]
the existing dependency chain (in reverse order) is:
<4> [402.377668]
-> #1 (&ppgtt->pin_mutex){+.+.}:
<4> [402.377674] __mutex_lock+0x9a/0x9d0
<4> [402.377713] gen6_ppgtt_pin+0x4d/0x110 [i915]
<4> [402.377756] emit_ppgtt_update+0x1dc/0x370 [i915]
<4> [402.377801] context_barrier_task+0x176/0x310 [i915]
<4> [402.377844] ctx_setparam+0x400/0xb10 [i915]
<4> [402.377886] i915_gem_context_setparam_ioctl+0xc8/0x160 [i915]
<4> [402.377891] drm_ioctl_kernel+0xa7/0xf0
<4> [402.377895] drm_ioctl+0x2e1/0x390
<4> [402.377899] do_vfs_ioctl+0xa0/0x6f0
<4> [402.377903] ksys_ioctl+0x35/0x60
<4> [402.377906] __x64_sys_ioctl+0x11/0x20
<4> [402.377910] do_syscall_64+0x4f/0x210
<4> [402.377914] entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [402.377917]
-> #0 (&kernel#2){+.+.}:
<4> [402.377923] __lock_acquire+0x1328/0x15d0
<4> [402.377926] lock_acquire+0xa7/0x1c0
<4> [402.377930] __mutex_lock+0x9a/0x9d0
<4> [402.377977] i915_request_create+0x16/0x1c0 [i915]
<4> [402.378013] intel_engine_flush_barriers+0x4c/0x100 [i915]
<4> [402.378062] i915_ggtt_pin+0x7d/0x130 [i915]
<4> [402.378108] gen6_ppgtt_pin+0x9c/0x110 [i915]
<4> [402.378148] ring_context_pin+0x2e/0xc0 [i915]
<4> [402.378183] __intel_context_do_pin+0x6b/0x190 [i915]
<4> [402.378226] i915_gem_do_execbuffer+0x180c/0x26b0 [i915]
<4> [402.378268] i915_gem_execbuffer2_ioctl+0x11b/0x460 [i915]
<4> [402.378272] drm_ioctl_kernel+0xa7/0xf0
<4> [402.378275] drm_ioctl+0x2e1/0x390
<4> [402.378279] do_vfs_ioctl+0xa0/0x6f0
<4> [402.378282] ksys_ioctl+0x35/0x60
<4> [402.378286] __x64_sys_ioctl+0x11/0x20
<4> [402.378289] do_syscall_64+0x4f/0x210
<4> [402.378292] entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [402.378295]
other info that might help us debug this:
<4> [402.378299] Possible unsafe locking scenario:
<4> [402.378302] CPU0 CPU1
<4> [402.378305] ---- ----
<4> [402.378307] lock(&ppgtt->pin_mutex);
<4> [402.378310] lock(&kernel#2);
<4> [402.378314] lock(&ppgtt->pin_mutex);
<4> [402.378317] lock(&kernel#2);
<4> [402.378320]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191206105527.1130413-4-chris@chris-wilson.co.uk
Commit 9c722e17c1 ("drm/i915: Disable pipes in reverse order")
reverted the order that pipes gets disabled because of TGL
master/slave relationship between transcoders in MST mode.
But as stated in a comment in skl_commit_modeset_enables() the
enabling order is not always crescent, possibly causing previously
selected slave transcoder being enabled before master so another
approach will be needed to select a transcoder to master in MST mode.
It will be similar to the approach taken in port sync.
But instead of implement something like
intel_trans_port_sync_modeset_disables() to MST lets simply it and
iterate over all pipes 2 times, the first one disabling any slave and
then disabling everything else.
The MST bits will be added in another patch.
v2:
Not using crtc->active as it is deprecated
v3:
Removing is_trans_port_sync_mode() check, just check for
is_trans_port_sync_master() is enough
v4:
Adding and using is_trans_port_sync_slave(), otherwise non-port sync
pipes will be disabled in the first loop, what is not wrong but is
not what patch description promises
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Manasi Navare <manasi.d.navare@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> (v2)
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191205210350.96795-3-jose.souza@intel.com
For TGL the step to turn off the transcoder clock was moved to after
the complete shutdown of DDI. Only the MST slave transcoders should
disable the clock before that.
v2:
- Adding last_mst_stream to intel_mst_post_disable_dp, make code more
easy to read and is similar to first_mst_stream in
intel_mst_pre_enable_dp()(Ville's idea)
- Calling intel_ddi_disable_pipe_clock() for GEN12+ right
intel_disable_ddi_buf() as stated in BSpec(Ville)
BSpec: 49190
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191205210350.96795-2-jose.souza@intel.com
We used to report the minimum possible frequency as both requested and
active while GPU was in sleep state. This was a consequence of sampling
the value from the "current frequency" field in our software tracking.
This was strictly speaking wrong, but given that until recently the
current frequency in sleeping state used to be equal to minimum, it did
not stand out sufficiently to be noticed as such.
After some recent changes have made the current frequency be reported
as last active before GPU went to sleep, meaning both requested and active
frequencies could end up being reported at their maximum values for the
duration of the GPU idle state, it became much more obvious that this does
not make sense.
To fix this we will now sample the frequency counters only when the GPU is
awake. As a consequence reported frequencies could be reported as below
the GPU reported minimum but that should be much less confusing that the
current situation.
v2:
* Split out early exit conditions for readability. (Chris)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Closes: https://gitlab.freedesktop.org/drm/intel/issues/675
Link: https://patchwork.freedesktop.org/patch/msgid/20191129105436.20100-1-tvrtko.ursulin@linux.intel.com