Given the mechanism to unwind and replay requests (designed to support
preemption), we have an alternative to the current method of
resubmitting the ELSP upon reset. Resubmitting ELSP turns out to be more
complicated than expected, due to having to handle lost context-switch
interrupts and so guessing what ELSP we need to resubmit later. Instead,
by unwinding the requests and clearing the ELSP tracking entirely, we
can then just dequeue the first pair of ready requests after resetting,
using the normal submission procedure.
Currently, the unwound requests have maximum priority and so are
guaranteed to be resubmitted upon resume. If we are lucky, we may be
able to coalesce a new request on top!
Suggested-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170916204414.32762-4-chris@chris-wilson.co.uk
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
During a reset, we may skip over completed requests and lost
context-switch interrupts. Following the reset, we may then may end up
with no active requests in the ELSP (and so do not resubmit to restart
the engine), but have a queue of requests ready for execution. This is
unlikely, it requires the last request to complete after the hang is
detected, but not impossible. The outcome of this is that the engine
stalls, possibly leading to full ring and indefinite wait under
struct_mutex, eventually leading to a full driver hang.
Alternatively, we can solve this by unsubmitting the incomplete requests
and just kickstarting the tasklet. Michał has patches for that, which I
initially disliked due to the extra complexity, but the complexity of
this "simple" restart is growing...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170916204414.32762-1-chris@chris-wilson.co.uk
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Currently we're unmasking some random looking bits in HWSTAM
on gen3/4/5. The two bits we apparently unmask are 0 and 12,
and also bits 16-31 on gen4/5.
What those bits do depends on the gen as follows:
bit 0: Breakpoint (gen2), ASLE (gen3), reserved (gen4), render user interrupt (gen5)
bit 12: Sync flush statusa (gen2-4), reserved (gen5)
bit 16-31: The ones that can unmasked seem to be mostly some
display stuff on gen4. Bit 18 is the PIPE_CONTROL notify,
which might be the only intresting one. On gen5 all the
bits are reserved.
So I don't know whether we actually depend on that status page write
somehow. Extra seqno coherency by accident perhaps? Except we don't
even unmask the user interrupt bit in HWSTAM except on gen5, and
sync flush isn't something we use normally, so seems unlikely. So
let's just assume we don't need any of this and mask everything in
HWSTAM.
From gen6 onwards there's a separate HWSTAM for each engine, and so
we deal with them during the engine setup.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170818183705.27850-15-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Bspec claims that HWSTAM is only 16 bits on gen3, but the other
interrupts registers are 32 bits and there are 18 valid interrupt
bits. Hence a 16 bit HWSTAM wouldn't be able to contain all the
bits, so it seems the spec is incorrect about the size of the
register. And indeed I can clear bits 16 and 17 just fine with
a 32 bit write. So let's adjust the code to treat the register
as 32 bits.
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170818183705.27850-14-ville.syrjala@linux.intel.com
Eliminate the loops from the gen2-3 irq handlers. Since we don't use
MSI anymore on these platforms, and thus the CPU interrupt will be level
triggered, we shouldn't need to play any tricks with IER to induce edges
from IIR. IIR itself still detects only edges from PIPESTAT & co. on
gen4 but since IIR is double buffered and we only clear one bit per irq
handler invocation we can use the normal "clear PIPESTAT & co. -> clear
IIR" approach to ack the interrupts. On gen2 everything is level
triggered, and gen3 presumably follows either the gen2 or gen4 approach
since nothing else would really make sense.
v2: Drop the IER tricks since we no longer use MSI
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170818183705.27850-12-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
We have a lot of different ways of clearing the PIPESTAT registers.
Let's unify it all into one function. There's no magic in PIPESTAT
that would require any of the double clearing and whatnot that
some of the code tries to do. All we can really do is clear the status
bits and disable the enable bits. There is no way to mask anything
so as soon as another event happens the status bit will become set
again, and trying to clear them twice or something can't protect
against that.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170818183705.27850-3-ville.syrjala@linux.intel.com
Reviewed-by: Imre Deak <imre.deak@intel.com>
The private PAT management is to support PPAT entry manipulation. Two
APIs are introduced for dynamically managing PPAT entries: intel_ppat_get
and intel_ppat_put.
intel_ppat_get will search for an existing PPAT entry which perfectly
matches the required PPAT value. If not, it will try to allocate a new
entry if there is any available PPAT indexs, or return a partially
matched PPAT entry if there is no available PPAT indexes.
intel_ppat_put will put back the PPAT entry which comes from
intel_ppat_get. If it's dynamically allocated, the reference count will
be decreased. If the reference count turns into zero, the PPAT index is
freed again.
Besides, another two callbacks are introduced to support the private PAT
management framework. One is ppat->update_hw(), which writes the PPAT
configurations in ppat->entries into HW. Another one is ppat->match, which
will return a score to show how two PPAT values match with each other.
v17:
- Refine the comparision of score of BDW. (Joonas)
v16:
- Fix a bug in PPAT match function of BDW. (Joonas)
v15:
- Refine some code flow. (Joonas)
v12:
- Fix a problem "not returning the entry of best score". (Zhenyu)
v7:
- Keep all the register writes unchanged in this patch. (Joonas)
v6:
- Address all comments from Chris:
http://www.spinics.net/lists/intel-gfx/msg136850.html
- Address all comments from Joonas:
http://www.spinics.net/lists/intel-gfx/msg136845.html
v5:
- Add check and warnnings for those platforms which don't have PPAT.
v3:
- Introduce dirty bitmap for PPAT registers. (Chris)
- Change the name of the pointer "dev_priv" to "i915". (Chris)
- intel_ppat_{get, put} returns/takes a const intel_ppat_entry *. (Chris)
v2:
- API re-design. (Chris)
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> #v7
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
[Joonas: Use BIT() in the enum in bdw_private_pat_match]
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1505392783-4084-1-git-send-email-zhi.a.wang@intel.com
The engine also provides a mirror of the CSB write pointer in the HWSP,
but not of our read pointer. To take advantage of this we need to
remember where we read up to on the last interrupt and continue off from
there. This poses a problem following a reset, as we don't know where
the hw will start writing from, and due to the use of power contexts we
cannot perform that query during the reset itself. So we continue the
current modus operandi of delaying the first read of the context-status
read/write pointers until after the first interrupt. With this we should
now have eliminated all uncached mmio reads in handling the
context-status interrupt, though we still have the uncached mmio writes
for submitting new work, and many uncached mmio reads in the global
interrupt handler itself. Still a step in the right direction towards
reducing our resubmit latency, although it appears lost in the noise!
v2: Cannonlake moved the CSB write index
v3: Include the sw/hwsp state in debugfs/i915_engine_info
v4: Also revert to using CSB mmio for GVT-g
v5: Prevent the compiler reloading tail (Mika)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Acked-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170913085605.18299-6-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
The engine provides a mirror of the CSB in the HWSP. If we use the
cacheable reads from the HWSP, we can shave off a few mmio reads per
context-switch interrupt (which are quite frequent!). Just removing a
couple of mmio is not enough to actually reduce any latency, but a small
reduction in overall cpu usage.
Much appreciation for Ben dropping the bombshell that the CSB was in the
HWSP and for Michel in digging out the details.
v2: Don't be lazy, add the defines for the indices.
v3: Include the HWSP in debugfs/i915_engine_info
v4: Check for GVT-g, it currently depends on intercepting CSB mmio
v5: Fixup GVT-g mmio path
v6: Disable HWSP if VT-d is active as the iommu adds unpredictable
memory latency. (Mika)
v7: Also markup the CSB read with READ_ONCE() as it may still be an mmio
read and we want to stop the compiler from issuing a later (v.slow) reload.
Suggested-by: Ben Widawsky <benjamin.widawsky@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Acked-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170913133534.26927-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
The sgt iterators cause an
drivers/gpu/drm/i915/i915_gpu_error.c:846 i915_error_object_create() warn: statement has no effect 7
everywhere they are used. If we change the code slightly, we can achieve
the same increment without altering the output or raising a warning.
text data bss dec hex filename
1267906 20587 3168 1291661 13b58d before
1267906 20587 3168 1291661 13b58d after
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170913105754.4423-1-chris@chris-wilson.co.uk
The context descriptor is stored inside the per-engine context state, as
we only need to compute it once and access it frequently. However,
currently only intel_lrc.c has easy access, but i915_guc_submission.c
would like to frequently read it as well, and more so only ever needs
the lower 32bits. Make it an inline as the compiler should be able to
retrieve the value in less instructions than it takes to do the function
call:
add/remove: 0/1 grow/shrink: 1/0 up/down: 8/-45 (-37)
function old new delta
i915_guc_submit 621 629 +8
intel_lr_context_descriptor 45 - -45
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20170912214905.21987-1-chris@chris-wilson.co.uk
Reviewed-by: Oscar Mateo <oscar.mateo@intel.com>
This reverts commit bbdf0b2ff3 ("drm/i915/bxt: Disable device ready
before shutdown command").
Disable device ready before shutdown command was added previously to
avoid a split screen issue seen on dual link DSI panels. As of now, dual
link is not supported and will need some rework in the upstream
code. For single link DSI panels, the change is not required. This will
cause failure in sending SHUTDOWN packet during disable. Hence reverting
the change. Will handle the change as part of dual link enabling in
upstream.
Fixes: bbdf0b2ff3 ("drm/i915/bxt: Disable device ready before shutdown command")
Cc: <stable@vger.kernel.org> # v4.12+
Signed-off-by: Uma Shankar <uma.shankar@intel.com>
Signed-off-by: Vidya Srinivas <vidya.srinivas@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1504604671-17237-1-git-send-email-vidya.srinivas@intel.com