A simple mutex used for guarding the flow of requests in and out of the
timeline. In the short-term, it will be used only to guard the addition
of requests into the timeline, taken on alloc and released on commit so
that only one caller can construct a request into the timeline
(important as the seqno and ring pointers must be serialised). This will
be used by observers to ensure that the seqno/hwsp is stable. Later,
when we have reduced retiring to only operate on a single timeline at a
time, we can then use the mutex as the sole guard required for retiring.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190301110547.14758-2-chris@chris-wilson.co.uk
WAIT is occasionally suppressed by virtue of preempted requests being
promoted to NEWCLIENT if they have not all ready received that boost.
Make this consistent for all WAIT boosts that they are not allowed to
preempt executing contexts and are merely granted the right to be at the
front of the queue for the next execution slot. This is in keeping with
the desire that the WAIT boost be a minor tweak that does not give
excessive promotion to its user and open ourselves to trivial abuse.
The problem with the inconsistent WAIT preemption becomes more apparent
as the preemption is propagated across the engines, where one engine may
preempt and the other not, and we be relying on the exact execution
order being consistent across engines (e.g. using HW semaphores to
coordinate parallel execution).
v2: Also protect GuC submission from false preemption loops.
v3: Build bug safeguards and better debug messages for st.
v4: Do the priority bumping in unsubmit (i.e. on preemption/reset
unwind), applying it earlier during submit causes out-of-order execution
combined with execute fences.
v5: Call sw_fence_fini for our dummy request (Matthew)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190228220639.3173-1-chris@chris-wilson.co.uk
We currently use a worker queued from an rcu callback to determine when
a how grace period has elapsed while we remained idle. We use this idle
delay to infer that we will be idle for a while and this is a suitable
point at which we can trim our global memory caches.
Since we wrote that, this mechanism now exists as rcu_work, and having
converted the idle shrinkers over to using that, we can remove our own
variant.
v2: Say goodbye to gt.epoch as well.
v3: Remove the misplaced and redundant comment before parking globals
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190228102035.5857-3-chris@chris-wilson.co.uk
As kmem_caches share the same properties (size, allocation/free behaviour)
for all potential devices, we can use global caches. While this
potential has worse fragmentation behaviour (one can argue that
different devices would have different activity lifetimes, but you can
also argue that activity is temporal across the system) it is the
default behaviour of the system at large to amalgamate matching caches.
The benefit for us is much reduced pointer dancing along the frequent
allocation paths.
v2: Defer shrinking until after a global grace period for futureproofing
multiple consumers of the slab caches, similar to the current strategy
for avoiding shrinking too early.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190228102035.5857-1-chris@chris-wilson.co.uk
Read the HDMI infoframes from the hbuf and unpack them into
the crtc state.
Well, actually just AVI infoframe for now but let's write the
infoframe readout code in a more generic fashion in case we
expand this later.
Note that Daniel was sceptical about the benefit if this and
also concerned about the potential for crappy sdvo encoders not
implementing the hbuf read commands. My (admittedly limited)
experience is that such encoders don't implement even the
get/set hdmi encoding commands and thus would always be treated
as dvi only. Hence I believe this is safe, and also IMO preferable
having quirks to deal with missing readout support. The readout
support is neatly isolated in the sdvo code whereas the quirk
would leak to other parts of the driver (state checker, fastboot,
etc.) thus complicating the lives of other people.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20190225174106.2163-8-ville.syrjala@linux.intel.com
In selftests/live_hangcheck, we have a lot of tests for resetting simple
spinners, but nothing quite prepared us for how the GPU reacted to
triggering a reset outside of the safe spinner. These two subtests fill
the ring with plain old empty, non-spinning requests, and then triggers
a reset. Without a user-payload to blame, these requests will exercise
the 'non-started' paths and mostly be replayed verbatim.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190226094922.31617-4-chris@chris-wilson.co.uk
To determine whether an engine has 'stuck', we simply check whether or
not is still on the same seqno for several seconds. To keep this simple
mechanism intact over the loss of a global seqno, we can simply add a
new global heartbeat seqno instead. As we cannot know the sequence in
which requests will then be completed, we use a primitive random number
generator instead (with a cycle long enough to not matter over an
interval of a few thousand requests between hangcheck samples).
The alternative to using a dedicated seqno on every request is to issue
a heartbeat request and query its progress through the system. Sadly
this requires us to reduce struct_mutex so that we can issue requests
without requiring that bkl.
v2: And without the extra CS_STALL for the hangcheck seqno -- we don't
need strict serialisation with what comes later, we just need to be sure
we don't write the hangcheck seqno before our batch is flushed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190226094922.31617-1-chris@chris-wilson.co.uk
Annoyingly, struct_mutex was not entirely eliminated from the reset
pathway; for reasons of its own, intel_display_resume() requires
struct_mutex to prepare the planes it already captured. To avoid the
immediate problem of a deadlock between the struct_mutex and the reset
srcu, we have to acquire the reset_lock before struct_mutex in
i915_gem_fault(). Now any wait underneath struct_mutex will result us in
having to forcibly reset all inflight rendering, less than ideal, but
better than a deadlock (and will do for the short term).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190221102924.13442-1-chris@chris-wilson.co.uk
Introduce a new ABI method for detecting a wedged driver by reporting
-EIO from DRM_IOCTL_I915_GEM_CONTEXT_CREATE.
This came up in considering how to handle context recovery from
userspace. There we wish to create a new context after the original is
banned (the clients opts into the no recovery after reset strategy) in
order to rebuild the mesa context from scratch. In doing so, if the
device was wedged and not the context banned, we would fall into a loop
of permanently trying to recreate the context and never making forward
progress. This patch would inform the client that we are no longer able
to create a context, and the client would have no choice but to abort
(or at least inform its callers about the lost device for anv).
References: https://lists.freedesktop.org/archives/mesa-dev/2019-February/215469.html
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190220225556.28715-1-chris@chris-wilson.co.uk
DP CRCs don't really work on g4x. If you want any CRCs on DP you must
select the CRC source before the port is enabled, otherwise the CRC
source select bits simply ignore any writes to them. And once the port
is enabled we mustn't change the CRC source select until the port is
disabled. That almost works, but not quite :( Eventually the CRC source
select bits get permanently stuck one way or the other, and after that
a reboot (or possibly a display reset) is needed to get working CRCs
on that pipe (not matter which CRC source we try to use).
Additionally the DFT scrambler reset bits we're trying to use don't
seem to exist on g4x. There are some potentially relevant looking bits
in the pipe registers, but when I tried it I got stable looking CRCs
without setting any bits for this.
If there is a way to make DP CRCs work reliably on g4x, I wasn't
able to find it. So let's just remove the broken code we have.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190214192219.3858-3-ville.syrjala@linux.intel.com
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Limit deboosting and boosting to keep ourselves at the extremes
when in the respective power modes (i.e. slowly decrease frequencies
while in the HIGH_POWER zone and slowly increase frequencies while
in the LOW_POWER zone). On idle, we will hit the timeout and drop
to the next level quickly, and conversely if busy we expect to
hit a waitboost and rapidly switch into max power.
This should improve the UX experience by keeping the GPU clocks higher
than they ostensibly should be (based on simple busyness) by switching
into the INTERACTIVE mode (due to waiting for pageflips) and increasing
clocks via waitboosting. This will incur some additional power, our
saving grace should be rc6 and powergating to keep the extra current
draw in check.
Food for future thought would be deadline scheduling? If we know certain
contexts (high priority compositors) absolutely must hit the next vblank
then we can raise the frequencies ahead of time. Part of this is covered
by per-context frequencies, where userspace is given control over the
frequency range they want the GPU to execute at (for largely the same
problem as this, where the workload is very latency sensitive but at the
EI level appears mostly idle). Indeed, the per-context series does
extend the modeset boosting to include a frequency range tweak which
seems applicable to solving this jittery UX behaviour.
Reported-by: Lyude Paul <lyude@redhat.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109408
References: 0d55babc83 ("drm/i915: Drop stray clearing of rps->last_adj")
References: 60548c554b ("drm/i915: Interactive RPS mode")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Quoting Lyude Paul:
> Before reverting 0d55babc83: [4.20]
>
> 35 measurements [of gnome-shell animations]
> Average: 33.65657142857143 FPS
> FPS observed: 20.8 - 46.87 FPS
> Percentage under 60 FPS: 100.0%
> Percentage under 55 FPS: 100.0%
> Percentage under 50 FPS: 100.0%
> Percentage under 45 FPS: 97.14285714285714%
> Percentage under 40 FPS: 97.14285714285714%
> Percentage under 35 FPS: 45.714285714285715%
> Percentage under 30 FPS: 11.428571428571429%
> Percentage under 25 FPS: 2.857142857142857%
>
> After reverting: [4.19 behaviour]
>
> 30 measurements
> Average: 49.833666666666666 FPS
> FPS observed: 33.85 - 60.0 FPS
> Percentage under 60 FPS: 86.66666666666667%
> Percentage under 55 FPS: 70.0%
> Percentage under 50 FPS: 53.333333333333336%
> Percentage under 45 FPS: 20.0%
> Percentage under 40 FPS: 6.666666666666667%
> Percentage under 35 FPS: 6.666666666666667%
> Percentage under 30 FPS: 0%
> Percentage under 25 FPS: 0%
>
> Patched:
> 42 measurements
> Average: 46.05428571428571 FPS
> FPS observed: 1.82 - 59.98 FPS
> Percentage under 60 FPS: 88.09523809523809%
> Percentage under 55 FPS: 61.904761904761905%
> Percentage under 50 FPS: 45.23809523809524%
> Percentage under 45 FPS: 35.714285714285715%
> Percentage under 40 FPS: 33.33333333333333%
> Percentage under 35 FPS: 19.047619047619047%
> Percentage under 30 FPS: 7.142857142857142%
> Percentage under 25 FPS: 4.761904761904762%
Tested-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190219122215.8941-13-chris@chris-wilson.co.uk
HDCP transmitter is supposed to indicate the HDCP encryption status of
the link through enc_en signals in a window of time called "window of
opportunity" defined by HDCP HDMI spec.
But on KBL this timing of signalling has an issue. To fix the issue this
WA of resetting the signalling is required.
v2:
WA is moved into the toggle_signalling [Daniel]
v3:
Commit msg is rewritten with more information
v4:
Reviewed-by Daniel.
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/1550338640-17470-17-git-send-email-ramalingam.c@intel.com
Implements the
Waitqueue is created to wait for CP_IRQ
Signaling the CP_IRQ arrival through atomic variable.
For applicable DP HDCP2.2 msgs read wait for CP_IRQ.
As per HDCP2.2 spec "HDCP Transmitters must process CP_IRQ interrupts
when they are received from HDCP Receivers"
Without CP_IRQ processing, DP HDCP2.2 H_Prime msg was getting corrupted
while reading it based on corresponding status bit. This creates the
random failures in reading the DP HDCP2.2 msgs.
v2:
CP_IRQ arrival is tracked based on the atomic val inc [daniel]
Recording the reviewed-by Daniel from IRC.
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/1550338640-17470-16-git-send-email-ramalingam.c@intel.com
Implements the HDMI adaptation specific HDCP2.2 operations.
Basically these are DDC read and write for authenticating through
HDCP2.2 messages.
v2: Rebased.
v3:
No more special handling of Gmbus burst read for AKE_SEND_CERT.
Style fixed with few naming. [Uma]
%s/PARING/PAIRING
v4:
msg_sz is initialized at definition.
Lookup table is defined for HDMI HDCP2.2 msgs [Daniel].
v5: Rebased.
v6:
Make a function as inline [Uma]
%s/uintxx_t/uxx
v7:
Errors due to sinks are reported as DEBUG logs.
Adjust to the new mei interface.
v8:
ARRAY_SIZE for the # of array members [Jon & Daniel].
hdcp adaptation is added as a const in the hdcp_shim [Daniel]
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/1550338640-17470-15-git-send-email-ramalingam.c@intel.com
Implements the DP adaptation specific HDCP2.2 functions.
These functions perform the DPCD read and write for communicating the
HDCP2.2 auth message back and forth.
v2:
wait for cp_irq is merged with this patch. Rebased.
v3:
wait_queue is used for wait for cp_irq [Chris Wilson]
v4:
Style fixed.
%s/PARING/PAIRING
Few style fixes [Uma]
v5:
Lookup table for DP HDCP2.2 msg details [Daniel].
Extra lines are removed.
v6: Rebased.
v7:
Fixed some regression introduced at v5. [Ankit]
Macro HDCP_2_2_RX_CAPS_VERSION_VAL is reused [Uma]
Converted a function to inline [Uma]
%s/uintxx_t/uxx
v8:
Error due to the sinks are reported as DEBUG logs.
Adjust to the new mei interface.
v9:
ARRAY_SIZE for no of array members [Jon & Daniel]
return of the wait_for_cp_irq is made as void [Daniel]
Wait for HDCP2.2 msg is done based on polling the reg bit than
CP_IRQ based. [Daniel]
hdcp adaptation is added as a const in the hdcp_shim [Daniel]
v10:
config_stream_type is redefined [Daniel]
DP Errata specific defines are moved into intel_dp.c.
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Ankit K Nautiyal <ankit.k.nautiyal@intel.com>
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/1550338640-17470-14-git-send-email-ramalingam.c@intel.com