Split gen11_irq_handler() to receive as parameter the function
pointers. This allows to share the interrupt handler even if the enable/disable
functions are different.
Make sure it's always inlined to avoid the extra indirect call on the
hot path. Checking with gcc 9 this produce the exact same code as of
now:
$ size drivers/gpu/drm/i915/i915_irq*.o
text data bss dec hex filename
47511 560 0 48071 bbc7 drivers/gpu/drm/i915/i915_irq.o
47511 560 0 48071 bbc7 drivers/gpu/drm/i915/i915_irq_new.o
$ gdb -batch -ex 'file drivers/gpu/drm/i915/i915_irq.o' -ex 'disassemble gen11_irq_handler' > /tmp/old.s
$ gdb -batch -ex 'file drivers/gpu/drm/i915/i915_irq_new.o' -ex 'disassemble gen11_irq_handler' > /tmp/new.s
$ git diff --no-index /tmp/{old,new}.s
$
So, no change in behavior, just a simple refactor.
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191024195122.22877-4-lucas.demarchi@intel.com
BAT is growing a little fat and CI is under pressure and needs to trim
off some redundant runtime. An easy option is to reduce the selftest
runtimes, so try halving our default subtest timeout. While this reduces
the number of iterations used, for the majority of tests that are
passing, repeat runs (with different CI_DRM) will make up the
difference -- a negative consequence though is that we may reduce the
frequency of sporadic failures. Hopefully, we have no tests that were
crucially dependent on the previous 1s timeout...
Suggested-by: Tomi Sarvela <tomi.p.sarvela@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191025092749.13468-1-chris@chris-wilson.co.uk
ivb+ supports fp16 pixel formats on the sprite planes planes. Expose
that capability.
On ivb/hsw fp16 scanout is slightly busted. The output from the plane
will have 1/4 the expected value. For the sprite plane we can fix that
up with the plane gamma unit. This was fixed on bdw.
v2: Rebase on top of icl fp16
Split the ivb+ sprite birs into a separate patch
v3: Move ivb_need_sprite_gamma() check one level up so that
we don't waste time programming garbage into he gamma registers
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191015193035.25982-10-ville.syrjala@linux.intel.com
gen4+ supports fp16 pixel formats on the primary planes. Add the
relevant code.
On ivb fp16 scanout is slightly busted. The output from the plane will
have 1/4 the expected value. For the primary plane we would have to
use the pipe gamma or pipe csc to correct that which would affect all
the other planes as well, hence we simply choose not to expose fp16
on the ivb primary plane. On hsw the primary plane got fixed.
On gmch platforms I observed that the plane width must be below 2k
pixels with fp16 or else we get a corrupted image. This limitation
does not seem to be documented in bspec. I verified the exact limit
using the chv pipe B primary plane since it has windowing capability.
The stride limits are unaffected by fp16.
v2: Rebase on top of icl fp16
Split thea gen4+ primary plane bits into a separate patch
Deal with HAS_GMCH()
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191015193035.25982-9-ville.syrjala@linux.intel.com
Various pixel formats and plane scaling impose additional constraints
on the cdclk frequency. Provide a new plane->min_cdclk() hook that
will be used to compute the minimum acceptable cdclk frequency for
each plane.
Annoyingly on some platforms the numer of active planes affects
this calculation so we must also toss in more planes into the
state when the number of active planes changes.
The sequence of state computation must also be changed:
1. check_plane() (updates plane's visibility etc.)
2. figure out if more planes now require update min_cdclk
computaion
3. calculate the new min cdclk for each plane in the state
4. if the minimum of any plane now exceeds the current
logical cdclk we recompute the cdclk
4. during cdclk computation take the planes' min_cdclk into
accoutn
5. follow the normal cdclk programming to change the
cdclk frequency. This may now require a modeset (except
on bxt/glk in some cases), which either succeeds or
fails depending on whether userspace has given
us permission to perform a modeset or not.
v2: Fix plane id check in intel_crtc_add_planes_to_state()
Only print the debug message when cdclk needs bumping
Use dev_priv->cdclk... as the old state explicitly
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191015193035.25982-5-ville.syrjala@linux.intel.com
So far we've sort of protected the global state under dev_priv with
the connection_mutex. I wan to change that so that we can change the
cdclk even for pure plane updates. To that end let's formalize the
protection of the global state to follow what I started with the cdclk
code already (though not entirely properly) such that any crtc mutex
will suffice as a read lock, and all crtcs mutexes act as the write
lock.
We'll also pimp intel_atomic_state_clear() to clear the entire global
state, so that we don't accidentally leak stale information between
the locking retries.
As a slight optimization we'll only lock the crtc mutexes to protect
the global state, however if and when we actually have to poke the
hw (eg. if the actual cdclk changes) we must serialize commits
across all crtcs so that a parallel nonblocking commit can't get
ahead of the cdclk reprogamming. We do that by adding all crtcs to
the state.
TODO: the old global state examined during commit may still
be a problem since it always looks at the _latest_ swapped state
in dev_priv. Need to add proper old/new state for that too I think.
v2: Remeber to serialize the commits if necessary
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191015193035.25982-3-ville.syrjala@linux.intel.com
Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Normally, we rely on our hangcheck to prevent persistent batches from
hogging the GPU. However, if the user disables hangcheck, this mechanism
breaks down. Despite our insistence that this is unsafe, the users are
equally insistent that they want to use endless batches and will disable
the hangcheck mechanism. We are looking at replacing hangcheck, in the
next patch, with a softer mechanism, that sends a pulse down the engine
to check if it is well. We can use the same preemptive pulse to flush an
active context off the GPU upon context close, preventing resources
being lost and unkillable requests remaining on the GPU after process
termination.
Testcase: igt/gem_ctx_exec/basic-nohangcheck
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Reviewed-by: Jon Bloomfield <jon.bloomfield@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191023133108.21401-4-chris@chris-wilson.co.uk
On schedule-out (CS completion) of a banned context, scrub the context
image so that we do not replay the active payload. The intent is that we
skip banned payloads on request submission so that the timeline
advancement continues on in the background. However, if we are returning
to a preempted request, i915_request_skip() is ineffective and instead we
need to patch up the context image so that it continues from the start
of the next request.
v2: Fixup cancellation so that we only scrub the payload of the active
request and do not short-circuit the breadcrumbs (which might cause
other contexts to execute out of order).
v3: Grammar pass
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191023133108.21401-3-chris@chris-wilson.co.uk
If the preempted context takes too long to relinquish control, e.g. it
is stuck inside a shader with arbitration disabled, evict that context
with an engine reset. This ensures that preemptions are reasonably
responsive, providing a tighter QoS for the more important context at
the cost of flagging unresponsive contexts more frequently (i.e. instead
of using an ~10s hangcheck, we now evict at ~100ms). The challenge of
lies in picking a timeout that can be reasonably serviced by HW for
typical workloads, balancing the existing clients against the needs for
responsiveness.
Note that coupled with timeslicing, this will lead to rapid GPU "hang"
detection with multiple active contexts vying for GPU time.
The forced preemption mechanism can be compiled out with
./scripts/config --set-val DRM_I915_PREEMPT_TIMEOUT 0
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191023133108.21401-2-chris@chris-wilson.co.uk