To complete the idle worker, we must complete 2 passes of wait-for-idle.
At the end of the first pass, we queue a switch-to-kernel-context and
may only idle after waiting for its completion. Speed up the flush_work
by doing the wait explicitly, which then allows us to remove the
unbounded loop trying to complete the flush_work in the next patch.
References: 79ffac8599 ("drm/i915: Invert the GEM wakeref hierarchy")
Testcase: igt/gem_ppgtt/flind-and-close-vma-leak
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190507121108.18377-1-chris@chris-wilson.co.uk
v2: Fix commit msg to reflect why issue occurs(Jani)
Set GCP_COLOR_INDICATION only when we set 10/12 bit deep color.
Changing settings from 10/12 bit deep color to 8 bit(& vice versa)
doesn't work correctly using xrandr max bpc property. When we
connect a monitor which supports deep color, the highest deep color
setting is selected; which sets GCP_COLOR_INDICATION. When we change
the setting to 8 bit color, we still set GCP_COLOR_INDICATION which
doesn't allow the switch back to 8 bit color.
v3,4: Add comments & drop changes in intel_hdmi_compute_config(Ville)
Since HSW+, GCP_COLOR_INDICATION is not required for 8bpc.
Drop the changes in intel_hdmi_compute_config as desired_bpp
is needed to change values for pipe_bpp based on bw_constrained flag.
v5: Fix missing logical && in condition for setting GCP_COLOR_INDICATION.
v6: Fix comment formatting (Ville)
v7: Add reviewed by Ville
v8: Set GCP_COLOR_INDICATION based on spec:
For Gen 7.5 or later platforms, indicate color depth only for deep
color modes. Bspec: 8135,7751,50524
Pre DDI platforms, indicate color depth if deep color is supported
by sink. Bspec: 7854
Exception: CHERRYVIEW behaves like Pre DDI platforms.
Bspec: 15975
Check pipe_bpp is less than bpp * 3 in hdmi_deep_color_possible,
to not set 12 bit deep color for every modeset. This fixes the issue
where 12 bit color was selected even when user selected 10 bit.(Ville)
v9: Maintain a consistent behavior for all platforms and support
GCP_COLOR_INDICATION only when we are in deep color mode. Remove
hdmi_sink_is_deep_color() - no longer needed as checking pipe_bpp > 24
takes care of the deep color mode scenario.
Separate patch for fixing switch from 12 bit to 10 bit deep color
mode.
Co-developed-by: Aditya Swarup <aditya.swarup@intel.com>
Signed-off-by: Clinton Taylor <Clinton.A.Taylor@intel.com>
Signed-off-by: Aditya Swarup <aditya.swarup@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Manasi Navare <manasi.d.navare@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190429230811.9983-1-aditya.swarup@intel.com
The counter goes to zero at the start of the parking cycle, but the
wakeref itself is held until the end. Likewise, the counter becomes one
at the end of the unparking, but the wakeref is taken first. If we check
the wakeref instead of the counter, we include the unpark/unparking time
as intel_wakeref_is_active(), and do not spuriously declare inactive if
we fail to park (i.e. the parking and wakeref drop is postponed).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190503115225.30831-2-chris@chris-wilson.co.uk
Asking the GPU to busywait on a memory address, perhaps not unexpectedly
in hindsight for a shared system, leads to bus contention that affects
CPU programs trying to concurrently access memory. This can manifest as
a drop in transcode throughput on highly over-saturated workloads.
The only clue offered by perf, is that the bus-cycles (perf stat -e
bus-cycles) jumped by 50% when enabling semaphores. This corresponds
with extra CPU active cycles being attributed to intel_idle's mwait.
This patch introduces a heuristic to try and detect when more than one
client is submitting to the GPU pushing it into an oversaturated state.
As we already keep track of when the semaphores are signaled, we can
inspect their state on submitting the busywait batch and if we planned
to use a semaphore but were too late, conclude that the GPU is
overloaded and not try to use semaphores in future requests. In
practice, this means we optimistically try to use semaphores for the
first frame of a transcode job split over multiple engines, and fail if
there are multiple clients active and continue not to use semaphores for
the subsequent frames in the sequence. Periodically, we try to
optimistically switch semaphores back on whenever the client waits to
catch up with the transcode results.
With 1 client, on Broxton J3455, with the relative fps normalized by %cpu:
x no semaphores
+ drm-tip
* patched
+------------------------------------------------------------------------+
| * |
| *+ |
| **+ |
| **+ x |
| x * +**+ x |
| x x * * +***x xx |
| x x * * *+***x *x |
| x x* + * * *****x *x x |
| + x xx+x* + *** * ********* x * |
| + x xx+x* * *** +** ********* xx * |
| * + ++++* + x*x****+*+* ***+*************+x* * |
|*+ +** *+ + +* + *++****** *xxx**********x***+*****************+*++ *|
| |__________A_____M_____| |
| |_______________A____M_________| |
| |____________A___M________| |
+------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 120 2.60475 3.50941 3.31123 3.2143953 0.21117399
+ 120 2.3826 3.57077 3.25101 3.1414161 0.28146407
Difference at 95.0% confidence
-0.0729792 +/- 0.0629585
-2.27039% +/- 1.95864%
(Student's t, pooled s = 0.248814)
* 120 2.35536 3.66713 3.2849 3.2059917 0.24618565
No difference proven at 95.0% confidence
With 10 clients over-saturating the pipeline:
x no semaphores
+ drm-tip
* patched
+------------------------------------------------------------------------+
| ++ ** |
| ++ ** |
| ++ ** |
| ++ ** |
| ++ xx *** |
| ++ xx *** |
| ++ xxx*** |
| ++ xxx*** |
| +++ xxx*** |
| +++ xx**** |
| +++ xx**** |
| +++ xx**** |
| +++ xx**** |
| ++++ xx**** |
| +++++ xx**** |
| +++++ x x****** |
| ++++++ xxx******* |
| ++++++ xxx******* |
| ++++++ xxx******* |
| ++++++ xx******** |
| ++++++ xxxx******** |
| ++++++ xxxx******** |
| ++++++++ xxxxx********* |
|+ + + + ++++++++ xxx*xx**********x* *|
| |__A__| |
| |__AM__| |
| |__A_| |
+------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 120 2.47855 2.8972 2.72376 2.7193402 0.074604933
+ 120 1.17367 1.77459 1.71977 1.6966782 0.085850697
Difference at 95.0% confidence
-1.02266 +/- 0.0203502
-37.607% +/- 0.748352%
(Student's t, pooled s = 0.0804246)
* 120 2.57868 3.00821 2.80142 2.7923878 0.058646477
Difference at 95.0% confidence
0.0730476 +/- 0.0169791
2.68622% +/- 0.624383%
(Student's t, pooled s = 0.0671018)
Indicating that we've recovered the regression from enabling semaphores
on this saturated setup, with a hint towards an overall improvement.
Very similar, but of smaller magnitude, results are observed on both
Skylake(gt2) and Kabylake(gt4). This may be due to the reduced impact of
bus-cycles, where we see a 50% hit on Broxton, it is only 10% on the big
core, in this particular test.
One observation to make here is that for a greedy client trying to
maximise its own throughput, using semaphores is the right choice. It is
only the holistic system-wide view that semaphores of one client
impacts another and reduces the overall throughput where we would choose
to disable semaphores.
The most noticeable negactive impact this has is on the no-op
microbenchmarks, which are also very notable for having no cpu bus load.
In particular, this increases the runtime and energy consumption of
gem_exec_whisper.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Cc: Dmitry Ermilov <dmitry.ermilov@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190504070707.30902-1-chris@chris-wilson.co.uk
I fumbled the PIPEMISC write into the wrong place. It only gets
called for fastsets, but since value needs to be updated based on
the set of active planes it needs to be done for all plane updates.
Move it to the correct spot.
The symptoms include SDR planes never showing up if a previous
modeset/fastset left the pipe in HDR mode. This was immediately
obvious when running the kms_plane pixel format tests. Unfortunately
the test didn't realize it was scanning out pure black all the time
and declared success anyway.
Cc: Uma Shankar <uma.shankar@intel.com>
Cc: Shashank Sharma <shashank.sharma@intel.com>
Fixes: 09b25812db ("drm/i915: Enable pipe HDR mode on ICL if only HDR planes are used")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190502200607.14504-1-ville.syrjala@linux.intel.com
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Currently we submit the semaphore busywait as soon as the signaler is
submitted to HW. However, we may submit the signaler as the tail of a
batch of requests, and even not as the first context in the HW list,
i.e. the busywait may start spinning far in advance of the signaler even
starting.
If we wait until the request before the signaler is completed before
submitting the busywait, we prevent the busywait from starting too
early, if the signaler is not first in submission port.
To handle the case where the signaler is at the start of the second (or
later) submission port, we will need to delay the execution callback
until we know the context is promoted to port0. A challenge for later.
Fixes: e886196469 ("drm/i915: Use HW semaphores for inter-engine synchroni
sation on gen8+")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190501114541.10077-9-chris@chris-wilson.co.uk
It used to be handy that we only had a couple of headers, but over time
i915_drv.h has become unwieldy. Extract declarations to a separate
header file corresponding to the implementation module, clarifying the
modularity of the driver.
Ensure the new header is self-contained, and do so with minimal further
includes, using forward declarations as needed. Include the new header
only where needed, and sort the modified include directives while at it
and as needed.
While at it, rename intel_i2c.c to intel_gmbus.c and the functions to
intel_gmbus_*.
No functional changes.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/5834b8fbbfd4ac2e3d0159e69c87f6926066f537.1556809195.git.jani.nikula@intel.com
It used to be handy that we only had a couple of headers, but over time
i915_drv.h has become unwieldy. Extract declarations to a separate
header file corresponding to the implementation module, clarifying the
modularity of the driver.
Ensure the new header is self-contained, and do so with minimal further
includes, using forward declarations as needed. Include the new header
only where needed, and sort the modified include directives while at it
and as needed.
No functional changes.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/2843b028d65e118dc40316aa84bf620a93f6c67b.1556809195.git.jani.nikula@intel.com
It used to be handy that we only had a couple of headers, but over time
i915_drv.h has become unwieldy. Extract declarations to a separate
header file corresponding to the implementation module, clarifying the
modularity of the driver.
Ensure the new header is self-contained, and do so with minimal further
includes, using forward declarations as needed. Include the new header
only where needed, and sort the modified include directives while at it
and as needed.
No functional changes.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/9bc1317a67df0b9d019eca5b36f474b76a1cad26.1556809195.git.jani.nikula@intel.com
It used to be handy that we only had a couple of headers, but over time
i915_drv.h has become unwieldy. Extract declarations to a separate
header file corresponding to the implementation module, clarifying the
modularity of the driver.
Ensure the new header is self-contained, and do so with minimal further
includes, using forward declarations as needed. Include the new header
only where needed, and sort the modified include directives while at it
and as needed.
No functional changes.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/9101a58b9f10bcf11332175e17b6e6e45f4ebd17.1556809195.git.jani.nikula@intel.com
It used to be handy that we only had a couple of headers, but over time
i915_drv.h has become unwieldy. Extract declarations to a separate
header file corresponding to the implementation module, clarifying the
modularity of the driver.
Ensure the new header is self-contained, and do so with minimal further
includes, using forward declarations as needed. Include the new header
only where needed, and sort the modified include directives while at it
and as needed.
No functional changes.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/876a1671a84c6839bcafdf276cf9c4e1da6c631c.1556809195.git.jani.nikula@intel.com
Factor out the combo PHY lane power configuration code to a separate
helper; it will be also needed by the next patch adding the same
configuration for DDI ports.
Add support for DDI ports and lane reversal as preparation for the next
patch.
The PWR_DOWN_LN_1 value is unspecified in the BSpec register description
so remove it.
v2:
- Fix up the wrong assumption that the encodings are the same for DDI
and DSI ports. (Jani)
v3:
- Use intel_ instead of icl_ prefix. (Jani)
- Add required headers to intel_combo_phy.h after the upstream header
refactoring.
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Madhav Chauhan <madhav.chauhan@intel.com>
Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com> (v2)
Link: https://patchwork.freedesktop.org/patch/msgid/20190425185253.3197-1-imre.deak@intel.com
Currently due to regression CI machine displays show corrupt picture.
Problem is when CDCLK is as low as 79200, picture gets unstable, while
DSI and DE pll values were confirmed to be correct. Limiting to 158400
as agreed with Ville.
We could not come up with any better solution yet, as PLL divider values
both for MIPI(DSI PLL) and CDCLK(DE PLL) are correct, however seems that
due to some boundary conditions, when clocking is too low we get wrong
timings for DSI display. Similar workaround exists for VLV though, so
just took similar condition into use. At least that way GLK platform
will start to be usable again, with current drm-tip.
v2: Fixed commit subject as suggested.
v3: Added generic bugs(crc failures, screen not init
for GLK DSI which might be affected).
v4: Added references tag for bugs affected.
Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Acked-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
References: https://bugs.freedesktop.org/show_bug.cgi?id=109267
References: https://bugs.freedesktop.org/show_bug.cgi?id=103184
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190430125119.7478-1-stanislav.lisovskiy@intel.com
The workqueue code complains viciously if we try to queue more work onto
the queue while attampting to drain it. As we asynchronously free
objects and defer their enqueuing with RCU, it is quite tricky to
quiesce the system before attempting to drain the workqueue. Yet drain
we must to ensure that the worker is idle before unloading the module.
Give the freed object drain 3 whole passes with multiple rcu_barrier()
to give the defer freeing of several levels each protected by RCU and
needing a grace period before its parent can be freed, ultimately
resulting in a GEM object being freed after another RCU period.
A consequence is that it will make module unload even slower.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110550
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190501135753.8711-1-chris@chris-wilson.co.uk
The pipe has a special HDR mode with higher precision when only
HDR planes are active. Let's use it.
Curiously this fixes the kms_color gamma/degamma tests when
using a HDR plane, which is always the case unless one hacks
the test to use an SDR plane. If one does hack the test to use
an SDR plane it does pass already.
I have no actual explanation how the output after the gamma
LUT can be different between the two modes. The way the tests
are written should mean that the output should be identical
between the solid color vs. the gradient. But clearly that
somehow doesn't hold true for the HDR planes in non-HDR pipe
mode. Anyways, as long as we stick to one type of plane the
test should produce sensible results now.
v2: s/HDR_MODE/HDR_MODE_PRECISION/ (Shashank)
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190412183009.8237-2-ville.syrjala@linux.intel.com
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Tested-by: Uma Shankar <uma.shankar@intel.com>
Reviewed-by: Shashank Sharma <shashank.sharma@intel.com>
It used to be handy that we only had a couple of headers, but over time
i915_drv.h has become unwieldy. Extract declarations to a separate
header file corresponding to the implementation module, clarifying the
modularity of the driver.
Ensure the new header is self-contained, and do so with minimal further
includes, using forward declarations as needed. Include the new header
only where needed, and sort the modified include directives while at it
and as needed.
No functional changes.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/6aea17072684dec0b04b6831c0c0e5a134edf87e.1556540890.git.jani.nikula@intel.com
It used to be handy that we only had a couple of headers, but over time
intel_drv.h has become unwieldy. Extract declarations to a separate
header file corresponding to the implementation module, clarifying the
modularity of the driver.
Ensure the new header is self-contained, and do so with minimal further
includes, using forward declarations as needed. Include the new header
only where needed, and sort the modified include directives while at it
and as needed.
No functional changes.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87904259868782c1ad664d852b27a50c1597cfaa.1556540890.git.jani.nikula@intel.com