Since the worker may rearm, we currently are only guaranteed to flush
the work if we cancel the timer. If the work was running at the time we
try and cancel it, we will wait for it to complete, but it may leave
items in the pool and requeue the work. If we rearrange the immediate
discard of the pool then cancel the work, we know that the work cannot
rearm and so our flush will be final.
<0> [314.146044] i915_mod-1321 2.... 299799443us : intel_gt_fini_buffer_pool: intel_gt_fini_buffer_pool:227 GEM_BUG_ON(!list_empty(&pool->cache_list[n]))
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1920
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200525141957.3061-1-chris@chris-wilson.co.uk
Pre-allocate command buffer in atomic_commit using intel_dsb_prepare
function which also includes pinning and map in cpu domain.
No functional change is dsb write/commit functions.
Now dsb get/put function is removed and ref-count mechanism is
not needed. Below dsb api added to do respective job mentioned
below.
intel_dsb_prepare - Allocate, pin and map the buffer.
intel_dsb_cleanup - Unpin and release the gem object.
RFC: Initial patch for design review.
v2: included _init() part in _prepare(). [Daniel, Ville]
v3: dsb_cleanup called after cleanup_planes. [Daniel]
v4: dsb structure is moved to intel_crtc_state from intel_crtc. [Maarten]
v5: dsb get/put/ref-count mechanism removed. [Maarten]
v6: Based on review feedback following changes are added,
- replaced intel_dsb structure by pointer in intel_crtc_state. [Maarten]
- passing intel_crtc_state to dsp-api to simplify the code. [Maarten]
- few dsb functions prototype modified to simplify code.
v7: added few cosmetic changes suggested by Jani and null check for
crtc_state in dsb_cleanup removed as suggested by Maarten.
v8: changed the function parameter to intel_crtc_state* of
ivb_load_lut_ext_max() from intel_crtc. [Maarten]
v9: error handling improved in _write() and prepare(). [Maarten]
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@intel.com>
Signed-off-by: Animesh Manna <animesh.manna@intel.com>
Signed-off-by: Uma Shankar <uma.shankar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200520130737.11240-1-animesh.manna@intel.com
According to BSpec max BW per slice is calculated using formula
Max BW = CDCLK * 64. Currently when calculating min CDCLK we
account only per plane requirements, however in order to avoid
FIFO underruns we need to estimate accumulated BW consumed by
all planes(ddb entries basically) residing on that particular
DBuf slice. This will allow us to put CDCLK lower and save power
when we don't need that much bandwidth or gain additional
performance once plane consumption grows.
v2: - Fix long line warning
- Limited new DBuf bw checks to only gens >= 11
v3: - Lets track used Dbuf bw per slice and per crtc in bw state
(or may be in DBuf state in future), that way we don't need
to have all crtcs in state and those only if we detect if
are actually going to change cdclk, just same way as we
do with other stuff, i.e intel_atomic_serialize_global_state
and co. Just as per Ville's paradigm.
- Made dbuf bw calculation procedure look nicer by introducing
for_each_dbuf_slice_in_mask - we often will now need to iterate
slices using mask.
- According to experimental results CDCLK * 64 accounts for
overall bandwidth across all dbufs, not per dbuf.
v4: - Fixed missing const(Ville)
- Removed spurious whitespaces(Ville)
- Fixed local variable init(reduced scope where not needed)
- Added some comments about data rate for planar formats
- Changed struct intel_crtc_bw to intel_dbuf_bw
- Moved dbuf bw calculation to intel_compute_min_cdclk(Ville)
v5: - Removed unneeded macro
v6: - Prevent too frequent CDCLK switching back and forth:
Always switch to higher CDCLK when needed to prevent bandwidth
issues, however don't switch to lower CDCLK earlier than once
in 30 minutes in order to prevent constant modeset blinking.
We could of course not switch back at all, however this is
bad from power consumption point of view.
v7: - Fixed to track cdclk using bw_state, modeset will be now
triggered only when CDCLK change is really needed.
v8: - Lock global state if bw_state->min_cdclk is changed.
- Try getting bw_state only if there are crtcs in the commit
(need to have read-locked global state)
v9: - Do not do Dbuf bw check for gens < 9 - triggers WARN
as ddb_size is 0.
v10: - Lock global state for older gens as well.
v11: - Define new bw_calc_min_cdclk hook, instead of using
a condition(Manasi Navare)
v12: - Fixed rebase conflict
v13: - Added spaces after declarations to make checkpatch happy.
Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Reviewed-by: Manasi Navare <manasi.d.navare@intel.com>
Signed-off-by: Manasi Navare <manasi.d.navare@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200520150058.16123-1-stanislav.lisovskiy@intel.com
In Gen11+ whenever we might exceed DBuf bandwidth we might need to
recalculate CDCLK which DBuf bandwidth is scaled with.
Total Dbuf bw used might change based on particular plane needs.
Thus to calculate if cdclk needs to be changed it is not enough
anymore to check plane configuration and plane min cdclk, per DBuf
bw can be calculated only after wm/ddb calculation is done and
all required planes are added into the state. In order to keep
all min_cdclk related checks in one place let's extract it into
separate function, checking and modifying any_ms.
Reviewed-by: Manasi Navare <manasi.d.navare@intel.com>
Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Signed-off-by: Manasi Navare <manasi.d.navare@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200519131117.17190-3-stanislav.lisovskiy@intel.com
We need to calculate cdclk after watermarks/ddb has been calculated
as with recent hw CDCLK needs to be adjusted accordingly to DBuf
requirements, which is not possible with current code organization.
Setting CDCLK according to DBuf BW requirements and not just rejecting
if it doesn't satisfy BW requirements, will allow us to save power when
it is possible and gain additional bandwidth when it's needed - i.e
boosting both our power management and perfomance capabilities.
This patch is preparation for that, first we now extract modeset
calculation from modeset checks, in order to call it after wm/ddb
has been calculated.
v2: - Extract only intel_modeset_calc_cdclk from intel_modeset_checks
(Ville Syrjälä)
v3: - Clear plls after intel_modeset_calc_cdclk
v4: - Added r-b from previous revision to commit message
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Signed-off-by: Manasi Navare <manasi.d.navare@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200519131117.17190-2-stanislav.lisovskiy@intel.com
In order to be valid to dereference during the i915_fence_release, after
retiring the fence and releasing its refererences, we assume that
rq->engine can only be a real engine (that stay intact until the device
is shutdown after all fences have been flushed). However, due to a quirk
of preempt-to-busy, we may retire a request that still belongs to a
virtual engine and so eventually free it with rq->engine being invalid.
To avoid dereferencing that invalid engine, we look at the
execution_mask which if it indicates it may be executed on more than one
engine, we know it originated on a virtual engine and may still be on
one.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1906
Fixes: 43acd6516c ("drm/i915: Keep a per-engine request pool")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200521140617.30015-2-chris@chris-wilson.co.uk
Since the removal of the no-semaphore boosting, we rely on timeslicing to
reorder passed inter-dependency hogs across the engines. However, we
require preemption to support timeslicing into user payloads, and not all
machine support preemption so we do not universally enable timeslicing,
even when it would correctly preempt our own inter-engine semaphores.
Since timeslicing and semaphore priority deboosting is now disabled on
Broadwell/Braswell, we have to follow suite and not use semaphores.
Testcase: igt/gem_exec_schedule/semaphore-codependency # bdw/bsw
Fixes: 18e4af04d2 ("drm/i915: Drop no-semaphore boosting")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200521140617.30015-1-chris@chris-wilson.co.uk
If we decide to timeslice out the current virtual request, we will
unsubmit it while it is still busy (ve->context.inflight == sibling[0]).
If the virtual tasklet and then the other sibling tasklets run before we
completely schedule out the active virtual request for the preemption,
those other tasklets will see that the virtul request is still inflight
on sibling[0] and leave it be. Therefore when we finally schedule-out
the virtual request and if we see that we have passed it back to the
virtual engine, reschedule the virtual tasklet so that it may be
resubmitted on any of the siblings.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200519132046.22443-2-chris@chris-wilson.co.uk
struct drm_device specific drm_WARN* macros include device information
in the backtrace, so we know what device the warnings originate from.
Prefer drm_WARN_ON over WARN_ON.
Conversion is done with below sementic patch:
@@
identifier func, T;
@@
func(...) {
...
struct drm_i915_private *T = ...;
<+...
-WARN_ON(
+drm_WARN_ON(&T->drm,
...)
...+>
}
@@
identifier func, T;
@@
func(struct intel_digital_port *T,...) {
+struct drm_i915_private *i915 = to_i915(T->base.base.dev);
<+...
-WARN_ON(
+drm_WARN_ON(&i915->drm,
...)
...+>
}
changes since v1:
- Add i915 local variable and use it in drm_WARN_ON (Jani)
Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200504181600.18503-5-pankaj.laxminarayan.bharadiya@intel.com
struct drm_device specific drm_WARN* macros include device information
in the backtrace, so we know what device the warnings originate from.
Prefer drm_WARN_ON over WARN_ON at places where struct i915_power_domains
struct is available.
Conversion is done with below sementic patch:
@@
identifier func, T;
@@
func(struct i915_power_domains *T,...) {
+ struct drm_i915_private *i915 = container_of(T, struct drm_i915_private, power_domains);
<+...
-WARN_ON(
+drm_WARN_ON(&i915->drm,
...)
...+>
}
changes since v1:
- Fix commit subject (Jani)
Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200504181600.18503-2-pankaj.laxminarayan.bharadiya@intel.com
I've checked a bunch of gen3/4 machines and all seem to have
consistent FSB frequency information in the CLKCFG register.
So let's read out hrawclk on all gen3+ machines. Although
apart from g4x/pnv aux/pps dividers we only really need this
for for i965g/gm cs timestamp increment.
The CLKCFG memory clock values seem less consistent but we
don't care about those here.
For posterity here's a list of CLKCFG vs. FSB dumps from
a bunch of machines (only missing lpt for a full set):
machine CLKCFG FSB
alv1 0x00001411 533
alv2 0x00000420 400 (Chris)
gdg1 0x20000022 800
gdg2 0x20000022 800
cst 0x00010043 666
blb 0x00002034 1333
pnv1 0x00000423 666
pnv2 0x00000433 666
965gm 0x00004342 800
946gz 0x00000022 800
965g 0x00000422 800
g35 0x00000430 1066
0x00000434 1333
ctg1 0x00644056 1066
ctg2 0x00644066 1066
elk1 0x00012420 1066
0x00012424 1333
0x00012436 1600
0x00012422 800
elk2 0x00012040 1066
For the mobile parts the chipset docs generally have these
documented to some degree (alv being the exception).
The two settings w/o any evidence are 0x5=400MHz on desktop
and 0x7=1333MHz on mobile. Though the mobile 1333MHz case
probably doesn't even exist since ctg is only documented
to go up to 1066MHz.
v2: Fix 400mhz readout for Chris's alv/celeron machine
Do a clean mobile vs. dekstop split since that's really
what seems to be going on
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200514123838.3017-3-ville.syrjala@linux.intel.com
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>