The recent commit [0bdf5a0564: drm/i915: Add reverse mapping between
port and intel_encoder] introduced a reverse mapping to retrieve
intel_dig_port object from the port number. The code assumed that the
port vs intel_dig_port are 1:1 mapping. But in reality, this was a
too naive assumption.
As Martin reported about the missing HDMI audio on his SNB machine,
pre-HSW chips may have multiple intel_dig_port objects corresponding
to the same port. Since we assign the mapping statically at the init
time and the multiple objects override the map, it may not match with
the actually enabled output.
This patch tries to address the regression above. The reverse mapping
is provided basically only for the audio callbacks, so now we set /
clear the mapping dynamically at enabling and disabling HDMI/DP audio,
so that we can always track the latest and correct object
corresponding to the given port.
Fixes: 0bdf5a0564 ('drm/i915: Add reverse mapping between port and intel_encoder')
Reported-and-tested-by: Martin Kepplinger <martink@posteo.de>
Cc: drm-intel-fixes@lists.freedesktop.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1456324522-21591-1-git-send-email-tiwai@suse.de
In commit 1e657ad7 we moved the last step of firmware initialization to
skl_display_core_init(), where it will be run only during system resume,
but not during driver loading. Since this init step needs to be done
whenever we program the firmware fix this by moving the initialization
to the end of intel_csr_load_program().
While at it simplify a bit csr_load_work_fn().
This issue prevented DC5/6 transitions, this change will re-enable those.
v2:
- remove debugging left-over and redundant comment in csr_load_work_fn()
Fixes: 1e657ad7a4 ("drm/i915/gen9: Write dc state debugmask bits only once")
CC: Mika Kuoppala <mika.kuoppala@intel.com>
CC: Patrik Jakobsson <patrik.jakobsson@linux.intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1457121461-16729-1-git-send-email-imre.deak@intel.com
With full-ppgtt, it takes the GPU an eon to traverse the entire 256PiB
address space, causing a loop to be detected. Under the current scheme,
if ACTHD walks off the end of a batch buffer and into an empty
address space, we "never" detect the hang. If we always increment the
score as the ACTHD is progressing then we will eventually timeout (after
~46.5s (31 * 1.5s) without advancing onto a new batch). To counter act
this, increase the amount we reduce the score for good batches, so that
only a series of almost-bad batches trigger a full reset. DoS detection
suffers slightly but series of long running shader tests will benefit.
Based on a patch from Chris Wilson.
Testcase: igt/drv_hangman/hangcheck-unterminated
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1456930109-21532-1-git-send-email-mika.kuoppala@intel.com
Now that the mess with AUX clock divder rounding is sorted out and
we have both cdclk and rawclk cached in dev_priv, we can clean up
the .get_aux_clock_divider() functions a bit.
The main thing here is just calling ilk_get_aux_clock_divider()
from hsw_get_aux_clock_divider() except for the LPT:H special
case.
We could go further and call g4x_get_aux_clock_divider() from
ilk_get_aux_clock_divider() for the PCH ports, but I'm sure Jani
would object, so leave that be.
While at it repeat the comment where the AUX clock comes from
in ilk_get_aux_clock_divider().
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1456932138-14004-6-git-send-email-ville.syrjala@linux.intel.com
Generalize rawclk handling by storing it in dev_priv.
Presumably our hrawclk readout works at least for CTG and ELK
since we've been using it for DP AUX on those platforms. There
are no real docs anymore after configdb vanished, so the only
reference is the public CTG GMCH spec. What bits are listed in
that doc match our code. The ELK GMCH spec have no relevant
details unfortunately.
The PNV situation is less clear. Starting from
commit aa17cdb4f8 ("drm/i915: initialize backlight max from VBT")
we assume that the CTG/ELK hrawclk readout works for PNV as well.
At least the results *seem* reasonable for one PNV machine (Lenovo
Ideapad S10-3t). Sadly the PNV GMCH spec doesn't have the goods on
the relevant register either.
So let's keep assuming it works for PNV,ELK,CTG and read it out on
those platforms. G33 also has hrawclk according to some notes
in BSpec, but we don't actually need it for anything, so let's not
even try to read it out there.
v2: Rebase due to IS_VALLYVIEW vs. IS_CHERRYVIEW split
Use KHz() all over, and kill off a few useless temp variables
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1456932138-14004-2-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Currently the wait_for_atomic_us only allows for a jiffie
timeout granularity which is not nice towards callers
requesting small micro-second timeouts.
Re-implement it so micro-second timeout granularity is really
supported and not just in the name of the macro.
This has another beneficial side effect that it improves
"gem_latency -n 100" results by approximately 2.5% (throughput
and latencies) and 3% (CPU usage). (Note this improvement is
relative to not yet merged execlist lock uncontention patch
which moves the CSB MMIO outside this lock.)
It also shrinks some hot functions like fw_domains_get by a
tiny 3%.
v2:
* Warn when used from non-atomic context (if possible).
* Warn on too long atomic waits.
v3:
* Added comment explaining CONFIG_PREEMPT_COUNT.
* Fixed pre-processor indentation.
(Chris Wilson)
v4:
* Commit msg update (gem_latency) and rebase.
v5:
* Commit message re-wording.
* Added comment about no need for double cond check. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
I do not see that this needs to be done atomically and up to
one second is quite a long time to busy loop.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
This is for callers who want micro-second precision but are not
waiting from the atomic context.
v2:
* Fix atomic waits. (Dave Gordon)
* Use USEC_PER_SEC and USEC_PER_MSEC. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
As Paulo has noted we can help bisectability by separating computing
watermarks on a noop in 2 separate commits.
This patch no longer clears the crtc watermark state, but recalculates
it completely. Regardless whether a level is used the full values for
each level are calculated. If a level is invalid wm[level].enable is
unset.
Changes since v1:
- Only call ilk_validate_wm_level when level <= usable_level. (Ville)
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/56D6D09E.5040007@linux.intel.com
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
The reason for spcial casing 20MHz in the iclkip calculations is that
it would overflow the 7 bit divisor value. Let's rewrite the special
case to check for just that, and bump up auxdiv when needed. This makes
the code work for freqeuencies close to but not exactly 20MHz. The real
lower limit for auxdiv=0 is actually:
172800000/(0x7f+2)*64)=~20930 kHz, and below that we must resort to
auxdiv=1.
Actually this is all very theoretical since we limit the dotclock to
min 25MHz with CRT on all platforms. 25Mhz is actually the documented
limit in Bspec, so it seems we ought to never need to worry about the
auxdiv=1 case. But no harm in having it.
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1455738073-14502-5-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Currently we check if the encoder's idea of dotclock agrees with what
we calculated based on the FDI parameters. We do this in the encoder
.get_config() hooks, which isn't so nice in case the BIOS (or some other
outside party) made a mess of the state and we're just trying to take
over.
So as a prep step to being able sanitize such a bogus state, move the
the sanity check to just after we've read out the entire state. If
we then need to sanitize a bad state, it should be easier to move the
sanity check to occur after sanitation instead of before it.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1455738073-14502-3-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Imre Deak <imre.deak@intel.com>
SKL+ needs >4K alignment for tiled surfaces, so make
intel_compute_page_offset() handle it.
The way we do it is first we compute the closest tile boundary
as before, and then figure out how many tiles we need to go
to reach the desired alignment. The difference in the offset
is then added into the x/y offsets.
v2: Be less confusing wrt. units (pixels vs. bytes) (Daniel)
v3: Use u32 for offsets
Have intel_adjust_tile_offset() return the new offset (will be
useful later)
Add an offset_aligned variable (Daniel)
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1455569699-27905-5-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The page aligned surface address calculation needs to know which way
things are rotated. The contract now says that the caller must pass the
rotate x/y coordinates, as well as the tile_height aligned stride in
the tile_height direction. This will make it fairly simple to deal with
90/270 degree rotation on SKL+ where we have to deal with the rotated
view into the GTT.
v2: Pass rotation instead of bool even thoughwe only care about 0/180 vs. 90/270
v3: Introduce intel_tile_dims(), and don't mix up different units so much
v4: Unconfuse bytes vs. pixels even more
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1455569699-27905-4-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Assorted changes in the areas of code cleanup, reduction of
invariant conditional in the interrupt handler and lock
contention and MMIO access optimisation.
* Remove needless initialization.
* Improve cache locality by reorganizing code and/or using
branch hints to keep unexpected or error conditions out
of line.
* Favor busy submit path vs. empty queue.
* Less branching in hot-paths.
v2:
* Avoid mmio reads when possible. (Chris Wilson)
* Use natural integer size for csb indices.
* Remove useless return value from execlists_update_context.
* Extract 32-bit ppgtt PDPs update so it is out of line and
shared with two callers.
* Grab forcewake across all mmio operations to ease the
load on uncore lock and use chepear mmio ops.
v3:
* Removed some more pointless u8 data types.
* Removed unused return from execlists_context_queue.
* Commit message updates.
v4:
* Unclumsify the unqueue if statement. (Chris Wilson)
* Hide forcewake from the queuing function. (Chris Wilson)
Version 3 now makes the irq handling code path ~20% smaller on
48-bit PPGTT hardware, and a little bit less elsewhere. Hot
paths are mostly in-line now and hammering on the uncore
spinlock is greatly reduced together with mmio traffic to an
extent.
Benchmarking with "gem_latency -n 100" (keep submitting
batches with 100 nop instruction) shows approximately 4% higher
throughput, 2% less CPU time and 22% smaller latencies. This was
on a big-core while small-cores could benefit even more.
Most likely reason for the improvements are the MMIO
optimization and uncore lock traffic reduction.
One odd result is with "gem_latency -n 0" (dispatching empty
batches) which shows 5% more throughput, 8% less CPU time,
25% better producer and consumer latencies, but 15% higher
dispatch latency which is yet unexplained.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1456505912-22286-1-git-send-email-tvrtko.ursulin@linux.intel.com