During transcoder enabling we'll configure the transcoder in MST mode
and enable the VC payload allocation, which will start the ACT sequence.
Before waiting for the ACT sequence completion, we need to clear the ACT
sent flag, but based on the above we can do this right before enabling
the transcoder.
For clarity, move the flag clearing closer to where we wait for it.
While at it also factor out some common code.
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200616141855.746-3-imre.deak@intel.com
During the initial probing of an MST sink, MST core will determine the
sink's link bandwidth based on its own version of the sink link
rate/lane count caps it reads from the DPCD. At a later point (after
probing and 1 or more modesets) i915 may limit the link parameters wrt.
the original source/sink common caps above due to link training failures
during a modeset and the resulting link training fallback logic.
Based on the above a modeset following another modeset with a link
training error will compute the i915 HW specific and DP protocol timing
parameters (data/link M/N and MST TU values) taking into account only
the unlimited source/sink common caps, but not taking into account the
fallback limits. This will also let DRM core oversubscribe the actual
link bandwidth during the MST payload allocation.
Prevent the above problem by disabling the link training fallback on MST
links for now, until the MST probe time initialization and the MST
compute config logic can deal with changing link parameters.
The misconfigured timings lead at least to a
'Timed out waiting for DP idle patterns'
error.
v2: (Ville)
- Print link training error message on the MST path too.
- Clarify the problem in the commit log.
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Manasi Navare <manasi.d.navare@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200616211146.23027-2-imre.deak@intel.com
MST encoders must use the master MST transcoder's DP_TP_STATUS and
DP_TP_CONTROL registers. Atm, during the HW readout of an MST encoder
connected to a slave transcoder we reset these register addresses in
intel_dp::regs.dp_tp_* to the slave transcoder's DP_TP_* register
addresses incorrectly; fix this.
One example where the above overwite happens is the encoder HW state
validation after enabling multiple streams; see
intel_dp_mst_enc_get_config(). After that during disabling any stream
we'll get a
'Timed out waiting for ACT sent when disabling'
error, due to reading from the incorrect DP_TP_STATUS register.
This change replaces
https://patchwork.freedesktop.org/patch/369577/?series=78193&rev=1
which just papered over the problem.
v2:
- Correct the failure scenario in the commit log. (José)
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200616211146.23027-1-imre.deak@intel.com
Start using device specific parameters instead of module parameters for
most things. The module parameters become the immutable initial values
for i915 parameters. The device specific parameters in i915->params
start life as a copy of i915_modparams. Any later changes are only
reflected in the debugfs.
The stragglers are:
* i915.force_probe and i915.modeset. Needed before dev_priv is
available. This is fine because the parameters are read-only and never
modified.
* i915.verbose_state_checks. Passing dev_priv to I915_STATE_WARN and
I915_STATE_WARN_ON would result in massive and ugly churn. This is
handled by not exposing the parameter via debugfs, and leaving the
parameter writable in sysfs. This may be fixed up in follow-up work.
* i915.inject_probe_failure. Only makes sense in terms of the module,
not the device. This is handled by not exposing the parameter via
debugfs.
v2: Fix uc i915 lookup code (Michał Winiarski)
Cc: Juha-Pekka Heikkilä <juha-pekka.heikkila@intel.com>
Cc: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20200618150402.14022-1-jani.nikula@intel.com
We only emit the renderstate once now during module load, it is no
longer a concern that we are delaying context creation and so do not
need to so eagerly optimise. Since the last time we have looked at the
renderstate, we have a pin_map / flush_map facility that supports simple
single mappings, replacing the open-coded kmap_atomic() and
prepare_write. As it should be a single page, of which we only write a
small portion, we stick to a simple WB [kmap] and use clflush on !llc
platforms, rather than creating a temporary WC vmapping for the single
page.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200619234543.17499-2-chris@chris-wilson.co.uk
Not too long ago, we realised we had issues with a rolling back a
context so far for a preemption request we considered the resubmit not
to be a rollback but a forward roll. This means we would issue a lite
restore instead of forcing a full restore, continuing execution of the
old requests rather than causing a preemption. Add a selftest to
exercise such a far rollback, such that if we were to skip the full
restore, we would execute invalid instructions in the ring and hang.
Note that while I was able to confirm that this causes us to do a
lite-restore preemption rollback (with commit e36ba817fa ("drm/i915/gt:
Incrementally check for rewinding") disabled), it did not trick the HW
into rolling past the old RING_TAIL. Myybe on other HW.
References: e36ba817fa ("drm/i915/gt: Incrementally check for rewinding")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200616185518.11948-1-chris@chris-wilson.co.uk
Alexandre Oliva has recently removed these files from Linux Libre
with concerns that the sources weren't available.
The sources are available on IGT repository, and only open source
tools are used to generate the {ivb,hsw}_clear_kernel.c files.
However, the remaining concern from Alexandre Oliva was around
GPL license and the source not been present when distributing
the code.
So, it looks like 2 alternatives are possible, the use of
linux-firmware.git repository to store the blob or making sure
that the source is also present in our tree. Since the goal
is to limit the i915 firmware to only the micro-controller blobs
let's make sure that we do include the asm sources here in our tree.
Btw, I tried to have some diligence here and make sure that the
asms that these commits are adding are truly the source for
the mentioned files:
igt$ ./scripts/generate_clear_kernel.sh -g ivb \
-m ~/mesa/build/src/intel/tools/i965_asm
Output file not specified - using default file "ivb-cb_assembled"
Generating gen7 CB Kernel assembled file "ivb_clear_kernel.c"
for i915 driver...
igt$ diff ~/i915/drm-tip/drivers/gpu/drm/i915/gt/ivb_clear_kernel.c \
ivb_clear_kernel.c
< * Generated by: IGT Gpu Tools on Fri 21 Feb 2020 05:29:32 AM UTC
> * Generated by: IGT Gpu Tools on Mon 08 Jun 2020 10:00:54 AM PDT
61c61
< };
> };
\ No newline at end of file
igt$ ./scripts/generate_clear_kernel.sh -g hsw \
-m ~/mesa/build/src/intel/tools/i965_asm
Output file not specified - using default file "hsw-cb_assembled"
Generating gen7.5 CB Kernel assembled file "hsw_clear_kernel.c"
for i915 driver...
igt$ diff ~/i915/drm-tip/drivers/gpu/drm/i915/gt/hsw_clear_kernel.c \
hsw_clear_kernel.c
5c5
< * Generated by: IGT Gpu Tools on Fri 21 Feb 2020 05:30:13 AM UTC
> * Generated by: IGT Gpu Tools on Mon 08 Jun 2020 10:01:42 AM PDT
61c61
< };
> };
\ No newline at end of file
Used IGT and Mesa master repositories from Fri Jun 5 2020)
IGT: 53e8c878a6fb ("tests/kms_chamelium: Force reprobe after replugging
the connector")
Mesa: 5d13c7477eb1 ("radv: set keep_statistic_info with
RADV_DEBUG=shaderstats")
Mesa built with: meson build -D platforms=drm,x11 -D dri-drivers=i965 \
-D gallium-drivers=iris -D prefix=/usr \
-D libdir=/usr/lib64/ -Dtools=intel \
-Dkulkan-drivers=intel && ninja -C build
v2: Header clean-up and include build instructions in a readme (Chris)
Modified commit message to respect check-patch
Reference: http://www.fsfla.org/pipermail/linux-libre/2020-June/003374.html
Reference: http://www.fsfla.org/pipermail/linux-libre/2020-June/003375.html
Fixes: 47f8253d2b ("drm/i915/gen7: Clear all EU/L3 residual contexts")
Cc: <stable@vger.kernel.org> # v5.7+
Cc: Alexandre Oliva <lxoliva@fsfla.org>
Cc: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Jon Bloomfield <jon.bloomfield@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200610201807.191440-1-rodrigo.vivi@intel.com
Just in case everything fails (like for example "missed interrupt
syndrome" on Sandybridge), always flush the submission tasklet from the
heartbeat. This papers over such issues, but will still appear as a
second long glitch, and prevents us from detecting it unless we happen
to be performing a timed test.
v2: We rely on flush_submission() synchronizing with the tasklet on
another CPU.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200615165013.22973-3-chris@chris-wilson.co.uk
gen3 does not fully flush MI stores to memory on MI_FLUSH, such that a
subsequent read from e.g. the sampler can bypass the store and read the
stale value from memory. This is a serious issue when we are using MI
stores to rewrite the batches for relocation, as it means that the batch
is reading from random user/kernel memory. While it is particularly
sensitive [and detectable] for relocations, reading stale data at any
time is a worry.
Having started with a small number of delaying stores and doubling until
no more incoherency was seen over a few hours (with and without
background memory pressure), 32 was the magic number.
Note that it definitely doesn't fix the issue, merely adds a long delay
between requests, sufficient to mostly hide the problem, enough to raise
the mtbf to several hours. This is merely a stop gap.
v2: Follow more closer with the gen5 w/a and include some
post-invalidate flushes as well.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2018
References: a889580c08 ("drm/i915: Flush GPU relocs harder for gen3")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200612123949.7093-1-chris@chris-wilson.co.uk
Reduce the smoke depth by trimming the number of contexts, repetitions
and wait times. This is in preparation for a less greedy scheduler that
tries to be fair across contexts, resulting in a great many more context
switches. A thousand context switches may be 50-100ms, causing us to
timeout as the HW is not fast enough to complete the deep smoketests.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200607222108.14401-5-chris@chris-wilson.co.uk
If we find ourselves trying to reuse a misplaced but active vma, we
currently try to discard it to avoid having to wait to unbind it
(upsetting the current user fo the vma). An alternative to marking it as
a dicarded vma and keeping it in both the obj->vma.list and
obj->vma.tree, is to simply remove it from the lookup rbtree.
While it remains in the list of vma, it will be unbound under eviction
pressure and freed along with the object. We will never reuse it again
for new instances. As before, with no pruning, the list may continually
grow, but eventually we will have the most constrained version of the
ggtt view that meets all requirements -- so the list of vma should not
grow without bound.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2012
Fixes: 9bdcaa5e3a ("drm/i915: Discard a misplaced GGTT vma")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200611180421.23262-1-chris@chris-wilson.co.uk
Currently MST on a port can get enabled/disabled from the hotplug work
and get disabled from the short pulse work in a racy way. Fix this by
relying on the MST state checking in the hotplug work and just schedule
a hotplug work from the short pulse handler if some problem happened
during the MST interrupt handling.
This removes the explicit MST disabling in case of an AUX failure, but
if AUX fails, then probably the detection will also fail during the
scheduled hotplug work and it's not guaranteed that we'll see
intermittent errors anyway.
While at it also simplify the error checking of the MST interrupt
handler.
v2:
- Convert intel_dp_check_mst_status() to return bool. (Ville)
- Change the intel_dp->is_mst check to an assert, since after this patch
the condition can't change after we checked it previously.
- Document the return value from intel_dp_check_mst_status().
v3:
- Remove the intel_dp->is_mst check from intel_dp_check_mst_status().
There is no point in checking the same condition twice, even though
there is a chance that the hotplug work running concurrently changes
it.
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200605094801.17709-1-imre.deak@intel.com
According to BSpec the Data Island Packet should be disabled after
disabling the transcoder, but before the transcoder clock select is set
to none. On an ICL RVP, daisy-chained MST config not following this
leads to a hang with the following MCE when disabling the output:
[ 870.948739] mce: [Hardware Error]: CPU 0: Machine Check Exception: 5 Bank 6: ba00000011000402
[ 871.019212] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff81aca652> {poll_idle+0x92/0xb0}
[ 871.019212] mce: [Hardware Error]: TSC 135a261fe61
[ 871.019212] mce: [Hardware Error]: PROCESSOR 0:706e5 TIME 1591739604 SOCKET 0 APIC 0 microcode 20
[ 871.019212] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
[ 871.019212] mce: [Hardware Error]: Machine check: Processor context corrupt
[ 871.019212] Kernel panic - not syncing: Fatal machine check
[ 871.019212] Kernel Offset: disabled
Bspec: 4287
Fixes: fa37a21327 ("drm/i915: Stop sending DP SDPs on ddi disable")
Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Cc: Uma Shankar <uma.shankar@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200609220616.6015-1-imre.deak@intel.com
In commit 5ba32c7be8 ("drm/i915/execlists: Always force a context
reload when rewinding RING_TAIL"), we placed the check for rewinding a
context on actually submitting the next request in that context. This
was so that we only had to check once, and could do so with precision
avoiding as many forced restores as possible. For example, to ensure
that we can resubmit the same request a couple of times, we include a
small wa_tail such that on the next submission, the ring->tail will
appear to move forwards when resubmitting the same request. This is very
common as it will happen for every lite-restore to fill the second port
after a context switch.
However, intel_ring_direction() is limited in precision to movements of
upto half the ring size. The consequence being that if we tried to
unwind many requests, we could exceed half the ring and flip the sense
of the direction, so missing a force restore. As no request can be
greater than half the ring (i.e. 2048 bytes in the smallest case), we
can check for rollback incrementally. As we check against the tail that
would be submitted, we do not lose any sensitivity and allow lite
restores for the simple case. We still need to double check upon
submitting the context, to allow for multiple preemptions and
resubmissions.
Fixes: 5ba32c7be8 ("drm/i915/execlists: Always force a context reload when rewinding RING_TAIL")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.4+
Reviewed-by: Bruce Chang <yu.bruce.chang@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200609151723.12971-1-chris@chris-wilson.co.uk
Rocket Lake uses the same 'abox0' mechanism to handle pixel data
transfers from memory that gen11 platforms used, rather than the
abox1/abox2 interfaces used by TGL/DG1. For the most part this is a
hardware implementation detail that's transparent to driver software,
but we do have to program a couple of tuning registers (MBUS_ABOX_CTL
and BW_BUDDY registers) according to which ABOX instances are used by a
platform. Let's track the platform's ABOX usage in the device info
structure and use that to determine which instances of these registers
to program.
As an exception to this rule is that even though TGL/DG1 use ABOX1+ABOX2
for data transfers, we're still directed to program the ABOX_CTL
register for ABOX0; so we'll handle that as a special case.
v2:
- Store the mask of platform-specific abox registers in the device
info structure.
- Add a TLB_REQ_TIMER() helper macro. (Aditya)
v3:
- Squash ABOX and BW_BUDDY patches together and use a single mask for
both of them, plus a special-case for programming the ABOX0 instance
on all gen12. (Ville)
Bspec: 50096
Bspec: 49218
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Aditya Swarup <aditya.swarup@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200606025740.3308880-2-matthew.d.roper@intel.com
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
This reverts commit 82ea174dc5.
Unfortunately according to our recent findings there is still some
unidentified factor, requiring CDCLK to be set higher - otherwise we
still get underruns on some multipipe configurations, despite CDCLK
being set according to BSpec formula. So getting again back into debug
mode to indentify the cause, meanwhile setting CDCLK=Pixel rate back in
order to remove regression in 10% of the cases due to FIFO underruns.
Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Fixes: cd19154608 ("drm/i915: Adjust CDCLK accordingly to our DBuf bw needs")
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200608065552.21728-1-stanislav.lisovskiy@intel.com