In order to simplify future patches, extract the
lazy_coherency optimisation our of the engine->get_seqno() vfunc into
its own callback.
v2: Rename the barrier to engine->irq_seqno_barrier to try and better
reflect that the barrier is only required after the user interrupt before
reading the seqno (to ensure that the seqno update lands in time as we
do not have strict seqno-irq ordering on all platforms).
Reviewed-by: Dave Gordon <david.s.gordon@intel.com> [#v2]
v3: Comments for hangcheck paranoia. Mika wanted to keep the extra
barrier inside the hangcheck, just in case. I can argue that it doesn't
provide a barrier against anything, but the side-effects of applying the
barrier may prevent a false declaration of a hung GPU.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460195877-20520-2-git-send-email-chris@chris-wilson.co.uk
Currently for the case where there is enough space at the end of Ring
buffer for accommodating only the base request, the wrapround is done
immediately and as a result the base request gets added at the start
of Ring buffer. But there may not be enough free space at the beginning
to accommodate the base request, as before the wraparound, the wait was
effectively done for the reserved_size free space from the start of
Ring buffer. In such a case there is a potential of Ring buffer overflow,
the instructions at the head of Ring (ACTHD) can get overwritten.
Since the base request can fit in the remaining space, there is no need
to wraparound immediately. The wraparound will anyway happen later when
the reserved part starts getting used.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1457688402-10411-1-git-send-email-akash.goel@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
At BXT DSI, PIPE registers are inactive. So we can't get the
PIPE's mode parameters from them. The possible option is
retriving them from the PORT registers.
The required changes are added for BXT in intel_dsi_get_config
(encoder->get_config).
v2: Addressed the Jani's comments
-removed the redundant call to encoder->get_config
-read bpp from port register
-removed retrival of src_size from encoder->get_config
v3: pipe_config->pipe_bpp is fixed
Jani's review comments addressed:
Few horizontal timing parameters dropped from the patch to make
progress, as there seems to be some disagreement on
best/feasible/possible options.
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Uma Shankar <uma.shankar@intel.com>
Previously Reviewed at: https://lists.freedesktop.org/archives/intel-gfx/2016-April/091737.html
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460019967-26501-2-git-send-email-ramalingam.c@intel.com
This patch sets the invert bit for hpd detection for each port
based on VBT configuration. Since each AOB can be designed to
depend on invert bit or not, it is expected if an AOB requires
invert bit, the user will set respective bit in VBT.
v2: Separated VBT parsing from the rest of the logic. (Jani)
v3: Moved setting invert bit logic to bxt_hpd_irq_setup()
and changed its logic to avoid looping twice. (Ville)
v4: Changed the logic to mask out the bits first and then
set them to remove need of temporary variable. (Ville)
v5: Moved defines to existing set of defines for the register
and added required breaks. (Ville)
Signed-off-by: Sivakumar Thulasimani <sivakumar.thulasimani@intel.com>
Signed-off-by: Durgadoss R <durgadoss.r@intel.com>
Signed-off-by: Shubhangi Shrivastava <shubhangi.shrivastava@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
[Jani: fixed some checkpatch noise, added kernel-doc.]
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1459420907-11383-2-git-send-email-shubhangi.shrivastava@intel.com
Extract the GPLL reference frequency from CCK and use it in the
GPU freq<->opcode conversions on VLV/CHV. This eliminates all the
assumptions we have about which divider is used for which czclk
frequency.
Note that unlike most clocks from CCK, the GPLL ref clock is a divided
down version of the CZ clock rather than the HPLL clock. CZ clock itself
is a divided down version of the HPLL clock though, so in effect it just
gets divided down twice.
While at it, throw in a few comments explaining the remaining constants
for anyone who later wants to compare this to the spreadsheets.
v2: Add slow/fast notes for CHV clocks (Imre)
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1457120584-26080-2-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Imre Deak <imre.deak@intel.com> (v1)
After a suspend-resume cycle, the resumed kernel has no idea what the
booted kernel may have done to the GuC before replacing itself with the
resumed image. In particular, it may have already loaded the GuC with
firmware, which will then cause this kernel's attempt to (re)load the
firmware to fail (GuC program memory is write-once!). The symptoms
(GuC firmware reload fails after hibernation) are further described
in the Bugzilla reference below.
So let's *always* reset the GuC just before (re)loading the firmware;
the hardware should then be in a well-known state, and we may even
avoid some of the issues arising from unpredictable timing.
Also added some more fields & values to the definition of the GUC_STATUS
register, which is the key diagnostic indicator if the GuC load fails.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Cc: Alex Dai <yu.dai@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94390
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Due to timing issues in the HW, some of the status bits required for GuC
authentication occasionally don't get set; when that happens, the GuC
cannot be initialized and we will be left with a wedged GPU. The W/A
suggested is to perform a soft reset of the GuC and attempt to reload
the F/W again for few times before giving up.
As the failure is dependent on timing, tests performed by triggering
manual full gpu reset (i915_wedged) showed that we could sometimes hit
this after several thousand iterations, but sometimes tests ran even
longer without any issues. Reset and reload mechanism proved helpful
when we indeed hit f/w load failure, so it is better to include this
to improve driver stability.
This change implements the following WAs,
WaEnableuKernelHeaderValidFix:skl,bxt
WaEnableGuCBootHashCheckNotSet:skl,bxt
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Alex Dai <yu.dai@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Doing a lot of work in the interrupt handler introduces huge
latencies to the system as a whole.
Most dramatic effect can be seen by running an all engine
stress test like igt/gem_exec_nop/all where, when the kernel
config is lean enough, the whole system can be brought into
multi-second periods of complete non-interactivty. That can
look for example like this:
NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/u8:3:143]
Modules linked in: [redacted for brevity]
CPU: 0 PID: 143 Comm: kworker/u8:3 Tainted: G U L 4.5.0-160321+ #183
Hardware name: Intel Corporation Broadwell Client platform/WhiteTip Mountain 1
Workqueue: i915 gen6_pm_rps_work [i915]
task: ffff8800aae88000 ti: ffff8800aae90000 task.ti: ffff8800aae90000
RIP: 0010:[<ffffffff8104a3c2>] [<ffffffff8104a3c2>] __do_softirq+0x72/0x1d0
RSP: 0000:ffff88014f403f38 EFLAGS: 00000206
RAX: ffff8800aae94000 RBX: 0000000000000000 RCX: 00000000000006e0
RDX: 0000000000000020 RSI: 0000000004208060 RDI: 0000000000215d80
RBP: ffff88014f403f80 R08: 0000000b1b42c180 R09: 0000000000000022
R10: 0000000000000004 R11: 00000000ffffffff R12: 000000000000a030
R13: 0000000000000082 R14: ffff8800aa4d0080 R15: 0000000000000082
FS: 0000000000000000(0000) GS:ffff88014f400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa53b90c000 CR3: 0000000001a0a000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
042080601b33869f ffff8800aae94000 00000000fffc2678 ffff88010000000a
0000000000000000 000000000000a030 0000000000005302 ffff8800aa4d0080
0000000000000206 ffff88014f403f90 ffffffff8104a716 ffff88014f403fa8
Call Trace:
<IRQ>
[<ffffffff8104a716>] irq_exit+0x86/0x90
[<ffffffff81031e7d>] smp_apic_timer_interrupt+0x3d/0x50
[<ffffffff814f3eac>] apic_timer_interrupt+0x7c/0x90
<EOI>
[<ffffffffa01c5b40>] ? gen8_write64+0x1a0/0x1a0 [i915]
[<ffffffff814f2b39>] ? _raw_spin_unlock_irqrestore+0x9/0x20
[<ffffffffa01c5c44>] gen8_write32+0x104/0x1a0 [i915]
[<ffffffff8132c6a2>] ? n_tty_receive_buf_common+0x372/0xae0
[<ffffffffa017cc9e>] gen6_set_rps_thresholds+0x1be/0x330 [i915]
[<ffffffffa017eaf0>] gen6_set_rps+0x70/0x200 [i915]
[<ffffffffa0185375>] intel_set_rps+0x25/0x30 [i915]
[<ffffffffa01768fd>] gen6_pm_rps_work+0x10d/0x2e0 [i915]
[<ffffffff81063852>] ? finish_task_switch+0x72/0x1c0
[<ffffffff8105ab29>] process_one_work+0x139/0x350
[<ffffffff8105b186>] worker_thread+0x126/0x490
[<ffffffff8105b060>] ? rescuer_thread+0x320/0x320
[<ffffffff8105fa64>] kthread+0xc4/0xe0
[<ffffffff8105f9a0>] ? kthread_create_on_node+0x170/0x170
[<ffffffff814f351f>] ret_from_fork+0x3f/0x70
[<ffffffff8105f9a0>] ? kthread_create_on_node+0x170/0x170
I could not explain, or find a code path, which would explain
a +20 second lockup, but from some instrumentation it was
apparent the interrupts off proportion of time was between
10-25% under heavy load which is quite bad.
When a interrupt "cliff" is reached, which was >~320k irq/s on
my machine, the whole system goes into a terrible state of the
above described multi-second lockups.
By moving the GT interrupt handling to a tasklet in a most
simple way, the problem above disappears completely.
Testing the effect on sytem-wide latencies using
igt/gem_syslatency shows the following before this patch:
gem_syslatency: cycles=1532739, latency mean=416531.829us max=2499237us
gem_syslatency: cycles=1839434, latency mean=1458099.157us max=4998944us
gem_syslatency: cycles=1432570, latency mean=2688.451us max=1201185us
gem_syslatency: cycles=1533543, latency mean=416520.499us max=2498886us
This shows that the unrelated process is experiencing huge
delays in its wake-up latency. After the patch the results
look like this:
gem_syslatency: cycles=808907, latency mean=53.133us max=1640us
gem_syslatency: cycles=862154, latency mean=62.778us max=2117us
gem_syslatency: cycles=856039, latency mean=58.079us max=2123us
gem_syslatency: cycles=841683, latency mean=56.914us max=1667us
Showing a huge improvement in the unrelated process wake-up
latency. It also shows an approximate halving in the number
of total empty batches submitted during the test. This may
not be worrying since the test puts the driver under
a very unrealistic load with ncpu threads doing empty batch
submission to all GPU engines each.
Another benefit compared to the hard-irq handling is that now
work on all engines can be dispatched in parallel since we can
have up to number of CPUs active tasklets. (While previously
a single hard-irq would serially dispatch on one engine after
another.)
More interesting scenario with regards to throughput is
"gem_latency -n 100" which shows 25% better throughput and
CPU usage, and 14% better dispatch latencies.
I did not find any gains or regressions with Synmark2 or
GLbench under light testing. More benchmarking is certainly
required.
v2:
* execlists_lock should be taken as spin_lock_bh when
queuing work from userspace now. (Chris Wilson)
* uncore.lock must be taken with spin_lock_irq when
submitting requests since that now runs from either
softirq or process context.
v3:
* Expanded commit message with more testing data;
* converted missed locking sites to _bh;
* added execlist_lock comment. (Chris Wilson)
v4:
* Mention dispatch parallelism in commit. (Chris Wilson)
* Do not hold uncore.lock over MMIO reads since the block
is already serialised per-engine via the tasklet itself.
(Chris Wilson)
* intel_lrc_irq_handler should be static. (Chris Wilson)
* Cancel/sync the tasklet on GPU reset. (Chris Wilson)
* Document and WARN that tasklet cannot be active/pending
on engine cleanup. (Chris Wilson/Imre Deak)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Testcase: igt/gem_exec_nop/all
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94350
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1459768316-6670-1-git-send-email-tvrtko.ursulin@linux.intel.com
DPLL_MD(PIPE_C) is AWOL on CHV. Instead of fixing it someone added
chicken bits to propagate the pixel multiplier from DPLL_MD(PIPE_B)
to either pipe B or C. So do that to make pixel repeat work on pipes
B and C. Pipe A is fine without any tricks.
Fortunately the pixel repeat propagation appears to be a oneshot
operation, so once the value has been written we can clear the
chicken bits. So it is still possible to drive pipe B and C with
different pixel multipliers simultaneosly.
Looks like DPLL_VGA_MODE_DIS must also be set in DPLL(PIPE_B)
for this to work. But since we keep that bit always set in all
DPLLs there's no problem.
This of course means we can't reliably read out the pixel multiplier
for pipes B and C. That would make the state checker unhappy, so I
added shadow copies of those registers in to dev_priv. The other
option would have been to skip pixel multiplier, dpll_md an dotclock
checks entirely on CHV, but that feels like a serious loss of cross
checking, so just pretending that we have working DPLL MD registers
seemed better. Obviously with the shadow copies we can't detect if
the pixel multiplier was properly configured, nor can we take over
its state from the BIOS, but hopefully people won't have displays
that would be limitd to such crappy modes.
There is one strange flicker still remaining. It's visible on
pipe C/HDMID when HDMIB is enabled while driven by pipe B.
It doesn't occur if pipe A drives HDMIB, nor is there any glitch
on pipe B/HDMIB when port C/HDMID starts up. I don't have a board
with HDMIC so not sure if it happens there too. So I'm not sure
if it's somehow tied in with this strange linkage between pipe B
and C. Sadly I was unable to find an enable sequence that would
avoid the glitch, but at least it's not fatal ie. the output
recovers afterwards.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1458052809-23426-4-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
The VLV and CHV DPLL disable and update are almost identical in
how the DPLL/DPLL_MD registers need to be set up. But the code
looks more different than it really is. Try to bring them into
line.
Note that we now leave the refclock always enabled for both
DPLLs in the dual channel PHY. But that's perfectly fine since
it's the same clock, and we anyway already do that when turning
the disp2d power well on.
v2: s/chv_update_pll/chv_compute_dpll/
v3: Add a note that we leave refclocks enabled for both DPLLs (Jani)
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1458052809-23426-3-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Bspec is confused w.r.t. the HSW/BDW FDI disable sequence. It lists
FDI RX disable both as step 13 and step 18 in the sequence. But I dug
up an old BUN mail from Art that moved the FDI RX disable to happen
before DDI_BUF_CTL disable. That BUN did not renumber the steps and just
added a note:
"Workaround: Disable PCH FDI Receiver before disabling DDI_BUF_CTL."
The BUN described the symptoms of the fixed issue as:
"PCH display underflow and a black screen on the analog CRT port that
happened after a FDI re-train"
I suppose later someone tried to renumber the steps to match, but forgot
to remove the FDI RX disable from its old position in the sequence.
They also forgot to update the note describing what should be done in
case of an FDI training failure. Currently it says:
"To retry FDI training, follow the Disable Sequence steps to Disable FDI,
but skip the steps related to clocks and PLLs (16, 19, and 20), ..."
It should really say "17, 20, and 21" with the current sequence because
those are the steps that deal with PLLs and whatnot, after step 13 became
FDI RX disable. And had the step 18 FDI RX disable been removed, as I
suspect it should have, the note should actually say "17, 19, and 20".
So, let's move the FDI RX disable to happen before DDI_BUF_CTL disable,
as that would appear to be the correct order based on the BUN.
Note that Art has since unconfused the spec, and so this patch should
now match the steps listed in the spec.
v2: Add a note that the spec is now correct
Cc: Paulo Zanoni <przanoni@gmail.com>
Cc: Art Runyan <arthur.j.runyan@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1456841783-4779-1-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>