We don't handle resetting the kernel context very well, or presumably any
context executing its breadcrumb commands in the ring as opposed to the
batchbuffer and flush. If we trigger a device reset twice in quick
succession while the kernel context is executing, we may end up skipping
the breadcrumb. This is really only a problem for the selftest as
normally there is a large interlude between resets (hangcheck), or we
focus on resetting just one engine and so avoid repeatedly resetting
innocents.
Something to try would be a preempt-to-idle to quiesce the engine before
reset, so that innocent contexts would be spared the reset.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
CC: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180330131801.18327-1-chris@chris-wilson.co.uk
If we skip the intel_prepare_reset(), we should also skip the
intel_display_reset(). If we we use a flag set by intel_prepare_reset()
then we do not have to second guess based on external user controlled
state whether or not the prepare was called before deciding to finish
it after the reset. igt/gem_eio is one such example that may tweak
i915.reset faster than the code is expecting, leading to
[ 190.233528] =====================================
[ 190.233534] WARNING: bad unlock balance detected!
[ 190.233540] 4.16.0-rc7-g335ef9849310-drmtip_10+ #1 Tainted: G U
[ 190.233547] -------------------------------------
[ 190.233553] gem_eio/1348 is trying to release lock (crtc_ww_class_acquire) at:
[ 190.233569] [<ffffffff895c7810>] drm_modeset_acquire_fini+0x0/0x60
[ 190.233575] but there are no more locks to release!
[ 190.233580]
other info that might help us debug this:
[ 190.233588] 3 locks held by gem_eio/1348:
[ 190.233592] #0: (&f->f_pos_lock){+.+.}, at: [<00000000ab90c784>] __fdget_pos+0x3a/0x50
[ 190.233607] #1: (sb_writers#11){.+.+}, at: [<00000000e1529265>] vfs_write+0x188/0x1a0
[ 190.233622] #2: (&attr->mutex){+.+.}, at: [<0000000011f40afe>] simple_attr_write+0x36/0xd0
[ 190.233635]
stack backtrace:
[ 190.233644] CPU: 0 PID: 1348 Comm: gem_eio Tainted: G U 4.16.0-rc7-g335ef9849310-drmtip_10+ #1
[ 190.233655] Hardware name: Dell Inc. OptiPlex GX280 /0G8310, BIOS A04 02/09/2005
[ 190.233664] Call Trace:
[ 190.233674] dump_stack+0x67/0x95
[ 190.233682] ? drm_modeset_backoff+0x1b0/0x1b0
[ 190.233690] print_unlock_imbalance_bug+0xd2/0xe0
[ 190.233698] ? drm_modeset_backoff+0x1b0/0x1b0
[ 190.233704] lock_release+0x23e/0x300
[ 190.233712] drm_modeset_acquire_fini+0x16/0x60
[ 190.233835] intel_finish_reset+0x72/0x160 [i915]
[ 190.233894] i915_reset_device+0x1e9/0x240 [i915]
[ 190.233953] ? __intel_get_crtc_scanline+0x1c0/0x1c0 [i915]
[ 190.233962] ? work_on_cpu_safe+0x50/0x50
[ 190.234020] i915_handle_error+0x1f2/0x470 [i915]
[ 190.234031] ? __might_fault+0x39/0x90
[ 190.234037] ? __might_fault+0x39/0x90
[ 190.234099] i915_wedged_set+0x7f/0xc0 [i915]
[ 190.234107] simple_attr_write+0xb0/0xd0
[ 190.234117] full_proxy_write+0x51/0x80
[ 190.234125] __vfs_write+0x21/0x140
[ 190.234133] ? rcu_read_lock_sched_held+0x6f/0x80
[ 190.234140] ? rcu_sync_lockdep_assert+0x29/0x50
[ 190.234147] ? __sb_start_write+0x152/0x1f0
[ 190.234152] ? __sb_start_write+0x168/0x1f0
[ 190.234159] vfs_write+0xbd/0x1a0
[ 190.234166] SyS_write+0x40/0xa0
[ 190.234173] ? do_syscall_64+0x19/0x1b0
[ 190.234180] do_syscall_64+0x6b/0x1b0
[ 190.234188] entry_SYSCALL_64_after_hwframe+0x42/0xb7
[ 190.234196] RIP: 0033:0x7f84c1b392b7
[ 190.234201] RSP: 002b:00007f84b6755b00 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
[ 190.234211] RAX: ffffffffffffffda RBX: 0000000000000046 RCX: 00007f84c1b392b7
[ 190.234218] RDX: 0000000000000002 RSI: 000055ec20abc8d6 RDI: 0000000000000046
[ 190.234225] RBP: 000055ec20abc8d6 R08: 0000000000000000 R09: 0000000000000000
[ 190.234231] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000002
[ 190.234238] R13: 0000000000000000 R14: 00007f84b0000b20 R15: 000055ec20ce4eb8
Testcase: igt/gem_eio
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180405123714.3638-1-chris@chris-wilson.co.uk
As per DP spec when R0 mismatch is detected, HDCP source supported
re-read the R0 atleast twice.
And For HDMI and DP minimum wait required for the R0 availability is
100mSec. So this patch changes the wait time to 100mSec but retries
twice with the time interval of 100mSec for each attempt.
This patch is needed for DP HDCP1.4 CTS Test: 1A-06.
v2:
No Change
v3:
Comment on R0 retry is moved closer to the code[Seanpaul]
v4:
Removing unwanted noise introduced in v3.
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Reviewed-by: Sean Paul <seanpaul@chromium.org>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1522669822-2508-1-git-send-email-ramalingam.c@intel.com
Commit 'aee3bac0a3a8 ("drm/i915/psr: Tie PSR2 support to Y
coordinate requirement")' got merged to drm-intel-next-queued
but the variable was defined commit 'c5fe47327b06 ("drm: Add PSR
version 3 macro") who was merged through drm-misc.
So backmerging to get drm-intel-next-queued compiling back again.
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Although i915 don't implement aux sync frame through tests was
findout that pannels can do selective update when the y-coordinate
is also included in SDP, that is why it is required to run PSR2 in
i915.
So moving to only one place the sink requirements that the actual
driver needs to enable PSR2.
Also intel_psr2_config_valid() is called every time the crtc config
is computed, wasting some time every time it was checking for
Y coordinate requirement.
This allow us to nuke y_cord_support and some of VSC setup code that
was handling a scenario that would never happen(PSR2 without Y
coordinate).
Also here renaming intel_dp_get_y_cord_status() to
intel_dp_get_y_coord_required() as it more accurate to the name and
function of bit according to eDP spec.
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180328223046.16125-4-jose.souza@intel.com
eDP spec states that aux frame is required to do PSR2 selective
update but i915 don't fully implement it. It sends the aux frame
sync messages but the value is always zero as the GTC is not enabled
in driver.
Through tests was findout that pannels can do selective update when
the y-coordinate is also included in SDP, that is why it is required
to run PSR2 in i915.
A dummy value is not useful at all to sink, so removing everything
related to aux frame sync, if GTC is enabled we can bring this back.
Cc: Vathsala Nagaraju <vathsala.nagaraju@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180328223046.16125-3-jose.souza@intel.com
Only sleep and repeat when asked for a full device reset (ALL_ENGINES)
and avoid using sleeping waits when asked for a per-engine reset. The
goal is to be able to use a per-engine reset from hardirq/softirq/timer
context. A consequence is that our individual wait timeouts are a
thousand times shorter, on the order of a hundred microseconds rather
than hundreds of millisecond. This may make hitting the timeouts more
common, but hopefully the fallover to the full-device reset will be
sufficient to pick up the pieces.
Note, that the sleeps inside older gen (pre-gen8) have been left as they
are only used in full device reset mode.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
CC: Michel Thierry <michel.thierry@intel.com>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180329224519.13598-1-chris@chris-wilson.co.uk
- Mask mode type garbage from userspace (Ville)
Something went wrong on the misc tree side, but I'll pull the patch directly.
* 'drm-misc-next-fixes' of git://anongit.freedesktop.org/drm/drm-misc:
drm: Fix uabi regression by allowing garbage mode->type from userspace
Tvrtko uncovered a fun issue with recovering from a wedge device. In his
tests, he wedged the driver by injecting an unrecoverable hang whilst a
batch was spinning. As we reset the gpu in the middle of the spinner,
when resumed it would continue on from the next instruction in the ring
and write it's breadcrumb. However, on wedging we updated our
bookkeeping to indicate that the GPU had completed executing and would
restart from after the breadcrumb; so the emission of the stale
breadcrumb from before the reset came as a bit of a surprise.
A simple fix is to when rebinding the context into the GPU, we update
the ring register state in the context image to match our bookkeeping.
We already have to update the RING_START and RING_TAIL, so updating
RING_HEAD as well is trivial. This works because whenever we unbind the
context, we keep the bookkeeping in check; and on wedging we unbind all
contexts.
Testcase: igt/gem_eio
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180327210136.16750-1-chris@chris-wilson.co.uk
Tested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
- GPUVM support for dGPUs
- KFD events support for dGPUs
- Fix live-lock situation when restoring multiple evicted processes
- Fix VM page table allocation on large-bar systems
- Fix for build failure on frv architecture
* tag 'drm-amdkfd-next-2018-03-27' of git://people.freedesktop.org/~gabbayo/linux:
drm/amdkfd: Use ordered workqueue to restore processes
drm/amdgpu: Fix acquiring VM on large-BAR systems
drm/amdkfd: Add module option for testing large-BAR functionality
drm/amdkfd: Kmap event page for dGPUs
drm/amdkfd: Add ioctls for GPUVM memory management
drm/amdkfd: Add TC flush on VMID deallocation for Hawaii
drm/amdkfd: Allocate CWSR trap handler memory for dGPUs
drm/amdkfd: Add per-process IDR for buffer handles
drm/amdkfd: Aperture setup for dGPUs
drm/amdkfd: Remove limit on number of GPUs
drm/amdkfd: Populate DRM render device minor
drm/amdkfd: Create KFD VMs on demand
drm/amdgpu: Add kfd2kgd interface to acquire an existing VM
drm/amdgpu: Add helper to turn an existing VM into a compute VM
drm/amdgpu: Fix initial validation of PD BO for KFD VMs
drm/amdgpu: Move KFD-specific fields into struct amdgpu_vm
drm/amdkfd: fix uninitialized variable use
drm/amdkfd: add missing include of mm.h
- Display fixes for booting with MST hub lid closed and display
freezing after hibernation (fd.o bugs 105470 & 105196)
- Fix for a very rare interrupt handling race resulting in GPU hang
* tag 'drm-intel-next-fixes-2018-03-27' of git://anongit.freedesktop.org/drm/drm-intel:
drm/i915: Fix hibernation with ACPI S0 target state
drm/i915/execlists: Use a locked clear_bit() for synchronisation with interrupt
drm/i915: Specify which engines to reset following semaphore/event lockups
drm/i915/dp: Write to SET_POWER dpcd to enable MST hub.