Commit Graph

1265414 Commits

Author SHA1 Message Date
Nirmoy Das
e29a7a34c3 drm/xe: Remove uninitialized end var from xe_gt_tlb_invalidation_range()
This fixes commit c4f1870362 ("drm/xe: Add
xe_gt_tlb_invalidation_range and convert PT layer to use this")
which added the end variable as part of the function param.

v2: Add fixes tag(Matt)

Fixes: c4f1870362 ("drm/xe: Add xe_gt_tlb_invalidation_range and convert PT layer to use this")
Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240429203039.26918-1-nirmoy.das@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-04-30 06:13:33 -07:00
Lucas De Marchi
4caf410766 drm/xe: Merge 16021540221 and 18034896535 WAs
In order to detect duplicate implementations for the same workaround,
early in the implementation of RTP it was decided to error out even if
the values set are exactly the same. With the introduction of 18034896535
in commit 74671d23ca ("drm/xe/xe2: Add workaround 18034896535"), LNL
stepping with graphics stepping A1 now gives the following error on
module load:

	xe 0000:00:02.0: [drm] *ERROR* GT0: [GT OTHER] \
	discarding save-restore reg e48c (clear: 00000200, set: 00000200,\
	masked: yes, mcr: yes): ret=-22

RTP may be improved in the future, but for now simply join the entries
like done with e.g. "1607297627, 1607030317, 1607186500".

Fixes: 74671d23ca ("drm/xe/xe2: Add workaround 18034896535")
Cc: Bommu Krishnaiah <krishnaiah.bommu@intel.com>
Cc: Tejas Upadhyay <tejas.upadhyay@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240427135339.3485559-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-04-29 08:03:23 -07:00
Shekhar Chauhan
bb442bfb9b drm/xe/xe2hpg: Add Wa_14021490052
Add Wa_14021490052 for Xe2HPG 20.01.

Signed-off-by: Shekhar Chauhan <shekhar.chauhan@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240424034247.1352755-1-shekhar.chauhan@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-04-29 07:04:39 -07:00
Matthew Brost
98ad158e54 drm/xe: Delete PT update selftest
IGTs (e.g. xe_vm) can provide the exact same coverage as the PT update
selftest. The PT update selftest is dependent on internal functions
which can change thus maintaining this test is costly and provide no
extra coverage. Delete this test.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Oak Zeng <oak.zeng@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425045513.1913039-14-matthew.brost@intel.com
2024-04-26 12:10:09 -07:00
Matthew Brost
c4f1870362 drm/xe: Add xe_gt_tlb_invalidation_range and convert PT layer to use this
xe_gt_tlb_invalidation_range accepts a start and end address rather than
a VMA. This will enable multiple VMAs to be invalidated in a single
invalidation. Update the PT layer to use this new function.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Oak Zeng <oak.zeng@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425045513.1913039-13-matthew.brost@intel.com
2024-04-26 12:10:07 -07:00
Matthew Brost
5aa5eea09a drm/xe: Move ufence add to vm_bind_ioctl_ops_fini
Rather than adding a ufence to a VMA in the bind function, add the
ufence to all VMAs in the IOCTL that require binds in
vm_bind_ioctl_ops_fini. This help withs the transition to job 1 per VM
bind IOCTL.

v2:
 - Rebase
v3:
 - Fix typo in commit (Oak)

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Oak Zeng <oak.zeng@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425045513.1913039-12-matthew.brost@intel.com
2024-04-26 12:10:06 -07:00
Matthew Brost
fda75ef80b drm/xe: Move ufence check to op_lock_and_prep
Rather than checking for an unsignaled ufence ay unbind time, check for
this during the op_lock_and_prep function. This helps with the
transition to job 1 per VM bind IOCTL.

v2:
 - Rebase
v3:
 - Fix typo in commit message (Oak)

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Oak Zeng <oak.zeng@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425045513.1913039-11-matthew.brost@intel.com
2024-04-26 12:10:05 -07:00
Matthew Brost
61e3270ef9 drm/xe: Add vm_bind_ioctl_ops_fini helper
Simplify VM bind code by signaling out-fences / destroying VMAs in a
single location. Will help with transition single job for many bind ops.

v2:
 - s/vm_bind_ioctl_ops_install_fences/vm_bind_ioctl_ops_fini (Oak)
 - Set last fence in vm_bind_ioctl_ops_fini (Oak)

Cc: Oak Zeng <oak.zeng@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Oak Zeng <oak.zeng@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425045513.1913039-10-matthew.brost@intel.com
2024-04-26 12:10:04 -07:00
Matthew Brost
22cfdd2865 drm/xe: Add some members to xe_vma_ops
This will help with moving to single jobs for many bind operations.

v2:
 - Rebase

Cc: Oak Zeng <oak.zeng@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Oak Zeng <oak.zeng@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425045513.1913039-9-matthew.brost@intel.com
2024-04-26 12:10:03 -07:00
Matthew Brost
bf69918b71 drm/xe: Use xe_vma_ops to implement page fault rebinds
In effort to make multiple VMA binds operations atomic (1 job), all
device page tables updates will be implemented via a xe_vma_ops (atomic
unit) interface,

Add xe_vma_rebind function which is implemented using xe_vma_ops
interface. Use xe_vma_rebind in GPU page faults for rebinds rather than
directly called deprecated function in PT layer.

v3:
 - Update commit message (Oak)
v4:
 - Fix tile_mask argument (CI)

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Oak Zeng <oak.zeng@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425045513.1913039-8-matthew.brost@intel.com
2024-04-26 12:10:01 -07:00
Matthew Brost
4dbbe45794 drm/xe: Simplify VM bind IOCTL error handling and cleanup
Clean up everything in VM bind IOCTL in 1 path for both errors and
non-errors. Also move VM bind IOCTL cleanup from ops (also used by
non-IOCTL binds) to the VM bind IOCTL.

v2:
 - Break ops_execute on error (Oak)

Cc: Oak Zeng <oak.zeng@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Oak Zeng <oak.zeng@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425045513.1913039-7-matthew.brost@intel.com
2024-04-26 12:10:00 -07:00
Matthew Brost
5f677a9b65 drm/xe: Use xe_vma_ops to implement xe_vm_rebind
All page tables updates are moving to a xe_vma_ops interface to
implement 1 job per VM bind IOCTL. Convert xe_vm_rebind to use a
xe_vma_ops based interface.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Oak Zeng <oak.zeng@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425045513.1913039-6-matthew.brost@intel.com
2024-04-26 12:09:59 -07:00
Matthew Brost
701109f2e3 drm/xe: Add struct xe_vma_ops abstraction
Having a structure which encapsulates a list of VMA operations will help
enable 1 job for the entire list.

v2:
 - Rebase

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Oak Zeng <oak.zeng@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425045513.1913039-5-matthew.brost@intel.com
2024-04-26 12:09:58 -07:00
Matthew Brost
0a34c12449 drm/xe: Move migrate to prefetch to op_lock_and_prep function
All non-binding operations in VM bind IOCTL should be in the lock and
prepare step rather than the execution step. Move prefetch to conform to
this pattern.

v2:
 - Rebase
 - New function names (Oak)
 - Update stale comment (Oak)

Cc: Oak Zeng <oak.zeng@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Oak Zeng <oak.zeng@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425045513.1913039-4-matthew.brost@intel.com
2024-04-26 12:09:57 -07:00
Matthew Brost
75192758d6 drm/xe: Add ops_execute function which returns a fence
Add ops_execute function which returns a fence. This will be helpful to
initiate all binds (VM bind IOCTL, rebinds in exec IOCTL, rebinds in
preempt rebind worker, and rebinds in pagefaults) via a gpuva ops list.
Returning a fence is needed in various paths.

v2:
 - Rebase

Cc: Oak Zeng <oak.zeng@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Oak Zeng <oak.zeng@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425045513.1913039-3-matthew.brost@intel.com
2024-04-26 12:09:56 -07:00
Matthew Brost
77f2ef3f16 drm/xe: Lock all gpuva ops during VM bind IOCTL
Lock all BOs used in gpuva ops and validate all BOs in a single step
during the VM bind IOCTL.

This help with the transition to making all gpuva ops in a VM bind IOCTL
a single atomic job which is required for proper error handling.

v2:
 - Better commit message (Oak)
 - s/op_lock/op_lock_and_prep, few other renames too (Oak)
 - Use DRM_EXEC_IGNORE_DUPLICATES flag in drm_exec_init (local testing)
 - Do not reserve slots in locking step (direction based on series from Thomas)
v3:
 - Validate BO if is immediate set (Oak)

Cc: Oak Zeng <oak.zeng@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Oak Zeng <oak.zeng@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425045513.1913039-2-matthew.brost@intel.com
2024-04-26 12:09:55 -07:00
Lucas De Marchi
6a2a90cba1 drm/xe/display: Fix ADL-N detection
Contrary to i915, in xe ADL-N is kept as a different platform, not a
subplatform of ADL-P. Since the display side doesn't need to
differentiate between P and N, i.e. IS_ALDERLAKE_P_N() is never called,
just fixup the compat header to check for both P and N.

Moving ADL-N to be a subplatform would be more complex as the firmware
loading in xe only handles platforms, not subplatforms, as going forward
the direction is to check on IP version rather than
platforms/subplatforms.

Fix warning when initializing display:

	xe 0000:00:02.0: [drm:intel_pch_type [xe]] Found Alder Lake PCH
	------------[ cut here ]------------
	xe 0000:00:02.0: drm_WARN_ON(!((dev_priv)->info.platform == XE_ALDERLAKE_S) && !((dev_priv)->info.platform == XE_ALDERLAKE_P))

And wrong paths being taken on the display side.

Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425181610.2704633-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-04-26 08:00:29 -07:00
Colin Ian King
445237d67a drm/xe: Fix spelling mistake "forcebly" -> "forcibly"
There is a spelling mistake in a drm_dbg message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240426094904.816033-1-colin.i.king@gmail.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-04-26 05:54:05 -07:00
Michal Wajdeczko
e77dff51ba drm/xe/pf: Initialize and update PF services on driver init
The xe_gt_sriov_pf_init_early() and xe_gt_sriov_pf_init_hw() are
ideal places to call per-GT PF service init and update functions.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425143927.2265-2-michal.wajdeczko@intel.com
2024-04-26 11:44:03 +02:00
Michal Wajdeczko
d6c5bac8e3 drm/xe/pf: Re-initialize SR-IOV specific HW settings
On older platforms (12.00) the PF driver must explicitly unblock
VF's modifications to the GGTT. On newer platforms this capability
is enabled by default.

Bspec: 49908, 53204
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425143927.2265-1-michal.wajdeczko@intel.com
2024-04-26 11:44:00 +02:00
Himal Prasad Ghimiray
c832541ca8 drm/xe: Change xe_guc_submit_stop return to void
The function xe_guc_submit_stop consistently returns 0 without an error
state, prompting the caller to verify it, which is redundant.

Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240424041911.2184868-1-himal.prasad.ghimiray@intel.com
2024-04-25 20:38:49 -07:00
Himal Prasad Ghimiray
c79828e0c7 drm/xe: Use xe_bo_lock()/xe_bo_unlock() helpers
There is no change in functionality. Using the helper function
defined within the driver for locking/unlocking the reservation
object.

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Suggested-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240424043910.2190376-3-himal.prasad.ghimiray@intel.com
2024-04-25 20:38:35 -07:00
Himal Prasad Ghimiray
a1adb3d250 drm/xe/vm: Use xe_vm_lock()/xe_vm_unlock() helpers
There is no change in functionality. Using the helper function
defined within the driver.

-v2
Use xe_vm_unlock() (Ashutosh/Matt)

-v3
Use xe_vm_unlock() for error label too (Matt)

Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240424043910.2190376-2-himal.prasad.ghimiray@intel.com
2024-04-25 20:38:34 -07:00
Matthew Brost
edc9f11af3 drm/xe: Replace engine references with exec queue in xe_guc_submit.c
Exec queue has replaced engine nomenclature.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425232544.1935578-6-matthew.brost@intel.com
2024-04-25 18:41:29 -07:00
Matthew Brost
3713a383f5 drm/xe: Fix alignment in GuC exec queue state defines
Normalize the alignment for readability.

v3:
 - Fix typo in commit (Himal)
 - Fix EXEC_QUEUE_STATE_WEDGED too (Himal)

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425232544.1935578-5-matthew.brost@intel.com
2024-04-25 18:41:28 -07:00
Matthew Brost
1a1563e324 drm/xe: s/ENGINE_STATE_KILLED/EXEC_QUEUE_STATE_KILLED
Exec queue has replaced engine nomenclature.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425232544.1935578-4-matthew.brost@intel.com
2024-04-25 18:41:28 -07:00
Matthew Brost
03b3517630 drm/xe: s/ENGINE_STATE_SUSPENDED/EXEC_QUEUE_STATE_SUSPENDED
Exec queue has replaced engine nomenclature.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425232544.1935578-3-matthew.brost@intel.com
2024-04-25 18:41:27 -07:00
Matthew Brost
f85ada84f6 drm/xe: s/ENGINE_STATE_ENABLED/EXEC_QUEUE_STATE_ENABLED
Exec queue has replaced engine nomenclature.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425232544.1935578-2-matthew.brost@intel.com
2024-04-25 18:41:26 -07:00
Matthew Brost
3f371a98de drm/xe: Delete unused GuC submission_state.suspend
GuC submission_state.suspend is unused, delete it.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425054747.1918811-1-matthew.brost@intel.com
2024-04-25 14:27:19 -07:00
Matthew Auld
3d44d67c44 drm/xe/vm: prevent UAF in rebind_work_func()
We flush the rebind worker during the vm close phase, however in places
like preempt_fence_work_func() we seem to queue the rebind worker
without first checking if the vm has already been closed.  The concern
here is the vm being closed with the worker flushed, but then being
rearmed later, which looks like potential uaf, since there is no actual
refcounting to track the queued worker. We can't take the vm->lock here
in preempt_rebind_work_func() to first check if the vm is closed since
that will deadlock, so instead flush the worker again when the vm
refcount reaches zero.

v2:
 - Grabbing vm->lock in the preempt worker creates a deadlock, so
   checking the closed state is tricky. Instead flush the worker when
   the refcount reaches zero. It should be impossible to queue the
   preempt worker without already holding vm ref.

Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1676
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1591
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1364
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1304
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1249
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240423074721.119633-4-matthew.auld@intel.com
2024-04-25 16:52:34 +01:00
Matthew Auld
6e78e0719d Revert "drm/xe/vm: drop vm->destroy_work"
This reverts commit 5b259c0d1d.

Cleanup here is good, however we need to able to flush a worker during
vm destruction which might involve sleeping, so bring back the worker.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240423074721.119633-3-matthew.auld@intel.com
2024-04-25 16:52:34 +01:00
Matthew Auld
3cd1585e57 drm/xe/preempt_fence: enlarge the fence critical section
It is really easy to introduce subtle deadlocks in
preempt_fence_work_func() since we operate on single global ordered-wq
for signalling our preempt fences behind the scenes, so even though we
signal a particular fence, everything in the callback should be in the
fence critical section, since blocking in the callback will prevent
other published fences from signalling. If we enlarge the fence critical
section to cover the entire callback, then lockdep should be able to
understand this better, and complain if we grab a sensitive lock like
vm->lock, which is also held when waiting on preempt fences.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240418144630.299531-2-matthew.auld@intel.com
2024-04-25 16:52:34 +01:00
Michal Wajdeczko
7547a23cae drm/xe/guc: Fix typos in VF CFG KLVs descriptions
Apart from the obvious spelling typo, use the correct values for
infinity quantum/timeout settings (it's 0x0 instead of 0xFFFFFFFF).

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240424140506.2133-1-michal.wajdeczko@intel.com
2024-04-25 17:27:48 +02:00
Michal Wajdeczko
4befb17e83 drm/xe/pf: Expose PF service details via debugfs
For debug purposes we might want to verify which registers values
PF is sharing with VFs and to view which VF/PF ABI versions were
negotiated by the VFs. Plug the 'print' functions already provided
by the PF service code into our debugfs.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240424171030.2177-1-michal.wajdeczko@intel.com
2024-04-25 14:38:39 +02:00
Michal Wajdeczko
cbf7579304 drm/xe: Check result of drmm_mutex_init()
Although it's unlikely that drmm_mutex_init() will fail during
driver initialization, however we shouldn't ignore this case.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240409153132.1111-1-michal.wajdeczko@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-04-24 15:37:06 -07:00
Tejas Upadhyay
b5ef80879d drm/xe/xe2: Add workaround 14021567978
Workaround 14021567978 applies to RenderCS xe2

V3:
  - Cover xe2_hpg as its landed upstream now
V2(MattR):
  - Move tuning to wa and apply to xe2

Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240410064640.1010098-1-tejas.upadhyay@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-04-24 15:34:01 -07:00
Michal Wajdeczko
ad4ca914de drm/xe/guc: Improve GuC doorbell/context ID manager intro message
We can use recently added str_plural() helper.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240419153407.402-1-michal.wajdeczko@intel.com
2024-04-24 18:18:02 +02:00
Rodrigo Vivi
6b8ef44cc0 drm/xe: Introduce the wedged_mode debugfs
So, the wedged mode can be selected per device at runtime,
before the tests or before reproducing the issue.

v2: - s/busted/wedged
    - some locking consistency

v3: - remove mutex
    - toggle guc reset policy on any mode change

Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240423221817.1285081-4-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-04-24 12:12:58 -04:00
Rodrigo Vivi
8ed9aaae39 drm/xe: Force wedged state and block GT reset upon any GPU hang
In many validation situations when debugging GPU Hangs,
it is useful to preserve the GT situation from the moment
that the timeout occurred.

This patch introduces a module parameter that could be used
on situations like this.

If xe.wedged module parameter is set to 2, Xe will be declared
wedged on every single execution timeout (a.k.a. GPU hang) right
after devcoredump snapshot capture and without attempting any
kind of GT reset and blocking entirely any kind of execution.

v2: Really block gt_reset from guc side. (Lucas)
    s/wedged/busted (Lucas)

v3: - s/busted/wedged
    - Really use global_flags (Dafna)
    - More robust timeout handling when wedging it.

v4: A really robust clean exit done by Matt Brost.
    No more kernel warns on unbind.

v5: Simplify error message (Lucas)

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Dafna Hirschfeld <dhirschfeld@habana.ai>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: Himanshu Somaiya <himanshu.somaiya@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240423221817.1285081-3-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-04-24 12:12:58 -04:00
Rodrigo Vivi
692818678e drm/xe: declare wedged upon GuC load failure
Let's block the device upon any GuC load failure.
But let's continue with the probe so guc logs can be read
from the debugfs.

v2: - s/wedged/busted
    - do not block probe or we lose guc_logs in debugfs (Matt)

v3: - s/busted/wedged

v4: Do not change __xe_guc_upload return. (Himal)

Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240423221817.1285081-2-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-04-24 12:12:58 -04:00
Rodrigo Vivi
fb74b205cd drm/xe: Introduce a simple wedged state
Introduce a very simple 'wedged' state where any attempt
to access the GPU is entirely blocked.

On some critical cases, like on gt_reset failure, we need to
block any other attempt to use the GPU. Otherwise we are at
a risk of reaching cases that would force us to reboot the machine.

So, when this cases are identified we corner and block any GPU
access. No IOCTL and not even another GT reset should be attempted.

The 'wedged' state in Xe is an end state with no way back.
Only a device "re-probe" (unbind + bind) can restore the GPU access.

v2: - s/wedged/busted (Lucas)
    - use unbind+bind instead of module reload (Lucas)
    - added more info on unbind operations and instruction on bug report
    - only print the message once.

v3: - s/busted/wedged (Ashutosh, Tvrtko, Thomas)
    - don't assume user has sudo and tee available (Lucas)

v4: - remove unnecessary cases around ct communication or migration.

Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Cc: Tvrtko Ursulin <tursulin@ursulin.net>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Anshuman Gupta <anshuman.gupta@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> #v2
Link: https://patchwork.freedesktop.org/patch/msgid/20240423221817.1285081-1-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-04-24 12:12:58 -04:00
José Roberto de Souza
c8d4524ecc drm/xe: Add INSTDONE registers to devcoredump
This registers contains important information that can help with debug
of GPU hangs.

While at it also fixing the double line jump at the end of engine
registers for CCS engines.

v2:
- print other INSTDONE registers

v3:
- add for_each_geometry/compute_dss()

v4:
- print one slice_common_instdone per glice in DG2+

v5:
- rename registers prefix from DG2 to XEHPG (Zhanjun)

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Zhanjun Dong <zhanjun.dong@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240424140319.61651-3-jose.souza@intel.com
2024-04-24 09:06:39 -07:00
José Roberto de Souza
082a634f60 drm/xe: Add helpers to loop over geometry and compute DSS
Some DSS can only be available for geometry while others can only be
available for compute.
So here adding helpers to loop only available DSS for given usage.

User of this helper will come in the next patch.

v2:
- drop has_dss()

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Zhanjun Dong <zhanjun.dong@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240424140319.61651-2-jose.souza@intel.com
2024-04-24 09:06:38 -07:00
José Roberto de Souza
f332625733 drm/xe: Store xe_hw_engine in xe_hw_engine_snapshot
A future patch will require gt and xe device structs, so here
replacing class by hwe.

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Zhanjun Dong <zhanjun.dong@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240424140319.61651-1-jose.souza@intel.com
2024-04-24 09:06:37 -07:00
Michal Wajdeczko
49f853c78e drm/xe/pf: Clamp maximum execution quantum to 100s
GuC is silently clamping values of the execution quantum and
preemption timeout KLVs to 100s. Perform explicit clamping on the
driver side as later there is no way to read back values used by
the firmware and we shouldn't mislead the user about actual values
being used when we print them in dmesg or debugfs.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240419123543.270-3-michal.wajdeczko@intel.com
2024-04-24 15:32:26 +02:00
Michal Wajdeczko
5a8c292f74 drm/xe/guc: Update VF configuration KLVs definitions
GuC firmware specification says that maximum value for the execution
quantum KLV is 100s and anything exceeding that will be clamped.
The same limitation applies to the preemption timeout KLV.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240419123543.270-2-michal.wajdeczko@intel.com
2024-04-24 15:32:24 +02:00
Michal Wajdeczko
2cab6319b4 drm/xe/pf: Expose SR-IOV policy settings over debugfs
We already have functions to configure SR-IOV policies.
Allow to tweak those policy settings over debugfs.

Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240423131244.2045-4-michal.wajdeczko@intel.com
2024-04-24 15:18:41 +02:00
Michal Wajdeczko
b00240b6a2 drm/xe/pf: Expose SR-IOV VF control commands over debugfs
We already have functions to control the VF.
Allow to control the VF using debugfs.

Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240423131244.2045-3-michal.wajdeczko@intel.com
2024-04-24 15:18:40 +02:00
Michal Wajdeczko
e42a51fb9c drm/xe/pf: Expose SR-IOV VFs configuration over debugfs
We already have functions to configure VF resources and to print
actual provisioning details. Expose this functionality in debugfs
to allow experiment with different settings or inspect details in
case of unexpected issues with the provisioning.

As debugfs attributes are per-VF, we use parent d_inode->i_private
to store VFID, similarly how we did for per-GT attributes.

Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240423131244.2045-2-michal.wajdeczko@intel.com
2024-04-24 15:18:38 +02:00
Michal Wajdeczko
11294bf38f drm/xe/kunit: Add PF service tests
Start with basic tests for VF/PF ABI version negotiation. As we
treat all platforms in the same way, we can run the tests on one
platform. More tests will likely come later.

Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240423180436.2089-6-michal.wajdeczko@intel.com
2024-04-24 15:10:46 +02:00