Commit Graph

1326717 Commits

Author SHA1 Message Date
Lucas De Marchi
8e1ddfada4 drivers: base: devres: Allow to release group on device release
When releasing a device, if the release action causes a group to be
released, a warning is emitted because it can't find the group. This
happens because devres_release_all() moves the entire list to a todo
list and also move the group markers. Considering r* normal resource
nodes and g1 a group resource node:

		    g1 -----------.
		    v		  v
	r1 -> r2 -> g1[0] -> r3-> g[1] -> r4

After devres_release_all(), dev->devres_head becomes empty and the todo
list it iterates on becomes:

			       g1
			       v
	r1 -> r2 -> r3-> r4 -> g1[0]

When a call to component_del() is made and takes down the aggregate
device, a warning like this happen:

	RIP: 0010:devres_release_group+0x362/0x530
	...
	Call Trace:
	 <TASK>
	 component_unbind+0x156/0x380
	 component_unbind_all+0x1d0/0x270
	 mei_component_master_unbind+0x28/0x80 [mei_hdcp]
	 take_down_aggregate_device+0xc1/0x160
	 component_del+0x1c6/0x3e0
	 intel_hdcp_component_fini+0xf1/0x170 [xe]
	 xe_display_fini+0x1e/0x40 [xe]

Because the devres group corresponding to the hdcp component cannot be
found. Just ignore this corner case: if the dev->devres_head is empty
and the caller is trying to remove a group, it's likely in the process
of device cleanup so just ignore it instead of warning.

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250222001051.3012936-2-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-02-25 14:29:05 -08:00
Umesh Nerlige Ramappa
30341f0b8e drm/xe/oa: Allow oa_exponent value of 0
OA exponent value of 0 is a valid value for periodic reports. Allow user
to pass 0 for the OA sampling interval since it gets converted to 2 gt
clock ticks.

v2: Update the check in xe_oa_stream_init as well (Ashutosh)
v3: Fix mi-rpc failure by setting default exponent to -1 (CI)
v4: Add the Fixes tag

Fixes: b6fd51c621 ("drm/xe/oa/uapi: Define and parse OA stream properties")
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250221213352.1712932-1-umesh.nerlige.ramappa@intel.com
2025-02-24 13:44:34 -08:00
Shuicheng Lin
046eda6525 drm/xe/devcoredump: Remove IS_ERR_OR_NULL check for kzalloc
kzalloc returns a valid pointer or NULL if the allocation fails.
It never returns an error pointer. It is better to check for NULL directly.

Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250220001710.1803749-3-shuicheng.lin@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-02-24 13:31:35 -08:00
Shuicheng Lin
c504ad914f drm/xe/devcoredump: Fix print typo of offset
The log should print with "offset" instead of "size".
Correct the typo in the comment.

v2: split kzalloc change and add typo fix in commit message (Lucas)

Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250220001710.1803749-2-shuicheng.lin@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-02-24 13:31:12 -08:00
Riana Tauro
c7f2b8bfca drm/xe/xe_pmu: Acquire forcewake on event init for engine events
When the engine events are created, acquire GT forcewake to read gpm
timestamp required for the events and release on event destroy. This
cannot be done during read due to the raw spinlock held my pmu.

v2: remove forcewake counting (Umesh)
v3: remove extra space (Umesh)
v4: use event pmu private data (Lucas)
    free local copy (Umesh)

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250224053903.2253539-6-riana.tauro@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-02-24 13:24:32 -08:00
Riana Tauro
6978c5f5a6 drm/xe/xe_pmu: Add PMU support for engine activity
PMU provides two counters (engine-active-ticks, engine-total-ticks)
to calculate engine activity. When querying engine activity,
user must group these 2 counters using the perf_event
group mechanism to ensure both counters are sampled together.

To list the events

	./perf list
	  xe_0000_03_00.0/engine-active-ticks/	[Kernel PMU event]
	  xe_0000_03_00.0/engine-total-ticks/	[Kernel PMU event]

The formats to be used with the above are

	engine_instance	- config:12-19
	engine_class	- config:20-27
	gt		- config:60-63

The events can then be read using perf tool

./perf stat -e xe_0000_03_00.0/engine-active-ticks,gt=0,
			       engine_class=0,engine_instance=0/,
	       xe_0000_03_00.0/engine-total-ticks,gt=0,
			       engine_class=0,engine_instance=0/ -I 1000

Engine activity can then be calculated as below
engine activity % = (engine active ticks/engine total ticks) * 100

v2: validate gt
    rename total-ticks to engine-total-ticks
    add helper to get hwe (Umesh)

v3: fix checkpatch warning
    add details to documentation (Umesh)
    remove ascii formats from documentation (Lucas)

v4: remove unnecessary warn within raw_spinlock (Lucas)

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250224053903.2253539-5-riana.tauro@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-02-24 13:23:57 -08:00
Riana Tauro
0e6ffdb2b7 drm/xe/guc: Expose engine activity only for supported GuC version
Engine activity is supported only on GuC submission version >= 1.14.1
Allow enabling/reading engine activity only on supported
GuC versions. Warn once if not supported.

v2: use guc interface version (John)
v3: use debug log (Umesh)
v4: use variable for supported and use gt logs
    use a friendlier log message (Michal)
v5: fix kernel-doc
    do not continue in init if not supported (Michal)
v6: remove hardcoding values (Michal)

Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250224053903.2253539-4-riana.tauro@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-02-24 12:32:10 -08:00
Riana Tauro
9e19f42955 drm/xe/trace: Add trace for engine activity
Add engine activity related information to trace events for
better debuggability

v2: add trace for engine activity (Umesh)
v3: use hex for quanta_ratio

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250224053903.2253539-3-riana.tauro@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-02-24 12:32:09 -08:00
Riana Tauro
b729ea271e drm/xe: Add engine activity support
GuC provides support to read engine counters to calculate the
engine activity. KMD exposes two counters via the PMU interface to
calculate engine activity

Engine Active Ticks(engine-active-ticks) - active ticks of engine
Engine Total Ticks (engine-total-ticks) - total ticks of engine

Engine activity percentage can be calculated as below
Engine activity % = (engine active ticks/engine total ticks) * 100.

v2: fix cosmetic review comments
    add forcewake for gpm_ts (Umesh)

v3: fix CI hooks error
    change function parameters and unpin bo on error
    of allocate_activity_buffers
    fix kernel-doc (Umesh)
    use engine activity (Umesh, Lucas)
    rename xe_engine_activity to xe_guc_engine_*
    fix commit message to use engine activity (Lucas, Umesh)

v4: add forcewake in PMU layer

v5: fix makefile
    use drmm_kcalloc instead of kmalloc_array
    remove managed bo
    skip init for VF
    fix cosmetic review comments (Michal)

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250224053903.2253539-2-riana.tauro@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-02-24 12:32:09 -08:00
Matthew Auld
8b4b3af869 drm/xe/userptr: remove tmp_evict list
Doesn't look to be used.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250221143840.167150-6-matthew.auld@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-02-24 13:03:03 -06:00
Matthew Auld
6b93cb9891 drm/xe/userptr: fix EFAULT handling
Currently we treat EFAULT from hmm_range_fault() as a non-fatal error
when called from xe_vm_userptr_pin() with the idea that we want to avoid
killing the entire vm and chucking an error, under the assumption that
the user just did an unmap or something, and has no intention of
actually touching that memory from the GPU.  At this point we have
already zapped the PTEs so any access should generate a page fault, and
if the pin fails there also it will then become fatal.

However it looks like it's possible for the userptr vma to still be on
the rebind list in preempt_rebind_work_func(), if we had to retry the
pin again due to something happening in the caller before we did the
rebind step, but in the meantime needing to re-validate the userptr and
this time hitting the EFAULT.

This explains an internal user report of hitting:

[  191.738349] WARNING: CPU: 1 PID: 157 at drivers/gpu/drm/xe/xe_res_cursor.h:158 xe_pt_stage_bind.constprop.0+0x60a/0x6b0 [xe]
[  191.738551] Workqueue: xe-ordered-wq preempt_rebind_work_func [xe]
[  191.738616] RIP: 0010:xe_pt_stage_bind.constprop.0+0x60a/0x6b0 [xe]
[  191.738690] Call Trace:
[  191.738692]  <TASK>
[  191.738694]  ? show_regs+0x69/0x80
[  191.738698]  ? __warn+0x93/0x1a0
[  191.738703]  ? xe_pt_stage_bind.constprop.0+0x60a/0x6b0 [xe]
[  191.738759]  ? report_bug+0x18f/0x1a0
[  191.738764]  ? handle_bug+0x63/0xa0
[  191.738767]  ? exc_invalid_op+0x19/0x70
[  191.738770]  ? asm_exc_invalid_op+0x1b/0x20
[  191.738777]  ? xe_pt_stage_bind.constprop.0+0x60a/0x6b0 [xe]
[  191.738834]  ? ret_from_fork_asm+0x1a/0x30
[  191.738849]  bind_op_prepare+0x105/0x7b0 [xe]
[  191.738906]  ? dma_resv_reserve_fences+0x301/0x380
[  191.738912]  xe_pt_update_ops_prepare+0x28c/0x4b0 [xe]
[  191.738966]  ? kmemleak_alloc+0x4b/0x80
[  191.738973]  ops_execute+0x188/0x9d0 [xe]
[  191.739036]  xe_vm_rebind+0x4ce/0x5a0 [xe]
[  191.739098]  ? trace_hardirqs_on+0x4d/0x60
[  191.739112]  preempt_rebind_work_func+0x76f/0xd00 [xe]

Followed by NPD, when running some workload, since the sg was never
actually populated but the vma is still marked for rebind when it should
be skipped for this special EFAULT case. This is confirmed to fix the
user report.

v2 (MattB):
 - Move earlier.
v3 (MattB):
 - Update the commit message to make it clear that this indeed fixes the
   issue.

Fixes: 521db22a1d ("drm/xe: Invalidate userptr VMA on page pin fault")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: <stable@vger.kernel.org> # v6.10+
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250221143840.167150-5-matthew.auld@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-02-24 13:02:54 -06:00
Matthew Auld
4e37e92892 drm/xe/userptr: restore invalidation list on error
On error restore anything still on the pin_list back to the invalidation
list on error. For the actual pin, so long as the vma is tracked on
either list it should get picked up on the next pin, however it looks
possible for the vma to get nuked but still be present on this per vm
pin_list leading to corruption. An alternative might be then to instead
just remove the link when destroying the vma.

v2:
 - Also add some asserts.
 - Keep the overzealous locking so that we are consistent with the docs;
   updating the docs and related bits will be done as a follow up.

Fixes: ed2bdf3b26 ("drm/xe/vm: Subclass userptr vmas")
Suggested-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250221143840.167150-4-matthew.auld@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-02-24 13:02:33 -06:00
Tejas Upadhyay
b7b68c6e36 drm/xe/wa: Limit char per line to 100
Above 100 char per line checkpatch would complain. Fixing it.

Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250221115344.389975-1-tejas.upadhyay@intel.com
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
2025-02-24 12:10:10 +05:30
Umesh Nerlige Ramappa
98c9d27ab3 drm/xe/oa: Ensure that polled read returns latest data
In polled mode, user calls poll() for read data to be available before
performing a read(). In the duration between these 2 calls, there may be
new data available in the OA buffer. To ensure user reads all available
data, check for latest data in the OA buffer in polled read.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250212010255.1423343-1-umesh.nerlige.ramappa@intel.com
2025-02-21 11:45:29 -08:00
Priyanka Dandamudi
70c7273778 drm/xe: Add fault injection for xe_sync_entry_parse
Add fault injection for xe_sync_entry_parse to allow it to fail while
executing xe_vm_bind_ioctl(). This needs to be added as it cannot be
reached by injecting error through IOCTL arguments.

Signed-off-by: Priyanka Dandamudi <priyanka.dandamudi@intel.com>
Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250212093212.3069356-1-priyanka.dandamudi@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
2025-02-21 13:02:54 +05:30
Marcin Bernatowicz
94030a1d32 drm/xe/client: Skip show_run_ticks if unable to read timestamp
RING_TIMESTAMP registers are inaccessible in VF mode.
Without drm-total-cycles-*, other keys provide little value.
Skip all optional "run_ticks" keys in this case.

Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250205191644.2550879-3-marcin.bernatowicz@linux.intel.com
2025-02-20 13:49:08 +01:00
Marcin Bernatowicz
5a9f8db2db drm/xe/vf: Return EOPNOTSUPP for DRM_XE_DEVICE_QUERY_ENGINE_CYCLES if VF
RING_TIMESTAMP registers are not available for VF (Virtual Function)
drivers. Return -EOPNOTSUPP when the DRM_XE_DEVICE_QUERY_ENGINE_CYCLES
ioctl is invoked on a VF device.

Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250205191644.2550879-2-marcin.bernatowicz@linux.intel.com
2025-02-20 13:49:06 +01:00
Matt Roper
a1e5b6d83e drm/xe: Drop unnecessary GT lookup in xe_exec_queue_create_ioctl()
xe_exec_queue_create_ioctl() performs a lookup of the xe_gt for the GT
ID passed from userspace, but the result is never actually used.  Since
there's already a separate (and earlier) check that the ID passed from
userspace is valid, the unnecessary lookup can be removed.

Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250218200511.4050060-2-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-02-19 07:14:59 -08:00
Rodrigo Vivi
f2cd50990d drm/xe/display: Spin-off xe_display runtime/d3cold sequences
No functional change. This patch only splits the xe_display_pm
suspend/resume functions in the regular suspend/resume from the
runtime/d3cold ones.

v2: - Rename d3cold functions (Jonathan)
    - Rebase

Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250218010330.761340-1-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-02-18 18:03:47 -05:00
Rodrigo Vivi
ceb33b9de1 drm/{i915, xe}/display: Move dsm registration under intel_driver
Move dsm register/unregister calls from the drivers to under
intel_display_driver register/unregister.

v2: Rebase only

Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250217200133.741758-1-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-02-18 18:03:18 -05:00
Ilia Levi
eb79d71e50 drm/xe: Add xe_mmio_init() initialization function
Add a convenience function for minimal initialization of struct xe_mmio.
This function also validates that the entirety of the provided mmio region
is usable with struct xe_reg.

v2: Modify commit message, add kernel doc, refactor assert (Michal)
v3: Fix off-by-one bug, add clarifying macro (Michal)
v4: Derive bitfield width from size (Michal)

Signed-off-by: Ilia Levi <ilia.levi@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250213093559.204652-1-ilia.levi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-02-18 08:27:11 -08:00
Ilia Levi
5bee1e2de3 drm/xe: s/xe_mmio_init/xe_mmio_probe_early
Rename so that xe_mmio_init() can be used in subsequent patches to
initialize an instance of struct xe_mmio.

Signed-off-by: Ilia Levi <ilia.levi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250130105057.136586-1-ilia.levi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-02-18 08:26:41 -08:00
Maarten Lankhorst
339adeb104 drm/xe/display: Clarify XE_IOCTL_DBG message
This should make it easier to understand from userspace why importing BO
fails.

Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250117115305.53113-1-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
2025-02-17 09:32:12 +01:00
Tejas Upadhyay
b5fa0913b5 drm/xe: Fix typo in xe_job_ptrs
%s/uinitialized/uninitialized/gc

Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250213060838.32493-1-tejas.upadhyay@intel.com
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
2025-02-17 10:52:01 +05:30
Michal Wajdeczko
611160b02a drm/xe/pf: Release all VFs configs on device removal
If we try to manually provision VFs using debugfs and then we
try to unload the driver, we will see complains like:

 [ ] Memory manager not clean during takedown.
 [ ] RIP: 0010:drm_mm_takedown+0x3f/0x100
 [ ] [drm:drm_mm_takedown] *ERROR* node [fedff000 + 00001000]: inserted at
      drm_mm_insert_node_in_range+0x2bd/0x520
      xe_ggtt_node_insert+0x52/0x90 [xe]
      pf_provision_vf_ggtt+0x1fa/0xac0 [xe]
      xe_gt_sriov_pf_config_set_ggtt+0x79/0x7a0 [xe]
      ggtt_set+0x53/0x80 [xe]
      simple_attr_write_xsigned.isra.0+0xd2/0x150
      simple_attr_write+0x14/0x30
      debugfs_attr_write+0x4e/0x80

 [ ] xe 0000:00:02.0: [drm] *ERROR* GT0: GUC ID manager unclean (1/65535)
 [ ] xe 0000:00:02.0: [drm] GT0:      total 65535
 [ ] xe 0000:00:02.0: [drm] GT0:      used 1
 [ ] xe 0000:00:02.0: [drm] GT0:      range 65534..65534 (1)

 [ ] xe 0000:00:02.0: [drm] *ERROR* GT0: GuC doorbells manager unclean (1/256)
 [ ] xe 0000:00:02.0: [drm] GT0:      count: 256
 [ ] xe 0000:00:02.0: [drm] GT0:      available range: 1..255 (255)
 [ ] xe 0000:00:02.0: [drm] GT0:      available total: 255
 [ ] xe 0000:00:02.0: [drm] GT0:      reserved range: 0..0 (1)
 [ ] xe 0000:00:02.0: [drm] GT0:      reserved total: 1

This could be easily fixed by adding config release action.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250211155034.1028-1-michal.wajdeczko@intel.com
2025-02-16 21:36:38 +01:00
Lucas De Marchi
62fbc75b28 drm/xe/hwmon: Stop ignoring errors on probe
Not registering hwmon because it's not available (SRIOV_VF and DGFX) is
different from failing the initialization. Handle the errors
appropriately.

Cc: Badal Nilawar <badal.nilawar@intel.com>
Cc: Karthik Poosa <karthik.poosa@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250213192909.996148-13-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-02-14 11:42:55 -08:00
Lucas De Marchi
6b5506158f drm/xe/pmu: Fail probe if xe_pmu_register() fails
Now that previous callers in xe_device_probe() are handling the errors,
that can be done for xe_pmu_register() as well.

Cc: Riana Tauro <riana.tauro@intel.com>
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250213192909.996148-12-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-02-14 11:42:55 -08:00
Lucas De Marchi
960d71044e drm/xe/oa: Handle errors in xe_oa_register()
Let xe_oa_unregister() be handled by devm infra since it's only putting
the kobject. Also, since kobject_create_and_add may fail, handle the
error accordingly.

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250213192909.996148-11-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-02-14 11:42:55 -08:00
Lucas De Marchi
00f6a86c3c drm/xe: Move drm_dev_unplug() out of display function
This is not really display-related and needed for any sequence on driver
removal that has to interact with drm_dev_enter()/drm_dev_exit().
Just remove xe_device_remove_display() and inline it in the single
caller to make clear this is not done only for display.

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250213192909.996148-10-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-02-14 11:42:55 -08:00
Lucas De Marchi
d3f557d52e drm/xe/oa: Move fini to xe_oa
Like done with other functions, cleanup the error handling in
xe_device_probe() by moving the OA fini to be handled by xe_oa
itself, which relies on devm to call the cleanup function.

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250213192909.996148-9-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-02-14 11:42:55 -08:00
Lucas De Marchi
f5ebe80e32 drm/xe: Cleanup extra calls to xe_hw_fence_irq_finish()
Now that xe_gt_remove is handled entirely by xe_gt, it's clear there are
some extra calls to xe_hw_fence_irq_finish() that aren't necessary.
Neither all_fw_domain_init() or gt_fw_domain_init() need to do that
since it's handled by the caller on any error.

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250213192909.996148-8-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-02-14 11:42:55 -08:00
Lucas De Marchi
ff6cd29b69 drm/xe: Cleanup unwind of gt initialization
The only thing in xe_gt_remove() that really needs to happen on the
device remove callback is the xe_uc_remove(). That's because of the
following call chain:

	xe_gt_remove()
	  xe_uc_remove()
	    xe_gsc_remove()
	      xe_gsc_proxy_remove()

Move xe_gsc_proxy_remove() to be handled as a xe_device_remove_action,
so it's recorded when it should run during device removal. The rest can
be handled normally by devm infra.

Besides removing the deep call chain above, xe_device_probe() doesn't
have to unwind the gt loop and it's also more in line with the
xe_device_probe() style.

Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250213192909.996148-7-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-02-14 11:42:55 -08:00
Lucas De Marchi
c0aeb90b28 drm/xe: Remove leftover pxp comment
Not being able to initialize pxp is fatal if the platform is expected to
have it. Update comment after commit 9c9dc9ba4a ("drm/xe/pxp: Fail the
load if PXP fails to initialize").

Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250213192909.996148-6-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-02-14 11:42:55 -08:00
Lucas De Marchi
ff57025c35 drm/xe: Stop ignoring errors from xe_ttm_stolen_mgr_init()
Make sure to differentiate normal behavior, e.g. there's no stolen, from
allocation errors or failure to initialize lower layers.

Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250213192909.996148-5-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-02-14 11:42:54 -08:00
Lucas De Marchi
0bcf41171c drm/xe: Fix xe_tile_init_noalloc() error propagation
Propagate the error to the caller so initialization properly stops if
sysfs creation fails.

Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250213192909.996148-4-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-02-14 11:42:54 -08:00
Lucas De Marchi
121b214cdf drm/xe: Fix error handling in xe_irq_install()
When devm_add_action_or_reset() fails, it already calls the function
passed as parameter and that function is already free'ing the irqs.
Drop the goto and just return.

The caller, xe_device_probe(), should also do the same thing instead of
wrongly doing `goto err` and calling the unrelated xe_display_fini()
function.

Fixes: 14d25d8d68 ("drm/xe: change old msi irq api to a new one")
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250213192909.996148-3-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-02-14 11:42:54 -08:00
Lucas De Marchi
8b3f09fb44 drm/xe: Fix xe_display_fini() calls
xe_display_fini() undoes things from xe_display_init() (technically from
intel_display_driver_probe()). Those `goto err` in xe_device_probe()
were wrong and being accumulated over time.

Commit 65e366ace5 ("drm/xe/display: Use a single early init call for
display") made it easier to fix now that we don't have xe_display_* init
calls spread on xe_device_probe(). Change xe_display_init() to use
devm_add_action_or_reset() that will finalize display in the right
order.

While at it, also add a newline and comment about calling
xe_driver_flr_fini.

Cc: Maarten Lankhorst <dev@lankhorst.se>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250213192909.996148-2-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-02-14 11:42:54 -08:00
Lucas De Marchi
776e3b502b drm/xe: Add callback support for driver remove
xe device probe uses devm cleanup in most places. However there are a
few cases where this is not possible: when the driver interacts with
component add/del. In that case, the resource group would be cleanup
while the entire device resources are in the process of cleanup.  One
example is the xe_gsc_proxy and display using that to interact with mei
and audio.

Add a callback-based remove so the exception doesn't make the probe
use multiple error handling styles.

v2: Change internal API to mimic the devm API. This will make it easier
    to migrate in future when devm can be used.

Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250213192909.996148-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-02-14 11:39:29 -08:00
Xin Wang
6884d20510 drm/xe/debugfs: fixed the return value of wedged_mode_set
It is generally expected that the write() function should return a
positive value indicating the number of bytes written or a negative
error code if an error occurs. Returning 0 is unusual and can lead
to unexpected behavior.

When the user program writes the same value to wedged_mode twice in
a row, a lockup will occur, because the value expected to be
returned by the write() function inside the program should be equal
to the actual written value instead of 0.

To reproduce the issue:
echo 1 > /sys/kernel/debug/dri/0/wedged_mode
echo 1 > /sys/kernel/debug/dri/0/wedged_mode   <- lockup here

Signed-off-by: Xin Wang <x.wang@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Cc: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250213223615.2327367-1-x.wang@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-02-14 10:08:16 -05:00
Shuicheng Lin
b31e668d31 drm/xe/debugfs: Add missing xe_pm_runtime_put in wedge_mode_set
xe_pm_runtime_put is missed in the failure path.

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250213230322.1180621-1-shuicheng.lin@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-02-14 10:07:36 -05:00
Rodrigo Vivi
1ed591582b drm/xe/display: Remove hpd cancel work sync from runtime pm path
This function will synchronously cancel and wait for many display
work queue items, which might try to take the runtime pm reference
causing a bad deadlock. So, remove it from the runtime_pm suspend patch.

Reported-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250212192447.402715-1-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-02-13 17:45:43 -05:00
Rodrigo Vivi
b7446752e5 drm/xe/display: Add missing watermark ipc update at runtime resume
Continuing the alignment with i915 runtime pm sequence. Add
this missing call.

Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250131115014.29625-1-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-02-12 15:03:07 -05:00
Lucas De Marchi
1d3ae92191 drm/xe/debugfs: Add node to dump guc log to dmesg
Currently xe_guc_log_print_dmesg() is unused, as it's expected
developers to add those calls when needed. However it makes it hard to
guarantee it's working as nothing is testing it. Add a node in debugfs
so it can be tested. This is purely for testing purposes since with the
device probed and working, the guc log can be obtained by the regular
debugfs file.

Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250131171716.3998432-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-02-11 19:41:30 -08:00
Daniele Ceraolo Spurio
768fec5ff7 drm/xe/pxp: Don't use 0 to indicate NULL
Explicitly using NULL is the correct approach.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202502050322.VUBMyUHc-lkp@intel.com/
Fixes: dcdd6b84d9 ("drm/xe/pxp: Allocate PXP execution resources")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250204200144.1445800-1-daniele.ceraolospurio@intel.com
2025-02-11 13:17:48 -08:00
Nirmoy Das
2c7f45cc7e drm/xe: Carve out wopcm portion from the stolen memory
The top of stolen memory is WOPCM, which shouldn't be accessed. Remove
this portion from the stolen memory region for discrete platforms.
This was already done for integrated, but was missing for discrete
platforms.

This also moves get_wopcm_size() so detect_bar2_dgfx() and
detect_bar2_integrated can use the same function.

v2: Improve commit message and suitable stable version tag(Lucas)

Fixes: d8b52a02cb ("drm/xe: Implement stolen memory.")
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: stable@vger.kernel.org # v6.11+
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250210143654.2076747-1-nirmoy.das@intel.com
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
2025-02-11 10:34:11 +01:00
Tejas Upadhyay
f74fd53ba3 drm/xe/client: bo->client does not need bos_lock
bos_lock is to protect list of bos used by client, it is
not required to protect bo->client so bring it outside of
bos_lock.

Fixes: b27970f3e1 ("drm/xe: Add tracking support for bos per client")
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250205051042.1991192-1-tejas.upadhyay@intel.com
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
2025-02-10 14:06:07 +01:00
Piotr Piórkowski
71163271dc drm/xe: Move VRAM manager to struct xe_vram_region
VRAM manager is related directly to struct xe_vram_region so it
should be inside this structure.
Let's move the VRAM to struct xe_vram_region.

v2:
 - remove xe_vram_region pointer from xe_ttm_vram_mgr
 - stop use dynamic alloaction for xe_ttm_vram_mgr in xe_vram_region
 - rename struct xe_ttm_vram_mgr vram_mgr to ttm
v3:
 - fix "'ttm' not described in 'xe_vram_region'"

Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250210081511.906452-3-piotr.piorkowski@intel.com
2025-02-10 13:08:59 +01:00
Piotr Piórkowski
fc3a50c12e drm/xe: Rename struct xe_mem_region to struct xe_vram_region
The xe_mem_region structure has so far been used only in the context
of VRAM regions. Also, the description of its fields clearly indicates
that it was designed for VRAM regions. This struct is strictly related
only to VRAM.
So let's be clear on this point and rename it to xe_vram_region.

Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250210081511.906452-2-piotr.piorkowski@intel.com
2025-02-10 13:08:57 +01:00
Piotr Piórkowski
cbc0a0ee34 drm/xe/pf: Use an explicit check to see if the device has LMTT
So far, the main condition for using LMTT has been to check that
the device is a discrete gfx.
Let's add a dedicated function to check if the device supports LMTT
as not all future discrete GPU platforms will require LMTT.

v2:
 - use xe_has_device_lmtt only when necessary - leave IS_DGFX for other
   things related to LMEM provisioning
v3:
 - remove IS_SRIOV_PF condition from xe_device_has_lmtt (Michal
   Wajdeczko)
 - keep IS_SRIOV_PF asserts in LMTT-related code (Michal Wajdeczko)
v4:
 - update commit description

Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250207113111.853821-2-piotr.piorkowski@intel.com
2025-02-10 11:10:14 +01:00
Michal Wajdeczko
6bb05b3631 drm/xe: Enable SR-IOV for PTL
We should now have sufficient changes in the driver to run it on
PTL platforms in the SR-IOV Physical Function (PF) mode, that would
allow us to enable SR-IOV Virtual Functions (VFs), and successfully
probe our driver in the VF mode on enabled VF devices.

To unblock SR-IOV PF mode you need to load xe with modparam:

 xe.max_vfs=7

Then to enable VFs it is sufficient to use:

 $ echo 7 > /sys/bus/pci/devices/0000:00:02.0/sriov_numvfs

Note that in default auto-provisioning all VFs are allocated with
some amount of shared resources (like unlimited GPU execution and
preemption times, fair GGTT space, fair GuC context IDs range, ...)

However with CONFIG_DEBUG_FS enabled it is possible to tweak most
of the SR-IOV configuration parameters using attributes like:

 /sys/kernel/debug/dri/0000:00:02.0/gt0/
 ├── pf
 │   ├── contexts_spare
 │   ├── doorbells_spare
 │   ├── exec_quantum_ms
 │   ├── ggtt_spare
 │   ├── preempt_timeout_us
 │   ├── sched_priority
 │   └── ...
 ├── vf1
 │   ├── contexts_quota
 │   ├── doorbells_quota
 │   ├── exec_quantum_ms
 │   ├── ggtt_quota
 │   ├── preempt_timeout_us
 │   ├── sched_priority
 │   └── ...
 ├── vf2
 │   └── ...
 :

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Jakub Kolakowski <jakub1.kolakowski@intel.com>
Tested-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
Reviewed-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250206214545.940-1-michal.wajdeczko@intel.com
2025-02-07 18:32:12 +01:00