linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-03 20:34:23 -04:00

Author	SHA1	Message	Date
Aradhya Bhatia	eef3ede533	drm/xe/oa: Refactor WAs to use XE_WA() macro Refactor Wa_18013179988, Wa_14015568240, Wa_1508761755, and Wa_1509372804, to use the proper workaround-check implementation for out-of-band workarounds, XE_WA(), and drop the use of the platform based WA selection. Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Aradhya Bhatia <aradhya.bhatia@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250220094645.358647-3-aradhya.bhatia@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-02-26 07:50:11 -08:00
Aradhya Bhatia	8c5fe7d88b	drm/xe: Add Wa_16021333562 and Wa_14016712196 Wa_16021333562 and Wa_14016712196 are permanent workarounds that apply to multiple platforms. Wa_16021333562 applies to platforms ranging from TGL (12.00) to Xe_LPM (13.00), while Wa_14016712196 from DG2 (12.55) to Xe_LPG (12.74). Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Signed-off-by: Aradhya Bhatia <aradhya.bhatia@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250220094645.358647-2-aradhya.bhatia@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-02-26 07:50:11 -08:00
Francois Dugast	278d4f4291	drm/xe/gt_pagefault: Change vma_pagefault unit to kilobyte Increase the amount of bytes that can be counted before the counter overflows, while not losing information as the VMA is not expected to have sub-kilobyte size. Suggested-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250225195902.1247100-3-francois.dugast@intel.com Signed-off-by: Francois Dugast <francois.dugast@intel.com>	2025-02-26 11:01:00 +01:00
Francois Dugast	4f109b061c	drm/xe/gt_stats: Use atomic64_t for counters The stats counters are now used for things like counting the VMA bytes during page faults. During workload execution, the counter value can grow fast and easily reach the atomic int limit, in which case it overflows. To make this less likely to happen, push the limit by switching to 64b atomic to store the counter value. Overhead is very small as there are only 3 stat entries per GT as of now, and stats are only enabled with CONFIG_DEBUG_FS. Suggested-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250225195902.1247100-2-francois.dugast@intel.com Signed-off-by: Francois Dugast <francois.dugast@intel.com>	2025-02-26 11:01:00 +01:00
Tejas Upadhyay	18fbd567e7	drm/xe: cancel pending job timer before freeing scheduler The async call to __guc_exec_queue_fini_async frees the scheduler while a submission may time out and restart. To prevent this race condition, the pending job timer should be canceled before freeing the scheduler. V3(MattB): - Adjust position of cancel pending job - Remove gitlab issue# from commit message V2(MattB): - Cancel pending jobs before scheduler finish Fixes: `a20c75dba1` ("drm/xe: Call __guc_exec_queue_fini_async direct for KERNEL exec_queues") Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250225045754.600905-1-tejas.upadhyay@intel.com Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>	2025-02-26 10:59:17 +05:30
Mingcong Bai	6b68c4542f	drm/xe/regs: remove a duplicate definition for RING_CTL_SIZE(size) Commit `b79e8fd954` ("drm/xe: Remove dependency on intel_engine_regs.h") introduced an internal set of engine registers, however, as part of this change, it has also introduced two duplicate `define' lines for `RING_CTL_SIZE(size)'. This commit was introduced to the tree in v6.8-rc1. While this is harmless as the definitions did not change, so no compiler warning was observed. Drop this line anyway for the sake of correctness. Cc: stable@vger.kernel.org # v6.8-rc1+ Fixes: `b79e8fd954` ("drm/xe: Remove dependency on intel_engine_regs.h") Signed-off-by: Mingcong Bai <jeffbai@aosc.io> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250225073104.865230-1-jeffbai@aosc.io Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-02-25 21:17:38 -05:00
Lucas De Marchi	35359c3635	drm/xe: Stop ignoring errors from xe_ttm_sys_mgr_init() xe_ttm_sys_mgr_init() already cleans up after itself, just return error if that failed. Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250222001051.3012936-12-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-02-25 14:32:03 -08:00
Lucas De Marchi	1671c9617d	drm/xe: Rename update_device_info() after sriov This is only changing info flags for SR-IOV reasons. Rename it accordingly, because there are several other places in probe where the flags are updated, which is not inside this function. Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250222001051.3012936-11-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-02-25 14:32:03 -08:00
Lucas De Marchi	292b1a8a50	drm/xe: Stop ignoring errors from xe_heci_gsc_init() Do not ignore errors from xe_heci_gsc_init(). For example, it shouldn't be fine to report successfully entering survivability mode when there's no communication with gsc working. The driver should also not be half-initialized in the normal case neither. Cc: Riana Tauro <riana.tauro@intel.com> Cc: Alexander Usyskin <alexander.usyskin@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250222001051.3012936-10-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-02-25 14:32:03 -08:00
Lucas De Marchi	d40f275d96	drm/xe: Move survivability entirely to xe_pci There's an odd split between xe_pci.c and xe_device.c wrt xe_survivability: it's initialized by xe_device, but then finalized by xe_pci. Move it entirely to the outer layer, xe_pci, so it controls the flow entirely. This also allows to stop ignoring some of the errors. E.g.: if there's an -ENOMEM, it shouldn't continue as if it survivability had been enabled. One change worth mentioning is that if "wait for lmem" fails, it will also check the pcode status to decide if it should enter or not in survivability mode, which it was not doing before. The bit from pcode for that decision should remain the same after lmem failed initialization, so it should be fine. Cc: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Riana Tauro <riana.tauro@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250222001051.3012936-9-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-02-25 14:32:03 -08:00
Lucas De Marchi	d41d048043	drm/xe/display: Drop xe_display_driver_remove() Handle it as part of xe_display_fini(). The error handling was already calling it if a step after xe_display_init() failed. Just re-use the same xe_display_fini() for driver remove. Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250222001051.3012936-8-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-02-25 14:32:02 -08:00
Lucas De Marchi	d01bdc0025	drm/xe: Drop remove callback support Now that devres supports component driver cleanup during driver removal cleanup, the xe custom support for removal callbacks is not needed anymore. Drop it. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250222001051.3012936-7-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-02-25 14:29:06 -08:00
Lucas De Marchi	01b1ace3b4	drm/xe: Switch from xe to devm actions Now that component drivers are compatible with devm, switch to using it instead of our own. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250222001051.3012936-6-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-02-25 14:29:06 -08:00
Lucas De Marchi	83e3d08767	drm/xe: Stop setting drvdata to NULL PCI subsystem is not supposed to call the remove() function when probe fails and doesn't need a protection for that. The only places checking for NULL drvdata, is on 2 sysfs files and they shouldn't be needed since the files are removed and reads on open fds just return an error. For this protection the core driver implementation in drivers/base/dd.c:device_unbind_cleanup() already sets it to NULL, after the release of dev resources. Remove the setting to NULL so it's possible to obtain the xe pointer from callbacks like the component unbind from device_unbind_cleanup(), i.e. after xe_pci_remove() already finished. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250222001051.3012936-5-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-02-25 14:29:06 -08:00
Lucas De Marchi	2babfdfe2e	drivers: base: component: Add debug message for unbind Like when binding component, add a debug message to the unbinding case to make it easy to track the lifecycle. This also includes the component pointer since that is used to open a group in devres, making it easier to track the resources. Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250222001051.3012936-4-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-02-25 14:29:05 -08:00
Lucas De Marchi	96d01ef3b1	drivers: base: devres: Fix find_group() documentation It returns the last open group, not the last group. Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250222001051.3012936-3-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-02-25 14:29:05 -08:00
Lucas De Marchi	8e1ddfada4	drivers: base: devres: Allow to release group on device release When releasing a device, if the release action causes a group to be released, a warning is emitted because it can't find the group. This happens because devres_release_all() moves the entire list to a todo list and also move the group markers. Considering r* normal resource nodes and g1 a group resource node: g1 -----------. v v r1 -> r2 -> g1[0] -> r3-> g[1] -> r4 After devres_release_all(), dev->devres_head becomes empty and the todo list it iterates on becomes: g1 v r1 -> r2 -> r3-> r4 -> g1[0] When a call to component_del() is made and takes down the aggregate device, a warning like this happen: RIP: 0010:devres_release_group+0x362/0x530 ... Call Trace: <TASK> component_unbind+0x156/0x380 component_unbind_all+0x1d0/0x270 mei_component_master_unbind+0x28/0x80 [mei_hdcp] take_down_aggregate_device+0xc1/0x160 component_del+0x1c6/0x3e0 intel_hdcp_component_fini+0xf1/0x170 [xe] xe_display_fini+0x1e/0x40 [xe] Because the devres group corresponding to the hdcp component cannot be found. Just ignore this corner case: if the dev->devres_head is empty and the caller is trying to remove a group, it's likely in the process of device cleanup so just ignore it instead of warning. Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250222001051.3012936-2-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-02-25 14:29:05 -08:00
Umesh Nerlige Ramappa	30341f0b8e	drm/xe/oa: Allow oa_exponent value of 0 OA exponent value of 0 is a valid value for periodic reports. Allow user to pass 0 for the OA sampling interval since it gets converted to 2 gt clock ticks. v2: Update the check in xe_oa_stream_init as well (Ashutosh) v3: Fix mi-rpc failure by setting default exponent to -1 (CI) v4: Add the Fixes tag Fixes: `b6fd51c621` ("drm/xe/oa/uapi: Define and parse OA stream properties") Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250221213352.1712932-1-umesh.nerlige.ramappa@intel.com	2025-02-24 13:44:34 -08:00
Shuicheng Lin	046eda6525	drm/xe/devcoredump: Remove IS_ERR_OR_NULL check for kzalloc kzalloc returns a valid pointer or NULL if the allocation fails. It never returns an error pointer. It is better to check for NULL directly. Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250220001710.1803749-3-shuicheng.lin@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-02-24 13:31:35 -08:00
Shuicheng Lin	c504ad914f	drm/xe/devcoredump: Fix print typo of offset The log should print with "offset" instead of "size". Correct the typo in the comment. v2: split kzalloc change and add typo fix in commit message (Lucas) Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250220001710.1803749-2-shuicheng.lin@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-02-24 13:31:12 -08:00
Riana Tauro	c7f2b8bfca	drm/xe/xe_pmu: Acquire forcewake on event init for engine events When the engine events are created, acquire GT forcewake to read gpm timestamp required for the events and release on event destroy. This cannot be done during read due to the raw spinlock held my pmu. v2: remove forcewake counting (Umesh) v3: remove extra space (Umesh) v4: use event pmu private data (Lucas) free local copy (Umesh) Signed-off-by: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250224053903.2253539-6-riana.tauro@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-02-24 13:24:32 -08:00
Riana Tauro	6978c5f5a6	drm/xe/xe_pmu: Add PMU support for engine activity PMU provides two counters (engine-active-ticks, engine-total-ticks) to calculate engine activity. When querying engine activity, user must group these 2 counters using the perf_event group mechanism to ensure both counters are sampled together. To list the events ./perf list xe_0000_03_00.0/engine-active-ticks/ [Kernel PMU event] xe_0000_03_00.0/engine-total-ticks/ [Kernel PMU event] The formats to be used with the above are engine_instance - config:12-19 engine_class - config:20-27 gt - config:60-63 The events can then be read using perf tool ./perf stat -e xe_0000_03_00.0/engine-active-ticks,gt=0, engine_class=0,engine_instance=0/, xe_0000_03_00.0/engine-total-ticks,gt=0, engine_class=0,engine_instance=0/ -I 1000 Engine activity can then be calculated as below engine activity % = (engine active ticks/engine total ticks) * 100 v2: validate gt rename total-ticks to engine-total-ticks add helper to get hwe (Umesh) v3: fix checkpatch warning add details to documentation (Umesh) remove ascii formats from documentation (Lucas) v4: remove unnecessary warn within raw_spinlock (Lucas) Signed-off-by: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250224053903.2253539-5-riana.tauro@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-02-24 13:23:57 -08:00
Riana Tauro	0e6ffdb2b7	drm/xe/guc: Expose engine activity only for supported GuC version Engine activity is supported only on GuC submission version >= 1.14.1 Allow enabling/reading engine activity only on supported GuC versions. Warn once if not supported. v2: use guc interface version (John) v3: use debug log (Umesh) v4: use variable for supported and use gt logs use a friendlier log message (Michal) v5: fix kernel-doc do not continue in init if not supported (Michal) v6: remove hardcoding values (Michal) Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250224053903.2253539-4-riana.tauro@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-02-24 12:32:10 -08:00
Riana Tauro	9e19f42955	drm/xe/trace: Add trace for engine activity Add engine activity related information to trace events for better debuggability v2: add trace for engine activity (Umesh) v3: use hex for quanta_ratio Signed-off-by: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250224053903.2253539-3-riana.tauro@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-02-24 12:32:09 -08:00
Riana Tauro	b729ea271e	drm/xe: Add engine activity support GuC provides support to read engine counters to calculate the engine activity. KMD exposes two counters via the PMU interface to calculate engine activity Engine Active Ticks(engine-active-ticks) - active ticks of engine Engine Total Ticks (engine-total-ticks) - total ticks of engine Engine activity percentage can be calculated as below Engine activity % = (engine active ticks/engine total ticks) * 100. v2: fix cosmetic review comments add forcewake for gpm_ts (Umesh) v3: fix CI hooks error change function parameters and unpin bo on error of allocate_activity_buffers fix kernel-doc (Umesh) use engine activity (Umesh, Lucas) rename xe_engine_activity to xe_guc_engine_* fix commit message to use engine activity (Lucas, Umesh) v4: add forcewake in PMU layer v5: fix makefile use drmm_kcalloc instead of kmalloc_array remove managed bo skip init for VF fix cosmetic review comments (Michal) Signed-off-by: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250224053903.2253539-2-riana.tauro@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-02-24 12:32:09 -08:00
Matthew Auld	8b4b3af869	drm/xe/userptr: remove tmp_evict list Doesn't look to be used. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250221143840.167150-6-matthew.auld@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-02-24 13:03:03 -06:00
Matthew Auld	6b93cb9891	drm/xe/userptr: fix EFAULT handling Currently we treat EFAULT from hmm_range_fault() as a non-fatal error when called from xe_vm_userptr_pin() with the idea that we want to avoid killing the entire vm and chucking an error, under the assumption that the user just did an unmap or something, and has no intention of actually touching that memory from the GPU. At this point we have already zapped the PTEs so any access should generate a page fault, and if the pin fails there also it will then become fatal. However it looks like it's possible for the userptr vma to still be on the rebind list in preempt_rebind_work_func(), if we had to retry the pin again due to something happening in the caller before we did the rebind step, but in the meantime needing to re-validate the userptr and this time hitting the EFAULT. This explains an internal user report of hitting: [ 191.738349] WARNING: CPU: 1 PID: 157 at drivers/gpu/drm/xe/xe_res_cursor.h:158 xe_pt_stage_bind.constprop.0+0x60a/0x6b0 [xe] [ 191.738551] Workqueue: xe-ordered-wq preempt_rebind_work_func [xe] [ 191.738616] RIP: 0010:xe_pt_stage_bind.constprop.0+0x60a/0x6b0 [xe] [ 191.738690] Call Trace: [ 191.738692] <TASK> [ 191.738694] ? show_regs+0x69/0x80 [ 191.738698] ? __warn+0x93/0x1a0 [ 191.738703] ? xe_pt_stage_bind.constprop.0+0x60a/0x6b0 [xe] [ 191.738759] ? report_bug+0x18f/0x1a0 [ 191.738764] ? handle_bug+0x63/0xa0 [ 191.738767] ? exc_invalid_op+0x19/0x70 [ 191.738770] ? asm_exc_invalid_op+0x1b/0x20 [ 191.738777] ? xe_pt_stage_bind.constprop.0+0x60a/0x6b0 [xe] [ 191.738834] ? ret_from_fork_asm+0x1a/0x30 [ 191.738849] bind_op_prepare+0x105/0x7b0 [xe] [ 191.738906] ? dma_resv_reserve_fences+0x301/0x380 [ 191.738912] xe_pt_update_ops_prepare+0x28c/0x4b0 [xe] [ 191.738966] ? kmemleak_alloc+0x4b/0x80 [ 191.738973] ops_execute+0x188/0x9d0 [xe] [ 191.739036] xe_vm_rebind+0x4ce/0x5a0 [xe] [ 191.739098] ? trace_hardirqs_on+0x4d/0x60 [ 191.739112] preempt_rebind_work_func+0x76f/0xd00 [xe] Followed by NPD, when running some workload, since the sg was never actually populated but the vma is still marked for rebind when it should be skipped for this special EFAULT case. This is confirmed to fix the user report. v2 (MattB): - Move earlier. v3 (MattB): - Update the commit message to make it clear that this indeed fixes the issue. Fixes: `521db22a1d` ("drm/xe: Invalidate userptr VMA on page pin fault") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: <stable@vger.kernel.org> # v6.10+ Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250221143840.167150-5-matthew.auld@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-02-24 13:02:54 -06:00
Matthew Auld	4e37e92892	drm/xe/userptr: restore invalidation list on error On error restore anything still on the pin_list back to the invalidation list on error. For the actual pin, so long as the vma is tracked on either list it should get picked up on the next pin, however it looks possible for the vma to get nuked but still be present on this per vm pin_list leading to corruption. An alternative might be then to instead just remove the link when destroying the vma. v2: - Also add some asserts. - Keep the overzealous locking so that we are consistent with the docs; updating the docs and related bits will be done as a follow up. Fixes: `ed2bdf3b26` ("drm/xe/vm: Subclass userptr vmas") Suggested-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250221143840.167150-4-matthew.auld@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-02-24 13:02:33 -06:00
Tejas Upadhyay	b7b68c6e36	drm/xe/wa: Limit char per line to 100 Above 100 char per line checkpatch would complain. Fixing it. Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250221115344.389975-1-tejas.upadhyay@intel.com Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>	2025-02-24 12:10:10 +05:30
Umesh Nerlige Ramappa	98c9d27ab3	drm/xe/oa: Ensure that polled read returns latest data In polled mode, user calls poll() for read data to be available before performing a read(). In the duration between these 2 calls, there may be new data available in the OA buffer. To ensure user reads all available data, check for latest data in the OA buffer in polled read. Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250212010255.1423343-1-umesh.nerlige.ramappa@intel.com	2025-02-21 11:45:29 -08:00
Priyanka Dandamudi	70c7273778	drm/xe: Add fault injection for xe_sync_entry_parse Add fault injection for xe_sync_entry_parse to allow it to fail while executing xe_vm_bind_ioctl(). This needs to be added as it cannot be reached by injecting error through IOCTL arguments. Signed-off-by: Priyanka Dandamudi <priyanka.dandamudi@intel.com> Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250212093212.3069356-1-priyanka.dandamudi@intel.com Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>	2025-02-21 13:02:54 +05:30
Marcin Bernatowicz	94030a1d32	drm/xe/client: Skip show_run_ticks if unable to read timestamp RING_TIMESTAMP registers are inaccessible in VF mode. Without drm-total-cycles-*, other keys provide little value. Skip all optional "run_ticks" keys in this case. Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250205191644.2550879-3-marcin.bernatowicz@linux.intel.com	2025-02-20 13:49:08 +01:00
Marcin Bernatowicz	5a9f8db2db	drm/xe/vf: Return EOPNOTSUPP for DRM_XE_DEVICE_QUERY_ENGINE_CYCLES if VF RING_TIMESTAMP registers are not available for VF (Virtual Function) drivers. Return -EOPNOTSUPP when the DRM_XE_DEVICE_QUERY_ENGINE_CYCLES ioctl is invoked on a VF device. Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250205191644.2550879-2-marcin.bernatowicz@linux.intel.com	2025-02-20 13:49:06 +01:00
Matt Roper	a1e5b6d83e	drm/xe: Drop unnecessary GT lookup in xe_exec_queue_create_ioctl() xe_exec_queue_create_ioctl() performs a lookup of the xe_gt for the GT ID passed from userspace, but the result is never actually used. Since there's already a separate (and earlier) check that the ID passed from userspace is valid, the unnecessary lookup can be removed. Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250218200511.4050060-2-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2025-02-19 07:14:59 -08:00
Rodrigo Vivi	f2cd50990d	drm/xe/display: Spin-off xe_display runtime/d3cold sequences No functional change. This patch only splits the xe_display_pm suspend/resume functions in the regular suspend/resume from the runtime/d3cold ones. v2: - Rename d3cold functions (Jonathan) - Rebase Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250218010330.761340-1-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-02-18 18:03:47 -05:00
Rodrigo Vivi	ceb33b9de1	drm/{i915, xe}/display: Move dsm registration under intel_driver Move dsm register/unregister calls from the drivers to under intel_display_driver register/unregister. v2: Rebase only Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250217200133.741758-1-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-02-18 18:03:18 -05:00
Ilia Levi	eb79d71e50	drm/xe: Add xe_mmio_init() initialization function Add a convenience function for minimal initialization of struct xe_mmio. This function also validates that the entirety of the provided mmio region is usable with struct xe_reg. v2: Modify commit message, add kernel doc, refactor assert (Michal) v3: Fix off-by-one bug, add clarifying macro (Michal) v4: Derive bitfield width from size (Michal) Signed-off-by: Ilia Levi <ilia.levi@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250213093559.204652-1-ilia.levi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-02-18 08:27:11 -08:00
Ilia Levi	5bee1e2de3	drm/xe: s/xe_mmio_init/xe_mmio_probe_early Rename so that xe_mmio_init() can be used in subsequent patches to initialize an instance of struct xe_mmio. Signed-off-by: Ilia Levi <ilia.levi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250130105057.136586-1-ilia.levi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-02-18 08:26:41 -08:00
Maarten Lankhorst	339adeb104	drm/xe/display: Clarify XE_IOCTL_DBG message This should make it easier to understand from userspace why importing BO fails. Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250117115305.53113-1-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>	2025-02-17 09:32:12 +01:00
Tejas Upadhyay	b5fa0913b5	drm/xe: Fix typo in xe_job_ptrs %s/uinitialized/uninitialized/gc Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250213060838.32493-1-tejas.upadhyay@intel.com Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>	2025-02-17 10:52:01 +05:30
Michal Wajdeczko	611160b02a	drm/xe/pf: Release all VFs configs on device removal If we try to manually provision VFs using debugfs and then we try to unload the driver, we will see complains like: [ ] Memory manager not clean during takedown. [ ] RIP: 0010:drm_mm_takedown+0x3f/0x100 [ ] [drm:drm_mm_takedown] ERROR node [fedff000 + 00001000]: inserted at drm_mm_insert_node_in_range+0x2bd/0x520 xe_ggtt_node_insert+0x52/0x90 [xe] pf_provision_vf_ggtt+0x1fa/0xac0 [xe] xe_gt_sriov_pf_config_set_ggtt+0x79/0x7a0 [xe] ggtt_set+0x53/0x80 [xe] simple_attr_write_xsigned.isra.0+0xd2/0x150 simple_attr_write+0x14/0x30 debugfs_attr_write+0x4e/0x80 [ ] xe 0000:00:02.0: [drm] ERROR GT0: GUC ID manager unclean (1/65535) [ ] xe 0000:00:02.0: [drm] GT0: total 65535 [ ] xe 0000:00:02.0: [drm] GT0: used 1 [ ] xe 0000:00:02.0: [drm] GT0: range 65534..65534 (1) [ ] xe 0000:00:02.0: [drm] ERROR GT0: GuC doorbells manager unclean (1/256) [ ] xe 0000:00:02.0: [drm] GT0: count: 256 [ ] xe 0000:00:02.0: [drm] GT0: available range: 1..255 (255) [ ] xe 0000:00:02.0: [drm] GT0: available total: 255 [ ] xe 0000:00:02.0: [drm] GT0: reserved range: 0..0 (1) [ ] xe 0000:00:02.0: [drm] GT0: reserved total: 1 This could be easily fixed by adding config release action. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Piotr Piórkowski <piotr.piorkowski@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250211155034.1028-1-michal.wajdeczko@intel.com	2025-02-16 21:36:38 +01:00
Lucas De Marchi	62fbc75b28	drm/xe/hwmon: Stop ignoring errors on probe Not registering hwmon because it's not available (SRIOV_VF and DGFX) is different from failing the initialization. Handle the errors appropriately. Cc: Badal Nilawar <badal.nilawar@intel.com> Cc: Karthik Poosa <karthik.poosa@intel.com> Reviewed-by: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250213192909.996148-13-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-02-14 11:42:55 -08:00
Lucas De Marchi	6b5506158f	drm/xe/pmu: Fail probe if xe_pmu_register() fails Now that previous callers in xe_device_probe() are handling the errors, that can be done for xe_pmu_register() as well. Cc: Riana Tauro <riana.tauro@intel.com> Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250213192909.996148-12-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-02-14 11:42:55 -08:00
Lucas De Marchi	960d71044e	drm/xe/oa: Handle errors in xe_oa_register() Let xe_oa_unregister() be handled by devm infra since it's only putting the kobject. Also, since kobject_create_and_add may fail, handle the error accordingly. Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250213192909.996148-11-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-02-14 11:42:55 -08:00
Lucas De Marchi	00f6a86c3c	drm/xe: Move drm_dev_unplug() out of display function This is not really display-related and needed for any sequence on driver removal that has to interact with drm_dev_enter()/drm_dev_exit(). Just remove xe_device_remove_display() and inline it in the single caller to make clear this is not done only for display. Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250213192909.996148-10-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-02-14 11:42:55 -08:00
Lucas De Marchi	d3f557d52e	drm/xe/oa: Move fini to xe_oa Like done with other functions, cleanup the error handling in xe_device_probe() by moving the OA fini to be handled by xe_oa itself, which relies on devm to call the cleanup function. Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250213192909.996148-9-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-02-14 11:42:55 -08:00
Lucas De Marchi	f5ebe80e32	drm/xe: Cleanup extra calls to xe_hw_fence_irq_finish() Now that xe_gt_remove is handled entirely by xe_gt, it's clear there are some extra calls to xe_hw_fence_irq_finish() that aren't necessary. Neither all_fw_domain_init() or gt_fw_domain_init() need to do that since it's handled by the caller on any error. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250213192909.996148-8-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-02-14 11:42:55 -08:00
Lucas De Marchi	ff6cd29b69	drm/xe: Cleanup unwind of gt initialization The only thing in xe_gt_remove() that really needs to happen on the device remove callback is the xe_uc_remove(). That's because of the following call chain: xe_gt_remove() xe_uc_remove() xe_gsc_remove() xe_gsc_proxy_remove() Move xe_gsc_proxy_remove() to be handled as a xe_device_remove_action, so it's recorded when it should run during device removal. The rest can be handled normally by devm infra. Besides removing the deep call chain above, xe_device_probe() doesn't have to unwind the gt loop and it's also more in line with the xe_device_probe() style. Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250213192909.996148-7-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-02-14 11:42:55 -08:00
Lucas De Marchi	c0aeb90b28	drm/xe: Remove leftover pxp comment Not being able to initialize pxp is fatal if the platform is expected to have it. Update comment after commit `9c9dc9ba4a` ("drm/xe/pxp: Fail the load if PXP fails to initialize"). Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250213192909.996148-6-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-02-14 11:42:55 -08:00
Lucas De Marchi	ff57025c35	drm/xe: Stop ignoring errors from xe_ttm_stolen_mgr_init() Make sure to differentiate normal behavior, e.g. there's no stolen, from allocation errors or failure to initialize lower layers. Reviewed-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250213192909.996148-5-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-02-14 11:42:54 -08:00

1 2 3 4 5 ...

1326733 Commits