linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-01 01:14:19 -04:00

Author	SHA1	Message	Date
John Harrison	fa171d49e4	drm/xe/guc: Fix uninitialised count in GuC load debug prints The debug prints about how long the GuC load takes have a loop counter. However that was neither initialised nor incremented! Plus, counting loops is no longer meaningful given the wait function returns early for any change in the status value. So fix it to only count loops due to actual timeouts. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202405250151.IbH0l8FG-lkp@intel.com/ Fixes: `b0ac1b42db` ("drm/xe/guc: Port over the slow GuC loading support from i915") Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Oded Gabbay <ogabbay@kernel.org> Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Fei Yang <fei.yang@intel.com> Cc: intel-xe@lists.freedesktop.org Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240524202603.4011656-1-John.C.Harrison@Intel.com	2024-05-28 14:53:09 -07:00
Oded Gabbay	8de6625daf	MAINTAINERS: update Xe driver maintainers Because I left Intel, I'm removing myself from the list of Xe driver maintainers. Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Acked-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240515162222.12958-3-ogabbay@kernel.org Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-05-28 13:40:24 -07:00
Riana Tauro	38e8c4184e	drm/xe: Enable Coarse Power Gating Enable power gating for all units and sub-pipes that are disabled by default. v2: change the init function name; use symmetric calls for enable/disable pg; re-phrase commit message (Rodrigo); modify the sub-pipe power gating condition v3: set hysteresis value for render and media when GuC PC is disabled; skip CPG for PVC (Vinay) v4: rebase Signed-off-by: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> #v2 Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240524070916.143022-3-riana.tauro@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-05-28 12:29:43 -04:00
Riana Tauro	9276bcc22f	drm/xe: Standardize power gate registers Standardize power gate registers No functional changes v2: change commit message (Rodrigo) Signed-off-by: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240524070916.143022-2-riana.tauro@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-05-28 12:29:42 -04:00
Matt Roper	5c9464e2c7	drm/xe: Don't refer to general LRC initialization as a "wa" During engine LRC initialization a number of registers need to be programmed as general setup. This programming is not a "workaround" so naming the RTP table as "lrc_was" is misleading; switch to the name "lrc_setup" to more accurately describe what the table is actually for. Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240524230444.1447797-2-matthew.d.roper@intel.com	2024-05-28 08:04:44 -07:00
Michal Wajdeczko	0aa256252d	drm/xe: Use platform name in xe_assert() We can now use a more user-friendly platform name instead of the previously used magic platform enumerator value: [ ] xe 0000:00:02.0: [drm] Assertion `false` failed! platform: ALDERLAKE_S ... [ ] xe 0000:03:00.0: [drm] Assertion `false` failed! platform: DG2 ... vs [ ] xe 0000:00:02.0: [drm] Assertion `false` failed! platform: 3 ... [ ] xe 0000:03:00.0: [drm] Assertion `false` failed! platform: 7 ... Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240521142257.756-4-michal.wajdeczko@intel.com	2024-05-28 16:08:24 +02:00
Michal Wajdeczko	6ca7289756	drm/xe: Store platform name in xe_device.info We already maintain the platform name as part of the device descriptor, but in xe_device.info we only store platform enum, which is not the best for use in any user-facing messages. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240521142257.756-2-michal.wajdeczko@intel.com	2024-05-28 16:08:23 +02:00
Andrzej Hajda	82e0b1299a	drm/xe: allow unaligned start and size xe_res_cursor parameters xe_res_cursor code does not depend on the alignment. On the other side unaligned accesses are useful from pread/pwrite point of view. Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240418-xe_res_cursor-no-align-v1-1-8df7834266c9@intel.com Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>	2024-05-28 14:43:49 +02:00
Andrzej Hajda	38007fa964	drm/xe: flush gtt before signalling user fence on all engines Tests show that user fence signalling requires kind of write barrier, otherwise not all writes performed by the workload will be available to userspace. It is already done for render and compute, we need it also for the rest: video, gsc, copy. v2: added gsc and copy engines, added fixes and r-b tags Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1488 Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522-xu_flush_vcs_before_ufence-v2-1-9ac3e9af0323@intel.com Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>	2024-05-28 14:36:05 +02:00
Umesh Nerlige Ramappa	ce62827bc2	drm/xe: Do not access xe file when updating exec queue run_ticks The current code is running into a use after free case where xe file is closed before the exec queue run_ticks can be updated. This is occurring in the xe_file_close path. To fix that, do not access xe file when updating the exec queue run_ticks. Instead store the exec queue run_ticks locally in the exec queue object and accumulate it when the user dumps the drm client stats. We know that the xe file is valid when user is dumping the run_ticks for the drm client, so this effectively removes the dependency on xe file object in xe_exec_queue_update_run_ticks(). v2: - Fix the accumulation of q->run_ticks delta into xe file run_ticks - s/runtime/run_ticks/ (Rodrigo) Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1908 Fixes: `6109f24f87` ("drm/xe: Add helper to accumulate exec queue runtime") Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240524234744.1352543-2-umesh.nerlige.ramappa@intel.com	2024-05-27 14:07:46 -07:00
Umesh Nerlige Ramappa	45bb564de0	drm/xe: Use run_ticks instead of runtime for client stats Note that runtime is also used in the pm context, so it is confusing to use the same name to denote run time of the drm client. Use a more appropriate name for the client utilization. While at it, drop the incorrect multi-lrc comment in the helper description v2: s/show_runtime/show_run_ticks/ (Rodrigo) Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240524234744.1352543-1-umesh.nerlige.ramappa@intel.com	2024-05-27 14:07:44 -07:00
Thomas Hellström	50e52592fb	drm/xe: Move job creation out of the struct xe_migrate::job_mutex In order to be able to run gpu jobs from reclaim context, move job creation (where allocation takes place) out of the struct xe_migrate::job_mutex, and prime that mutex as reclaim tainted. Jobs that may need to run from reclaim context include CCS metadata extraction at shrinking time. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240527135912.152156-6-thomas.hellstrom@linux.intel.com	2024-05-27 21:26:07 +02:00
Thomas Hellström	577b83b0f4	drm/xe: Remove xe_lrc_create_seqno_fence() It's not used anymore. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240527135912.152156-5-thomas.hellstrom@linux.intel.com	2024-05-27 21:26:06 +02:00
Thomas Hellström	0ac7a2c745	drm/xe: Don't initialize fences at xe_sched_job_create() Pre-allocate but don't initialize fences at xe_sched_job_create(), and initialize / arm them instead at xe_sched_job_arm(). This makes it possible to move xe_sched_job_create() with its memory allocation out of any lock that is required for fence initialization, and that may not allow memory allocation under it. Replaces the struct dma_fence_array for parallel jobs with a struct dma_fence_chain, since the former doesn't allow a split-up between allocation and initialization. v2: - Rebase. - Don't always use the first lrc when initializing parallel lrc fences. - Use dma_fence_chain_contained() to access the lrc fences. v4: - Add an assert that job->lrc_seqno == fence->seqno. (Matthew Brost) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> #v2 Link: https://patchwork.freedesktop.org/patch/msgid/20240527135912.152156-4-thomas.hellstrom@linux.intel.com	2024-05-27 21:26:03 +02:00
Thomas Hellström	e183910ae4	drm/xe: Split lrc seqno fence creation up Since sometimes a lock is required to initialize a seqno fence, and it might be desirable not to hold that lock while performing memory allocations, split the lrc seqno fence creation up into an allocation phase and an initialization phase. Since lrc seqno fences under the hood are hw_fences, do the same for these and remove the xe_hw_fence_create() function since it is not used anymore. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240527135912.152156-3-thomas.hellstrom@linux.intel.com	2024-05-27 21:26:02 +02:00
Matthew Brost	08f7200899	drm/xe: Decouple job seqno and lrc seqno Tightly coupling these seqno presents problems if alternative fences for jobs are used. Decouple these for correctness. v2: - Slightly reword commit message (Thomas) - Make sure the lrc fence ops are used in comparison (Thomas) - Assume seqno is unsigned rather than signed in format string (Thomas) Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240527135912.152156-2-thomas.hellstrom@linux.intel.com	2024-05-27 21:25:59 +02:00
Michal Wajdeczko	d79e8cab32	drm/xe/vf: Use only assigned GGTT region Each VF is assigned a limited range of the GGTT address space. To ensure that the VF driver does not use GGTT allocations outside of the assigned region, explicitly reserve GGTT space below and above this region when initializing GGTT. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240527112015.1020-1-michal.wajdeczko@intel.com	2024-05-27 18:46:27 +02:00
Michal Wajdeczko	ea797cf4b7	drm/xe/vf: Read VF configuration prior to GGTT initialization Each VF will be assigned with only a limited range of the GGTT address space. Make sure that VF driver will read its own GGTT configuration before starting any GGTT initialization. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240524113714.932-2-michal.wajdeczko@intel.com	2024-05-27 18:46:26 +02:00
Michal Wajdeczko	5cef849397	drm/xe/vf: Treat GMDID as another runtime register While the GMDID registers are not part of the runtime register list shared by the PF driver, we may still return cached values from our VF specific read32() helper function. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240523192240.844-7-michal.wajdeczko@intel.com	2024-05-24 10:08:41 +02:00
Michal Wajdeczko	9081f8ca27	drm/xe/vf: Cache value of the GMDID register Read and cache value of the GMDID register as part of the config query that VF driver is doing over MMIO. While the VF driver likely already obtained the value of the GMDID register once during the early driver probe, we couldn't cache it then as the GT structures were not ready yet. Cache it now, in case the driver needs it later when the GuC MMIO communication, required to query GMDID from GuC, could be no longer desired as it will be replaced by the CTB communication. While around, assert that we will query GMDID only when applicable. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240523192240.844-6-michal.wajdeczko@intel.com	2024-05-24 10:08:41 +02:00
Michal Wajdeczko	fcc6b719ae	drm/xe/vf: Provide early access to GMDID register VFs do not have direct access to the GMDID register and must obtain its value from the GuC. Since we need GMDID value very early in the driver probe flow, before we even start the full setup of GT and GuC data structures, we must do some early initializations ourselves. Additionally, since we also need GMDID for the media GT, which isn't created yet, temporarily tweak the root GT type into MEDIA to allow communication with the correct GuC, as only it can provide the value of the media GMDID register. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240523223042.888-1-michal.wajdeczko@intel.com	2024-05-24 10:08:28 +02:00
Michal Wajdeczko	2948b24233	drm/xe/vf: Obtain value of GMDID register from GuC VFs don't have access to the GMDID register and must obtain its value using GuC VF ABI KLV query. Add function for doing that. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240523192240.844-4-michal.wajdeczko@intel.com	2024-05-24 10:02:28 +02:00
Michal Wajdeczko	e70aa1016e	drm/xe/guc: Add GLOBAL_CFG_GMD_ID KLV definition VF drivers can't access GMD_ID register over MMIO. The value of the GMD_ID register must be queried from GuC. It is available as GLOBAL_CFG_GMD_ID KLV. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240523192240.844-3-michal.wajdeczko@intel.com	2024-05-24 10:02:27 +02:00
Michal Wajdeczko	4edadc41a3	drm/xe/vf: Use register values obtained from the PF As part of its initialization, the VF driver has already obtained a list of the runtime (fuse) register values from the PF driver. When the VF driver attempts to read a register that is inaccessible to the VF, use the values from this list instead. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240523192240.844-2-michal.wajdeczko@intel.com	2024-05-24 10:02:26 +02:00
John Harrison	b0ac1b42db	drm/xe/guc: Port over the slow GuC loading support from i915 GuC loading can take longer than it is supposed to for various reasons. So add in the code to cope with that and to report it when it happens. There are also many different reasons why GuC loading can fail, so add in the code for checking for those and for reporting issues in a meaningful manner rather than just hitting a timeout and saying 'fail: status = %x'. Also, remove the 'FIXME' comment about an i915 bug that has never been applicable to Xe! v2: Actually report the requested and granted frequencies rather than showing granted twice (review feedback from Badal). v3: Locally code all the timeout and end condition handling because a helper function is not allowed (review feedback from Lucas/Rodrigo). v4: Add more documentation comments and rename a define to add units (review feedback from Lucas). v5: Fix copy/paste error in xe_mmio_wait32_not (review feedback from Lucas) and rebase (no more return value from guc_wait_ucode). Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240518043700.3264362-3-John.C.Harrison@Intel.com	2024-05-23 10:55:31 -07:00
John Harrison	fcc8f80517	drm/xe: Make read_perf_limit_reasons globally accessible Other driver code beyond the sysfs interface wants to know about throttling. So make the query function globally accessible. v2: Revert include order change (review feedback from Lucas) v3: Remove '_sysfs' from throttle file names and keep limit query in the same file rather than moving elsewhere (review feedback from Rodrigo). v4: Correct #include while renaming header file (review feedback from Lucas). Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240518043700.3264362-2-John.C.Harrison@Intel.com	2024-05-23 10:55:28 -07:00
José Roberto de Souza	83ee002df0	drm/xe: Nuke simple error capture This error capture prints HW state into dmesg when a gpu hang happens. It was useful when we did not have devcoredump; now it is an incomplete version of devcoredump that has the potential to flood dmesg. Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522203431.191594-1-jose.souza@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-05-23 13:38:26 -04:00
José Roberto de Souza	b10d0c5e9d	drm/xe: Add process name to devcoredump The process name helps us track which application caused the gpu hang; this is crucial when running several applications at the same time. v2: - handle Xe KMD exec_queues without VM v3: - use get_pid_task() (suggested by Nirmoy) Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Nirmoy Das <nirmoy.das@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522201203.145403-1-jose.souza@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-05-23 13:37:56 -04:00
Dr. David Alan Gilbert	e8ac8048a7	drm/xe: remove unused struct 'xe_gt_desc' 'xe_gt_desc' is unused since commit `1e6c20be6c` ("drm/xe: Drop extra_gts[] declarations and XE_GT_TYPE_REMOTE"). Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Link: https://patchwork.freedesktop.org/patch/msgid/20240522175840.382107-1-linux@treblig.org Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-05-23 13:33:55 -04:00
Rodrigo Vivi	f91806033f	drm/xe: Enable D3Cold on 'low' VRAM utilization Now that we eliminated all the mem_access get/put with its locking issues from the inner calls of migration, we can allow D3Cold. Enable it when VRAM utilization is lower than 300Mb. On higher utilization we only allow D3hot so we don't increase the latency on runtime resume so much due to the memory restoration. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522170105.327472-7-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-05-23 11:54:08 -04:00
Rodrigo Vivi	8d490e019b	drm/xe: Stop checking for power_lost on D3Cold GuC reset status is not reliable for this purpose and it is once in a while ending up in a situation of D3Cold, where power_reset is false and without the proper memory restoration the GuC reload and Display will fail to come back from D3Cold. So, let's do a full restoration of everything if we have a risk of losing power, without further optimizations. v2: also remove the gut_in_reset function (Anshuman) Cc: Anshuman Gupta <anshuman.gupta@intel.com> Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com> Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522170105.327472-6-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-05-23 11:54:07 -04:00
Rodrigo Vivi	e7b180b220	drm/xe: Prepare display for D3Cold Prepare power-well and DC handling for a full power lost during D3Cold, then sanitize it upon D3->D0. Otherwise we get a bunch of state mismatch. Ideally we could leave DC9 enabled and wouldn't need to move DC9->DC0 on every runtime resume, however, the disable_DC is part of the power-well checks and intrinsic to the dc_off power well. In the future that can be detangled so we can have even bigger power savings. But for now, let's focus on getting a D3Cold, which saves much more power by itself. v2: create new functions to avoid full-suspend-resume path, which would result in a deadlock between xe_gem_fault and the modeset-ioctl. v3: Only avoid the full modeset to avoid the race, for a more robust suspend-resume. Cc: Anshuman Gupta <anshuman.gupta@intel.com> Cc: Uma Shankar <uma.shankar@intel.com> Tested-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522170105.327472-5-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-05-23 11:54:07 -04:00
Rodrigo Vivi	73ba282e7f	drm/xe: Relax runtime pm protection around VM In the regular use case scenario, user space will create a VM, and keep it alive for the entire duration of its workload. For the regular desktop cases, it means that the VM is alive even on idle scenarios where display goes off. This is unacceptable since this would entirely block runtime PM indefinitely, blocking deeper Package-C state. This would be a waste drainage of power. Limit the VM protection solely for long-running workloads that are not protected by the scheduler references. By design, run_job for long-running workloads returns NULL and the scheduler drops all the references of it, hence protecting the VM for this case is necessary. v2: Update commit message to a more imperative language and to reflect why the VM protection is really needed. Also add a comment in the code to let the reason visible. v3: Remove vma_access case and the mentions to mmap. Mmap cases are already protected by the gem page fault. Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Tested-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522170105.327472-4-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-05-23 11:53:50 -04:00
Rodrigo Vivi	ad1e331fc4	drm/xe: Relax runtime pm protection during execution Limit the protection only during moments of actual job execution, and introduce protection for guc submit fini, which is currently unprotected due to the absence of exec_queue life protection. In the regular use case scenario, user space will create an exec queue, and keep it alive to reuse that until it is done with that kind of workload. For the regular desktop cases, it means that the exec_queue is alive even on idle scenarios where display goes off. This is unacceptable since this would entirely block runtime PM indefinitely, blocking deeper Package-C state. This would be a waste drainage of power. Cc: Matthew Brost <matthew.brost@intel.com> Tested-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522170105.327472-3-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-05-23 11:52:56 -04:00
Rodrigo Vivi	967c5d7c64	drm/xe: Fix xe_pm_runtime_get_if_in_use documentation Let's be clear on what it is actually doing and align with xe_pm_runtime_get_if_active doc style. Tested-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522170105.327472-2-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-05-23 11:52:56 -04:00
Rodrigo Vivi	46edb0a3eb	drm/xe: Fix xe_pm_runtime_get_if_active return Current callers of this function are already taking the result to a boolean and using in an if. It might be a problem because current function might return negative error codes on failure, without increasing the reference counter. In this scenario we could end up with extra 'put' call ending in unbalanced scenarios. Let's fix it, while aligning with the current xe_pm_get_if_in_use style. Tested-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522170105.327472-1-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-05-23 11:52:56 -04:00
Niranjana Vishwanathapura	40672b792a	drm/xe: Properly handle alloc_guc_id() failure Release the submission_state lock if alloc_guc_id() fails. v2: Add Fixes tag and CC stable kernel Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: <stable@vger.kernel.org> # v6.8+ Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240521201711.4934-1-niranjana.vishwanathapura@intel.com	2024-05-22 12:33:37 -07:00
Michal Wajdeczko	3ec3b42752	drm/xe/uc: Don't emit false error if running in execlist mode When running in execlist mode (using force_execlist=1 modparam) we incorrectly select the error path in xe_uc_init(), leading to an unwanted error message like this: [ ] xe 0000:00:00.0: [drm] ERROR GT0: Failed to initialize uC (0000000000000000) Fix that by doing early return like we do in other similar cases. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240521114857.712-1-michal.wajdeczko@intel.com	2024-05-22 18:26:22 +02:00
Matthew Auld	dc51c682dd	drm/xe/display: move device_remove over to drmm i915 display calls this when releasing the drm_device, match this also in xe by using drmm. intel_display_device_remove() is freeing purely software state for the drm_device. v2: fix build error Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Andrzej Hajda <andrzej.hajda@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-36-matthew.auld@intel.com	2024-05-22 13:22:40 +01:00
Matthew Auld	48d74a0a45	drm/xe/display: stop calling domains_driver_remove twice Unclear why we call this twice. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Andrzej Hajda <andrzej.hajda@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-35-matthew.auld@intel.com	2024-05-22 13:22:40 +01:00
Matthew Auld	5b6937b65e	drm/xe/display: move display fini stuff to devm Match the i915 display handling here with calling both no_irq and noaccel when removing the device. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Andrzej Hajda <andrzej.hajda@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-34-matthew.auld@intel.com	2024-05-22 13:22:40 +01:00
Matthew Auld	c711741978	drm/xe: reset mmio mappings with devm Set our various mmio mappings to NULL. This should make it easier to catch something rogue trying to mess with mmio after device removal. For example, we might unmap everything and then start hitting some mmio address which has already been unmapped by us and then remapped by something else, causing all kinds of carnage. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Andrzej Hajda <andrzej.hajda@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-33-matthew.auld@intel.com	2024-05-22 13:22:40 +01:00
Matthew Auld	a0b834c895	drm/xe/mmio: move mmio_fini over to devm Not valid to touch mmio once the device is removed, so make sure we unmap on removal and not just when driver instance goes away. Also set the mmio pointers to NULL to hopefully catch such issues more easily. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Andrzej Hajda <andrzej.hajda@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-32-matthew.auld@intel.com	2024-05-22 13:22:40 +01:00
Matthew Auld	cd506a33b0	drm/xe: make gt_remove use devm No need to hand roll the onion unwind here, just move gt_remove over to devm which will already have the correct ordering. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Andrzej Hajda <andrzej.hajda@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-31-matthew.auld@intel.com	2024-05-22 13:22:40 +01:00
Matthew Auld	1bd985ff9f	drm/xe/gt: break out gt_fini into sw vs hw state Have a cleaner separation between hw vs sw. v2: Fix missing return Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Andrzej Hajda <andrzej.hajda@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-30-matthew.auld@intel.com	2024-05-22 13:22:39 +01:00
Matthew Auld	cf13ae6b81	drm/xe/coredump: move over to devm Here we are using drmm to ensure we release the coredump when unloading the module, however the coredump is very much tied to the struct device underneath. We can see this when we hotunplug the device, for which we have already got a coredump attached. In such a case the coredump still remains and adding another is not possible. However we still register the release action via xe_driver_devcoredump_fini(), so in effect two or more releases for one dump. The other consideration is that the coredump state is embedded in the xe_driver instance, so technically once the drmm release action fires we might free the coredump state from a different driver instance, assuming we have two release actions and they can race. Rather use devm here to remove the coredump when the device is released. References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1679 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Andrzej Hajda <andrzej.hajda@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-29-matthew.auld@intel.com	2024-05-22 13:22:39 +01:00
Matthew Auld	cee70645a7	drm/xe/device: move xe_device_sanitize over to devm Disable GuC submission when removing the device. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-28-matthew.auld@intel.com	2024-05-22 13:22:39 +01:00
Matthew Auld	bc54f42c0e	drm/xe/device: move flr to devm Should be called when driver is removed, not when this particular driver instance is destroyed. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-27-matthew.auld@intel.com	2024-05-22 13:22:39 +01:00
Matthew Auld	bbc9651fe9	drm/xe/irq: move irq_uninstall over to devm Makes sense to trigger this when the device is removed. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-26-matthew.auld@intel.com	2024-05-22 13:22:39 +01:00
Matthew Auld	6d95155ae7	drm/xe/guc_pc: s/pc_fini/pc_fini_hw/ Make it clear that is about cleaning up the HW/FW side, and not software state. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Andrzej Hajda <andrzej.hajda@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-25-matthew.auld@intel.com	2024-05-22 13:22:39 +01:00

1268029 Commits