linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-16 20:42:29 -04:00

Author	SHA1	Message	Date
Shuicheng Lin	4761791c1e	drm/xe: Skip address copy for sync-only execs For parallel exec queues, xe_exec_ioctl() copied the batch buffer address array from userspace without checking num_batch_buffer. If user creates a sync-only exec that doesn't use the address field, the exec will fail with -EFAULT. Add num_batch_buffer check to skip the copy, and the exec could be executed successfully. Here is the sync-only exec: struct drm_xe_exec exec = { .extensions = 0, .exec_queue_id = qid, .num_syncs = 1, .syncs = (uintptr_t)&sync, .address = 0, /* ignored for sync-only / .num_batch_buffer = 0, / sync-only */ }; Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260122214053.3189366-2-shuicheng.lin@intel.com	2026-01-23 16:34:36 -08:00
Nitin Gote	6ef02656c3	drm/xe: derive mem copy capability from graphics version Drop .has_mem_copy_instr from the platform descriptors and set it in xe_info_init() after handle_gmdid() populates graphics_verx100. Centralizing the GRAPHICS_VER(xe) >= 20 check keeps MEM_COPY enabled on Xe2+ and removes redundant per-platform plumbing. Bspec: 57561 Fixes: `1e12dbae9d` ("drm/xe/migrate: support MEM_COPY instruction") Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Suggested-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Nitin Gote <nitin.r.gote@intel.com> Link: https://patch.msgid.link/20260120054724.1982608-2-nitin.r.gote@intel.com Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>	2026-01-23 11:55:06 +05:30
Sanjay Yadav	dc2fc00ba9	drm/xe: Use DRM_BUDDY_CONTIGUOUS_ALLOCATION for contiguous allocations The VRAM/stolen memory managers do not currently set DRM_BUDDY_CONTIGUOUS_ALLOCATION for contiguous allocations. Enabling this flag activates the buddy allocator's try_harder path, which helps handle fragmented memory scenarios. This enables the __alloc_contig_try_harder fallback in the buddy allocator, allowing contiguous allocation requests to succeed even when memory is fragmented by combining allocations from both(RHS and LHS) sides of a large free block. v2: (Matt B) - Remove redundant logic for rounding allocation size and trimming when TTM_PL_FLAG_CONTIGUOUS is set, since drm_buddy now handles this when DRM_BUDDY_CONTIGUOUS_ALLOCATION is enabled Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6713 Suggested-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Sanjay Yadav <sanjay.kumar.yadav@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patch.msgid.link/20260121111416.3104399-2-sanjay.kumar.yadav@intel.com	2026-01-22 09:52:27 +00:00
Thomas Hellström	9386f49316	drm/xe: Select CONFIG_DEVICE_PRIVATE when DRM_XE_GPUSVM is selected CONFIG_DEVICE_PRIVATE is a prerequisite for DRM_XE_GPUSVM. Explicitly select it so that DRM_XE_GPUSVM is not unintentionally left out from distro configs not explicitly enabling CONFIG_DEVICE_PRIVATE. v2: - Select also CONFIG_ZONE_DEVICE since it's needed by CONFIG_DEVICE_PRIVATE. v3: - Depend on CONFIG_ZONE_DEVICE rather than selecting it. Cc: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: <dri-devel@lists.freedesktop.org> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patch.msgid.link/20260121091048.41371-3-thomas.hellstrom@linux.intel.com	2026-01-22 09:56:28 +01:00
Thomas Hellström	1e372b2461	drm, drm/xe: Fix xe userptr in the absence of CONFIG_DEVICE_PRIVATE CONFIG_DEVICE_PRIVATE is not selected by default by some distros, for example Fedora, and that leads to a regression in the xe driver since userptr support gets compiled out. It turns out that DRM_GPUSVM, which is needed for xe userptr support compiles also without CONFIG_DEVICE_PRIVATE, but doesn't compile without CONFIG_ZONE_DEVICE. Exclude the drm_pagemap files from compilation with !CONFIG_ZONE_DEVICE, and remove the CONFIG_DEVICE_PRIVATE dependency from CONFIG_DRM_GPUSVM and the xe driver's selection of it, re-enabling xe userptr for those configs. v2: - Don't compile the drm_pagemap files unless CONFIG_ZONE_DEVICE is set. - Adjust the drm_pagemap.h header accordingly. Fixes: `9e97874148` ("drm/xe/userptr: replace xe_hmm with gpusvm") Cc: Matthew Auld <matthew.auld@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: dri-devel@lists.freedesktop.org Cc: <stable@vger.kernel.org> # v6.18+ Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patch.msgid.link/20260121091048.41371-2-thomas.hellstrom@linux.intel.com	2026-01-22 09:56:27 +01:00
Matthew Auld	9dd1048bca	drm/xe/migrate: fix job lock assert We are meant to be checking the user vm for the bind queue, but actually we are checking the migrate vm. For various reasons this is not currently firing but this will likely change in the future. Now that we have the user_vm attached to the bind queue, we can fix this by directly checking that here. Fixes: `dba89840a9` ("drm/xe: Add GT TLB invalidation jobs") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Arvind Yadav <arvind.yadav@intel.com> Link: https://patch.msgid.link/20260120110609.77958-4-matthew.auld@intel.com	2026-01-21 10:07:21 +00:00
Matthew Auld	9dd08fdecc	drm/xe/uapi: disallow bind queue sharing Currently this is very broken if someone attempts to create a bind queue and share it across multiple VMs. For example currently we assume it is safe to acquire the user VM lock to protect some of the bind queue state, but if allow sharing the bind queue with multiple VMs then this quickly breaks down. To fix this reject using a bind queue with any VM that is not the same VM that was originally passed when creating the bind queue. This a uAPI change, however this was more of an oversight on kernel side that we didn't reject this, and expectation is that userspace shouldn't be using bind queues in this way, so in theory this change should go unnoticed. Based on a patch from Matt Brost. v2 (Matt B): - Hold the vm lock over queue create, to ensure it can't be closed as we attach the user_vm to the queue. - Make sure we actually check for NULL user_vm in destruction path. v3: - Fix error path handling. Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Reported-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: José Roberto de Souza <jose.souza@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Michal Mrozek <michal.mrozek@intel.com> Cc: Carl Zhang <carl.zhang@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Acked-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Arvind Yadav <arvind.yadav@intel.com> Acked-by: Michal Mrozek <michal.mrozek@intel.com> Link: https://patch.msgid.link/20260120110609.77958-3-matthew.auld@intel.com	2026-01-21 10:07:18 +00:00
Matthew Brost	6cdaa5346d	drm/xe: Add context-based invalidation to GuC TLB invalidation backend Introduce context-based invalidation support to the GuC TLB invalidation backend. This is implemented by iterating over each exec queue per GT within a VM, skipping inactive queues, and issuing a context-based (GuC ID) H2G TLB invalidation. All H2G messages, except the final one, are sent with an invalid seqno, which the G2H handler drops to ensure the TLB invalidation fence is only signaled once all H2G messages are completed. A watermark mechanism is also added to switch between context-based TLB invalidations and full device-wide invalidations, as the return on investment for context-based invalidation diminishes when many exec queues are mapped. v2: - Fix checkpatch warnings v3: - Rebase on PRL - Use ref counting to avoid racing with deregisters v4: - Extra braces (Stuart) - Use per GT list (CI) - Reorder put Signed-off-by: Matthew Brost <matthew.brost@intel.com> Tested-by: Stuart Summers <stuart.summers@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://patch.msgid.link/20260116221731.868657-12-matthew.brost@intel.com	2026-01-16 18:24:57 -08:00
Matthew Brost	628d59392c	drm/xe: Add exec queue active vfunc If an exec queue is inactive (e.g., not registered or scheduling is disabled), TLB invalidations are not issued for that queue. Add a virtual function to determine the active state, which TLB invalidation logic can hook into. v5: - Operate on primary in active function Signed-off-by: Matthew Brost <matthew.brost@intel.com> Tested-by: Stuart Summers <stuart.summers@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://patch.msgid.link/20260116221731.868657-11-matthew.brost@intel.com	2026-01-16 18:24:56 -08:00
Matthew Brost	6b42b635d6	drm/xe: Add xe_tlb_inval_idle helper Introduce the xe_tlb_inval_idle helper to detect whether any TLB invalidations are currently in flight. This is used in context-based TLB invalidations to determine whether dummy TLB invalidations need to be sent to maintain proper TLB invalidation fence ordering.. v2: - Implement xe_tlb_inval_idle based on pending list Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Tested-by: Stuart Summers <stuart.summers@intel.com> Link: https://patch.msgid.link/20260116221731.868657-10-matthew.brost@intel.com	2026-01-16 18:24:54 -08:00
Matthew Brost	2d93d5d530	drm/xe: Add send_tlb_inval_ppgtt helper Extract the common code that issues a TLB invalidation H2G for PPGTTs into a helper function. This helper can be reused for both ASID-based and context-based TLB invalidations. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Tested-by: Stuart Summers <stuart.summers@intel.com> Link: https://patch.msgid.link/20260116221731.868657-9-matthew.brost@intel.com	2026-01-16 18:24:53 -08:00
Matthew Brost	edcc15f489	drm/xe: Rename send_tlb_inval_ppgtt to send_tlb_inval_asid_ppgtt Context-based TLB invalidations have their own set of GuC TLB invalidation operations. Rename the current PPGTT invalidation function, which operates on ASIDs, to a more descriptive name that reflects its purpose. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Tested-by: Stuart Summers <stuart.summers@intel.com> Link: https://patch.msgid.link/20260116221731.868657-8-matthew.brost@intel.com	2026-01-16 18:24:52 -08:00
Matthew Brost	8d7a9f801e	drm/xe: Taint TLB invalidation seqno lock with GFP_KERNEL Taint TLB invalidation seqno lock with GFP_KERNEL as TLB invalidations can be in the path of reclaim (e.g., MMU notifiers). Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com> Tested-by: Stuart Summers <stuart.summers@intel.com> Link: https://patch.msgid.link/20260116221731.868657-7-matthew.brost@intel.com	2026-01-16 18:24:51 -08:00
Matthew Brost	a3866ce7b1	drm/xe: Add vm to exec queues association Maintain a list of exec queues per vm which will be used by TLB invalidation code to do context-ID based tlb invalidations. v4: - More asserts (Stuart) - Per GT list (CI) - Skip adding / removal if context TLB invalidatiions not supported (Stuart) Signed-off-by: Nirmoy Das <nirmoy.das@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Tested-by: Stuart Summers <stuart.summers@intel.com> Link: https://patch.msgid.link/20260116221731.868657-6-matthew.brost@intel.com	2026-01-16 18:24:49 -08:00
Matthew Brost	43c3e6eacb	drm/xe: Add xe_device_asid_to_vm helper Introduce the xe_device_asid_to_vm helper, which can be used throughout the driver to resolve the VM from a given ASID. v4: - Move forward declare after includes (Stuart) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Tested-by: Stuart Summers <stuart.summers@intel.com> Link: https://patch.msgid.link/20260116221731.868657-5-matthew.brost@intel.com	2026-01-16 18:24:48 -08:00
Matthew Brost	dea333b244	drm/xe: Add has_ctx_tlb_inval to device info Add has_ctx_tlb_inval to device info indicating a device has context basd TLB invalidation. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Tested-by: Stuart Summers <stuart.summers@intel.com> Link: https://patch.msgid.link/20260116221731.868657-4-matthew.brost@intel.com	2026-01-16 18:24:46 -08:00
Matthew Brost	444d78578e	drm/xe: Make usm.asid_to_vm allocation use GFP_NOWAIT Ensure the asid_to_vm lookup is reclaim-safe so it can be performed during TLB invalidations, which is necessary for context-based TLB invalidation support. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Tested-by: Stuart Summers <stuart.summers@intel.com> Link: https://patch.msgid.link/20260116221731.868657-3-matthew.brost@intel.com	2026-01-16 18:24:45 -08:00
Matthew Brost	888c7f991f	drm/xe: Add normalize_invalidation_range Extract the code that determines the alignment of TLB invalidation into a helper function — normalize_invalidation_range. This will be useful when adding context-based invalidations to the GuC TLB invalidation backend. Signed-off-by: Nirmoy Das <nirmoy.das@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Tested-by: Stuart Summers <stuart.summers@intel.com> Link: https://patch.msgid.link/20260116221731.868657-2-matthew.brost@intel.com	2026-01-16 18:24:44 -08:00
Niranjana Vishwanathapura	769d7774a1	drm/xe/multi_queue: Enable multi_queue on xe3p_xpc xe3p_xpc supports multi_queue, enable it. v2: Rename multi_queue_enable_mask to multi_queue_engine_class_mask (Matt Brost) Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260116220333.861850-3-matthew.brost@intel.com	2026-01-16 18:19:09 -08:00
Matthew Brost	bbd3678730	drm/xe: Ban entire multi-queue group on any job timeout In multi-queue mode, we only have control over the entire group, so we cannot ban individual queues or signal fences until the whole group is removed from hardware. Implement banning of the entire group if any job within it times out. v2: - Fix lock inversion (Niranjana) - Initialize new queues in group to stopped v3: - Blindly call xe_exec_queue_multi_queue_primary (Niranjana) - More comments around temporary list when stopping (Niranjana) - Restart group on false timeout (Niranjana) Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Link: https://patch.msgid.link/20260116220333.861850-2-matthew.brost@intel.com	2026-01-16 18:18:15 -08:00
Nakshtra Goyal	c51595b3d2	drm/xe/xe_query: Remove check for gt There's no need to check a userspace-provided GT ID (which may come from any tile) against the number of GTs that can be present on a single tile. The xe_device_get_gt() lookup already checks that the GT ID passed is valid for the current device.(Matt Roper) Signed-off-by: Nakshtra Goyal <nakshtra.goyal@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patch.msgid.link/20260113091928.67446-1-nakshtra.goyal@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2026-01-15 12:57:25 -08:00
Matthew Brost	e89aacd1ec	drm/xe: Reduce LRC timestamp stuck message on VFs to notice An LRC timestamp getting stuck is a somewhat normal occurrence. If a single VF submits a job that does not get timesliced, the LRC timestamp will not increment. Reduce the LRC timestamp stuck message on VFs to notice (same log level as job timeout) to avoid false CI bugs in tests where a VF submits a job that does not get timesliced. Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/7032 Fixes: `bb63e7257e` ("drm/xe: Avoid toggling schedule state to check LRC timestamp in TDR") Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://patch.msgid.link/20260114184905.4189026-1-matthew.brost@intel.com	2026-01-15 09:36:08 -08:00
Matt Roper	8367585154	drm/xe: Cleanup unused header includes clangd reports many "unused header" warnings throughout the Xe driver. Start working to clean this up by removing unnecessary includes in our .c files and/or replacing them with explicit includes of other headers that were previously being included indirectly. By far the most common offender here was unnecessary inclusion of xe_gt.h. That likely originates from the early days of xe.ko when xe_mmio did not exist and all register accesses, including those unrelated to GTs, were done with GT functions. There's still a lot of additional #include cleanup that can be done in the headers themselves; that will come as a followup series. v2: - Squash the 79-patch series down to a single patch. (MattB) Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260115032803.4067824-2-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2026-01-15 07:05:04 -08:00
Michal Wajdeczko	def675cf3f	drm/xe/mert: Improve handling of MERT CAT errors All MERT catastrophic errors but VF's LMTT fault are serious, so we shouldn't limit our handling only to print debug messages. Change CATERR message to error level and then declare the device as wedged to match expectation from the design document. For the LMTT faults, add a note about adding tracking of this unexpected VF activity. While at it, rename register fields defnitions to match the BSpec. Also drop trailing include guard name from the regs.h file. BSpec: 74625 Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Lukasz Laguna <lukasz.laguna@intel.com> Link: https://patch.msgid.link/20260112183716.28700-1-michal.wajdeczko@intel.com	2026-01-14 16:02:50 +01:00
Fei Yang	6b2ff1d7c5	drm/xe: vram addr range is expanded to bit[17:8] The bit field used to be [14:8] with [17:15] marked as SPARE and defaulted to 0. So, simply expand the read to bit[17:8] assuming the platforms using only bit[14:8] have zeros in the expanded bits. BSpec: 54991 Signed-off-by: Fei Yang <fei.yang@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260112220330.2267122-2-fei.yang@intel.com	2026-01-13 23:39:10 -08:00
Marco Crivellari	a3753a3319	drm/xe: Replace use of system_wq with tlb_inval->timeout_wq This patch continues the effort to refactor workqueue APIs, which has begun with the changes introducing new workqueues and a new alloc_workqueue flag: commit `128ea9f6cc` ("workqueue: Add system_percpu_wq and system_dfl_wq") commit `930c2ea566` ("workqueue: Add new WQ_PERCPU flag") The point of the refactoring is to eventually alter the default behavior of workqueues to become unbound by default so that their workload placement is optimized by the scheduler. Before that to happen, workqueue users must be converted to the better named new workqueues with no intended behaviour changes: system_wq -> system_percpu_wq system_unbound_wq -> system_dfl_wq This way the old obsolete workqueues (system_wq, system_unbound_wq) can be removed in the future. After a carefully evaluation, because this is the fence signaling path, we changed the code in order to use one of the Xe's workqueue. So, a new workqueue named 'timeout_wq' has been added to 'struct xe_tlb_inval' and has been initialized with 'gt->ordered_wq' changing the system_wq uses with tlb_inval->timeout_wq. Link: https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/ Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260112094406.82641-1-marco.crivellari@suse.com	2026-01-13 23:39:09 -08:00
Karthik Poosa	49a4983384	drm/xe/hwmon: Expose individual VRAM channel temperature Expose individual VRAM temperature attributes. Update Xe hwmon documentation for this entry. v2: - Avoid using default switch case for VRAM individual temperatures. - Append labels with VRAM channel number. - Update kernel version in Xe hwmon documentation. v3: - Add missing brackets in Xe hwmon documentation from VRAM channel sysfs. - Reorder BMG_VRAM_TEMPERATURE_N macro in xe_pcode_regs.h. - Add api to check if VRAM is available on the channel. v4: - Improve VRAM label handling to eliminate temp variable by introducing a dedicated array vram_label in xe_hwmon_thermal_info. - Remove a magic number. - Change the label from vram_X to vram_ch_X. v5: - Address review comments from Raag. - Change vram to VRAM in commit title and subject. - Refactor BMG_VRAM_TEMPERATURE_N macro. - Refactor is_vram_ch_available(). - Rephrase a comment. - Check individual VRAM temperature limits in addition to VRAM availability in xe_hwmon_temp_is_visible. (Raag) - Move VRAM label change out of this patch. v6: - Use in_range() for VRAM_N index check instead of if check. (Raag) - Minor aesthetic changes. Signed-off-by: Karthik Poosa <karthik.poosa@intel.com> Reviewed-by: Raag Jadav <raag.jadav@intel.com> Link: https://patch.msgid.link/20260112203521.1014388-5-karthik.poosa@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2026-01-12 17:00:29 -05:00
Karthik Poosa	8d2511686e	drm/xe/hwmon: Expose GPU PCIe temperature Expose GPU PCIe average temperature and its limits via hwmon sysfs entry temp5_xxx. Update Xe hwmon sysfs documentation for this. v2: Update kernel version in Xe hwmon documentation. (Raag) v3: - Address review comments from Raag. - Remove redundant debug log. - Update kernel version in Xe hwmon documentation. (Raag) v4: - Address review comments from Raag. - Group new temperature attributes with existing temperature attributes as per channel index in Xe hwmon documentation. - Use TEMP_MASK instead of TEMP_MASK_MAILBOX. - Add PCIE_SENSOR_MASK which uses REG_FIELD_GET as replacement of PCIE_SENSOR_SHIFT. v5: - Address review comments from Raag. - Use REG_FIELD_GET to get PCIe temperature. - Move PCIE_SENSOR_GROUP_ID and PCIE_SENSOR_MASK to xe_pcode_api.h - Cosmetic change. Signed-off-by: Karthik Poosa <karthik.poosa@intel.com> Reviewed-by: Raag Jadav <raag.jadav@intel.com> Link: https://patch.msgid.link/20260112203521.1014388-4-karthik.poosa@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2026-01-12 17:00:29 -05:00
Karthik Poosa	3a0cb885e1	drm/xe/hwmon: Expose memory controller temperature Expose GPU memory controller average temperature and its limits under temp4_xxx. Update Xe hwmon documentation for this. v2: - Rephrase commit message. (Badal) - Update kernel version in Xe hwmon documentation. (Raag) v3: - Update kernel version in Xe hwmon documentation. - Address review comments from Raag. - Remove obvious comments. - Remove redundant debug logs. - Remove unnecessary checks. - Avoid magic numbers. - Add new comments. - Use temperature sensors count to make memory controller visible. - Use temperature limits of package for memory controller. v4: - Address review comments from Raag. - Group new temperature attributes with existing temperature attributes as per channel index in Xe hwmon documentation. - Use DIV_ROUND_UP to calculate dwords needed for temperature limits. - Minor aesthetic refinements. - Remove unused TEMP_MASK_MAILBOX. v5: - Use REG_FIELD_GET to get count from READ_THERMAL_DATA output. (Raag) - Change count print from decimal to hexadecimal. - Cosmetic changes. Signed-off-by: Karthik Poosa <karthik.poosa@intel.com> Reviewed-by: Raag Jadav <raag.jadav@intel.com> Link: https://patch.msgid.link/20260112203521.1014388-3-karthik.poosa@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2026-01-12 17:00:29 -05:00
Karthik Poosa	c332fba805	drm/xe/hwmon: Expose temperature limits Read temperature limits using pcode mailbox and expose shutdown temperature limit as tempX_emergency, critical temperature limit as tempX_crit and GPU max temperature limit as temp2_max. Update Xe hwmon documentation with above entries. v2: - Resolve a documentation warning. - Address below review comments from Raag. - Update date and kernel version in Xe hwmon documentation. - Remove explicit disable of has_mbx_thermal_info for unsupported platforms. - Remove unnecessary default case in switches. - Remove obvious comments. - Use TEMP_LIMIT_MAX to compute number of dwords needed in xe_hwmon_thermal_info. - Remove THERMAL_LIMITS_DWORDS macro. - Use has_mbx_thermal_info for checking thermal mailbox support. v3: - Address below minor comments. (Raag) - Group new temperature attributes with existing temperature attributes as per channel index in Xe hwmon documentation. - Rename enums of xe_temp_limit to improve clarity. - Use DIV_ROUND_UP to calculate dwords needed for temperature limits. - Use return instead of breaks in xe_hwmon_temp_read. - Minor aesthetic refinements. v4: - Remove a redundant break. (Raag) - Update drm_dbg to drm_warn to inform user of unavailability for thermal mailbox on expected platforms. Signed-off-by: Karthik Poosa <karthik.poosa@intel.com> Reviewed-by: Raag Jadav <raag.jadav@intel.com> Link: https://patch.msgid.link/20260112203521.1014388-2-karthik.poosa@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2026-01-12 17:00:29 -05:00
Daniele Ceraolo Spurio	b1dcec9bd8	drm/xe/ptl: Enable PXP for PTL Now that the GSC FW is defined, we can enable PXP for PTL. The feature will only be turned on if the binary is found on disk. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Julia Filipchuk <julia.filipchuk@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patch.msgid.link/20260108011340.2562349-8-daniele.ceraolospurio@intel.com	2026-01-12 10:07:28 -08:00
Daniele Ceraolo Spurio	6d24027d55	drm/xe/ptl: Define GSC for PTL PTL is identified by GSC major version 105. The compatibility version is still 1.0. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Julia Filipchuk <julia.filipchuk@intel.com> Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com> Link: https://patch.msgid.link/20260108011340.2562349-7-daniele.ceraolospurio@intel.com	2026-01-12 10:07:27 -08:00
Daniele Ceraolo Spurio	65b65ffcf6	drm/xe/gsc: Make GSC FW load optional for newer platforms On newer platforms GSC FW is only required for content protection features, so the core driver features work perfectly fine without it (and we did in fact not enable it to start with on PTL). Therefore, we can selectively enable the GSC only if the FW is found on disk, without failing if it is not found. Note that this means that the FW can now be enabled (i.e., we're looking for it) but not available (i.e., we haven't found it), so checks on FW support should use the latter state to decide whether to go on or not. As part of the rework, the message for FW not found has been cleaned up to be more readable. While at it, drop the comment about xe_uc_fw_init() since the code has been reworked and the statement no longer applies. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Julia Filipchuk <julia.filipchuk@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com> Link: https://patch.msgid.link/20260108011340.2562349-6-daniele.ceraolospurio@intel.com	2026-01-12 10:06:28 -08:00
Balasubramani Vivekanandan	d758c8d6e2	drm/xe/device: Convert wait for lmem init into an assert Prior to lmem init check, driver is waiting for the pcode uncore_init status. uncore_init status will be flagged after the complete boot and initialization of the SoC by the pcode. uncore_init confirms that lmem init and mmio unblock has been already completed. It makes no sense to check for lmem init after the pcode uncore_init check. So change the wait for lmem init check into an assert which confirms lmem init is set. Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patch.msgid.link/20251219145024.2955946-2-balasubramani.vivekanandan@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2026-01-12 08:56:20 -08:00
Maarten Lankhorst	987167b119	drm/xe: Privatize xe_ggtt_node Nothing requires it any more, make the member private. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Maarten Lankhorst <dev@lankhorst.se> Link: https://patch.msgid.link/20260108101014.579906-16-dev@lankhorst.se	2026-01-12 16:28:48 +01:00
Maarten Lankhorst	8d88aa149a	drm/xe: Improve xe_gt_sriov_pf_config GGTT handling Do not directly dereference xe_ggtt_node, and add a function to retrieve the allocated GGTT size. Reviewed-by: Matthew.brost@intel.com Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Maarten Lankhorst <dev@lankhorst.se> Link: https://patch.msgid.link/20260108101014.579906-15-dev@lankhorst.se	2026-01-12 16:28:48 +01:00
Maarten Lankhorst	9086170bfb	drm/xe: Do not dereference ggtt_node in xe_bo.c A careful inspection of __xe_ggtt_insert_bo_at() shows that the ggtt_node can always be seen as inserted from xe_bo.c due to the way error handling is performed. The checks are also a little bit too paranoid, since we never create a bo with ggtt_node[id] initialised but not inserted into the GGTT, which can be seen by looking at __xe_ggtt_insert_bo_at() Additionally, the size of the GGTT is never bigger than 4 GB, so adding a check at that level is incorrect. Signed-off-by: Maarten Lankhorst <dev@lankhorst.se> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260108101014.579906-14-dev@lankhorst.se	2026-01-12 16:28:47 +01:00
Maarten Lankhorst	a7ae083691	drm/xe/display: Avoid dereferencing xe_ggtt_node Start using xe_ggtt_node_addr, and avoid comparing the base offset as vma->node is dynamically allocated. Also sneak in a xe_bo_size() for stolen, too small to put as separate commit. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Maarten Lankhorst <dev@lankhorst.se> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patch.msgid.link/20260108101014.579906-13-dev@lankhorst.se	2026-01-12 16:28:34 +01:00
Maarten Lankhorst	c818b26515	drm/xe: Add xe_ggtt_node_addr() to avoid dereferencing xe_ggtt_node This function makes it possible to add an offset that is applied to all xe_ggtt_node's, and hides the internals from all its users. Signed-off-by: Maarten Lankhorst <dev@lankhorst.se> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260108101014.579906-12-dev@lankhorst.se	2026-01-12 16:28:34 +01:00
Maarten Lankhorst	004311aa7d	drm/xe: Convert xe_fb_pin to use a callback for insertion into GGTT The rotation details belong in xe_fb_pin.c, while the operations involving GGTT belong to xe_ggtt.c. As directly locking xe_ggtt etc results in exposing all of xe_ggtt details anyway, create a special function that allocates a ggtt_node, and allow display to populate it using a callback as a compromise. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> Signed-off-by: Maarten Lankhorst <dev@lankhorst.se> Link: https://patch.msgid.link/20260108101014.579906-11-dev@lankhorst.se	2026-01-12 16:28:25 +01:00
Maarten Lankhorst	22437f30d2	drm/xe: Start using ggtt->start in preparation of balloon removal Instead of having ggtt->size point to the end of ggtt, have ggtt->size be the actual size of the GGTT, and introduce ggtt->start to point to the beginning of GGTT. This will allow a massive cleanup of GGTT in case of SRIOV-VF. Reviewed-by: Stuart Summers <stuart.summers@intel.com> Signed-off-by: Maarten Lankhorst <dev@lankhorst.se> Link: https://patch.msgid.link/20260108101014.579906-10-dev@lankhorst.se	2026-01-12 16:27:33 +01:00
Michal Wajdeczko	7970e04d17	drm/xe/mert: Move MERT initialization to xe_mert.c Most of the MERT code is already in dedicated file, no reason to keep internal MERT data structure initialization elsewhere. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Lukasz Laguna <lukasz.laguna@intel.com> Link: https://patch.msgid.link/20260109151219.26206-6-michal.wajdeczko@intel.com	2026-01-12 14:38:44 +01:00
Michal Wajdeczko	401fabd6e2	drm/xe/mert: Use local mert variable to simplify the code There is no need to always refer to MERT data using tile pointer. Use of local mert pointer will simplify the code and make it look like other existing MERT function. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Lukasz Laguna <lukasz.laguna@intel.com> Link: https://patch.msgid.link/20260109151219.26206-5-michal.wajdeczko@intel.com	2026-01-12 14:38:43 +01:00
Michal Wajdeczko	ff4eca1f46	drm/xe/mert: Always refer to MERT using xe_device There is only one MERT instance and while it is located on the root tile, it is safer to refer to it using xe_device rather than xe_tile. This will also allow to align signature with other MERT function. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Lukasz Laguna <lukasz.laguna@intel.com> Link: https://patch.msgid.link/20260111213847.27869-1-michal.wajdeczko@intel.com	2026-01-12 14:38:41 +01:00
Michal Wajdeczko	a92c68eb1e	drm/xe/mert: Fix kernel-doc for struct xe_mert Add simple top level kernel-doc for the struct itself to allow the script recognize that and fix tag of the one member. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Lukasz Laguna <lukasz.laguna@intel.com> Link: https://patch.msgid.link/20260109151219.26206-3-michal.wajdeczko@intel.com	2026-01-12 14:38:40 +01:00
Michal Wajdeczko	e7994954c2	drm/xe/mert: Normalize xe_mert.h include guards Most of our header files are using include guard names with single underscore and we don't use trailing comments on final #endif. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Lukasz Laguna <lukasz.laguna@intel.com> Link: https://patch.msgid.link/20260109151219.26206-2-michal.wajdeczko@intel.com	2026-01-12 14:38:37 +01:00
Matthew Brost	bb63e7257e	drm/xe: Avoid toggling schedule state to check LRC timestamp in TDR We now have proper infrastructure to accurately check the LRC timestamp without toggling the scheduling state for non-VFs. For VFs, it is still possible to get an inaccurate view if the context is on hardware. We guard against free-running contexts on VFs by banning jobs whose timestamps are not moving. In addition, VFs have a timeslice quantum that naturally triggers context switches when more than one VF is running, thus updating the LRC timestamp. For multi-queue, it is desirable to avoid scheduling toggling in the TDR because this scheduling state is shared among many queues. Furthermore, this change simplifies the GuC state machine. The trade-off for VF cases seems worthwhile. v5: - Add xe_lrc_timestamp helper (Umesh) v6: - Reduce number of tries on stuck timestamp (VF testing) - Convert job timestamp save to a memory copy (VF testing) v7: - Save ctx timestamp to LRC when start VF job (VF testing) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patch.msgid.link/20260110012739.2888434-8-matthew.brost@intel.com	2026-01-10 13:39:52 -08:00
Matthew Brost	efffd56e4b	drm/xe: Disable timestamp WA on VFs The timestamp WA does not work on a VF because it requires reading MMIO registers, which are inaccessible on a VF. This timestamp WA confuses LRC sampling on a VF during TDR, as the LRC timestamp would always read as 1 for any active context. Disable the timestamp WA on VFs to avoid this confusion. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Fixes: `617d824c53` ("drm/xe: Add WA BB to capture active context utilization") Link: https://patch.msgid.link/20260110012739.2888434-7-matthew.brost@intel.com	2026-01-10 13:39:52 -08:00
Matthew Brost	ddb5cf9b90	drm/xe: Remove special casing for LR queues in submission Now that LR jobs are tracked by the DRM scheduler, there's no longer a need to special-case LR queues. This change removes all LR queue-specific handling, including dedicated TDR logic, reference counting schemes, and other related mechanisms. v4: - Remove xe_exec_queue_lr_cleanup tracepoint (Niranjana) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Link: https://patch.msgid.link/20260110012739.2888434-6-matthew.brost@intel.com	2026-01-10 13:39:52 -08:00
Matthew Brost	58624c195b	drm/xe: Do not deregister queues in TDR Deregistering queues in the TDR introduces unnecessary complexity, requiring reference-counting techniques to function correctly, particularly to prevent use-after-free (UAF) issues while a deregistration initiated from the TDR is in progress. All that's needed in the TDR is to kick the queue off the hardware, which is achieved by disabling scheduling. Queue deregistration should be handled in a single, well-defined point in the cleanup path, tied to the queue's reference count. v4: - Explain why extra ref were needed prior to this patch (Niranjana) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Link: https://patch.msgid.link/20260110012739.2888434-5-matthew.brost@intel.com	2026-01-10 13:39:52 -08:00

1 2 3 4 5 ...

121033 Commits