linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-10 11:40:19 -04:00

Author	SHA1	Message	Date
Michal Wajdeczko	400a6da1e9	drm/xe/configfs: Enforce canonical device names While we expect config directory names to match PCI device name, currently we are only scanning provided names for domain, bus, device and function numbers, without checking their format. This would pass slightly broken entries like: /sys/kernel/config/xe/ ├── 0000:00:02.0000000000000 │ └── ... ├── 0000:00:02.0x │ └── ... ├── 0: 0: 2. 0 │ └── ... └── 0:0:2.0 └── ... To avoid such mistakes, check if the name provided exactly matches the canonical PCI device address format, which we recreated from the parsed BDF data. Also simplify scanf format as it can't really catch all formatting errors. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250722141059.30707-3-michal.wajdeczko@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-25 05:59:07 -07:00
Michal Wajdeczko	0bdd05c2a8	drm/xe/configfs: Fix pci_dev reference leak We are using pci_get_domain_bus_and_slot() function to verify if the given config directory name matches any existing PCI device, but we missed to call matching pci_dev_put() to release reference. While around, also change error code in case of no device match, to make it more specific than generic formatting error. Fixes: `16280ded45` ("drm/xe: Add configfs to enable survivability mode") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250722141059.30707-2-michal.wajdeczko@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-25 05:59:07 -07:00
Shuicheng Lin	f98de826b4	drm/xe/hw_engine_group: Avoid call kfree() for drmm_kzalloc() Memory allocated with drmm_kzalloc() should not be freed using kfree(), as it is managed by the DRM subsystem. The memory will be automatically freed when the associated drm_device is released. These 3 group pointers are allocated using drmm_kzalloc() in hw_engine_group_alloc(), so they don't require manual deallocation. Fixes: `6797906074` ("drm/xe/hw_engine_group: Fix potential leak") Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20250724193854.1124510-2-shuicheng.lin@intel.com	2025-07-25 10:01:02 +02:00
Matthew Brost	51330ba66c	drm/xe: Remove unused GT TLB invalidation trace points Remove unused GT TLB invalidation trace points after converting to used GT TLB invalidation jobs. The trace points removed were used during early bring up of unstable driver, with a stable driver no need to replace with new tracepoints. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20250724191216.4076566-8-matthew.brost@intel.com	2025-07-24 18:27:47 -07:00
Matthew Brost	b8d5779eee	drm/xe: Use GT TLB invalidation jobs in PT layer Rather than open-coding GT TLB invalidations in the PT layer, use GT TLB invalidation jobs. The real benefit is that GT TLB invalidation jobs use a single dma-fence context, allowing the generated fences to be squashed in dma-resv/DRM scheduler. v2: - s/;;/; (checkpatch) - Move ijob/mjob job push after range fence install v3: - Remove extra newline (Stuart) - Set ijob/mjob near creation (Stuart) - Add comment back in (Stuart) Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20250724191216.4076566-7-matthew.brost@intel.com	2025-07-24 18:27:47 -07:00
Matthew Brost	dba89840a9	drm/xe: Add GT TLB invalidation jobs Add GT TLB invalidation jobs which issue GT TLB invalidations. Built on top of Xe generic dependency scheduler. v2: - Fix checkpatch v3: - Fix kernel doc in xe_gt_tlb_inval_job_alloc_dep, xe_gt_tlb_inval_job_push - Use IS_ERR_OR_NULL in xe_gt_tlb_inval_job_put - Squash migrate lock / unlock helpers into this patch (Stuart) Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20250724191216.4076566-6-matthew.brost@intel.com	2025-07-24 18:27:22 -07:00
Matthew Brost	535c445eb9	drm/xe: Add dependency scheduler for GT TLB invalidations to bind queues Add a generic dependency scheduler for GT TLB invalidations, used to schedule jobs that issue GT TLB invalidations to bind queues. v2: - Use shared GT TLB invalidation queue for dep scheduler - Break allocation of dep scheduler into its own function - Add define for max number tlb invalidations - Skip media if not present Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20250724191216.4076566-5-matthew.brost@intel.com	2025-07-24 18:25:59 -07:00
Matthew Brost	ada5121948	drm/xe: Create ordered workqueue for GT TLB invalidation jobs No sense to schedule GT TLB invalidation jobs in parallel which target the same GT given these all contend on the same lock, create ordered workqueue for GT TLB invalidation jobs. v3: - Fix type in commmit message (Stuart) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20250724191216.4076566-4-matthew.brost@intel.com	2025-07-24 18:25:58 -07:00
Matthew Brost	69f187d446	drm/xe: Add generic dependecy jobs / scheduler Add generic dependecy jobs / scheduler which serves as wrapper for DRM scheduler. Useful when we want delay a generic operation until a dma-fence signals. Existing use cases could be destroying of resources based fences / dma-resv, the preempt rebind worker, and pipelined GT TLB invalidations. Written in such a way it could be moved to DRM subsystem if needed. v3: - Remove unnecessary cast (Staurt) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20250724191216.4076566-3-matthew.brost@intel.com	2025-07-24 18:25:56 -07:00
Matthew Brost	c3ead4ecfc	drm/xe: Explicitly mark migration queues with flag Rather than inferring if an exec queue is a migration queue for a flag, explicitly mark migration queues with a flag. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20250724191216.4076566-2-matthew.brost@intel.com	2025-07-24 18:25:55 -07:00
Sk Anirban	d72779c29d	drm/xe/ptl: Apply Wa_16026007364 As part of this WA GuC will save and restore value of two XE3_Media control registers that were not included in the HW power context. Signed-off-by: Sk Anirban <sk.anirban@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://lore.kernel.org/r/20250716101622.3421480-2-sk.anirban@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-24 14:01:27 -07:00
Tvrtko Ursulin	a98cdd979c	drm/xe: Use emit_flush_imm_ggtt helper instead of open coding Helper is already there so lets just use it. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250724131711.74291-2-tvrtko.ursulin@igalia.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-24 07:45:40 -07:00
Nitin Gote	a313d9059f	drm/xe: Rename MCFG_MCR_SELECTOR to STEER_SEMAPHORE The register at offset 0xfd0 was incorrectly named MCFG_MCR_SELECTOR, likely copied from i915. According to the hardware specification (Bspec), this register is actually called STEER_SEMAPHORE. Rename the register definition and update its usage in xe_gt_mcr.c to match the official hardware documentation. No functional changes. v2: Add Bspec reference (Tejas) Bspec: 67113 Signed-off-by: Nitin Gote <nitin.r.gote@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Link: https://lore.kernel.org/r/20250723141039.3848390-1-nitin.r.gote@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-24 07:37:45 -07:00
Michal Wajdeczko	159afd92ba	drm/xe/guc: Clear whole g2h_fence during initialization The struct g2h_fence must be explicitly initializated using the g2h_fence_init() function to avoid trash values in its members, but we missed to update this helper function with the new member. To fix that and avoid any future mistakes, memset the whole struct first, then update remaining non-zero members. Fixes: `94de94d24e` ("drm/xe/guc: Cancel ongoing H2G requests when stopping CT") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250723175639.206875-1-michal.wajdeczko@intel.com	2025-07-24 14:48:42 +02:00
Michal Wajdeczko	538b27a09a	drm/xe: Make GGTT TLB invalidation failure message GT oriented GGTT TLB invalidation is performed on the specific GT, thus any failure message shall be also GT specific. And to help investigate any unexpected failures, promote message from warn level to WARN to get full call stack of this unlikely case. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250723133015.206601-1-michal.wajdeczko@intel.com	2025-07-24 14:44:31 +02:00
Michal Wajdeczko	6983ea9cd7	drm/xe: Enable SR-IOV for TGL While we don't have official CI SR-IOV coverage for the Tigerlake platforms, we were using this platform for the feature enabling and Xe driver already has all required changes to support it. Since TGL platforms are guarded by the xe.require_force_probe flag enable SR-IOV feature on them, like we recently did for ADL/ATSM. Suggested-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250722182618.30811-5-michal.wajdeczko@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-23 11:49:49 -07:00
Michal Wajdeczko	2e76103998	drm/xe: Enable SR-IOV for ADL/ATSM We were already testing those two platforms for a while on our CI, but enabling flag (has_sriov) was only available on the topic branch and only for builds with CONFIG_DRM_XE_DEBUG config. Since those two platforms are guarded by the another enabling flag (require_force_probe) and we believe our SR-IOV support for them is at sufficient level to start enjoying the feature, turn on the SR-IOV enabling flag unconditionally. Suggested-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250722182618.30811-4-michal.wajdeczko@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-23 11:49:48 -07:00
Michal Wajdeczko	a2b461bd6f	drm/xe/pf: Enable SR-IOV PF mode by default We already claim official support for SR-IOV PF/VF modes on PTL and BMG platforms, but by default we start the Xe driver on those platforms in non-virtualized mode (native) since we still have max_vfs modparam set to disable creation of the VFs. It's time to let the Xe driver support SR-IOV PF mode by default. We were already testing this on our CI, which was relying on the patch that was enabling it for CONFIG_DRM_XE_DEBUG used by our CI. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250722182618.30811-3-michal.wajdeczko@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-23 11:49:48 -07:00
Lucas De Marchi	4d3bbe9dd2	drm/xe: Fix build without debugfs When CONFIG_DEBUG_FS is off, drivers/gpu/drm/xe/xe_gt_debugfs.o is not built and build fails on some setups with: ld: drivers/gpu/drm/xe/xe_gt.o: in function `xe_fault_inject_gt_reset': drivers/gpu/drm/xe/xe_gt.h:27:(.text+0x1659): undefined reference to `gt_reset_failure' ld: drivers/gpu/drm/xe/xe_gt.h:27:(.text+0x1c16): undefined reference to `gt_reset_failure' collect2: error: ld returned 1 exit status Do not use the gt_reset_failure attribute if debugfs is not enabled. Fixes: `8f3013e0b2` ("drm/xe: Introduce fault injection for gt reset") Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://lore.kernel.org/r/20250722-xe-fix-build-fault-v1-1-157384d50987@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-23 08:45:40 -07:00
Satyanarayana K V P	916ee4704a	drm/xe/vf: Register CCS read/write contexts with Guc Register read write contexts with newly added flags with GUC and enable the context immediately after registration. Re-register the context with Guc when resuming from runtime suspend as soft reset is applied to Guc during xe_pm_runtime_resume(). Make Ring head=tail while unbinding device to avoid issues with VF pause after device is unbinded. Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250722120506.6483-4-satyanarayana.k.v.p@intel.com	2025-07-23 07:22:34 -07:00
Satyanarayana K V P	864690cf4d	drm/xe/vf: Attach and detach CCS copy commands with BO Attach CCS read/write copy commands to BO for old and new mem types as NULL -> tt or system -> tt. Detach the CCS read/write copy commands from BO while deleting ttm bo from xe_ttm_bo_delete_mem_notify(). Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250722120506.6483-3-satyanarayana.k.v.p@intel.com	2025-07-23 07:22:31 -07:00
Satyanarayana K V P	f3009272ff	drm/xe/vf: Create contexts for CCS read write Create two LRCs to handle CCS meta data read / write from CCS pool in the VM. Read context is used to hold GPU instructions to be executed at save time and write context is used to hold GPU instructions to be executed at the restore time. Allocate batch buffer pool using suballocator for both read and write contexts. Migration framework is reused to create LRCAs for read and write. Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250722120506.6483-2-satyanarayana.k.v.p@intel.com	2025-07-23 07:22:28 -07:00
Lukasz Laguna	9a220e0659	drm/xe/vf: Don't register I2C devices if VF VF drivers can't access I2C devices, so skip their registration when running as VF. Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com> Fixes: `f0e53aadd7` ("drm/xe: Support for I2C attached MCUs") Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250717155420.25298-1-lukasz.laguna@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-07-23 09:10:53 -04:00
Zhanjun Dong	176f44a5ec	drm/xe/uc: Fix missing unwind goto Fix missing unwind goto on error handling. Fixes: `b2c4ac219f` ("drm/xe/uc: Disable GuC communication on hardware initialization error") Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://lore.kernel.org/r/20250721214520.954014-1-zhanjun.dong@intel.com	2025-07-22 10:46:52 -07:00
Dan Carpenter	6c9e64e83b	drm/xe: Fix an IS_ERR() vs NULL bug in xe_tile_alloc_vram() The xe_vram_region_alloc() function returns NULL on error. It never returns error pointers. Update the error checking to match. Fixes: `4b0a5f5ce7` ("drm/xe: Unify the initialization of VRAM regions") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/r/5449065e-9758-4711-b706-78771c0753c4@sabinyo.mountain Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-07-21 10:38:05 -04:00
Harish Chegondi	4b5514f786	drm/xe: Remove unnecessary EU stall debug message The EU stall debug message may cause CI to complain on unsupported platforms. Remove it. Cc: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://lore.kernel.org/r/dfb6a080b3442d481c567489aabe47e72f3e784c.1752870172.git.harish.chegondi@intel.com	2025-07-18 15:05:32 -07:00
Soham Purkait	487579fd85	drm/xe/xe_debugfs: Exposure of G-State and pcie link state residency counters through debugfs Add debug nodes, "dgfx_pkg_residencies" for G-states (G2, G6, G8, G10, ModS) and "dgfx_pcie_link_residencies" for PCIe link states(L0, L1, L1.2) residency counters. v1: - Expose all G-State residency counter values under dgfx_pkg_residencies. (Anshuman) - Include runtime_get/put. (Riana) v2: - Move offset macros to drm/xe/regs/xe_pmt. (Riana) v3: - Include debugfs node "dgfx_pcie_link_residencies" for pcie link residency counter values. (Anshuman) v4: - Include check for BMG and add helper function for repetitive code. (Riana) - Add for loop and local struct to avoid repetition. (Riana) - Use "drm_debugfs_create_files" to create debugfs. (Karthik) v5: - Reorder commits to reflect the correct dependency hierarchy. (Jonathan) - Simplification of commit message and rectified register offset.(Karthik) - Error handling and return before printing. (Riana) v6: - Remove check for DGFX as BMG is discrete. (Karthik) - Rearrange residency offsets in ascending order. (Riana) v7: - Squash the macros into the patch they are used in. (Lucas) Signed-off-by: Soham Purkait <soham.purkait@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Karthik Poosa <karthik.poosa@intel.com> Reviewed-by: Riana Tauro <riana.tauro@intel.com> Link: https://lore.kernel.org/r/20250716101412.3062780-2-soham.purkait@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-07-17 12:33:00 -04:00
Piotr Piórkowski	4b0a5f5ce7	drm/xe: Unify the initialization of VRAM regions Currently in the drivers we have defined VRAM regions per device and per tile. Initialization of these regions is done in two completely different ways. To simplify the logic of the code and make it easier to add new regions in the future, let's unify the way we initialize VRAM regions. v2: - fix doc comments in struct xe_vram_region - remove unnecessary includes (Jani) v3: - move code from xe_vram_init_regions_managers to xe_tile_init_noalloc (Matthew) - replace ioremap_wc to devm_ioremap_wc for mapping VRAM BAR (Matthew) - Replace the tile id parameter with vram region in the xe_pf_begin function. v4: - remove tile back pointer from struct xe_vram_region - add new back pointers: xe and migarte to xe_vram_region Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Cc: Stuart Summers <stuart.summers@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> # rev3 Acked-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250714184818.89201-6-piotr.piorkowski@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-16 12:15:00 -07:00
Piotr Piórkowski	d65ff1ec85	drm/xe: Split xe_migrate allocation from initialization Currently, xe_migrate_init handled both allocation and initialization, Lets moves allocation to xe_tile_alloc via a new xe_migrate_alloc function, and keep initialization in xe_migrate_init. This will allow the migration pointers to be passed to other structures before full initialization. Also replaces devm_kzalloc with drmm_kzalloc for better DRM-managed memory. Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250714184818.89201-5-piotr.piorkowski@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-16 12:12:49 -07:00
Piotr Piórkowski	7a20b4f558	drm/xe: Move struct xe_vram_region to a dedicated header Let's move the xe_vram_region structure to a new header dedicated to VRAM to improve modularity and avoid unnecessary dependencies when only VRAM-related structures are needed. v2: Fix build if CONFIG_DRM_XE_DEVMEM_MIRROR is enabled v3: Fix build if CONFIG_DRM_XE_DISPLAY is enabled v4: Move helper to get tile dpagemap to xe_svm.c Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Suggested-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> # rev3 Acked-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250714184818.89201-4-piotr.piorkowski@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-16 12:12:36 -07:00
Piotr Piórkowski	f92cfd72d9	drm/xe: Use dynamic allocation for tile and device VRAM region structures In future platforms, we will need to represent the device and tile VRAM regions in a more dynamic way, so let's abandon the static allocation of these structures and start use a dynamic allocation. v2: - Add a helpers for accessing fields of the xe_vram_region structure v3: - Add missing EXPORT_SYMBOL_IF_KUNIT for xe_vram_region_actual_physical_size Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Cc: Stuart Summers <stuart.summers@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Acked-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250714184818.89201-3-piotr.piorkowski@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-16 12:08:31 -07:00
Piotr Piórkowski	922ae87523	drm/xe: Use devm_ioremap_wc for VRAM mapping and drop manual unmap Let's replace the manual call to ioremap_wc function with devm_ioremap_wc function, ensuring that VRAM mappings are automatically released when the driver is detached. Since devm_ioremap_wc registers the mapping with the device's managed resources, the explicit iounmap call in vram_fini is no longer needed, so let's remove it. Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Suggested-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Acked-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250714184818.89201-2-piotr.piorkowski@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-16 12:06:25 -07:00
Michal Wajdeczko	bf81505f7d	drm/xe: Move debugfs GT attributes under tile directory While in sysfs we are correctly trying to reflect the hardware architecture and we expose GT attributes in per tile hierarchy, in debugfs we expose GT attributes at flat level, without tiles. Create debugfs directories to represent each tile and move GT attributes under matching parent tile directory. To not break existing debugfs tools, create symlink under old location: /sys/kernel/debug/dri/0000:00:02.0/ ├── ... ├── gt0 -> tile0/gt0 ├── gt1 -> tile0/gt1 ├── tile0 │ ├── gt0 │ │ ├── ... │ ├── gt1 │ │ ├── ... Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250714193645.763-1-michal.wajdeczko@intel.com	2025-07-16 17:24:00 +02:00
Dan Carpenter	2f264d58cc	drm/xe: Fix a NULL vs IS_ERR() bug in xe_i2c_register_adapter() The fwnode_create_software_node() function returns error pointers. It never returns NULL. Update the checks to match. Fixes: `f0e53aadd7` ("drm/xe: Support for I2C attached MCUs") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/65825d00-81ab-4665-af51-4fff6786a250@sabinyo.mountain Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-07-16 10:51:36 -04:00
Ashutosh Dixit	308dc9b278	drm/xe/oa: Fix static checker warning about null gt There is a static checker warning that gt returned by xe_device_get_gt can be NULL and that is being dereferenced. Use xe_root_mmio_gt instead, which is equivalent and cannot return a NULL gt 0. Fixes: `10d42ef34b` ("drm/xe/oa: Assign hwe for OAM_SAG") Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://lore.kernel.org/r/20250715181422.2807624-1-ashutosh.dixit@intel.com	2025-07-15 22:36:16 -07:00
Raag Jadav	ed5461daa1	drm/xe: Don't fail probe on unsupported mailbox command If the device is running older pcode firmware, it is possible that newer mailbox commands are not supported by it. The sysfs attributes aren't useful in that case, but we shouldn't fail driver probe because of it. As of now, it is unknown if we can distinguish unsupported commands before attempting them. But until we figure out a way to do that, fix the regressions. v2: Add debug message (Lucas) Fixes: `cdc36b66cd` ("drm/xe: Expose fan control and voltage regulator version") Signed-off-by: Raag Jadav <raag.jadav@intel.com> Tested-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250714215503.2897748-1-raag.jadav@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-07-15 15:26:01 -04:00
Michal Wajdeczko	a816487681	drm/xe/pf: Invalidate LMTT after completing changes Once we finish populating all leaf pages in the VF's LMTT we should make sure that hardware will not access any stale data. Explicitly force LMTT invalidation (as it was already planned in the past). Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Piotr Piórkowski <piotr.piorkowski@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://lore.kernel.org/r/20250711193316.1920-7-michal.wajdeczko@intel.com	2025-07-15 13:05:22 +02:00
Michal Wajdeczko	e497957fee	drm/xe/pf: Invalidate LMTT during LMEM unprovisioning Invalidate LMTT immediately after removing VF's LMTT page tables and clearing root PTE in the LMTT PD to avoid any invalid access by the hardware (and VF) due to stale data. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Piotr Piórkowski <piotr.piorkowski@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250711193316.1920-6-michal.wajdeczko@intel.com	2025-07-15 13:05:20 +02:00
Michal Wajdeczko	68ae022278	drm/xe/pf: Force GuC virtualization mode By default the GuC starts in the 'native' mode and enables the VGT mode (aka 'virtualization' mode) only after it receives at least one set of VF configuration data. While this happens naturally while PF begins VFs provisioning, we might need this sooner as some actions, like TLB_INVALIDATION_ALL(0x7002), is supported by the GuC only in the VGT mode. And this becomes a real problem if we would want to use above action to invalidate the LMTT early during VFs auto-provisioning, before VFs are enabled, as such H2G would be rejected: [ ] xe 0000:4d:00.0: [drm] ERROR GT0: FAST_REQ H2G fence 0x804e failed! e=0x30, h=0 [ ] xe 0000:4d:00.0: [drm] ERROR GT0: Fence 0x804e was used by action 0x7002 sent at: h2g_write+0x33e/0x870 [xe] __guc_ct_send_locked+0x1e1/0x1110 [xe] guc_ct_send_locked+0x9f/0x740 [xe] xe_guc_ct_send_locked+0x19/0x60 [xe] send_tlb_invalidation+0xc2/0x470 [xe] xe_gt_tlb_invalidation_all_async+0x45/0xa0 [xe] xe_gt_tlb_invalidation_all+0x4b/0xa0 [xe] lmtt_invalidate_hw+0x64/0x1a0 [xe] xe_lmtt_invalidate_hw+0x5c/0x340 [xe] pf_update_vf_lmtt+0x398/0xae0 [xe] pf_provision_vf_lmem+0x350/0xa60 [xe] xe_gt_sriov_pf_config_bulk_set_lmem+0xe2/0x410 [xe] xe_gt_sriov_pf_config_set_fair_lmem+0x1c6/0x620 [xe] xe_gt_sriov_pf_config_set_fair+0xd5/0x3f0 [xe] xe_pci_sriov_configure+0x360/0x1200 [xe] sriov_numvfs_store+0xbc/0x1d0 dev_attr_store+0x17/0x40 sysfs_kf_write+0x4a/0x80 kernfs_fop_write_iter+0x166/0x220 vfs_write+0x2ba/0x580 ksys_write+0x77/0x100 __x64_sys_write+0x19/0x30 x64_sys_call+0x2bf/0x2660 do_syscall_64+0x93/0x7a0 entry_SYSCALL_64_after_hwframe+0x76/0x7e [ ] xe 0000:4d:00.0: [drm] ERROR GT0: CT dequeue failed: -71 [ ] xe 0000:4d:00.0: [drm] GT0: trying reset from receive_g2h [xe] This could be mitigated by pushing earlier a PF self-configuration with some hard-coded values that cover unlimited access to the GGTT, use of all GuC contexts and doorbells. This step is sufficient for the GuC to switch into the VGT mode. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://lore.kernel.org/r/20250711193316.1920-5-michal.wajdeczko@intel.com	2025-07-15 13:05:19 +02:00
Michal Wajdeczko	92ba2032a1	drm/xe/pf: Move GGTT config KLVs encoding to helper In upcoming patch we will want to encode GGTT config KLVs based on raw numbers, without relying on the allocated GGTT node. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://lore.kernel.org/r/20250711193316.1920-4-michal.wajdeczko@intel.com	2025-07-15 13:05:18 +02:00
Michal Wajdeczko	1c38dd6afa	drm/xe/pf: Resend PF provisioning after GT reset If we reload the GuC due to suspend/resume or GT reset then we have to resend not only any VFs provisioning data, but also PF configuration, like scheduling parameters (EQ, PT), as otherwise GuC will continue to use default values. Fixes: `411220808c` ("drm/xe/pf: Restart VFs provisioning after GT reset") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://lore.kernel.org/r/20250711193316.1920-3-michal.wajdeczko@intel.com	2025-07-15 13:05:17 +02:00
Michal Wajdeczko	9f50b729dd	drm/xe/pf: Prepare to stop SR-IOV support prior GT reset As part of the resume or GT reset, the PF driver schedules work which is then used to complete restarting of the SR-IOV support, including resending to the GuC configurations of provisioned VFs. However, in case of short delay between those two actions, which could be seen by triggering a GT reset on the suspened device: $ echo 1 > /sys/kernel/debug/dri/0000:00:02.0/gt0/force_reset this PF worker might be still busy, which lead to errors due to just stopped or disabled GuC CTB communication: [ ] xe 0000:00:02.0: [drm:xe_gt_resume [xe]] GT0: resumed [ ] xe 0000:00:02.0: [drm] GT0: trying reset from force_reset_show [xe] [ ] xe 0000:00:02.0: [drm] GT0: reset queued [ ] xe 0000:00:02.0: [drm] GT0: reset started [ ] xe 0000:00:02.0: [drm:guc_ct_change_state [xe]] GT0: GuC CT communication channel stopped [ ] xe 0000:00:02.0: [drm:guc_ct_send_recv [xe]] GT0: H2G request 0x5503 canceled! [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF1 12 config KLVs (-ECANCELED) [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF1 configuration (-ECANCELED) [ ] xe 0000:00:02.0: [drm:guc_ct_change_state [xe]] GT0: GuC CT communication channel disabled [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF2 12 config KLVs (-ENODEV) [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF2 configuration (-ENODEV) [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push 2 of 2 VFs configurations [ ] xe 0000:00:02.0: [drm:pf_worker_restart_func [xe]] GT0: PF: restart completed While this VFs reprovisioning will be successful during next spin of the worker, to avoid those errors, make sure to cancel restart worker if we are about to trigger next reset. Fixes: `411220808c` ("drm/xe/pf: Restart VFs provisioning after GT reset") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://lore.kernel.org/r/20250711193316.1920-2-michal.wajdeczko@intel.com	2025-07-15 13:05:15 +02:00
Lucas De Marchi	f4d51b6ce5	drm/xe/lrc: Add table with LRC layout Add a table to document the LRC's BO layout to make it easier to visualize how each region stacks on top of each other. Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Link: https://lore.kernel.org/r/20250710-lrc-refactors-v2-4-a5e2ca03f6bd@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-14 13:40:17 -07:00
Tvrtko Ursulin	aded26ccaa	drm/xe: Waste fewer instructions in emit_wa_job() I was debugging some unrelated issue and noticed the current code was very verbose. We can improve it easily by using the more common batch buffer building pattern. Before: bb->cs[bb->len++] = MI_LOAD_REGISTER_REG \| MI_LRR_DST_CS_MMIO; c4d: 41 8b 56 10 mov 0x10(%r14),%edx c51: 49 8b 4e 08 mov 0x8(%r14),%rcx c55: 8d 72 01 lea 0x1(%rdx),%esi c58: 41 89 76 10 mov %esi,0x10(%r14) c5c: c7 04 91 01 00 08 15 movl $0x15080001,(%rcx,%rdx,4) bb->cs[bb->len++] = entry->reg.addr; c63: 8b 08 mov (%rax),%ecx c65: 41 8b 56 10 mov 0x10(%r14),%edx c69: 49 8b 76 08 mov 0x8(%r14),%rsi c6d: 81 e1 ff ff 3f 00 and $0x3fffff,%ecx c73: 8d 7a 01 lea 0x1(%rdx),%edi c76: 41 89 7e 10 mov %edi,0x10(%r14) c7a: 89 0c 96 mov %ecx,(%rsi,%rdx,4) ..etc.. After: cs++ = MI_LOAD_REGISTER_REG \| MI_LRR_DST_CS_MMIO; c52: 41 c7 04 24 01 00 08 movl $0x15080001,(%r12) c59: 15 cs++ = entry->reg.addr; c5a: 8b 10 mov (%rax),%edx ..etc.. Resulting in the following binary change: add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-348 (-348) Function old new delta xe_gt_record_default_lrcs.cold 304 296 -8 xe_gt_record_default_lrcs 2200 1860 -340 Total: Before=13554, After=13206, chg -2.57% Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250710-lrc-refactors-v2-7-a5e2ca03f6bd@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-14 13:40:17 -07:00
Lucas De Marchi	f4b538245f	drm/xe/gt: Drop third submission for default context There's no need to submit the nop job again on the first queue. Any state needed is already saved when the first LRC is switched out. The comment is a little misleading regarding indirect W/A: first of all there's still no indirect W/A enabled and secondly, even after they are, there's no need to submit this job again for having their state propagated: the indirect W/A will actually run on every LRC switch. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250710-lrc-refactors-v2-6-a5e2ca03f6bd@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-14 13:40:17 -07:00
Lucas De Marchi	6d891d22c6	drm/xe/lrc: Remove leftover TODO/FIXME There isn't anything to set for CTX_TIMESTAMP handling in the empty LRC: that is set on every LRC init since it should always start from 0 rather than the value saved in the image after first submission. The FIXME about perma-pinning also doesn't make much sense as we will always going to pin the lrc and the GGTT mapping has nothing to do with VM bind. Nuke these leftover comments. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250710-lrc-refactors-v2-5-a5e2ca03f6bd@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-14 13:40:17 -07:00
Lucas De Marchi	fab2cc0c09	drm/xe/gt: Extract emit_job_sync() Both the nop and wa jobs are going through the same boiler plate calls to emit the job with a timeout and handling error for both bb and job. Extract emit_job_sync() so those functions create the bb, handling possible errors and delegate the part about really emitting the job and waiting for its completion. Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Link: https://lore.kernel.org/r/20250710-lrc-refactors-v2-3-a5e2ca03f6bd@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-14 13:20:03 -07:00
Lucas De Marchi	e4cb5823ba	drm/xe: Count dwords before allocating The bb allocation in emit_wa_job() is wrong in 2 ways: first it's allocating enough space for the 3DSTATE or hardcoding 4k depending on the engine. In the first case it doesn't account for the WAs and in the former it may not be sufficient. Secondly it's using the size instead of number of dwords, causing the buffer to be 4x bigger than needed: xe_bb_new() receives number of dwords as parameter and its declaration was also not following its implementation. Lastly, reword the debug message since it's not only about the LRC WAs anymore as it also include the 3DSTATE for render. While it's unlikely this is causing any real issue, let's calculate the needed space and allocate just enough. Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Link: https://lore.kernel.org/r/20250710-lrc-refactors-v2-2-a5e2ca03f6bd@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-14 13:20:02 -07:00
Lucas De Marchi	76650bcf2a	drm/xe/lrc: Reduce scope of empty lrc data The only case in which new lrc data is created from scratch is when it's called prior to recording the default lrc. There's no need to check for NULL init_data since in that case the function already failed: just move the allocation where it's needed. Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250710-lrc-refactors-v2-1-a5e2ca03f6bd@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-14 13:20:02 -07:00
Michal Wajdeczko	b533b8e5a1	drm/xe/vf: Store negotiated VF/PF ABI version at device level There is no need to maintain PF ABI version on per-GT level. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://lore.kernel.org/r/20250713103625.1964-8-michal.wajdeczko@intel.com	2025-07-14 18:19:39 +02:00

1 2 3 4 5 ...

1368304 Commits