linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-02-17 06:40:11 -05:00

Author	SHA1	Message	Date
Michal Wajdeczko	94de94d24e	drm/xe/guc: Cancel ongoing H2G requests when stopping CT Once we have started a GT reset sequence, which includes stopping GuC CTB communication, we should also cancel all ongoing H2G send- recv requests, as either GuC is already dead, or due to imminent reset GuC will not be able to reply, or due to internal cleanup we will lose pending fences. With this we will report dedicated -ECANCELED error instead of misleading -ETIME. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Acked-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250709174038.1876-4-michal.wajdeczko@intel.com	2025-07-10 21:46:29 +02:00
Michal Wajdeczko	4ecdcf9caf	drm/xe/guc: Move state change logger to helper In the state change helper we are already doing extra stuff, move debug state logger there to cover all state changes. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Acked-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250709174038.1876-3-michal.wajdeczko@intel.com	2025-07-10 21:39:25 +02:00
Michal Wajdeczko	1b822b7f56	drm/xe/guc: Rename CT state change helper In this helper we are already doing much more than just setting a new CT state and its name was little misleading. Rename it. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Acked-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250709174038.1876-2-michal.wajdeczko@intel.com	2025-07-10 21:38:43 +02:00
Shuicheng Lin	0efec05001	drm/xe/pm: Correct comment of xe_pm_set_vram_threshold() The parameter threshold is with size in MiB, not in bits. Correct it to avoid any confusion. v2: s/mb/MiB, s/vram/VRAM, fix return section. (Michal) Fixes: `30c399529f` ("drm/xe: Document Xe PM component") Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://lore.kernel.org/r/20250708021450.3602087-2-shuicheng.lin@intel.com Reviewed-by: Stuart Summers <stuart.summers@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-07-10 14:51:04 -04:00
Michal Wajdeczko	1d2e2503e5	drm/xe/bmg: Don't use WA 16023588340 and 22019338487 on VF These workarounds are not applicable for use by the VFs. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Tested-by: Jakub Kolakowski <jakub1.kolakowski@intel.com> Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Signed-off-by: Jakub Kolakowski <jakub1.kolakowski@intel.com> Link: https://lore.kernel.org/r/20250710103040.375610-2-jakub1.kolakowski@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-10 09:07:23 -07:00
Juston Li	ce3d39fae3	drm/xe/bo: add GPU memory trace points Add TRACE_GPU_MEM tracepoints for tracking global GPU memory usage. These are required by VSR on Android 12+ for reporting GPU driver memory allocations. v5: - Drop process_mem tracking - Set the gpu_id field to dev->primary->index (Lucas, Tvrtko) - Formatting cleanup under 80 columns v3: - Use now configurable CONFIG_TRACE_GPU_MEM instead of adding a per-driver Kconfig (Lucas) v2: - Use u64 as preferred by checkpatch (Tvrtko) - Fix errors in comments/Kconfig description (Tvrtko) - drop redundant "CONFIG" in Kconfig Signed-off-by: Juston Li <justonli@chromium.org> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250709192313.479336-2-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-10 07:55:21 -07:00
Riana Tauro	f5c5d29522	drm/xe/xe_i2c: Add support for i2c in survivability mode Initialize i2c in survivability mode to allow firmware update of Add-In Management Controller (AMC) in survivability mode. Signed-off-by: Riana Tauro <riana.tauro@intel.com> Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://lore.kernel.org/r/20250701122252.2590230-6-heikki.krogerus@linux.intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-07-10 10:19:41 -04:00
Raag Jadav	0ea07b6951	drm/xe/pm: Wire up suspend/resume for I2C controller Wire up suspend/resume handles for I2C controller to match its power state with SGUnit. Signed-off-by: Raag Jadav <raag.jadav@intel.com> Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Karthik Poosa <karthik.poosa@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://lore.kernel.org/r/20250701122252.2590230-5-heikki.krogerus@linux.intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-07-10 10:19:41 -04:00
Heikki Krogerus	f0e53aadd7	drm/xe: Support for I2C attached MCUs Adding adaption/glue layer where the I2C host adapter (Synopsys DesignWare I2C adapter) and the I2C clients (the microcontroller units) are enumerated. The microcontroller units (MCU) that are attached to the GPU depend on the OEM. The initially supported MCU will be the Add-In Management Controller (AMC). Co-developed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://lore.kernel.org/r/20250701122252.2590230-4-heikki.krogerus@linux.intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> [Rodrigo fixed the co-developed tags and SPDX format in the .c file]	2025-07-10 10:19:41 -04:00
Heikki Krogerus	f6a8e9f3de	i2c: designware: Add quirk for Intel Xe The regmap is coming from the parent also in case of Xe GPUs. Reusing the Wangxun quirk for that. Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Co-developed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Andi Shyti <andi.shyti@kernel.org> Link: https://lore.kernel.org/r/20250701122252.2590230-3-heikki.krogerus@linux.intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> [Rodrigo fixed the co-developed tags while merging]	2025-07-10 10:19:24 -04:00
Heikki Krogerus	22290cc904	i2c: designware: Use polling by default when there is no irq resource The irq resource itself can be used as a generic way to determine when polling is needed. This not only removes the need for special additional device properties that would soon be needed when the platform may or may not have the irq, but it also removes the need to check the platform in the first place in order to determine is polling needed or not. Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Andi Shyti <andi.shyti@kernel.org> Link: https://lore.kernel.org/r/20250701122252.2590230-2-heikki.krogerus@linux.intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-07-10 10:18:47 -04:00
Michal Wajdeczko	621a422079	drm/xe/guc: Don't allocate temporary policies object Since we are already using reusable buffer objects from the GuC buffer cache, we can directly write into their CPU pointers and spare unnecessary temporary allocation. While around, also make sure to clear obtained buffer, to avoid sending some stale data. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250702142504.1656-1-michal.wajdeczko@intel.com	2025-07-10 16:00:13 +02:00
Michal Wajdeczko	1fbe023d30	drm/xe/pf: Print configuration KLVs using debug printer While we print VF's configuration KLVs only under DEBUG_SRIOV config, we should be doing it at debug level, not info level. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Lukasz Laguna <lukasz.laguna@intel.com> Link: https://lore.kernel.org/r/20250703145709.1832-1-michal.wajdeczko@intel.com	2025-07-10 14:26:13 +02:00
Michal Wajdeczko	89cd027c94	drm/xe/pf: Print runtime registers using debug printer While we already print VF's runtime registers only under DEBUG_SRIOV config, we should be still doing it at debug level, not info. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Lukasz Laguna <lukasz.laguna@intel.com> Link: https://lore.kernel.org/r/20250604190021.725-2-michal.wajdeczko@intel.com	2025-07-10 14:26:11 +02:00
Raag Jadav	cdc36b66cd	drm/xe: Expose fan control and voltage regulator version Add sysfs attributes for late binding features which expose bound version to the user. v2: Rework attribute and macro naming (Badal) v3: Drop fancy formatting (Rodrigo) v4: Form version string using local variables (Rodrigo) Signed-off-by: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250709164224.2676086-1-raag.jadav@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-07-09 18:25:22 -04:00
Shuicheng Lin	017ef1228d	drm/xe: Release runtime pm for error path of xe_devcoredump_read() xe_pm_runtime_put() is missed to be called for the error path in xe_devcoredump_read(). Add function description comments for xe_devcoredump_read() to help understand it. v2: more detail function comments and refine goto logic (Matt) Fixes: `c4a2e5f865` ("drm/xe: Add devcoredump chunking") Cc: stable@vger.kernel.org Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250707004911.3502904-6-shuicheng.lin@intel.com	2025-07-08 15:11:37 -07:00
Shuicheng Lin	8ce560d8e1	drm/xe: Remove unused code in devcoredump_snapshot() The deleted code is no longer needed because patch "drm/xe/guc: Plumb GuC-capture into dev coredump" has removed the related usage code. Remove the code to tidy up the function. v2: s/bacause/because Reviewed-by: Zhanjun Dong <zhanjun.dong@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250707004911.3502904-5-shuicheng.lin@intel.com	2025-07-08 15:11:36 -07:00
Zhanjun Dong	b2c4ac219f	drm/xe/uc: Disable GuC communication on hardware initialization error Disable GuC communication on Xe micro controller hardware initialization error. Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4917 Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250707231108.3217573-1-zhanjun.dong@intel.com	2025-07-08 14:56:49 -07:00
Shuicheng Lin	83dcee1785	drm/xe/pm: Restore display pm if there is error after display suspend xe_bo_evict_all() is called after xe_display_pm_suspend(). So if there is error with xe_bo_evict_all(), display pm should be restored. Fixes: `51462211f4` ("drm/xe/pxp: add PXP PM support") Fixes: `cb8f81c175` ("drm/xe/display: Make display suspend/resume work on discrete") Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://lore.kernel.org/r/20250708035424.3608190-2-shuicheng.lin@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-07-08 16:05:07 -04:00
Matt Atwood	94de1dfd47	drm/xe/ptl: Drop force_probe requirement Panther Lake has proven to be stable through testing and use. Remove the force_probe requirement and enable the platform by default. Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com> Link: https://lore.kernel.org/r/20250707201959.319406-1-matthew.s.atwood@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-07-08 09:55:35 -04:00
Daniele Ceraolo Spurio	4c93e2c341	drm/xe/ptl: Add HuC FW definition for PTL Add the unversioned define for the PTL HuC FW. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250626182805.1701096-15-daniele.ceraolospurio@intel.com	2025-07-07 10:37:12 -07:00
Daniele Ceraolo Spurio	5cdb71d3b0	drm/xe/ptl: Add GuC FW definition for PTL The first official GuC relase for PTL is 70.47.0, which maps to API version 1.22.4. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250626182805.1701096-14-daniele.ceraolospurio@intel.com	2025-07-07 10:37:12 -07:00
Julia Filipchuk	0b64addcae	drm/xe/guc: Recommend GuC v70.46.2 for BMG, LNL, DG2 UAPI compatibility version 1.22.2 Resolves various bugs. Recommend newer version. Signed-off-by: Julia Filipchuk <julia.filipchuk@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250626182805.1701096-13-daniele.ceraolospurio@intel.com	2025-07-07 10:37:01 -07:00
Vodapalli, Ravi Kumar	ccfb15b815	drm/xe/bmg: Add one additional PCI ID One additional PCI ID is added in Bspec for BMG, Add it so that driver recognizes this device with this new ID. Bspec: 68090 Cc: stable@vger.kernel.org # v6.12+ Signed-off-by: Vodapalli, Ravi Kumar <ravi.kumar.vodapalli@intel.com> Reviewed-by: Shekhar Chauhan <shekhar.chauhan@intel.com> Acked-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://lore.kernel.org/r/20250704103527.100178-1-ravi.kumar.vodapalli@intel.com	2025-07-04 14:55:51 +01:00
Matthew Auld	f7a2fd776e	drm/xe/bmg: fix compressed VRAM handling There looks to be an issue in our compression handling when the BO pages are very fragmented, where we choose to skip the identity map and instead fall back to emitting the PTEs by hand when migrating memory, such that we can hopefully do more work per blit operation. However in such a case we need to ensure the src PTEs are correctly tagged with a compression enabled PAT index on dgpu xe2+, otherwise the copy will simply treat the src memory as uncompressed, leading to corruption if the memory was compressed by the user. To fix this pass along use_comp_pat into emit_pte() on the src side, to indicate that compression should be considered. v2 (Jonathan): tweak the commit message Fixes: `523f191cc0` ("drm/xe/xe_migrate: Handle migration logic for xe2+ dgfx") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Akshata Jahagirdar <akshata.jahagirdar@intel.com> Cc: <stable@vger.kernel.org> # v6.12+ Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250701103949.83116-2-matthew.auld@intel.com	2025-07-04 12:31:06 +01:00
Matthew Brost	03d85ab36b	Revert "drm/xe/xe2: Enable Indirect Ring State support for Xe2" This reverts commit `fe0154cf82`. Seeing some unexplained random failures during LRC context switches with indirect ring state enabled. The failures were always there, but the repro rate increased with the addition of WA BB as a separate BO. Commit `3a1edef8f4` ("drm/xe: Make WA BB part of LRC BO") helped to reduce the issues in the context switches, but didn't eliminate them completely. Indirect ring state is not required for any current features, so disable for now until failures can be root caused. Cc: stable@vger.kernel.org Fixes: `fe0154cf82` ("drm/xe/xe2: Enable Indirect Ring State support for Xe2") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250702035846.3178344-1-matthew.brost@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-03 12:23:45 -07:00
Tomasz Lis	7eba6a80fe	drm/xe/vf: Make multi-GT migration less error prone There is a remote chance that after migration, some GTs will not send the MIGRATED interrupt, or due to current VF KMD state the interrupt will not lead to marking the GT for recovery. Requiring IRQs from all GTs before starting migration introduces the possibility that the process will get stalled due to one GuC. One could argue it is also waste of time to wait for all IRQs, but we should get them all IRQs as soon as VGPU starts, so that's not really an impactful argument. Still, not waiting for all GTs makes it easier to handle situations: * where one GuC IRQ is missing * where state before probe is unclean - getting MIGRATED IRQ as soon as interrupts are enabled * where multiple migrations happen close to each other To help with these cases, this patch alters the post-migration recovery so that recovery task is started as soon as one GuC IRQ is handled, and other GTs are included in recovery later as the subsequent IRQs are serviced. The post-migration recovery can now be called for any selection of GTs, and it will perform recovery on all GTs for which IRQs have arrived, even multiple times if necessary. v2: Typos and style fixes v3: Transferring gt_flags by value rather than reference to last function where it is used Signed-off-by: Tomasz Lis <tomasz.lis@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Michal Winiarski <michal.winiarski@intel.com> Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Acked-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Reviewed-by: Michal Winiarski <michal.winiarski@intel.com> Link: https://lore.kernel.org/r/20250630152155.195648-1-tomasz.lis@intel.com Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>	2025-07-03 21:05:21 +02:00
Matthew Brost	491b978312	drm/xe: Allocate PF queue size on pow2 boundary CIRC_SPACE does not work unless the size argument is a power of 2, allocate PF queue size on power of 2 boundary. Cc: stable@vger.kernel.org Fixes: `3338e4f90c` ("drm/xe: Use topology to determine page fault queue size") Fixes: `29582e0ea7` ("drm/xe: Add page queue multiplier") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Francois Dugast <francois.dugast@intel.com> Link: https://lore.kernel.org/r/20250702213511.3226167-1-matthew.brost@intel.com	2025-07-03 10:19:35 -07:00
Michal Wajdeczko	3fae6918a3	drm/xe/pf: Clear all LMTT pages on alloc Our LMEM buffer objects are not cleared by default on alloc and during VF provisioning we only setup LMTT PTEs for the actually provisioned LMEM range. But beyond that valid range we might leave some stale data that could either point to some other VFs allocations or even to the PF pages. Explicitly clear all new LMTT page to avoid the risk that a malicious VF would try to exploit that gap. While around add asserts to catch any undesired PTE overwrites and low-level debug traces to track LMTT PT life-cycle. Fixes: `b1d2040582` ("drm/xe/pf: Introduce Local Memory Translation Table") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://lore.kernel.org/r/20250701220052.1612-1-michal.wajdeczko@intel.com	2025-07-03 16:38:45 +02:00
Riana Tauro	b9329f5167	drm/xe/xe_pmu: Validate gt in event supported Validate gt instead of checking gt_id is lesser than max gts per tile Signed-off-by: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20250630093741.2435281-1-riana.tauro@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2025-07-02 16:09:11 -07:00
Matt Roper	d4eb4a0102	drm/xe/xe_query: Use separate iterator while filling GT list The 'id' value updated by for_each_gt() is the uapi GT ID of the GTs being iterated over, and may skip over values if a GT is not present on the device. Use a separate iterator for GT list array assignments to ensure that the array will be filled properly on future platforms where index in the GT query list may not match the uapi ID. v2: - Include the missing increment of the iterator. (Jonathan) Cc: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250701201320.2514369-16-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2025-07-02 16:08:54 -07:00
Matt Roper	457123d5a0	drm/xe: Don't compare GT ID to GT count when determining valid GTs On current platforms with multiple GTs, all of the GT IDs are consecutive; as a result we know that the GT IDs range from 0 to gt_count-1 and can determine if a GT ID is valid by comparing against the count. The consecutive nature of GT IDs may not hold true on future platforms if/when we have platforms that are both multi-tile and have multiple GTs within each tile. Once such platforms exist, it's quite possible that we could wind up with something like a GT list composed of IDs 0, 2, and 3 with no GT 1 (which would be a 2-tile platform with media only on the second tile). To future-proof the code we should stop comparing against the GT count to determine whether a GT ID is valid or not. Instead we should do an actual lookup of the ID to determine whether the GT exists. This also means that our GT loop macro should not end at the GT count, but should rather examine the entire space up to (# of tiles) * (max GT per tile) to ensure it doesn't stop prematurely. Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250701201320.2514369-15-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2025-07-02 16:08:54 -07:00
Matt Roper	bd6a4b9785	drm/xe: Assign GT IDs properly on multi-tile + multi-GT platforms Although "multi-tile" and "multiple GTs per tile" are mutually-exclusive characteristics on all of our platforms today, this may not always be true. Assign GT IDs according to xe->info.max_gt_per_tile in a way that should work even if future platforms have different configurations. This patch should not change the behavior of current platforms; it only future-proofs for potential future designs. v2: - Re-calculate gt_count if tile count gets reduced by MTCFG. (PVC CI) Reviewed-by: Ravi Kumar Vodapalli <ravi.kumar.vodapalli@intel.com> Link: https://lore.kernel.org/r/20250701201320.2514369-14-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2025-07-02 16:08:54 -07:00
Matt Roper	fb72cd2104	drm/xe/tests/pci: Ensure all platforms have a valid GT/tile count Add a simple kunit test to ensure each platform's GT per tile count is non-zero and does not exceed the global XE_MAX_GT_PER_TILE definition. We need to move 'struct xe_subplatform_desc' from the .c file to the types header to ensure it is accessible from the kunit test. v2: - Rebase on latest xe_pci test rework from Michal and convert to a parameterized test that runs on each PCI ID supported by the driver. Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Ravi Kumar Vodapalli <ravi.kumar.vodapalli@intel.com> Link: https://lore.kernel.org/r/20250701201320.2514369-13-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2025-07-02 16:08:54 -07:00
Matt Roper	f8e0f4c526	drm/xe: Track maximum GTs per tile on a per-platform basis Today all of our platforms fall into one of three cases: * Single tile platforms with a single (primary) GT * Single tile platforms with two GTs (primary + media) * Two-tile platforms with a single GT (primary) in each Our numbering of GTs has been a bit inconsistent between platforms (e.g., GT1 is the media GT on some platforms, but the second tile's primary GT on others). In the future we'll likely have platforms that are both multi-tile and multi-GT, which will make the situation more confusing. We could also wind up with more than just two types of GTs at some point in the future. Going forward we should standardize the way we assign uapi GT IDs to internal GT structures. Let's declare that for userspace GT ID n, GT[n]'s tile = n / (max gt per tile) GT[n]'s slot within tile = n % (max gt per tile) We don't want the GT numbering to change for any of our current platforms since the current IDs are part of our ABI contract with userspace so this means we should track the 'max gt per tile' value on a per-platform basis rather than just using a single value across the driver. Encode this into device descriptors in xe_pci.c and use the per-platform number for various checks in the code. Constant XE_MAX_GT_PER_TILE will remain just as the maximum across all platforms for easy of sizing array allocations. Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250701201320.2514369-12-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2025-07-02 16:08:54 -07:00
Matt Roper	0fc957c20d	drm/xe: Export xe_step_name for kunit tests xe_step_name() is used by xe_assert(), so adding assertions to functions like xe_device_get_gt() will result in ERROR: modpost: "xe_step_name" [drivers/gpu/drm/xe/tests/xe_test.ko] undefined! while building the kunit tests. Export xe_step_name to avoid these build failures when adding assertions. Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250701201320.2514369-11-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2025-07-02 16:08:54 -07:00
Michal Wajdeczko	6797906074	drm/xe/hw_engine_group: Fix potential leak If we fail to allocate a workqueue we will leak kzalloc'ed group object since it was designed to be kfree'ed in the drmm cleanup action, but we didn't have a chance to register this action yet. To avoid this leak allocate a group object using drmm_kzalloc() and start using predefined drmm action to release the workqueue. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Francois Dugast <francois.dugast@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250627184143.1480-1-michal.wajdeczko@intel.com	2025-07-02 13:31:30 +02:00
Tvrtko Ursulin	a34ba68d09	drm/xe: Consolidate LRC offset calculations Attempt to consolidate the LRC offsets calculations by aligning the recently added wa_bb_offset with the naming scheme in the file and also change the size stored in struct xe_lrc to not include the ring buffer. The former makes it somewhat visually easier to follow the layout of the various logical blocks stored in the LRC bo, while the latter reduces the number of sprinkled around calculations. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250630124711.8209-2-tvrtko.ursulin@igalia.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-01 08:35:23 -07:00
Maarten Lankhorst	5ac5e19197	drm/xe: Fix typo in Kconfig doubut -> doubt. Signed-off-by: Maarten Lankhorst <dev@lankhorst.se> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://lore.kernel.org/r/20250627074119.347826-1-dev@lankhorst.se Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-01 08:01:40 -07:00
Harry Austen	a559434880	drm/xe: Allow dropping kunit dependency as built-in Fix Kconfig symbol dependency on KUNIT, which isn't actually required for XE to be built-in. However, if KUNIT is enabled, it must be built-in too. Fixes: `08987a8b68` ("drm/xe: Fix build with KUNIT=m") Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Harry Austen <hpausten@protonmail.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20250627-xe-kunit-v2-2-756fe5cd56cf@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-06-28 07:11:12 -07:00
Matthew Brost	ec9223b49a	drm/xe: Drop bo->size bo->size is redundant because the base GEM object already has a size field with the same value. Drop bo->size and use the base GEM object’s size instead. While at it, introduce xe_bo_size() to abstract the BO size. v2: - Fix typo in kernel doc (Ashutosh) - Fix kunit (CI) - Fix line wrap (Checkpatch) v3: - Fix sriov build (CI) v4: - Fix display build (CI) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://lore.kernel.org/r/20250625144128.2827577-1-matthew.brost@intel.com	2025-06-27 14:52:31 -07:00
Daniele Ceraolo Spurio	9c7d93a8f1	drm/xe/guc: Enable the Dynamic Inhibit Context Switch optimization The Dynamic Inhibit Context Switch is an optimization aimed at reducing the amount of time the HW is stuck waiting on an unsatisfied semaphore. When this optimization is enabled, the GuC will dynamically modify the CTX_CTRL_INHIBIT_SYN_CTX_SWITCH in the CTX_CONTEXT_CONTROL register of LRCs to enable immediate switching out on an unsatisfied semaphore wait when multiple contexts are competing for time on the same engine. This feature is available on recent HW from GuC 70.40.1 onwards and it is enabled via a per-VF feature opt-in. v2: rebase v3: switch to using guc_buf_cache instead of dedicated alloc v4: add helper to check for feature availability (Michal), don't enable if multi-lrc is possible. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Julia Filipchuk <julia.filipchuk@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://lore.kernel.org/r/20250625205405.1653212-4-daniele.ceraolospurio@intel.com	2025-06-27 11:08:50 -07:00
Daniele Ceraolo Spurio	a7ffcea863	drm/xe/guc: Enable extended CAT error reporting On newer HW (Xe2 onwards + PVC) it is possible to get extra information when a CAT error occurs, specifically a dword reporting the error type. To enable this extra reporting, we need to opt-in with the GuC, which is done via a specific per-VF feature opt-in H2G. On platforms where the HW does not support the extra reporting, the GuC will set the type to 0xdeadbeef, so we can keep the code simple and opt-in to the feature on every platform and then just discard the data if it is invalid. Note that on native/PF we're guaranteed that the opt in is available because we don't support any GuC old enough to not have it, but if we're a VF we might be running on a non-XE PF with an older GuC, so we need to handle that case. We can re-use the invalid type above to handle this scenario the same way as if the feature was not supported in HW. Given that this patch is the first user of the guc_buf_cache on native and VF, it also extends that feature to non-PF use-cases. v2: simpler print for the error type (John), rebase v3: use guc_buf_cache instead of new alloc, simpler doc (Michal) Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Nirmoy Das <nirmoy.das@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> #v1 Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://lore.kernel.org/r/20250625205405.1653212-3-daniele.ceraolospurio@intel.com	2025-06-27 11:08:40 -07:00
Jia Yao	c038bdba98	drm/xe: Fix out-of-bounds field write in MI_STORE_DATA_IMM According to Bspec, bits 0~9 of MI_STORE_DATA_IMM must not exceed 0x3FE. The macro MI_SDI_NUM_QW(x) evaluates to 2 * x + 1, which means the condition 2 * x + 1 <= 0x3FE must be satisfied. Therefore, the maximum valid value for x is 0x1FE, not 0x1FF. v2 - Replace 0x1fe with macro MAX_PTE_PER_SDI (Auld, Matthew & Patelczyk, Maciej) v3 - Change macro MAX_PTE_PER_SDI from 0x1fe to 0x1feU (De Marchi, Lucas) Bspec: 60246 Fixes: `9c44fd5f6e` ("drm/xe: Add migrate layer functions for SVM support") Cc: Matthew Brost <matthew.brost@intel.com> Cc: Brian3 Nguyen <brian3.nguyen@intel.com> Cc: Alex Zuo <alex.zuo@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Maciej Patelczyk <maciej.patelczyk@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Suggested-by: Shuicheng Lin <shuicheng.lin@intel.com> Signed-off-by: Jia Yao <jia.yao@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Maciej Patelczyk <maciej.patelczyk@intel.com> Link: https://lore.kernel.org/r/20250612224620.161105-1-jia.yao@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-06-26 14:10:28 -07:00
Maarten Lankhorst	18635b6328	drm/xe: Rename xe_uc_init_hw to xe_uc_load_hw It feels to me like load is closer to the intention than init_hw. It makes the init calls slightly less confusing to me. :) Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250619104858.418440-24-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>	2025-06-26 22:11:35 +02:00
Maarten Lankhorst	a42939ee86	drm/xe: Remove xe_uc_fini_hw xe_uc_init_hw() is called multiple times from xe_gt.c, and that makes the name xe_uc_fini_hw(), called for a different reason in xe_guc.c confusing. Remove it and inline the xe_uc_sanitize_reset into xe_guc.c directly. Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250619104858.418440-23-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>	2025-06-26 22:11:35 +02:00
Maarten Lankhorst	80fa03eb8a	drm/xe: Remove xe_uc_init_hwconfig() This function is called immediately after xe_uc_init(), so just put it there instead. Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250619104858.418440-22-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>	2025-06-26 22:11:35 +02:00
Maarten Lankhorst	3effd109c6	drm/xe: Move xe_ttm_sys_mgr_init() downwards. Now that all previous allocations are gone, ensure no new allocations will ever be done before xe_display_init_early(), by moving the call that allows allocations downwards. Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250619104858.418440-21-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>	2025-06-26 22:11:35 +02:00
Maarten Lankhorst	11bf0f0b3a	drm/xe: Split init of xe_gt_init_hwconfig to xe_gt_init and *_early Now that we added the separate step of initialising GUC in xe_gt_init_early, it should be ok to initialise the minimum during early init, and the rest after allocations are allowed. Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250619104858.418440-20-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>	2025-06-26 22:11:35 +02:00
Maarten Lankhorst	6386a49951	drm/xe: Rename gt_init sub-functions s/gt_fw_domain_init/gt_init_with_gt_forcewake()/ s/all_fw_domain_init/gt_init_with_all_forcewake()/ Clarify that the functions are the part of gt_init() that are called with the respective power domains held. all_domain() of course only works after discovering and initialisation of force_wake on all engines, that's why the split is needed in the first place. Suggested-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250619104858.418440-19-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>	2025-06-26 22:11:35 +02:00

1 2 3 4 5 ...

1368226 Commits