Commit Graph

1368244 Commits

Author SHA1 Message Date
Tvrtko Ursulin
1ec31d355c drm/xe: Rename utilization workaround emission function
Lucas suggested to consolidate to a slightly different naming scheme which
will align with the upcoming additions better.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Suggested-by: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250711160153.49833-4-tvrtko.ursulin@igalia.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14 08:06:14 -07:00
Tvrtko Ursulin
81b79670a3 drm/xe: Pass wa bb setup arguments in a struct
Group the function arguments in a struct for more readable code and easier
extending.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250711160153.49833-3-tvrtko.ursulin@igalia.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14 08:06:09 -07:00
Tvrtko Ursulin
fa7c2a2460 drm/xe: Generalize wa bb emission code
Generalize the wa bb emission by splitting it into three phases - setup,
emit and finish, and extract setup and finish steps into helpers.

This will enable using the same infrastructure for emitting the indirect
context workarounds.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250711160153.49833-2-tvrtko.ursulin@igalia.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14 08:04:35 -07:00
Lucas De Marchi
e08c0fa02e drm/xe: Fix missing kernel-doc
Fix warning:

	Warning: drivers/gpu/drm/xe/xe_device_types.h:658 struct member 'wa_active' not described in 'xe_device'

Fixes: 661a6950e0 ("drm/xe: Add infrastructure for Device OOB workarounds")
Cc: Matt Atwood <matthew.s.atwood@intel.com>
Reviewed-by: Jonathan Cavitt <joanthan.cavitt@intel.com>
Link: https://lore.kernel.org/r/20250711214911.2009714-2-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14 08:01:19 -07:00
Dr. David Alan Gilbert
8f3d1c9fb0 drm/xe: Remove unused functions
xe_bo_create_from_data() last use was removed in 2023 by
commit 0e1a47fcab ("drm/xe: Add a helper for DRM device-lifetime BO
create")

xe_rtp_match_first_gslice_fused_off() last use was removed in 2023 by
commit 4e124151fc ("drm/xe/dg2: Drop pre-production workarounds")

Remove them, and xe_dss_mask_empty whose last use was by
xe_rtp_match_first_gslice_fused_off().

(Xe has a bunch ofother symbols that have been added but not used,
given how new it is, I've left those, as opposed to these that
had the code that used them removed).

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Link: https://lore.kernel.org/r/20250713152531.219326-1-linux@treblig.org
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14 07:55:18 -07:00
Lucas De Marchi
7b6db1731a drm/xe: Normalize default param values
Document xe module params with the default values following a similar
strategy for all of them:

	1) Define a DEFAULT_* macro with the default value. When the
	   value can't be directly stringified, also define a *_STR
	   variant
	2) Use __stringify() or the _STR variant to make sure the
	   default value shows up in the param description

This allows us to show the correct default according to the
configuration. max_vfs for example was wrongly documented for
CONFIG_DRM_XE_DEBUG and svm_notifier_size didn't have its default
documented.

Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://lore.kernel.org/r/20250626-guc-log-level-v3-1-c3ed8b452e91@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-11 12:56:38 -07:00
Lucas De Marchi
81e139db69 drm/xe/migrate: Fix alignment check
The check would fail if the address is unaligned, but not when
accounting the offset. Instead of `buf | offset` it should have
been `buf + offset`. To make it more readable and also drop the
uintptr_t, just use the IS_ALIGNED() macro.

Fixes: 270172f64b ("drm/xe: Update xe_ttm_access_memory to use GPU for non-visible access")
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250710-migrate-aligned-v1-1-44003ef3c078@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-11 12:40:58 -07:00
Matthew Brost
4a1eaf7d11 drm/xe: Remove references to CONFIG_DRM_XE_DEVMEM_MIRROR
The prefetch code was referencing CONFIG_DRM_XE_DEVMEM_MIRROR, which has
been replaced by CONFIG_DRM_XE_PAGEMAP. As a result, prefetches were
limited to SRAM. Update the code to use CONFIG_DRM_XE_PAGEMAP instead of
the deprecated option.

Fixes: f86ad0ed62 ("drm/gpusvm, drm/pagemap: Move migration functionality to drm_pagemap")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250710205413.1105595-1-matthew.brost@intel.com
2025-07-11 10:43:12 -07:00
Matthew Brost
beb72acb5b drm/xe: Move page fault init after topology init
We need the topology to determine GT page fault queue size, move page
fault init after topology init.

Cc: stable@vger.kernel.org
Fixes: 3338e4f90c ("drm/xe: Use topology to determine page fault queue size")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://lore.kernel.org/r/20250710191208.1040215-1-matthew.brost@intel.com
2025-07-11 10:43:11 -07:00
Matthew Auld
c12fe703ca drm/xe/migrate: fix copy direction in access_memory
After we do the modification on the host side, ensure we write the
result back to VRAM and not the other way around, otherwise the
modification will be lost if treated like a read.

Fixes: 270172f64b ("drm/xe: Update xe_ttm_access_memory to use GPU for non-visible access")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250710134128.800756-2-matthew.auld@intel.com
2025-07-11 16:33:29 +01:00
Tejas Upadhyay
b528e896fa drm/xe: Dont skip TLB invalidations on VF
Skipping TLB invalidations on VF causing unrecoverable
faults. Probable reason for skipping TLB invalidations
on SRIOV could be lack of support for instruction
MI_FLUSH_DW_STORE_INDEX. Add back TLB flush with some
additional handling.

Helps in resolving,
[  704.913454] xe 0000:00:02.1: [drm:pf_queue_work_func [xe]]
                ASID: 0
                VFID: 0
                PDATA: 0x0d92
                Faulted Address: 0x0000000002fa0000
                FaultType: 0
                AccessType: 1
                FaultLevel: 0
                EngineClass: 3 bcs
                EngineInstance: 8
[  704.913551] xe 0000:00:02.1: [drm:pf_queue_work_func [xe]] Fault response: Unsuccessful -22

V2:
 - Use Xmas tree (MichalW)

Suggested-by: Matthew Brost <matthew.brost@intel.com>
Fixes: 97515d0b3e ("drm/xe/vf: Don't emit access to Global HWSP if VF")
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250710045945.1023840-1-tejas.upadhyay@intel.com
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
2025-07-11 14:39:47 +05:30
Michal Wajdeczko
908d9d56c8 drm/xe/sriov: Mark BMG as SR-IOV capable
Enable SR-IOV support for BMG platforms. Note that as other flags from
the platform descriptor, it only means it may have that capability: it
still depends on runtime checks for the proper support in HW and
firmware.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Tested-by: Jakub Kolakowski <jakub1.kolakowski@intel.com>
Signed-off-by: Jakub Kolakowski <jakub1.kolakowski@intel.com>
Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Link: https://lore.kernel.org/r/20250710103040.375610-3-jakub1.kolakowski@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-10 20:50:17 -07:00
Matt Atwood
77fa16c8f8 drm/xe: extend Wa_15015404425 to apply to PTL
Wa_15015404425 only needs to be applied on PTL platforms with an A step
compute die. There is no way to map PCI revid to the compute die
stepping. The easiest way to figure out compute die stepping our end is
to map the media IP's stepping to the compute die. For PTL, compute die
has an A stepping if and only if the media IP's stepping is also A-step
(This relationship is determined on a per platform basis and just
happens to be this way on PTL).

In addition this workaround is a chicken-and-egg problem. Wa_15015404425
requires that all register reads be preceded by four dummy MMIO writes
(including during early driver  init and even pre-OS firmware). The
driver needs to perform some MMIO reads during init which include the
GMD_ID register that contains the Media IPs stepping. To handle this in
the safest manner assume the workaround applies to all of PTL during
driver probe and deactivate the workaround after.

The overall solution becomes a set of two workarounds:

* 15015404425 - a Device OOB workaround that's always active for PTL
* 15015404425_disable - a GT OOB workaround that applies to PTL
  platfroms with a B0 or later stepping

The first of these workarounds issues dummy MMIO writes we do when
reading registers. The second guards logic that disables the first once
we have the necessary information later in the probe process.

v2: rename SoC to device, avoid null pointer dereference, update commit
message.
v3: rebase
v5: move disable check into xe_device_probe to avoid linking in xe_wa
into xe_pci, reword commit message
v6: squash extension and b0 support into 1 patch

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com>
Link: https://lore.kernel.org/r/20250709221605.172516-7-matthew.s.atwood@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-10 15:36:31 -07:00
Matt Atwood
ac596dee80 drm/xe: Move Wa_15015404425 to use the new XE_DEVICE_WA macro
Move Wa_15015404425 to use the new implemented OOB macro XE_DEVICE_WA()

v2: rename from SoC to Device
v5: move workaround call back into the flush call
v6: remove redundant commenting

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com>
Link: https://lore.kernel.org/r/20250709221605.172516-6-matthew.s.atwood@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-10 15:36:31 -07:00
Matt Atwood
661a6950e0 drm/xe: Add infrastructure for Device OOB workarounds
Some workarounds need to be able to be applied ahead of any GT
initialization for example 15015404425. This patch creates XE_DEVICE_WA
macro, in the same vein as XE_WA. This macro can be used ahead of GT
initialization, and can be tracked in sysfs. This should alleviate some
of the complexities that exist in i915.

v2: name change SoC to Device, address style issues
v5: split into separate patch from RTP changes, put oob within a struct,
move the initiation of oob workarounds into xe_device_probe_early(),
clean up the comments around XE_WA.

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com>
Link: https://lore.kernel.org/r/20250709221605.172516-5-matthew.s.atwood@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-10 15:36:30 -07:00
Matt Atwood
e7201d98ca drm/xe: add new type to RTP context
Prepare the RTP context to be used before GT init. Add the xe device as
a type, put WARN_ONs to protect existing RTP_MATCHes.

v5: split out into separate patch, change definition order
v6: catch missing cases for checking gt init

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com>
Link: https://lore.kernel.org/r/20250709221605.172516-4-matthew.s.atwood@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-10 15:36:30 -07:00
Matt Atwood
f037e0b78e drm/xe: add xe_device_wa infrastructure
There are some workarounds that must be appplied before gt init,
wa_15015404425 for example. Instead of sprinking them conditionally
throughout the driver as we did for i915 generate an oob.rules file
reusing the RTP infrastructure to make these easier to track.

v2: rename xe_soc_wa to xe_device_wa
v5: derive prefix from argument rather than hard coding the values.
v6: split out xe_gen-wa_oob changes

Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250709221605.172516-3-matthew.s.atwood@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-10 15:36:30 -07:00
Matt Atwood
b0a2ee5567 drm/xe: prepare xe_gen_wa_oob to be multi-use
There is a need for additional oob rules files. Make the current gen
file more robust to support more files.

Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250709221605.172516-2-matthew.s.atwood@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-10 15:36:30 -07:00
Michal Wajdeczko
94de94d24e drm/xe/guc: Cancel ongoing H2G requests when stopping CT
Once we have started a GT reset sequence, which includes stopping
GuC CTB communication, we should also cancel all ongoing H2G send-
recv requests, as either GuC is already dead, or due to imminent
reset GuC will not be able to reply, or due to internal cleanup
we will lose pending fences. With this we will report dedicated
-ECANCELED error instead of misleading -ETIME.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Acked-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250709174038.1876-4-michal.wajdeczko@intel.com
2025-07-10 21:46:29 +02:00
Michal Wajdeczko
4ecdcf9caf drm/xe/guc: Move state change logger to helper
In the state change helper we are already doing extra stuff,
move debug state logger there to cover all state changes.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Acked-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250709174038.1876-3-michal.wajdeczko@intel.com
2025-07-10 21:39:25 +02:00
Michal Wajdeczko
1b822b7f56 drm/xe/guc: Rename CT state change helper
In this helper we are already doing much more than just setting
a new CT state and its name was little misleading. Rename it.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Acked-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250709174038.1876-2-michal.wajdeczko@intel.com
2025-07-10 21:38:43 +02:00
Shuicheng Lin
0efec05001 drm/xe/pm: Correct comment of xe_pm_set_vram_threshold()
The parameter threshold is with size in MiB, not in bits.
Correct it to avoid any confusion.

v2: s/mb/MiB, s/vram/VRAM, fix return section. (Michal)

Fixes: 30c399529f ("drm/xe: Document Xe PM component")
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Link: https://lore.kernel.org/r/20250708021450.3602087-2-shuicheng.lin@intel.com
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-07-10 14:51:04 -04:00
Michal Wajdeczko
1d2e2503e5 drm/xe/bmg: Don't use WA 16023588340 and 22019338487 on VF
These workarounds are not applicable for use by the VFs.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Tested-by: Jakub Kolakowski <jakub1.kolakowski@intel.com>
Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Signed-off-by: Jakub Kolakowski <jakub1.kolakowski@intel.com>
Link: https://lore.kernel.org/r/20250710103040.375610-2-jakub1.kolakowski@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-10 09:07:23 -07:00
Juston Li
ce3d39fae3 drm/xe/bo: add GPU memory trace points
Add TRACE_GPU_MEM tracepoints for tracking global GPU memory usage.

These are required by VSR on Android 12+ for reporting GPU driver memory
allocations.

v5:
 - Drop process_mem tracking
 - Set the gpu_id field to dev->primary->index (Lucas, Tvrtko)
 - Formatting cleanup under 80 columns

v3:
 - Use now configurable CONFIG_TRACE_GPU_MEM instead of adding a
   per-driver Kconfig (Lucas)

v2:
 - Use u64 as preferred by checkpatch (Tvrtko)
 - Fix errors in comments/Kconfig description (Tvrtko)
 - drop redundant "CONFIG" in Kconfig

Signed-off-by: Juston Li <justonli@chromium.org>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250709192313.479336-2-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-10 07:55:21 -07:00
Riana Tauro
f5c5d29522 drm/xe/xe_i2c: Add support for i2c in survivability mode
Initialize i2c in survivability mode to allow firmware
update of Add-In Management Controller (AMC) in
survivability mode.

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://lore.kernel.org/r/20250701122252.2590230-6-heikki.krogerus@linux.intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-07-10 10:19:41 -04:00
Raag Jadav
0ea07b6951 drm/xe/pm: Wire up suspend/resume for I2C controller
Wire up suspend/resume handles for I2C controller to match its power
state with SGUnit.

Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Karthik Poosa <karthik.poosa@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://lore.kernel.org/r/20250701122252.2590230-5-heikki.krogerus@linux.intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-07-10 10:19:41 -04:00
Heikki Krogerus
f0e53aadd7 drm/xe: Support for I2C attached MCUs
Adding adaption/glue layer where the I2C host adapter
(Synopsys DesignWare I2C adapter) and the I2C clients (the
microcontroller units) are enumerated.

The microcontroller units (MCU) that are attached to the GPU
depend on the OEM. The initially supported MCU will be the
Add-In Management Controller (AMC).

Co-developed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://lore.kernel.org/r/20250701122252.2590230-4-heikki.krogerus@linux.intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Rodrigo fixed the co-developed tags and SPDX format in the .c file]
2025-07-10 10:19:41 -04:00
Heikki Krogerus
f6a8e9f3de i2c: designware: Add quirk for Intel Xe
The regmap is coming from the parent also in case of Xe
GPUs. Reusing the Wangxun quirk for that.

Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Co-developed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
Link: https://lore.kernel.org/r/20250701122252.2590230-3-heikki.krogerus@linux.intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Rodrigo fixed the co-developed tags while merging]
2025-07-10 10:19:24 -04:00
Heikki Krogerus
22290cc904 i2c: designware: Use polling by default when there is no irq resource
The irq resource itself can be used as a generic way to
determine when polling is needed.

This not only removes the need for special additional device
properties that would soon be needed when the platform may
or may not have the irq, but it also removes the need to
check the platform in the first place in order to determine
is polling needed or not.

Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
Link: https://lore.kernel.org/r/20250701122252.2590230-2-heikki.krogerus@linux.intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-07-10 10:18:47 -04:00
Michal Wajdeczko
621a422079 drm/xe/guc: Don't allocate temporary policies object
Since we are already using reusable buffer objects from the GuC
buffer cache, we can directly write into their CPU pointers and
spare unnecessary temporary allocation.

While around, also make sure to clear obtained buffer, to avoid
sending some stale data.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250702142504.1656-1-michal.wajdeczko@intel.com
2025-07-10 16:00:13 +02:00
Michal Wajdeczko
1fbe023d30 drm/xe/pf: Print configuration KLVs using debug printer
While we print VF's configuration KLVs only under DEBUG_SRIOV
config, we should be doing it at debug level, not info level.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lukasz Laguna <lukasz.laguna@intel.com>
Reviewed-by: Lukasz Laguna <lukasz.laguna@intel.com>
Link: https://lore.kernel.org/r/20250703145709.1832-1-michal.wajdeczko@intel.com
2025-07-10 14:26:13 +02:00
Michal Wajdeczko
89cd027c94 drm/xe/pf: Print runtime registers using debug printer
While we already print VF's runtime registers only under DEBUG_SRIOV
config, we should be still doing it at debug level, not info.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Lukasz Laguna <lukasz.laguna@intel.com>
Link: https://lore.kernel.org/r/20250604190021.725-2-michal.wajdeczko@intel.com
2025-07-10 14:26:11 +02:00
Raag Jadav
cdc36b66cd drm/xe: Expose fan control and voltage regulator version
Add sysfs attributes for late binding features which expose bound version
to the user.

v2: Rework attribute and macro naming (Badal)
v3: Drop fancy formatting (Rodrigo)
v4: Form version string using local variables (Rodrigo)

Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250709164224.2676086-1-raag.jadav@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-07-09 18:25:22 -04:00
Shuicheng Lin
017ef1228d drm/xe: Release runtime pm for error path of xe_devcoredump_read()
xe_pm_runtime_put() is missed to be called for the error path in
xe_devcoredump_read().
Add function description comments for xe_devcoredump_read() to help
understand it.

v2: more detail function comments and refine goto logic (Matt)

Fixes: c4a2e5f865 ("drm/xe: Add devcoredump chunking")
Cc: stable@vger.kernel.org
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250707004911.3502904-6-shuicheng.lin@intel.com
2025-07-08 15:11:37 -07:00
Shuicheng Lin
8ce560d8e1 drm/xe: Remove unused code in devcoredump_snapshot()
The deleted code is no longer needed because patch "drm/xe/guc: Plumb
GuC-capture into dev coredump" has removed the related usage code.
Remove the code to tidy up the function.

v2: s/bacause/because

Reviewed-by: Zhanjun Dong <zhanjun.dong@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250707004911.3502904-5-shuicheng.lin@intel.com
2025-07-08 15:11:36 -07:00
Zhanjun Dong
b2c4ac219f drm/xe/uc: Disable GuC communication on hardware initialization error
Disable GuC communication on Xe micro controller hardware initialization
error.

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4917
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250707231108.3217573-1-zhanjun.dong@intel.com
2025-07-08 14:56:49 -07:00
Shuicheng Lin
83dcee1785 drm/xe/pm: Restore display pm if there is error after display suspend
xe_bo_evict_all() is called after xe_display_pm_suspend(). So if there
is error with xe_bo_evict_all(), display pm should be restored.

Fixes: 51462211f4 ("drm/xe/pxp: add PXP PM support")
Fixes: cb8f81c175 ("drm/xe/display: Make display suspend/resume work on discrete")
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://lore.kernel.org/r/20250708035424.3608190-2-shuicheng.lin@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-07-08 16:05:07 -04:00
Matt Atwood
94de1dfd47 drm/xe/ptl: Drop force_probe requirement
Panther Lake has proven to be stable through testing and use.

Remove the force_probe requirement and enable the platform by default.

Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com>
Link: https://lore.kernel.org/r/20250707201959.319406-1-matthew.s.atwood@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-07-08 09:55:35 -04:00
Daniele Ceraolo Spurio
4c93e2c341 drm/xe/ptl: Add HuC FW definition for PTL
Add the unversioned define for the PTL HuC FW.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250626182805.1701096-15-daniele.ceraolospurio@intel.com
2025-07-07 10:37:12 -07:00
Daniele Ceraolo Spurio
5cdb71d3b0 drm/xe/ptl: Add GuC FW definition for PTL
The first official GuC relase for PTL is 70.47.0, which maps to
API version 1.22.4.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250626182805.1701096-14-daniele.ceraolospurio@intel.com
2025-07-07 10:37:12 -07:00
Julia Filipchuk
0b64addcae drm/xe/guc: Recommend GuC v70.46.2 for BMG, LNL, DG2
UAPI compatibility version 1.22.2

Resolves various bugs. Recommend newer version.

Signed-off-by: Julia Filipchuk <julia.filipchuk@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250626182805.1701096-13-daniele.ceraolospurio@intel.com
2025-07-07 10:37:01 -07:00
Vodapalli, Ravi Kumar
ccfb15b815 drm/xe/bmg: Add one additional PCI ID
One additional PCI ID is added in Bspec for BMG, Add it so that
driver recognizes this device with this new ID.

Bspec: 68090
Cc: stable@vger.kernel.org # v6.12+
Signed-off-by: Vodapalli, Ravi Kumar <ravi.kumar.vodapalli@intel.com>
Reviewed-by: Shekhar Chauhan <shekhar.chauhan@intel.com>
Acked-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250704103527.100178-1-ravi.kumar.vodapalli@intel.com
2025-07-04 14:55:51 +01:00
Matthew Auld
f7a2fd776e drm/xe/bmg: fix compressed VRAM handling
There looks to be an issue in our compression handling when the BO pages
are very fragmented, where we choose to skip the identity map and
instead fall back to emitting the PTEs by hand when migrating memory,
such that we can hopefully do more work per blit operation. However in
such a case we need to ensure the src PTEs are correctly tagged with a
compression enabled PAT index on dgpu xe2+, otherwise the copy will
simply treat the src memory as uncompressed, leading to corruption if
the memory was compressed by the user.

To fix this pass along use_comp_pat into emit_pte() on the src side, to
indicate that compression should be considered.

v2 (Jonathan): tweak the commit message

Fixes: 523f191cc0 ("drm/xe/xe_migrate: Handle migration logic for xe2+ dgfx")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Akshata Jahagirdar <akshata.jahagirdar@intel.com>
Cc: <stable@vger.kernel.org> # v6.12+
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://lore.kernel.org/r/20250701103949.83116-2-matthew.auld@intel.com
2025-07-04 12:31:06 +01:00
Matthew Brost
03d85ab36b Revert "drm/xe/xe2: Enable Indirect Ring State support for Xe2"
This reverts commit fe0154cf82.

Seeing some unexplained random failures during LRC context switches with
indirect ring state enabled. The failures were always there, but the
repro rate increased with the addition of WA BB as a separate BO.
Commit 3a1edef8f4 ("drm/xe: Make WA BB part of LRC BO") helped to
reduce the issues in the context switches, but didn't eliminate them
completely.

Indirect ring state is not required for any current features, so disable
for now until failures can be root caused.

Cc: stable@vger.kernel.org
Fixes: fe0154cf82 ("drm/xe/xe2: Enable Indirect Ring State support for Xe2")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250702035846.3178344-1-matthew.brost@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-03 12:23:45 -07:00
Tomasz Lis
7eba6a80fe drm/xe/vf: Make multi-GT migration less error prone
There is a remote chance that after migration,  some GTs will not
send the MIGRATED interrupt, or due to current VF KMD state the
interrupt will not lead to marking the GT for recovery.

Requiring IRQs from all GTs before starting migration introduces
the possibility that the process will get stalled due to one GuC.

One could argue it is also waste of time to wait for all IRQs,
but we should get them all IRQs as soon as VGPU starts, so that's
not really an impactful argument.

Still, not waiting for all GTs makes it easier to handle situations:
* where one GuC IRQ is missing
* where state before probe is unclean - getting MIGRATED IRQ as soon
  as interrupts are enabled
* where multiple migrations happen close to each other

To help with these cases, this patch alters the post-migration
recovery so that recovery task is started as soon as one GuC IRQ
is handled, and other GTs are included in recovery later as the
subsequent IRQs are serviced.

The post-migration recovery can now be called for any selection of
GTs, and it will perform recovery on all GTs for which IRQs have
arrived, even multiple times if necessary.

v2: Typos and style fixes
v3: Transferring gt_flags by value rather than reference to last
 function where it is used

Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Acked-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Reviewed-by: Michal Winiarski <michal.winiarski@intel.com>
Link: https://lore.kernel.org/r/20250630152155.195648-1-tomasz.lis@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
2025-07-03 21:05:21 +02:00
Matthew Brost
491b978312 drm/xe: Allocate PF queue size on pow2 boundary
CIRC_SPACE does not work unless the size argument is a power of 2,
allocate PF queue size on power of 2 boundary.

Cc: stable@vger.kernel.org
Fixes: 3338e4f90c ("drm/xe: Use topology to determine page fault queue size")
Fixes: 29582e0ea7 ("drm/xe: Add page queue multiplier")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Link: https://lore.kernel.org/r/20250702213511.3226167-1-matthew.brost@intel.com
2025-07-03 10:19:35 -07:00
Michal Wajdeczko
3fae6918a3 drm/xe/pf: Clear all LMTT pages on alloc
Our LMEM buffer objects are not cleared by default on alloc
and during VF provisioning we only setup LMTT PTEs for the
actually provisioned LMEM range. But beyond that valid range
we might leave some stale data that could either point to some
other VFs allocations or even to the PF pages.

Explicitly clear all new LMTT page to avoid the risk that a
malicious VF would try to exploit that gap.

While around add asserts to catch any undesired PTE overwrites
and low-level debug traces to track LMTT PT life-cycle.

Fixes: b1d2040582 ("drm/xe/pf: Introduce Local Memory Translation Table")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Lukasz Laguna <lukasz.laguna@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20250701220052.1612-1-michal.wajdeczko@intel.com
2025-07-03 16:38:45 +02:00
Riana Tauro
b9329f5167 drm/xe/xe_pmu: Validate gt in event supported
Validate gt instead of checking gt_id is lesser
than max gts per tile

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20250630093741.2435281-1-riana.tauro@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-07-02 16:09:11 -07:00
Matt Roper
d4eb4a0102 drm/xe/xe_query: Use separate iterator while filling GT list
The 'id' value updated by for_each_gt() is the uapi GT ID of the GTs
being iterated over, and may skip over values if a GT is not present on
the device.  Use a separate iterator for GT list array assignments to
ensure that the array will be filled properly on future platforms where
index in the GT query list may not match the uapi ID.

v2:
 - Include the missing increment of the iterator.  (Jonathan)

Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://lore.kernel.org/r/20250701201320.2514369-16-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-07-02 16:08:54 -07:00
Matt Roper
457123d5a0 drm/xe: Don't compare GT ID to GT count when determining valid GTs
On current platforms with multiple GTs, all of the GT IDs are
consecutive; as a result we know that the GT IDs range from 0 to
gt_count-1 and can determine if a GT ID is valid by comparing against
the count.  The consecutive nature of GT IDs may not hold true on future
platforms if/when we have platforms that are both multi-tile and have
multiple GTs within each tile.  Once such platforms exist, it's quite
possible that we could wind up with something like a GT list composed of
IDs 0, 2, and 3 with no GT 1 (which would be a 2-tile platform with
media only on the second tile).

To future-proof the code we should stop comparing against the GT count
to determine whether a GT ID is valid or not.  Instead we should do an
actual lookup of the ID to determine whether the GT exists.  This also
means that our GT loop macro should not end at the GT count, but should
rather examine the entire space up to (# of tiles) * (max GT per tile)
to ensure it doesn't stop prematurely.

Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://lore.kernel.org/r/20250701201320.2514369-15-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-07-02 16:08:54 -07:00