Commit Graph

1382327 Commits

Author SHA1 Message Date
Lucas De Marchi
c9dfd66cb9 drm/xe/lrc: Allow INDIRECT_CTX for more engine classes
Currently it's only allowed for render and compute. Going forward we
want to enable it for more engine classes. Let the XE_LRC_FLAG_INDIRECT_CTX
flag (and thus gt_engine_needs_indirect_ctx()) be the deciding factor
for its availability.

While at it, add the missing const to rcs_funcs array. Since
CTX_INDIRECT_CTX_OFFSET_DEFAULT already matches the HW default and
gt_engine_needs_indirect_ctx() only ever enables it for rcs/ccs, there
is no change in behavior, it's only preparation for future use case.

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250916-wa-bb-cmds-v5-5-306bddbc15da@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-18 14:20:39 -07:00
Lucas De Marchi
39ac06f700 drm/xe/configfs: Add post context restore bb
Allow the user to specify commands to execute during a context restore.
Currently it's possible to parse 2 types of actions:

	- cmd: the instructions are added as is to the bb
	- reg: just use the address and value, without worrying about
	  encoding the right LRI instruction. This is possibly the most
	  useful use case, so added a dedicated action for that.

This also prepares for future BBs: mid context restore and rc6 context
restore that can re-use the same parsing functions.

Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://lore.kernel.org/r/20250916-wa-bb-cmds-v5-4-306bddbc15da@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-18 14:20:39 -07:00
Lucas De Marchi
6c6988c5e0 drm/xe/lrc: Allow to add user commands on context switch
During validation it's useful to allows additional commands to be
executed on context switch. Fetch the commands from configfs (to be
added) and add them to the WA BB.

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250916-wa-bb-cmds-v5-3-306bddbc15da@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-18 14:20:39 -07:00
Lucas De Marchi
e2a9854d80 drm/xe/configfs: Allow to select by class only
For a future configfs attribute, it's desirable to select by engine mask
only as the instance doesn't make sense.

Rename the function lookup_engine_mask() to lookup_engine_info() and
make it return the entry. This allows parse_engine() to still return an
item if the caller wants to allow parsing a class-only string like
"rcs", "bcs", "ccs", etc.

Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://lore.kernel.org/r/20250916-wa-bb-cmds-v5-2-306bddbc15da@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-18 14:20:39 -07:00
Lucas De Marchi
7166cc3a6a drm/xe/configfs: Extract function to parse engine
Move the part that copies the engine to a local buffer so it can be
shared in future for other configfs attributes parsing an engine.

Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://lore.kernel.org/r/20250916-wa-bb-cmds-v5-1-306bddbc15da@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-18 14:20:38 -07:00
Badal Nilawar
efa29317a5 drm/xe/xe_late_bind_fw: Extract and print version info
Extract and print version info of the late binding binary.

v2: Some refinements (Daniele)

Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250905154953.3974335-10-badal.nilawar@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-18 09:32:01 -07:00
Badal Nilawar
67de7982d5 drm/xe/xe_late_bind_fw: Introduce debug fs node to disable late binding
Introduce a debug filesystem node to disable late binding fw reload
during the system or runtime resume. This is intended for situations
where the late binding fw needs to be loaded from user mode,
perticularly for validation purpose.
Note that xe kmd doesn't participate in late binding flow from user
space. Binary loaded from the userspace will be lost upon entering to
D3 cold hence user space app need to handle this situation.

v2:
  - s/(uval == 1) ? true : false/!!uval/ (Daniele)
v3:
  - Refine the commit message (Daniele)

Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250905154953.3974335-9-badal.nilawar@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-18 09:32:01 -07:00
Badal Nilawar
02f52f6d92 drm/xe/xe_late_bind_fw: Reload late binding fw during system resume
Reload late binding fw during resume from system suspend

v2:
  - Unconditionally reload late binding fw (Rodrigo)
  - Flush worker during system suspend

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250905154953.3974335-8-badal.nilawar@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-18 09:32:01 -07:00
Badal Nilawar
69ac1bb8fc drm/xe/xe_late_bind_fw: Reload late binding fw in rpm resume
Reload late binding fw during runtime resume.

Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250905154953.3974335-7-badal.nilawar@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-18 09:32:01 -07:00
Badal Nilawar
691a54ad94 drm/xe/xe_late_bind_fw: Load late binding firmware
Load late binding firmware

v2:
 - s/EAGAIN/EBUSY/
 - Flush worker in suspend and driver unload (Daniele)
v3:
 - Use retry interval of 6s, in steps of 200ms, to allow
   other OS components release MEI CL handle (Sasha)
v4:
 - return -ENODEV if component not added (Daniele)
 - parse and print status returned by csc
v5:
 - Use payload to check firmware valid (Daniele)
 - Obtain the RPM reference before scheduling the worker to
   ensure the device remains awake until the worker completes
   firmware loading (Rodrigo)
v6:
 - In case of error donot re-attempt fw download (Daniele)
v7 (Rodrigo):
 - Rename of mei structs and callback.

Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250905154953.3974335-6-badal.nilawar@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-18 09:32:01 -07:00
Badal Nilawar
45832bf9c1 drm/xe/xe_late_bind_fw: Initialize late binding firmware
Search for late binding firmware binaries and populate the meta data of
firmware structures.

v2 (Daniele):
 - drm_err if firmware size is more than max pay load size
 - s/request_firmware/firmware_request_nowarn/ as firmware will
   not be available for all possible cards
v3 (Daniele):
 - init firmware from within xe_late_bind_init, propagate error
 - switch late_bind_fw to array to handle multiple firmware types
v4 (Daniele):
 - Alloc payload dynamically, fix nits
v6 (Daniele)
 - %s/MAX_PAYLOAD_SIZE/XE_LB_MAX_PAYLOAD_SIZE/

Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250905154953.3974335-5-badal.nilawar@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-18 09:32:01 -07:00
Badal Nilawar
918bd789d6 drm/xe/xe_late_bind_fw: Introduce xe_late_bind_fw
Introduce xe_late_bind_fw to enable firmware loading for the devices,
such as the fan controller, during the driver probe. Typically,
firmware for such devices are part of IFWI flash image but can be
replaced at probe after OEM tuning.
This patch binds mei late binding component to enable firmware loading.

v2:
 - Add devm_add_action_or_reset to remove the component (Daniele)
 - Add INTEL_MEI_GSC check in xe_late_bind_init() (Daniele)
v3:
 - Fail driver probe if late bind initialization fails,
   add has_late_bind flag (Daniele)
v4:
 - %s/I915_COMPONENT_LATE_BIND/INTEL_COMPONENT_LATE_BIND/
v6:
 - rebased
v7:
 - rebased
 - In xe_late_bind_init, use drm_err when returning an error to
   stop the probe (Lucas)
 - Use imperative mode in commit message (Lucas)

Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250905154953.3974335-4-badal.nilawar@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-18 09:32:00 -07:00
Alexander Usyskin
741eeabb7c mei: late_bind: add late binding component driver
Introduce a new MEI client driver to support Late Binding firmware
upload/update for Intel discrete graphics platforms.

Late Binding is a runtime firmware upload/update mechanism that allows
payloads, such as fan control and voltage regulator, to be securely
delivered and applied without requiring SPI flash updates or
system reboots. This driver enables the Xe graphics driver and other
user-space tools to push such firmware blobs to the authentication
firmware via the MEI interface.

The driver handles authentication, versioning, and communication
with the authentication firmware, which in turn coordinates with
the PUnit/PCODE to apply the payload.

This is a foundational component for enabling dynamic, secure,
and re-entrant configuration updates on platforms like Battlemage.

Cc: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20250905154953.3974335-3-badal.nilawar@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-18 09:32:00 -07:00
Alexander Usyskin
8d5b7009aa mei: bus: add mei_cldev_mtu interface
Add a new helper function that allows MEI client drivers
to query the maximum transmission unit (MTU) for a connected
MEI client.

This is useful for clients that need to transmit large payloads,
such as firmware blobs, allowing them to determine the maximum
message size that can be safely sent before starting transmission and
size of the buffer to allocate when receiving data.

Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20250905154953.3974335-2-badal.nilawar@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-18 09:31:50 -07:00
Thomas Hellström
187e16f69d drm/xe: Work around clang multiple goto-label error
When using drm_exec_retry_on_contention(), clang may consider
all labels for which we take addresses in a function as
potential retry goto targets, although strictly only one
is possible. It will then in some situations generate false
positive errors.

In this case, the compiler, for some architectures, consider the

might_lock(&m->job_mutex);

as a potential goto target from drm_exec_retry_on_contention(),
and errors.

Work around that by moving the xe_validate / drm_exec
transaction to a separate function.

v2:
- New commit message based on analysis of Nathan Chancellor

Fixes: 59eabff2a3 ("drm/xe: Convert xe_bo_create_pin_map() for exhaustive eviction")
Cc: Matthew Brost <matthew.brost@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202509101853.nDmyxTEM-lkp@intel.com/
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org> # build
Link: https://lore.kernel.org/r/20250911080324.180307-1-thomas.hellstrom@linux.intel.com
2025-09-18 08:27:00 +02:00
Michal Wajdeczko
fb3c27a69c drm/xe/sysfs: Simplify sysfs registration
Instead of manually maintaining each sysfs file define and use
attribute groups and register them using device managed function.
Then use is_visible() to filter-out unsupported attributes.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250916170029.3313-3-michal.wajdeczko@intel.com
2025-09-17 21:59:31 +02:00
Michal Wajdeczko
a2d6223d22 drm/xe/vf: Don't expose sysfs attributes not applicable for VFs
VFs can't read BMG_PCIE_CAP(0x138340) register nor access PCODE
(already guarded by the info.skip_pcode flag) so we shouldn't
expose attributes that require any of them to avoid errors like:

 [] xe 0000:03:00.1: [drm] Tile0: GT0: VF is trying to read an \
                     inaccessible register 0x138340+0x0
 [] RIP: 0010:xe_gt_sriov_vf_read32+0x6c2/0x9a0 [xe]
 [] Call Trace:
 []  xe_mmio_read32+0x110/0x280 [xe]
 []  auto_link_downgrade_capable_show+0x2e/0x70 [xe]
 []  dev_attr_show+0x1a/0x70
 []  sysfs_kf_seq_show+0xaa/0x120
 []  kernfs_seq_show+0x41/0x60

Fixes: 0e414bf7ad ("drm/xe: Expose PCIe link downgrade attributes")
Fixes: cdc36b66cd ("drm/xe: Expose fan control and voltage regulator version")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Lukasz Laguna <lukasz.laguna@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250916170029.3313-2-michal.wajdeczko@intel.com
2025-09-17 21:59:29 +02:00
Shuicheng Lin
33fe111a35 drm/xe/madvise: Fix ioctl argument check
It is "preferred_mem_loc" instead of "atomic" for the ATTR_PREFERRED_LOC
path.

Also include 2 minor changes with no functional impact.
1. Remove the redundant "attr.atomic_access" assignment.
2. Replace down_read_interruptible() with
   xe_svm_notifier_lock_interruptible() to pair with
   xe_svm_notifier_unlock().

Fixes: ada7486c56 ("drm/xe: Implement madvise ioctl for xe")
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://lore.kernel.org/r/20250911173139.1405878-2-shuicheng.lin@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-09-17 10:09:34 -07:00
Shuicheng Lin
5959c4da17 drm/xe: Misc refine for svm
These changes should have no functional impact.
1. Correct typo of "operation"in macro range_debug().
2. Combine 2 spin_lock() call in xe_svm_garbage_collector() into 1.
3. Drop redundant preferred_region_is_vram check in
   xe_svm_range_needs_migrate_to_vram().
4. Combine the devmem_possible check in xe_svm_handle_pagefault().
   need_vram includes the IS_DGFX() check, so there is no change for
   .devmem_only.

v2: revert !ctx.devmem_only change (Matt)
v3: rebase code and refine commit message.
v4: rebase code and refine commit message.

Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20250911031405.1371812-2-shuicheng.lin@intel.com
2025-09-17 09:50:44 -07:00
Michal Wajdeczko
5bb5258e35 drm/xe/tests: Add pre-GMDID IP descriptors to param generators
Recently introduced kunit parameter generators were based on
the existing arrays which have only GDMID-based IPs and didn't
take into account IP definitions from pre-GMDID era.

Add test only arrays with pre-GMDID IPs (as those will not change)
and extend param generators to start iterating over them.

 [ ] =================== xe_pci (2 subtests) ====================
 [ ] ==================== check_graphics_ip  ====================
 [ ] [PASSED] 12.00 Xe_LP
 [ ] [PASSED] 12.10 Xe_LP+
 [ ] [PASSED] 12.55 Xe_HPG
 [ ] [PASSED] 12.60 Xe_HPC
 [ ] [PASSED] 12.70 Xe_LPG
 [ ] [PASSED] 12.71 Xe_LPG
 [ ] [PASSED] 12.74 Xe_LPG+
 [ ] [PASSED] 20.01 Xe2_HPG
 [ ] [PASSED] 20.02 Xe2_HPG
 [ ] [PASSED] 20.04 Xe2_LPG
 [ ] [PASSED] 30.00 Xe3_LPG
 [ ] [PASSED] 30.01 Xe3_LPG
 [ ] [PASSED] 30.03 Xe3_LPG
 [ ] ================ [PASSED] check_graphics_ip ================
 [ ] ===================== check_media_ip  ======================
 [ ] [PASSED] 12.00 Xe_M
 [ ] [PASSED] 12.55 Xe_HPM
 [ ] [PASSED] 13.00 Xe_LPM+
 [ ] [PASSED] 13.01 Xe2_HPM
 [ ] [PASSED] 20.00 Xe2_LPM
 [ ] [PASSED] 30.00 Xe3_LPM
 [ ] [PASSED] 30.02 Xe3_LPM
 [ ] ================= [PASSED] check_media_ip ==================
 [ ] ===================== [PASSED] xe_pci ======================

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250916171645.3335-1-michal.wajdeczko@intel.com
2025-09-17 15:29:13 +02:00
Daniele Ceraolo Spurio
aaae483657 drm/xe: Allow error injection for xe_pxp_exec_queue_add
This will allow us to simulate this function returning an error like
we do for other functions called in the exec_queue_create path.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://lore.kernel.org/r/20250909221240.3711023-4-daniele.ceraolospurio@intel.com
2025-09-16 15:54:31 -07:00
Daniele Ceraolo Spurio
626667321d drm/xe: Fix error handling if PXP fails to start
Since the PXP start comes after __xe_exec_queue_init() has completed,
we need to cleanup what was done in that function in case of a PXP
start error.
__xe_exec_queue_init calls the submission backend init() function,
so we need to introduce an opposite for that. Unfortunately, while
we already have a fini() function pointer, it performs other
operations in addition to cleaning up what was done by the init().
Therefore, for clarity, the existing fini() has been renamed to
destroy(), while a new fini() has been added to only clean up what was
done by the init(), with the latter being called by the former (via
xe_exec_queue_fini).

Fixes: 72d479601d ("drm/xe/pxp/uapi: Add userspace and LRC support for PXP-using queues")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://lore.kernel.org/r/20250909221240.3711023-3-daniele.ceraolospurio@intel.com
2025-09-16 15:54:28 -07:00
Yang Li
9e6eb49ec1 drm/xe: Remove duplicate header files
Fix some duplicate includes in xe:
./drivers/gpu/drm/xe/xe_tlb_inval.c: xe_tlb_inval.h is included more than once.
./drivers/gpu/drm/xe/xe_pt.c: xe_tlb_inval_job.h is included more than once.

While at it, also sort the include lines alphabetically.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=24705
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=24706
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
[Reword commit message]
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250916021039.1632766-1-yang.lee@linux.alibaba.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-16 12:47:40 -07:00
John Harrison
3b09b11805 drm/xe/guc: Return an error code if the GuC load fails
Due to multiple explosion issues in the early days of the Xe driver,
the GuC load was hacked to never return a failure. That prevented
kernel panics and such initially, but now all it achieves is creating
more confusing errors when the driver tries to submit commands to a
GuC it already knows is not there. So fix that up.

As a stop-gap and to help with debug of load failures due to invalid
GuC init params, a wedge call had been added to the inner GuC load
function. The reason being that it leaves the GuC log accessible via
debugfs. However, for an end user, simply aborting the module load is
much cleaner than wedging and trying to continue. The wedge blocks
user submissions but it seems that various bits of the driver itself
still try to submit to a dead GuC and lots of subsequent errors occur.
And with regards to developers debugging why their particular code
change is being rejected by the GuC, it is trivial to either add the
wedge back in and hack the return code to zero again or to just do a
GuC log dump to dmesg.

v2: Add support for error injection testing and drop the now redundant
wedge call.

CC: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com>
Link: https://lore.kernel.org/r/20250909224132.536320-1-John.C.Harrison@Intel.com
2025-09-16 12:11:08 -07:00
Zongyao Bai
1a869168d9 drm/xe/sysfs: Add cleanup action in xe_device_sysfs_init
On partial failure, some sysfs files created before the failure might
not be removed. Add common cleanup step to remove them all immediately,
as is should be harmless to attempt to remove non-existing files.

Fixes: 0e414bf7ad ("drm/xe: Expose PCIe link downgrade attributes")
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Shuicheng Lin <shuicheng.lin@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Zongyao Bai <zongyao.bai@intel.com>
Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250915214716.1327379-2-zongyao.bai@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-16 07:59:36 -07:00
John Harrison
456b32c9c1 drm/xe/guc: Add test for G2G communications
Add a test for sending messages from every GuC to every other GuC to
test G2G communications.

Note that, being a debug only feature, the test interface only exists
in pre-production builds of the GuC firmware.

v2: Fix 'default' case to actually use the driver's registration code
as well as allocation. Add comments explaining the different test
types. Fix (C) date and an assert. Review feedback from Daniele.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://lore.kernel.org/r/20250910210237.603576-5-John.C.Harrison@Intel.com
2025-09-15 09:53:26 -07:00
John Harrison
537773db91 drm/xe: Allow freeing of a managed bo
If a bo is created via xe_managed_bo_create_pin_map() then it cannot be
freed by the driver using xe_bo_unpin_map_no_vm(), or indeed any other
existing function. The DRM layer will still have a pointer stashed
away for later freeing, causing a invalid memory access on driver
unload. So add a helper for releasing the DRM action as well.

v2: Drop 'xe' parameter (review feedbak from Michal W)

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://lore.kernel.org/r/20250910210237.603576-4-John.C.Harrison@Intel.com
2025-09-15 09:53:25 -07:00
John Harrison
acf01c79f0 drm/xe/guc: Add firmware build type to available info
Some test features are not available in production builds of the GuC
firmware. So add the build type field to the available information
that tests can inspect to decide if they should skip or run.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://lore.kernel.org/r/20250910210237.603576-3-John.C.Harrison@Intel.com
2025-09-15 09:53:23 -07:00
John Harrison
7d0ca56e91 drm/xe/guc: Update CSS header structures
Rework the CSS header structure according to recent updates to the GuC
API spec. Also include more field definitions.

v2: Also pass the new GuC specific structure to a GuC specific
function instead of the higher level, generic structure (review
feedback from Daniele).
Also correct naming of CSS_TIME_* fields.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://lore.kernel.org/r/20250910210237.603576-2-John.C.Harrison@Intel.com
2025-09-15 09:53:21 -07:00
Fushuai Wang
84afb84bcc drm/xe: Use ERR_CAST instead of ERR_PTR(PTR_ERR(...))
Use ERR_CAST inline function instead of ERR_PTR(PTR_ERR(...)).

Signed-off-by: Fushuai Wang <wangfushuai@baidu.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250914101630.17719-1-wangfushuai@baidu.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-15 08:32:51 -07:00
Lucas De Marchi
19baa830fb drm/xe: Use ARRAY_SIZE in guc_waklv_init()
Prefer using ARRAY_SIZE where needed and just passing 1 instead of
calculating the size of one element.

Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202508130158.eogeBZQT-lkp@intel.com/
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://lore.kernel.org/r/20250912-guc-ads-array-size-v1-1-a6555392a1f8@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-15 07:50:53 -07:00
Dan Carpenter
75cc23ffe5 drm/xe: Fix a NULL vs IS_ERR() in xe_vm_add_compute_exec_queue()
The xe_preempt_fence_create() function returns error pointers.  It
never returns NULL.  Update the error checking to match.

Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/aJTMBdX97cof_009@stanley.mountain
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-09-15 08:33:19 -04:00
Nitin Gote
d4c3ed963e drm/xe: defer free of NVM auxiliary container to device release callback
Do not kfree the intel_dg_nvm_dev in xe_nvm_fini() right after
auxiliary_device_delete/uninit. The auxiliary_device embeds the
device/kobject (and its name); freeing it too early can race
with asynchronous device_del/udev processing and cause a use-after-free.

Signed-off-by: Nitin Gote <nitin.r.gote@intel.com>
Fixes: c28bfb107d ("drm/xe/nvm: add on-die non-volatile memory device")
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250911052823.226696-1-nitin.r.gote@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-12 12:02:23 -07:00
Lucas De Marchi
2ec2945625 drm/xe/configfs: Fix documentation warning
Fix this warning while building the documentation:

	Documentation/gpu/xe/xe_configfs:9: drivers/gpu/drm/xe/xe_configfs.c:138:
	WARNING: Definition list ends without a blank line; unexpected unindent.

That also makes it better formatted in the output.

While at it, also fix the underline length in "Overview".

Fixes: e2b33fce5e ("drm/xe/configfs: Improve documentation steps")
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://lore.kernel.org/r/20250911-wa-bb-cmds-v4-2-c8f7e48f7eae@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-12 09:36:43 -07:00
Lucas De Marchi
c34f9868df drm/xe: Update workaround documentation
Bring it up to reality, better documenting the existing batch buffers,
OOB rules and fixing some typos.

Bspec: 60122
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://lore.kernel.org/r/20250911-wa-bb-cmds-v4-1-c8f7e48f7eae@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-12 09:36:43 -07:00
Mallesh Koujalagi
4e1d3b5e64 drm/xe/hwmon: Remove type casting
Refactor: eliminate type casts by using proper u32
declarations.

v2:
- Address review comments. (Karthik)

v3:
- Use the proper u32 type and drop cast. (Lucas De Marchi)
- Modify variable when actually using u64 value.
- Change r value to reg_value with u32 type.

v4:
- Remove newline between trailer and Signed-off-by. (Lucas De Marchi)
- Change reg_val to val for more user-friendly logging.
- Use mul_u32_u32 function since both values are u32.

v5:
- mul_u32_u32 function with shift. (Lucas De Marchi)

Fixes: 7596d839f6 ("drm/xe/hwmon: Add support to manage power limits though mailbox")
Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250912113458.2815172-1-mallesh.koujalagi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-12 07:36:50 -07:00
Colin Ian King
9e0b0fd531 drm/xe/guc: Fix spelling mistake "sheduling" -> "scheduling"
There is a spelling mistake in a xe_gt_err error message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://lore.kernel.org/r/20250912074330.1275279-1-colin.i.king@gmail.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-12 07:20:32 -07:00
Harish Chegondi
2a810401aa drm/xe/xe3: Extend Wa_18041344222 to graphics IP versions 30.00 and 30.01
Apply WA 18041344222 to Xe3 LPG graphics IP versions 30.00 and 30.01 too.

Bspec: 56024
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Matt Atwood <matthew.s.atwood@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/7368f8059013424ac94f4a01c23f9c98a37b06dc.1757552915.git.harish.chegondi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-12 07:20:32 -07:00
Rodrigo Vivi
fed1a9d60f drm/xe: Fix circular locking dependency
Fix this:

 ======================================================
 WARNING: possible circular locking dependency detected
 6.17.0-rc4-lgci-xe-xe-pw-153723v2+ #1 Tainted: G S   U
 ------------------------------------------------------
 xe_pm/11324 is trying to acquire lock:
 ffff8881085f22a0 (&pc->freq_lock){+.+.}-{3:3}, at:
 		  xe_guc_pc_start+0x39f/0xf70 [xe]

but task is already holding lock:

 ffffffffa1020420 (xe_rpm_nod3cold_map){+.+.}-{0:0}, at:
 		  xe_rpm_lockmap_acquire+0x1a/0x70 [xe]

which lock already depends on the new lock.

Possible unsafe locking scenario:
      CPU0                    CPU1
      ----                    ----
 lock(xe_rpm_nod3cold_map);
                              lock(&pc->freq_lock);
                              lock(xe_rpm_nod3cold_map);
 lock(&pc->freq_lock);

Reported-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6122
Fixes: 60d2b78991 ("drm/xe/guc: Add SLPC power profile interface")
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Tested-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://lore.kernel.org/r/20250911212024.966757-2-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-09-12 09:38:08 -04:00
Michal Wajdeczko
01ecf00463 drm/xe: Use tile-oriented messages in GGTT code
Use recently added macros to print tile-oriented messages.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250909165941.31730-6-michal.wajdeczko@intel.com
2025-09-12 12:23:59 +02:00
Michal Wajdeczko
48a8659cd5 drm/xe: Add dedicated printk macros for tile and device
We already have dedicated helper macros for printing GT-oriented
messages but we don't have any to print messages that are tile
oriented and we wrongly try to use plain drm or GT-oriented ones.

Add tile-oriented printk messages and to provide similar coverage
as we have with xe_assert() macros. Also add set of simple macros
for the top level xe_device, which we could easily tweak to include
extra device specific info if needed.

Typical output of our printk macros will look like:

 [drm] this is xe_WARN()
 [drm] *ERROR* this is xe_err()
 [drm] *ERROR* this is xe_err_printer()
 [drm] this is xe_info()
 [drm] this is xe_info_printer()
 [drm:printk_demo.cold] this is xe_dbg()
 [drm:printk_demo.cold] this is xe_dbg_printer()

 [drm] Tile0: this is xe_tile_WARN()
 [drm] *ERROR* Tile0: this is xe_tile_err()
 [drm] *ERROR* Tile0: this is xe_tile_err_printer()
 [drm] Tile0: this is xe_tile_info()
 [drm] Tile0: this is xe_tile_info_printer()
 [drm:printk_demo.cold] Tile0: this is xe_tile_dbg()
 [drm:printk_demo.cold] Tile0: this is xe_tile_dbg_printer()

 [drm] Tile0: GT0: this is xe_gt_WARN()
 [drm] *ERROR* Tile0: GT0: this is xe_gt_err()
 [drm] *ERROR* Tile0: GT0: this is xe_gt_err_printer()
 [drm] Tile0: GT0: this is xe_gt_info()
 [drm] Tile0: GT0: this is xe_gt_info_printer()
 [drm:printk_demo.cold] Tile0: GT0: this is xe_gt_dbg()
 [drm:printk_demo.cold] Tile0: GT0: this is xe_gt_dbg_printer()

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250909165941.31730-5-michal.wajdeczko@intel.com
2025-09-12 12:23:57 +02:00
Michal Wajdeczko
efd54b0cff drm/xe: Prepare format for GT-oriented messages in one place
To avoid code duplication (and thus potential mistakes) and to
allow easier changes (if needed) of the prefix format of the
GT-oriented messages, prepare that prefix in dedicated macro.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250909165941.31730-4-michal.wajdeczko@intel.com
2025-09-12 12:23:56 +02:00
Michal Wajdeczko
a2dc39fb1c drm/xe: Drop "gt_" prefix from xe_gt_WARN() macros
Those WARN messages will already include GT-specific "GT%u:" prefix
so there is no point to include additional "gt_" prefix.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250909165941.31730-3-michal.wajdeczko@intel.com
2025-09-12 12:23:55 +02:00
Michal Wajdeczko
edffa93a93 drm/xe: Keep xe_gt_err() macro definitions together
There is no need to keep them separated. No functional changes.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250909165941.31730-2-michal.wajdeczko@intel.com
2025-09-12 12:23:53 +02:00
Daniele Ceraolo Spurio
8843444843 drm/xe/guc: Set RCS/CCS yield policy
All recent platforms (including all the ones officially supported by the
Xe driver) do not allow concurrent execution of RCS and CCS workloads
from different address spaces, with the HW blocking the context switch
when it detects such a scenario.
The DUAL_QUEUE flag helps with this, by causing the GuC to not submit a
context it knows will not be able to execute. This, however, causes a new
problem: if RCS and CCS queues have pending workloads from different
address spaces, the GuC needs to choose from which of the 2 queues to
pick the next workload to execute. By default, the GuC prioritizes RCS
submissions over CCS ones, which can lead to CCS workloads being
significantly (or completely) starved of execution time.
The driver can tune this by setting a dedicated scheduling policy KLV;
this KLV allows the driver to specify a quantum (in ms) and a ratio
(percentage value between 0 and 100), and the GuC will prioritize the CCS
for that percentage of each quantum.
Given that we want to guarantee enough RCS throughput to avoid missing
frames, we set the yield policy to 20% of each 80ms interval.

v2: updated quantum and ratio, improved comment, use xe_guc_submit_disable
in gt_sanitize

Fixes: d9a1ae0d17 ("drm/xe/guc: Enable WA_DUAL_QUEUE for newer platforms")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Tested-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://lore.kernel.org/r/20250905235632.3333247-2-daniele.ceraolospurio@intel.com
2025-09-11 09:45:35 -07:00
Michal Wajdeczko
95c1cfa306 drm/xe/pf: Drop rounddown_pow_of_two fair LMEM limitation
This effectively reverts commit 4c3fe5eae4 ("drm/xe/pf: Limit
fair VF LMEM provisioning") since we don't need it any more after
non-contig VRAM allocations were fixed. This allows larger LMEM
auto-provisioning for VFs, so instead:

 [ ] GT0: PF: LMEM available(14096M) fair(1 x 8192M)
 [ ] GT0: PF: VF1 provisioned with 8589934592 (8.00 GiB) LMEM
or
 [ ] GT0: PF: LMEM available(14096M) fair(2 x 4096M)
 [ ] GT0: PF: VF1..VF2 provisioned with 4294967296 (4.00 GiB) LMEM

we may get:

 [ ] GT0: PF: LMEM available(14096M) fair(1 x 14096M)
 [ ] GT0: PF: VF1 provisioned with 14780727296 (13.8 GiB) LMEM
and
 [ ] GT0: PF: LMEM available(14096M) fair(2 x 7048M)
 [ ] GT0: PF: VF1..VF2 provisioned with 7390363648 (6.88 GiB) LMEM

Fixes: 1e32ffbc9d ("drm/xe/sriov: support non-contig VRAM provisioning")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20250910222439.32869-1-michal.wajdeczko@intel.com
2025-09-11 18:20:57 +02:00
Varun Gupta
010629e00d drm/xe: Fix driver reference in FLR comment
Rectify the reference of i915 to Xe in a comment.

v2: Cosmetic changes. (Karthik)
v3: Rephrased the commit message. (Karthik)

Signed-off-by: Varun Gupta <varun.gupta@intel.com>
Reviewed-by: Karthik Poosa <karthik.poosa@intel.com>
Link: https://lore.kernel.org/r/20250911111712.811524-1-varun.gupta@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-11 08:00:33 -07:00
Vinay Belgaumkar
60d2b78991 drm/xe/guc: Add SLPC power profile interface
GuC has an interface to set a power profile for the SLPC algorithm.
Base mode is default and ensures a balanced performance, power_saving
mode has conservative up/down thresholds and is suitable for use with
apps that typically need to be power efficient. This will result in
lower GT frequencies, thus consuming lower power.

Selected power profile will be displayed in this format:

$ cat power_profile

  [base]    power_saving

$ echo power_saving > power_profile
$ cat power_profile

  base    [power_saving]

v2: Address review comments (Rodrigo)

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250903232120.390190-1-vinay.belgaumkar@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-09-11 08:45:05 -04:00
Thomas Hellström
692a480243 drm/xe: Fix uninitialized return values
clang warned about two uninitialized variables used as return
values in the exhaustive eviction series.

Fix those.

Fixes: 1f1541720f ("drm/xe: Rework instances of variants of xe_bo_create_locked()")
Fixes: 7bcb6e38c1 ("drm/xe/display: Convert __xe_pin_fb_vma()")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Link: https://lore.kernel.org/r/20250910151128.49693-1-thomas.hellstrom@linux.intel.com
2025-09-11 08:45:12 +02:00
Shuicheng Lin
b98775bca9 drm/xe/tile: Release kobject for the failure path
Call kobject_put() for the failure path to release the kobject

v2: remove extra newline. (Matt)

Fixes: e3d0839aa5 ("drm/xe/tile: Abort driver load for sysfs creation failure")
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Link: https://lore.kernel.org/r/20250819153950.2973344-2-shuicheng.lin@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-10 21:00:10 -07:00