Now that we save the job's head during submission, it's no longer
necessary to adjust the LRC ring head during resubmission. Instead, a
software-based adjustment of the tail will overwrite the old jobs in
place. For some odd reason, adjusting the LRC ring head didn't work on
parallel queues, which was causing issues in our CI.
v5:
- Add comment in guc_exec_queue_start explaning why the function works
(Auld)
v7:
- Only adjust first state on first unsignaled job (Auld)
v8:
- Break unsignaled job handling to separate patch (Auld)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Tomasz Lis <tomasz.lis@intel.com>
Link: https://lore.kernel.org/r/20251008214532.3442967-7-matthew.brost@intel.com
In all cases where the first pending job helper is called, we only want
to retrieve the first unsignaled pending job, as this helper is used
exclusively in recovery flows. It is possible for signaled jobs to
remain in the pending list as the scheduler is stopped, so those should
be skipped.
Also, add kernel documentation to clarify this behavior.
v8:
- Split out into own patch (Auld)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20251008214532.3442967-6-matthew.brost@intel.com
VF migration requires jobs to remain pending so they can be replayed
after the VF comes back. Previously, LR job fences were intentionally
signaled immediately after submission to avoid the risk of exporting
them, as these fences do not naturally signal in a timely manner and
could break dma-fence contracts. A side effect of this approach was that
LR jobs were never added to the DRM scheduler’s pending list, preventing
them from being tracked for later resubmission.
We now avoid signaling LR job fences and ensure they are never exported;
Xe already guards against exporting these internal fences. With that
guarantee in place, we can safely track LR jobs in the scheduler’s
pending list so they are eligible for resubmission during VF
post-migration recovery (and similar recovery paths).
An added benefit is that LR queues now gain the DRM scheduler’s built-in
flow control over ring usage rather than rejecting new jobs in the exec
IOCTL if the ring is full.
v2:
- Ensure DRM scheduler TDR doesn't run for LR jobs
- Stack variable for killed_or_banned_or_wedged
v4:
- Clarify commit message (Tomasz)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Tomasz Lis <tomasz.lis@intel.com>
Link: https://lore.kernel.org/r/20251008214532.3442967-5-matthew.brost@intel.com
Add explicit tracking in the GuC submission state to record the source
of a pending enable (TDR vs. queue resume path vs. submission).
Disambiguating the origin lets the GuC submission state machine apply
the correct recovery/replay behavior.
This helps VF restore: when the device comes back, the state machine knows
whether the pending enable stems from timeout recovery, from a queue resume
sequence, or submission and can gate sequencing and fixups accordingly.
v4:
- Clarify commit message (Tomasz)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Tomasz Lis <tomasz.lis@intel.com>
Link: https://lore.kernel.org/r/20251008214532.3442967-4-matthew.brost@intel.com
We already have device and GT level SR-IOV specific macros, but
unlike native case, we don't have yet tile-based ones.
Add macros to match native use case and also update GT-based
macros to rely on those new tile-based SR-IOV macros. This will
slightly rearrange the output of the GT logs and instead:
[...] Tile0: GT0: PF: pushed VF1 config with 2 KLVs...
we might see:
[...] PF: Tile0: GT0: pushed VF1 config with 2 KLVs...
but that's even better.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20251005133641.2651-3-michal.wajdeczko@intel.com
While the late PF per-GT initialization is done quite late in the
single GT initialization flow, in case of multi-GT platforms, it
may still be done before other GT early initialization. That leads
to some issues during unwind, when there are cross-GT dependencies,
like resource cleanup that is shared by both GTs, but the other GT
may already be sanitized or disabled.
The following errors could be observed when trying to unload the PF
driver with some LMEM/VRAM already provisioned for few VFs:
[ ] xe 0000:03:00.0: DEVRES REL ffff88814708f240 fini_config (16 bytes)
[ ] xe 0000:03:00.0: [drm:lmtt_write_pte [xe]] PF: LMTT: WRITE level=2 index=1 pte=0x0
[ ] xe 0000:03:00.0: [drm:lmtt_invalidate_hw [xe]] PF: LMTT: num_fences=2 err=-19
[ ] xe 0000:03:00.0: [drm:lmtt_pt_free [xe]] PF: LMTT: level=0 addr=53a470000
[ ] xe 0000:03:00.0: [drm:lmtt_pt_free [xe]] PF: LMTT: level=1 addr=53a4b0000
[ ] xe 0000:03:00.0: [drm:lmtt_invalidate_hw [xe]] PF: LMTT: num_fences=2 err=-19
[ ] xe 0000:03:00.0: [drm] PF: LMTT0 invalidation failed (-ENODEV)
[ ] xe 0000:03:00.0: [drm:lmtt_write_pte [xe]] PF: LMTT: WRITE level=2 index=2 pte=0x0
[ ] xe 0000:03:00.0: [drm:lmtt_invalidate_hw [xe]] PF: LMTT: num_fences=2 err=-19
[ ] xe 0000:03:00.0: [drm:lmtt_pt_free [xe]] PF: LMTT: level=0 addr=539b70000
[ ] xe 0000:03:00.0: [drm:lmtt_pt_free [xe]] PF: LMTT: level=1 addr=539bf0000
[ ] xe 0000:03:00.0: [drm:lmtt_invalidate_hw [xe]] PF: LMTT: num_fences=2 err=-19
[ ] xe 0000:03:00.0: [drm] PF: LMTT0 invalidation failed (-ENODEV)
Move all PF per-GT late initialization to the already defined late
SR-IOV initialization function to allow proper order of the cleanup
actions.
While around, format all PF function stubs as one-liners, like many
other stubs are defined in the Xe driver.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20251004162008.1782-1-michal.wajdeczko@intel.com
Both vm->xef and XE_LRC_CREATE_USER_CTX indicate in xe_lrc_init that
the context originates from userspace. However, XE_LRC_CREATE_USER_CTX
has a broader scope as it may be set even when no vm->xef is present.
The XE_BO_FLAG_PINNED_LATE_RESTORE flag can be extended to both cases,
so there is no point in handling the two cases separately.
Let's combine vm->xef and XE_LRC_CREATE_USER_CTX checks to detect
userspace context.
Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Suggested-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://lore.kernel.org/r/20251003162619.1984236-6-piotr.piorkowski@intel.com
When using a separate VRAM region for kernel allocations,
some kernel structures, such as context userspace data,
should not reside in the VRAM region dedicated to the kernel.
The VRAM kernel region is intended only for allocations necessary
for driver operation. Allocations created via ioctl are long-lived
and not easily evictable. If this region runs out of space,
there may not be a fallback, which could cause failures.
To prevent this, add a new BO flag that explicitly forces the BO to be
allocated in the general-purpose VRAM region accessible to userspace,
avoiding the kernel-only VRAM region.
v2:
- update commit message (Matthew)
Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://lore.kernel.org/r/20251003162619.1984236-3-piotr.piorkowski@intel.com
So far, kernel and userspace allocations have shared the same VRAM region.
However, in some scenarios, it may be necessary to reserve a separate
VRAM area exclusively for kernel allocations.
Let's add preliminary support for such a configuration.
v2:
- replaced for_each_bo_flag_vram with the improved
for_each_set_bo_vram_flag helper (Matthew)
- moved the VRAM flag iteration macro definition into xe_bo.c (Matthew)
- drop unused bo_flgas from bo_vram_flags_to_vram_placement (Matthew)
- use hweight32 helper in __xe_bo_fixed_placement for readability
(Matthew)
v3: remove unnecessary VRAM fixup id
Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://lore.kernel.org/r/20251003162619.1984236-2-piotr.piorkowski@intel.com
We already have control functions that we use to control the VF
state on the per-GT basis, but that is low level detail from the
user point of view, who rather expects VF-level functions.
For now add simple functions that just iterate over all GTs and
call per-GT control function. We will soon allow to use some of
them from the user facing interfaces like debugfs.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://lore.kernel.org/r/20250930233525.201263-2-michal.wajdeczko@intel.com
Initialize the uval variable to 0 in xe_late_bind_fw_num_fans() to fix
a potential use of uninitialized variable warning and ensure predictable
behavior.
The variable is passed by reference to xe_pcode_read() which should
populate it on success, but initializing it to 0 provides a safe
default value and follows kernel coding best practices.
v2:
- uval = 0 which serves as both a safe default and the fallback
value when the pcode read operation fails.
v3:
- Handle MMIO failure (Rodrigo)
- The function should probably return the error and make the uval as
pointer-argument, like the pcode_read.
- Change the caller of this function to propagate the error
upwards if mmio failed.
Fixes: 45832bf9c1 ("drm/xe/xe_late_bind_fw: Initialize late binding firmware")
Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
Link: https://lore.kernel.org/r/20251002005648.3185636-1-mallesh.koujalagi@intel.com
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
When userptr is used on SVM-enabled VMs, a non-NULL
hmm_range::dev_private_owner value might mean that
hmm_range_fault() attempts to return device private pages.
Either that will fail, or the userptr code will not know
how to handle those.
Use NULL for hmm_range::dev_private_owner to migrate
such pages to system. In order to do that, move the
struct drm_gpusvm::device_private_page_owner field to
struct drm_gpusvm_ctx::device_private_page_owner so that
it doesn't remain immutable over the drm_gpusvm lifetime.
v2:
- Don't conditionally compile xe_svm_devm_owner().
- Kerneldoc xe_svm_devm_owner().
Fixes: 9e97874148 ("drm/xe/userptr: replace xe_hmm with gpusvm")
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://lore.kernel.org/r/20250930122752.96034-1-thomas.hellstrom@linux.intel.com
Due to initial design of the Xe debugfs, the GGTT and LMEM files
were defined on the primary GT, instead of being per-tile.
While PF provisioning code is now still maintaining GGTT and LMEM
also on the per primary-GT level, this will be refactored soon,
but we can fix debugfs layout now, as part of the new SR-IOV tree.
For backward compatibility we will provide some symlinks that can
be removed once our tools will be fully converted.
As we are making all those changes in the user facing interface,
take this as apportunity to also start replacing the "LMEM" term,
used by the SR-IOV code, with the "VRAM" term, used by Xe driver.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250928140029.198847-7-michal.wajdeczko@intel.com