The function `guc_ct_send_locked()` is really quite simple, but still
looks complex due to exposed internals. It is sending a message,
and in case of lack of space, waiting for a proper moment to send
a retry.
Clear separation of send function and wait function will help with
readability. This is a cosmetic change only, no functional difference
is expected.
This patch introduces `guc_ct_send_wait_for_retry()`, and uses it to
greatly simplify `guc_ct_send_locked()`.
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://lore.kernel.org/r/20251009170844.178199-1-tomasz.lis@intel.com
Using 2M pages in xe_migrate_vram has two benefits: we issue fewer
instructions per 2M copy (1 vs. 512), and the cache hit rate should be
higher. This results in increased copy engine bandwidth, as shown by
benchmark IGTs.
Enable 2M pages by reserving PDEs in the migrate VM and using 2M pages
in xe_migrate_vram if the DMA address order matches 2M.
v2:
- Reuse build_pt_update_batch_sram (Stuart)
- Fix build_pt_update_batch_sram for PAGE_SIZE > 4K
v3:
- More fixes for PAGE_SIZE > 4K, align chunk, decrement chunk as needed
- Use stack incr var in xe_migrate_vram_use_pde (Stuart)
v4:
- Split PAGE_SIZE > 4K fix out in different patch (Stuart)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://lore.kernel.org/r/20251013034555.4121168-3-matthew.brost@intel.com
Correct several spelling and grammar issues in xe_gt struct
documentation to improve readability:
- Fix "to not" -> "do not".
- Fix "mmigrations" -> "migrations".
- Fix "Multiple queues exists" -> "Multiple queues exist".
- Fix "are be processed" -> "to be processed".
- Fix "have being processed" -> "have been processed".
These changes are purely cosmetic and do not affect functionality.
v2: drop kernel-doc formatting change. (Jani)
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20251006151317.2553182-2-shuicheng.lin@intel.com
In normal operation, a registered exec queue is disabled and
deregistered through the GuC, and freed only after the GuC confirms
completion. However, if the driver is forced to unbind while the exec
queue is still running, the user may call exec_destroy() after the GuC
has already been stopped and CT communication disabled.
In this case, the driver cannot receive a response from the GuC,
preventing proper cleanup of exec queue resources. Fix this by directly
releasing the resources when GuC is not running.
Here is the failure dmesg log:
"
[ 468.089581] ---[ end trace 0000000000000000 ]---
[ 468.089608] pci 0000:03:00.0: [drm] *ERROR* GT0: GUC ID manager unclean (1/65535)
[ 468.090558] pci 0000:03:00.0: [drm] GT0: total 65535
[ 468.090562] pci 0000:03:00.0: [drm] GT0: used 1
[ 468.090564] pci 0000:03:00.0: [drm] GT0: range 1..1 (1)
[ 468.092716] ------------[ cut here ]------------
[ 468.092719] WARNING: CPU: 14 PID: 4775 at drivers/gpu/drm/xe/xe_ttm_vram_mgr.c:298 ttm_vram_mgr_fini+0xf8/0x130 [xe]
"
v2: use xe_uc_fw_is_running() instead of xe_guc_ct_enabled().
As CT may go down and come back during VF migration.
Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: stable@vger.kernel.org
Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20251010172529.2967639-2-shuicheng.lin@intel.com
Pull more drm fixes from Dave Airlie:
"Just the follow up fixes for rc1 from the next branch, amdgpu and xe
mostly with a single v3d fix in there.
amdgpu:
- DC DCE6 fixes
- GPU reset fixes
- Secure diplay messaging cleanup
- MES fix
- GPUVM locking fixes
- PMFW messaging cleanup
- PCI US/DS switch handling fix
- VCN queue reset fix
- DC FPU handling fix
- DCN 3.5 fix
- DC mirroring fix
amdkfd:
- Fix kfd process ref leak
- mmap write lock handling fix
- Fix comments in IOCTL
xe:
- Fix build with clang 16
- Fix handling of invalid configfs syntax usage and spell out the
expected syntax in the documentation
- Do not try late bind firmware when running as VF since it shouldn't
handle firmware loading
- Fix idle assertion for local BOs
- Fix uninitialized variable for late binding
- Do not require perfmon_capable to expose free memory at page
granularity. Handle it like other drm drivers do
- Fix lock handling on suspend error path
- Fix I2C controller resume after S3
v3d:
- fix fence locking"
* tag 'drm-next-2025-10-11-1' of https://gitlab.freedesktop.org/drm/kernel: (34 commits)
drm/amd/display: Incorrect Mirror Cositing
drm/amd/display: Enable Dynamic DTBCLK Switch
drm/amdgpu: Report individual reset error
drm/amdgpu: partially revert "revert to old status lock handling v3"
drm/amd/display: Fix unsafe uses of kernel mode FPU
drm/amd/pm: Disable VCN queue reset on SMU v13.0.6 due to regression
drm/amdgpu: Fix general protection fault in amdgpu_vm_bo_reset_state_machine
drm/amdgpu: Check swus/ds for switch state save
drm/amdkfd: Fix two comments in kfd_ioctl.h
drm/amd/pm: Avoid interface mismatch messaging
drm/amdgpu: Merge amdgpu_vm_set_pasid into amdgpu_vm_init
drm/amd/amdgpu: Fix the mes version that support inv_tlbs
drm/amd: Check whether secure display TA loaded successfully
drm/amdkfd: Fix mmap write lock not release
drm/amdkfd: Fix kfd process ref leaking when userptr unmapping
drm/amdgpu: Fix for GPU reset being blocked by KIQ I/O.
drm/amd/display: Disable scaling on DCE6 for now
drm/amd/display: Properly disable scaling on DCE6
drm/amd/display: Properly clear SCL_*_FILTER_CONTROL on DCE6
drm/amd/display: Add missing DCE6 SCL_HORZ_FILTER_INIT* SRIs
...
Pull drm fixes from Dave Airlie:
"Some fixes leftover from our fixes branch, just nouveau and vmwgfx:
nouveau:
- Return errno code from TTM move helper
vmwgfx:
- Fix null-ptr access in cursor code
- Fix UAF in validation
- Use correct iterator in validation"
* tag 'drm-fixes-2025-10-11' of https://gitlab.freedesktop.org/drm/kernel:
drm/nouveau: fix bad ret code in nouveau_bo_move_prep
drm/vmwgfx: Fix copy-paste typo in validation
drm/vmwgfx: Fix Use-after-free in validation
drm/vmwgfx: Fix a null-ptr access in the cursor snooper
Moving to VRAM will fail if mixed mappings are present or if the page is
already located in VRAM. Atomic faults that require a move to VRAM
currently retry without attempting to evict mixed mappings or locate
existing VRAM mappings.
This patch fixes the issue by attempting to evict mixed mappings or find
existing VRAM pages when a move to VRAM fails during atomic fault
handling.
Fixes: a9ac0fa455 ("drm/xe: Strict migration policy for atomic SVM faults")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://lore.kernel.org/r/20251009130629.3531962-1-matthew.brost@intel.com
There may be cases in which the BAR0 also needs to move to accommodate
the bigger BAR2. However if it's not released, the BAR2 resize fails.
During the vram probe it can't be released as it's already in use by
xe_mmio for early register access.
Add a new function in xe_vram and let xe_pci call it directly before
even early device probe. This allows the BAR2 to resize in cases BAR0
also needs to move, assuming there aren't other reasons to hold that
move:
[] xe 0000:03:00.0: vgaarb: deactivate vga console
[] xe 0000:03:00.0: [drm] Attempting to resize bar from 8192MiB -> 16384MiB
[] xe 0000:03:00.0: BAR 0 [mem 0x83000000-0x83ffffff 64bit]: releasing
[] xe 0000:03:00.0: BAR 2 [mem 0x4000000000-0x41ffffffff 64bit pref]: releasing
[] pcieport 0000:02:01.0: bridge window [mem 0x4000000000-0x41ffffffff 64bit pref]: releasing
[] pcieport 0000:01:00.0: bridge window [mem 0x4000000000-0x41ffffffff 64bit pref]: releasing
[] pcieport 0000:01:00.0: bridge window [mem 0x4000000000-0x43ffffffff 64bit pref]: assigned
[] pcieport 0000:02:01.0: bridge window [mem 0x4000000000-0x43ffffffff 64bit pref]: assigned
[] xe 0000:03:00.0: BAR 2 [mem 0x4000000000-0x43ffffffff 64bit pref]: assigned
[] xe 0000:03:00.0: BAR 0 [mem 0x83000000-0x83ffffff 64bit]: assigned
[] pcieport 0000:00:01.0: PCI bridge to [bus 01-04]
[] pcieport 0000:00:01.0: bridge window [mem 0x83000000-0x840fffff]
[] pcieport 0000:00:01.0: bridge window [mem 0x4000000000-0x44007fffff 64bit pref]
[] pcieport 0000:01:00.0: PCI bridge to [bus 02-04]
[] pcieport 0000:01:00.0: bridge window [mem 0x83000000-0x840fffff]
[] pcieport 0000:01:00.0: bridge window [mem 0x4000000000-0x43ffffffff 64bit pref]
[] pcieport 0000:02:01.0: PCI bridge to [bus 03]
[] pcieport 0000:02:01.0: bridge window [mem 0x83000000-0x83ffffff]
[] pcieport 0000:02:01.0: bridge window [mem 0x4000000000-0x43ffffffff 64bit pref]
[] xe 0000:03:00.0: [drm] BAR2 resized to 16384M
[] xe 0000:03:00.0: [drm:xe_pci_probe [xe]] BATTLEMAGE e221:0000 dgfx:1 gfx:Xe2_HPG (20.02) ...
For BMG there are additional fix needed in the PCI side, but this
helps getting it to a working resize.
All the rebar logic is more pci-specific than xe-specific and can be
done very early in the probe sequence. In future it would be good to
move it out of xe_vram.c, but this refactor is left for later.
Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: stable@vger.kernel.org # 6.12+
Link: https://lore.kernel.org/intel-xe/fafda2a3-fc63-ce97-d22b-803f771a4d19@linux.intel.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/20250918-xe-pci-rebar-2-v1-2-6c094702a074@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
It is possible that the media GT's VF post-migration recovery work item
gets scheduled before the primary GT's work item. Since the media GT
depends on the primary GT's work item to complete CCS restore, if the
media GT's work item is scheduled first, detect this condition and
re-queue the media GT's work item for a later time.
v5:
- Adjust debug message (Tomasz)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Tomasz Lis <tomasz.lis@intel.com>
Link: https://lore.kernel.org/r/20251008214532.3442967-33-matthew.brost@intel.com
VF CCS restore is a primary GT operation on which the media GT depends.
Therefore, it doesn't make much sense to run these operations in
parallel. To address this, point the media GT's ordered work queue to
the primary GT's ordered work queue on platforms that require (PTL VFs)
CCS restore as part of VF post-migration recovery.
v7:
- Remove bool from xe_gt_alloc (Lucas)
v9:
- Fix typo (Lucas)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20251008214532.3442967-32-matthew.brost@intel.com
A race condition exists where a paused VF's H2G request can be processed
and subsequently rejected. This rejection results in a FAST_REQ failure
being delivered to the KMD, which then terminates the CT via a dead
worker and triggers a GT reset—an undesirable outcome.
This workaround mitigates the issue by checking if a VF post-migration
recovery is in progress and aborting these adverse actions accordingly.
The GuC firmware will address this bug in an upcoming release. Once that
version is available and VF migration depends on it, this workaround can
be safely removed.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Tomasz Lis <tomasz.lis@intel.com>
Link: https://lore.kernel.org/r/20251008214532.3442967-30-matthew.brost@intel.com
A queue must be in the submission backend's tracking state before the
LRC is created to avoid a race condition where the LRC's GGTT addresses
are not properly fixed up during VF post-migration recovery.
Move the queue initialization—which adds the queue to the submission
backend's tracking state—before LRC creation.
Also wait on pending GGTT fixups before allocating LRCs to avoid racing
with fixups.
v2:
- Wait on VF GGTT fixes before creating LRC (testing)
v5:
- Adjust comment in code (Tomasz)
- Reduce race window
v7:
- Only wakeup waiters in recovery path (CI)
- Wakeup waiters on abort
- Use GT warn on (Michal)
- Fix kernel doc for LRC ring size function (Tomasz)
v8:
- Guard against migration not supported or no memirq (CI)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Tomasz Lis <tomasz.lis@intel.com>
Link: https://lore.kernel.org/r/20251008214532.3442967-28-matthew.brost@intel.com
Fixup GuC submission pause / unpause functions to properly replay any
possible state lost during VF post migration recovery.
v3:
- Add helpers for revert / replay (Tomasz)
- Add comment around WQ NOPs (Tomasz)
v7:
- Only fixup / replay parallel queues once (Testing)
- Skip unpause step on queues created after resfix done (Testing)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Tomasz Lis <tomasz.lis@intel.com>
Link: https://lore.kernel.org/r/20251008214532.3442967-27-matthew.brost@intel.com
Before RESFIX_DONE, all CTs stuck in the H2G queue need to be squashed,
as they may contain actions which contain invalid GGTT references or are
unnecessary after HW change.
Starting the CTs clears all H2Gs in the queue. Any lost H2Gs are
resubmitted by the GuC submission state machine.
v3:
- Don't mess with head / tail values (Michal)
v4:
- Don't mess with broke (Michal)
- Add CTB_H2G_BUFFER_OFFSET (Michal)
v5:
- Adjust commit message (Tomasz)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://lore.kernel.org/r/20251008214532.3442967-25-matthew.brost@intel.com
TLB invalidations requests can be lost during VF post-migration
recovery. Since the VF has migrated, these invalidations are no longer
needed.
Reset the TLB invalidation frontend, which will signal all pending
fences.
v3:
- Move TLB invalidation reset after pausing submission (Tomasz)
- Adjust commit message (Michal)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Tomasz Lis <tomasz.lis@intel.com>
Link: https://lore.kernel.org/r/20251008214532.3442967-22-matthew.brost@intel.com
Flushing CTs (i.e., progressing all pending G2H messages) gives VF
post-migration recovery an accurate view of which H2G messages the GuC
has processed, enabling the GuC submission state machine to correctly
rebuild all state.
Also, stop all CT traffic, as the CT is not live during VF
post-migration recovery.
v3:
- xe_guc_ct_flush_and_stop rename (Michal)
- Drop extra GuC CT WQ wake up (Michal)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://lore.kernel.org/r/20251008214532.3442967-21-matthew.brost@intel.com
The only case where the GuC submission backend cannot reason 100%
correctly is when a GuC context is registered during VF post-migration
recovery. In this scenario, it's possible that the GuC context register
H2G is processed, but the immediately following schedule-enable H2G gets
lost. The schedule-enable G2H "done" response is how the GuC state machine
determines whether context registration has completed.
A double register is harmless when using `GUC_HXG_TYPE_EVENT`, as GuC
simply drops the duplicate H2G. To keep things simple, use
`GUC_HXG_TYPE_EVENT` for all context registrations on VFs.
v5:
- Check for xe_sriov_vf_migration_supported (Tomasz)
v7:
- Add comment about subsequent protocol failures (Tomasz)
- Modify commit message (Michal)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Tomasz Lis <tomasz.lis@intel.com>
Link: https://lore.kernel.org/r/20251008214532.3442967-20-matthew.brost@intel.com
Blocking in work queues on a hardware action that may never occur —
especially when it depends on a software fixup also scheduled on the
a work queue — is a recipe for deadlock. This situation arises with
the preempt rebind worker and VF post-migration recovery. To prevent
potential deadlocks, avoid indefinite blocking in the preempt rebind
worker for VFs that support migration.
v4:
- Use dma_fence_wait_timeout (CI)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Tomasz Lis <tomasz.lis@intel.com>
Link: https://lore.kernel.org/r/20251008214532.3442967-19-matthew.brost@intel.com
If VF post-migration recovery is in progress, the recovery flow will
rebuild all GuC submission state. In this case, exit all waiters to
ensure that submission queue scheduling can also be paused. Avoid taking
any adverse actions after aborting the wait.
As part of waking up the GuC backend, suspend_wait can now return
-EAGAIN indicating the waiter should be retried. If the caller is
running on work item, that work item need to be requeued to avoid a
deadlock for the work item blocking the VF migration recovery work item.
v3:
- Don't block in preempt fence work queue as this can interfere with VF
post-migration work queue scheduling leading to deadlock (Testing)
- Use xe_gt_recovery_inprogress (Michal)
v5:
- Use static function for vf_recovery (Michal)
- Add helper to wake CT waiters (Michal)
- Move some code to following patch (Michal)
- Adjust commit message to explain suspend_wait returning -EAGAIN (Michal)
- Add kernel doc to suspend_wait around returning -EAGAIN
v7:
- Add comment on why a shared wait queue is need on VFs (Michal)
- Guard again suspend_wait signaling early on resfix donw (Tomasz)
v8:
- Fix kernel doc (CI)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://lore.kernel.org/r/20251008214532.3442967-18-matthew.brost@intel.com
With well-behaved software, a GT reset should never occur, nor should it
happen during VF post-migration recovery. If it does, trigger a warning
but suppress the GT reset, as VF post-migration recovery is expected to
bring the VF back to a working state.
v3:
- Better commit message (Tomasz)
v5:
- Use xe_gt_WARN_ON (Michal)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Tomasz Lis <tomasz.lis@intel.com>
Link: https://lore.kernel.org/r/20251008214532.3442967-17-matthew.brost@intel.com
As multi-GT VF post-migration recovery can run in parallel on different
workqueues, but both GTs point to the same GGTT, only one GT needs to
shift the GGTT. However, both GTs need to know when this step has
completed. To coordinate this, perform the GGTT shift under the GGTT
lock. With shift being done under the lock, storing the shift value
becomes unnecessary.
In addition to above, move the GGTT VF config from the GT to the tile.
v3:
- Update commmit message (Tomasz)
v4:
- Move GGTT values to tile state (Michal)
- Use GGTT lock (Michal)
v5:
- Only take GGTT lock during recovery (CI)
- Drop goto in vf_get_submission_cfg (Michal)
- Add kernel doc around recovery in xe_gt_sriov_vf_query_config (Michal)
v7:
- Drop recovery variable (Michal)
- Use _locked naming (Michal)
- Use guard (Michal)
v9:
- Break LMEM changes into different patch (Michal)
- Fix layering (Michal)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://lore.kernel.org/r/20251008214532.3442967-15-matthew.brost@intel.com
VF recovery is a per-GT operation, so it makes sense to isolate it to a
per-GT queue. Scheduling this operation on the same worker as the GT
reset and TDR not only aligns with this design but also helps avoid race
conditions, as those operations can also modify the queue state.
v2:
- Fix lockdep splat (Adam)
- Use xe_sriov_vf_migration_supported helper
v3:
- Drop xe_gt_sriov_ prefix for private functions (Michal)
- Drop message in xe_gt_sriov_vf_migration_init_early (Michal)
- Logic rework in vf_post_migration_notify_resfix_done (Michal)
- Rework init sequence layering (Michal)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Tomasz Lis <tomasz.lis@intel.com>
Link: https://lore.kernel.org/r/20251008214532.3442967-10-matthew.brost@intel.com
Add xe_gt_recovery_pending helper.
This helper serves as the singular point to determine whether a GT
recovery is currently in progress. Expected callers include the GuC CT
layer and the GuC submission layer. Atomically visable as soon as vCPU
are unhalted until VF recovery completes.
v3:
- Add GT layer xe_gt_recovery_inprogress (Michal)
- Don't blow up in memirq not enabled (CI)
- Add __memirq_received with clear argument (Michal)
- xe_memirq_sw_int_0_irq_pending rename (Michal)
- Use offset in xe_memirq_sw_int_0_irq_pending (Michal)
v4:
- Refactor xe_gt_recovery_inprogress logic around memirq (Michal)
v5:
- s/inprogress/pending (Michal)
v7:
- Fix typos, adjust comment (Michal)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Tomasz Lis <tomasz.lis@intel.com>
Link: https://lore.kernel.org/r/20251008214532.3442967-9-matthew.brost@intel.com
The LRC W/A currently checks for LRC being iomem in some places, while
in others it checks if the scratch buffer is non-NULL. This
inconsistency causes issues with the VF post-migration recovery code,
which blindly passes in a scratch buffer.
This patch standardizes the check by consistently verifying whether the
LRC is iomem to determine if the scratch buffer should be used.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Tomasz Lis <tomasz.lis@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20251008214532.3442967-8-matthew.brost@intel.com