linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-06-03 13:53:57 -04:00

Author	SHA1	Message	Date
Matthew Brost	fe3a615dad	drm/xe/vf: Kickstart after resfix in VF post migration recovery GuC needs to be live for the GuC submission state machine to resubmit anything lost during VF post-migration recovery. Therefore, move the kickstart step after `resfix` to ensure proper resubmission. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tomasz Lis <tomasz.lis@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-23-matthew.brost@intel.com	2025-10-09 03:22:46 -07:00
Matthew Brost	24687730cd	drm/xe/vf: Reset TLB invalidations during VF post migration recovery TLB invalidation requests can be lost during VF post-migration recovery. Since the VF has migrated, these invalidations are no longer needed. Reset the TLB invalidation frontend, which will signal all pending fences. v3: - Move TLB invalidation reset after pausing submission (Tomasz) - Adjust commit message (Michal) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tomasz Lis <tomasz.lis@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-22-matthew.brost@intel.com	2025-10-09 03:22:44 -07:00
Matthew Brost	3061e8e0dd	drm/xe/vf: Flush and stop CTs in VF post migration recovery Flushing CTs (i.e., progressing all pending G2H messages) gives VF post-migration recovery an accurate view of which H2G messages the GuC has processed, enabling the GuC submission state machine to correctly rebuild all state. Also, stop all CT traffic, as the CT is not live during VF post-migration recovery. v3: - xe_guc_ct_flush_and_stop rename (Michal) - Drop extra GuC CT WQ wake up (Michal) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-21-matthew.brost@intel.com	2025-10-09 03:22:43 -07:00
Matthew Brost	1f135a1ee9	drm/xe/vf: Use GUC_HXG_TYPE_EVENT for GuC context register The only case where the GuC submission backend cannot reason 100% correctly is when a GuC context is registered during VF post-migration recovery. In this scenario, it's possible that the GuC context register H2G is processed, but the immediately following schedule-enable H2G gets lost. The schedule-enable G2H "done" response is how the GuC state machine determines whether context registration has completed. A double register is harmless when using `GUC_HXG_TYPE_EVENT`, as GuC simply drops the duplicate H2G. To keep things simple, use `GUC_HXG_TYPE_EVENT` for all context registrations on VFs. v5: - Check for xe_sriov_vf_migration_supported (Tomasz) v7: - Add comment about subsequent protocol failures (Tomasz) - Modify commit message (Michal) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Tomasz Lis <tomasz.lis@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-20-matthew.brost@intel.com	2025-10-09 03:22:42 -07:00
Matthew Brost	1faeeea056	drm/xe/vf: Avoid indefinite blocking in preempt rebind worker for VFs supporting migration Blocking in work queues on a hardware action that may never occur, especially when it depends on a software fixup also scheduled on a work queue, is a recipe for deadlock. This situation arises with the preempt rebind worker and VF post-migration recovery. To prevent potential deadlocks, avoid indefinite blocking in the preempt rebind worker for VFs that support migration. v4: - Use dma_fence_wait_timeout (CI) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tomasz Lis <tomasz.lis@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-19-matthew.brost@intel.com	2025-10-09 03:22:41 -07:00
Matthew Brost	a4dae94aad	drm/xe/vf: Wakeup in GuC backend on VF post migration recovery If VF post-migration recovery is in progress, the recovery flow will rebuild all GuC submission state. In this case, exit all waiters to ensure that submission queue scheduling can also be paused. Avoid taking any adverse actions after aborting the wait. As part of waking up the GuC backend, suspend_wait can now return -EAGAIN, indicating the wait should be retried. If the caller is running on a work item, that work item needs to be requeued to avoid a deadlock in which it blocks the VF migration recovery work item. v3: - Don't block in preempt fence work queue as this can interfere with VF post-migration work queue scheduling leading to deadlock (Testing) - Use xe_gt_recovery_inprogress (Michal) v5: - Use static function for vf_recovery (Michal) - Add helper to wake CT waiters (Michal) - Move some code to following patch (Michal) - Adjust commit message to explain suspend_wait returning -EAGAIN (Michal) - Add kernel doc to suspend_wait around returning -EAGAIN v7: - Add comment on why a shared wait queue is needed on VFs (Michal) - Guard against suspend_wait signaling early on resfix done (Tomasz) v8: - Fix kernel doc (CI) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-18-matthew.brost@intel.com	2025-10-09 03:22:39 -07:00
Matthew Brost	f1029b9dde	drm/xe/vf: Don't allow GT reset to be queued during VF post migration recovery With well-behaved software, a GT reset should never occur, nor should it happen during VF post-migration recovery. If it does, trigger a warning but suppress the GT reset, as VF post-migration recovery is expected to bring the VF back to a working state. v3: - Better commit message (Tomasz) v5: - Use xe_gt_WARN_ON (Michal) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tomasz Lis <tomasz.lis@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-17-matthew.brost@intel.com	2025-10-09 03:22:37 -07:00
Matthew Brost	b47c0c07c3	drm/xe/vf: Teardown VF post migration worker on driver unload Be cautious and ensure the VF post-migration worker is not running during driver unload. v3: - More teardown later in driver init, use devm (Tomasz) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tomasz Lis <tomasz.lis@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-16-matthew.brost@intel.com	2025-10-09 03:22:36 -07:00
Matthew Brost	7dd11d8804	drm/xe/vf: Close multi-GT GGTT shift race As multi-GT VF post-migration recovery can run in parallel on different workqueues, but both GTs point to the same GGTT, only one GT needs to shift the GGTT. However, both GTs need to know when this step has completed. To coordinate this, perform the GGTT shift under the GGTT lock. With the shift being done under the lock, storing the shift value becomes unnecessary. In addition to the above, move the GGTT VF config from the GT to the tile. v3: - Update commit message (Tomasz) v4: - Move GGTT values to tile state (Michal) - Use GGTT lock (Michal) v5: - Only take GGTT lock during recovery (CI) - Drop goto in vf_get_submission_cfg (Michal) - Add kernel doc around recovery in xe_gt_sriov_vf_query_config (Michal) v7: - Drop recovery variable (Michal) - Use _locked naming (Michal) - Use guard (Michal) v9: - Break LMEM changes into different patch (Michal) - Fix layering (Michal) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-15-matthew.brost@intel.com	2025-10-09 03:22:34 -07:00
Matthew Brost	c6d00c60c4	drm/xe/vf: Move LMEM config to tile layer The LMEM VF provision is tile-layer-specific information. Move the LMEM configuration to the tile layer accordingly. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-14-matthew.brost@intel.com	2025-10-09 03:22:31 -07:00
Matthew Brost	cc9b24c6bb	drm/xe: Move GGTT lock init to alloc The GGTT lock is needed very early during GT initialization for a VF; move the GGTT lock initialization to the allocation phase. v8: - Rework function structure (Michal) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-13-matthew.brost@intel.com	2025-10-09 03:22:30 -07:00
Matthew Brost	98e78e0c8b	drm/xe/vf: Remove memory allocations from VF post migration recovery VF post-migration recovery is in the path of dma-fence signaling / reclaim, so avoid memory allocations in this path. v3: - s/lrc_wa_bb/scratch (Tomasz) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tomasz Lis <tomasz.lis@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-12-matthew.brost@intel.com	2025-10-09 03:22:29 -07:00
Matthew Brost	489d890a39	drm/xe/vf: Abort H2G sends during VF post-migration recovery While VF post-migration recovery is in progress, abort H2G sends with -ECANCEL. These messages are treated as lost, and TLB invalidation errors are suppressed. During this phase, the H2G channel is down, and VF recovery requires the CT lock to proceed. v3: - Use xe_gt_recovery_inprogress (Michal) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-11-matthew.brost@intel.com	2025-10-09 03:22:28 -07:00
Matthew Brost	e1587f1660	drm/xe/vf: Make VF recovery run on per-GT worker VF recovery is a per-GT operation, so it makes sense to isolate it to a per-GT queue. Scheduling this operation on the same worker as the GT reset and TDR not only aligns with this design but also helps avoid race conditions, as those operations can also modify the queue state. v2: - Fix lockdep splat (Adam) - Use xe_sriov_vf_migration_supported helper v3: - Drop xe_gt_sriov_ prefix for private functions (Michal) - Drop message in xe_gt_sriov_vf_migration_init_early (Michal) - Logic rework in vf_post_migration_notify_resfix_done (Michal) - Rework init sequence layering (Michal) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tomasz Lis <tomasz.lis@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-10-matthew.brost@intel.com	2025-10-09 03:22:25 -07:00
Matthew Brost	e1d2e2d878	drm/xe/vf: Add xe_gt_recovery_pending helper Add xe_gt_recovery_pending helper. This helper serves as the singular point to determine whether a GT recovery is currently in progress. Expected callers include the GuC CT layer and the GuC submission layer. Atomically visible from the moment the vCPUs are unhalted until VF recovery completes. v3: - Add GT layer xe_gt_recovery_inprogress (Michal) - Don't blow up in memirq not enabled (CI) - Add __memirq_received with clear argument (Michal) - xe_memirq_sw_int_0_irq_pending rename (Michal) - Use offset in xe_memirq_sw_int_0_irq_pending (Michal) v4: - Refactor xe_gt_recovery_inprogress logic around memirq (Michal) v5: - s/inprogress/pending (Michal) v7: - Fix typos, adjust comment (Michal) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Tomasz Lis <tomasz.lis@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-9-matthew.brost@intel.com	2025-10-09 03:22:23 -07:00
Matthew Brost	0ca229da92	drm/xe: Make LRC W/A scratch buffer usage consistent The LRC W/A currently checks for LRC being iomem in some places, while in others it checks if the scratch buffer is non-NULL. This inconsistency causes issues with the VF post-migration recovery code, which blindly passes in a scratch buffer. This patch standardizes the check by consistently verifying whether the LRC is iomem to determine if the scratch buffer should be used. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tomasz Lis <tomasz.lis@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-8-matthew.brost@intel.com	2025-10-09 03:22:22 -07:00
Matthew Brost	807c42dd80	drm/xe: Don't change LRC ring head on job resubmission Now that we save the job's head during submission, it's no longer necessary to adjust the LRC ring head during resubmission. Instead, a software-based adjustment of the tail will overwrite the old jobs in place. For some odd reason, adjusting the LRC ring head didn't work on parallel queues, which was causing issues in our CI. v5: - Add comment in guc_exec_queue_start explaining why the function works (Auld) v7: - Only adjust first state on first unsignaled job (Auld) v8: - Break unsignaled job handling to separate patch (Auld) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tomasz Lis <tomasz.lis@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-7-matthew.brost@intel.com	2025-10-09 03:22:21 -07:00
Matthew Brost	b00d1e3fc8	drm/xe: Return first unsignaled job first pending job helper In all cases where the first pending job helper is called, we only want to retrieve the first unsignaled pending job, as this helper is used exclusively in recovery flows. It is possible for signaled jobs to remain in the pending list as the scheduler is stopped, so those should be skipped. Also, add kernel documentation to clarify this behavior. v8: - Split out into own patch (Auld) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-6-matthew.brost@intel.com	2025-10-09 03:22:20 -07:00
Matthew Brost	f6375fb3aa	drm/xe: Track LR jobs in DRM scheduler pending list VF migration requires jobs to remain pending so they can be replayed after the VF comes back. Previously, LR job fences were intentionally signaled immediately after submission to avoid the risk of exporting them, as these fences do not naturally signal in a timely manner and could break dma-fence contracts. A side effect of this approach was that LR jobs were never added to the DRM scheduler’s pending list, preventing them from being tracked for later resubmission. We now avoid signaling LR job fences and ensure they are never exported; Xe already guards against exporting these internal fences. With that guarantee in place, we can safely track LR jobs in the scheduler’s pending list so they are eligible for resubmission during VF post-migration recovery (and similar recovery paths). An added benefit is that LR queues now gain the DRM scheduler’s built-in flow control over ring usage rather than rejecting new jobs in the exec IOCTL if the ring is full. v2: - Ensure DRM scheduler TDR doesn't run for LR jobs - Stack variable for killed_or_banned_or_wedged v4: - Clarify commit message (Tomasz) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tomasz Lis <tomasz.lis@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-5-matthew.brost@intel.com	2025-10-09 03:22:19 -07:00
Matthew Brost	7e1fe102c8	drm/xe/guc: Track pending-enable source in submission state Add explicit tracking in the GuC submission state to record the source of a pending enable (TDR vs. queue resume path vs. submission). Disambiguating the origin lets the GuC submission state machine apply the correct recovery/replay behavior. This helps VF restore: when the device comes back, the state machine knows whether the pending enable stems from timeout recovery, from a queue resume sequence, or submission and can gate sequencing and fixups accordingly. v4: - Clarify commit message (Tomasz) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tomasz Lis <tomasz.lis@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-4-matthew.brost@intel.com	2025-10-09 03:22:18 -07:00
Matthew Brost	26cd498e00	drm/xe: Save off position in ring in which a job was programmed VF post-migration recovery needs to modify the ring with updated GGTT addresses for pending jobs. Save off position in ring in which a job was programmed to facilitate. v4: - s/VF resume/VF post-migration recovery (Tomasz) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tomasz Lis <tomasz.lis@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-3-matthew.brost@intel.com	2025-10-09 03:22:16 -07:00
Matthew Brost	b0607599b7	drm/xe: Add NULL checks to scratch LRC allocation kmalloc can fail, the returned value must have a NULL check. This should be immediately after kmalloc for clarity. v5: - Assert state->buffer in setup_bo if buffer is iomem (Tomasz) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tomasz Lis <tomasz.lis@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-2-matthew.brost@intel.com	2025-10-09 03:22:15 -07:00
Tejas Upadhyay	15b3036045	drm/xe: Move declarations under conditional branch The xe_device_shutdown() function was needing a few declarations that were only required under a specific condition. This change moves those declarations to be within that conditional branch to avoid unnecessary declarations. Reviewed-by: Nitin Gote <nitin.r.gote@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20251007100208.1407021-1-tejas.upadhyay@intel.com Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>	2025-10-08 15:07:41 +05:30
Michal Wajdeczko	c09a9933af	drm/xe/pf: Add max_vfs configfs attribute to control PF mode In addition to existing max_vfs modparam, add max_vfs configfs attribute to allow PF configuration on the per-device level. Default config value is still based on the modparam value. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Acked-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20251002232648.203370-1-michal.wajdeczko@intel.com	2025-10-07 23:03:44 +02:00
Michal Wajdeczko	4592e7abd2	drm/xe/pf: Improve reading VF config blob from debugfs Due to the use of the file operation flows, we might encode the VF config blob multiple times. Avoid that by capturing it once during the open() operation instead of the read() operation. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://lore.kernel.org/r/20251004162036.1800-1-michal.wajdeczko@intel.com	2025-10-07 13:00:29 +02:00
Michal Wajdeczko	0faa22e706	drm/xe/guc: Ratelimit diagnostic messages from the relay There might be some malicious VFs that, by sending invalid VF2PF relay messages, will flood the PF's dmesg with our diagnostic messages. Rate limit all relay messages, unless running in DEBUG_SRIOV mode. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://lore.kernel.org/r/20251005173946.2784-1-michal.wajdeczko@intel.com	2025-10-06 19:44:43 +02:00
Michal Wajdeczko	430d328877	drm/xe: Update MEMIRQ to use tile-based printk macros We already have tile-based printk macros, there is no need to manually prepare MEMIRQ specific messages to include tile id. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://lore.kernel.org/r/20251005133641.2651-5-michal.wajdeczko@intel.com	2025-10-06 19:39:26 +02:00
Michal Wajdeczko	cd11babcd0	drm/xe/pf: Update LMTT to use tile-based messages Since now we have tile-based SR-IOV printk macros, there is no need to manually prepare the LMTT specific warning message (that is now upgraded to proper error level message) nor to use generic debug message without tile/LMTT identification. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://lore.kernel.org/r/20251005133641.2651-4-michal.wajdeczko@intel.com	2025-10-06 19:39:25 +02:00
Michal Wajdeczko	c66e4b6cae	drm/xe: Add tile-based SRIOV printk macros We already have device and GT level SR-IOV specific macros, but unlike native case, we don't have yet tile-based ones. Add macros to match native use case and also update GT-based macros to rely on those new tile-based SR-IOV macros. This will slightly rearrange the output of the GT logs and instead: [...] Tile0: GT0: PF: pushed VF1 config with 2 KLVs... we might see: [...] PF: Tile0: GT0: pushed VF1 config with 2 KLVs... but that's even better. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://lore.kernel.org/r/20251005133641.2651-3-michal.wajdeczko@intel.com	2025-10-06 19:39:23 +02:00
Michal Wajdeczko	c95f180207	drm/xe: Update SRIOV printk macros Recently we introduced xe-based printk macros, use them instead of plain drm-based ones. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://lore.kernel.org/r/20251005133641.2651-2-michal.wajdeczko@intel.com	2025-10-06 19:39:22 +02:00
Michal Wajdeczko	9a54b5127f	drm/xe/pf: Make the late-initialization really late While the late PF per-GT initialization is done quite late in the single GT initialization flow, in case of multi-GT platforms, it may still be done before other GT early initialization. That leads to some issues during unwind, when there are cross-GT dependencies, like resource cleanup that is shared by both GTs, but the other GT may already be sanitized or disabled. The following errors could be observed when trying to unload the PF driver with some LMEM/VRAM already provisioned for few VFs: [ ] xe 0000:03:00.0: DEVRES REL ffff88814708f240 fini_config (16 bytes) [ ] xe 0000:03:00.0: [drm:lmtt_write_pte [xe]] PF: LMTT: WRITE level=2 index=1 pte=0x0 [ ] xe 0000:03:00.0: [drm:lmtt_invalidate_hw [xe]] PF: LMTT: num_fences=2 err=-19 [ ] xe 0000:03:00.0: [drm:lmtt_pt_free [xe]] PF: LMTT: level=0 addr=53a470000 [ ] xe 0000:03:00.0: [drm:lmtt_pt_free [xe]] PF: LMTT: level=1 addr=53a4b0000 [ ] xe 0000:03:00.0: [drm:lmtt_invalidate_hw [xe]] PF: LMTT: num_fences=2 err=-19 [ ] xe 0000:03:00.0: [drm] PF: LMTT0 invalidation failed (-ENODEV) [ ] xe 0000:03:00.0: [drm:lmtt_write_pte [xe]] PF: LMTT: WRITE level=2 index=2 pte=0x0 [ ] xe 0000:03:00.0: [drm:lmtt_invalidate_hw [xe]] PF: LMTT: num_fences=2 err=-19 [ ] xe 0000:03:00.0: [drm:lmtt_pt_free [xe]] PF: LMTT: level=0 addr=539b70000 [ ] xe 0000:03:00.0: [drm:lmtt_pt_free [xe]] PF: LMTT: level=1 addr=539bf0000 [ ] xe 0000:03:00.0: [drm:lmtt_invalidate_hw [xe]] PF: LMTT: num_fences=2 err=-19 [ ] xe 0000:03:00.0: [drm] PF: LMTT0 invalidation failed (-ENODEV) Move all PF per-GT late initialization to the already defined late SR-IOV initialization function to allow proper order of the cleanup actions. While around, format all PF function stubs as one-liners, like many other stubs are defined in the Xe driver. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://lore.kernel.org/r/20251004162008.1782-1-michal.wajdeczko@intel.com	2025-10-06 19:30:17 +02:00
Michal Wajdeczko	71f1939e0d	drm/xe/xe_late_bind_fw: Fix and simplify parsing user input Code was wrongly passing sizeof(uval) as the number base to use, and unlike other debugfs entries that represent bool data, it wasn't using the dedicated function to parse user input as bool. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Badal Nilawar <badal.nilawar@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20251002192736.203186-1-michal.wajdeczko@intel.com	2025-10-06 19:24:15 +02:00
Michal Wajdeczko	869580c415	drm/xe: Don't force DRM_XE_DEBUG_MEMIRQ for SR-IOV debug For pure SR-IOV debugging there is no need to select already separated config for the debugging of the memory based interrupts, as the latter is also very noisy on its own. Change config order and use a weak reverse dependency instead. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://lore.kernel.org/r/20251002171308.203127-1-michal.wajdeczko@intel.com	2025-10-06 19:11:30 +02:00
Shuicheng Lin	a908de69ce	drm/xe: Fix copyright and function naming in xe_ttm_vram_mgr - Correct copyright year from "2002" to "2022". - Rename ttm_vram_mgr_fini() to xe_ttm_vram_mgr_fini() to avoid confusion with generic TTM helpers. No functional changes intended. Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Nitin Gote <nitin.r.gote@intel.com> Link: https://lore.kernel.org/r/20251004000425.2489291-2-shuicheng.lin@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-10-06 12:46:28 -04:00
Piotr Piórkowski	8462d16d1b	drm/xe: Combine userspace context check Both vm->xef and XE_LRC_CREATE_USER_CTX indicate in xe_lrc_init that the context originates from userspace. However, XE_LRC_CREATE_USER_CTX has a broader scope as it may be set even when no vm->xef is present. The XE_BO_FLAG_PINNED_LATE_RESTORE flag can be extended to both cases, so there is no point in handling the two cases separately. Let's combine vm->xef and XE_LRC_CREATE_USER_CTX checks to detect userspace context. Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Suggested-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20251003162619.1984236-6-piotr.piorkowski@intel.com	2025-10-06 08:33:52 +02:00
Piotr Piórkowski	b48140f446	drm/xe/pf: Force use user VRAM for LMEM provisioning The LMEM assigned to VFs should be allocated from the general-purpose VRAM pool, not from the kernel-reserved region. Let's force the use of general-purpose VRAM for BOs intended for VFs. Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20251003162619.1984236-5-piotr.piorkowski@intel.com	2025-10-06 08:33:51 +02:00
Piotr Piórkowski	3f6cd669d5	drm/xe: Force user context allocations in user VRAM In general, kernel structures should be allocated in the kernel-dedicated VRAM region. However, userspace context data - while used by the kernel - does not need to reside there. Let's force the allocation of such data in the general-purpose VRAM region accessible to userspace. Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20251003162619.1984236-4-piotr.piorkowski@intel.com	2025-10-06 08:33:49 +02:00
Piotr Piórkowski	9d290ab0b5	drm/xe: Introduce new BO flag XE_BO_FLAG_FORCE_USER_VRAM When using a separate VRAM region for kernel allocations, some kernel structures, such as context userspace data, should not reside in the VRAM region dedicated to the kernel. The VRAM kernel region is intended only for allocations necessary for driver operation. Allocations created via ioctl are long-lived and not easily evictable. If this region runs out of space, there may not be a fallback, which could cause failures. To prevent this, add a new BO flag that explicitly forces the BO to be allocated in the general-purpose VRAM region accessible to userspace, avoiding the kernel-only VRAM region. v2: - update commit message (Matthew) Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20251003162619.1984236-3-piotr.piorkowski@intel.com	2025-10-06 08:33:48 +02:00
Piotr Piórkowski	db7dde9904	drm/xe: Add initial support for separate kernel VRAM region on the tile So far, kernel and userspace allocations have shared the same VRAM region. However, in some scenarios, it may be necessary to reserve a separate VRAM area exclusively for kernel allocations. Let's add preliminary support for such a configuration. v2: - replaced for_each_bo_flag_vram with the improved for_each_set_bo_vram_flag helper (Matthew) - moved the VRAM flag iteration macro definition into xe_bo.c (Matthew) - drop unused bo_flags from bo_vram_flags_to_vram_placement (Matthew) - use hweight32 helper in __xe_bo_fixed_placement for readability (Matthew) v3: remove unnecessary VRAM fixup id Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20251003162619.1984236-2-piotr.piorkowski@intel.com	2025-10-06 08:33:46 +02:00
Matthew Brost	bdc2fb17ae	Revert "drm/xe/vf: Fixup CTB send buffer messages after migration" This reverts commit `cef88d1265`. Due to change in the VF migration recovery design this code is not needed any more. v3: - Add commit message (Michal / Lucas) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20251002233824.203417-4-michal.wajdeczko@intel.com	2025-10-03 20:36:26 -07:00
Matthew Brost	6c640592e8	Revert "drm/xe/vf: Post migration, repopulate ring area for pending request" This reverts commit `a0dda25d24`. Due to change in the VF migration recovery design this code is not needed any more. v3: - Add commit message (Michal / Lucas) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20251002233824.203417-3-michal.wajdeczko@intel.com	2025-10-03 20:36:24 -07:00
Matthew Brost	08c98f3f2b	Revert "drm/xe/vf: Rebase exec queue parallel commands during migration recovery" This reverts commit `ba180a3621`. Due to change in the VF migration recovery design this code is not needed any more. v3: - Add commit message (Michal / Lucas) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20251002233824.203417-2-michal.wajdeczko@intel.com	2025-10-03 20:36:23 -07:00
Michal Wajdeczko	2a8fcf7cc9	drm/xe/pf: Synchronize VF FLR between all GTs The PF part of the VF FLR processing shall be done after all GuCs confirm that they have finished their part of the VF FLR processing, otherwise the PF may start clearing the VF's GGTT while another GuC may still be accessing it. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://lore.kernel.org/r/20250930233525.201263-7-michal.wajdeczko@intel.com	2025-10-02 23:58:35 +02:00
Michal Wajdeczko	03dc00c782	drm/xe/pf: Split VF FLR processing function On multi-GT platforms (like PTL) we may want to run VF FLR on each GuC (render and media) in parallel. Split our FLR function to allow to wait for GT VF FLR completion separately. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://lore.kernel.org/r/20250930233525.201263-6-michal.wajdeczko@intel.com	2025-10-02 23:58:33 +02:00
Michal Wajdeczko	1f018c8496	drm/xe/pf: Unify VF state tracking log By using single function that dumps VF state transition, final logs are easier to analyze as there is always the same call site in every debug message. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://lore.kernel.org/r/20250930233525.201263-5-michal.wajdeczko@intel.com	2025-10-02 23:58:32 +02:00
Michal Wajdeczko	5b7451fdd7	drm/xe/pf: Expose VF control operations over debugfs To allow the user to control the activity of individual VFs, expose basic VF control operations (pause, resume, stop, reset) over the debugfs as write-only files: /sys/kernel/debug/dri/BDF/sriov/ ├── vf1 │ ├── pause │ ├── reset │ ├── resume │ ├── stop │ : ├── vf2 : : Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://lore.kernel.org/r/20250930233525.201263-4-michal.wajdeczko@intel.com	2025-10-02 23:58:31 +02:00
Michal Wajdeczko	ac43294e8e	drm/xe/pf: Log only top level VF state changes The user likely only cares about top level VF state changes; any VF state logs on a per-GT basis can be demoted to the debug level. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://lore.kernel.org/r/20250930233525.201263-3-michal.wajdeczko@intel.com	2025-10-02 23:58:30 +02:00
Michal Wajdeczko	c97cdf7686	drm/xe/pf: Add top level functions to control VFs We already have control functions that we use to control the VF state on the per-GT basis, but that is low level detail from the user point of view, who rather expects VF-level functions. For now add simple functions that just iterate over all GTs and call per-GT control function. We will soon allow to use some of them from the user facing interfaces like debugfs. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://lore.kernel.org/r/20250930233525.201263-2-michal.wajdeczko@intel.com	2025-10-02 23:58:28 +02:00
Michal Wajdeczko	846a81abbe	drm/xe: Detect GT workqueue allocation failure The allocation of the per-GT workqueue may fail and we shouldn't ignore that. While around use drm managed allocation function to drop our custom fini action. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20251001144051.202040-1-michal.wajdeczko@intel.com	2025-10-02 18:48:10 +02:00
Niranjana Vishwanathapura	b56bc81078	drm/xe/doc: Add documentation for Execution Queues Add documentation for Xe Execution Queues and add xe_exec_queue.rst file. v2: Add info about how Execution queue interfaces with other components in the driver (Matt Brost) Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20251002044319.450181-2-niranjana.vishwanathapura@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-02 08:43:07 -07:00
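
Commit 71f1939e0d in the list above fixes a parser call that passed `sizeof(uval)` as the number base. The sketch below is a rough userspace illustration of why such a slip can go unnoticed, not the driver's code. `parse_uint`, `parse_bool` and `WRONG_BASE` are hypothetical names, the value 4 assumes a 32-bit variable, and Python's `int(text, base)` stands in for the kernel's string-to-integer helpers. The set of spellings accepted by `parse_bool` is likewise an assumption.

```python
# Assumption: sizeof() of a 32-bit variable, mistakenly passed as the number base.
WRONG_BASE = 4


def parse_uint(text: str, base: int) -> int:
    """Parse a non-negative integer in the given base (stand-in for a kernel helper)."""
    value = int(text.strip(), base)
    if value < 0:
        raise ValueError("negative value")
    return value


def parse_bool(text: str) -> bool:
    """Hypothetical dedicated bool parser accepting a few common spellings."""
    token = text.strip().lower()
    if token in ("1", "y", "yes", "on", "true"):
        return True
    if token in ("0", "n", "no", "off", "false"):
        return False
    raise ValueError(f"not a boolean: {text!r}")
```

With the wrong base, "0" and "1" still parse as expected, which is why a boolean debugfs entry can appear to work. Other input is silently reinterpreted ("10" is read in base 4) or rejected ("9" is not a base-4 digit), whereas a dedicated bool parser states the accepted spellings explicitly.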


118332 Commits