Introduce xe_vm_find_cpu_addr_mirror_vma_range(), which computes an
extended range around a given range by including adjacent VMAs that are
CPU-address-mirrored and have default memory attributes. This helper is
useful for determining mergeable range without performing the actual merge.
v2
- Add assert
- Move unmap check to this patch
v3
- Decrease offset to check by SZ_4K to avoid wrong vma return in fast
lookup path
v4
- *start should be >= SZ_4K (Matt)
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251125075628.1182481-2-himal.prasad.ghimiray@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
The LRC software ring tail is reset to the first unsignaled pending
job's head.
Fix the re-emission logic to begin submitting from the first unsignaled
job detected, rather than scanning all pending jobs, which can cause
imbalance.
v2:
- Include missing local changes
v3:
- s/skip_replay/restore_replay (Tomasz)
Fixes: c25c1010df ("drm/xe/vf: Replay GuC submission state on pause / unpause")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Tomasz Lis <tomasz.lis@intel.com>
Link: https://patch.msgid.link/20251121152750.240557-1-matthew.brost@intel.com
Add support for triggering and handling MERT TLB invalidation. After
LMTT updates, the MERT TLB invalidation is initiated to ensure memory
translations remain coherent.
Completion of the invalidation is signaled via MERT interrupt (bit 13 in
the GFX master interrupt register). Detect and handle this interrupt to
properly synchronize the invalidation flow.
Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251124190237.20503-4-lukasz.laguna@intel.com
xe_guc_ct_init_noalloc() allocates the CT workqueue and other helpers
before it tries to initialize ct->lock. If drmm_mutex_init() fails
we currently bail out without releasing those resources because the
guc_ct_fini() hasn’t been registered yet.
Since destroy_workqueue() in guc_ct_fini() may flush the workqueue, which
in turn can take the ct lock, the initialization sequence is restructured
to first initialize the ct->lock, then set up all CT state, and finally
register guc_ct_fini().
v2: guc_ct_fini() does take ct lock. (Matt)
v3: move primelockdep() together with drmm_mutex_init(). (Lucas)
Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patch.msgid.link/20251110184522.1581001-2-shuicheng.lin@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Adjust the signature of force_wake_get_any_engine() such that it returns
a 'struct xe_force_wake_ref' rather than a boolean success/failure.
Failure cases are now recognized by inspecting the hardware engine
returned by reference; a NULL hwe indicates that no engine's forcewake
could be obtained.
These changes will make it cleaner and easier to incorporate scope-based
cleanup in force_wake_get_any_engine()'s caller in a future patch.
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-43-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Add a scope-based helpers for runtime PM that may be used to simplify
cleanup logic and potentially avoid goto-based cleanup.
For example, using
guard(xe_pm_runtime)(xe);
will get runtime PM and cause a corresponding put to occur automatically
when the current scope is exited. 'xe_pm_runtime_noresume' can be used
as a guard replacement for the corresponding 'noresume' variant.
There's also an xe_pm_runtime_ioctl conditional guard that can be used
as a replacement for xe_runtime_ioctl():
ACQUIRE(xe_pm_runtime_ioctl, pm)(xe);
if ((ret = ACQUIRE_ERR(xe_pm_runtime_ioctl, &pm)) < 0)
/* failed */
In a few rare cases (such as gt_reset_worker()) we need to ensure that
runtime PM is dropped when the function is exited by any means
(including error paths), but the function does not need to acquire
runtime PM because that has already been done earlier by a different
function. For these special cases, an 'xe_pm_runtime_release_only'
guard can be used to handle the release without doing an acquisition.
These guards will be used in future patches to eliminate some of our
goto-based cleanup.
v2:
- Specify success condition for xe_pm runtime_ioctl as _RET >= 0 so
that positive values will be properly identified as success and
trigger destructor cleanup properly.
v3:
- Add comments to the kerneldoc for the existing 'get' functions
indicating that scope-based handling should be preferred where
possible. (Gustavo)
Cc: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-31-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Since forcewake uses a reference counting get/put model, there are many
places where we need to be careful to drop the forcewake reference when
bailing out of a function early on an error path. Add scope-based
cleanup options that can be used in place of explicit get/put to help
prevent mistakes in this area.
Examples:
CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FW_GT);
Obtain forcewake on the XE_FW_GT domain and hold it until the
end of the current block. The wakeref will be dropped
automatically when the current scope is exited by any means
(return, break, reaching the end of the block, etc.).
xe_with_force_wake(fw_ref, gt_to_fw(ss->gt), XE_FORCEWAKE_ALL) {
...
}
Hold all forcewake domains for the following block. As with the
CLASS usage, forcewake will be dropped automatically when the
block is exited by any means.
Use of these cleanup helpers should allow us to remove some ugly
goto-based error handling and help avoid mistakes in functions with lots
of early error exits.
An 'xe_force_wake_release_only' class is also added for cases where a
forcewake reference is passed in from another function and the current
function is responsible for releasing it in every flow and error path.
v2:
- Create a separate constructor that just wraps xe_force_wake_get for
use in the class. This eliminates the need to update the signature
of xe_force_wake_get(). (Michal)
v3:
- Wrap xe_with_force_wake's 'done' marker in __UNIQUE_ID. (Gustavo)
- Add a note to xe_force_wake_get()'s kerneldoc explaining that
scope-based cleanup is preferred when possible. (Gustavo)
- Add an xe_force_wake_release_only class. (Gustavo)
v4:
- Add NULL check on fw in release_only variant. (Gustavo)
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-30-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
ops_execute() calculates the size of a fence array based on
XE_MAX_GT_PER_TILE, while the code that actually fills in the fence
array uses a for_each_tlb_inval() iterator. This works out okay today
since both approaches come up with the same number of invalidation
fences (2: primary GT invalidation + media GT invalidation), but could
be problematic in the future if there isn't a 1:1 relationship between
TLBs needing invalidation and potential GTs on the tile.
Adjust the allocation code to use the same for_each_tlb_inval()
counting logic as the code that fills the array to future-proof the
code.
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251118202604.3715782-2-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>