We already have device and GT level SR-IOV specific macros, but
unlike native case, we don't have yet tile-based ones.
Add macros to match native use case and also update GT-based
macros to rely on those new tile-based SR-IOV macros. This will
slightly rearrange the output of the GT logs and instead:
[...] Tile0: GT0: PF: pushed VF1 config with 2 KLVs...
we might see:
[...] PF: Tile0: GT0: pushed VF1 config with 2 KLVs...
but that's even better.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20251005133641.2651-3-michal.wajdeczko@intel.com
While the late PF per-GT initialization is done quite late in the
single GT initialization flow, in case of multi-GT platforms, it
may still be done before other GT early initialization. That leads
to some issues during unwind, when there are cross-GT dependencies,
like resource cleanup that is shared by both GTs, but the other GT
may already be sanitized or disabled.
The following errors could be observed when trying to unload the PF
driver with some LMEM/VRAM already provisioned for few VFs:
[ ] xe 0000:03:00.0: DEVRES REL ffff88814708f240 fini_config (16 bytes)
[ ] xe 0000:03:00.0: [drm:lmtt_write_pte [xe]] PF: LMTT: WRITE level=2 index=1 pte=0x0
[ ] xe 0000:03:00.0: [drm:lmtt_invalidate_hw [xe]] PF: LMTT: num_fences=2 err=-19
[ ] xe 0000:03:00.0: [drm:lmtt_pt_free [xe]] PF: LMTT: level=0 addr=53a470000
[ ] xe 0000:03:00.0: [drm:lmtt_pt_free [xe]] PF: LMTT: level=1 addr=53a4b0000
[ ] xe 0000:03:00.0: [drm:lmtt_invalidate_hw [xe]] PF: LMTT: num_fences=2 err=-19
[ ] xe 0000:03:00.0: [drm] PF: LMTT0 invalidation failed (-ENODEV)
[ ] xe 0000:03:00.0: [drm:lmtt_write_pte [xe]] PF: LMTT: WRITE level=2 index=2 pte=0x0
[ ] xe 0000:03:00.0: [drm:lmtt_invalidate_hw [xe]] PF: LMTT: num_fences=2 err=-19
[ ] xe 0000:03:00.0: [drm:lmtt_pt_free [xe]] PF: LMTT: level=0 addr=539b70000
[ ] xe 0000:03:00.0: [drm:lmtt_pt_free [xe]] PF: LMTT: level=1 addr=539bf0000
[ ] xe 0000:03:00.0: [drm:lmtt_invalidate_hw [xe]] PF: LMTT: num_fences=2 err=-19
[ ] xe 0000:03:00.0: [drm] PF: LMTT0 invalidation failed (-ENODEV)
Move all PF per-GT late initialization to the already defined late
SR-IOV initialization function to allow proper order of the cleanup
actions.
While around, format all PF function stubs as one-liners, like many
other stubs are defined in the Xe driver.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20251004162008.1782-1-michal.wajdeczko@intel.com
Both vm->xef and XE_LRC_CREATE_USER_CTX indicate in xe_lrc_init that
the context originates from userspace. However, XE_LRC_CREATE_USER_CTX
has a broader scope as it may be set even when no vm->xef is present.
The XE_BO_FLAG_PINNED_LATE_RESTORE flag can be extended to both cases,
so there is no point in handling the two cases separately.
Let's combine vm->xef and XE_LRC_CREATE_USER_CTX checks to detect
userspace context.
Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Suggested-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://lore.kernel.org/r/20251003162619.1984236-6-piotr.piorkowski@intel.com
When using a separate VRAM region for kernel allocations,
some kernel structures, such as context userspace data,
should not reside in the VRAM region dedicated to the kernel.
The VRAM kernel region is intended only for allocations necessary
for driver operation. Allocations created via ioctl are long-lived
and not easily evictable. If this region runs out of space,
there may not be a fallback, which could cause failures.
To prevent this, add a new BO flag that explicitly forces the BO to be
allocated in the general-purpose VRAM region accessible to userspace,
avoiding the kernel-only VRAM region.
v2:
- update commit message (Matthew)
Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://lore.kernel.org/r/20251003162619.1984236-3-piotr.piorkowski@intel.com
So far, kernel and userspace allocations have shared the same VRAM region.
However, in some scenarios, it may be necessary to reserve a separate
VRAM area exclusively for kernel allocations.
Let's add preliminary support for such a configuration.
v2:
- replaced for_each_bo_flag_vram with the improved
for_each_set_bo_vram_flag helper (Matthew)
- moved the VRAM flag iteration macro definition into xe_bo.c (Matthew)
- drop unused bo_flgas from bo_vram_flags_to_vram_placement (Matthew)
- use hweight32 helper in __xe_bo_fixed_placement for readability
(Matthew)
v3: remove unnecessary VRAM fixup id
Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://lore.kernel.org/r/20251003162619.1984236-2-piotr.piorkowski@intel.com
We already have control functions that we use to control the VF
state on the per-GT basis, but that is low level detail from the
user point of view, who rather expects VF-level functions.
For now add simple functions that just iterate over all GTs and
call per-GT control function. We will soon allow to use some of
them from the user facing interfaces like debugfs.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://lore.kernel.org/r/20250930233525.201263-2-michal.wajdeczko@intel.com
Initialize the uval variable to 0 in xe_late_bind_fw_num_fans() to fix
a potential use of uninitialized variable warning and ensure predictable
behavior.
The variable is passed by reference to xe_pcode_read() which should
populate it on success, but initializing it to 0 provides a safe
default value and follows kernel coding best practices.
v2:
- uval = 0 which serves as both a safe default and the fallback
value when the pcode read operation fails.
v3:
- Handle MMIO failure (Rodrigo)
- The function should probably return the error and make the uval as
pointer-argument, like the pcode_read.
- Change the caller of this function to propagate the error
upwards if mmio failed.
Fixes: 45832bf9c1 ("drm/xe/xe_late_bind_fw: Initialize late binding firmware")
Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
Link: https://lore.kernel.org/r/20251002005648.3185636-1-mallesh.koujalagi@intel.com
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
When userptr is used on SVM-enabled VMs, a non-NULL
hmm_range::dev_private_owner value might mean that
hmm_range_fault() attempts to return device private pages.
Either that will fail, or the userptr code will not know
how to handle those.
Use NULL for hmm_range::dev_private_owner to migrate
such pages to system. In order to do that, move the
struct drm_gpusvm::device_private_page_owner field to
struct drm_gpusvm_ctx::device_private_page_owner so that
it doesn't remain immutable over the drm_gpusvm lifetime.
v2:
- Don't conditionally compile xe_svm_devm_owner().
- Kerneldoc xe_svm_devm_owner().
Fixes: 9e97874148 ("drm/xe/userptr: replace xe_hmm with gpusvm")
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://lore.kernel.org/r/20250930122752.96034-1-thomas.hellstrom@linux.intel.com
Due to initial design of the Xe debugfs, the GGTT and LMEM files
were defined on the primary GT, instead of being per-tile.
While PF provisioning code is now still maintaining GGTT and LMEM
also on the per primary-GT level, this will be refactored soon,
but we can fix debugfs layout now, as part of the new SR-IOV tree.
For backward compatibility we will provide some symlinks that can
be removed once our tools will be fully converted.
As we are making all those changes in the user facing interface,
take this as apportunity to also start replacing the "LMEM" term,
used by the SR-IOV code, with the "VRAM" term, used by Xe driver.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250928140029.198847-7-michal.wajdeczko@intel.com
Currently we expose debugfs files related to SR-IOV functions
together with other native files, but that approach will not
scale well as we plan to add more attributes and also expose
some of them on the per-tile basis.
Start building separate tree for SR-IOV specific debugfs files
where we can replicate similar files per every SR-IOV function:
/sys/kernel/debug/dri/BDF/
├── sriov
│ ├── pf
│ │ ├── tile0
│ │ │ ├── gt0
│ │ │ ├── gt1
│ │ │ :
│ │ ├── tile1
│ │ :
│ ├── vf1
│ │ ├── tile0
│ │ │ ├── gt0
│ │ │ ├── gt1
│ │ │ :
│ │ :
│ ├── vf2
│ ├── ...
We will populate this new tree in upcoming patches.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250928140029.198847-3-michal.wajdeczko@intel.com
In xe_hw_engine_group_get_mode(), a write lock is acquired before
calling switch_mode(), which in turn invokes
xe_hw_engine_group_suspend_faulting_lr_jobs().
On failure inside xe_hw_engine_group_suspend_faulting_lr_jobs(),
the write lock is released there, and then again in
xe_hw_engine_group_get_mode(), leading to a double release.
Fix this by keeping both acquire and release operation in
xe_hw_engine_group_get_mode().
Fixes: 770bd1d341 ("drm/xe/hw_engine_group: Ensure safe transition between execution modes")
Cc: Francois Dugast <francois.dugast@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Link: https://lore.kernel.org/r/20250925023145.1203004-2-shuicheng.lin@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Currently there are 2 wait loops for loading GuC: one in
xe_mmio_wait32_not() and one guc_wait_ucode(). Now that there's a
generic poll_timeout_us(), refactor the code to use that to be more
readable.
Main change in behavior is that there's no exponential wait anymore:
that is now replaced by a 10msec retry.
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://lore.kernel.org/r/20250922-xe-iopoll-v4-5-06438311a63f@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Convert wait_for_pc_state() and wait_for_act_freq_limit() to
poll_timeout_us(). This brings 2 changes in behavior: Drop the
exponential wait and fix a potential much longer sleep.
usleep_range() will wait anywhere between `wait` and `wait << 1`, so
it's not correct to assume `slept += wait`. This code is not really
accurate. Pairing this with the exponential wait increase, it could be
waiting much longer than intended.
Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://lore.kernel.org/r/20250922-xe-iopoll-v4-2-06438311a63f@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>