RPe is runtime-determined by PCODE and caching it caused stale values,
leading to incorrect GuC SLPC parameter settings.
Drop the cached rpe_freq field and query fresh values from hardware
on each use to ensure GuC SLPC parameters reflect current RPe.
v2: Remove cached RPe frequency field (Rodrigo)
v3: Remove extra variable (Vinay)
Modify function name (Vinay)
v4: Maintain a separate function for PVC (Rodrigo)
v5: Avoid RPn update while fetching RPe frequency (Rodrigo)
v6: Split platform-specific RPe comments (Vinay)
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5166
Signed-off-by: Sk Anirban <sk.anirban@intel.com>
Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://patch.msgid.link/20251112185153.3593145-5-sk.anirban@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Some driver components, like eudebug or ccs-mode, can't be used
when VFs are enabled. Add functions to allow those components
to block the PF from enabling VFs for the requested duration.
Introduce trivial counter to allow lockdown or exclusive access
that can be used in the scenarios where we can't follow the strict
owner semantics as required by the rw_semaphore implementation.
Before enabling VFs, the PF will try to arm the "vfs_enabling"
guard for the exclusive access. This will fail if there are
some lockdown requests already initiated by the other components.
For testing purposes, add debugfs file which will call these new
functions from the file's open/close hooks.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Christoph Manszewski <christoph.manszewski@intel.com>
Reviewed-by: Christoph Manszewski <christoph.manszewski@intel.com>
Link: https://patch.msgid.link/20251109162451.4779-1-michal.wajdeczko@intel.com
The sparse array used for error decoding from is unnecessarily big. It
should be better handled by a switch statement that will also allow us
to more easily improve this code.
Add a CASE_ERR() macro to keep the table compact and use it instead of
the 256-entries array, which saves some space:
$ bloat-o-meter xe_pcode.o.old xe_pcode.o
add/remove: 0/1 grow/shrink: 2/0 up/down: 190/-4096 (-3906)
Function old new delta
__pcode_mailbox_rw 363 465 +102
__pcode_mailbox_rw.cold 58 146 +88
err_decode 4096 - -4096
Total: Before=7890, After=3984, chg -49.51%
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://patch.msgid.link/20251110-pcode-errmap-v2-1-cb18c8f54238@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
The TILE_ADDR_RANGE register is not available on all platforms going
forward as it was deprecated and is being replaced by equivalent
registers within SoC MMIO space. While that doesn't happen, the
SG_TILE_ADDR_RANGE (base 0x1083a0) is still valid for all platforms
supported by xe. Use that instead.
BSpec: 59353, 54991
Signed-off-by: Fei Yang <fei.yang@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patch.msgid.link/20251107-tile-addr-v1-1-a3014aadc2e7@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Instead of trying very hard to find the largest fair number of GuC
doorbell IDs that could be allocated for VFs on the current GT, pick
some smaller rounded down to power-of-two value that is more likely
to be provisioned in the same manner by the other PF instance:
num VFs | num doorbells
--------+--------------
63..32 | 4
31..16 | 8
15..8 | 16
7..4 | 32
3..2 | 64
1 | 128 (regular PF)
1 | 240 (admin only PF)
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patch.msgid.link/20251105183253.863-3-michal.wajdeczko@intel.com
Instead of trying very hard to find the largest fair number of GuC
context IDs that could be allocated for VFs on the current GT, pick
some smaller rounded down to power-of-two value that is more likely
to be provisioned in the same manner by the other PF instance:
num VFs | num contexts
--------+-------------
63..32 | 1024
31..16 | 2048
15..8 | 4096
7..4 | 8192
3..2 | 16384
1 | 32768 (regular PF)
1 | 64512 (admin only PF)
Add also helper function to determine if the PF is admin-only,
and for now use .probe_display flag for that.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patch.msgid.link/20251105183253.863-2-michal.wajdeczko@intel.com
It's currently not possible to safely monitor if there's throttling
happening and what are the reasons. The approach of reading the status
and then reading the reasons is not reliable as by the time sysadmin
reads the reason, the throttling could not be happening anymore.
Previous tentative to fix that[1] was breaking the ABI and potentially
sysadmin's scripts. This takes a different approach of adding and
documenting the additional attribute. It's still valuable, though
redundant, to provide the simpler 0/1 interface.
In order to avoid userspace knowledge on the bitmask meaning and to be
able to maintain the kernel side in sync with possible changes in
future, just walk the attribute group and check what are the masks that
match the value read.
[1] https://lore.kernel.org/intel-xe/20241025092238.167042-1-raag.jadav@intel.com/
Cc: Raag Jadav <raag.jadav@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://patch.msgid.link/20251104-gt-throttle-cri-v5-1-4948b060bbfd@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Clang is not happy with set but unused variable (this is visible
with `make LLVM=1` build:
drivers/gpu/drm/xe/xe_vm.c:1462:11: error: variable 'number_tiles' set
but not used [-Werror,-Wunused-but-set-variable]
The use of this variable was removed in the commit mentioned below as
"Fixes:" but only its declaration and update remain.
It seems like the variable is not used along with the assignment that
does not have side effects as far as I can see.
Remove those altogether.
Fixes: cb99e12ba8 ("drm/xe: Decouple bind queue last fence from TLB invalidations")
Signed-off-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patch.msgid.link/20251105011311.3177875-1-gwan-gyeong.mun@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Add xe_guc_pagefault layer (producer) which parses G2H fault messages
messages into struct xe_pagefault, forwards them to the page fault layer
(consumer) for servicing, and provides a vfunc to acknowledge faults to
the GuC upon completion. Replace the old (and incorrect) GT page fault
layer with this new layer throughout the driver.
As part of this change, the ACC handling code has been removed, as it is
dead code that is currently unused.
v2:
- Include engine instance (Stuart)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Tested-by: Francois Dugast <francois.dugast@intel.com>
Link: https://patch.msgid.link/20251031165416.2871503-7-matthew.brost@intel.com
Stub out the new page fault layer and add kernel documentation. This is
intended as a replacement for the GT page fault layer, enabling multiple
producers to hook into a shared page fault consumer interface.
v2:
- Fix kernel doc typo (checkpatch)
- Remove comment around GT (Stuart)
- Add explaination around reclaim (Francois)
- Add comment around u8 vs enum (Francois)
- Include engine instance (Stuart)
v3:
- Fix XE_PAGEFAULT_TYPE_ATOMIC_ACCESS_VIOLATION kernel doc (Stuart)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Tested-by: Francois Dugast <francois.dugast@intel.com>
Link: https://patch.msgid.link/20251031165416.2871503-2-matthew.brost@intel.com
Prevent input fences from being installed on zero batch execs or zero
binds, which were originally added to support queue idling in Mesa via
output fences. Although input fence support was introduced for interface
consistency, it leads to incorrect behavior due to chained composite
fences, which are disallowed.
Avoid the complexity of fixing this by removing support, as input fences
for these cases are not used in practice.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/20251031234050.3043507-6-matthew.brost@intel.com
Avoid waiting on unrelated TLB invalidations when servicing page fault
binds. Since the migrate queue is shared across processes, TLB
invalidations triggered by other processes may occur concurrently but
are not relevant to the current bind. Teach the bind pipeline to skip
waits on such invalidations to prevent unnecessary serialization.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/20251031234050.3043507-5-matthew.brost@intel.com
Separate the bind queue’s last fence to apply exclusively to the bind
job, avoiding unnecessary serialization on prior TLB invalidations.
Preserve correct user fence signaling by merging bind and TLB
invalidation fences later in the pipeline.
v3:
- Fix lockdep assert for migrate queues (CI)
- Use individual dma fence contexts for array out fences (Testing)
- Don't set last fence with arrays (Testing)
- Move TLB invalid last fence under migrate lock (Testing)
- Don't set queue last for migrate queues (Testing)
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6047
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/20251031234050.3043507-4-matthew.brost@intel.com
Add support for attaching the last fence to TLB invalidation job queues
to address serialization issues during bursts of unbind jobs. Ensure
that user fence signaling for a bind job reflects both the bind job
itself and the last fences of all related TLB invalidations. Maintain
submission order based solely on the state of the bind and TLB
invalidation queues.
Introduce support functions for last fence attachment to TLB
invalidation queues.
v3:
- Fix assert in xe_exec_queue_tlb_inval_last_fence_set (CI)
- Ensure migrate lock held for migrate queues (Testing)
v5:
- Style nits (Thomas)
- Rewrite commit message (Thomas)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/20251031234050.3043507-3-matthew.brost@intel.com
Prevent application hangs caused by out-of-order fence signaling when
user fences are attached. Use drm_syncobj (via dma-fence-chain) to
guarantee that each user fence signals in order, regardless of the
signaling order of the attached fences. Ensure user fence writebacks to
user space occur in the correct sequence.
v7:
- Skip drm_syncbj create of error (CI)
Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/20251031234050.3043507-2-matthew.brost@intel.com
gt_reset() doesn't make sense by itself: it can only be called as part
of the worker. Inline it there to avoid it being called from elsewhere
and clarify the gt_reset() vs do_gt_reset() paths. Note that the error
return from gt_reset() was just being ignored.
Also add a comment to the xe_pm_runtime_put() to make sure the
get()/put() pair is clear.
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251031222244.37735-2-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
It is expected that VFs activity will be monitored and in some
cases admin might want to silence specific VF without killing
the VM where it was attached.
Add write-only attribute to stop GuC scheduling at VFs level.
/sys/bus/pci/drivers/xe/BDF/
├── sriov_admin/
├── vf1/
│ └── stop [WO] bool
├── vf2/
│ └── stop [WO] bool
Writing "1" or "y" (or whatever is recognized by the strtobool()
function) to this file will trigger the change of the VF state
to STOP (GuC will stop servicing the VF). To go back to a READY
state (to allow GuC to service this VF again) the VF FLR must be
triggered (which can be done by writing 1 to device/reset file).
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patch.msgid.link/20251030222348.186658-17-michal.wajdeczko@intel.com
We have just added bulk change of the scheduling priority for all
VFs and PF, but that only allow to select LOW and NORMAL priority.
Add read-write attribute under PF to allow changing its priority
without impacting other VFs priority settings.
For completeness also add read-only attributes under VFs, to show
currently selected priority levels used by the VFs.
/sys/bus/pci/drivers/xe/BDF/
├── sriov_admin/
├── pf/
│ └── profile
│ └── sched_priority [RW] low, normal, high
├── vf1/
│ └── profile
│ └── sched_priority [RO] low, normal
Writing "high" to the PF read-write attribute will change PF
priority on all tiles/GTs to HIGH (schedule function in the next
time-slice after current one completes and it has work). Writing
"low" or "normal" to change priority to LOW/NORMAL is supported.
When read, those files will display the current and available
scheduling priorities. The currently active priority level will
be enclosed in square brackets, default output will be like:
$ grep . -h sriov_admin/{pf,vf1,vf2}/profile/sched_priority
[low] normal high
[low] normal
[low] normal
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patch.msgid.link/20251030222348.186658-14-michal.wajdeczko@intel.com
It is expected to be a common practice to configure the same level
of scheduling priority across all VFs and PF (at least as starting
point). Due to current GuC FW limitations it is also the only way
to change VFs priority.
Add write-only sysfs attribute that will apply required priority
level to all VFs and PF at once.
/sys/bus/pci/drivers/xe/BDF/
├── sriov_admin/
├── .bulk_profile
│ └── sched_priority [WO] low, normal
Writing "low" to this write-only attribute will change PF and
VFs scheduling priority on all tiles/GTs to LOW (function will
be scheduled only if it has work submitted). Similarly, writing
"normal" will change functions priority to NORMAL (functions will
be scheduled irrespective of whether there is a work or not).
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patch.msgid.link/20251030222348.186658-13-michal.wajdeczko@intel.com
We already have function to configure PF (or VF) scheduling priority
on a single GT, but we also need function that will cover all tiles
and GTs.
However, due to the current GuC FW limitation, we can't always rely
on per-GT function as it actually only works for the PF case. The
only way to change VFs scheduling priority is to use 'sched_if_idle'
policy KLV that will change priorities for all VFs (and the PF).
We will use these new functions in the upcoming patches.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patch.msgid.link/20251030222348.186658-12-michal.wajdeczko@intel.com