xe_gsc_proxy_remove undoes what is done in both xe_gsc_proxy_init and
xe_gsc_proxy_start; however, if we fail between those 2 calls, it is
possible that the HW forcewake access hasn't been initialized yet and so
we hit errors when the cleanup code tries to write GSC register. To
avoid that, split the cleanup in 2 functions so that the HW cleanup is
only called if the HW setup was completed successfully.
Since the HW cleanup (interrupt disabling) is now removed from
xe_gsc_proxy_remove, the cleanup on error paths in xe_gsc_proxy_start
must be updated to disable interrupts before returning.
Fixes: ff6cd29b69 ("drm/xe: Cleanup unwind of gt initialization")
Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patch.msgid.link/20260220225308.101469-1-zhanjun.dong@intel.com
If the xe module within a VM was creating a new LRC during save/
restore, this LRC will be invalid. The fixups procedure may not
be able to reach it, as there will be a race to add the new LRC
reference to an exec queue.
Even if the new LRC which was being created during VM migration is
added to EQ in time for fixups, said LRC may still remain damaged.
In a small percentage of specially crafted test cases, the resulting
LRC was still damaged and caused GPU hang.
Any LRC which could be created in such a situation, have to be
re-created.
Due to VM having arbitrarily set amount of CPU cores, it is possible
to limit the amount to 1. In such case, there is a possibility that
kernel will switch CPU contexts in a way which allows to miss
VF migration recovery running in parallel (by simply not switching
to the LRC creation thread during recovery). Therefore checking
if the migration is in progress just after LRC creation, is not
enough to ensure detection.
Free the incorrectly created LRC, and trigger a re-run of the
creation, but only after waiting for default LRC to get fixups.
Use additional atomic value increased after fixups, to ensure any VF
migration that avoided detection by just checking for recovery in
progress, will be caught.
v2: Merge marker and wait for default LRC, reducing amount of calls
within xe_init_eq(). Alter the LRC creation loop to remove a race
with post-migration fixups worker.
v3: Kerneldoc fixes. Rename fixups_complete_count.
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20260226212701.2937065-5-tomasz.lis@intel.com
When a context is being created during save/restore, the LRC creation
needs to wait for GGTT address space to be shifted. But it also needs
to have fixed default LRCs. This is mandatory to avoid the situation
where LRC will be created based on data from before the fixups, but
reference within exec queue will be set too late for fixups.
This fixes an issue where contexts created during save/restore have
a large chance of having one unfixed LRC, due to the xe_lrc_create()
being synced for equal start to race with default LRC fixups.
v2: Move the fixups confirmation further, behind all fixups.
Revert some renames.
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20260226212701.2937065-4-tomasz.lis@intel.com
There is a small but non-zero chance that VF post migration fixups
are running on an exec queue during teardown. The chances are
decreased by starting the teardown by releasing guc_id, but remain
non-zero. On the other hand the sync between fixups and EQ creation
(wait_valid_ggtt) drastically increases the chance for such parallel
teardown if queue creation error path is entered (err_lrc label).
The exec queue itself is not going to cause an issue, but LRCs have
a small chance of getting freed during the fixups.
Creating a setter and a getter makes it easier to protect the fixup
operations with a lock. For other driver activities, the original
access method (without any protection) can still be used.
v2: Separate lock, only for LRCs. Kerneldoc fixes. Subject tag fix.
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20260226212701.2937065-3-tomasz.lis@intel.com
Every call to queue init should have a corresponding fini call.
Skipping this would mean skipping removal of the queue from GuC list
(which is part of guc_id allocation). A damaged queue stored in
exec_queue_lookup list would lead to invalid memory reference,
sooner or later.
Call fini to free guc_id. This must be done before any internal
LRCs are freed.
Since the finalization with this extra call became very similar to
__xe_exec_queue_fini(), reuse that. To make this reuse possible,
alter xe_lrc_put() so it can survive NULL parameters, like other
similar functions.
v2: Reuse _xe_exec_queue_fini(). Make xe_lrc_put() aware of NULLs.
Fixes: 3c1fa4aa60 ("drm/xe: Move queue init before LRC creation")
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com> (v1)
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20260226212701.2937065-2-tomasz.lis@intel.com
Tighten uapi validation to restrict multi-lrc support to VIDEO_DECODE and
VIDEO_ENHANCE engines only. This check should have been in place from the
start, as the driver typically avoids allowing uapi cases that we have
no userspace consumer for.
Additionally, the GuC firmware on ModSched platforms no longer supports
multi-lrc on non-media engines.
V4:
- use a unified mask for all platforms since engine instance count
is an independent runtime check (Matt Roper, Matthew Brost)
V3:
- store a multi-lrc enable class mask in xe->info and populate from
xe_device_desc in xe_pci.c (Matthew Brost)
V2:
- correct the typo (Shuicheng)
- move the check earlier to avoid VM lookup (Shuicheng, Matt Roper)
- remove the graphics version check (Matt Roper)
- input more details in the commit info (Matt Roper)
Cc: Shuicheng Lin <shuicheng.lin@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Xin Wang <x.wang@intel.com>
Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patch.msgid.link/20260225022014.45394-1-x.wang@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
The LRC seqno is read by the CPU in the fence signaling path. On dGPU
that read can turn into a PCIe transaction when the seqno lives in the
main LRC BO, making the hot-path poll/peek much more expensive.
Allocate a small dedicated seqno BO in system memory and map the seqno
and start_seqno fields from there instead. The GPU still updates the
values, but CPU reads stay in cached system memory and avoid PCIe read
latency.
Update the LRC map/address helpers to accept a BO expression and use the
new lrc->seqno_bo for seqno mappings. Unpin/unmap seqno_bo during
teardown.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/20260218043319.809548-4-matthew.brost@intel.com
desc_read() issues an VRAM read which serializes the CPU and drains
posted writes on dGPU platforms. The H2G tracepoint evaluated its
arguments unconditionally, so even with tracing disabled the submission
path paid the full VRAM readf latency. Guard the tracepoint with
trace_xe_guc_ctb_h2g_enabled().
Adso move the descriptor status verification under CONFIG_DRM_XE_DEBUG.
This removes another unnecessary VRAM read in non-debug builfds.
This results in ~10× faster H2G submission and significantly reduces
lock contention across the driver.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/20260218043319.809548-3-matthew.brost@intel.com
H2G and G2H buffers have different access patterns (H2G is CPU-write,
GuC-read, while G2H is GPU-write, CPU-read). On dGPU, these patterns
benefit from different memory placements: H2G in VRAM and G2H in system
memory. Split the CT buffer into two separate buffers—one for H2G and
one for G2H—and select the optimal placement for each.
This provides a significant performance improvement on the G2H read
path, reducing a single read from ~20 µs to under 1 µs on BMG.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/20260218043319.809548-2-matthew.brost@intel.com
If a batch buffer is complete, it makes little sense to preempt the
fence signaling instructions in the ring, as the largest portion of the
work (the batch buffer) is already done and fence signaling consists of
only a few instructions. If these instructions are preempted, the GuC
would need to perform a context switch just to signal the fence, which
is costly and delays fence signaling. Avoid this scenario by disabling
preemption immediately after the BB start instruction and re-enabling it
after executing the fence signaling instructions.
Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Carlos Santa <carlos.santa@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patch.msgid.link/20260115004546.58060-1-matthew.brost@intel.com
Fix three code-level cleanups in xe_guc_ct.c:
- Use SZ_4K for the queue size alignment assertion in
xe_guc_ct_queue_proc_time_jiffies().
- Drop an unused local variable in guc_ct_send_wait_for_retry().
- Add missing trailing newlines in CT error/warn log messages.
These changes keep behavior unchanged while improving correctness checks
and log formatting.
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260223162350.3205364-6-shuicheng.lin@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Wa_15010599737 was a workaround originally proposed (and ultimately
rejected) for DG2-G10. There's no record of it ever being relevant or
even considered for any other platforms.
The specific bit this workaround was setting is documented as "This bit
should be set to 1 for the DX9 API and 0 for all other APIs" which means
that it should almost always be left at the default value of 0 on Linux.
The register itself is directly accessible from userspace, so in the
special cases where it might be relevant (e.g., Wine/Proton running
Windows DX9 apps), the userspace drivers already have the ability to
change the setting without involvement of the kernel.
Fixes: 7f3ee7d880 ("drm/xe/xe2hpg: Add initial GT workarounds")
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://patch.msgid.link/20260223-forupstream-wa_cleanup-v3-2-7f201eb2f172@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Some compute applications may try to allocate device memory to probe
how much device memory is actually available, assuming that the
application will be the only one running on the particular GPU.
That strategy fails in fault mode since it allows VM overcommit.
While this could be resolved in user-space it's further complicated
by cgroups potentially restricting the amount of memory available
to the application.
Introduce a vm create flag, DRM_XE_VM_CREATE_NO_VM_OVERCOMMIT, that
allows fault mode to mimic the behaviour of !fault mode WRT this. It
blocks evicting same vm bos during VM_BIND processing. However,
it does *not* block evicting same-vm bos during pagefault
processing, preferring eviction rather than VM banning in
OOM situations.
Cc: John Falkowski <john.falkowski@intel.com>
Cc: Michal Mrozek <michal.mrozek@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260204153320.17989-1-thomas.hellstrom@linux.intel.com
Given the new policy of allowing graphics/media IP ranges to extend over
unused IP versions, we can consolidate some of the OOB workaround rules
and simplify the table. If new IP variants eventually show up that use
these unused versions (e.g., media version 30.01, graphics versions
20.03 / 30.02, etc.), and if an existing workaround does not extend to
that new intermediate version, the ranges will be split back apart as
part of the enablement work for that new IP version.
Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com>
Link: https://patch.msgid.link/20260220-forupstream-wa_cleanup-v2-22-b12005a05af6@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
During early Xe driver development, our policy for applying workarounds
to ranges of IP versions was to only use GRAPHICS_VERSION_RANGE and
MEDIA_VERSION_RANGE rules when all of the affected IP versions had
consecutive version numbers; otherwise separate RTP entries should be
used. For example, a workaround that applies to all Xe2-based platforms
would be implemented in the driver with two RTP entries: one using
GRAPHICS_VERSION_RANGE(2001, 2002) and the other using
GRAPHICS_VERSION(2004). This ensured that if a new IP variant showed up
in the future with currently unused version 20.03, an old workaround
entry wouldn't automatically apply to it by accident (and we could
always consolidate those two distinct entries in the future if the
workaround database did explicitly indicate that 20.03 also needed the
workaround).
Now that we're a couple years down the road with this driver, the number
of IP versions supported is much larger (several Xe2 20.xx versions,
several Xe3 30.xx versions, and a couple Xe3p 35.xx versions). When new
workarounds are discovered that need to apply to a wide range of IPs,
it's becoming more of a pain to create independent entries for each
non-contiguous range of versions, and the general consensus is that we
should revisit our previous policy and start allowing use of
VERSION_RANGE constructs for non-contiguous version ranges.
Note that allowing ranges that cover currently unused versions will
require additional care if/when some of those intermediate version
numbers start being used in the future. We'll need to re-check every
workaround that has a range including the new IP version and check the
hardware database to see whether the workaround also applies to the new
version (no code change required) or whether we need to split the
existing range into two separate ranges that don't cover the new
version. The platform enabling engineers are willing to take on this
extra review burden at the time we first enable a new IP in the driver
(see lore link below for one recent discussion).
Update the kerneldoc for the workaround file to make the new policy
official.
Link: https://lore.kernel.org/all/20260203233600.GT458797@mdroper-desk1.amr.corp.intel.com/
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://patch.msgid.link/20260220-forupstream-wa_cleanup-v2-3-b12005a05af6@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Production PVC hardware had a graphics stepping of C0. Xe1 platforms
already aren't officially supported by the Xe driver, but pre-production
steppings are especially out of scope (and 'has_pre_prod_wa' is not set
in the device descriptor). Drop the workarounds that aren't relevant to
production hardware.
v2:
- Drop the stream->override_gucrc which is no longer set anywhere after
the removal of Wa_1509372804. (Bala)
- Drop xe_guc_rc_set_mode / xe_guc_rc_unset_mode which are no longer
used after the removal of Wa_1509372804.
Bspec: 44484
Cc: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://patch.msgid.link/20260220-forupstream-wa_cleanup-v2-2-b12005a05af6@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Wa_14015795083 and Wa_14014475959 only apply to early steppings of
Xe_LPG that appeared only in pre-production hardware (in fact
Wa_14014475959 wasn't supposed to apply to _any_ steppings of version
12.71). Xe1 platforms already aren't officially supported by the Xe
driver, but pre-production steppings are especially out of scope (and
'has_pre_prod_wa' is not set in the device descriptor). Drop both
workarounds.
Bspec: 55420
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://patch.msgid.link/20260220-forupstream-wa_cleanup-v2-1-b12005a05af6@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>