The GuC compatibility version that we read from the CSS header in
native/PF and the GuC VF version that we get when a VF handshakes with
the GuC are the same version number, so we should store it into the same
structure. This makes all the checks based on the compatibility version
automatically work for VFs without having to copy the value over.
For completion, also copy the wanted version and set the path to a known
string to indicate that the FW is PF-loaded. This way all the info will
be coherent when dumped from debugfs.
v2: several code cleanups and style changes (Michal), rebase on
bootstrap changes.
v3: s/min/wanted/, clarify that handshake must happen before we can get
the VF versions (Michal)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lukasz Laguna <lukasz.laguna@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://lore.kernel.org/r/20250603235432.720833-10-daniele.ceraolospurio@intel.com
Currently we perform the bootstrap for the primary GT early on during
device init, while the media GT bootstrap happens when we try and fetch
the hwconfig table. For consistency, move the bootstrap of the media GT
happen at the same time as the primary GT, so that all the subsequent
code can rely on both GTs being in the same state.
v2: Also drop config query from min_guc_load since we now do it
early (Michal)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://lore.kernel.org/r/20250603235432.720833-8-daniele.ceraolospurio@intel.com
Document VMA tile_invalidated access rules, use READ_ONCE / WRITE_ONCE
for opportunistic checks of tile_present and tile_invalidated, move
tile_invalidated state change from page fault handler to PT code under
the correct locks, and add lockdep asserts to TLB invalidation paths.
v2:
- Assert VM dma-resv lock rather than BO in zap PTEs
v3:
- Back to BO's dma-resv lock, adjust documentation
v4:
- Add WRITE_ONCE in xe_vm_invalidate_vma (Thomas)
- Change lockdep assert for userptr in xe_vm_invalidate_vma (CI)
- Take userptr notifier lock in read mode in xe_vm_userptr_pin before
calling xe_vm_invalidate_vma (CI)
v5:
- Fix typos (Thomas)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://lore.kernel.org/r/20250602164412.1912293-1-matthew.brost@intel.com
Add the userspace interface to load the driver with fewer engines.
The syntax is to just echo the engine names to a file in configfs, like
below:
echo 'rcs0,bcs0' > /sys/kernel/config/xe/<bdf>/engine_allowed
With that engines other than rcs0 and bcs0 will not be enabled. To
enable all instances from a class, a '*' can be used.
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20250528-engine-mask-v4-4-f4636d2a890a@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Sometimes it's useful to load the driver with a smaller set of engines
to allow more targeted debugging, particularly on early enabling.
Besides checking what is fused off in hardware, add similar logic to
disable engines in software. This will use configfs to allow users
to set what engine to disable, so already add prepare for that. The
exact configfs interface will be added later.
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20250528-engine-mask-v4-3-f4636d2a890a@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
There could be a scenario where the VF driver is resuming faster
than the driver PF is able to complete the VF FLR sequence which
includes reset of the VF scratch registers. This may result in
deletion of the ongoing HXG message (it could be either a host
request or a GuC response).
When we detect that HXG message was likey lost (scratch register
with HXG header was zeroed) try to send this request once more
before giving up.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Link: https://lore.kernel.org/r/20250528090021.329-1-michal.wajdeczko@intel.com
Starting PXP and adding a queue to the PXP queue list are separate
actions. Given that a queue can only be added to the list if PXP is
active, the 2 actions were bundled together to avoid having to
re-lock and re-check the status to perform the queue addition after
having done so during the PXP start. However, we don't save a lot of
complexity by doing so and we lose in clarity of code, so overall it's
cleaner to just keep the 2 actions separate.
v2: remove leftover rpm_get (John), fix rpm_put in error case
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://lore.kernel.org/r/20250522225401.3953243-8-daniele.ceraolospurio@intel.com
The expected flow of operations when using PXP is to query the PXP
status and wait for it to transition to "ready" before attempting to
create an exec_queue. This flow is followed by the Mesa driver, but
there is no guarantee that an incorrectly coded (or malicious) app
will not attempt to create the queue first without querying the status.
Therefore, we need to clarify what the expected behavior of the queue
creation ioctl is in this scenario.
Currently, the ioctl always fails with an -EBUSY code no matter the
error, but for consistency it is better to distinguish between "failed
to init" (-EIO) and "not ready" (-EBUSY), the same way the query ioctl
does. Note that, while this is a change in the return code of an ioctl,
the behavior of the ioctl in this particular corner case was not clearly
spec'd, so no one should have been relying on it (and we know that Mesa,
which is the only known userspace for this, didn't).
v2: Minor rework of the doc (Rodrigo)
Fixes: 72d479601d ("drm/xe/pxp/uapi: Add userspace and LRC support for PXP-using queues")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250522225401.3953243-7-daniele.ceraolospurio@intel.com
Customer is reporting a really subtle issue where we get random DMAR
faults, hangs and other nasties for kernel migration jobs when stressing
stuff like s2idle/s3/s4. The explosions seems to happen somewhere
after resuming the system with splats looking something like:
PM: suspend exit
rfkill: input handler disabled
xe 0000:00:02.0: [drm] GT0: Engine reset: engine_class=bcs, logical_mask: 0x2, guc_id=0
xe 0000:00:02.0: [drm] GT0: Timedout job: seqno=24496, lrc_seqno=24496, guc_id=0, flags=0x13 in no process [-1]
xe 0000:00:02.0: [drm] GT0: Kernel-submitted job timed out
The likely cause appears to be a race between suspend cancelling the
worker that processes the free_job()'s, such that we still have pending
jobs to be freed after the cancel. Following from this, on resume the
pending_list will now contain at least one already complete job, but it
looks like we call drm_sched_resubmit_jobs(), which will then call
run_job() on everything still on the pending_list. But if the job was
already complete, then all the resources tied to the job, like the bb
itself, any memory that is being accessed, the iommu mappings etc. might
be long gone since those are usually tied to the fence signalling.
This scenario can be seen in ftrace when running a slightly modified
xe_pm IGT (kernel was only modified to inject artificial latency into
free_job to make the race easier to hit):
xe_sched_job_run: dev=0000:00:02.0, fence=0xffff888276cc8540, seqno=0, lrc_seqno=0, gt=0, guc_id=0, batch_addr=0x000000146910 ...
xe_exec_queue_stop: dev=0000:00:02.0, 3:0x2, gt=0, width=1, guc_id=0, guc_state=0x0, flags=0x13
xe_exec_queue_stop: dev=0000:00:02.0, 3:0x2, gt=0, width=1, guc_id=1, guc_state=0x0, flags=0x4
xe_exec_queue_stop: dev=0000:00:02.0, 4:0x1, gt=1, width=1, guc_id=0, guc_state=0x0, flags=0x3
xe_exec_queue_stop: dev=0000:00:02.0, 1:0x1, gt=1, width=1, guc_id=1, guc_state=0x0, flags=0x3
xe_exec_queue_stop: dev=0000:00:02.0, 4:0x1, gt=1, width=1, guc_id=2, guc_state=0x0, flags=0x3
xe_exec_queue_resubmit: dev=0000:00:02.0, 3:0x2, gt=0, width=1, guc_id=0, guc_state=0x0, flags=0x13
xe_sched_job_run: dev=0000:00:02.0, fence=0xffff888276cc8540, seqno=0, lrc_seqno=0, gt=0, guc_id=0, batch_addr=0x000000146910 ...
.....
xe_exec_queue_memory_cat_error: dev=0000:00:02.0, 3:0x2, gt=0, width=1, guc_id=0, guc_state=0x3, flags=0x13
So the job_run() is clearly triggered twice for the same job, even
though the first must have already signalled to completion during
suspend. We can also see a CAT error after the re-submit.
To prevent this only resubmit jobs on the pending_list that have not yet
signalled.
v2:
- Make sure to re-arm the fence callbacks with sched_start().
v3 (Matt B):
- Stop using drm_sched_resubmit_jobs(), which appears to be deprecated
and just open-code a simple loop such that we skip calling run_job()
on anything already signalled.
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4856
Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: William Tseng <william.tseng@intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Link: https://lore.kernel.org/r/20250528113328.289392-2-matthew.auld@intel.com
For preempt_fence mode VM's we're rejecting eviction of
shared bos during VM_BIND. However, since we do this in the
move() callback, we're getting an eviction failure warning from
TTM. The TTM callback intended for these things is
eviction_valuable().
However, the latter doesn't pass in the struct ttm_operation_ctx
needed to determine whether the caller needs this.
Instead, attach the needed information to the vm under the
vm->resv, until we've been able to update TTM to provide the
needed information. And add sufficient lockdep checks to prevent
misuse and races.
v2:
- Fix a copy-paste error in xe_vm_clear_validating()
v3:
- Fix kerneldoc errors.
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Fixes: 0af944f0e3 ("drm/xe: Reject BO eviction if BO is bound to current VM")
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250528164105.234718-1-thomas.hellstrom@linux.intel.com
The XE driver can be built with or without VSEC support, but fails to link as
built-in if vsec is in a loadable module:
x86_64-linux-ld: vmlinux.o: in function `xe_vsec_init':
(.text+0x1e83e16): undefined reference to `intel_vsec_register'
The normal fix for this is to add a 'depends on INTEL_VSEC || !INTEL_VSEC',
forcing XE to be a loadable module as well, but that causes a circular
dependency:
symbol DRM_XE depends on INTEL_VSEC
symbol INTEL_VSEC depends on X86_PLATFORM_DEVICES
symbol X86_PLATFORM_DEVICES is selected by DRM_XE
The problem here is selecting a symbol from another subsystem, so change
that as well and rephrase the 'select' into the corresponding dependency.
Since X86_PLATFORM_DEVICES is 'default y', there is no change to
defconfig builds here.
Fixes: 0c45e76fcc ("drm/xe/vsec: Support BMG devices")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250529172355.2395634-2-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Read card and package energy status using pmt apis instead
of xe_mmio for supported platforms.
Enable Battlemage to read energy from PMT.
v2:
- Remove unused has_pmt_energy field. (Badal)
- Use GENMASK to extract energy data. (Badal)
v3:
- Move PMT energy register offset and GENMASK to xe_pmt.h
- Address review comments. (Jani)
v4:
- Remove unnecessary debug print. (Badal)
v5:
- Resolve an unused variable warning.
- Add a return value check.
Signed-off-by: Karthik Poosa <karthik.poosa@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://lore.kernel.org/r/20250529163458.2354509-6-karthik.poosa@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Add support to manage power limit PL2 (burst limit) through
pcode mailbox commands.
v2:
- Update power1_cap definition in hwmon documentation. (Badal)
- Clamp PL2 power limit to GPU firmware default value.
v3:
- Activate the power label when either the PL1 or PL2 power
limit is enabled.
v4:
- Update description of pl2_on_boot variable to fix kernel-doc
error.
v5:
- Remove unnecessary drm_warn.
- Rectify powerX_label permission to read-only on platforms
without mailbox power limits support.
- Expose powerX_cap entries only on platforms with mailbox
support.
v6:
- Improve commit message, refer to BIOS as GPU firmware.
- Refer to card firmware as GPU firmware in code.
- Remove unnecessary drm_dbg.
- Print supported and unsupported power limits. (Rodrigo)
- Enable powerN_cap/max_xxx entries only when power limits
supported in GPU firmware.
Signed-off-by: Karthik Poosa <karthik.poosa@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://lore.kernel.org/r/20250529163458.2354509-4-karthik.poosa@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Add support to manage power limits using pcode mailbox commands
for supported platforms.
v2:
- Address review comments. (Badal)
- Use mailbox commands instead of registers to manage power limits
for BMG.
- Clamp the maximum power limit to GPU firmware default value.
v3:
- Clamp power limit in write also for platforms with mailbox support.
v4:
- Remove unnecessary debug prints. (Badal)
v5:
- Update description of variable pl1_on_boot to fix kernel-doc error.
v6:
- Improve commit message, refer to BIOS as GPU firmware.
- Change macro READ_PL_FROM_BIOS to READ_PL_FROM_FW.
- Rectify drm_warn to drm_info.
Signed-off-by: Karthik Poosa <karthik.poosa@intel.com>
Fixes: e90f7a58e6 ("drm/xe/hwmon: Add HWMON support for BMG")
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://lore.kernel.org/r/20250529163458.2354509-2-karthik.poosa@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Dealing with CCS state is significant on LNL+, where we end up clearing
the compression state on every page alloc using the blitter for user
buffers, including also saving and restoring it when moving between
domains, plus we need to alloc extra pages to hold the raw CCS state for
the save step.
However all compression PAT modes, on platforms like LNL, also require
coh_none, meaning that only WC memory can use compression in the first
place. With this we can be sneaky and completely ignore CCS for WB
buffers, which is likely the common case anyway. This would then skip
all blitter moves/clears between sys <-> tt and then also means we can
drop the extra CCS pages.
This should be safe since there is no way to interact with the
compression state (potentially uncleared) without using a PAT enabled
index (which is rejected at bind), including if trying to be malicious
and copy the raw CCS state from userpace, which should give back all
zeroes if the src surface (indirect) is lacking compressed PAT index.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Link: https://lore.kernel.org/r/20250516153810.223530-2-matthew.auld@intel.com
Today we allow to trigger GT resest by reading dedicated debugfs
files "force_reset" and "force_reset_sync" that we are exposing
using drm_info_list[] and drm_debugfs_create_files().
To avoid triggering potentially disruptive actions during otherwise
"safe" read operations, expose those two attributes using debugfs
function where we can specify file permissions and provide custom
"write" handler to trigger the GT resets also from there.
This step would allow us to drop triggering GT resets during read
operations, which we leave just to give users more time to switch.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250519200914.216-1-michal.wajdeczko@intel.com