While we expect config directory names to match PCI device name,
currently we are only scanning provided names for domain, bus,
device and function numbers, without checking their format.
This would pass slightly broken entries like:
/sys/kernel/config/xe/
├── 0000:00:02.0000000000000
│ └── ...
├── 0000:00:02.0x
│ └── ...
├── 0: 0: 2. 0
│ └── ...
└── 0:0:2.0
└── ...
To avoid such mistakes, check if the name provided exactly matches
the canonical PCI device address format, which we recreated from
the parsed BDF data. Also simplify scanf format as it can't really
catch all formatting errors.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250722141059.30707-3-michal.wajdeczko@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Rather than open-coding GT TLB invalidations in the PT layer, use GT TLB
invalidation jobs. The real benefit is that GT TLB invalidation jobs use
a single dma-fence context, allowing the generated fences to be squashed
in dma-resv/DRM scheduler.
v2:
- s/;;/; (checkpatch)
- Move ijob/mjob job push after range fence install
v3:
- Remove extra newline (Stuart)
- Set ijob/mjob near creation (Stuart)
- Add comment back in (Stuart)
Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://lore.kernel.org/r/20250724191216.4076566-7-matthew.brost@intel.com
Add generic dependecy jobs / scheduler which serves as wrapper for DRM
scheduler. Useful when we want delay a generic operation until a
dma-fence signals.
Existing use cases could be destroying of resources based fences /
dma-resv, the preempt rebind worker, and pipelined GT TLB invalidations.
Written in such a way it could be moved to DRM subsystem if needed.
v3:
- Remove unnecessary cast (Staurt)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://lore.kernel.org/r/20250724191216.4076566-3-matthew.brost@intel.com
The register at offset 0xfd0 was incorrectly named MCFG_MCR_SELECTOR,
likely copied from i915. According to the hardware specification (Bspec),
this register is actually called STEER_SEMAPHORE.
Rename the register definition and update its usage in xe_gt_mcr.c to
match the official hardware documentation.
No functional changes.
v2: Add Bspec reference (Tejas)
Bspec: 67113
Signed-off-by: Nitin Gote <nitin.r.gote@intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Link: https://lore.kernel.org/r/20250723141039.3848390-1-nitin.r.gote@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
We were already testing those two platforms for a while on our CI,
but enabling flag (has_sriov) was only available on the topic branch
and only for builds with CONFIG_DRM_XE_DEBUG config.
Since those two platforms are guarded by the another enabling flag
(require_force_probe) and we believe our SR-IOV support for them is
at sufficient level to start enjoying the feature, turn on the
SR-IOV enabling flag unconditionally.
Suggested-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250722182618.30811-4-michal.wajdeczko@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
When CONFIG_DEBUG_FS is off, drivers/gpu/drm/xe/xe_gt_debugfs.o
is not built and build fails on some setups with:
ld: drivers/gpu/drm/xe/xe_gt.o: in function `xe_fault_inject_gt_reset':
drivers/gpu/drm/xe/xe_gt.h:27:(.text+0x1659): undefined reference to `gt_reset_failure'
ld: drivers/gpu/drm/xe/xe_gt.h:27:(.text+0x1c16): undefined reference to `gt_reset_failure'
collect2: error: ld returned 1 exit status
Do not use the gt_reset_failure attribute if debugfs is not enabled.
Fixes: 8f3013e0b2 ("drm/xe: Introduce fault injection for gt reset")
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://lore.kernel.org/r/20250722-xe-fix-build-fault-v1-1-157384d50987@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Add debug nodes, "dgfx_pkg_residencies" for G-states (G2, G6, G8, G10,
ModS) and "dgfx_pcie_link_residencies" for PCIe link states(L0, L1, L1.2)
residency counters.
v1:
- Expose all G-State residency counter values under
dgfx_pkg_residencies. (Anshuman)
- Include runtime_get/put. (Riana)
v2:
- Move offset macros to drm/xe/regs/xe_pmt. (Riana)
v3:
- Include debugfs node "dgfx_pcie_link_residencies" for pcie link
residency counter values. (Anshuman)
v4:
- Include check for BMG and add helper function for repetitive
code. (Riana)
- Add for loop and local struct to avoid repetition. (Riana)
- Use "drm_debugfs_create_files" to create debugfs. (Karthik)
v5:
- Reorder commits to reflect the correct dependency hierarchy. (Jonathan)
- Simplification of commit message and rectified register offset.(Karthik)
- Error handling and return before printing. (Riana)
v6:
- Remove check for DGFX as BMG is discrete. (Karthik)
- Rearrange residency offsets in ascending order. (Riana)
v7:
- Squash the macros into the patch they are used in. (Lucas)
Signed-off-by: Soham Purkait <soham.purkait@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Karthik Poosa <karthik.poosa@intel.com>
Reviewed-by: Riana Tauro <riana.tauro@intel.com>
Link: https://lore.kernel.org/r/20250716101412.3062780-2-soham.purkait@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Currently in the drivers we have defined VRAM regions per device and per
tile. Initialization of these regions is done in two completely different
ways. To simplify the logic of the code and make it easier to add new
regions in the future, let's unify the way we initialize VRAM regions.
v2:
- fix doc comments in struct xe_vram_region
- remove unnecessary includes (Jani)
v3:
- move code from xe_vram_init_regions_managers to xe_tile_init_noalloc
(Matthew)
- replace ioremap_wc to devm_ioremap_wc for mapping VRAM BAR
(Matthew)
- Replace the tile id parameter with vram region in the xe_pf_begin
function.
v4:
- remove tile back pointer from struct xe_vram_region
- add new back pointers: xe and migarte to xe_vram_region
Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com> # rev3
Acked-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250714184818.89201-6-piotr.piorkowski@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Currently, xe_migrate_init handled both allocation and initialization,
Lets moves allocation to xe_tile_alloc via a new xe_migrate_alloc
function, and keep initialization in xe_migrate_init.
This will allow the migration pointers to be passed to other structures
before full initialization.
Also replaces devm_kzalloc with drmm_kzalloc for better
DRM-managed memory.
Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250714184818.89201-5-piotr.piorkowski@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
While in sysfs we are correctly trying to reflect the hardware
architecture and we expose GT attributes in per tile hierarchy,
in debugfs we expose GT attributes at flat level, without tiles.
Create debugfs directories to represent each tile and move GT
attributes under matching parent tile directory. To not break
existing debugfs tools, create symlink under old location:
/sys/kernel/debug/dri/0000:00:02.0/
├── ...
├── gt0 -> tile0/gt0
├── gt1 -> tile0/gt1
├── tile0
│ ├── gt0
│ │ ├── ...
│ ├── gt1
│ │ ├── ...
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250714193645.763-1-michal.wajdeczko@intel.com
If the device is running older pcode firmware, it is possible that newer
mailbox commands are not supported by it. The sysfs attributes aren't
useful in that case, but we shouldn't fail driver probe because of it.
As of now, it is unknown if we can distinguish unsupported commands before
attempting them. But until we figure out a way to do that, fix the
regressions.
v2: Add debug message (Lucas)
Fixes: cdc36b66cd ("drm/xe: Expose fan control and voltage regulator version")
Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Tested-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://lore.kernel.org/r/20250714215503.2897748-1-raag.jadav@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
By default the GuC starts in the 'native' mode and enables the VGT
mode (aka 'virtualization' mode) only after it receives at least one
set of VF configuration data. While this happens naturally while PF
begins VFs provisioning, we might need this sooner as some actions,
like TLB_INVALIDATION_ALL(0x7002), is supported by the GuC only in
the VGT mode.
And this becomes a real problem if we would want to use above action
to invalidate the LMTT early during VFs auto-provisioning, before VFs
are enabled, as such H2G would be rejected:
[ ] xe 0000:4d:00.0: [drm] *ERROR* GT0: FAST_REQ H2G fence 0x804e failed! e=0x30, h=0
[ ] xe 0000:4d:00.0: [drm] *ERROR* GT0: Fence 0x804e was used by action 0x7002 sent at:
h2g_write+0x33e/0x870 [xe]
__guc_ct_send_locked+0x1e1/0x1110 [xe]
guc_ct_send_locked+0x9f/0x740 [xe]
xe_guc_ct_send_locked+0x19/0x60 [xe]
send_tlb_invalidation+0xc2/0x470 [xe]
xe_gt_tlb_invalidation_all_async+0x45/0xa0 [xe]
xe_gt_tlb_invalidation_all+0x4b/0xa0 [xe]
lmtt_invalidate_hw+0x64/0x1a0 [xe]
xe_lmtt_invalidate_hw+0x5c/0x340 [xe]
pf_update_vf_lmtt+0x398/0xae0 [xe]
pf_provision_vf_lmem+0x350/0xa60 [xe]
xe_gt_sriov_pf_config_bulk_set_lmem+0xe2/0x410 [xe]
xe_gt_sriov_pf_config_set_fair_lmem+0x1c6/0x620 [xe]
xe_gt_sriov_pf_config_set_fair+0xd5/0x3f0 [xe]
xe_pci_sriov_configure+0x360/0x1200 [xe]
sriov_numvfs_store+0xbc/0x1d0
dev_attr_store+0x17/0x40
sysfs_kf_write+0x4a/0x80
kernfs_fop_write_iter+0x166/0x220
vfs_write+0x2ba/0x580
ksys_write+0x77/0x100
__x64_sys_write+0x19/0x30
x64_sys_call+0x2bf/0x2660
do_syscall_64+0x93/0x7a0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ ] xe 0000:4d:00.0: [drm] *ERROR* GT0: CT dequeue failed: -71
[ ] xe 0000:4d:00.0: [drm] GT0: trying reset from receive_g2h [xe]
This could be mitigated by pushing earlier a PF self-configuration
with some hard-coded values that cover unlimited access to the GGTT,
use of all GuC contexts and doorbells. This step is sufficient for
the GuC to switch into the VGT mode.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20250711193316.1920-5-michal.wajdeczko@intel.com
As part of the resume or GT reset, the PF driver schedules work
which is then used to complete restarting of the SR-IOV support,
including resending to the GuC configurations of provisioned VFs.
However, in case of short delay between those two actions, which
could be seen by triggering a GT reset on the suspened device:
$ echo 1 > /sys/kernel/debug/dri/0000:00:02.0/gt0/force_reset
this PF worker might be still busy, which lead to errors due to
just stopped or disabled GuC CTB communication:
[ ] xe 0000:00:02.0: [drm:xe_gt_resume [xe]] GT0: resumed
[ ] xe 0000:00:02.0: [drm] GT0: trying reset from force_reset_show [xe]
[ ] xe 0000:00:02.0: [drm] GT0: reset queued
[ ] xe 0000:00:02.0: [drm] GT0: reset started
[ ] xe 0000:00:02.0: [drm:guc_ct_change_state [xe]] GT0: GuC CT communication channel stopped
[ ] xe 0000:00:02.0: [drm:guc_ct_send_recv [xe]] GT0: H2G request 0x5503 canceled!
[ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF1 12 config KLVs (-ECANCELED)
[ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF1 configuration (-ECANCELED)
[ ] xe 0000:00:02.0: [drm:guc_ct_change_state [xe]] GT0: GuC CT communication channel disabled
[ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF2 12 config KLVs (-ENODEV)
[ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF2 configuration (-ENODEV)
[ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push 2 of 2 VFs configurations
[ ] xe 0000:00:02.0: [drm:pf_worker_restart_func [xe]] GT0: PF: restart completed
While this VFs reprovisioning will be successful during next spin
of the worker, to avoid those errors, make sure to cancel restart
worker if we are about to trigger next reset.
Fixes: 411220808c ("drm/xe/pf: Restart VFs provisioning after GT reset")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20250711193316.1920-2-michal.wajdeczko@intel.com
There's no need to submit the nop job again on the first queue. Any
state needed is already saved when the first LRC is switched out. The
comment is a little misleading regarding indirect W/A: first of all
there's still no indirect W/A enabled and secondly, even after they are,
there's no need to submit this job again for having their state
propagated: the indirect W/A will actually run on every LRC switch.
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250710-lrc-refactors-v2-6-a5e2ca03f6bd@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
There isn't anything to set for CTX_TIMESTAMP handling in the empty
LRC: that is set on every LRC init since it should always start from 0
rather than the value saved in the image after first submission.
The FIXME about perma-pinning also doesn't make much sense as we will
always going to pin the lrc and the GGTT mapping has nothing to do with
VM bind.
Nuke these leftover comments.
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250710-lrc-refactors-v2-5-a5e2ca03f6bd@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Both the nop and wa jobs are going through the same boiler plate calls
to emit the job with a timeout and handling error for both bb and job.
Extract emit_job_sync() so those functions create the bb, handling
possible errors and delegate the part about really emitting the job
and waiting for its completion.
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Link: https://lore.kernel.org/r/20250710-lrc-refactors-v2-3-a5e2ca03f6bd@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
The bb allocation in emit_wa_job() is wrong in 2 ways: first it's
allocating enough space for the 3DSTATE or hardcoding 4k depending on
the engine. In the first case it doesn't account for the WAs and in the
former it may not be sufficient. Secondly it's using the size instead of
number of dwords, causing the buffer to be 4x bigger than needed:
xe_bb_new() receives number of dwords as parameter and its declaration
was also not following its implementation.
Lastly, reword the debug message since it's not only about the LRC WAs
anymore as it also include the 3DSTATE for render.
While it's unlikely this is causing any real issue, let's calculate the
needed space and allocate just enough.
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Link: https://lore.kernel.org/r/20250710-lrc-refactors-v2-2-a5e2ca03f6bd@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>