Commit Graph

1368288 Commits

Author SHA1 Message Date
Michal Wajdeczko
2e76103998 drm/xe: Enable SR-IOV for ADL/ATSM
We were already testing those two platforms for a while on our CI,
but enabling flag (has_sriov) was only available on the topic branch
and only for builds with CONFIG_DRM_XE_DEBUG config.

Since those two platforms are guarded by the another enabling flag
(require_force_probe) and we believe our SR-IOV support for them is
at sufficient level to start enjoying the feature, turn on the
SR-IOV enabling flag unconditionally.

Suggested-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250722182618.30811-4-michal.wajdeczko@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-23 11:49:48 -07:00
Michal Wajdeczko
a2b461bd6f drm/xe/pf: Enable SR-IOV PF mode by default
We already claim official support for SR-IOV PF/VF modes on PTL
and BMG platforms, but by default we start the Xe driver on those
platforms in non-virtualized mode (native) since we still have
max_vfs modparam set to disable creation of the VFs.

It's time to let the Xe driver support SR-IOV PF mode by default.
We were already testing this on our CI, which was relying on the
patch that was enabling it for CONFIG_DRM_XE_DEBUG used by our CI.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250722182618.30811-3-michal.wajdeczko@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-23 11:49:48 -07:00
Lucas De Marchi
4d3bbe9dd2 drm/xe: Fix build without debugfs
When CONFIG_DEBUG_FS is off, drivers/gpu/drm/xe/xe_gt_debugfs.o
is not built and build fails on some setups with:

	ld: drivers/gpu/drm/xe/xe_gt.o: in function `xe_fault_inject_gt_reset':
	drivers/gpu/drm/xe/xe_gt.h:27:(.text+0x1659): undefined reference to `gt_reset_failure'
	ld: drivers/gpu/drm/xe/xe_gt.h:27:(.text+0x1c16): undefined reference to `gt_reset_failure'
	collect2: error: ld returned 1 exit status

Do not use the gt_reset_failure attribute if debugfs is not enabled.

Fixes: 8f3013e0b2 ("drm/xe: Introduce fault injection for gt reset")
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://lore.kernel.org/r/20250722-xe-fix-build-fault-v1-1-157384d50987@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-23 08:45:40 -07:00
Satyanarayana K V P
916ee4704a drm/xe/vf: Register CCS read/write contexts with Guc
Register read write contexts with newly added flags with GUC and
enable the context immediately after registration.
Re-register the context with Guc when resuming from runtime suspend as
soft reset is applied to Guc during xe_pm_runtime_resume().
Make Ring head=tail while unbinding device to avoid issues with VF pause
after device is unbinded.

Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250722120506.6483-4-satyanarayana.k.v.p@intel.com
2025-07-23 07:22:34 -07:00
Satyanarayana K V P
864690cf4d drm/xe/vf: Attach and detach CCS copy commands with BO
Attach CCS read/write copy commands to BO for old and new mem types as
NULL -> tt or system -> tt.
Detach the CCS read/write copy commands from BO while deleting ttm bo
from xe_ttm_bo_delete_mem_notify().

Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250722120506.6483-3-satyanarayana.k.v.p@intel.com
2025-07-23 07:22:31 -07:00
Satyanarayana K V P
f3009272ff drm/xe/vf: Create contexts for CCS read write
Create two LRCs to handle CCS meta data read / write from CCS pool in the
VM. Read context is used to hold GPU instructions to be executed at save
time and write context is used to hold GPU instructions to be executed at
the restore time.

Allocate batch buffer pool using suballocator for both read and write
contexts.

Migration framework is reused to create LRCAs for read and write.

Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250722120506.6483-2-satyanarayana.k.v.p@intel.com
2025-07-23 07:22:28 -07:00
Lukasz Laguna
9a220e0659 drm/xe/vf: Don't register I2C devices if VF
VF drivers can't access I2C devices, so skip their registration when
running as VF.

Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com>
Fixes: f0e53aadd7 ("drm/xe: Support for I2C attached MCUs")
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250717155420.25298-1-lukasz.laguna@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-07-23 09:10:53 -04:00
Zhanjun Dong
176f44a5ec drm/xe/uc: Fix missing unwind goto
Fix missing unwind goto on error handling.

Fixes: b2c4ac219f ("drm/xe/uc: Disable GuC communication on hardware initialization error")
Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://lore.kernel.org/r/20250721214520.954014-1-zhanjun.dong@intel.com
2025-07-22 10:46:52 -07:00
Dan Carpenter
6c9e64e83b drm/xe: Fix an IS_ERR() vs NULL bug in xe_tile_alloc_vram()
The xe_vram_region_alloc() function returns NULL on error.  It never
returns error pointers.  Update the error checking to match.

Fixes: 4b0a5f5ce7 ("drm/xe: Unify the initialization of VRAM regions")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/r/5449065e-9758-4711-b706-78771c0753c4@sabinyo.mountain
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-07-21 10:38:05 -04:00
Harish Chegondi
4b5514f786 drm/xe: Remove unnecessary EU stall debug message
The EU stall debug message may cause CI to complain on
unsupported platforms. Remove it.

Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://lore.kernel.org/r/dfb6a080b3442d481c567489aabe47e72f3e784c.1752870172.git.harish.chegondi@intel.com
2025-07-18 15:05:32 -07:00
Soham Purkait
487579fd85 drm/xe/xe_debugfs: Exposure of G-State and pcie link state residency counters through debugfs
Add debug nodes, "dgfx_pkg_residencies" for G-states (G2, G6, G8, G10,
ModS) and "dgfx_pcie_link_residencies" for PCIe link states(L0, L1, L1.2)
residency counters.

v1:
 - Expose all G-State residency counter values under
   dgfx_pkg_residencies. (Anshuman)
 - Include runtime_get/put. (Riana)
v2:
 - Move offset macros to drm/xe/regs/xe_pmt. (Riana)
v3:
 - Include debugfs node "dgfx_pcie_link_residencies" for pcie link
   residency counter values.  (Anshuman)
v4:
 - Include check for BMG and add helper function for repetitive
   code. (Riana)
 - Add for loop and local struct to avoid repetition. (Riana)
 - Use "drm_debugfs_create_files" to create debugfs. (Karthik)
v5:
 - Reorder commits to reflect the correct dependency hierarchy.  (Jonathan)
 - Simplification of commit message and rectified register offset.(Karthik)
 - Error handling and return before printing.                       (Riana)
v6:
 - Remove check for DGFX as BMG is discrete. (Karthik)
 - Rearrange residency offsets in ascending order. (Riana)
v7:
 - Squash the macros into the patch they are used in. (Lucas)

Signed-off-by: Soham Purkait <soham.purkait@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Karthik Poosa <karthik.poosa@intel.com>
Reviewed-by: Riana Tauro <riana.tauro@intel.com>
Link: https://lore.kernel.org/r/20250716101412.3062780-2-soham.purkait@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-07-17 12:33:00 -04:00
Piotr Piórkowski
4b0a5f5ce7 drm/xe: Unify the initialization of VRAM regions
Currently in the drivers we have defined VRAM regions per device and per
tile. Initialization of these regions is done in two completely different
ways. To simplify the logic of the code and make it easier to add new
regions in the future, let's unify the way we initialize VRAM regions.

v2:
- fix doc comments in struct xe_vram_region
- remove unnecessary includes (Jani)
v3:
- move code from xe_vram_init_regions_managers to xe_tile_init_noalloc
  (Matthew)
- replace ioremap_wc to devm_ioremap_wc for mapping VRAM BAR
  (Matthew)
- Replace the tile id parameter with vram region in the xe_pf_begin
  function.
v4:
- remove tile back pointer from struct xe_vram_region
- add new back pointers: xe and migarte to xe_vram_region

Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com> # rev3
Acked-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250714184818.89201-6-piotr.piorkowski@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-16 12:15:00 -07:00
Piotr Piórkowski
d65ff1ec85 drm/xe: Split xe_migrate allocation from initialization
Currently, xe_migrate_init handled both allocation and initialization,
Lets moves allocation to xe_tile_alloc via a new xe_migrate_alloc
function, and keep initialization in xe_migrate_init.
This will allow the migration pointers to be passed to other structures
before full initialization.
Also replaces devm_kzalloc with drmm_kzalloc for better
DRM-managed memory.

Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250714184818.89201-5-piotr.piorkowski@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-16 12:12:49 -07:00
Piotr Piórkowski
7a20b4f558 drm/xe: Move struct xe_vram_region to a dedicated header
Let's move the xe_vram_region structure to a new header dedicated to VRAM
to improve modularity and avoid unnecessary dependencies when only
VRAM-related structures are needed.

v2: Fix build if CONFIG_DRM_XE_DEVMEM_MIRROR is enabled
v3: Fix build if CONFIG_DRM_XE_DISPLAY is enabled
v4: Move helper to get tile dpagemap to xe_svm.c

Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Suggested-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> # rev3
Acked-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250714184818.89201-4-piotr.piorkowski@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-16 12:12:36 -07:00
Piotr Piórkowski
f92cfd72d9 drm/xe: Use dynamic allocation for tile and device VRAM region structures
In future platforms, we will need to represent the device and tile
VRAM regions in a more dynamic way, so let's abandon the static
allocation of these structures and start use a dynamic allocation.

v2:
 - Add a helpers for accessing fields of the xe_vram_region structure
v3:
- Add missing EXPORT_SYMBOL_IF_KUNIT for
  xe_vram_region_actual_physical_size

Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Acked-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250714184818.89201-3-piotr.piorkowski@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-16 12:08:31 -07:00
Piotr Piórkowski
922ae87523 drm/xe: Use devm_ioremap_wc for VRAM mapping and drop manual unmap
Let's replace the manual call to ioremap_wc function with devm_ioremap_wc
function, ensuring that VRAM mappings are automatically released when
the driver is detached.
Since devm_ioremap_wc registers the mapping with the device's managed
resources, the explicit iounmap call in vram_fini is no longer needed,
so let's remove it.

Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Suggested-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Acked-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250714184818.89201-2-piotr.piorkowski@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-16 12:06:25 -07:00
Michal Wajdeczko
bf81505f7d drm/xe: Move debugfs GT attributes under tile directory
While in sysfs we are correctly trying to reflect the hardware
architecture and we expose GT attributes in per tile hierarchy,
in debugfs we expose GT attributes at flat level, without tiles.

Create debugfs directories to represent each tile and move GT
attributes under matching parent tile directory. To not break
existing debugfs tools, create symlink under old location:

 /sys/kernel/debug/dri/0000:00:02.0/
 ├── ...
 ├── gt0 -> tile0/gt0
 ├── gt1 -> tile0/gt1
 ├── tile0
 │   ├── gt0
 │   │   ├── ...
 │   ├── gt1
 │   │   ├── ...

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250714193645.763-1-michal.wajdeczko@intel.com
2025-07-16 17:24:00 +02:00
Dan Carpenter
2f264d58cc drm/xe: Fix a NULL vs IS_ERR() bug in xe_i2c_register_adapter()
The fwnode_create_software_node() function returns error pointers.  It
never returns NULL.  Update the checks to match.

Fixes: f0e53aadd7 ("drm/xe: Support for I2C attached MCUs")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/65825d00-81ab-4665-af51-4fff6786a250@sabinyo.mountain
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-07-16 10:51:36 -04:00
Ashutosh Dixit
308dc9b278 drm/xe/oa: Fix static checker warning about null gt
There is a static checker warning that gt returned by xe_device_get_gt can
be NULL and that is being dereferenced. Use xe_root_mmio_gt instead, which
is equivalent and cannot return a NULL gt 0.

Fixes: 10d42ef34b ("drm/xe/oa: Assign hwe for OAM_SAG")
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://lore.kernel.org/r/20250715181422.2807624-1-ashutosh.dixit@intel.com
2025-07-15 22:36:16 -07:00
Raag Jadav
ed5461daa1 drm/xe: Don't fail probe on unsupported mailbox command
If the device is running older pcode firmware, it is possible that newer
mailbox commands are not supported by it. The sysfs attributes aren't
useful in that case, but we shouldn't fail driver probe because of it.
As of now, it is unknown if we can distinguish unsupported commands before
attempting them. But until we figure out a way to do that, fix the
regressions.

v2: Add debug message (Lucas)

Fixes: cdc36b66cd ("drm/xe: Expose fan control and voltage regulator version")
Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Tested-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://lore.kernel.org/r/20250714215503.2897748-1-raag.jadav@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-07-15 15:26:01 -04:00
Michal Wajdeczko
a816487681 drm/xe/pf: Invalidate LMTT after completing changes
Once we finish populating all leaf pages in the VF's LMTT we should
make sure that hardware will not access any stale data. Explicitly
force LMTT invalidation (as it was already planned in the past).

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20250711193316.1920-7-michal.wajdeczko@intel.com
2025-07-15 13:05:22 +02:00
Michal Wajdeczko
e497957fee drm/xe/pf: Invalidate LMTT during LMEM unprovisioning
Invalidate LMTT immediately after removing VF's LMTT page tables
and clearing root PTE in the LMTT PD to avoid any invalid access
by the hardware (and VF) due to stale data.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250711193316.1920-6-michal.wajdeczko@intel.com
2025-07-15 13:05:20 +02:00
Michal Wajdeczko
68ae022278 drm/xe/pf: Force GuC virtualization mode
By default the GuC starts in the 'native' mode and enables the VGT
mode (aka 'virtualization' mode) only after it receives at least one
set of VF configuration data. While this happens naturally while PF
begins VFs provisioning, we might need this sooner as some actions,
like TLB_INVALIDATION_ALL(0x7002), is supported by the GuC only in
the VGT mode.

And this becomes a real problem if we would want to use above action
to invalidate the LMTT early during VFs auto-provisioning, before VFs
are enabled, as such H2G would be rejected:

 [ ] xe 0000:4d:00.0: [drm] *ERROR* GT0: FAST_REQ H2G fence 0x804e failed! e=0x30, h=0
 [ ] xe 0000:4d:00.0: [drm] *ERROR* GT0: Fence 0x804e was used by action 0x7002 sent at:
      h2g_write+0x33e/0x870 [xe]
      __guc_ct_send_locked+0x1e1/0x1110 [xe]
      guc_ct_send_locked+0x9f/0x740 [xe]
      xe_guc_ct_send_locked+0x19/0x60 [xe]
      send_tlb_invalidation+0xc2/0x470 [xe]
      xe_gt_tlb_invalidation_all_async+0x45/0xa0 [xe]
      xe_gt_tlb_invalidation_all+0x4b/0xa0 [xe]
      lmtt_invalidate_hw+0x64/0x1a0 [xe]
      xe_lmtt_invalidate_hw+0x5c/0x340 [xe]
      pf_update_vf_lmtt+0x398/0xae0 [xe]
      pf_provision_vf_lmem+0x350/0xa60 [xe]
      xe_gt_sriov_pf_config_bulk_set_lmem+0xe2/0x410 [xe]
      xe_gt_sriov_pf_config_set_fair_lmem+0x1c6/0x620 [xe]
      xe_gt_sriov_pf_config_set_fair+0xd5/0x3f0 [xe]
      xe_pci_sriov_configure+0x360/0x1200 [xe]
      sriov_numvfs_store+0xbc/0x1d0
      dev_attr_store+0x17/0x40
      sysfs_kf_write+0x4a/0x80
      kernfs_fop_write_iter+0x166/0x220
      vfs_write+0x2ba/0x580
      ksys_write+0x77/0x100
      __x64_sys_write+0x19/0x30
      x64_sys_call+0x2bf/0x2660
      do_syscall_64+0x93/0x7a0
      entry_SYSCALL_64_after_hwframe+0x76/0x7e
 [ ] xe 0000:4d:00.0: [drm] *ERROR* GT0: CT dequeue failed: -71
 [ ] xe 0000:4d:00.0: [drm] GT0: trying reset from receive_g2h [xe]

This could be mitigated by pushing earlier a PF self-configuration
with some hard-coded values that cover unlimited access to the GGTT,
use of all GuC contexts and doorbells.  This step is sufficient for
the GuC to switch into the VGT mode.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20250711193316.1920-5-michal.wajdeczko@intel.com
2025-07-15 13:05:19 +02:00
Michal Wajdeczko
92ba2032a1 drm/xe/pf: Move GGTT config KLVs encoding to helper
In upcoming patch we will want to encode GGTT config KLVs based
on raw numbers, without relying on the allocated GGTT node.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20250711193316.1920-4-michal.wajdeczko@intel.com
2025-07-15 13:05:18 +02:00
Michal Wajdeczko
1c38dd6afa drm/xe/pf: Resend PF provisioning after GT reset
If we reload the GuC due to suspend/resume or GT reset then we
have to resend not only any VFs provisioning data, but also PF
configuration, like scheduling parameters (EQ, PT), as otherwise
GuC will continue to use default values.

Fixes: 411220808c ("drm/xe/pf: Restart VFs provisioning after GT reset")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20250711193316.1920-3-michal.wajdeczko@intel.com
2025-07-15 13:05:17 +02:00
Michal Wajdeczko
9f50b729dd drm/xe/pf: Prepare to stop SR-IOV support prior GT reset
As part of the resume or GT reset, the PF driver schedules work
which is then used to complete restarting of the SR-IOV support,
including resending to the GuC configurations of provisioned VFs.

However, in case of short delay between those two actions, which
could be seen by triggering a GT reset on the suspened device:

 $ echo 1 > /sys/kernel/debug/dri/0000:00:02.0/gt0/force_reset

this PF worker might be still busy, which lead to errors due to
just stopped or disabled GuC CTB communication:

 [ ] xe 0000:00:02.0: [drm:xe_gt_resume [xe]] GT0: resumed
 [ ] xe 0000:00:02.0: [drm] GT0: trying reset from force_reset_show [xe]
 [ ] xe 0000:00:02.0: [drm] GT0: reset queued
 [ ] xe 0000:00:02.0: [drm] GT0: reset started
 [ ] xe 0000:00:02.0: [drm:guc_ct_change_state [xe]] GT0: GuC CT communication channel stopped
 [ ] xe 0000:00:02.0: [drm:guc_ct_send_recv [xe]] GT0: H2G request 0x5503 canceled!
 [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF1 12 config KLVs (-ECANCELED)
 [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF1 configuration (-ECANCELED)
 [ ] xe 0000:00:02.0: [drm:guc_ct_change_state [xe]] GT0: GuC CT communication channel disabled
 [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF2 12 config KLVs (-ENODEV)
 [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF2 configuration (-ENODEV)
 [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push 2 of 2 VFs configurations
 [ ] xe 0000:00:02.0: [drm:pf_worker_restart_func [xe]] GT0: PF: restart completed

While this VFs reprovisioning will be successful during next spin
of the worker, to avoid those errors, make sure to cancel restart
worker if we are about to trigger next reset.

Fixes: 411220808c ("drm/xe/pf: Restart VFs provisioning after GT reset")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20250711193316.1920-2-michal.wajdeczko@intel.com
2025-07-15 13:05:15 +02:00
Lucas De Marchi
f4d51b6ce5 drm/xe/lrc: Add table with LRC layout
Add a table to document the LRC's BO layout to make it easier to
visualize how each region stacks on top of each other.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Link: https://lore.kernel.org/r/20250710-lrc-refactors-v2-4-a5e2ca03f6bd@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14 13:40:17 -07:00
Tvrtko Ursulin
aded26ccaa drm/xe: Waste fewer instructions in emit_wa_job()
I was debugging some unrelated issue and noticed the current code was
very verbose. We can improve it easily by using the more common batch
buffer building pattern.

Before:
                bb->cs[bb->len++] = MI_LOAD_REGISTER_REG | MI_LRR_DST_CS_MMIO;
     c4d:       41 8b 56 10             mov    0x10(%r14),%edx
     c51:       49 8b 4e 08             mov    0x8(%r14),%rcx
     c55:       8d 72 01                lea    0x1(%rdx),%esi
     c58:       41 89 76 10             mov    %esi,0x10(%r14)
     c5c:       c7 04 91 01 00 08 15    movl   $0x15080001,(%rcx,%rdx,4)
                        bb->cs[bb->len++] = entry->reg.addr;
     c63:       8b 08                   mov    (%rax),%ecx
     c65:       41 8b 56 10             mov    0x10(%r14),%edx
     c69:       49 8b 76 08             mov    0x8(%r14),%rsi
     c6d:       81 e1 ff ff 3f 00       and    $0x3fffff,%ecx
     c73:       8d 7a 01                lea    0x1(%rdx),%edi
     c76:       41 89 7e 10             mov    %edi,0x10(%r14)
     c7a:       89 0c 96                mov    %ecx,(%rsi,%rdx,4)
 ..etc..

After:
                *cs++ = MI_LOAD_REGISTER_REG | MI_LRR_DST_CS_MMIO;
     c52:       41 c7 04 24 01 00 08    movl   $0x15080001,(%r12)
     c59:       15
                        *cs++ = entry->reg.addr;
     c5a:       8b 10                   mov    (%rax),%edx
 ..etc..

Resulting in the following binary change:

	add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-348 (-348)
	Function                                     old     new   delta
	xe_gt_record_default_lrcs.cold               304     296      -8
	xe_gt_record_default_lrcs                   2200    1860    -340
	Total: Before=13554, After=13206, chg -2.57%

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250710-lrc-refactors-v2-7-a5e2ca03f6bd@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14 13:40:17 -07:00
Lucas De Marchi
f4b538245f drm/xe/gt: Drop third submission for default context
There's no need to submit the nop job again on the first queue. Any
state needed is already saved when the first LRC is switched out. The
comment is a little misleading regarding indirect W/A: first of all
there's still no indirect W/A enabled and secondly, even after they are,
there's no need to submit this job again for having their state
propagated: the indirect W/A will actually run on every LRC switch.

Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250710-lrc-refactors-v2-6-a5e2ca03f6bd@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14 13:40:17 -07:00
Lucas De Marchi
6d891d22c6 drm/xe/lrc: Remove leftover TODO/FIXME
There isn't anything to set for CTX_TIMESTAMP handling in the empty
LRC: that is set on every LRC init since it should always start from 0
rather than the value saved in the image after first submission.

The FIXME about perma-pinning also doesn't make much sense as we will
always going to pin the lrc and the GGTT mapping has nothing to do with
VM bind.

Nuke these leftover comments.

Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250710-lrc-refactors-v2-5-a5e2ca03f6bd@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14 13:40:17 -07:00
Lucas De Marchi
fab2cc0c09 drm/xe/gt: Extract emit_job_sync()
Both the nop and wa jobs are going through the same boiler plate calls
to emit the job with a timeout and handling error for both bb and job.
Extract emit_job_sync() so those functions create the bb, handling
possible errors and delegate the part about really emitting the job
and waiting for its completion.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Link: https://lore.kernel.org/r/20250710-lrc-refactors-v2-3-a5e2ca03f6bd@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14 13:20:03 -07:00
Lucas De Marchi
e4cb5823ba drm/xe: Count dwords before allocating
The bb allocation in emit_wa_job() is wrong in 2 ways: first it's
allocating enough space for the 3DSTATE or hardcoding 4k depending on
the engine. In the first case it doesn't account for the WAs and in the
former it may not be sufficient. Secondly it's using the size instead of
number of dwords, causing the buffer to be 4x bigger than needed:
xe_bb_new() receives number of dwords as parameter and its declaration
was also not following its implementation.

Lastly, reword the debug message since it's not only about the LRC WAs
anymore as it also include the 3DSTATE for render.

While it's unlikely this is causing any real issue, let's calculate the
needed space and allocate just enough.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Link: https://lore.kernel.org/r/20250710-lrc-refactors-v2-2-a5e2ca03f6bd@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14 13:20:02 -07:00
Lucas De Marchi
76650bcf2a drm/xe/lrc: Reduce scope of empty lrc data
The only case in which new lrc data is created from scratch is when it's
called prior to recording the default lrc. There's no need to check for
NULL init_data since in that case the function already failed: just move
the allocation where it's needed.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250710-lrc-refactors-v2-1-a5e2ca03f6bd@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14 13:20:02 -07:00
Michal Wajdeczko
b533b8e5a1 drm/xe/vf: Store negotiated VF/PF ABI version at device level
There is no need to maintain PF ABI version on per-GT level.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20250713103625.1964-8-michal.wajdeczko@intel.com
2025-07-14 18:19:39 +02:00
Michal Wajdeczko
a6c384b24f drm/xe/pf: Stop requiring VF/PF version negotiation on every GT
While some VF/PF relay actions must be handled on the GT level,
like query for runtime registers, it was clarified by the arch
team that initial version negotiation can be done by the VF just
once, by using any available GuC/GT.

Move handling of the VF/PF ABI version negotiation on the PF side
from the GT level functions to the device level functions.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250713103625.1964-7-michal.wajdeczko@intel.com
2025-07-14 18:19:31 +02:00
Michal Wajdeczko
d962178a88 drm/xe/pf: Expose basic info about VFs in debugfs
We already have function to print summary about VFs, but we missed
to add debugfs attribute to make it visible. Do it now.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20250713103625.1964-6-michal.wajdeczko@intel.com
2025-07-14 18:19:29 +02:00
Michal Wajdeczko
ffab82b062 drm/xe: Introduce xe_gt_is_main_type helper
Instead of checking for not being a media type GT provide a small
helper to explicitly express our intentions.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20250713103625.1964-5-michal.wajdeczko@intel.com
2025-07-14 18:19:29 +02:00
Michal Wajdeczko
76293a83a9 drm/xe: Introduce xe_tile_is_root helper
Instead of looking at the tile->id member provide a small helper
to explicitly express our intentions.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20250713103625.1964-4-michal.wajdeczko@intel.com
2025-07-14 18:19:28 +02:00
Michal Wajdeczko
73c0e8054f drm/xe: Move PF and VF device types to separate headers
We plan to add more PF and VF types and mixing them in a single
file is not desired.  Move them out to new dedicated files.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250713103625.1964-3-michal.wajdeczko@intel.com
2025-07-14 18:18:49 +02:00
Michal Wajdeczko
7dcae5288a drm/xe: Combine PF and VF device data into union
There is no need to keep PF and VF data fields fully separate
since we can be only in one mode at the time. Move them into
a anonymous union to save few bytes.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20250713103625.1964-2-michal.wajdeczko@intel.com
2025-07-14 17:53:39 +02:00
Xin Wang
8d4aec43f6 drm/xe: Update register definitions in LRC layout header
Update the register definitions in xe_lrc_layout.h to align with the
official hardware specification (Bspec) terminology. Specifically:

- rename PVC_CTX_ACC_CTR_THOLD to CTX_ACC_CTR_THOLD
- rename PVC_CTX_ASID to CTX_ASID

Signed-off-by: Xin Wang <x.wang@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250711060924.7373-1-x.wang@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14 08:34:44 -07:00
Tvrtko Ursulin
fba1230763 drm/xe: Add plumbing for indirect context workarounds
Some upcoming workarounds need to be emitted from the indirect workaround
context so lets add some plumbing where they will be able to easily slot
in.

No functional changes for now since everything is still deactivated.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Bspec: 45954
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250711160153.49833-7-tvrtko.ursulin@igalia.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14 08:22:10 -07:00
Tvrtko Ursulin
a3397b24ae drm/xe: Allow specifying number of extra dwords at the end of wa bb emission
Indirect context setup will need more than one.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250711160153.49833-6-tvrtko.ursulin@igalia.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14 08:06:55 -07:00
Tvrtko Ursulin
5ce511ad2b drm/xe: Track number of written dwords from workaround batch buffer emission
Indirect context setup will need to get to the number of written dwords.
Lets add it as an output parameter so it can be accessed from the finish
helper regardless of whether code is writing directly or via an shadow
buffer.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250711160153.49833-5-tvrtko.ursulin@igalia.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14 08:06:14 -07:00
Tvrtko Ursulin
1ec31d355c drm/xe: Rename utilization workaround emission function
Lucas suggested to consolidate to a slightly different naming scheme which
will align with the upcoming additions better.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Suggested-by: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250711160153.49833-4-tvrtko.ursulin@igalia.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14 08:06:14 -07:00
Tvrtko Ursulin
81b79670a3 drm/xe: Pass wa bb setup arguments in a struct
Group the function arguments in a struct for more readable code and easier
extending.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250711160153.49833-3-tvrtko.ursulin@igalia.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14 08:06:09 -07:00
Tvrtko Ursulin
fa7c2a2460 drm/xe: Generalize wa bb emission code
Generalize the wa bb emission by splitting it into three phases - setup,
emit and finish, and extract setup and finish steps into helpers.

This will enable using the same infrastructure for emitting the indirect
context workarounds.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250711160153.49833-2-tvrtko.ursulin@igalia.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14 08:04:35 -07:00
Lucas De Marchi
e08c0fa02e drm/xe: Fix missing kernel-doc
Fix warning:

	Warning: drivers/gpu/drm/xe/xe_device_types.h:658 struct member 'wa_active' not described in 'xe_device'

Fixes: 661a6950e0 ("drm/xe: Add infrastructure for Device OOB workarounds")
Cc: Matt Atwood <matthew.s.atwood@intel.com>
Reviewed-by: Jonathan Cavitt <joanthan.cavitt@intel.com>
Link: https://lore.kernel.org/r/20250711214911.2009714-2-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14 08:01:19 -07:00
Dr. David Alan Gilbert
8f3d1c9fb0 drm/xe: Remove unused functions
xe_bo_create_from_data() last use was removed in 2023 by
commit 0e1a47fcab ("drm/xe: Add a helper for DRM device-lifetime BO
create")

xe_rtp_match_first_gslice_fused_off() last use was removed in 2023 by
commit 4e124151fc ("drm/xe/dg2: Drop pre-production workarounds")

Remove them, and xe_dss_mask_empty whose last use was by
xe_rtp_match_first_gslice_fused_off().

(Xe has a bunch ofother symbols that have been added but not used,
given how new it is, I've left those, as opposed to these that
had the code that used them removed).

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Link: https://lore.kernel.org/r/20250713152531.219326-1-linux@treblig.org
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14 07:55:18 -07:00
Lucas De Marchi
7b6db1731a drm/xe: Normalize default param values
Document xe module params with the default values following a similar
strategy for all of them:

	1) Define a DEFAULT_* macro with the default value. When the
	   value can't be directly stringified, also define a *_STR
	   variant
	2) Use __stringify() or the _STR variant to make sure the
	   default value shows up in the param description

This allows us to show the correct default according to the
configuration. max_vfs for example was wrongly documented for
CONFIG_DRM_XE_DEBUG and svm_notifier_size didn't have its default
documented.

Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://lore.kernel.org/r/20250626-guc-log-level-v3-1-c3ed8b452e91@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-11 12:56:38 -07:00