linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-10 10:20:17 -04:00

Author	SHA1	Message	Date
Michal Wajdeczko	88df7939d7	drm/xe/configfs: Rename struct xe_config_device Rename it to struct xe_config_group_device to better match its purpose. It will also help us to reintroduce in the upcoming patch the same struct name but this time to hold only configuration data. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250731193339.179829-6-michal.wajdeczko@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-08-05 11:54:07 -07:00
Michal Wajdeczko	c4b1dde063	drm/xe/configfs: Drop redundant init() error message There is no need to print separate error message since we will also print one in xe_init(). Also drop temporary variable, which was likely just taken from the example code. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250731193339.179829-5-michal.wajdeczko@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-08-05 11:53:51 -07:00
Michal Wajdeczko	b90613fb02	drm/xe/configfs: Destroy xe_configfs.su_mutex on exit/error While mutex_destroy() is NOP when CONFIG_DEBUG_MUTEXES is not enabled, we should still call it. While around, drop a trailing line. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250731193339.179829-4-michal.wajdeczko@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-08-05 11:53:29 -07:00
Michal Wajdeczko	823301c847	drm/xe: Print module init abort code We should provide a hint to the user why the module refused to load. This will also allow us to drop individual error messages from init steps. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250731193339.179829-3-michal.wajdeczko@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-08-05 11:53:17 -07:00
Michal Wajdeczko	90759cddac	drm/xe: Simplify module initialization code There is no need to have extra checks and WARN() in the helpers as instead of an index of the entry with function pointers, we can pass pointer to the entry which we prepare directly in the main loop, that is guaranteed to be valid. add/remove: 0/0 grow/shrink: 0/4 up/down: 0/-180 (-180) Function old new delta xe_exit 109 79 -30 cleanup_module 109 79 -30 xe_init 248 188 -60 init_module 248 188 -60 Total: Before=2774145, After=2773965, chg -0.01% Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250731193339.179829-2-michal.wajdeczko@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-08-05 11:52:53 -07:00
Jonathan Cavitt	7c9de25efa	drm/xe/xe_guc_ads: Consolidate guc_waklv_enable functions Presently, multiple versions of the guc_waklv_enable_.* function exist, all with different numbers of dwords added to the klv_entry array. This is not extensible, and more duplicates of the function will need to be created if it ever becomes necessary to support 3 or more dwords per wa in the future. Consolidate the disparate guc_waklv_enable functions into a single guc_waklv_enable function that can take an arbitrary number of dword values. v2: - Update length value properly (Shuicheng) v3: (Harrison) - Use data as a term instead of dwords or arr - Reformat warning message to use hex values - Eliminate need for kzalloc and klv_entry array - Reorder function parameters to fix line wrapping v4: - Miscellaneous formatting fixes (Cavitt) v5: (Harrison) - s/data_range/data_len_dw - Use data_len_dw to calculate size for xe_map_memcpy_to Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Cc: Lucas De Marchi <lucas.demarch@intel.com> Cc: Shuicheng Lin <shuicheng.lin@intel.com> Cc: John Harrison <john.c.harrison@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://lore.kernel.org/r/20250728194806.68176-2-jonathan.cavitt@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-08-05 07:59:38 -07:00
Tangudu Tilak Tirumalesh	bcddb12c02	drm/xe: Extend wa_13012615864 to additional Xe2 and Xe3 platforms Extend WA 13012615864 to Graphics Versions 20.01,20.02,20.04 and 30.03. Signed-off-by: Tangudu Tilak Tirumalesh <tilak.tirumalesh.tangudu@intel.com> Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://lore.kernel.org/r/20250731220143.72942-2-jonathan.cavitt@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-08-04 11:52:20 -04:00
Tomasz Lis	ba180a3621	drm/xe/vf: Rebase exec queue parallel commands during migration recovery Parallel exec queues have an additional command streamer buffer which holds a GGTT reference to data within context status. The GGTT references have to be fixed after VF migration. v2: Properly handle nop entry, verify if parsing goes ok v3: Improve error/warn logging, add propagation of errors, give names to magic offsets Signed-off-by: Tomasz Lis <tomasz.lis@intel.com> Cc: Michal Winiarski <michal.winiarski@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://lore.kernel.org/r/20250802031045.1127138-9-tomasz.lis@intel.com Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>	2025-08-04 16:47:12 +02:00
Tomasz Lis	168b586731	drm/xe/vf: Refresh utilization buffer during migration recovery The WA buffer we use to capture context utilization contains GGTT references. This means its instructions have to be either fixed or re-emitted during VF post-migration recovery. This patch adds re-emitting content of the utilization WA BB during the recovery. The way we write to vram requires scratch buffer to be used before the whole block is memcopied. We are re-using a scratch buffer introduced in earlier part of the recovery. This is not a performance optimization, but a necessity to avoid creating dependencies between locks. v2: Notable rebase after "Prepare WA BB setup for more users" patch v3: Added error propagation Signed-off-by: Tomasz Lis <tomasz.lis@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Michal Winiarski <michal.winiarski@intel.com> Reviewed-by: Michal Winiarski <michal.winiarski@intel.com> Link: https://lore.kernel.org/r/20250802031045.1127138-8-tomasz.lis@intel.com Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>	2025-08-04 16:47:07 +02:00
Tomasz Lis	a0dda25d24	drm/xe/vf: Post migration, repopulate ring area for pending request The commands within ring area allocated for a request may contain references to GGTT. These references require update after VF migration, in order to continue any preempted LRCs, or jobs which were emitted to the ring but not sent to GuC yet. This change calls the emit function again for all such jobs, as part of post-migration recovery. v2: Moved few functions to better files v3: Take job_list_lock v4: Rephrased comments Signed-off-by: Tomasz Lis <tomasz.lis@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Michal Winiarski <michal.winiarski@intel.com> Reviewed-by: Michal Winiarski <michal.winiarski@intel.com> Cc: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250802031045.1127138-7-tomasz.lis@intel.com Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>	2025-08-04 16:46:56 +02:00
Tomasz Lis	30d137ddce	drm/xe/vf: Rebase MEMIRQ structures for all contexts after migration All contexts require an update of state data, as the data includes GGTT references to memirq-related buffers. Default contexts need these references updated as well, because they are not refreshed when a new context is created from them. The way we write to vram requires scratch buffer to be used before the whole block is memcopied. Since using kalloc() within specific recovery functions would lead to unintended relations between locks, we are allocating the buffer earlier, before any locks are taken. The same buffer will be used for other steps of the recovery. v2: Update addresses by xe_lrc_write_ctx_reg() rather than set_memory_based_intr() v3: Renamed parameter, reordered parameters in some functs v4: Check if have MEMIRQ, move `xe_gt*` funct to proper file v5: Revert back to requiring scratch buffer, but allocate it earlier this time Signed-off-by: Tomasz Lis <tomasz.lis@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Michal Winiarski <michal.winiarski@intel.com> Acked-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Reviewed-by: Michal Winiarski <michal.winiarski@intel.com> Link: https://lore.kernel.org/r/20250802031045.1127138-6-tomasz.lis@intel.com Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>	2025-08-04 16:46:52 +02:00
Tomasz Lis	b46ef76673	drm/xe/vf: Rebase HWSP of all contexts after migration All contexts require an update due to GGTT range shift, as that affects their HWSP. The HW status page of a context contains GGTT references, which need to be shifted to a new range (or re-computed using the previously updated vma nodes). The references include ring start address and indirect state address. v2: move some functions to better matched files v3: Add missing kerneldocs v4: Style fix Signed-off-by: Tomasz Lis <tomasz.lis@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Michal Winiarski <michal.winiarski@intel.com> Acked-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Reviewed-by: Michal Winiarski <michal.winiarski@intel.com> Link: https://lore.kernel.org/r/20250802031045.1127138-5-tomasz.lis@intel.com Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>	2025-08-04 16:46:47 +02:00
Tomasz Lis	a0840b1ce9	drm/xe: Block reset while recovering from VF migration Resetting GuC during recovery could interfere with the recovery process. Such reset might be also triggered without justification, due to migration taking time, rather than due to the workload not progressing. Doing GuC reset during the recovery would cause exit of RESFIX state, and therefore continuation of GuC work while fixups are still being applied. To avoid that, reset needs to be blocked during the recovery. This patch blocks the reset during recovery. Reset request in that time range will be stalled, and unblocked only after GuC goes out of RESFIX state. In case a reset procedure already started while the recovery is triggered, there isn't much we can do - we cannot wait for it to finish as it involves waiting for hardware, and we can't be sure at which exact point of the reset procedure the GPU got switched. Therefore, the rare cases where migration happens while reset is in progress, are still dangerous. Resets are not a part of the standard flow, and cause unfinished workloads - that will happen during the reset interrupted by migration as well, so it doesn't diverge that much from what normally happens during such resets. v2: Introduce a new atomic for reset blocking, as we cannot reuse `stopped` atomic (that could lead to losing a workload). v3: Switched atomic functs to ones which include proper barriers Signed-off-by: Tomasz Lis <tomasz.lis@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Michal Winiarski <michal.winiarski@intel.com> Reviewed-by: Michal Winiarski <michal.winiarski@intel.com> Link: https://lore.kernel.org/r/20250802031045.1127138-4-tomasz.lis@intel.com Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>	2025-08-04 16:46:34 +02:00
Tomasz Lis	f1193b864c	drm/xe/vf: Pause submissions during RESFIX fixups While applying post-migration fixups to VF, GuC will not respond to any commands. This means submissions have no way of finishing. To avoid acquiring additional resources and then stalling on hardware access, pause the submission work. This will decrease the chance of depleting resources, and speed up the recovery. v2: Commented xe_irq_resume() call v3: Typo fix Signed-off-by: Tomasz Lis <tomasz.lis@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Michal Winiarski <michal.winiarski@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://lore.kernel.org/r/20250802031045.1127138-3-tomasz.lis@intel.com Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>	2025-08-04 16:46:25 +02:00
Tomasz Lis	d47cc89d81	drm/xe/sa: Avoid caching GGTT address within the manager Non-virtualized resources require fixups after SRIOV VF migration. Caching GGTT references rather than re-computing them from the underlying Buffer Object is something we want to avoid, as such code would require additional fixup step and additional locking around all the places where the address is accessed. This change removes the cached GPU address from the Sub-Allocation Manager, and introduces a function which recomputes and returns the address instead. v2: renamed xe_sa_manager_gpu_addr(), added kerneldoc Signed-off-by: Tomasz Lis <tomasz.lis@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20250802031045.1127138-2-tomasz.lis@intel.com Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>	2025-08-04 16:46:17 +02:00
Simon Richter	0521a86822	Mark xe driver as BROKEN if kernel page size is not 4kB This driver, for the time being, assumes that the kernel page size is 4kB, so it fails on loong64 and aarch64 with 16kB pages, and ppc64el with 64kB pages. Signed-off-by: Simon Richter <Simon.Richter@hogyros.de> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: stable@vger.kernel.org # v6.8+ Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://lore.kernel.org/r/20250802024152.3021-1-Simon.Richter@hogyros.de	2025-08-04 11:24:03 +02:00
Michal Wajdeczko	9fd9f22144	drm/xe/pf: Don't resume device from restart worker The PF's restart worker shouldn't attempt to resume the device on its own, since its goal is to finish PF and VFs reprovisioning on the recently reset GuC. Take extra RPM reference while scheduling a work and release it from the worker or when we cancel a work. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250801142822.180530-4-michal.wajdeczko@intel.com	2025-08-01 21:29:55 +02:00
Michal Wajdeczko	c6c86441c4	drm/xe/pf: Make sure PF is ready to configure VFs The PF driver might be resumed just to configure VFs, but since it is doing some asynchronous GuC reconfigurations after fresh reset, we should wait until all pending works are completed. This is especially important in case of LMEM provisioning, since we also need to update the LMTT and send invalidation requests to all GuCs, which are expected to be already in the VGT mode. Fixes: `68ae022278` ("drm/xe/pf: Force GuC virtualization mode") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Cc: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250801142822.180530-3-michal.wajdeczko@intel.com	2025-08-01 21:29:53 +02:00
Michal Wajdeczko	a424353937	drm/xe/pf: Disable PF restart worker on device removal We can't let restart worker run once device is removed, since other data that it might want to access could be already released. Explicitly disable worker as part of device cleanup action. Fixes: `a4d1c5d0b9` ("drm/xe/pf: Move VFs reprovisioning to worker") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Cc: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250801142822.180530-2-michal.wajdeczko@intel.com	2025-08-01 21:29:51 +02:00
Michal Wajdeczko	b52f8d7a8f	drm/xe/pf: Skip LMTT update if no LMEM was provisioned During VF unprovisioning, if VF was not provisioned with LMEM, there is no need to trigger LMTT update, as VF LMTT was never set. This will spare us sending full TLB invalidation requests. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250801144418.180584-1-michal.wajdeczko@intel.com	2025-08-01 21:14:45 +02:00
Balasubramani Vivekanandan	1fdc4c381f	drm/xe/devcoredump: Defer devcoredump initialization during probe Doing devcoredump initializing before GT though look harmless, it leads to problem during driver unbind. Because of this order, GT/Engine release functions will be called before xe devcoredump release function (xe_driver_devcoredump_fini) leading to the following kernel crash[1] because the devcoredump functions might still use GT/Engine datastructures after those are freed. The following crash is observed while running the IGT xe_wedged@wedged-at-any-timeout. The test forces a wedged state by submitting a workload which hangs. Then does a unbind/rebind of the driver to recover from the wedged state. The hanged workload leads to a devcoredump. The following crash is noticed when the devcoredump capture races with the driver unbind. During driver unbind, the release function hw_engine_fini() will be called which assigns NULL to hwe->gt. But the same data structure is accessed during the coredump capture in the function xe_engine_snapshot_print by reading snapshot->hwe->gt. With this patch, we make sure the devcoredump is stopped before deinitializing the core driver functions. [1]: BUG: kernel NULL pointer dereference, address: 0000000000000000 Workqueue: events_unbound xe_devcoredump_deferred_snap_work [xe] RIP: 0010:xe_engine_snapshot_print+0x47/0x420 [xe] Call Trace: <TASK> ? drm_printf+0x64/0x90 __xe_devcoredump_read+0x23f/0x2d0 [xe] ? __pfx___drm_printfn_coredump+0x10/0x10 ? __pfx___drm_puts_coredump+0x10/0x10 xe_devcoredump_deferred_snap_work+0x17a/0x190 [xe] process_one_work+0x22e/0x6f0 worker_thread+0x1e8/0x3d0 ? __pfx_worker_thread+0x10/0x10 kthread+0x11f/0x250 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x47/0x70 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 v2: Detailed commit description (Rodrigo) v3: FIXME added (Rodrigo, Stuart) Fixes: `4209d635a8` ("drm/xe: Remove devcoredump during driver release") Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20250731061300.14320-1-balasubramani.vivekanandan@intel.com Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Link: https://lore.kernel.org/r/20250801052356.21885-1-balasubramani.vivekanandan@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-08-01 11:28:35 -04:00
Dan Carpenter	7d3a5962d7	drm/xe/vf: Fix IS_ERR() vs NULL check in xe_sriov_vf_ccs_init() The xe_migrate_alloc() function returns NULL on error. It doesn't return error pointers. Update the checking to match. Fixes: `a843b98947` ("drm/xe/vf: Fix VM crash during VF driver release") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/aIzB8-Y6wtZvfNQT@stanley.mountain Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-08-01 11:26:22 -04:00
Maarten Lankhorst	1cda3c755b	drm/xe: Fix oops in xe_gem_fault when running core_hotunplug test. I saw an oops in xe_gem_fault when running the xe-fast-feedback testlist against the realtime kernel without debug options enabled. The panic happens after core_hotunplug unbind-rebind finishes. Presumably what happens is that a process mmaps, unlocks because of the FAULT_FLAG_RETRY_NOWAIT logic, has no process memory left, causing ttm_bo_vm_dummy_page() to return VM_FAULT_NOPAGE, since there was nothing left to populate, and then oopses in "mem_type_is_vram(tbo->resource->mem_type)" because tbo->resource is NULL. It's convoluted, but fits the data and explains the oops after the test exits. Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://lore.kernel.org/r/20250715152057.23254-2-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>	2025-07-31 13:45:03 +02:00
Lukasz Laguna	552dbba1ca	drm/xe/vf: Disable CSC support on VF CSC is not accessible by VF drivers, so disable its support flag on VF to prevent further initialization attempts. Fixes: `e02cea83d3` ("drm/xe/gsc: add Battlemage support") Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com> Cc: Alexander Usyskin <alexander.usyskin@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20250729123437.5933-1-lukasz.laguna@intel.com	2025-07-30 13:09:46 +02:00
Satyanarayana K V P	a843b98947	drm/xe/vf: Fix VM crash during VF driver release The VF CCS save/restore series (patchwork #149108) has a dependency on the migration framework. A recent migration update in commit `d65ff1ec85` ("drm/xe: Split xe_migrate allocation from initialization") caused a VM crash during XE driver release for iGPU devices. Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b83: 0000 [#1] SMP NOPTI RIP: 0010:xe_lrc_ring_head+0x12/0xb0 [xe] Call Trace: xe_sriov_vf_ccs_fini+0x1e/0x40 [xe] devm_action_release+0x12/0x30 release_nodes+0x3a/0x120 devres_release_all+0x96/0xd0 device_unbind_cleanup+0x12/0x80 device_release_driver_internal+0x23a/0x280 device_release_driver+0x12/0x20 pci_stop_bus_device+0x69/0x90 pci_stop_and_remove_bus_device+0x12/0x30 pci_iov_remove_virtfn+0xbd/0x130 sriov_disable+0x42/0x100 pci_disable_sriov+0x34/0x50 xe_pci_sriov_configure+0xf71/0x1020 [xe] Update the VF CCS migration initialization sequence to align with the new migration framework changes, resolving the release-time crash. Fixes: `f3009272ff` ("drm/xe/vf: Create contexts for CCS read write") Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Piotr Piórkowski <piotr.piorkowski@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250729120720.13990-1-satyanarayana.k.v.p@intel.com	2025-07-29 22:05:14 -07:00
Michal Wajdeczko	d6a0311c37	drm/xe/hw_engine_group: Don't use drm_warn to catch missed case Since hwe->class is an enumeration we can rely on the compiler to catch any unhandled engine class case at compile time thanks to [-Werror=switch]. Any unexpected use of a special CLASS_MAX enum case can be guarded by our xe_gt_assert() instead, which will be compiled-out on the production builds. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Francois Dugast <francois.dugast@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Francois Dugast <francois.dugast@intel.com> Link: https://lore.kernel.org/r/20250725090508.571-1-michal.wajdeczko@intel.com	2025-07-29 17:25:06 +02:00
Priyanka Dandamudi	4df0bd5eb4	drm/xe/uapi: Add documentation for DRM_XE_GEM_CREATE_FLAG_DEFER_BACKING Add documentation for drm_xe_gem_create structure flag DRM_XE_GEM_CREATE_FLAG_DEFER_BACKING. v2: Modified to be in a more generalised way. Signed-off-by: Priyanka Dandamudi <priyanka.dandamudi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250728043336.3319521-1-priyanka.dandamudi@intel.com Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>	2025-07-29 11:21:00 +05:30
John Harrison	45fbb51050	drm/xe/guc: Add more GuC load error status codes The GuC load process will abort if certain status codes (which are indicative of a fatal error) are reported. Otherwise, it keeps waiting until the 'success' code is returned. New error codes have been added in recent GuC releases, so add support for aborting on those as well. v2: Shuffle HWCONFIG_START to the front of the switch to keep the ordering as per the enum define for clarity (review feedback by Jonathan). Also add a description for the basic 'invalid init data' code which was missing. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20250726024337.4056272-1-John.C.Harrison@Intel.com	2025-07-28 14:38:44 -07:00
Ilia Levi	1ffcf8b8ae	drm/xe: Support for mmap-ing mmio regions Allow the driver to expose hardware register spaces to userspace through GEM objects with fake mmap offsets. This can be useful for userspace-firmware communication, debugging, etc. v2: Minor doc fix (CI) v3: Enforce MAP_SHARED (Tejas) Add fault handler with dummy page (Tejas, Matt Auld) Store physical address instead of xe_mmio in the GEM object (MattB) v4: Separate xe_mmio_gem from xe_mmio and make it private (MattB) Signed-off-by: Ilia Levi <ilia.levi@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Link: https://lore.kernel.org/r/20250714122658.1803-1-ilia.levi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-25 14:44:54 -07:00
Tvrtko Ursulin	ca33cd271e	drm/xe/xelp: Add Wa_18022495364 Add Wa_18022495364 as a context workaround batch buffer workaround. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20250711160153.49833-9-tvrtko.ursulin@igalia.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-25 08:43:03 -07:00
Tvrtko Ursulin	e8372edec9	drm/xe/xelp: Implement Wa_16010904313 Add XeLP workaround 16010904313. The description calls for it to be emitted as the indirect context buffer workaround for render and compute, and from the workaround batch buffer for the other engines. Therefore we plug into the previously added respective top level emission functions. The actual command streamer programming sequence differs from what is described in the PRM, in that it assumes the listed LRCA offset was supposed to actually refer to the location of the CTX_TIMESTAMP register instead of LRCA + 0x180c (which is in GPR space). Latter appears to make more sense under the assumption that multiple writes are helping with restoring the CTX_TIMESTAMP register content from the saved context state. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20250711160153.49833-8-tvrtko.ursulin@igalia.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-25 08:42:49 -07:00
Michal Wajdeczko	9b807f0bb0	drm/xe/configfs: Use pci_name() for lookup There is no need to manually build PCI device name from BDF data, since it was already prepared and assigned and can be accessed by calling pci_name() function. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250722141059.30707-4-michal.wajdeczko@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-25 05:59:07 -07:00
Michal Wajdeczko	400a6da1e9	drm/xe/configfs: Enforce canonical device names While we expect config directory names to match PCI device name, currently we are only scanning provided names for domain, bus, device and function numbers, without checking their format. This would pass slightly broken entries like: /sys/kernel/config/xe/ ├── 0000:00:02.0000000000000 │ └── ... ├── 0000:00:02.0x │ └── ... ├── 0: 0: 2. 0 │ └── ... └── 0:0:2.0 └── ... To avoid such mistakes, check if the name provided exactly matches the canonical PCI device address format, which we recreated from the parsed BDF data. Also simplify scanf format as it can't really catch all formatting errors. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250722141059.30707-3-michal.wajdeczko@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-25 05:59:07 -07:00
Michal Wajdeczko	0bdd05c2a8	drm/xe/configfs: Fix pci_dev reference leak We are using pci_get_domain_bus_and_slot() function to verify if the given config directory name matches any existing PCI device, but we missed to call matching pci_dev_put() to release reference. While around, also change error code in case of no device match, to make it more specific than generic formatting error. Fixes: `16280ded45` ("drm/xe: Add configfs to enable survivability mode") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250722141059.30707-2-michal.wajdeczko@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-25 05:59:07 -07:00
Shuicheng Lin	f98de826b4	drm/xe/hw_engine_group: Avoid call kfree() for drmm_kzalloc() Memory allocated with drmm_kzalloc() should not be freed using kfree(), as it is managed by the DRM subsystem. The memory will be automatically freed when the associated drm_device is released. These 3 group pointers are allocated using drmm_kzalloc() in hw_engine_group_alloc(), so they don't require manual deallocation. Fixes: `6797906074` ("drm/xe/hw_engine_group: Fix potential leak") Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20250724193854.1124510-2-shuicheng.lin@intel.com	2025-07-25 10:01:02 +02:00
Matthew Brost	51330ba66c	drm/xe: Remove unused GT TLB invalidation trace points Remove unused GT TLB invalidation trace points after converting to used GT TLB invalidation jobs. The trace points removed were used during early bring up of unstable driver, with a stable driver no need to replace with new tracepoints. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20250724191216.4076566-8-matthew.brost@intel.com	2025-07-24 18:27:47 -07:00
Matthew Brost	b8d5779eee	drm/xe: Use GT TLB invalidation jobs in PT layer Rather than open-coding GT TLB invalidations in the PT layer, use GT TLB invalidation jobs. The real benefit is that GT TLB invalidation jobs use a single dma-fence context, allowing the generated fences to be squashed in dma-resv/DRM scheduler. v2: - s/;;/; (checkpatch) - Move ijob/mjob job push after range fence install v3: - Remove extra newline (Stuart) - Set ijob/mjob near creation (Stuart) - Add comment back in (Stuart) Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20250724191216.4076566-7-matthew.brost@intel.com	2025-07-24 18:27:47 -07:00
Matthew Brost	dba89840a9	drm/xe: Add GT TLB invalidation jobs Add GT TLB invalidation jobs which issue GT TLB invalidations. Built on top of Xe generic dependency scheduler. v2: - Fix checkpatch v3: - Fix kernel doc in xe_gt_tlb_inval_job_alloc_dep, xe_gt_tlb_inval_job_push - Use IS_ERR_OR_NULL in xe_gt_tlb_inval_job_put - Squash migrate lock / unlock helpers into this patch (Stuart) Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20250724191216.4076566-6-matthew.brost@intel.com	2025-07-24 18:27:22 -07:00
Matthew Brost	535c445eb9	drm/xe: Add dependency scheduler for GT TLB invalidations to bind queues Add a generic dependency scheduler for GT TLB invalidations, used to schedule jobs that issue GT TLB invalidations to bind queues. v2: - Use shared GT TLB invalidation queue for dep scheduler - Break allocation of dep scheduler into its own function - Add define for max number tlb invalidations - Skip media if not present Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20250724191216.4076566-5-matthew.brost@intel.com	2025-07-24 18:25:59 -07:00
Matthew Brost	ada5121948	drm/xe: Create ordered workqueue for GT TLB invalidation jobs No sense to schedule GT TLB invalidation jobs in parallel which target the same GT given these all contend on the same lock, create ordered workqueue for GT TLB invalidation jobs. v3: - Fix type in commmit message (Stuart) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20250724191216.4076566-4-matthew.brost@intel.com	2025-07-24 18:25:58 -07:00
Matthew Brost	69f187d446	drm/xe: Add generic dependecy jobs / scheduler Add generic dependecy jobs / scheduler which serves as wrapper for DRM scheduler. Useful when we want delay a generic operation until a dma-fence signals. Existing use cases could be destroying of resources based fences / dma-resv, the preempt rebind worker, and pipelined GT TLB invalidations. Written in such a way it could be moved to DRM subsystem if needed. v3: - Remove unnecessary cast (Staurt) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20250724191216.4076566-3-matthew.brost@intel.com	2025-07-24 18:25:56 -07:00
Matthew Brost	c3ead4ecfc	drm/xe: Explicitly mark migration queues with flag Rather than inferring if an exec queue is a migration queue for a flag, explicitly mark migration queues with a flag. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20250724191216.4076566-2-matthew.brost@intel.com	2025-07-24 18:25:55 -07:00
Sk Anirban	d72779c29d	drm/xe/ptl: Apply Wa_16026007364 As part of this WA GuC will save and restore value of two XE3_Media control registers that were not included in the HW power context. Signed-off-by: Sk Anirban <sk.anirban@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://lore.kernel.org/r/20250716101622.3421480-2-sk.anirban@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-24 14:01:27 -07:00
Tvrtko Ursulin	a98cdd979c	drm/xe: Use emit_flush_imm_ggtt helper instead of open coding Helper is already there so lets just use it. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250724131711.74291-2-tvrtko.ursulin@igalia.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-24 07:45:40 -07:00
Nitin Gote	a313d9059f	drm/xe: Rename MCFG_MCR_SELECTOR to STEER_SEMAPHORE The register at offset 0xfd0 was incorrectly named MCFG_MCR_SELECTOR, likely copied from i915. According to the hardware specification (Bspec), this register is actually called STEER_SEMAPHORE. Rename the register definition and update its usage in xe_gt_mcr.c to match the official hardware documentation. No functional changes. v2: Add Bspec reference (Tejas) Bspec: 67113 Signed-off-by: Nitin Gote <nitin.r.gote@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Link: https://lore.kernel.org/r/20250723141039.3848390-1-nitin.r.gote@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-24 07:37:45 -07:00
Michal Wajdeczko	159afd92ba	drm/xe/guc: Clear whole g2h_fence during initialization The struct g2h_fence must be explicitly initializated using the g2h_fence_init() function to avoid trash values in its members, but we missed to update this helper function with the new member. To fix that and avoid any future mistakes, memset the whole struct first, then update remaining non-zero members. Fixes: `94de94d24e` ("drm/xe/guc: Cancel ongoing H2G requests when stopping CT") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250723175639.206875-1-michal.wajdeczko@intel.com	2025-07-24 14:48:42 +02:00
Michal Wajdeczko	538b27a09a	drm/xe: Make GGTT TLB invalidation failure message GT oriented GGTT TLB invalidation is performed on the specific GT, thus any failure message shall be also GT specific. And to help investigate any unexpected failures, promote message from warn level to WARN to get full call stack of this unlikely case. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250723133015.206601-1-michal.wajdeczko@intel.com	2025-07-24 14:44:31 +02:00
Michal Wajdeczko	6983ea9cd7	drm/xe: Enable SR-IOV for TGL While we don't have official CI SR-IOV coverage for the Tigerlake platforms, we were using this platform for the feature enabling and Xe driver already has all required changes to support it. Since TGL platforms are guarded by the xe.require_force_probe flag enable SR-IOV feature on them, like we recently did for ADL/ATSM. Suggested-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250722182618.30811-5-michal.wajdeczko@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-23 11:49:49 -07:00
Michal Wajdeczko	2e76103998	drm/xe: Enable SR-IOV for ADL/ATSM We were already testing those two platforms for a while on our CI, but enabling flag (has_sriov) was only available on the topic branch and only for builds with CONFIG_DRM_XE_DEBUG config. Since those two platforms are guarded by the another enabling flag (require_force_probe) and we believe our SR-IOV support for them is at sufficient level to start enjoying the feature, turn on the SR-IOV enabling flag unconditionally. Suggested-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250722182618.30811-4-michal.wajdeczko@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-23 11:49:48 -07:00
Michal Wajdeczko	a2b461bd6f	drm/xe/pf: Enable SR-IOV PF mode by default We already claim official support for SR-IOV PF/VF modes on PTL and BMG platforms, but by default we start the Xe driver on those platforms in non-virtualized mode (native) since we still have max_vfs modparam set to disable creation of the VFs. It's time to let the Xe driver support SR-IOV PF mode by default. We were already testing this on our CI, which was relying on the patch that was enabling it for CONFIG_DRM_XE_DEBUG used by our CI. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250722182618.30811-3-michal.wajdeczko@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-23 11:49:48 -07:00

1 2 3 4 5 ...

1368336 Commits