linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-07 23:20:32 -04:00

Author	SHA1	Message	Date
Michal Wajdeczko	f261f5ddde	drm/xe/debugfs: Don't expose dgfx residencies attributes on VF In addition to checking whether we are running on BATTLEMAGE, we should also check that we are not a VF driver, as VFs can't access the necessary registers, and doing so leads to: \| .. [drm] GT0: VF is trying to read an inaccessible register 0x35b004+0x0 \| RIP: 0010:xe_gt_sriov_vf_read32+0x5e2/0x8a0 [xe] \| Call Trace: \| xe_mmio_read32+0x110/0x280 [xe] \| read_residency_counter+0x42/0xd0 [xe] \| dgfx_pkg_residencies_show+0x115/0x190 [xe] \| .. [drm] Package G2 counter failed to read, ret -19 or \| .. [drm] GT0: VF is trying to read an inaccessible register 0x35b004+0x0 \| RIP: 0010:xe_gt_sriov_vf_read32+0x5e2/0x8a0 [xe] \| Call Trace: \| xe_mmio_read32+0x110/0x280 [xe] \| read_residency_counter+0x42/0xd0 [xe] \| dgfx_pcie_link_residencies_show+0xe7/0x160 [xe] \| .. [drm] PCIE LINK L0 RESIDENCY counter failed to read, ret -19 Similarly, there is no point in exposing inject_csc_hw_error on VFs, as HW error support is already disabled for VFs. Bspec: 53221 Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Soham Purkait <soham.purkait@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250905173625.8398-1-michal.wajdeczko@intel.com	2025-09-09 17:29:10 +02:00
Christophe JAILLET	7b77941724	drm/xe/hwmon: Use devm_mutex_init() Use devm_mutex_init() instead of hand-writing it. This saves some LoC, improves readability and saves some space in the generated .o file. Before: ====== text data bss dec hex filename 36884 10296 64 47244 b88c drivers/gpu/drm/xe/xe_hwmon.o After: ===== text data bss dec hex filename 36651 10224 64 46939 b75b drivers/gpu/drm/xe/xe_hwmon.o Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/989e96369e9e1f8a44b816962917ec76877c912d.1757252520.git.christophe.jaillet@wanadoo.fr Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-09 06:31:13 -07:00
Michal Wajdeczko	e57ae80fe0	drm/xe/debugfs: Make residencies definitions const No need to keep them non-const. Also fix declaration of .name member, as it points to the const string. This translates to: add/remove: 1/0 grow/shrink: 0/2 up/down: 80/-248 (-168) Function old new delta residencies - 80 +80 dgfx_pcie_link_residencies_show 365 263 -102 dgfx_pkg_residencies_show 454 308 -146 Total: Before=2821548, After=2821380, chg -0.01% Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Soham Purkait <soham.purkait@intel.com> Cc: Jonathan Cavitt <jonathan.cavitt@intel.com> Cc: Karthik Poosa <karthik.poosa@intel.com> Cc: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250905180225.8434-1-michal.wajdeczko@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-09 06:05:45 -07:00
Raag Jadav	fce99326c9	drm/xe/i2c: Enable bus mastering Enable bus mastering for I2C controller to support device initiated in-band transactions. Signed-off-by: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Link: https://lore.kernel.org/r/20250908055320.2549722-1-raag.jadav@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-09 06:02:35 -07:00
Michal Wajdeczko	fd548b77d5	drm/xe/vf: Move VF CCS debugfs attribute The VF CCS handling is per-device so its debugfs file should not be exposed on per-GT basis. Move it up to the device level. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Link: https://lore.kernel.org/r/20250908123025.747-8-michal.wajdeczko@intel.com	2025-09-09 11:27:52 +02:00
Michal Wajdeczko	55ddca2a3c	drm/xe/vf: Move VF CCS data to xe_device We only need single set of VF CCS contexts, they are not per-tile as initial implementation might suggest. Move all VF CCS data from xe_tile.sriov.vf to xe_device.sriov.vf. Also rename some structs to align with the usage and fix their kernel-doc. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Link: https://lore.kernel.org/r/20250908123025.747-7-michal.wajdeczko@intel.com	2025-09-09 11:27:50 +02:00
Michal Wajdeczko	e699700834	drm/xe/bo: Add xe_bo_has_valid_ccs_bb helper This will allow us to drop the ugly IS_VF_CCS_BB_VALID macro. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Link: https://lore.kernel.org/r/20250908123025.747-6-michal.wajdeczko@intel.com	2025-09-09 11:27:49 +02:00
Michal Wajdeczko	b179dfd0db	drm/xe/vf: Use single check when calling VF CCS functions All xe_sriov_vf_ccs() functions but init() expect to be called when initialization was successful and CCS handling is ready. Update IS_VF_CCS_READY macro and use it as single entry guard. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Link: https://lore.kernel.org/r/20250908123025.747-5-michal.wajdeczko@intel.com	2025-09-09 11:27:46 +02:00
Michal Wajdeczko	aa8d9d75ea	drm/xe/vf: Drop IS_VF_CCS_INIT_NEEDED macro We only use this macro once and we can open-code it to explicitly show relevant conditions and avoid duplications. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Link: https://lore.kernel.org/r/20250908123025.747-4-michal.wajdeczko@intel.com	2025-09-09 11:27:45 +02:00
Michal Wajdeczko	4e5bc50ad2	drm/xe/guc: Use proper flag definitions when registering context In H2G action context type is specified in flags dword in bits 2:1. Use generic FIELD_PREP macro instead of misleading BIT logic. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250908123025.747-3-michal.wajdeczko@intel.com	2025-09-09 11:27:44 +02:00
Michal Wajdeczko	dd432009f1	drm/xe/guc: Rename xe_guc_register_exec_queue This function is dedicated for use by the VFs, so we shouldn't use a name that might suggest it's general purpose. While there, update asserts to better reflect intended usage. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Link: https://lore.kernel.org/r/20250908123025.747-2-michal.wajdeczko@intel.com	2025-09-09 11:27:42 +02:00
Lucas De Marchi	956f5e5bc8	drm/xe/configfs: Use config_group_put() configfs has a config_group_put() helper that was adopted by commit `88df7939d7` ("drm/xe/configfs: Rename struct xe_config_device"). Another pending work to add psmi later landed in commit `afe902848b` ("drm/xe/configfs: Allow to enable PSMI") and didn't use the helper. Use config_group_put() consistently to hide the inner workings of configfs. No change in behavior since it does exactly the same thing as currently being done. Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://lore.kernel.org/r/20250905162236.578117-2-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-08 07:39:47 -07:00
John Harrison	cf423b928f	drm/xe/guc: Fix badly worded error message If a GuC id lookup failed, the error message was 'Not engine present', which is bad in multiple ways - incorrect English and 'engines' are now called 'exec queues' in this context. So fix it. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com> Link: https://lore.kernel.org/r/20250904195752.3846138-3-John.C.Harrison@Intel.com	2025-09-05 13:31:34 -07:00
John Harrison	0b05857dc1	drm/xe/guc: Clean up of GuC 'CTL' defines All the field generation for the CTL defines (used for GuC init data) were hand-rolled rather than using FIELD_PREP/REG_GENMASK/BIT macros. Also, there were a bunch of macros defined for verbosity settings that were never used. So fix that all up. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250904195752.3846138-2-John.C.Harrison@Intel.com	2025-09-05 13:31:32 -07:00
Julia Filipchuk	6fc957185e	drm/xe: Extend Wa_13011645652 to PTL-H, WCL Expand workaround to additional graphics architectures. Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Cc: Stuart Summers <stuart.summers@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: intel-xe@lists.freedesktop.org Cc: <stable@vger.kernel.org> # v6.17+ Signed-off-by: Julia Filipchuk <julia.filipchuk@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250903190122.1028373-2-julia.filipchuk@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-05 11:54:30 -07:00
Thomas Hellström	599334572a	drm/xe: Block exec and rebind worker while evicting for suspend / hibernate When the xe pm_notifier evicts for suspend / hibernate, there might be racing tasks trying to re-validate again. This can lead to suspend taking excessive time or get stuck in a live-lock. This behaviour becomes much worse with the fix that actually makes re-validation bring back bos to VRAM rather than letting them remain in TT. Prevent that by having exec and the rebind worker waiting for a completion that is set to block by the pm_notifier before suspend and is signaled by the pm_notifier after resume / wakeup. It's probably still possible to craft malicious applications that block suspending. More work is pending to fix that. v3: - Avoid wait_for_completion() in the kernel worker since it could potentially cause work item flushes from freezable processes to wait forever. Instead terminate the rebind workers if needed and re-launch at resume. (Matt Auld) v4: - Fix some bad naming and leftover debug printouts. - Fix kerneldoc. - Use drmm_mutex_init() for the xe->rebind_resume_lock (Matt Auld). - Rework the interface of xe_vm_rebind_resume_worker (Matt Auld). Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4288 Fixes: `c6a4d46ec1` ("drm/xe: evict user memory in PM notifier") Cc: Matthew Auld <matthew.auld@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: <stable@vger.kernel.org> # v6.16+ Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://lore.kernel.org/r/20250904160715.2613-4-thomas.hellstrom@linux.intel.com	2025-09-05 17:03:35 +02:00
Thomas Hellström	ebd546fdff	drm/xe: Allow the pm notifier to continue on failure Its actions are opportunistic anyway and will be completed on device suspend. Marking as a fix to simplify backporting of the fix that follows in the series. v2: - Keep the runtime pm reference over suspend / hibernate and document why. (Matt Auld, Rodrigo Vivi): Fixes: `c6a4d46ec1` ("drm/xe: evict user memory in PM notifier") Cc: Matthew Auld <matthew.auld@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: <stable@vger.kernel.org> # v6.16+ Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://lore.kernel.org/r/20250904160715.2613-3-thomas.hellstrom@linux.intel.com	2025-09-05 17:03:35 +02:00
Thomas Hellström	cb3d7b3b46	drm/xe: Attempt to bring bos back to VRAM after eviction VRAM+TT bos that are evicted from VRAM to TT may remain in TT also after a revalidation following eviction or suspend. This manifests itself as applications becoming sluggish after buffer objects get evicted or after a resume from suspend or hibernation. If the bo supports placement in both VRAM and TT, and we are on DGFX, mark the TT placement as fallback. This means that it is tried only after VRAM + eviction. This flaw has probably been present since the xe module was upstreamed but use a Fixes: commit below where backporting is likely to be simple. For earlier versions we need to open-code the fallback algorithm in the driver. v2: - Remove check for dgfx. (Matthew Auld) - Update the xe_dma_buf kunit test for the new strategy (CI) - Allow dma-buf to pin in current placement (CI) - Make xe_bo_validate() for pinned bos a NOP. Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5995 Fixes: `a78a8da51b` ("drm/ttm: replace busy placement with flags v6") Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: <stable@vger.kernel.org> # v6.9+ Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://lore.kernel.org/r/20250904160715.2613-2-thomas.hellstrom@linux.intel.com	2025-09-05 17:03:35 +02:00
Sanjay Yadav	c4dfa0bea2	drm/xe/migrate: Remove unneeded emit_pte() when copying CCS only In xe_migrate_copy(), when copy_only_ccs is true, we only need two emit_pte() calls one for the BO and one for the raw CCS storage. However, the current implementation issues three emit_pte() calls, resulting in an unnecessary PTE programming job. This fix removes the redundant emit_pte() call to avoid programming the same PTEs twice and reducing overhead during CCS-only migration. v2: Preserve correct behavior on DG2, which requires both CCS and page copies. Signed-off-by: Sanjay Yadav <sanjay.kumar.yadav@intel.com> Suggested-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://lore.kernel.org/r/20250904161423.2448727-1-sanjay.kumar.yadav@intel.com	2025-09-05 13:29:20 +01:00
Michal Wajdeczko	2d1e962098	drm/xe: Fix broken kernel-doc for the struct xe_bo Use correct multi-line kernel-doc style if required. Some members were described only in the commit message. Some other members were described using wrong names. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Link: https://lore.kernel.org/r/20250904144026.7222-1-michal.wajdeczko@intel.com	2025-09-05 13:32:13 +02:00
Michal Wajdeczko	dcc38bc5e1	drm/xe/kunit: Drop xe_wa_test_exit Remove xe_wa_test_exit() as it could crash the KUnit kernel in case of hitting some asserts in xe_wa_test_init() as test->priv could not be pointing to expected data. \| # xe_wa_gt: ASSERTION FAILED at drivers/gpu/drm/xe/tests/xe_wa_test.c:34 \| Expected ret == 0, but \| ret == -19 (0xffffffffffffffed) \|Bus error - the host /dev/shm or /tmp mount likely just ran out of space \|Kernel panic - not syncing: Kernel mode signal 7 Note that there is no need to call drm_kunit_helper_free_device() since our fake device allocated by drm_kunit_helper_alloc_device() will be cleaned up automatically. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250829171922.572-7-michal.wajdeczko@intel.com	2025-09-05 12:57:29 +02:00
Michal Wajdeczko	a9c8517058	drm/xe/kunit: Promote fake platform parameter list The list of all known representative platforms defined in xe_wa could be used in more places by other test suites. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250829171922.572-6-michal.wajdeczko@intel.com	2025-09-05 12:57:27 +02:00
Michal Wajdeczko	ddbe5aecea	drm/xe/kunit: Drop custom struct platform_test_case Custom struct platform_test_case definition in xe_wa is now almost identical to generic struct xe_pci_fake_data definition except the .name member, which could be generated by xe_pci_fake_data_desc(). Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250829171922.572-5-michal.wajdeczko@intel.com	2025-09-05 12:57:26 +02:00
Michal Wajdeczko	b1ee655843	drm/xe/kunit: Introduce xe_pci_fake_data_desc() We already use struct xe_pci_fake_data to provide custom config of the fake PCI device and soon we will be using this struct also as direct parameter for the parameterized Xe KUnit tests. Add function to generate description based on that config data. For platform or subplatform name lookup pciidlist which already have definitions of all supported platforms. Examples: TIGERLAKE TIGERLAKE A0 TIGERLAKE SR-IOV PF ... PANTHERLAKE 30.00(Xe3_LPG) 30.00(Xe3_LPM) PANTHERLAKE 30.00(Xe3_LPG) A0 30.00(Xe3_LPM) A0 PANTHERLAKE 30.00(Xe3_LPG) A0 30.00(Xe3_LPM) A0 SR-IOV VF Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250829171922.572-4-michal.wajdeczko@intel.com	2025-09-05 12:57:25 +02:00
Michal Wajdeczko	42367babd8	drm/xe/kunit: Update struct xe_pci_fake_data step declarations The struct xe_pci_fake_data has fields that specify graphics and media stepping of the fake PCI device used during KUnit testing. Change definitions of those separate step fields and use existing struct xe_step_info definition that already have required fields. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250829171922.572-3-michal.wajdeczko@intel.com	2025-09-05 12:57:23 +02:00
Michal Wajdeczko	981daf1046	drm/xe: Allow to stub lookup for graphics and media IP In upcoming patch we will want to replace lookup code during the test to relax the strict match that we use in production. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250829171922.572-2-michal.wajdeczko@intel.com	2025-09-05 12:57:22 +02:00
Matthew Auld	edb1745fc6	drm/xe: improve dma-resv handling for backup object Since the dma-resv is shared we don't need to reserve and add a fence slot fence twice, plus no need to loop through the dependencies. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://lore.kernel.org/r/20250829164715.720735-2-matthew.auld@intel.com	2025-09-05 11:53:00 +01:00
Matthew Auld	7477c4bd20	drm/xe/pt: unify xe_pt_svm_pre_commit with userptr We now use the same notifier lock for SVM and userptr, with that we can combine xe_pt_userptr_pre_commit and xe_pt_svm_pre_commit. v2: (Matt B) - Re-use xe_svm_notifier_lock/unlock for userptr. - Combine svm/userptr handling further down into op_check_svm_userptr. v3: - Only hide the ops if we lack DRM_GPUSVM, since we also need them for userptr. Suggested-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250828142430.615826-18-matthew.auld@intel.com	2025-09-05 11:45:47 +01:00
Matthew Auld	9e97874148	drm/xe/userptr: replace xe_hmm with gpusvm The goal here is to cut over to gpusvm and remove xe_hmm, relying instead on common code. The core facilities we need are get_pages(), unmap_pages() and free_pages() for a given userptr range, plus a vm level notifier lock, which is now provided by gpusvm. v2: - Reuse the same SVM vm struct we use for full SVM, that way we can use the same lock (Matt B & Himal) v3: - Re-use svm_init/fini for userptr. v4: - Allow building xe without userptr if we are missing DRM_GPUSVM config. (Matt B) - Always make .read_only match xe_vma_read_only() for the ctx. (Dafna) v5: - Fix missing conversion with CONFIG_DRM_XE_USERPTR_INVAL_INJECT v6: - Convert the new user in xe_vm_madise. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Dafna Hirschfeld <dafna.hirschfeld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250828142430.615826-17-matthew.auld@intel.com	2025-09-05 11:45:47 +01:00
Matthew Auld	dd25b995a2	drm/xe/vm: split userptr bits into separate file This will simplify compiling out the bits that depend on DRM_GPUSVM in a later patch. Without this we end up littering the code with ifdef checks, plus it becomes hard to be sure that something won't blow at runtime due to something not being initialised, even though it passed the build. Should be no functional change here. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250828142430.615826-16-matthew.auld@intel.com	2025-09-05 11:45:47 +01:00
Matthew Auld	83f706ecbd	drm/gpusvm: export drm_gpusvm_pages API Export get/unmap/free pages API. We also need to tweak the SVM init to allow skipping much of the unneeded parts. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250828142430.615826-15-matthew.auld@intel.com	2025-09-05 11:45:47 +01:00
Matthew Auld	6364afd532	drm/gpusvm: refactor core API to use pages struct Refactor the core API of get/unmap/free pages to all operate on drm_gpusvm_pages. In the next patch we want to export a simplified core API without needing fully blown svm range etc. Suggested-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250828142430.615826-14-matthew.auld@intel.com	2025-09-05 11:45:46 +01:00
Matthew Auld	f70da6f99d	drm/gpusvm: pull out drm_gpusvm_pages substructure Pull the pages stuff from the svm range into its own substructure, with the idea of having the main pages related routines, like get_pages(), unmap_pages() and free_pages() all operating on some lower level structures, which can then be re-used for stuff like userptr. v2: - Move seq into pages struct (Matt B) v3: - Small kernel-doc fixes Suggested-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250828142430.615826-13-matthew.auld@intel.com	2025-09-05 11:45:46 +01:00
Matthew Auld	ad70e289ed	drm/gpusvm: use more selective dma dir in get_pages() If we are only reading the memory then from the device pov the direction can be DMA_TO_DEVICE. This aligns with the xe-userptr code. Using the most restrictive data direction to represent the access is normally a good idea. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250828142430.615826-12-matthew.auld@intel.com	2025-09-05 11:45:46 +01:00
Matthew Auld	c50729c68a	drm/gpusvm: fix hmm_pfn_to_map_order() usage Handle the case where the hmm range partially covers a huge page (like 2M), otherwise we can potentially end up doing something nasty like mapping memory which is outside the range, and maybe not even mapped by the mm. Fix is based on the xe userptr code, which in a future patch will directly use gpusvm, so needs alignment here. v2: - Add kernel-doc (Matt B) - s/fls/ilog2/ (Thomas) Reported-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://lore.kernel.org/r/20250828142430.615826-11-matthew.auld@intel.com	2025-09-05 11:45:46 +01:00
Harish Chegondi	5952d80514	drm/xe/xe2hpg: Add Wa_18041344222 for Xe2_HPG Add Wa_18041344222 for Xe2_HPG that requires disabling the perf mode for subslice count for eustall sampling when the enabled slices are discontiguous. Bspec: 79483, 56024 Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/b6a631a13a9fb7360e89d679e0797fae42d5a09e.1756855529.git.harish.chegondi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-05 03:02:34 -07:00
Harish Chegondi	6ee8adf124	drm/xe/mcr: Make xe_gt_mcr_get_dss_steering() input gt a const Make gt, input parameter to xe_gt_mcr_get_dss_steering(), a constant. This would allow xe_gt_mcr_get_dss_steering() to be called from functions that have gt as const to struct xe_gt. Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Link: https://lore.kernel.org/r/9dc621a90880f62ac8e2951afea7952277f7eb0e.1756855529.git.harish.chegondi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-05 03:02:34 -07:00
Michal Wajdeczko	3088f485de	drm/xe/configfs: Don't expose survivability_mode if not applicable The survivability_mode attribute is applicable only for DGFX and platforms newer than BATTLEMAGE. Use .is_visible() hook to hide this attribute when above conditions are not met. Remove code that was trying to fix such configuration during the runtime. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20250902131744.5076-4-michal.wajdeczko@intel.com	2025-09-04 22:33:51 +02:00
Michal Wajdeczko	b076d32177	drm/xe/configfs: Prepare to filter-out configfs attributes Implement empty ops.is_visible hook to allow filtering-out any not supported attributes, as not all of them are applicable on all xe platforms. Since during creation of each new configfs directory we are looking for xe device descriptor to validate that xe driver supports given PCI device, store reference to that descriptor to allow later use while doing attribute filtering. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20250902131744.5076-3-michal.wajdeczko@intel.com	2025-09-04 22:32:46 +02:00
Michal Wajdeczko	079a5c83db	drm/xe/configfs: Don't touch survivability_mode on fini This is a user controlled configfs attribute, we should not modify that outside the configfs attr.store() implementation. Fixes: `bc417e54e2` ("drm/xe: Enable configfs support for survivability mode") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250904103521.7130-1-michal.wajdeczko@intel.com	2025-09-04 22:31:26 +02:00
Michal Wajdeczko	2506af5f81	drm/xe/guc: Set upper limit of H2G retries over CTB The GuC communication protocol allows GuC to send NO_RESPONSE_RETRY reply message to indicate that due to some interim condition it can not handle incoming H2G request and the host shall resend it. But in some cases, due to errors, this unsatisfied condition might be final and this could lead to endless retries as it was recently seen on the CI: [drm] GT0: PF: VF1 FLR didn't finish in 5000 ms (-ETIMEDOUT) [drm] GT0: PF: VF1 resource sanitizing failed (-ETIMEDOUT) [drm] GT0: PF: VF1 FLR failed! [drm:guc_ct_send_recv [xe]] GT0: H2G action 0x5503 retrying: reason 0x0 [drm:guc_ct_send_recv [xe]] GT0: H2G action 0x5503 retrying: reason 0x0 [drm:guc_ct_send_recv [xe]] GT0: H2G action 0x5503 retrying: reason 0x0 [drm:guc_ct_send_recv [xe]] GT0: H2G action 0x5503 retrying: reason 0x0 To avoid such dangerous loops allow only limited number of retries (for now 50) and add some delays (n * 5ms) to slow down the rate of resending this repeated request. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com> Link: https://lore.kernel.org/r/20250903223330.6408-1-michal.wajdeczko@intel.com	2025-09-04 22:24:51 +02:00
Michal Wajdeczko	a85ead6d7f	drm/xe/debugfs: Move sa_info from gt to tile directory Our drm-based suballocator is implemented per-tile so it is better to show its debug information also per-tile debugfs directory, not under per-gt directory as it is done today. To allow adding more per-tile attributes, prepare necessary helper functions, like we already did for per-gt or per-uc attributes. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250829201106.1263-1-michal.wajdeczko@intel.com	2025-09-04 12:45:09 +02:00
Himal Prasad Ghimiray	fece859855	drm/xe/vm: Fix error handling in xe_vm_query_vmas_attrs_ioctl() copy_to_user() returns the number of bytes not copied on failure, not a negative error code. Update the logic to return -EFAULT instead of the number of bytes to correctly signal the error. Fixes: `418807860e` ("drm/xe/uapi: Add UAPI for querying VMA count and memory attributes") Cc: Matthew Brost <matthew.brost@intel.com> Cc: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Link: https://lore.kernel.org/r/20250828104933.3839825-3-himal.prasad.ghimiray@intel.com Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>	2025-09-04 11:58:21 +05:30
Himal Prasad Ghimiray	294912f93d	drm/xe: Fix indentation in xe_zap_ptes_in_madvise_range Fix misleading indentation around WRITE_ONCE in pte zap loop. No functional change intended. Fixes: `ada7486c56` ("drm/xe: Implement madvise ioctl for xe") Cc: Matthew Brost <matthew.brost@intel.com> Cc: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Link: https://lore.kernel.org/r/20250828104933.3839825-2-himal.prasad.ghimiray@intel.com Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>	2025-09-04 11:58:21 +05:30
Matthew Brost	4208fac3dc	drm/xe: Add more SVM GT stats Add more SVM GT stats which give visibility to where time is spent in the SVM page fault handler. Stats include number of faults at a given size, total SVM page fault time, migration time in us, copy time in us, copy kb, get pages time in us, and bind time in us. Will help in tuning SVM for performance. v2: - Include local changes v3: - Add tlb invalidation + valid page fault + per size copy size stats v4: - Ensure gt not NULL when incrementing SVM copy stats - Normalize stats names - Use magic macros to generate increment functions for ranges v7: - Use DEF_STAT_STR (Michal) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Francois Dugast <francois.dugast@intel.com> Link: https://lore.kernel.org/r/20250829172232.1308004-3-matthew.brost@intel.com	2025-09-02 22:23:08 -07:00
Matthew Brost	56e6d56885	drm/xe: Add clearing stats to GT debugfs It is helpful to clear GT stats, run a test case that is being profiled, and look at the results of the stats from the individual test case. Make the stats entry writable and clear the stats upon write. v5: - Drop clear_stats debugfs entry (Lucas) v6: - Use xe_gt_stats_clear rather than helper (Michal) - Rework loop in xe_gt_stats_clear (Michal) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20250829172232.1308004-2-matthew.brost@intel.com	2025-09-02 22:23:07 -07:00
Tangudu Tilak Tirumalesh	8d6f16f1f0	drm/xe: Extend Wa_22021007897 to Xe3 platforms WA 22021007897 should also be applied to Graphics Versions 30.00, 30.01 and 30.03. To make it simple, simply use the range [3000, 3003] that should be ok as there isn't a 3002 and if it's added, the WA list would need to be revisited anyway. Cc: Matt Atwood <matthew.s.atwood@intel.com> Cc: Gustavo Sousa <gustavo.sousa@intel.com> Signed-off-by: Tangudu Tilak Tirumalesh <tilak.tirumalesh.tangudu@intel.com> Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com> Link: https://lore.kernel.org/r/20250827-wa-22021007897-v1-1-96922eb52af4@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-02 20:10:44 -07:00
Zhanjun Dong	ad83b1da5b	drm/xe/guc: Increase GuC crash dump buffer size Some platforms already have a maximum dump size of 12KB. To avoid truncating data, increase the GuC crash dump buffer size to 16KB. Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://lore.kernel.org/r/20250829160427.1245732-1-zhanjun.dong@intel.com	2025-09-02 13:57:17 -07:00
Satyanarayana K V P	be5590c384	drm/xe/vf: Enable CCS save/restore only on supported GUC versions CCS save/restore is supported starting with GuC 70.48.0 (compatibility version 1.23.0). Gate the feature on the GuC firmware version and keep it disabled on older or unsupported versions. Fixes: `f3009272ff` ("drm/xe/vf: Create contexts for CCS read write") Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Andi Shyti <andi.shyti@kernel.org> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20250902103256.21658-2-satyanarayana.k.v.p@intel.com	2025-09-02 18:59:17 +02:00
Satyanarayana K V P	ee4b32220a	drm/xe/guc: Add devm release action to safely tear down CT When a buffer object (BO) is allocated with the XE_BO_FLAG_GGTT_INVALIDATE flag, the driver initiates TLB invalidation requests via the CTB mechanism while releasing the BO. However a premature release of the CTB BO can lead to system crashes, as observed in: Oops: Oops: 0000 [#1] SMP NOPTI RIP: 0010:h2g_write+0x2f3/0x7c0 [xe] Call Trace: guc_ct_send_locked+0x8b/0x670 [xe] xe_guc_ct_send_locked+0x19/0x60 [xe] send_tlb_invalidation+0xb4/0x460 [xe] xe_gt_tlb_invalidation_ggtt+0x15e/0x2e0 [xe] ggtt_invalidate_gt_tlb.part.0+0x16/0x90 [xe] ggtt_node_remove+0x110/0x140 [xe] xe_ggtt_node_remove+0x40/0xa0 [xe] xe_ggtt_remove_bo+0x87/0x250 [xe] Introduce a devm-managed release action during xe_guc_ct_init() and xe_guc_ct_init_post_hwconfig() to ensure proper CTB disablement before resource deallocation, preventing the use-after-free scenario. Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Summers Stuart <stuart.summers@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20250901072541.31461-1-satyanarayana.k.v.p@intel.com	2025-09-02 08:21:58 +02:00
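
The retry cap described in the 2506af5f81 entry above (at most 50 retries, with a delay of n * 5ms before the n-th resend) can be sketched in plain userspace C. This is only an illustration of the bounded-retry idea under stated assumptions, not the driver's code: `send_with_bounded_retries`, `reply_t`, `demo_ctx`, `demo_send`, `demo_sleep`, and the `-ELOOP` return value are all hypothetical names and choices made for this sketch, and sleeping is replaced by a callback that merely records the requested delay.

```c
#include <errno.h>

#define MAX_RETRIES    50u  /* retry cap quoted in the commit message */
#define RETRY_DELAY_MS 5u   /* base delay; the n-th retry waits n * 5 ms */

/* Hypothetical reply codes standing in for the firmware's answer. */
typedef enum { REPLY_OK, REPLY_RETRY } reply_t;

typedef reply_t (*send_fn)(void *ctx);
typedef void (*sleep_fn)(unsigned int ms, void *ctx);

/*
 * Send a request, resending while the peer answers "retry".
 * Gives up after MAX_RETRIES resends instead of looping forever.
 * The error code is an assumption made for this sketch.
 */
int send_with_bounded_retries(send_fn send, sleep_fn sleep_ms, void *ctx)
{
	unsigned int retries = 0;

	while (send(ctx) == REPLY_RETRY) {
		if (++retries > MAX_RETRIES)
			return -ELOOP;
		/* Linearly growing delay slows down the resend rate. */
		sleep_ms(retries * RETRY_DELAY_MS, ctx);
	}
	return 0;
}

/* Demo harness: counts sends and accumulates requested delays. */
struct demo_ctx {
	unsigned int succeed_on; /* send number that succeeds; 0 = never */
	unsigned int sends;
	unsigned int slept_ms;
};

reply_t demo_send(void *p)
{
	struct demo_ctx *c = p;

	c->sends++;
	if (c->succeed_on && c->sends >= c->succeed_on)
		return REPLY_OK;
	return REPLY_RETRY;
}

void demo_sleep(unsigned int ms, void *p)
{
	struct demo_ctx *c = p;

	c->slept_ms += ms; /* record instead of actually sleeping */
}
```

With a peer that never stops asking for a retry, this sketch performs the initial send plus 50 resends and then returns an error, having requested 5 * (1 + 2 + ... + 50) = 6375 ms of delay in total.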

1382260 Commits