linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-02-17 09:00:22 -05:00

Author	SHA1	Message	Date
Badal Nilawar	69ac1bb8fc	drm/xe/xe_late_bind_fw: Reload late binding fw in rpm resume Reload late binding fw during runtime resume. Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250905154953.3974335-7-badal.nilawar@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-18 09:32:01 -07:00
Badal Nilawar	691a54ad94	drm/xe/xe_late_bind_fw: Load late binding firmware Load late binding firmware v2: - s/EAGAIN/EBUSY/ - Flush worker in suspend and driver unload (Daniele) v3: - Use retry interval of 6s, in steps of 200ms, to allow other OS components release MEI CL handle (Sasha) v4: - return -ENODEV if component not added (Daniele) - parse and print status returned by csc v5: - Use payload to check firmware valid (Daniele) - Obtain the RPM reference before scheduling the worker to ensure the device remains awake until the worker completes firmware loading (Rodrigo) v6: - In case of error donot re-attempt fw download (Daniele) v7 (Rodrigo): - Rename of mei structs and callback. Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250905154953.3974335-6-badal.nilawar@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-18 09:32:01 -07:00
Badal Nilawar	45832bf9c1	drm/xe/xe_late_bind_fw: Initialize late binding firmware Search for late binding firmware binaries and populate the meta data of firmware structures. v2 (Daniele): - drm_err if firmware size is more than max pay load size - s/request_firmware/firmware_request_nowarn/ as firmware will not be available for all possible cards v3 (Daniele): - init firmware from within xe_late_bind_init, propagate error - switch late_bind_fw to array to handle multiple firmware types v4 (Daniele): - Alloc payload dynamically, fix nits v6 (Daniele) - %s/MAX_PAYLOAD_SIZE/XE_LB_MAX_PAYLOAD_SIZE/ Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250905154953.3974335-5-badal.nilawar@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-18 09:32:01 -07:00
Badal Nilawar	918bd789d6	drm/xe/xe_late_bind_fw: Introduce xe_late_bind_fw Introduce xe_late_bind_fw to enable firmware loading for the devices, such as the fan controller, during the driver probe. Typically, firmware for such devices are part of IFWI flash image but can be replaced at probe after OEM tuning. This patch binds mei late binding component to enable firmware loading. v2: - Add devm_add_action_or_reset to remove the component (Daniele) - Add INTEL_MEI_GSC check in xe_late_bind_init() (Daniele) v3: - Fail driver probe if late bind initialization fails, add has_late_bind flag (Daniele) v4: - %s/I915_COMPONENT_LATE_BIND/INTEL_COMPONENT_LATE_BIND/ v6: - rebased v7: - rebased - In xe_late_bind_init, use drm_err when returning an error to stop the probe (Lucas) - Use imperative mode in commit message (Lucas) Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250905154953.3974335-4-badal.nilawar@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-18 09:32:00 -07:00
Thomas Hellström	187e16f69d	drm/xe: Work around clang multiple goto-label error When using drm_exec_retry_on_contention(), clang may consider all labels for which we take addresses in a function as potential retry goto targets, although strictly only one is possible. It will then in some situations generate false positive errors. In this case, the compiler, for some architectures, consider the might_lock(&m->job_mutex); as a potential goto target from drm_exec_retry_on_contention(), and errors. Work around that by moving the xe_validate / drm_exec transaction to a separate function. v2: - New commit message based on analysis of Nathan Chancellor Fixes: `59eabff2a3` ("drm/xe: Convert xe_bo_create_pin_map() for exhaustive eviction") Cc: Matthew Brost <matthew.brost@intel.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202509101853.nDmyxTEM-lkp@intel.com/ Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> # build Link: https://lore.kernel.org/r/20250911080324.180307-1-thomas.hellstrom@linux.intel.com	2025-09-18 08:27:00 +02:00
Michal Wajdeczko	fb3c27a69c	drm/xe/sysfs: Simplify sysfs registration Instead of manually maintaining each sysfs file define and use attribute groups and register them using device managed function. Then use is_visible() to filter-out unsupported attributes. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250916170029.3313-3-michal.wajdeczko@intel.com	2025-09-17 21:59:31 +02:00
Michal Wajdeczko	a2d6223d22	drm/xe/vf: Don't expose sysfs attributes not applicable for VFs VFs can't read BMG_PCIE_CAP(0x138340) register nor access PCODE (already guarded by the info.skip_pcode flag) so we shouldn't expose attributes that require any of them to avoid errors like: [] xe 0000:03:00.1: [drm] Tile0: GT0: VF is trying to read an \ inaccessible register 0x138340+0x0 [] RIP: 0010:xe_gt_sriov_vf_read32+0x6c2/0x9a0 [xe] [] Call Trace: [] xe_mmio_read32+0x110/0x280 [xe] [] auto_link_downgrade_capable_show+0x2e/0x70 [xe] [] dev_attr_show+0x1a/0x70 [] sysfs_kf_seq_show+0xaa/0x120 [] kernfs_seq_show+0x41/0x60 Fixes: `0e414bf7ad` ("drm/xe: Expose PCIe link downgrade attributes") Fixes: `cdc36b66cd` ("drm/xe: Expose fan control and voltage regulator version") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250916170029.3313-2-michal.wajdeczko@intel.com	2025-09-17 21:59:29 +02:00
Shuicheng Lin	33fe111a35	drm/xe/madvise: Fix ioctl argument check It is "preferred_mem_loc" instead of "atomic" for the ATTR_PREFERRED_LOC path. Also include 2 minor changes with no functional impact. 1. Remove the redundant "attr.atomic_access" assignment. 2. Replace down_read_interruptible() with xe_svm_notifier_lock_interruptible() to pair with xe_svm_notifier_unlock(). Fixes: `ada7486c56` ("drm/xe: Implement madvise ioctl for xe") Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://lore.kernel.org/r/20250911173139.1405878-2-shuicheng.lin@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2025-09-17 10:09:34 -07:00
Shuicheng Lin	5959c4da17	drm/xe: Misc refine for svm These changes should have no functional impact. 1. Correct typo of "operation"in macro range_debug(). 2. Combine 2 spin_lock() call in xe_svm_garbage_collector() into 1. 3. Drop redundant preferred_region_is_vram check in xe_svm_range_needs_migrate_to_vram(). 4. Combine the devmem_possible check in xe_svm_handle_pagefault(). need_vram includes the IS_DGFX() check, so there is no change for .devmem_only. v2: revert !ctx.devmem_only change (Matt) v3: rebase code and refine commit message. v4: rebase code and refine commit message. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20250911031405.1371812-2-shuicheng.lin@intel.com	2025-09-17 09:50:44 -07:00
Michal Wajdeczko	5bb5258e35	drm/xe/tests: Add pre-GMDID IP descriptors to param generators Recently introduced kunit parameter generators were based on the existing arrays which have only GDMID-based IPs and didn't take into account IP definitions from pre-GMDID era. Add test only arrays with pre-GMDID IPs (as those will not change) and extend param generators to start iterating over them. [ ] =================== xe_pci (2 subtests) ==================== [ ] ==================== check_graphics_ip ==================== [ ] [PASSED] 12.00 Xe_LP [ ] [PASSED] 12.10 Xe_LP+ [ ] [PASSED] 12.55 Xe_HPG [ ] [PASSED] 12.60 Xe_HPC [ ] [PASSED] 12.70 Xe_LPG [ ] [PASSED] 12.71 Xe_LPG [ ] [PASSED] 12.74 Xe_LPG+ [ ] [PASSED] 20.01 Xe2_HPG [ ] [PASSED] 20.02 Xe2_HPG [ ] [PASSED] 20.04 Xe2_LPG [ ] [PASSED] 30.00 Xe3_LPG [ ] [PASSED] 30.01 Xe3_LPG [ ] [PASSED] 30.03 Xe3_LPG [ ] ================ [PASSED] check_graphics_ip ================ [ ] ===================== check_media_ip ====================== [ ] [PASSED] 12.00 Xe_M [ ] [PASSED] 12.55 Xe_HPM [ ] [PASSED] 13.00 Xe_LPM+ [ ] [PASSED] 13.01 Xe2_HPM [ ] [PASSED] 20.00 Xe2_LPM [ ] [PASSED] 30.00 Xe3_LPM [ ] [PASSED] 30.02 Xe3_LPM [ ] ================= [PASSED] check_media_ip ================== [ ] ===================== [PASSED] xe_pci ====================== Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250916171645.3335-1-michal.wajdeczko@intel.com	2025-09-17 15:29:13 +02:00
Daniele Ceraolo Spurio	aaae483657	drm/xe: Allow error injection for xe_pxp_exec_queue_add This will allow us to simulate this function returning an error like we do for other functions called in the exec_queue_create path. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://lore.kernel.org/r/20250909221240.3711023-4-daniele.ceraolospurio@intel.com	2025-09-16 15:54:31 -07:00
Daniele Ceraolo Spurio	626667321d	drm/xe: Fix error handling if PXP fails to start Since the PXP start comes after __xe_exec_queue_init() has completed, we need to cleanup what was done in that function in case of a PXP start error. __xe_exec_queue_init calls the submission backend init() function, so we need to introduce an opposite for that. Unfortunately, while we already have a fini() function pointer, it performs other operations in addition to cleaning up what was done by the init(). Therefore, for clarity, the existing fini() has been renamed to destroy(), while a new fini() has been added to only clean up what was done by the init(), with the latter being called by the former (via xe_exec_queue_fini). Fixes: `72d479601d` ("drm/xe/pxp/uapi: Add userspace and LRC support for PXP-using queues") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://lore.kernel.org/r/20250909221240.3711023-3-daniele.ceraolospurio@intel.com	2025-09-16 15:54:28 -07:00
Yang Li	9e6eb49ec1	drm/xe: Remove duplicate header files Fix some duplicate includes in xe: ./drivers/gpu/drm/xe/xe_tlb_inval.c: xe_tlb_inval.h is included more than once. ./drivers/gpu/drm/xe/xe_pt.c: xe_tlb_inval_job.h is included more than once. While at it, also sort the include lines alphabetically. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=24705 Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=24706 Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> [Reword commit message] Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250916021039.1632766-1-yang.lee@linux.alibaba.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-16 12:47:40 -07:00
John Harrison	3b09b11805	drm/xe/guc: Return an error code if the GuC load fails Due to multiple explosion issues in the early days of the Xe driver, the GuC load was hacked to never return a failure. That prevented kernel panics and such initially, but now all it achieves is creating more confusing errors when the driver tries to submit commands to a GuC it already knows is not there. So fix that up. As a stop-gap and to help with debug of load failures due to invalid GuC init params, a wedge call had been added to the inner GuC load function. The reason being that it leaves the GuC log accessible via debugfs. However, for an end user, simply aborting the module load is much cleaner than wedging and trying to continue. The wedge blocks user submissions but it seems that various bits of the driver itself still try to submit to a dead GuC and lots of subsequent errors occur. And with regards to developers debugging why their particular code change is being rejected by the GuC, it is trivial to either add the wedge back in and hack the return code to zero again or to just do a GuC log dump to dmesg. v2: Add support for error injection testing and drop the now redundant wedge call. CC: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com> Link: https://lore.kernel.org/r/20250909224132.536320-1-John.C.Harrison@Intel.com	2025-09-16 12:11:08 -07:00
Zongyao Bai	1a869168d9	drm/xe/sysfs: Add cleanup action in xe_device_sysfs_init On partial failure, some sysfs files created before the failure might not be removed. Add common cleanup step to remove them all immediately, as is should be harmless to attempt to remove non-existing files. Fixes: `0e414bf7ad` ("drm/xe: Expose PCIe link downgrade attributes") Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Stuart Summers <stuart.summers@intel.com> Cc: Shuicheng Lin <shuicheng.lin@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Zongyao Bai <zongyao.bai@intel.com> Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250915214716.1327379-2-zongyao.bai@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-16 07:59:36 -07:00
John Harrison	456b32c9c1	drm/xe/guc: Add test for G2G communications Add a test for sending messages from every GuC to every other GuC to test G2G communications. Note that, being a debug only feature, the test interface only exists in pre-production builds of the GuC firmware. v2: Fix 'default' case to actually use the driver's registration code as well as allocation. Add comments explaining the different test types. Fix (C) date and an assert. Review feedback from Daniele. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://lore.kernel.org/r/20250910210237.603576-5-John.C.Harrison@Intel.com	2025-09-15 09:53:26 -07:00
John Harrison	537773db91	drm/xe: Allow freeing of a managed bo If a bo is created via xe_managed_bo_create_pin_map() then it cannot be freed by the driver using xe_bo_unpin_map_no_vm(), or indeed any other existing function. The DRM layer will still have a pointer stashed away for later freeing, causing a invalid memory access on driver unload. So add a helper for releasing the DRM action as well. v2: Drop 'xe' parameter (review feedbak from Michal W) Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://lore.kernel.org/r/20250910210237.603576-4-John.C.Harrison@Intel.com	2025-09-15 09:53:25 -07:00
John Harrison	acf01c79f0	drm/xe/guc: Add firmware build type to available info Some test features are not available in production builds of the GuC firmware. So add the build type field to the available information that tests can inspect to decide if they should skip or run. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://lore.kernel.org/r/20250910210237.603576-3-John.C.Harrison@Intel.com	2025-09-15 09:53:23 -07:00
John Harrison	7d0ca56e91	drm/xe/guc: Update CSS header structures Rework the CSS header structure according to recent updates to the GuC API spec. Also include more field definitions. v2: Also pass the new GuC specific structure to a GuC specific function instead of the higher level, generic structure (review feedback from Daniele). Also correct naming of CSS_TIME_* fields. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://lore.kernel.org/r/20250910210237.603576-2-John.C.Harrison@Intel.com	2025-09-15 09:53:21 -07:00
Fushuai Wang	84afb84bcc	drm/xe: Use ERR_CAST instead of ERR_PTR(PTR_ERR(...)) Use ERR_CAST inline function instead of ERR_PTR(PTR_ERR(...)). Signed-off-by: Fushuai Wang <wangfushuai@baidu.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250914101630.17719-1-wangfushuai@baidu.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-15 08:32:51 -07:00
Lucas De Marchi	19baa830fb	drm/xe: Use ARRAY_SIZE in guc_waklv_init() Prefer using ARRAY_SIZE where needed and just passing 1 instead of calculating the size of one element. Cc: Jonathan Cavitt <jonathan.cavitt@intel.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202508130158.eogeBZQT-lkp@intel.com/ Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250912-guc-ads-array-size-v1-1-a6555392a1f8@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-15 07:50:53 -07:00
Dan Carpenter	75cc23ffe5	drm/xe: Fix a NULL vs IS_ERR() in xe_vm_add_compute_exec_queue() The xe_preempt_fence_create() function returns error pointers. It never returns NULL. Update the error checking to match. Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/aJTMBdX97cof_009@stanley.mountain Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-09-15 08:33:19 -04:00
Nitin Gote	d4c3ed963e	drm/xe: defer free of NVM auxiliary container to device release callback Do not kfree the intel_dg_nvm_dev in xe_nvm_fini() right after auxiliary_device_delete/uninit. The auxiliary_device embeds the device/kobject (and its name); freeing it too early can race with asynchronous device_del/udev processing and cause a use-after-free. Signed-off-by: Nitin Gote <nitin.r.gote@intel.com> Fixes: `c28bfb107d` ("drm/xe/nvm: add on-die non-volatile memory device") Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250911052823.226696-1-nitin.r.gote@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-12 12:02:23 -07:00
Lucas De Marchi	2ec2945625	drm/xe/configfs: Fix documentation warning Fix this warning while building the documentation: Documentation/gpu/xe/xe_configfs:9: drivers/gpu/drm/xe/xe_configfs.c:138: WARNING: Definition list ends without a blank line; unexpected unindent. That also makes it better formatted in the output. While at it, also fix the underline length in "Overview". Fixes: `e2b33fce5e` ("drm/xe/configfs: Improve documentation steps") Reviewed-by: Raag Jadav <raag.jadav@intel.com> Link: https://lore.kernel.org/r/20250911-wa-bb-cmds-v4-2-c8f7e48f7eae@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-12 09:36:43 -07:00
Lucas De Marchi	c34f9868df	drm/xe: Update workaround documentation Bring it up to reality, better documenting the existing batch buffers, OOB rules and fixing some typos. Bspec: 60122 Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20250911-wa-bb-cmds-v4-1-c8f7e48f7eae@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-12 09:36:43 -07:00
Mallesh Koujalagi	4e1d3b5e64	drm/xe/hwmon: Remove type casting Refactor: eliminate type casts by using proper u32 declarations. v2: - Address review comments. (Karthik) v3: - Use the proper u32 type and drop cast. (Lucas De Marchi) - Modify variable when actually using u64 value. - Change r value to reg_value with u32 type. v4: - Remove newline between trailer and Signed-off-by. (Lucas De Marchi) - Change reg_val to val for more user-friendly logging. - Use mul_u32_u32 function since both values are u32. v5: - mul_u32_u32 function with shift. (Lucas De Marchi) Fixes: `7596d839f6` ("drm/xe/hwmon: Add support to manage power limits though mailbox") Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250912113458.2815172-1-mallesh.koujalagi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-12 07:36:50 -07:00
Colin Ian King	9e0b0fd531	drm/xe/guc: Fix spelling mistake "sheduling" -> "scheduling" There is a spelling mistake in a xe_gt_err error message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20250912074330.1275279-1-colin.i.king@gmail.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-12 07:20:32 -07:00
Harish Chegondi	2a810401aa	drm/xe/xe3: Extend Wa_18041344222 to graphics IP versions 30.00 and 30.01 Apply WA 18041344222 to Xe3 LPG graphics IP versions 30.00 and 30.01 too. Bspec: 56024 Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Matt Atwood <matthew.s.atwood@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/7368f8059013424ac94f4a01c23f9c98a37b06dc.1757552915.git.harish.chegondi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-12 07:20:32 -07:00
Rodrigo Vivi	fed1a9d60f	drm/xe: Fix circular locking dependency Fix this: ====================================================== WARNING: possible circular locking dependency detected 6.17.0-rc4-lgci-xe-xe-pw-153723v2+ #1 Tainted: G S U ------------------------------------------------------ xe_pm/11324 is trying to acquire lock: ffff8881085f22a0 (&pc->freq_lock){+.+.}-{3:3}, at: xe_guc_pc_start+0x39f/0xf70 [xe] but task is already holding lock: ffffffffa1020420 (xe_rpm_nod3cold_map){+.+.}-{0:0}, at: xe_rpm_lockmap_acquire+0x1a/0x70 [xe] which lock already depends on the new lock. Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(xe_rpm_nod3cold_map); lock(&pc->freq_lock); lock(xe_rpm_nod3cold_map); lock(&pc->freq_lock); Reported-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6122 Fixes: `60d2b78991` ("drm/xe/guc: Add SLPC power profile interface") Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Tested-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://lore.kernel.org/r/20250911212024.966757-2-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-09-12 09:38:08 -04:00
Michal Wajdeczko	01ecf00463	drm/xe: Use tile-oriented messages in GGTT code Use recently added macros to print tile-oriented messages. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250909165941.31730-6-michal.wajdeczko@intel.com	2025-09-12 12:23:59 +02:00
Michal Wajdeczko	48a8659cd5	drm/xe: Add dedicated printk macros for tile and device We already have dedicated helper macros for printing GT-oriented messages but we don't have any to print messages that are tile oriented and we wrongly try to use plain drm or GT-oriented ones. Add tile-oriented printk messages and to provide similar coverage as we have with xe_assert() macros. Also add set of simple macros for the top level xe_device, which we could easily tweak to include extra device specific info if needed. Typical output of our printk macros will look like: [drm] this is xe_WARN() [drm] ERROR this is xe_err() [drm] ERROR this is xe_err_printer() [drm] this is xe_info() [drm] this is xe_info_printer() [drm:printk_demo.cold] this is xe_dbg() [drm:printk_demo.cold] this is xe_dbg_printer() [drm] Tile0: this is xe_tile_WARN() [drm] ERROR Tile0: this is xe_tile_err() [drm] ERROR Tile0: this is xe_tile_err_printer() [drm] Tile0: this is xe_tile_info() [drm] Tile0: this is xe_tile_info_printer() [drm:printk_demo.cold] Tile0: this is xe_tile_dbg() [drm:printk_demo.cold] Tile0: this is xe_tile_dbg_printer() [drm] Tile0: GT0: this is xe_gt_WARN() [drm] ERROR Tile0: GT0: this is xe_gt_err() [drm] ERROR Tile0: GT0: this is xe_gt_err_printer() [drm] Tile0: GT0: this is xe_gt_info() [drm] Tile0: GT0: this is xe_gt_info_printer() [drm:printk_demo.cold] Tile0: GT0: this is xe_gt_dbg() [drm:printk_demo.cold] Tile0: GT0: this is xe_gt_dbg_printer() Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250909165941.31730-5-michal.wajdeczko@intel.com	2025-09-12 12:23:57 +02:00
Michal Wajdeczko	efd54b0cff	drm/xe: Prepare format for GT-oriented messages in one place To avoid code duplication (and thus potential mistakes) and to allow easier changes (if needed) of the prefix format of the GT-oriented messages, prepare that prefix in dedicated macro. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250909165941.31730-4-michal.wajdeczko@intel.com	2025-09-12 12:23:56 +02:00
Michal Wajdeczko	a2dc39fb1c	drm/xe: Drop "gt_" prefix from xe_gt_WARN() macros Those WARN messages will already include GT-specific "GT%u:" prefix so there is no point to include additional "gt_" prefix. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250909165941.31730-3-michal.wajdeczko@intel.com	2025-09-12 12:23:55 +02:00
Michal Wajdeczko	edffa93a93	drm/xe: Keep xe_gt_err() macro definitions together There is no need to keep them separated. No functional changes. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250909165941.31730-2-michal.wajdeczko@intel.com	2025-09-12 12:23:53 +02:00
Daniele Ceraolo Spurio	8843444843	drm/xe/guc: Set RCS/CCS yield policy All recent platforms (including all the ones officially supported by the Xe driver) do not allow concurrent execution of RCS and CCS workloads from different address spaces, with the HW blocking the context switch when it detects such a scenario. The DUAL_QUEUE flag helps with this, by causing the GuC to not submit a context it knows will not be able to execute. This, however, causes a new problem: if RCS and CCS queues have pending workloads from different address spaces, the GuC needs to choose from which of the 2 queues to pick the next workload to execute. By default, the GuC prioritizes RCS submissions over CCS ones, which can lead to CCS workloads being significantly (or completely) starved of execution time. The driver can tune this by setting a dedicated scheduling policy KLV; this KLV allows the driver to specify a quantum (in ms) and a ratio (percentage value between 0 and 100), and the GuC will prioritize the CCS for that percentage of each quantum. Given that we want to guarantee enough RCS throughput to avoid missing frames, we set the yield policy to 20% of each 80ms interval. v2: updated quantum and ratio, improved comment, use xe_guc_submit_disable in gt_sanitize Fixes: `d9a1ae0d17` ("drm/xe/guc: Enable WA_DUAL_QUEUE for newer platforms") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Tested-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://lore.kernel.org/r/20250905235632.3333247-2-daniele.ceraolospurio@intel.com	2025-09-11 09:45:35 -07:00
Michal Wajdeczko	95c1cfa306	drm/xe/pf: Drop rounddown_pow_of_two fair LMEM limitation This effectively reverts commit `4c3fe5eae4` ("drm/xe/pf: Limit fair VF LMEM provisioning") since we don't need it any more after non-contig VRAM allocations were fixed. This allows larger LMEM auto-provisioning for VFs, so instead: [ ] GT0: PF: LMEM available(14096M) fair(1 x 8192M) [ ] GT0: PF: VF1 provisioned with 8589934592 (8.00 GiB) LMEM or [ ] GT0: PF: LMEM available(14096M) fair(2 x 4096M) [ ] GT0: PF: VF1..VF2 provisioned with 4294967296 (4.00 GiB) LMEM we may get: [ ] GT0: PF: LMEM available(14096M) fair(1 x 14096M) [ ] GT0: PF: VF1 provisioned with 14780727296 (13.8 GiB) LMEM and [ ] GT0: PF: LMEM available(14096M) fair(2 x 7048M) [ ] GT0: PF: VF1..VF2 provisioned with 7390363648 (6.88 GiB) LMEM Fixes: `1e32ffbc9d` ("drm/xe/sriov: support non-contig VRAM provisioning") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://lore.kernel.org/r/20250910222439.32869-1-michal.wajdeczko@intel.com	2025-09-11 18:20:57 +02:00
Varun Gupta	010629e00d	drm/xe: Fix driver reference in FLR comment Rectify the reference of i915 to Xe in a comment. v2: Cosmetic changes. (Karthik) v3: Rephrased the commit message. (Karthik) Signed-off-by: Varun Gupta <varun.gupta@intel.com> Reviewed-by: Karthik Poosa <karthik.poosa@intel.com> Link: https://lore.kernel.org/r/20250911111712.811524-1-varun.gupta@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-11 08:00:33 -07:00
Vinay Belgaumkar	60d2b78991	drm/xe/guc: Add SLPC power profile interface GuC has an interface to set a power profile for the SLPC algorithm. Base mode is default and ensures a balanced performance, power_saving mode has conservative up/down thresholds and is suitable for use with apps that typically need to be power efficient. This will result in lower GT frequencies, thus consuming lower power. Selected power profile will be displayed in this format: $ cat power_profile [base] power_saving $ echo power_saving > power_profile $ cat power_profile base [power_saving] v2: Address review comments (Rodrigo) Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250903232120.390190-1-vinay.belgaumkar@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-09-11 08:45:05 -04:00
Thomas Hellström	692a480243	drm/xe: Fix uninitialized return values clang warned about two uninitialized variables used as return values in the exhaustive eviction series. Fix those. Fixes: `1f1541720f` ("drm/xe: Rework instances of variants of xe_bo_create_locked()") Fixes: `7bcb6e38c1` ("drm/xe/display: Convert __xe_pin_fb_vma()") Cc: Matthew Brost <matthew.brost@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Link: https://lore.kernel.org/r/20250910151128.49693-1-thomas.hellstrom@linux.intel.com	2025-09-11 08:45:12 +02:00
Shuicheng Lin	b98775bca9	drm/xe/tile: Release kobject for the failure path Call kobject_put() for the failure path to release the kobject v2: remove extra newline. (Matt) Fixes: `e3d0839aa5` ("drm/xe/tile: Abort driver load for sysfs creation failure") Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://lore.kernel.org/r/20250819153950.2973344-2-shuicheng.lin@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-10 21:00:10 -07:00
Thomas Hellström	844150c255	drm/xe: Convert pinned suspend eviction for exhaustive eviction Pinned suspend eviction and preparation for eviction validates system memory for eviction buffers. Do that under a validation exclusive lock to avoid interfering with other processes validating system graphics memory. v2: - Avoid gotos from within xe_validation_guard(). - Adapt to signature change of xe_validation_guard(). Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250908101246.65025-14-thomas.hellstrom@linux.intel.com	2025-09-10 09:16:10 +02:00
Thomas Hellström	1f1541720f	drm/xe: Rework instances of variants of xe_bo_create_locked() A common pattern is to create a locked bo, pin it without mapping and then unlock it. Add a function to do that, which internally uses xe_validation_guard(). With that we can remove xe_bo_create_locked_range() and add exhaustive eviction to stolen, pf_provision_vf_lmem and psmi_alloc_object. v4: - New patch after reorganization. v5: - Replace DRM_XE_GEM_CPU_CACHING_WB with 0. (CI) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250908101246.65025-13-thomas.hellstrom@linux.intel.com	2025-09-10 09:16:09 +02:00
Thomas Hellström	59eabff2a3	drm/xe: Convert xe_bo_create_pin_map() for exhaustive eviction Introduce an xe_bo_create_pin_map_novm() function that does not take the drm_exec paramenter to simplify the conversion of many callsites. For the rest, ensure that the same drm_exec context that was used for locking the vm is passed down to validation. Use xe_validation_guard() where appropriate. v2: - Avoid gotos from within xe_validation_guard(). (Matt Brost) - Break out the change to pf_provision_vf_lmem8 to a separate patch. - Adapt to signature change of xe_validation_guard(). Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250908101246.65025-12-thomas.hellstrom@linux.intel.com	2025-09-10 09:16:06 +02:00
Thomas Hellström	e6108eade1	drm/xe: Convert xe_bo_create_pin_map_at() for exhaustive eviction Most users of xe_bo_create_pin_map_at() and xe_bo_create_pin_map_at_aligned() are not using the vm parameter, and that simplifies conversion. Introduce an xe_bo_create_pin_map_at_novm() function and make the _aligned() version static. Use xe_validation_guard() for conversion. v2: - Adapt to signature change of xe_validation_guard(). (Matt Brost) - Fix up documentation. v4: - Postpone the change to i915_gem_stolen_insert_node_in_range() to a later patch. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250908101246.65025-11-thomas.hellstrom@linux.intel.com	2025-09-10 09:16:05 +02:00
Thomas Hellström	550a42a8da	drm/xe: Rename ___xe_bo_create_locked() Don't start external function names with underscores. Rename to xe_bo_init_locked(). Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250908101246.65025-10-thomas.hellstrom@linux.intel.com	2025-09-10 09:16:04 +02:00
Thomas Hellström	eb289a5f6c	drm/xe: Convert xe_dma_buf.c for exhaustive eviction Convert dma-buf migration to XE_PL_TT and dma-buf import to support exhaustive eviction, using xe_validation_guard(). It seems unlikely that the import would result in an -ENOMEM, but convert import anyway for completeness. The dma-buf map_attachment() functionality unfortunately doesn't support passing a drm_exec, which means that foreign devices validating a dma-buf that we exported will not, unless they are xeKMD devices, participate in the exhaustive eviction scheme. v2: - Avoid gotos from within xe_validation_guard(). (Matt Brost) - Adapt to signature change of xe_validation_guard(). (Matt Brost) - Remove an unneeded (void)ret. (Matt Brost) - Fix up an error path. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250908101246.65025-9-thomas.hellstrom@linux.intel.com	2025-09-10 09:16:03 +02:00
Thomas Hellström	7bcb6e38c1	drm/xe/display: Convert __xe_pin_fb_vma() Convert __xe_pin_fb_vma() for exhaustive eviction using xe_validation_guard(). v2: - Avoid gotos from within xe_validation_guard(). (Matt Brost) - Adapt to signature change of xe_validation_guard(). (Matt Brost) - Use interruptible waiting, since xe_bo_migrate() already does that. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250908101246.65025-8-thomas.hellstrom@linux.intel.com	2025-09-10 09:16:02 +02:00
Thomas Hellström	c2ae94cf8c	drm/xe: Convert the CPU fault handler for exhaustive eviction The CPU fault handler may populate bos and migrate, and in doing so might interfere with other tasks validating. Rework the CPU fault handler completely into a fastpath and a slowpath. The fastpath trylocks only the validation lock in read-mode. If that fails, there's a fallback to the slowpath, where we do a full validation transaction. This mandates open-coding of bo locking, bo idling and bo populating, but we still call into TTM for fault finalizing. v2: - Rework the CPU fault handler to actually take part in the exhaustive eviction scheme (Matthew Brost). v3: - Don't return anything but VM_FAULT_RETRY if we've dropped the mmap_lock. Not even if a signal is pending. - Rebase on gpu_madvise() and split out fault migration. - Wait for idle after migration. - Check whether the resource manager uses tts to determine whether to map the tt or iomem. - Add a number of asserts. - Allow passing a ttm_operation_ctx to xe_bo_migrate() so that it's possible to try non-blocking migration. - Don't fall through to TTM on migration / population error Instead remove the gfp_retry_mayfail in mode 2 where we must succeed. (Matthew Brost) v5: - Don't allow faulting in the imported bo case (Matthew Brost) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthews Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250908101246.65025-7-thomas.hellstrom@linux.intel.com	2025-09-10 09:16:01 +02:00
Thomas Hellström	8f25e5abcb	drm/xe: Convert existing drm_exec transactions for exhaustive eviction Convert existing drm_exec transactions, like GT pagefault validation, non-LR exec() IOCTL and the rebind worker to support exhaustive eviction using the xe_validation_guard(). v2: - Adapt to signature change in xe_validation_guard() (Matt Brost) - Avoid gotos from within xe_validation_guard() (Matt Brost) - Check error return from xe_validation_guard() v3: - Rebase on gpu_madvise() Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> #v1 Link: https://lore.kernel.org/r/20250908101246.65025-6-thomas.hellstrom@linux.intel.com	2025-09-10 09:16:00 +02:00
Thomas Hellström	1710cd5c8c	drm/xe: Convert SVM validation for exhaustive eviction Convert SVM validation to support exhaustive eviction, using xe_validation_guard(). v2: - Wrap also xe_vm_range_rebind (Matt Brost) - Adapt to argument changes of xe_validation_guard(). v5: - Rebase on SVM stats. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250908101246.65025-5-thomas.hellstrom@linux.intel.com	2025-09-10 09:15:59 +02:00

1 2 3 4 5 ...

117192 Commits