Fix this warning while building the documentation:
Documentation/gpu/xe/xe_configfs:9: drivers/gpu/drm/xe/xe_configfs.c:138:
WARNING: Definition list ends without a blank line; unexpected unindent.
That also makes it better formatted in the output.
While at it, also fix the underline length in "Overview".
Fixes: e2b33fce5e ("drm/xe/configfs: Improve documentation steps")
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://lore.kernel.org/r/20250911-wa-bb-cmds-v4-2-c8f7e48f7eae@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Refactor: eliminate type casts by using proper u32
declarations.
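As a hedged illustration of the pattern (the helper and parameter
names below are hypothetical, not the actual hwmon code), the idea is
to declare register values as u32 and use mul_u32_u32() from
<linux/math64.h>, which performs a full 32x32->64 bit multiply, so no
(u64) casts of the operands are needed:

  #include <linux/math64.h>
  #include <linux/types.h>

  /* Hypothetical helper: scale a raw u32 register value. */
  static u64 scale_reg_value(u32 val, u32 factor, unsigned int shift)
  {
          /* 32x32->64 multiply without casting the operands to u64. */
          return mul_u32_u32(val, factor) >> shift;
  }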
v2:
- Address review comments. (Karthik)
v3:
- Use the proper u32 type and drop cast. (Lucas De Marchi)
- Modify the variable only when actually using the u64 value.
- Change r value to reg_value with u32 type.
v4:
- Remove newline between trailer and Signed-off-by. (Lucas De Marchi)
- Change reg_val to val for more user-friendly logging.
- Use mul_u32_u32 function since both values are u32.
v5:
- Use the mul_u32_u32 function with a shift. (Lucas De Marchi)
Fixes: 7596d839f6 ("drm/xe/hwmon: Add support to manage power limits though mailbox")
Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250912113458.2815172-1-mallesh.koujalagi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
We already have dedicated helper macros for printing GT-oriented
messages, but we don't have any to print tile-oriented messages,
and we wrongly try to use plain drm or GT-oriented ones instead.
Add tile-oriented printk macros to provide coverage similar to
what we have with the xe_assert() macros. Also add a set of simple
macros for the top-level xe_device, which we could easily tweak to
include extra device-specific info if needed.
Typical output of our printk macros will look like:
[drm] this is xe_WARN()
[drm] *ERROR* this is xe_err()
[drm] *ERROR* this is xe_err_printer()
[drm] this is xe_info()
[drm] this is xe_info_printer()
[drm:printk_demo.cold] this is xe_dbg()
[drm:printk_demo.cold] this is xe_dbg_printer()
[drm] Tile0: this is xe_tile_WARN()
[drm] *ERROR* Tile0: this is xe_tile_err()
[drm] *ERROR* Tile0: this is xe_tile_err_printer()
[drm] Tile0: this is xe_tile_info()
[drm] Tile0: this is xe_tile_info_printer()
[drm:printk_demo.cold] Tile0: this is xe_tile_dbg()
[drm:printk_demo.cold] Tile0: this is xe_tile_dbg_printer()
[drm] Tile0: GT0: this is xe_gt_WARN()
[drm] *ERROR* Tile0: GT0: this is xe_gt_err()
[drm] *ERROR* Tile0: GT0: this is xe_gt_err_printer()
[drm] Tile0: GT0: this is xe_gt_info()
[drm] Tile0: GT0: this is xe_gt_info_printer()
[drm:printk_demo.cold] Tile0: GT0: this is xe_gt_dbg()
[drm:printk_demo.cold] Tile0: GT0: this is xe_gt_dbg_printer()
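A minimal sketch of how such a tile-oriented wrapper can be layered
on top of the drm helpers (assuming the tile struct exposes an id and
a backpointer to the xe device; the actual macro bodies may differ):

  #define xe_tile_printk(_tile, _level, _fmt, ...) \
          drm_##_level(&(_tile)->xe->drm, "Tile%u: " _fmt, \
                       (_tile)->id, ##__VA_ARGS__)

  #define xe_tile_err(_tile, _fmt, ...) \
          xe_tile_printk((_tile), err, _fmt, ##__VA_ARGS__)

  #define xe_tile_dbg(_tile, _fmt, ...) \
          xe_tile_printk((_tile), dbg, _fmt, ##__VA_ARGS__)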
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250909165941.31730-5-michal.wajdeczko@intel.com
All recent platforms (including all the ones officially supported by the
Xe driver) do not allow concurrent execution of RCS and CCS workloads
from different address spaces, with the HW blocking the context switch
when it detects such a scenario.
The DUAL_QUEUE flag helps with this, by causing the GuC to not submit a
context it knows will not be able to execute. This, however, causes a new
problem: if RCS and CCS queues have pending workloads from different
address spaces, the GuC needs to choose which of the two queues to
pick the next workload from. By default, the GuC prioritizes RCS
submissions over CCS ones, which can lead to CCS workloads being
significantly (or completely) starved of execution time.
The driver can tune this by setting a dedicated scheduling policy KLV;
this KLV allows the driver to specify a quantum (in ms) and a ratio
(percentage value between 0 and 100), and the GuC will prioritize the CCS
for that percentage of each quantum.
Given that we want to guarantee enough RCS throughput to avoid missing
frames, we set the yield policy to 20% of each 80ms interval.
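A hedged sketch of how such a KLV could be composed (the key value
and layout below are illustrative; the real definitions come from the
GuC ABI headers):

  #include <linux/bitfield.h>
  #include <linux/bits.h>
  #include <linux/types.h>

  /* Illustrative only; the real key is defined by the GuC ABI. */
  #define RCS_CCS_YIELD_KEY       0x1001
  #define RCS_CCS_YIELD_LEN       2u

  static void emit_ccs_yield_klv(u32 *klv)
  {
          /* 16-bit key and 16-bit data length, then the data dwords. */
          klv[0] = FIELD_PREP(GENMASK(31, 16), RCS_CCS_YIELD_KEY) |
                   FIELD_PREP(GENMASK(15, 0), RCS_CCS_YIELD_LEN);
          klv[1] = 80;    /* quantum in ms */
          klv[2] = 20;    /* CCS ratio, percent of each quantum */
  }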
v2: updated quantum and ratio, improved comment, use xe_guc_submit_disable
in gt_sanitize
Fixes: d9a1ae0d17 ("drm/xe/guc: Enable WA_DUAL_QUEUE for newer platforms")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Tested-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://lore.kernel.org/r/20250905235632.3333247-2-daniele.ceraolospurio@intel.com
This effectively reverts commit 4c3fe5eae4 ("drm/xe/pf: Limit
fair VF LMEM provisioning") since we don't need it any more after
non-contig VRAM allocations were fixed. This allows larger LMEM
auto-provisioning for VFs, so instead of:
[ ] GT0: PF: LMEM available(14096M) fair(1 x 8192M)
[ ] GT0: PF: VF1 provisioned with 8589934592 (8.00 GiB) LMEM
or
[ ] GT0: PF: LMEM available(14096M) fair(2 x 4096M)
[ ] GT0: PF: VF1..VF2 provisioned with 4294967296 (4.00 GiB) LMEM
we may get:
[ ] GT0: PF: LMEM available(14096M) fair(1 x 14096M)
[ ] GT0: PF: VF1 provisioned with 14780727296 (13.8 GiB) LMEM
and
[ ] GT0: PF: LMEM available(14096M) fair(2 x 7048M)
[ ] GT0: PF: VF1..VF2 provisioned with 7390363648 (6.88 GiB) LMEM
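The underlying fair-share math can be sketched as follows (a
simplified model, assuming the reverted limit rounded the share down
to a power of two, which matches the 8192M/4096M values above):

  #include <linux/log2.h>
  #include <linux/math64.h>

  static u64 fair_lmem(u64 available, unsigned int num_vfs, bool limited)
  {
          u64 fair = div_u64(available, num_vfs);

          /* The reverted limit effectively capped the share: */
          return limited ? rounddown_pow_of_two(fair) : fair;
  }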
Fixes: 1e32ffbc9d ("drm/xe/sriov: support non-contig VRAM provisioning")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20250910222439.32869-1-michal.wajdeczko@intel.com
GuC has an interface to set a power profile for the SLPC algorithm.
Base mode is the default and ensures balanced performance; power_saving
mode has conservative up/down thresholds and is suitable for use with
apps that typically need to be power efficient. This will result in
lower GT frequencies, thus consuming less power.
The selected power profile will be displayed in this format:
$ cat power_profile
[base] power_saving
$ echo power_saving > power_profile
$ cat power_profile
base [power_saving]
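A minimal sketch of a show() callback that renders the active profile
in brackets (the attribute wiring and the get_active_profile() helper
are assumptions, not the actual SLPC code):

  #include <linux/device.h>
  #include <linux/sysfs.h>

  static const char * const profiles[] = { "base", "power_saving" };

  int get_active_profile(struct device *dev);     /* hypothetical */

  static ssize_t power_profile_show(struct device *dev,
                                    struct device_attribute *attr,
                                    char *buf)
  {
          int active = get_active_profile(dev);
          int i, len = 0;

          for (i = 0; i < ARRAY_SIZE(profiles); i++)
                  len += sysfs_emit_at(buf, len,
                                       i == active ? "%s[%s]" : "%s%s",
                                       i ? " " : "", profiles[i]);
          len += sysfs_emit_at(buf, len, "\n");
          return len;
  }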
v2: Address review comments (Rodrigo)
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250903232120.390190-1-vinay.belgaumkar@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
A common pattern is to create a locked bo, pin it without mapping
and then unlock it. Add a function to do that, which internally
uses xe_validation_guard().
With that we can remove xe_bo_create_locked_range() and add
exhaustive eviction to stolen, pf_provision_vf_lmem and
psmi_alloc_object.
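For reference, a hedged sketch of the open-coded pattern such a helper
replaces (the function name is a placeholder and error handling is
simplified; the real helper additionally wraps this in
xe_validation_guard()):

  static struct xe_bo *create_pin_unlock(struct xe_device *xe,
                                         struct xe_tile *tile,
                                         struct xe_vm *vm, size_t size,
                                         enum ttm_bo_type type, u32 flags)
  {
          struct xe_bo *bo;
          int err;

          bo = xe_bo_create_locked(xe, tile, vm, size, type, flags);
          if (IS_ERR(bo))
                  return bo;

          err = xe_bo_pin(bo);    /* pin without mapping the bo */
          xe_bo_unlock(bo);
          if (err) {
                  xe_bo_put(bo);
                  return ERR_PTR(err);
          }
          return bo;
  }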
v4:
- New patch after reorganization.
v5:
- Replace DRM_XE_GEM_CPU_CACHING_WB with 0. (CI)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250908101246.65025-13-thomas.hellstrom@linux.intel.com
Introduce an xe_bo_create_pin_map_novm() function that does not
take the drm_exec parameter, to simplify the conversion of many
callsites.
For the rest, ensure that the same drm_exec context that was used
for locking the vm is passed down to validation.
Use xe_validation_guard() where appropriate.
v2:
- Avoid gotos from within xe_validation_guard(). (Matt Brost)
- Break out the change to pf_provision_vf_lmem to a separate
patch.
- Adapt to signature change of xe_validation_guard().
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250908101246.65025-12-thomas.hellstrom@linux.intel.com
Most users of xe_bo_create_pin_map_at() and
xe_bo_create_pin_map_at_aligned() are not using the vm parameter,
and that simplifies conversion. Introduce an
xe_bo_create_pin_map_at_novm() function and make the _aligned()
version static. Use xe_validation_guard() for conversion.
v2:
- Adapt to signature change of xe_validation_guard(). (Matt Brost)
- Fix up documentation.
v4:
- Postpone the change to i915_gem_stolen_insert_node_in_range() to
a later patch.
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250908101246.65025-11-thomas.hellstrom@linux.intel.com
Convert dma-buf migration to XE_PL_TT and dma-buf import to
support exhaustive eviction, using xe_validation_guard().
It seems unlikely that the import would result in an -ENOMEM,
but convert import anyway for completeness.
The dma-buf map_attachment() functionality unfortunately doesn't
support passing a drm_exec, which means that foreign devices
validating a dma-buf that we exported will not participate in the
exhaustive eviction scheme unless they are xeKMD devices.
v2:
- Avoid gotos from within xe_validation_guard(). (Matt Brost)
- Adapt to signature change of xe_validation_guard(). (Matt Brost)
- Remove an unneeded (void)ret. (Matt Brost)
- Fix up an error path.
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250908101246.65025-9-thomas.hellstrom@linux.intel.com
The CPU fault handler may populate bos and migrate them, and in
doing so might interfere with other tasks validating.
Rework the CPU fault handler completely into a fastpath
and a slowpath. The fastpath trylocks only the validation lock
in read-mode. If that fails, there's a fallback to the
slowpath, where we do a full validation transaction.
This mandates open-coding of bo locking, bo idling and
bo populating, but we still call into TTM for fault
finalizing.
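A rough sketch of the fastpath/slowpath split (the lock field and the
helper functions are placeholders; the real handler also open-codes bo
locking, idling and populating as described above):

  /* Placeholders for the real driver plumbing: */
  struct xe_device *vmf_to_xe(struct vm_fault *vmf);
  bool fault_fastpath(struct vm_fault *vmf, vm_fault_t *ret);
  vm_fault_t fault_slowpath(struct vm_fault *vmf);

  static vm_fault_t xe_bo_cpu_fault(struct vm_fault *vmf)
  {
          struct xe_device *xe = vmf_to_xe(vmf);
          vm_fault_t ret;
          bool handled = false;

          /* Fastpath: only trylock the validation lock in read mode. */
          if (down_read_trylock(&xe->val_lock)) { /* assumed field */
                  handled = fault_fastpath(vmf, &ret);
                  up_read(&xe->val_lock);
          }

          /* Fallback: a full validation transaction. */
          if (!handled)
                  ret = fault_slowpath(vmf);

          return ret;
  }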
v2:
- Rework the CPU fault handler to actually take part in
the exhaustive eviction scheme (Matthew Brost).
v3:
- Don't return anything but VM_FAULT_RETRY if we've dropped the
mmap_lock. Not even if a signal is pending.
- Rebase on gpu_madvise() and split out fault migration.
- Wait for idle after migration.
- Check whether the resource manager uses TTs to determine
whether to map the tt or iomem.
- Add a number of asserts.
- Allow passing a ttm_operation_ctx to xe_bo_migrate() so that
it's possible to try non-blocking migration.
- Don't fall through to TTM on migration / population error.
Instead remove the gfp_retry_mayfail in mode 2 where we
must succeed. (Matthew Brost)
v5:
- Don't allow faulting in the imported bo case (Matthew Brost)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250908101246.65025-7-thomas.hellstrom@linux.intel.com
Convert existing drm_exec transactions, like GT pagefault validation,
non-LR exec() IOCTL and the rebind worker to support
exhaustive eviction using the xe_validation_guard().
v2:
- Adapt to signature change in xe_validation_guard() (Matt Brost)
- Avoid gotos from within xe_validation_guard() (Matt Brost)
- Check error return from xe_validation_guard()
v3:
- Rebase on gpu_madvise()
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com> #v1
Link: https://lore.kernel.org/r/20250908101246.65025-6-thomas.hellstrom@linux.intel.com
Introduce a validation wrapper xe_validation_guard() as a helper
intended to be used around drm_exec transactions that perform
validations. Once TTM can handle exhaustive eviction we could
remove this wrapper or make it mostly a NO-OP unless other
functionality is added to it.
Currently the wrapper takes a read lock upon entry and if the
transaction hits an OOM, all locks are released and the
transaction is retried with a write-lock. If all other
validations participate in this scheme, the transaction with
the write lock will be the only transaction validating and
should have access to all available non-pinned memory.
There is currently a problem in that TTM converts -EDEADLK to
-ENOMEM, and with ww_mutex slowpath error injections, we can hit
-ENOMEMs without having actually run out of memory. We abuse
ww_mutex internals to detect such situations until TTM is fixed
to not convert the error code. In the meantime, injecting
ww_mutex slowpath -EDEADLKs is a good way to test
the implementation in the absence of real OOMs.
Just introduce the wrapper in this commit. It will be hooked up
to the driver in following commits.
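Conceptually, the retry scheme can be sketched like this (a
simplification using a plain rwsem and a callback; the real
xe_validation_guard() is a guard-style macro):

  #include <linux/errno.h>
  #include <linux/rwsem.h>

  static int validation_transaction(struct rw_semaphore *lock,
                                    int (*xact)(void *), void *arg)
  {
          int ret;

          down_read(lock);        /* normal case: concurrent validators */
          ret = xact(arg);
          up_read(lock);
          if (ret != -ENOMEM)
                  return ret;

          /*
           * OOM: retry exclusively. With all other validators held
           * off, this transaction can evict all non-pinned memory.
           */
          down_write(lock);
          ret = xact(arg);
          up_write(lock);
          return ret;
  }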
v2:
- Mark class_xe_validation conditional so that the loop is
skipped on initialization error.
- Argument sanitation (Matt Brost)
- Fix conditional execution of xe_validation_ctx_fini()
(Matt Brost)
- Add a no_block mode for upcoming use in the CPU fault handler.
v4:
- Update kerneldoc. (Xe CI).
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250908101246.65025-3-thomas.hellstrom@linux.intel.com
We want all validation (potential backing store allocation) to be part
of a drm_exec transaction. Therefore add a drm_exec pointer argument
to xe_bo_validate() and ___xe_bo_create_locked(). Upcoming patches
will deal with making all (or nearly all) calls to these functions
part of a drm_exec transaction. In the meantime, define special values
of the drm_exec pointer:
XE_VALIDATION_UNIMPLEMENTED: Implementation of the drm_exec transaction
has not been done yet.
XE_VALIDATION_UNSUPPORTED: Some middle layers (dma-buf) don't allow
the drm_exec context to be passed down to map_attachment where
validation takes place.
XE_VALIDATION_OPT_OUT: May be used only for kunit tests where exhaustive
eviction isn't crucial and the ROI of converting those is very
small.
For XE_VALIDATION_UNIMPLEMENTED and XE_VALIDATION_OPT_OUT there is also
a lockdep check that a drm_exec transaction can indeed start at the
location where the macro is expanded. This is to encourage
developers to take this into consideration early in the code
development process.
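The special values can be sketched as sentinel pointers (the encodings
and the helper below are illustrative, not the actual definitions):

  struct drm_exec;

  #define XE_VALIDATION_UNIMPLEMENTED     ((struct drm_exec *)-1)
  #define XE_VALIDATION_UNSUPPORTED       ((struct drm_exec *)-2)
  #define XE_VALIDATION_OPT_OUT           ((struct drm_exec *)-3)

  static inline bool xe_validation_exec_is_real(const struct drm_exec *exec)
  {
          return exec && exec != XE_VALIDATION_UNIMPLEMENTED &&
                 exec != XE_VALIDATION_UNSUPPORTED &&
                 exec != XE_VALIDATION_OPT_OUT;
  }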
v2:
- Fix xe_vm_set_validation_exec() imbalance. Add an assert that
hopefully catches future instances of this (Matt Brost)
v3:
- Extend to psmi_alloc_object
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com> #v3
Link: https://lore.kernel.org/r/20250908101246.65025-2-thomas.hellstrom@linux.intel.com
During the second CT initialization step, known as post_hwconfig, we
want to replace the previously registered CT disable devm action to
make sure it will be invoked prior to releasing the underlying BO.
But to replace this action we don't need to execute it right away,
since we know that CT was disabled prior to that late init step
and we already assert that. Use devm_remove_action() instead to
avoid the extra 'disabling CT' message that could be seen now:
[drm:guc_ct_change_state [xe]] GT0: GuC CT communication channel disabled
...
DEVRES REM ff11000149320940 guc_action_disable_ct (16 bytes)
[drm:guc_ct_change_state [xe]] GT0: GuC CT communication channel disabled
DEVRES ADD ff110001664fc040 guc_action_disable_ct (16 bytes)
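The difference between the two devres helpers can be sketched as below
(the wrapper is a placeholder; guc_action_disable_ct is the callback
visible in the log above):

  #include <linux/device.h>

  void guc_action_disable_ct(void *arg);  /* callback from the log */

  static int replace_ct_disable_action(struct device *dev, void *ct)
  {
          /*
           * devm_release_action() would execute the action now;
           * devm_remove_action() only unregisters it, leaving the
           * already-disabled CT untouched.
           */
          devm_remove_action(dev, guc_action_disable_ct, ct);

          /* Re-register against the lifetime of the new BO. */
          return devm_add_action_or_reset(dev, guc_action_disable_ct, ct);
  }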
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Link: https://lore.kernel.org/r/20250908102053.539-3-michal.wajdeczko@intel.com
On DGFX, during the init_post_hwconfig() step, we are reinitializing
the CTB BO in VRAM and we have to replace the cleanup action to
disable CT communication prior to the release of the underlying BO.
But that introduces some discrepancy between DGFX and iGFX, as for
iGFX we keep the previously added disable-CT action, which would be
called much later during unwind.
To keep the same flow on both types of platforms, always replace the
old cleanup action and register a new one.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Link: https://lore.kernel.org/r/20250908102053.539-2-michal.wajdeczko@intel.com
We currently report an L3 bank mask as part of the GT topology on both
GTs (primary and media) because a copy of the L3 bank fuse register
exists on both GTs (e.g., $gsi_offset + 0x9130 on Xe3). After recent
discussions it's come to light that the only known userspace software
that uses this part of the uapi (the compute UMD and Mesa) only uses the
value reported for the primary GT; the value reported for the media GT
is ignored by both projects, and the media UMDs don't have any use for
L3 information today. Since we always strive to have our uapi match the
specific needs of userspace and not include additional unused baggage,
let's officially drop L3 bank reporting on the media GT going forward
and only keep it around for the primary GT where it actually gets used.
This change will only apply to future platforms (Xe3 and later); even
though it would probably be safe to remove it from Xe1/Xe2 as well, we
don't want to take any chances with changing existing ABI.
Note that we'd already disabled reading/reporting of the L3 bank for the
media GT on PTL in commit 9ab440a9d0 ("drm/xe/ptl: L3bank mask is not
available on the media GT") because it was discovered that the copy of
the fuse registers on the media GT was just reporting a bogus ~0 value
rather than an accurate mask. So this is just extending that PTL
behavior forward to WCL and other future platforms. Note that we're
also free to reinstate this part of the uapi in the future if/when some
new userspace consumer emerges that _does_ have a use for media-specific
L3 bank masks.
Cc: Fei Yang <fei.yang@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://lore.kernel.org/r/20250905215614.796247-3-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
In addition to checking if we are running on BATTLEMAGE,
we should also check that we are not running as a VF driver, as VFs
can't access the necessary registers, and doing so leads to:
| .. [drm] GT0: VF is trying to read an inaccessible register 0x35b004+0x0
| RIP: 0010:xe_gt_sriov_vf_read32+0x5e2/0x8a0 [xe]
| Call Trace:
| xe_mmio_read32+0x110/0x280 [xe]
| read_residency_counter+0x42/0xd0 [xe]
| dgfx_pkg_residencies_show+0x115/0x190 [xe]
| .. [drm] Package G2 counter failed to read, ret -19
or
| .. [drm] GT0: VF is trying to read an inaccessible register 0x35b004+0x0
| RIP: 0010:xe_gt_sriov_vf_read32+0x5e2/0x8a0 [xe]
| Call Trace:
| xe_mmio_read32+0x110/0x280 [xe]
| read_residency_counter+0x42/0xd0 [xe]
| dgfx_pcie_link_residencies_show+0xe7/0x160 [xe]
| .. [drm] PCIE LINK L0 RESIDENCY counter failed to read, ret -19
Similarly, there is no point in exposing inject_csc_hw_error on VFs,
as HW error support is already disabled for VFs.
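A minimal sketch of the combined check (the helper name is a
placeholder; IS_SRIOV_VF() is the driver's existing macro):

  static bool dgfx_residencies_supported(struct xe_device *xe)
  {
          /* VFs can't access the residency counter registers. */
          return xe->info.platform == XE_BATTLEMAGE && !IS_SRIOV_VF(xe);
  }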
Bspec: 53221
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Soham Purkait <soham.purkait@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250905173625.8398-1-michal.wajdeczko@intel.com