linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-02-16 14:15:02 -05:00

Author	SHA1	Message	Date
Lucas De Marchi	c9dfd66cb9	drm/xe/lrc: Allow INDIRECT_CTX for more engine classes Currently it's only allowed for render and compute. Going forward we want to enable it for more engine classes. Let the XE_LRC_FLAG_INDIRECT_CTX flag (and thus gt_engine_needs_indirect_ctx()) be the deciding factor for its availability. While at it, add the missing const to rcs_funcs array. Since CTX_INDIRECT_CTX_OFFSET_DEFAULT already matches the HW default and gt_engine_needs_indirect_ctx() only ever enables it for rcs/ccs, there is no change in behavior, it's only preparation for future use case. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250916-wa-bb-cmds-v5-5-306bddbc15da@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-18 14:20:39 -07:00
Lucas De Marchi	39ac06f700	drm/xe/configfs: Add post context restore bb Allow the user to specify commands to execute during a context restore. Currently it's possible to parse 2 types of actions: - cmd: the instructions are added as is to the bb - reg: just use the address and value, without worrying about encoding the right LRI instruction. This is possibly the most useful use case, so added a dedicated action for that. This also prepares for future BBs: mid context restore and rc6 context restore that can re-use the same parsing functions. Reviewed-by: Raag Jadav <raag.jadav@intel.com> Link: https://lore.kernel.org/r/20250916-wa-bb-cmds-v5-4-306bddbc15da@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-18 14:20:39 -07:00
Lucas De Marchi	6c6988c5e0	drm/xe/lrc: Allow to add user commands on context switch During validation it's useful to allow additional commands to be executed on context switch. Fetch the commands from configfs (to be added) and add them to the WA BB. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250916-wa-bb-cmds-v5-3-306bddbc15da@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-18 14:20:39 -07:00
Lucas De Marchi	e2a9854d80	drm/xe/configfs: Allow to select by class only For a future configfs attribute, it's desirable to select by engine mask only as the instance doesn't make sense. Rename the function lookup_engine_mask() to lookup_engine_info() and make it return the entry. This allows parse_engine() to still return an item if the caller wants to allow parsing a class-only string like "rcs", "bcs", "ccs", etc. Reviewed-by: Raag Jadav <raag.jadav@intel.com> Link: https://lore.kernel.org/r/20250916-wa-bb-cmds-v5-2-306bddbc15da@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-18 14:20:39 -07:00
Lucas De Marchi	7166cc3a6a	drm/xe/configfs: Extract function to parse engine Move the part that copies the engine to a local buffer so it can be shared in future for other configfs attributes parsing an engine. Reviewed-by: Raag Jadav <raag.jadav@intel.com> Link: https://lore.kernel.org/r/20250916-wa-bb-cmds-v5-1-306bddbc15da@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-18 14:20:38 -07:00
Badal Nilawar	efa29317a5	drm/xe/xe_late_bind_fw: Extract and print version info Extract and print version info of the late binding binary. v2: Some refinements (Daniele) Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250905154953.3974335-10-badal.nilawar@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-18 09:32:01 -07:00
Badal Nilawar	67de7982d5	drm/xe/xe_late_bind_fw: Introduce debug fs node to disable late binding Introduce a debug filesystem node to disable late binding fw reload during the system or runtime resume. This is intended for situations where the late binding fw needs to be loaded from user mode, particularly for validation purposes. Note that xe kmd doesn't participate in late binding flow from user space. A binary loaded from user space will be lost upon entering D3 cold, so the user space app needs to handle this situation. v2: - s/(uval == 1) ? true : false/!!uval/ (Daniele) v3: - Refine the commit message (Daniele) Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250905154953.3974335-9-badal.nilawar@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-18 09:32:01 -07:00
Badal Nilawar	02f52f6d92	drm/xe/xe_late_bind_fw: Reload late binding fw during system resume Reload late binding fw during resume from system suspend v2: - Unconditionally reload late binding fw (Rodrigo) - Flush worker during system suspend Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250905154953.3974335-8-badal.nilawar@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-18 09:32:01 -07:00
Badal Nilawar	69ac1bb8fc	drm/xe/xe_late_bind_fw: Reload late binding fw in rpm resume Reload late binding fw during runtime resume. Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250905154953.3974335-7-badal.nilawar@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-18 09:32:01 -07:00
Badal Nilawar	691a54ad94	drm/xe/xe_late_bind_fw: Load late binding firmware Load late binding firmware v2: - s/EAGAIN/EBUSY/ - Flush worker in suspend and driver unload (Daniele) v3: - Use retry interval of 6s, in steps of 200ms, to allow other OS components to release the MEI CL handle (Sasha) v4: - return -ENODEV if component not added (Daniele) - parse and print status returned by csc v5: - Use payload to check firmware valid (Daniele) - Obtain the RPM reference before scheduling the worker to ensure the device remains awake until the worker completes firmware loading (Rodrigo) v6: - In case of error do not re-attempt fw download (Daniele) v7 (Rodrigo): - Rename of mei structs and callback. Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250905154953.3974335-6-badal.nilawar@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-18 09:32:01 -07:00
Badal Nilawar	45832bf9c1	drm/xe/xe_late_bind_fw: Initialize late binding firmware Search for late binding firmware binaries and populate the meta data of firmware structures. v2 (Daniele): - drm_err if firmware size is more than max pay load size - s/request_firmware/firmware_request_nowarn/ as firmware will not be available for all possible cards v3 (Daniele): - init firmware from within xe_late_bind_init, propagate error - switch late_bind_fw to array to handle multiple firmware types v4 (Daniele): - Alloc payload dynamically, fix nits v6 (Daniele) - %s/MAX_PAYLOAD_SIZE/XE_LB_MAX_PAYLOAD_SIZE/ Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250905154953.3974335-5-badal.nilawar@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-18 09:32:01 -07:00
Badal Nilawar	918bd789d6	drm/xe/xe_late_bind_fw: Introduce xe_late_bind_fw Introduce xe_late_bind_fw to enable firmware loading for the devices, such as the fan controller, during the driver probe. Typically, firmware for such devices are part of IFWI flash image but can be replaced at probe after OEM tuning. This patch binds mei late binding component to enable firmware loading. v2: - Add devm_add_action_or_reset to remove the component (Daniele) - Add INTEL_MEI_GSC check in xe_late_bind_init() (Daniele) v3: - Fail driver probe if late bind initialization fails, add has_late_bind flag (Daniele) v4: - %s/I915_COMPONENT_LATE_BIND/INTEL_COMPONENT_LATE_BIND/ v6: - rebased v7: - rebased - In xe_late_bind_init, use drm_err when returning an error to stop the probe (Lucas) - Use imperative mode in commit message (Lucas) Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250905154953.3974335-4-badal.nilawar@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-18 09:32:00 -07:00
Alexander Usyskin	741eeabb7c	mei: late_bind: add late binding component driver Introduce a new MEI client driver to support Late Binding firmware upload/update for Intel discrete graphics platforms. Late Binding is a runtime firmware upload/update mechanism that allows payloads, such as fan control and voltage regulator, to be securely delivered and applied without requiring SPI flash updates or system reboots. This driver enables the Xe graphics driver and other user-space tools to push such firmware blobs to the authentication firmware via the MEI interface. The driver handles authentication, versioning, and communication with the authentication firmware, which in turn coordinates with the PUnit/PCODE to apply the payload. This is a foundational component for enabling dynamic, secure, and re-entrant configuration updates on platforms like Battlemage. Cc: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20250905154953.3974335-3-badal.nilawar@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-18 09:32:00 -07:00
Alexander Usyskin	8d5b7009aa	mei: bus: add mei_cldev_mtu interface Add a new helper function that allows MEI client drivers to query the maximum transmission unit (MTU) for a connected MEI client. This is useful for clients that need to transmit large payloads, such as firmware blobs, allowing them to determine the maximum message size that can be safely sent before starting transmission and size of the buffer to allocate when receiving data. Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20250905154953.3974335-2-badal.nilawar@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-18 09:31:50 -07:00
Thomas Hellström	187e16f69d	drm/xe: Work around clang multiple goto-label error When using drm_exec_retry_on_contention(), clang may consider all labels for which we take addresses in a function as potential retry goto targets, although strictly only one is possible. It will then in some situations generate false positive errors. In this case, the compiler, for some architectures, considers the might_lock(&m->job_mutex); as a potential goto target from drm_exec_retry_on_contention(), and errors. Work around that by moving the xe_validate / drm_exec transaction to a separate function. v2: - New commit message based on analysis of Nathan Chancellor Fixes: `59eabff2a3` ("drm/xe: Convert xe_bo_create_pin_map() for exhaustive eviction") Cc: Matthew Brost <matthew.brost@intel.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202509101853.nDmyxTEM-lkp@intel.com/ Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> # build Link: https://lore.kernel.org/r/20250911080324.180307-1-thomas.hellstrom@linux.intel.com	2025-09-18 08:27:00 +02:00
Michal Wajdeczko	fb3c27a69c	drm/xe/sysfs: Simplify sysfs registration Instead of manually maintaining each sysfs file define and use attribute groups and register them using device managed function. Then use is_visible() to filter-out unsupported attributes. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250916170029.3313-3-michal.wajdeczko@intel.com	2025-09-17 21:59:31 +02:00
Michal Wajdeczko	a2d6223d22	drm/xe/vf: Don't expose sysfs attributes not applicable for VFs VFs can't read BMG_PCIE_CAP(0x138340) register nor access PCODE (already guarded by the info.skip_pcode flag) so we shouldn't expose attributes that require any of them to avoid errors like: [] xe 0000:03:00.1: [drm] Tile0: GT0: VF is trying to read an \ inaccessible register 0x138340+0x0 [] RIP: 0010:xe_gt_sriov_vf_read32+0x6c2/0x9a0 [xe] [] Call Trace: [] xe_mmio_read32+0x110/0x280 [xe] [] auto_link_downgrade_capable_show+0x2e/0x70 [xe] [] dev_attr_show+0x1a/0x70 [] sysfs_kf_seq_show+0xaa/0x120 [] kernfs_seq_show+0x41/0x60 Fixes: `0e414bf7ad` ("drm/xe: Expose PCIe link downgrade attributes") Fixes: `cdc36b66cd` ("drm/xe: Expose fan control and voltage regulator version") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250916170029.3313-2-michal.wajdeczko@intel.com	2025-09-17 21:59:29 +02:00
Shuicheng Lin	33fe111a35	drm/xe/madvise: Fix ioctl argument check It is "preferred_mem_loc" instead of "atomic" for the ATTR_PREFERRED_LOC path. Also include 2 minor changes with no functional impact. 1. Remove the redundant "attr.atomic_access" assignment. 2. Replace down_read_interruptible() with xe_svm_notifier_lock_interruptible() to pair with xe_svm_notifier_unlock(). Fixes: `ada7486c56` ("drm/xe: Implement madvise ioctl for xe") Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://lore.kernel.org/r/20250911173139.1405878-2-shuicheng.lin@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2025-09-17 10:09:34 -07:00
Shuicheng Lin	5959c4da17	drm/xe: Misc refine for svm These changes should have no functional impact. 1. Correct typo of "operation" in macro range_debug(). 2. Combine 2 spin_lock() calls in xe_svm_garbage_collector() into 1. 3. Drop redundant preferred_region_is_vram check in xe_svm_range_needs_migrate_to_vram(). 4. Combine the devmem_possible check in xe_svm_handle_pagefault(). need_vram includes the IS_DGFX() check, so there is no change for .devmem_only. v2: revert !ctx.devmem_only change (Matt) v3: rebase code and refine commit message. v4: rebase code and refine commit message. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20250911031405.1371812-2-shuicheng.lin@intel.com	2025-09-17 09:50:44 -07:00
Michal Wajdeczko	5bb5258e35	drm/xe/tests: Add pre-GMDID IP descriptors to param generators Recently introduced kunit parameter generators were based on the existing arrays which have only GMDID-based IPs and didn't take into account IP definitions from pre-GMDID era. Add test only arrays with pre-GMDID IPs (as those will not change) and extend param generators to start iterating over them. [ ] =================== xe_pci (2 subtests) ==================== [ ] ==================== check_graphics_ip ==================== [ ] [PASSED] 12.00 Xe_LP [ ] [PASSED] 12.10 Xe_LP+ [ ] [PASSED] 12.55 Xe_HPG [ ] [PASSED] 12.60 Xe_HPC [ ] [PASSED] 12.70 Xe_LPG [ ] [PASSED] 12.71 Xe_LPG [ ] [PASSED] 12.74 Xe_LPG+ [ ] [PASSED] 20.01 Xe2_HPG [ ] [PASSED] 20.02 Xe2_HPG [ ] [PASSED] 20.04 Xe2_LPG [ ] [PASSED] 30.00 Xe3_LPG [ ] [PASSED] 30.01 Xe3_LPG [ ] [PASSED] 30.03 Xe3_LPG [ ] ================ [PASSED] check_graphics_ip ================ [ ] ===================== check_media_ip ====================== [ ] [PASSED] 12.00 Xe_M [ ] [PASSED] 12.55 Xe_HPM [ ] [PASSED] 13.00 Xe_LPM+ [ ] [PASSED] 13.01 Xe2_HPM [ ] [PASSED] 20.00 Xe2_LPM [ ] [PASSED] 30.00 Xe3_LPM [ ] [PASSED] 30.02 Xe3_LPM [ ] ================= [PASSED] check_media_ip ================== [ ] ===================== [PASSED] xe_pci ====================== Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250916171645.3335-1-michal.wajdeczko@intel.com	2025-09-17 15:29:13 +02:00
Daniele Ceraolo Spurio	aaae483657	drm/xe: Allow error injection for xe_pxp_exec_queue_add This will allow us to simulate this function returning an error like we do for other functions called in the exec_queue_create path. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://lore.kernel.org/r/20250909221240.3711023-4-daniele.ceraolospurio@intel.com	2025-09-16 15:54:31 -07:00
Daniele Ceraolo Spurio	626667321d	drm/xe: Fix error handling if PXP fails to start Since the PXP start comes after __xe_exec_queue_init() has completed, we need to cleanup what was done in that function in case of a PXP start error. __xe_exec_queue_init calls the submission backend init() function, so we need to introduce an opposite for that. Unfortunately, while we already have a fini() function pointer, it performs other operations in addition to cleaning up what was done by the init(). Therefore, for clarity, the existing fini() has been renamed to destroy(), while a new fini() has been added to only clean up what was done by the init(), with the latter being called by the former (via xe_exec_queue_fini). Fixes: `72d479601d` ("drm/xe/pxp/uapi: Add userspace and LRC support for PXP-using queues") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://lore.kernel.org/r/20250909221240.3711023-3-daniele.ceraolospurio@intel.com	2025-09-16 15:54:28 -07:00
Yang Li	9e6eb49ec1	drm/xe: Remove duplicate header files Fix some duplicate includes in xe: ./drivers/gpu/drm/xe/xe_tlb_inval.c: xe_tlb_inval.h is included more than once. ./drivers/gpu/drm/xe/xe_pt.c: xe_tlb_inval_job.h is included more than once. While at it, also sort the include lines alphabetically. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=24705 Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=24706 Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> [Reword commit message] Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250916021039.1632766-1-yang.lee@linux.alibaba.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-16 12:47:40 -07:00
John Harrison	3b09b11805	drm/xe/guc: Return an error code if the GuC load fails Due to multiple explosion issues in the early days of the Xe driver, the GuC load was hacked to never return a failure. That prevented kernel panics and such initially, but now all it achieves is creating more confusing errors when the driver tries to submit commands to a GuC it already knows is not there. So fix that up. As a stop-gap and to help with debug of load failures due to invalid GuC init params, a wedge call had been added to the inner GuC load function. The reason being that it leaves the GuC log accessible via debugfs. However, for an end user, simply aborting the module load is much cleaner than wedging and trying to continue. The wedge blocks user submissions but it seems that various bits of the driver itself still try to submit to a dead GuC and lots of subsequent errors occur. And with regards to developers debugging why their particular code change is being rejected by the GuC, it is trivial to either add the wedge back in and hack the return code to zero again or to just do a GuC log dump to dmesg. v2: Add support for error injection testing and drop the now redundant wedge call. CC: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com> Link: https://lore.kernel.org/r/20250909224132.536320-1-John.C.Harrison@Intel.com	2025-09-16 12:11:08 -07:00
Zongyao Bai	1a869168d9	drm/xe/sysfs: Add cleanup action in xe_device_sysfs_init On partial failure, some sysfs files created before the failure might not be removed. Add common cleanup step to remove them all immediately, as it should be harmless to attempt to remove non-existing files. Fixes: `0e414bf7ad` ("drm/xe: Expose PCIe link downgrade attributes") Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Stuart Summers <stuart.summers@intel.com> Cc: Shuicheng Lin <shuicheng.lin@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Zongyao Bai <zongyao.bai@intel.com> Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250915214716.1327379-2-zongyao.bai@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-16 07:59:36 -07:00
John Harrison	456b32c9c1	drm/xe/guc: Add test for G2G communications Add a test for sending messages from every GuC to every other GuC to test G2G communications. Note that, being a debug only feature, the test interface only exists in pre-production builds of the GuC firmware. v2: Fix 'default' case to actually use the driver's registration code as well as allocation. Add comments explaining the different test types. Fix (C) date and an assert. Review feedback from Daniele. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://lore.kernel.org/r/20250910210237.603576-5-John.C.Harrison@Intel.com	2025-09-15 09:53:26 -07:00
John Harrison	537773db91	drm/xe: Allow freeing of a managed bo If a bo is created via xe_managed_bo_create_pin_map() then it cannot be freed by the driver using xe_bo_unpin_map_no_vm(), or indeed any other existing function. The DRM layer will still have a pointer stashed away for later freeing, causing an invalid memory access on driver unload. So add a helper for releasing the DRM action as well. v2: Drop 'xe' parameter (review feedback from Michal W) Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://lore.kernel.org/r/20250910210237.603576-4-John.C.Harrison@Intel.com	2025-09-15 09:53:25 -07:00
John Harrison	acf01c79f0	drm/xe/guc: Add firmware build type to available info Some test features are not available in production builds of the GuC firmware. So add the build type field to the available information that tests can inspect to decide if they should skip or run. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://lore.kernel.org/r/20250910210237.603576-3-John.C.Harrison@Intel.com	2025-09-15 09:53:23 -07:00
John Harrison	7d0ca56e91	drm/xe/guc: Update CSS header structures Rework the CSS header structure according to recent updates to the GuC API spec. Also include more field definitions. v2: Also pass the new GuC specific structure to a GuC specific function instead of the higher level, generic structure (review feedback from Daniele). Also correct naming of CSS_TIME_* fields. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://lore.kernel.org/r/20250910210237.603576-2-John.C.Harrison@Intel.com	2025-09-15 09:53:21 -07:00
Fushuai Wang	84afb84bcc	drm/xe: Use ERR_CAST instead of ERR_PTR(PTR_ERR(...)) Use ERR_CAST inline function instead of ERR_PTR(PTR_ERR(...)). Signed-off-by: Fushuai Wang <wangfushuai@baidu.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250914101630.17719-1-wangfushuai@baidu.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-15 08:32:51 -07:00
Lucas De Marchi	19baa830fb	drm/xe: Use ARRAY_SIZE in guc_waklv_init() Prefer using ARRAY_SIZE where needed and just passing 1 instead of calculating the size of one element. Cc: Jonathan Cavitt <jonathan.cavitt@intel.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202508130158.eogeBZQT-lkp@intel.com/ Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250912-guc-ads-array-size-v1-1-a6555392a1f8@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-15 07:50:53 -07:00
Dan Carpenter	75cc23ffe5	drm/xe: Fix a NULL vs IS_ERR() in xe_vm_add_compute_exec_queue() The xe_preempt_fence_create() function returns error pointers. It never returns NULL. Update the error checking to match. Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/aJTMBdX97cof_009@stanley.mountain Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-09-15 08:33:19 -04:00
Nitin Gote	d4c3ed963e	drm/xe: defer free of NVM auxiliary container to device release callback Do not kfree the intel_dg_nvm_dev in xe_nvm_fini() right after auxiliary_device_delete/uninit. The auxiliary_device embeds the device/kobject (and its name); freeing it too early can race with asynchronous device_del/udev processing and cause a use-after-free. Signed-off-by: Nitin Gote <nitin.r.gote@intel.com> Fixes: `c28bfb107d` ("drm/xe/nvm: add on-die non-volatile memory device") Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250911052823.226696-1-nitin.r.gote@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-12 12:02:23 -07:00
Lucas De Marchi	2ec2945625	drm/xe/configfs: Fix documentation warning Fix this warning while building the documentation: Documentation/gpu/xe/xe_configfs:9: drivers/gpu/drm/xe/xe_configfs.c:138: WARNING: Definition list ends without a blank line; unexpected unindent. That also makes it better formatted in the output. While at it, also fix the underline length in "Overview". Fixes: `e2b33fce5e` ("drm/xe/configfs: Improve documentation steps") Reviewed-by: Raag Jadav <raag.jadav@intel.com> Link: https://lore.kernel.org/r/20250911-wa-bb-cmds-v4-2-c8f7e48f7eae@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-12 09:36:43 -07:00
Lucas De Marchi	c34f9868df	drm/xe: Update workaround documentation Bring it up to reality, better documenting the existing batch buffers, OOB rules and fixing some typos. Bspec: 60122 Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20250911-wa-bb-cmds-v4-1-c8f7e48f7eae@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-12 09:36:43 -07:00
Mallesh Koujalagi	4e1d3b5e64	drm/xe/hwmon: Remove type casting Refactor: eliminate type casts by using proper u32 declarations. v2: - Address review comments. (Karthik) v3: - Use the proper u32 type and drop cast. (Lucas De Marchi) - Modify variable when actually using u64 value. - Change r value to reg_value with u32 type. v4: - Remove newline between trailer and Signed-off-by. (Lucas De Marchi) - Change reg_val to val for more user-friendly logging. - Use mul_u32_u32 function since both values are u32. v5: - mul_u32_u32 function with shift. (Lucas De Marchi) Fixes: `7596d839f6` ("drm/xe/hwmon: Add support to manage power limits though mailbox") Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250912113458.2815172-1-mallesh.koujalagi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-12 07:36:50 -07:00
Colin Ian King	9e0b0fd531	drm/xe/guc: Fix spelling mistake "sheduling" -> "scheduling" There is a spelling mistake in a xe_gt_err error message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20250912074330.1275279-1-colin.i.king@gmail.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-12 07:20:32 -07:00
Harish Chegondi	2a810401aa	drm/xe/xe3: Extend Wa_18041344222 to graphics IP versions 30.00 and 30.01 Apply WA 18041344222 to Xe3 LPG graphics IP versions 30.00 and 30.01 too. Bspec: 56024 Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Matt Atwood <matthew.s.atwood@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/7368f8059013424ac94f4a01c23f9c98a37b06dc.1757552915.git.harish.chegondi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-12 07:20:32 -07:00
Rodrigo Vivi	fed1a9d60f	drm/xe: Fix circular locking dependency Fix this: ====================================================== WARNING: possible circular locking dependency detected 6.17.0-rc4-lgci-xe-xe-pw-153723v2+ #1 Tainted: G S U ------------------------------------------------------ xe_pm/11324 is trying to acquire lock: ffff8881085f22a0 (&pc->freq_lock){+.+.}-{3:3}, at: xe_guc_pc_start+0x39f/0xf70 [xe] but task is already holding lock: ffffffffa1020420 (xe_rpm_nod3cold_map){+.+.}-{0:0}, at: xe_rpm_lockmap_acquire+0x1a/0x70 [xe] which lock already depends on the new lock. Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(xe_rpm_nod3cold_map); lock(&pc->freq_lock); lock(xe_rpm_nod3cold_map); lock(&pc->freq_lock); Reported-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6122 Fixes: `60d2b78991` ("drm/xe/guc: Add SLPC power profile interface") Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Tested-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://lore.kernel.org/r/20250911212024.966757-2-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-09-12 09:38:08 -04:00
Michal Wajdeczko	01ecf00463	drm/xe: Use tile-oriented messages in GGTT code Use recently added macros to print tile-oriented messages. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250909165941.31730-6-michal.wajdeczko@intel.com	2025-09-12 12:23:59 +02:00
Michal Wajdeczko	48a8659cd5	drm/xe: Add dedicated printk macros for tile and device We already have dedicated helper macros for printing GT-oriented messages, but we have none for tile-oriented messages, so we wrongly use plain drm or GT-oriented ones instead. Add tile-oriented printk macros to provide coverage similar to what we have with the xe_assert() macros. Also add a set of simple macros for the top-level xe_device, which we could easily tweak to include extra device-specific info if needed. Typical output of our printk macros will look like: [drm] this is xe_WARN() [drm] ERROR this is xe_err() [drm] ERROR this is xe_err_printer() [drm] this is xe_info() [drm] this is xe_info_printer() [drm:printk_demo.cold] this is xe_dbg() [drm:printk_demo.cold] this is xe_dbg_printer() [drm] Tile0: this is xe_tile_WARN() [drm] ERROR Tile0: this is xe_tile_err() [drm] ERROR Tile0: this is xe_tile_err_printer() [drm] Tile0: this is xe_tile_info() [drm] Tile0: this is xe_tile_info_printer() [drm:printk_demo.cold] Tile0: this is xe_tile_dbg() [drm:printk_demo.cold] Tile0: this is xe_tile_dbg_printer() [drm] Tile0: GT0: this is xe_gt_WARN() [drm] ERROR Tile0: GT0: this is xe_gt_err() [drm] ERROR Tile0: GT0: this is xe_gt_err_printer() [drm] Tile0: GT0: this is xe_gt_info() [drm] Tile0: GT0: this is xe_gt_info_printer() [drm:printk_demo.cold] Tile0: GT0: this is xe_gt_dbg() [drm:printk_demo.cold] Tile0: GT0: this is xe_gt_dbg_printer() Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250909165941.31730-5-michal.wajdeczko@intel.com	2025-09-12 12:23:57 +02:00
Michal Wajdeczko	efd54b0cff	drm/xe: Prepare format for GT-oriented messages in one place To avoid code duplication (and thus potential mistakes) and to allow easier changes (if needed) of the prefix format of the GT-oriented messages, prepare that prefix in dedicated macro. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250909165941.31730-4-michal.wajdeczko@intel.com	2025-09-12 12:23:56 +02:00
Michal Wajdeczko	a2dc39fb1c	drm/xe: Drop "gt_" prefix from xe_gt_WARN() macros Those WARN messages will already include GT-specific "GT%u:" prefix so there is no point to include additional "gt_" prefix. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250909165941.31730-3-michal.wajdeczko@intel.com	2025-09-12 12:23:55 +02:00
Michal Wajdeczko	edffa93a93	drm/xe: Keep xe_gt_err() macro definitions together There is no need to keep them separated. No functional changes. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250909165941.31730-2-michal.wajdeczko@intel.com	2025-09-12 12:23:53 +02:00
Daniele Ceraolo Spurio	8843444843	drm/xe/guc: Set RCS/CCS yield policy All recent platforms (including all the ones officially supported by the Xe driver) do not allow concurrent execution of RCS and CCS workloads from different address spaces, with the HW blocking the context switch when it detects such a scenario. The DUAL_QUEUE flag helps with this, by causing the GuC to not submit a context it knows will not be able to execute. This, however, causes a new problem: if RCS and CCS queues have pending workloads from different address spaces, the GuC needs to choose from which of the 2 queues to pick the next workload to execute. By default, the GuC prioritizes RCS submissions over CCS ones, which can lead to CCS workloads being significantly (or completely) starved of execution time. The driver can tune this by setting a dedicated scheduling policy KLV; this KLV allows the driver to specify a quantum (in ms) and a ratio (percentage value between 0 and 100), and the GuC will prioritize the CCS for that percentage of each quantum. Given that we want to guarantee enough RCS throughput to avoid missing frames, we set the yield policy to 20% of each 80ms interval. v2: updated quantum and ratio, improved comment, use xe_guc_submit_disable in gt_sanitize Fixes: `d9a1ae0d17` ("drm/xe/guc: Enable WA_DUAL_QUEUE for newer platforms") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Tested-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://lore.kernel.org/r/20250905235632.3333247-2-daniele.ceraolospurio@intel.com	2025-09-11 09:45:35 -07:00
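As a rough illustration of the yield policy described in the commit above, the following Python sketch computes how long the CCS would be prioritized in each quantum, assuming the ratio simply scales the quantum. The function name `ccs_priority_window_ms` is hypothetical; it is not part of the Xe driver or the GuC interface.

```python
def ccs_priority_window_ms(quantum_ms: int, ratio_percent: int) -> float:
    """Time (in ms) of each quantum during which the CCS is prioritized.

    Assumption: the ratio is a plain percentage of the quantum, as the
    commit message describes it. Hypothetical helper, not driver code.
    """
    if not 0 <= ratio_percent <= 100:
        raise ValueError("ratio must be a percentage between 0 and 100")
    return quantum_ms * ratio_percent / 100


# Values taken from the commit message: 20% of each 80 ms interval.
window = ccs_priority_window_ms(80, 20)
print(f"CCS prioritized for {window:g} ms of each 80 ms quantum")
```

With these values the RCS keeps priority for the remaining 64 ms of each quantum, which matches the stated goal of protecting RCS throughput.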
Michal Wajdeczko	95c1cfa306	drm/xe/pf: Drop rounddown_pow_of_two fair LMEM limitation This effectively reverts commit `4c3fe5eae4` ("drm/xe/pf: Limit fair VF LMEM provisioning") since we don't need it any more after non-contig VRAM allocations were fixed. This allows larger LMEM auto-provisioning for VFs, so instead: [ ] GT0: PF: LMEM available(14096M) fair(1 x 8192M) [ ] GT0: PF: VF1 provisioned with 8589934592 (8.00 GiB) LMEM or [ ] GT0: PF: LMEM available(14096M) fair(2 x 4096M) [ ] GT0: PF: VF1..VF2 provisioned with 4294967296 (4.00 GiB) LMEM we may get: [ ] GT0: PF: LMEM available(14096M) fair(1 x 14096M) [ ] GT0: PF: VF1 provisioned with 14780727296 (13.8 GiB) LMEM and [ ] GT0: PF: LMEM available(14096M) fair(2 x 7048M) [ ] GT0: PF: VF1..VF2 provisioned with 7390363648 (6.88 GiB) LMEM Fixes: `1e32ffbc9d` ("drm/xe/sriov: support non-contig VRAM provisioning") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://lore.kernel.org/r/20250910222439.32869-1-michal.wajdeczko@intel.com	2025-09-11 18:20:57 +02:00
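The figures quoted in the commit above can be reproduced with a short Python sketch. It assumes the fair share is the available LMEM divided evenly among the VFs, with the old behavior additionally rounding down to a power of two. The helper names `fair_lmem_mib` and `rounddown_pow_of_two` are illustrative only, and the real driver may apply further alignment that is not modeled here.

```python
MIB = 1 << 20  # bytes per MiB


def rounddown_pow_of_two(n: int) -> int:
    """Largest power of two that is <= n (n must be positive)."""
    return 1 << (n.bit_length() - 1)


def fair_lmem_mib(available_mib: int, num_vfs: int, limit_pow2: bool) -> int:
    """Fair per-VF LMEM share in MiB (hypothetical model, not driver code)."""
    fair = available_mib // num_vfs
    return rounddown_pow_of_two(fair) if limit_pow2 else fair


# 14096M available, as in the commit message.
for vfs in (1, 2):
    old = fair_lmem_mib(14096, vfs, limit_pow2=True)
    new = fair_lmem_mib(14096, vfs, limit_pow2=False)
    print(f"{vfs} VF(s): {old}M ({old * MIB} bytes) -> {new}M ({new * MIB} bytes)")
```

Under this model the power-of-two rounding leaves a large part of the available LMEM unprovisioned (5904M for one VF), which is the limitation the commit removes.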
Varun Gupta	010629e00d	drm/xe: Fix driver reference in FLR comment Rectify the reference of i915 to Xe in a comment. v2: Cosmetic changes. (Karthik) v3: Rephrased the commit message. (Karthik) Signed-off-by: Varun Gupta <varun.gupta@intel.com> Reviewed-by: Karthik Poosa <karthik.poosa@intel.com> Link: https://lore.kernel.org/r/20250911111712.811524-1-varun.gupta@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-11 08:00:33 -07:00
Vinay Belgaumkar	60d2b78991	drm/xe/guc: Add SLPC power profile interface GuC has an interface to set a power profile for the SLPC algorithm. Base mode is the default and ensures balanced performance; power_saving mode has conservative up/down thresholds and is suitable for apps that typically need to be power efficient. This results in lower GT frequencies and thus lower power consumption. The selected power profile is displayed in this format: $ cat power_profile [base] power_saving $ echo power_saving > power_profile $ cat power_profile base [power_saving] v2: Address review comments (Rodrigo) Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250903232120.390190-1-vinay.belgaumkar@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-09-11 08:45:05 -04:00
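The display format shown in the commit above marks the active profile with square brackets. A minimal Python sketch of how a script might pick out the selected profile from that text follows. It assumes the output always has exactly this shape; `selected_profile` is a hypothetical helper, and the location of the `power_profile` file is not given in the commit message, so the sketch works on the text alone.

```python
import re
from typing import Optional


def selected_profile(text: str) -> Optional[str]:
    """Return the bracketed (active) profile name, or None if none is marked.

    Assumption: the active profile is the only bracketed word, as in the
    sample output quoted in the commit message.
    """
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else None


# Sample outputs copied from the commit message.
print(selected_profile("[base] power_saving"))
print(selected_profile("base [power_saving]"))
```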
Thomas Hellström	692a480243	drm/xe: Fix uninitialized return values clang warned about two uninitialized variables used as return values in the exhaustive eviction series. Fix those. Fixes: `1f1541720f` ("drm/xe: Rework instances of variants of xe_bo_create_locked()") Fixes: `7bcb6e38c1` ("drm/xe/display: Convert __xe_pin_fb_vma()") Cc: Matthew Brost <matthew.brost@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Link: https://lore.kernel.org/r/20250910151128.49693-1-thomas.hellstrom@linux.intel.com	2025-09-11 08:45:12 +02:00
Shuicheng Lin	b98775bca9	drm/xe/tile: Release kobject for the failure path Call kobject_put() for the failure path to release the kobject v2: remove extra newline. (Matt) Fixes: `e3d0839aa5` ("drm/xe/tile: Abort driver load for sysfs creation failure") Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://lore.kernel.org/r/20250819153950.2973344-2-shuicheng.lin@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-10 21:00:10 -07:00


1382327 Commits