linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-16 11:21:26 -04:00

Author	SHA1	Message	Date
Matthew Brost	9ff885ef8b	drm/xe: Convert GT stats to per-cpu counters Current GT statistics use atomic64_t counters. Atomic operations incur a global coherency penalty. Transition to dynamic per-cpu counters using alloc_percpu(). This allows stats to be incremented via this_cpu_add(), which compiles to a single non-locking instruction. This approach keeps the hot-path updates local to the CPU, avoiding expensive cross-core cache invalidation traffic. Use for_each_possible_cpu() during aggregation and clear operations to ensure data consistency across CPU hotplug events. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://patch.msgid.link/20260217200552.596718-1-matthew.brost@intel.com	2026-02-17 18:12:10 -08:00
Karthik Poosa	48eb073c7d	drm/xe/hwmon: Prevent unintended VRAM channel creation Remove the unnecessary VRAM channel entry introduced in xe_hwmon_channel. Without this, adding any new hwmon channel causes extra VRAM channel to appear. This remained unnoticed earlier because VRAM was the final xe hwmon channel. v2: Use MAX_VRAM_CHANNELS with in_range() instead of CHANNEL_VRAM_N_MAX. (Raag) Fixes: `49a4983384` ("drm/xe/hwmon: Expose individual VRAM channel temperature") Signed-off-by: Karthik Poosa <karthik.poosa@intel.com> Reviewed-by: Raag Jadav <raag.jadav@intel.com> Link: https://patch.msgid.link/20260206081655.2115439-1-karthik.poosa@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2026-02-17 19:21:30 -05:00
Arnd Bergmann	95162db020	drm/pagemap: pass pagemap_addr by reference Passing a structure by value into a function is sometimes problematic, for a number of reasons. One of these is a warning from the 32-bit arm compiler: drivers/gpu/drm/drm_gpusvm.c: In function '__drm_gpusvm_unmap_pages': drivers/gpu/drm/drm_gpusvm.c:1152:33: note: parameter passing for argument of type 'struct drm_pagemap_addr' changed in GCC 9.1 1152 \| dpagemap->ops->device_unmap(dpagemap, \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1153 \| dev, *addr); \| ~~~~~~~~~~~ This particular problem is harmless since we are not mixing compiler versions inside of the compiler. However, passing this by reference avoids the warning along with providing slightly better calling conventions as it avoids an extra copy on the stack. Fixes: `75af93b3f5` ("drm/pagemap, drm/xe: Support destination migration over interconnect") Fixes: `2df55d9e66` ("drm/xe: Support pcie p2p dma as a fast interconnect") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patch.msgid.link/20260216134644.1025365-1-arnd@kernel.org Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>	2026-02-17 13:10:52 +01:00
Maarten Lankhorst	08d05c7366	drm/xe: Remove xe_ggtt_node_allocated With the intermediate state gone, no longer useful. Just check against NULL where needed. After looking carefully, the check for allocated in xe_fb_pin.c is unneeded. vma->node is never NULL. The check is specifically only to check if vma->node == the bo's root tile ggtt_obj. Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/20260206112108.1453809-12-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>	2026-02-16 14:20:24 +01:00
Maarten Lankhorst	a4eac88e31	drm/xe: Make xe_ggtt_node_insert return a node This extra step is easier to handle inside xe_ggtt.c and makes xe_ggtt_node_allocated a simple null check instead, as the intermediate state 'allocated but not inserted' is no longer used. Privatize xe_ggtt_node_fini() and init() as they're no longer used outside of xe_ggtt.c Reviewed-by: Matthew Brost <matthew.brost@intel.com> #v1 Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/20260206112108.1453809-11-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>	2026-02-16 14:20:24 +01:00
Maarten Lankhorst	95f5f9a96d	drm/xe: Move struct xe_ggtt to xe_ggtt.c No users left outside of xe_ggtt.c, so we can make the struct private. This prevents us from accidentally touching it before init. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/20260206112108.1453809-10-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>	2026-02-16 14:20:24 +01:00
Maarten Lankhorst	e904c56ba6	drm/xe: Rewrite GGTT VF initialization The previous code was using a complicated system with 2 balloons to set GGTT size and adjust GGTT offset. While it works, it's overly complicated. A better approach is to set the offset and size when initializing GGTT, this removes the need for adding balloons. The resize function only needs readjust ggtt->start to have GGTT at the new offset. This removes the need to manipulate the internals of xe_ggtt outside of xe_ggtt, and cleans up a lot of now unneeded code. Co-developed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/20260206112108.1453809-9-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>	2026-02-16 14:20:24 +01:00
Maarten Lankhorst	7feebdb041	drm/xe: Make xe_ggtt_node offset relative to starting offset Fix all functions that use node->start to use xe_ggtt_node_addr, and add ggtt->start to node->start. This will make node shifting for SR-IOV VF a one-liner, instead of manually changing each GGTT node's base address. Also convert some uses of mutex_lock/unlock to mutex guards. Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/20260206112108.1453809-8-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>	2026-02-16 14:20:09 +01:00
Jani Nikula	4a175759e3	drm/xe: remove unnecessary struct dram_info forward declaration There's no longer any need for the struct dram_info forward declaration. Remove it. Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patch.msgid.link/20260212131206.1804113-1-jani.nikula@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2026-02-13 15:01:43 -08:00
Matthew Brost	2405ba53ff	drm/xe: Avoid touching consumer fields in GuC pagefault ack The GuC pagefault acknowledgment code is designed to extract the fields needed for the acknowledgment from the producer-stored message so that the consumer fields can be overloaded to return additional information. The ASID is stored in the producer message; extract it from there to future‑proof this logic. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Francois Dugast <francois.dugast@intel.com> Link: https://patch.msgid.link/20260212204227.2764054-3-matthew.brost@intel.com	2026-02-13 12:03:47 -08:00
Matthew Brost	68be2bfe4b	drm/xe: Pack fault type and level into a u8 Pack the fault type and level fields into a single u8 to save space in struct xe_pagefault. This also makes future extensions easier. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Francois Dugast <francois.dugast@intel.com> Link: https://patch.msgid.link/20260212204227.2764054-2-matthew.brost@intel.com	2026-02-13 12:03:39 -08:00
Arvind Yadav	2882094e0d	drm/xe/xe2: Apply Wa_14024997852 Applied Wa_14024997852 to Graphics version 20.01 to 20.04 Whitelist registers needed for userspace to control autostrip on xe2. v2: - set Bit 31 of FF_MODE, for TE autostrip disable (Nitin) v3: - Need to whitelist these for Xe2 IPs (MATT R) v4: - Combine these into a single range for simplicity:(2001, 3005) (MATT R) Cc: Tejas Upadhyay <tejas.upadhyay@intel.com> Reviewed-by: Nitin Gote <nitin.r.gote@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Arvind Yadav <arvind.yadav@intel.com> Link: https://patch.msgid.link/20260212065920.1815979-1-arvind.yadav@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2026-02-13 09:26:14 -08:00
Raag Jadav	c020fff70d	drm/xe/bo: Redirect faults to dummy page for wedged device As per uapi documentation[1], the prerequisite for wedged device is to redirect page faults to a dummy page. Follow it. [1] Documentation/gpu/drm-uapi.rst v2: Add uapi reference and fixes tag (Matthew Brost) Fixes: `7bc00751f8` ("drm/xe: Use device wedged event") Signed-off-by: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260212055622.2054991-1-raag.jadav@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2026-02-13 09:21:22 -08:00
Piotr Piórkowski	aafbb42be5	drm/xe: Force EXEC_QUEUE_FLAG_KERNEL for kernel internal VMs VMs created without an associated xe_file originate from kernel contexts and should use kernel exec queues. Ensure such VMs create bind exec queues with EXEC_QUEUE_FLAG_KERNEL set. Let's ensure bind exec queues created for kernel VMs are always marked with EXEC_QUEUE_FLAG_KERNEL. Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/20260211171441.3246686-1-piotr.piorkowski@intel.com	2026-02-13 13:16:36 +01:00
Matt Roper	1ff4b1730c	drm/xe: Stop applying Wa_16018737384 from Xe3 onward Wa_16018737384 is one of the rare cases where the hardware teams mark a workaround as "driver change required" rather than "permanent/temporary workaround" in the internal workaround database, signifying that the implementation details of the workaround should just be considered standard programming instructions on all platforms going forward. Cases like this are the only time that using XE_RTP_END_VERSION_UNDEFINED as an upper bound for a workaround's IP range is warranted and correct. However in this specific case, the register bit in question (0xE4F0[1]) simply no longer exists in hardware from Xe3 onward. Trying to write to that bit on Xe3 or Xe3p platforms is harmless and just doesn't have any effect, but it's possible that the register bit could get repurposed to control something else down the road on future platforms. To avoid any surprises in the future we should replace the unbounded upper bound in our RTP table with a value that accurately reflects that Wa_16018737384 can only apply to Xe2 platforms. Bspec: 56849 Reviewed-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com> Link: https://patch.msgid.link/20260211234735.620087-2-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2026-02-12 16:19:18 -08:00
Matt Roper	b5b55d0932	drm/xe/xe3p_xpc: Add new XeCore fuse registers to VF runtime regs SRIOV VFs do not automatically have access to the XeCore fuse registers. Add the two new registers that show up on Xe3p_XPC to the runtime register list to grant VFs access. Since there's a single runtime register list for all Xe3p, this will technically also grant access on Xe3p_LPG platforms where the registers don't exist, but that should be harmless since even if a VF tries to read a non-existent register on those platforms it will just get back a sensible value of 0x0. Fixes: `e8100643ff` ("drm/xe/xe3p_xpc: XeCore mask spans four registers") Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Ngai-Mint Kwan <ngai-mint.kwan@linux.intel.com> Link: https://patch.msgid.link/20260210182519.206952-2-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2026-02-12 11:01:41 -08:00
Raag Jadav	6d83ef1ada	drm/xe: Update xe_device_declare_wedged() error log Since the introduction of DRM wedged event, there are now a few different procedures to recover the device depending on selected recovery method. Update the error log to reflect this and point the user to correct documentation for it. Signed-off-by: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patch.msgid.link/20260205113424.1629204-1-raag.jadav@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2026-02-12 08:55:57 -05:00
Thomas Hellström	1a3c0049b3	Revert "drm/pagemap: Disable device-to-device migration" With commit `a69d1ab971` ("mm: Fix a hmm_range_fault() livelock / starvation problem") device-to-device migration is now functional again and the disabling can be reverted. Add the above commit as a Fixes: tag in order for the revert to not take place unless that commit is present. This reverts commit `10dd1eaa80`. Cc: Matthew Brost <matthew.brost@intel.com> Fixes: `a69d1ab971` ("mm: Fix a hmm_range_fault() livelock / starvation problem") Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260211104159.114947-1-thomas.hellstrom@linux.intel.com	2026-02-12 11:12:40 +01:00
Shuicheng Lin	25c9aa4dcb	drm/xe: Make xe_modparam.force_vram_bar_size signed vram_bar_size is registered as an int module parameter and is documented to accept negative values to disable BAR resizing. Store it as an int in xe_modparam as well, so negative values work as intended and the module_param type matches. Fixes: `80742a1aa2` ("drm/xe: Allow to drop vram resizing") Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://patch.msgid.link/20260202181853.1095736-2-shuicheng.lin@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2026-02-11 11:00:09 -08:00
Piotr Piórkowski	0bcacf56dc	drm/xe/vf: Avoid reading media version when media GT is disabled When the media GT is not allowed, a VF must not attempt to read the media version from the GuC. The GuC may not be loaded, and any attempt to communicate with it would result in a timeout and a VF probe failure: (...) [ 1912.406046] xe 0000:01:00.1: [drm] ERROR Tile0: GT1: GuC mmio request 0x5507: no reply 0x5507 [ 1912.407277] xe 0000:01:00.1: [drm] ERROR Tile0: GT1: [GUC COMMUNICATION] MMIO send failed (-ETIMEDOUT) [ 1912.408689] xe 0000:01:00.1: [drm] ERROR VF: Tile0: GT1: Failed to reset GuC state (-ETIMEDOUT) [ 1912.413986] xe 0000:01:00.1: probe with driver xe failed with error -110 Let's skip reading the media version for VFs when the media GT is not allowed. v2: move the condition directly to the VF path Fixes: `7abd69278b` ("drm/xe/configfs: Add attribute to disable GT types") Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://patch.msgid.link/20260202115041.2863357-1-piotr.piorkowski@intel.com Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>	2026-02-11 11:26:59 +01:00
Thomas Hellström	a69d1ab971	mm: Fix a hmm_range_fault() livelock / starvation problem If hmm_range_fault() fails a folio_trylock() in do_swap_page, trying to acquire the lock of a device-private folio for migration, to ram, the function will spin until it succeeds grabbing the lock. However, if the process holding the lock is depending on a work item to be completed, which is scheduled on the same CPU as the spinning hmm_range_fault(), that work item might be starved and we end up in a livelock / starvation situation which is never resolved. This can happen, for example if the process holding the device-private folio lock is stuck in migrate_device_unmap()->lru_add_drain_all() since lru_add_drain_all() requires a short work-item to be run on all online cpus to complete. A prerequisite for this to happen is: a) Both zone device and system memory folios are considered in migrate_device_unmap(), so that there is a reason to call lru_add_drain_all() for a system memory folio while a folio lock is held on a zone device folio. b) The zone device folio has an initial mapcount > 1 which causes at least one migration PTE entry insertion to be deferred to try_to_migrate(), which can happen after the call to lru_add_drain_all(). c) No or voluntary only preemption. This all seems pretty unlikely to happen, but indeed is hit by the "xe_exec_system_allocator" igt test. Resolve this by waiting for the folio to be unlocked if the folio_trylock() fails in do_swap_page(). Rename migration_entry_wait_on_locked() to softleaf_entry_wait_unlock() and update its documentation to indicate the new use-case. Future code improvements might consider moving the lru_add_drain_all() call in migrate_device_unmap() to be called after all pages have migration entries inserted. That would eliminate also b) above. 
v2: - Instead of a cond_resched() in hmm_range_fault(), eliminate the problem by waiting for the folio to be unlocked in do_swap_page() (Alistair Popple, Andrew Morton) v3: - Add a stub migration_entry_wait_on_locked() for the !CONFIG_MIGRATION case. (Kernel Test Robot) v4: - Rename migrate_entry_wait_on_locked() to softleaf_entry_wait_on_locked() and update docs (Alistair Popple) v5: - Add a WARN_ON_ONCE() for the !CONFIG_MIGRATION version of softleaf_entry_wait_on_locked(). - Modify wording around function names in the commit message (Andrew Morton) Suggested-by: Alistair Popple <apopple@nvidia.com> Fixes: `1afaeb8293` ("mm/migrate: Trylock device page in do_swap_page") Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Leon Romanovsky <leon@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Matthew Brost <matthew.brost@intel.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: linux-mm@kvack.org Cc: <dri-devel@lists.freedesktop.org> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: <stable@vger.kernel.org> # v6.15+ Reviewed-by: John Hubbard <jhubbard@nvidia.com> #v3 Reviewed-by: Alistair Popple <apopple@nvidia.com> Link: https://patch.msgid.link/20260210115653.92413-1-thomas.hellstrom@linux.intel.com	2026-02-11 11:03:01 +01:00
Maciej Patelczyk	d287dee565	drm/gpusvm: Fix unbalanced unlock in drm_gpusvm_scan_mm() There is an unbalanced lock/unlock to gpusvm notifier lock: [ 931.045868] ===================================== [ 931.046509] WARNING: bad unlock balance detected! [ 931.047149] 6.19.0-rc6+xe-**************** #9 Tainted: G U [ 931.048150] ------------------------------------- [ 931.048790] kworker/u5:0/51 is trying to release lock (&gpusvm->notifier_lock) at: [ 931.049801] [<ffffffffa090c0d8>] drm_gpusvm_scan_mm+0x188/0x460 [drm_gpusvm_helper] [ 931.050802] but there are no more locks to release! [ 931.051463] The drm_gpusvm_notifier_unlock() sits under err_free label and the first jump to err_free is just before calling the drm_gpusvm_notifier_lock() causing unbalanced unlock. Fixes: `f1d08a5864` ("drm/gpusvm: Introduce a function to scan the current migration state") Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260209123433.1271053-1-maciej.patelczyk@intel.com	2026-02-10 10:52:49 -08:00
Matt Roper	e04c609eed	drm/xe/xe2_hpg: Fix handling of Wa_14019988906 & Wa_14019877138 The PSS_CHICKEN register has been part of the RCS engine's LRC since it was first introduced in Xe_LP. That means that any workarounds that adjust its value (such as Wa_14019988906 and Wa_14019877138) need to be implemented in the lrc_was[] table so that they become part of the default LRC from which all subsequent LRCs are copied. Although these workarounds were implemented correctly on most platforms, they were incorrectly placed on the engine_was[] table for Xe2_HPG. Move the workarounds to the proper lrc_was[] table and switch the 'xe_rtp_match_first_render_or_compute' rule to specifically match the RCS since that's the engine whose LRC manages the register. Bspec: 65182 Fixes: `7f3ee7d880` ("drm/xe/xe2hpg: Add initial GT workarounds") Reviewed-by: Shekhar Chauhan <shekhar.chauhan@intel.com> Link: https://patch.msgid.link/20260205220508.51905-2-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2026-02-10 07:41:48 -08:00
Gustavo Sousa	d2e0540a62	drm/xe/nvlp: Bump maximum WOPCM size On NVL-P, the primary GT's WOPCM gained an extra 8MiB for the Memory URB. As such, we need to bump the maximum size in the driver so that the driver is able to load without erroring out thinking that the WOPCM is too small. FIXME: The wopcm code in xe driver is a bit confusing. For the case where the offsets for GUC WOPCM are already locked, it appears we are using the maximum overall WOPCM size instead of the sizes relative to each type of GT. The function __check_layout() should be checking against the latter. Bspec: 67090 Reviewed-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com> Link: https://patch.msgid.link/20260206-nvl-p-upstreaming-v3-15-636e1ad32688@intel.com Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>	2026-02-10 10:09:25 -03:00
Matt Roper	d59d94f91f	drm/i915/nvlp: Hook up display support Although NVL-S and NVL-P are quite different on the GT side, they use identical Xe3p_LPD display IP and should take all the same codepaths. Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com> Acked-by: Jani Nikula <jani.nikula@intel.com> Link: https://patch.msgid.link/20260206-nvl-p-upstreaming-v3-14-636e1ad32688@intel.com Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>	2026-02-10 10:09:23 -03:00
Dnyaneshwar Bhadane	b9006dacb8	drm/xe/nvlp: Attach MOCS table for nvlp The MOCS table for NVL-P is same as for Xe2/Xe3 platforms. Signed-off-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patch.msgid.link/20260206-nvl-p-upstreaming-v3-13-636e1ad32688@intel.com Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>	2026-02-10 10:09:21 -03:00
Shekhar Chauhan	be07d8f707	drm/xe/nvlp: Add NVL-P platform definition Add platform definition along with device IDs for NVL-P. Here is the list of device descriptor fields and associated Bspec references: .dma_mask_size (Bspec 74198) .has_cached_pt (Bspec 71582) .has_display (Bspec 74196) .has_flat_ccs (Bspec 74110) .has_page_reclaim_hw_assist (Bspec 73451) .max_gt_per_tile (Bspec 74196) .va_bits (Bspec 74198) .vm_max_level (Bspec 59507) v2: - Add list of descriptor fields and Bspec references. (Matt) Signed-off-by: Shekhar Chauhan <shekhar.chauhan@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patch.msgid.link/20260206-nvl-p-upstreaming-v3-12-636e1ad32688@intel.com Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>	2026-02-10 10:09:19 -03:00
Aradhya Bhatia	377c89bfaa	drm/xe/xe3p_lpg: Set STLB bank hash mode to 4KB Since the dominant size of the pages referred in an i-gpu, such as Xe3p_LPG, will be 4KB, the HW default of mix of 64K and 2M for STLB bank hash mode does not make sense. Allow the SW to change it to 4KB Mode, for Xe3p_LPG. v2: - Add Bspec reference. (Matt) Bspec: 78248 Signed-off-by: Aradhya Bhatia <aradhya.bhatia@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patch.msgid.link/20260206-nvl-p-upstreaming-v3-11-636e1ad32688@intel.com Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>	2026-02-10 10:09:17 -03:00
Gustavo Sousa	1888b3397e	drm/xe/xe3p_lpg: Update LRC sizes Like with previous generations, the engine context images for both RCS and CCS in Xe3p_LPG contain a common layout at the end for the context related to the "Compute Pipeline". The size of the memory area written to such section varies; it depends on the type of preemption that has taken place during the execution and the type of command streamer instruction that was used on the pipeline. For Xe3p_LPG, the maximum possible size, including NOOPs for cache line alignment, is 4368 dwords, which would be the case of a mid-thread preemption during the execution of a COMPUTE_WALKER_2 instruction. The maximum size has increased in such a way that we need to update xe_gt_lrc_size() to match the new sizing requirement. When we add that to the engine-specific parts, we have: - RCS context image: 6672 dwords = 26688 bytes -> 7 pages - CCS context image: 5024 dwords = 20096 bytes -> 5 pages Bspec: 65182, 55793, 73590 Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patch.msgid.link/20260206-nvl-p-upstreaming-v3-10-636e1ad32688@intel.com Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>	2026-02-10 10:09:14 -03:00
Matt Roper	60fcdf645c	drm/xe/xe3p_lpg: Extend 'group ID' mask size Xe3p_LPG extends the 'group ID' register mask by one bit. Since the new upper bit (12) was unused on previous platforms, we can safely extend the existing mask size without worrying about adding conditional version checks to the register programming. Bspec: 67175 Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com> Link: https://patch.msgid.link/20260206-nvl-p-upstreaming-v3-9-636e1ad32688@intel.com Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>	2026-02-10 10:09:12 -03:00
Matt Roper	ce0e1a6384	drm/xe/xe3p_lpg: Drop unnecessary tuning settings From Xe3p onward, the desired settings are now the hardware's default values and the driver does not need to program them explicitly. Since 35.xx seems to be the starting point for "Xe3p" version numbers; we'll adjust the bounds of the old programming to stop at 34.99. Even though there's no platform with version 35.00 at the moment, this is simplest in case one does show up in the future. Bspec: 72161, 59928, 59930 Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com> Link: https://patch.msgid.link/20260206-nvl-p-upstreaming-v3-8-636e1ad32688@intel.com Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>	2026-02-10 10:09:11 -03:00
Matt Roper	e5db97a305	drm/xe/xe3p_lpg: Disable reporting of context switch status to GHWSP By default the hardware reports context switch status into the global hardware status page. The Xe driver doesn't use this information for anything, and as of Xe3p, leaving this setting enabled will prevent other hardware optimizations from being enabled. Disable this reporting as suggested by the tuning guide. Bspec: 72161 Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com> Link: https://patch.msgid.link/20260206-nvl-p-upstreaming-v3-7-636e1ad32688@intel.com Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>	2026-02-10 10:09:08 -03:00
Matt Roper	4a0836a260	drm/xe/xe3p_lpg: Add LRC parsing for additional RCS engine state Xe3p_LPG adds some additional state instructions to the RCS engine's LRC. Add support for these to the debugfs LRC parser. Note that the bspec's LRC description page seems to have a few mistakes in the name/spelling of these new instructions (e.g., "3DSTATE_TASK_DATA_EXT" instead of "3DSTATE_TASK_SHADER_DATA_EXT" or "3DSTATE_VIEWPORT_STATE_POINTERS_CL_SF_2" instead of "3DSTATE_VIEWPORT_STATE_POINTERS_SF_CLIP_2"). Bspec: 65182 Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com> Link: https://patch.msgid.link/20260206-nvl-p-upstreaming-v3-6-636e1ad32688@intel.com Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>	2026-02-10 10:09:05 -03:00
Matt Roper	641a2208c0	drm/xe/xe3p_lpg: Add MCR steering Xe3p_LPG has nearly identical steering to Xe2 and Xe3. The only DSS/XeCore change from those IPs is an additional range from 0xDE00-0xDE7F that was previously reserved, so we can simply grow one of the existing ranges in the Xe2 table to include it. Similarly, the "instance0" table is also almost identical, but gains one additional PSMI range and requires a separate table. v2: - Drop reserved range from MEMPIPE range. (Dnyaneshwar) Bspec: 75242 Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com> Link: https://patch.msgid.link/20260206-nvl-p-upstreaming-v3-5-636e1ad32688@intel.com Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>	2026-02-10 10:08:59 -03:00
Matt Roper	f3e5f71fd6	drm/xe/xe3p_lpg: Add new PAT table PAT programming for Xe3p_LPG is more similar to Xe2 and Xe3 than it is to Xe3p_XPC. Compared to Xe2/Xe3 we have: * There's a slight update to the PAT table, where two new indices (18 and 19) are added to expose a new "WB - Transient App" L3 caching mode. * The PTA_MODE entry must be programmed differently according to the media type, and both differ from Xe2. There are no changes to the underlying registers, so the Xe2 ops can be re-used for Xe3p. Bspec: 71582, 74160 Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com> Link: https://patch.msgid.link/20260206-nvl-p-upstreaming-v3-4-636e1ad32688@intel.com Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>	2026-02-10 10:08:57 -03:00
Gustavo Sousa	a08104551d	drm/xe/pat: Differentiate between primary and media for PTA Differently from currently supported platforms, in upcoming changes we will need to have different PAT entries for PTA based on the GT type. As such, let's prepare the code to support that by having two separate PTA-specific members in the pat struct, one for each type of GT. While at it, also fix the kerneldoc for pat_ats. Co-developed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patch.msgid.link/20260206-nvl-p-upstreaming-v3-3-636e1ad32688@intel.com Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>	2026-02-10 10:08:54 -03:00
Shekhar Chauhan	835cd6cbb0	drm/xe/xe3p_lpg: Add initial workarounds for graphics version 35.10 Add the initial set of workarounds for Xe3p_LPG graphics version 35.10. v2: - Fix spacing style for field LOCALITYDIS. (Matt) - Drop unnecessary Wa_14025780377. (Matt) Signed-off-by: Shekhar Chauhan <shekhar.chauhan@intel.com> Co-developed-by: Nitin Gote <nitin.r.gote@intel.com> Signed-off-by: Nitin Gote <nitin.r.gote@intel.com> Co-developed-by: Tangudu Tilak Tirumalesh <tilak.tirumalesh.tangudu@intel.com> Signed-off-by: Tangudu Tilak Tirumalesh <tilak.tirumalesh.tangudu@intel.com> Co-developed-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com> Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patch.msgid.link/20260206-nvl-p-upstreaming-v3-2-636e1ad32688@intel.com Co-developed-by: Gustavo Sousa <gustavo.sousa@intel.com> Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>	2026-02-10 10:08:46 -03:00
Shekhar Chauhan	8fcb7dfb8b	drm/xe/xe3p_lpg: Add support for graphics IP 35.10 Add Xe3p_LPG graphics IP version 35.10. Xe3p_LPG supports all features described by XE2_GFX_FEATURES and also multi-queue feature on BCS and CCS engines. As such, create a new struct xe_graphics_desc named graphics_xe3p_lpg that inherits from XE2_GFX_FEATURES and also includes the necessary .multi_queue_engine_class_mask. Here is a list of fields and associated Bspec references for the members of the IP descriptor: .hw_engine_mask (Bspec 60149) .multi_queue_engine_class_mask (Bspec 74110) .has_asid (Bspec 71132) .has_atomic_enable_pte_bit (Bspec 59510, 74675) .has_indirect_ring_state (Bspec 67296) .has_range_tlb_inval (Bspec 71126) .has_usm (Bspec 59651) .has_64bit_timestamp (Bspec 60318) .num_geometry_xecore_fuse_regs (Bspec 62566, 67401, 67536) .num_compute_xecore_fuse_regs (Bspec 62565, 62561, 67537) v2: - Drop non-existing fields from the list in the commit message. (Matt) - Squash patch adding .multi_queue_engine_class_mask here. (Matt) - Rename graphics_xe3p to graphics_xe3p_lpg. (Matt) - Add fields .num_geometry_xecore_fuse_regs and .num_compute_xecore_fuse_regs after rebasing and inheriting commit `6acf3d3ed6` ("drm/xe: Move number of XeCore fuse registers to graphics descriptor"). (Gustavo) Signed-off-by: Shekhar Chauhan <shekhar.chauhan@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patch.msgid.link/20260206-nvl-p-upstreaming-v3-1-636e1ad32688@intel.com Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>	2026-02-10 10:05:12 -03:00
Shuicheng Lin	a30f999681	drm/xe/mmio: Avoid double-adjust in 64-bit reads xe_mmio_read64_2x32() was adjusting register addresses and then calling xe_mmio_read32(), which applies the adjustment again. This may shift accesses twice if adj_offset < adj_limit. There is no issue currently, as for media gt, adj_offset > adj_limit, so the 2nd adjust will be a no-op. But it may not work in future. To fix it, replace the adjusted-address comparison with a direct sanity check that ensures the MMIO address adjustment cutoff never falls within the 8-byte range of a 64-bit register. And let xe_mmio_read32() handle address translation. v2: rewrite the sanity check in a more natural way. (Matt) v3: Add Fixes tag. (Jani) Fixes: `07431945d8` ("drm/xe: Avoid 64-bit register reads") Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://patch.msgid.link/20260130165621.471408-2-shuicheng.lin@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2026-02-09 15:10:46 -08:00
Michal Wajdeczko	4e2796c828	drm/xe/vf: Allow VF to initialize MCR tables While VFs can't access MCR registers, it's still safe to initialize our per-platform MCR tables, as we might need them later in the LRC programming, since the engines themselves may access MCR steer registers. Thanks to all our past fixes to the VF probe initialization order, VFs are able to use the values of the fuse registers needed here. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Link: https://patch.msgid.link/20260207214428.5205-1-michal.wajdeczko@intel.com Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2026-02-09 12:30:34 -08:00
Sk Anirban	c57db41b8d	drm/xe/guc: Add Wa_14025883347 for GuC DMA failure on reset Prevent GuC firmware DMA failures during GuC-only reset by disabling idle flow and verifying SRAM handling completion. Without this, reset can be issued while SRAM handler is copying WOPCM to SRAM, causing GuC HW to get stuck. v2: Modify error message (Badal) Rename reg bit name (Daniele) Update WA skip condition (Daniele) Update SRAM handling logic (Daniele) v3: Reorder WA call (Badal) Wait for GuC ready status (Daniele) v4: Update reg name (Badal) Add comment (Daniele) Add extended graphics version (Daniele) Modify rules Signed-off-by: Sk Anirban <sk.anirban@intel.com> Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Acked-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patch.msgid.link/20260202105313.3338094-4-sk.anirban@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2026-02-09 12:00:16 -08:00
Matthew Auld	dc90ead440	drm/xe/uapi: update used tracking kernel-doc In commit `4d0b035fd6` ("drm/xe/uapi: loosen used tracking restriction") we dropped the CAP_PERFMON restriction but missed updating the corresponding kernel-doc. Fix that. v2 (Sanjay): - Don't drop the note around the extra cpu_visible_used expectations. Reported-by: Ulisses Furquim <ulisses.furquim@intel.com> Fixes: `4d0b035fd6` ("drm/xe/uapi: loosen used tracking restriction") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Sanjay Yadav <sanjay.kumar.yadav@intel.com> Reviewed-by: Sanjay Yadav <sanjay.kumar.yadav@intel.com> Link: https://patch.msgid.link/20260130125105.451229-2-matthew.auld@intel.com	2026-02-09 10:09:15 +00:00
Jia Yao	944a3329b0	drm/xe: Add bounds check on pat_index to prevent OOB kernel read in madvise When user provides a bogus pat_index value through the madvise IOCTL, the xe_pat_index_get_coh_mode() function performs an array access without validating bounds. This allows a malicious user to trigger an out-of-bounds kernel read from the xe->pat.table array. The vulnerability exists because the validation in madvise_args_are_sane() directly calls xe_pat_index_get_coh_mode(xe, args->pat_index.val) without first checking if pat_index is within [0, xe->pat.n_entries). Although xe_pat_index_get_coh_mode() has a WARN_ON to catch this in debug builds, it still performs the unsafe array access in production kernels. v2(Matthew Auld) - Using array_index_nospec() to mitigate spectre attacks when the value is used v3(Matthew Auld) - Put the declarations at the start of the block Fixes: `ada7486c56` ("drm/xe: Implement madvise ioctl for xe") Reviewed-by: Matthew Auld <matthew.auld@intel.com> Cc: <stable@vger.kernel.org> # v6.18+ Cc: Matthew Brost <matthew.brost@intel.com> Cc: Shuicheng Lin <shuicheng.lin@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Jia Yao <jia.yao@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patch.msgid.link/20260205161529.1819276-1-jia.yao@intel.com	2026-02-09 10:06:40 +00:00
Michał Winiarski	6fa45759cf	drm/xe/pf: Fix the address range assert in ggtt_get_pte helper The ggtt_get_pte helper used for saving VF GGTT incorrectly assumes that ggtt_size == ggtt_end. Fix it to avoid triggering spurious asserts if VF GGTT object lands in high GGTT range. Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/20260130215624.556099-1-michal.winiarski@intel.com Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>	2026-02-09 09:22:32 +01:00
Matt Roper	e8100643ff	drm/xe/xe3p_xpc: XeCore mask spans four registers On Xe3p_XPC, there are now four registers reserved to express the XeCore mask rather than just three. Define the new registers and update the IP descriptor accordingly. Note that this only applies to Xe3p_XPC for now; Xe3p_LPG still only uses three registers to express the mask. Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://patch.msgid.link/20260205214139.48515-4-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2026-02-06 09:49:20 -08:00
Matt Roper	6acf3d3ed6	drm/xe: Move number of XeCore fuse registers to graphics descriptor The number of registers used to express the XeCore mask has some "special cases" that don't always get inherited by later IP versions so it's cleaner and simpler to record the numbers in the IP descriptor rather than adding extra conditions to the standalone get_num_dss_regs() function. Note that a minor change here is that we now always treat the number of registers as 0 for the media GT. Technically a copy of these fuse registers does exist in the media GT as well (at the usual 0x380000+$offset location), but the value of those is always supposed to read back as 0 because media GTs never have any XeCores or EUs. v2: - Add a kunit assertion to catch descriptors that forget to initialize either count. (Gustavo) Cc: Gustavo Sousa <gustavo.sousa@intel.com> Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://patch.msgid.link/20260205214139.48515-3-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2026-02-06 09:49:20 -08:00
Vinay Belgaumkar	91be6115e4	drm/xe: Add forcewake status to powergate_info Dump forcewake status and ref counts for all domains as part of this debugfs. This is the sample output from gt1: $ cat /sys/kernel/debug/dri//0/gt1/powergate_info Media Power Gating Enabled: yes Media Slice0 Power Gate Status: down GSC Power Gate Status: down GT.ref_count=0, GT.forcewake=0x10000 VDBox0.ref_count=0, VDBox0.forcewake=0x10000 VEBox0.ref_count=0, VEBox0.forcewake=0x10000 GSC.ref_count=0, GSC.forcewake=0x10000 v2: Fix checkpatch issues Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://patch.msgid.link/20260204190314.2904009-3-vinay.belgaumkar@intel.com	2026-02-05 14:33:44 -08:00
Vinay Belgaumkar	2ea05b4b02	drm/xe: Add GSC to powergate_info Add GSC powergate status to the existing debugfs. Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://patch.msgid.link/20260204190314.2904009-2-vinay.belgaumkar@intel.com	2026-02-05 14:33:43 -08:00
Vinay Belgaumkar	fabedb758f	drm/xe: Add a wrapper for SLPC set/unset params Also, extract out the GuC RC related set/unset param functions into xe_guc_rc file. GuC still allows us to override GuC RC mode using an SLPC H2G interface. Continue to use that interface, but move the related code to the newly created xe_guc_rc file. Cc: Riana Tauro <riana.tauro@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://patch.msgid.link/20260204014234.2867763-4-vinay.belgaumkar@intel.com	2026-02-05 14:17:37 -08:00
Vinay Belgaumkar	a3f949cd61	drm/xe: Use FORCEWAKE_GT in xe_guc_pc_fini_hw() No need to use FORCEWAKE_ALL since the registers being written are in GT domain. Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/20260204014234.2867763-3-vinay.belgaumkar@intel.com	2026-02-05 14:17:36 -08:00
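
The bounds check described in commit 944a3329b0 above can be illustrated with a small userspace sketch. This is not the driver code: `struct pat_entry`, `struct pat_table`, and `pat_get_coh_mode_checked()` are hypothetical stand-ins for the xe PAT table and its lookup, and the speculation barrier the commit mentions (`array_index_nospec()`) appears only as a comment, since it is a kernel-only helper.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's PAT table; not the real xe types. */
struct pat_entry {
	uint16_t coh_mode;
};

struct pat_table {
	const struct pat_entry *entries;
	size_t n_entries;
};

/*
 * Look up the coherency mode for a user-supplied index.
 * Returns 0 and stores the mode in *coh_mode, or -EINVAL when the
 * index falls outside [0, n_entries).
 */
static int pat_get_coh_mode_checked(const struct pat_table *pat,
				    uint32_t pat_index, uint16_t *coh_mode)
{
	assert(pat != NULL && coh_mode != NULL);

	/* Reject the index before it is ever used to address the table. */
	if (pat_index >= pat->n_entries)
		return -EINVAL;

	/*
	 * In kernel code the validated index would additionally be clamped
	 * with array_index_nospec(pat_index, pat->n_entries) here, so that a
	 * mispredicted branch cannot read out of bounds speculatively.
	 */
	*coh_mode = pat->entries[pat_index].coh_mode;
	return 0;
}
```

The point of the ordering is that validation happens in the caller-facing path, so the lookup never depends on a debug-only warning to catch a bad index.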


1413057 Commits