Current GT statistics use atomic64_t counters. Atomic operations incur
a global coherency penalty.
Transition to dynamic per-cpu counters using alloc_percpu(). This allows
stats to be incremented via this_cpu_add(), which compiles to a single
non-locking instruction. This approach keeps the hot-path updates local
to the CPU, avoiding expensive cross-core cache invalidation traffic.
Use for_each_possible_cpu() during aggregation and clear operations to
ensure data consistency across CPU hotplug events.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://patch.msgid.link/20260217200552.596718-1-matthew.brost@intel.com
Remove the unnecessary VRAM channel entry introduced in xe_hwmon_channel.
Without this, adding any new hwmon channel causes extra VRAM channel
to appear. This remained unnoticed earlier because VRAM was the
final xe hwmon channel.
v2: Use MAX_VRAM_CHANNELS with in_range() instead of
CHANNEL_VRAM_N_MAX. (Raag)
Fixes: 49a4983384 ("drm/xe/hwmon: Expose individual VRAM channel temperature")
Signed-off-by: Karthik Poosa <karthik.poosa@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://patch.msgid.link/20260206081655.2115439-1-karthik.poosa@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Passing a structure by value into a function is sometimes problematic,
for a number of reasons. Of of these is a warning from the 32-bit arm
compiler:
drivers/gpu/drm/drm_gpusvm.c: In function '__drm_gpusvm_unmap_pages':
drivers/gpu/drm/drm_gpusvm.c:1152:33: note: parameter passing for argument of type 'struct drm_pagemap_addr' changed in GCC 9.1
1152 | dpagemap->ops->device_unmap(dpagemap,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1153 | dev, *addr);
| ~~~~~~~~~~~
This particular problem is harmless since we are not mixing compiler versions
inside of the compiler. However, passing this by reference avoids the warning
along with providing slightly better calling conventions as it avoids an
extra copy on the stack.
Fixes: 75af93b3f5 ("drm/pagemap, drm/xe: Support destination migration over interconnect")
Fixes: 2df55d9e66 ("drm/xe: Support pcie p2p dma as a fast interconnect")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/20260216134644.1025365-1-arnd@kernel.org
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
With the intermediate state gone, no longer useful. Just check against
NULL where needed.
After looking carefully, the check for allocated in xe_fb_pin.c is
unneeded. vma->node is never NULL. The check is specifically only
to check if vma->node == the bo's root tile ggtt_obj.
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20260206112108.1453809-12-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
The previous code was using a complicated system with 2 balloons to
set GGTT size and adjust GGTT offset. While it works, it's overly
complicated.
A better approach is to set the offset and size when initializing GGTT,
this removes the need for adding balloons. The resize function only
needs readjust ggtt->start to have GGTT at the new offset.
This removes the need to manipulate the internals of xe_ggtt outside
of xe_ggtt, and cleans up a lot of now unneeded code.
Co-developed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20260206112108.1453809-9-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
Fix all functions that use node->start to use xe_ggtt_node_addr,
and add ggtt->start to node->start.
This will make node shifting for SR-IOV VF a one-liner, instead of
manually changing each GGTT node's base address.
Also convert some uses of mutex_lock/unlock to mutex guards.
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20260206112108.1453809-8-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
The GuC pagefault acknowledgment code is designed to extract the fields
needed for the acknowledgment from the producer-stored message so that
the consumer fields can be overloaded to return additional information.
The ASID is stored in the producer message; extract it from there to
future‑proof this logic.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Link: https://patch.msgid.link/20260212204227.2764054-3-matthew.brost@intel.com
Wa_16018737384 is one of the rare cases where the hardware teams mark a
workaround as "driver change required" rather than "permanent/temporary
workaround" in the internal workaround database, signifying that the
implementation details of the workaround should just be considered
standard programming instructions on all platforms going forward. Cases
like this are the only time that using XE_RTP_END_VERSION_UNDEFINED as an
upper bound for a workaround's IP range is warranted and correct.
However in this specific case, the register bit in question (0xE4F0[1])
simply no longer exists in hardware from Xe3 onward. Trying to write to
that bit on Xe3 or Xe3p platforms is harmless and just doesn't have any
effect, but it's possible that the register bit could get repurposed to
control something else down the road on future platforms. To avoid any
surprises in the future we should replace the unbounded upper bound in
our RTP table with a value that accurately reflects that Wa_16018737384
can only apply to Xe2 platforms.
Bspec: 56849
Reviewed-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com>
Link: https://patch.msgid.link/20260211234735.620087-2-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
SRIOV VFs do not automatically have access to the XeCore fuse registers.
Add the two new registers that show up on Xe3p_XPC to the runtime
register list to grant VFs access. Since there's a single runtime
register list for all Xe3p, this will technically also grant access on
Xe3p_LPG platforms where the registers don't exist, but that should be
harmless since even if a VF tries to read a non-existent register on
those platforms it will just get back a sensible value of 0x0.
Fixes: e8100643ff ("drm/xe/xe3p_xpc: XeCore mask spans four registers")
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Ngai-Mint Kwan <ngai-mint.kwan@linux.intel.com>
Link: https://patch.msgid.link/20260210182519.206952-2-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
When the media GT is not allowed, a VF must not attempt to read
the media version from the GuC. The GuC may not be loaded, and
any attempt to communicate with it would result in a timeout
and a VF probe failure:
(...)
[ 1912.406046] xe 0000:01:00.1: [drm] *ERROR* Tile0: GT1: GuC mmio request 0x5507: no reply 0x5507
[ 1912.407277] xe 0000:01:00.1: [drm] *ERROR* Tile0: GT1: [GUC COMMUNICATION] MMIO send failed (-ETIMEDOUT)
[ 1912.408689] xe 0000:01:00.1: [drm] *ERROR* VF: Tile0: GT1: Failed to reset GuC state (-ETIMEDOUT)
[ 1912.413986] xe 0000:01:00.1: probe with driver xe failed with error -110
Let's skip reading the media version for VFs when the media GT is not
allowed.
v2: move the condition directly to the VF path
Fixes: 7abd69278b ("drm/xe/configfs: Add attribute to disable GT types")
Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com>
Link: https://patch.msgid.link/20260202115041.2863357-1-piotr.piorkowski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
If hmm_range_fault() fails a folio_trylock() in do_swap_page,
trying to acquire the lock of a device-private folio for migration,
to ram, the function will spin until it succeeds grabbing the lock.
However, if the process holding the lock is depending on a work
item to be completed, which is scheduled on the same CPU as the
spinning hmm_range_fault(), that work item might be starved and
we end up in a livelock / starvation situation which is never
resolved.
This can happen, for example if the process holding the
device-private folio lock is stuck in
migrate_device_unmap()->lru_add_drain_all()
sinc lru_add_drain_all() requires a short work-item
to be run on all online cpus to complete.
A prerequisite for this to happen is:
a) Both zone device and system memory folios are considered in
migrate_device_unmap(), so that there is a reason to call
lru_add_drain_all() for a system memory folio while a
folio lock is held on a zone device folio.
b) The zone device folio has an initial mapcount > 1 which causes
at least one migration PTE entry insertion to be deferred to
try_to_migrate(), which can happen after the call to
lru_add_drain_all().
c) No or voluntary only preemption.
This all seems pretty unlikely to happen, but indeed is hit by
the "xe_exec_system_allocator" igt test.
Resolve this by waiting for the folio to be unlocked if the
folio_trylock() fails in do_swap_page().
Rename migration_entry_wait_on_locked() to
softleaf_entry_wait_unlock() and update its documentation to
indicate the new use-case.
Future code improvements might consider moving
the lru_add_drain_all() call in migrate_device_unmap() to be
called *after* all pages have migration entries inserted.
That would eliminate also b) above.
v2:
- Instead of a cond_resched() in hmm_range_fault(),
eliminate the problem by waiting for the folio to be unlocked
in do_swap_page() (Alistair Popple, Andrew Morton)
v3:
- Add a stub migration_entry_wait_on_locked() for the
!CONFIG_MIGRATION case. (Kernel Test Robot)
v4:
- Rename migrate_entry_wait_on_locked() to
softleaf_entry_wait_on_locked() and update docs (Alistair Popple)
v5:
- Add a WARN_ON_ONCE() for the !CONFIG_MIGRATION
version of softleaf_entry_wait_on_locked().
- Modify wording around function names in the commit message
(Andrew Morton)
Suggested-by: Alistair Popple <apopple@nvidia.com>
Fixes: 1afaeb8293 ("mm/migrate: Trylock device page in do_swap_page")
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: linux-mm@kvack.org
Cc: <dri-devel@lists.freedesktop.org>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: <stable@vger.kernel.org> # v6.15+
Reviewed-by: John Hubbard <jhubbard@nvidia.com> #v3
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Link: https://patch.msgid.link/20260210115653.92413-1-thomas.hellstrom@linux.intel.com
There is a unbalanced lock/unlock to gpusvm notifier lock:
[ 931.045868] =====================================
[ 931.046509] WARNING: bad unlock balance detected!
[ 931.047149] 6.19.0-rc6+xe-**************** #9 Tainted: G U
[ 931.048150] -------------------------------------
[ 931.048790] kworker/u5:0/51 is trying to release lock (&gpusvm->notifier_lock) at:
[ 931.049801] [<ffffffffa090c0d8>] drm_gpusvm_scan_mm+0x188/0x460 [drm_gpusvm_helper]
[ 931.050802] but there are no more locks to release!
[ 931.051463]
The drm_gpusvm_notifier_unlock() sits under err_free label and the
first jump to err_free is just before calling the
drm_gpusvm_notifier_lock() causing unbalanced unlock.
Fixes: f1d08a5864 ("drm/gpusvm: Introduce a function to scan the current migration state")
Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260209123433.1271053-1-maciej.patelczyk@intel.com
The PSS_CHICKEN register has been part of the RCS engine's LRC since it
was first introduced in Xe_LP. That means that any workarounds that
adjust its value (such as Wa_14019988906 and Wa_14019877138) need to be
implemented in the lrc_was[] table so that they become part of the
default LRC from which all subsequent LRCs are copied. Although these
workarounds were implemented correctly on most platforms, they were
incorrectly placed on the engine_was[] table for Xe2_HPG.
Move the workarounds to the proper lrc_was[] table and switch the
'xe_rtp_match_first_render_or_compute' rule to specifically match the
RCS since that's the engine whose LRC manages the register.
Bspec: 65182
Fixes: 7f3ee7d880 ("drm/xe/xe2hpg: Add initial GT workarounds")
Reviewed-by: Shekhar Chauhan <shekhar.chauhan@intel.com>
Link: https://patch.msgid.link/20260205220508.51905-2-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
On NVL-P, the primary GT's WOPCM gained an extra 8MiB for the Memory
URB. As such, we need to bump the maximum size in the driver so that
the driver is able to load without erroring out thinking that the WOPCM
is too small.
FIXME: The wopcm code in xe driver is a bit confusing. For the case
where the offsets for GUC WOPCM are already locked, it appears we are
using the maximum overall WOPCM size instead of the sizes relative to
each type of GT. The function __check_layout() should be checking
against the latter.
Bspec: 67090
Reviewed-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com>
Link: https://patch.msgid.link/20260206-nvl-p-upstreaming-v3-15-636e1ad32688@intel.com
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
Like with previous generations, the engine context images for of both
RCS and CCS in Xe3p_LPG contain a common layout at the end for the
context related to the "Compute Pipeline".
The size of the memory area written to such section varies; it depends
on the type of preemption has taken place during the execution and type
of command streamer instruction that was used on the pipeline. For
Xe3p_LPG, the maximum possible size, including NOOPs for cache line
alignment, is 4368 dwords, which would be the case of a mid-thread
preemption during the execution of a COMPUTE_WALKER_2 instruction.
The maximum size has increased in such a way that we need to update
xe_gt_lrc_size() to match the new sizing requirement. When we add that
to the engine-specific parts, we have:
- RCS context image: 6672 dwords = 26688 bytes -> 7 pages
- CCS context image: 5024 dwords = 20096 bytes -> 5 pages
Bspec: 65182, 55793, 73590
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patch.msgid.link/20260206-nvl-p-upstreaming-v3-10-636e1ad32688@intel.com
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
From Xe3p onward, the desired settings are now the hardware's
default values and the driver does not need to program them explicitly.
Since 35.xx seems to be the starting point for "Xe3p" version numbers;
we'll adjust the bounds of the old programming to stop at 34.99. Even
though there's no platform with version 35.00 at the moment, this is
simplest in case one does show up in the future.
Bspec: 72161, 59928, 59930
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com>
Link: https://patch.msgid.link/20260206-nvl-p-upstreaming-v3-8-636e1ad32688@intel.com
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
Xe3p_LPG adds some additional state instructions to the RCS engine's
LRC. Add support for these to the debugfs LRC parser.
Note that the bspec's LRC description page seems to have a few mistakes
in the name/spelling of these new instructions (e.g.,
"3DSTATE_TASK_DATA_EXT" instead of "3DSTATE_TASK_SHADER_DATA_EXT" or
"3DSTATE_VIEWPORT_STATE_POINTERS_CL_SF_2" instead of
"3DSTATE_VIEWPORT_STATE_POINTERS_SF_CLIP_2").
Bspec: 65182
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com>
Link: https://patch.msgid.link/20260206-nvl-p-upstreaming-v3-6-636e1ad32688@intel.com
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
Xe3p_LPG has nearly identical steering to Xe2 and Xe3. The only
DSS/XeCore change from those IPs is an additional range from
0xDE00-0xDE7F that was previously reserved, so we can simply grow one of
the existing ranges in the Xe2 table to include it. Similarly, the
"instance0" table is also almost identical, but gains one additional
PSMI range and requires a separate table.
v2:
- Drop reserved range from MEMPIPE range. (Dnyaneshwar)
Bspec: 75242
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com>
Link: https://patch.msgid.link/20260206-nvl-p-upstreaming-v3-5-636e1ad32688@intel.com
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
PAT programming for Xe3p_LPG is more similar to Xe2 and Xe3 than it is
to Xe3p_XPC. Compared to Xe2/Xe3 we have:
* There's a slight update to the PAT table, where two new indices (18
and 19) are added to expose a new "WB - Transient App" L3 caching
mode.
* The PTA_MODE entry must be programmed differently according to the
media type, and both differ from Xe2.
There are no changes to the underlying registers, so the Xe2 ops can be
re-used for Xe3p.
Bspec: 71582, 74160
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com>
Link: https://patch.msgid.link/20260206-nvl-p-upstreaming-v3-4-636e1ad32688@intel.com
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
Add Xe3p_LPG graphics IP version 35.10. Xe3p_LPG supports all features
described by XE2_GFX_FEATURES and also multi-queue feature on BCS and
CCS engines. As such, create a new struct xe_graphics_desc named
graphics_xe3p_lpg that inherits from XE2_GFX_FEATURES and also includes
the necessary .multi_queue_engine_class_mask.
Here is a list of fields and associated Bspec references for the members
of the IP descriptor:
.hw_engine_mask (Bspec 60149)
.multi_queue_engine_class_mask (Bspec 74110)
.has_asid (Bspec 71132)
.has_atomic_enable_pte_bit (Bspec 59510, 74675)
.has_indirect_ring_state (Bspec 67296)
.has_range_tlb_inval (Bspec 71126)
.has_usm (Bspec 59651)
.has_64bit_timestamp (Bspec 60318)
.num_geometry_xecore_fuse_regs (Bspec 62566, 67401, 67536)
.num_compute_xecore_fuse_regs (Bspec 62565, 62561, 67537)
v2:
- Drop non-existing fields from the list in the commit message. (Matt)
- Squash patch adding .multi_queue_engine_class_mask here. (Matt)
- Rename graphics_xe3p to graphics_xe3p_lpg. (Matt)
- Add fields .num_geometry_xecore_fuse_regs and
.num_compute_xecore_fuse_regs after rebasing and inheriting
commit 6acf3d3ed6 ("drm/xe: Move number of XeCore fuse registers to
graphics descriptor"). (Gustavo)
Signed-off-by: Shekhar Chauhan <shekhar.chauhan@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patch.msgid.link/20260206-nvl-p-upstreaming-v3-1-636e1ad32688@intel.com
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
xe_mmio_read64_2x32() was adjusting register addresses and then
calling xe_mmio_read32(), which applies the adjustment again.
This may shift accesses twice if adj_offset < adj_limit. There is
no issue currently, as for media gt, adj_offset > adj_limit, so
the 2nd adjust will be a no-op. But it may not work in future.
To fix it, replace the adjusted-address comparison with a direct
sanity check that ensures the MMIO address adjustment cutoff never
falls within the 8-byte range of a 64-bit register. And let
xe_mmio_read32() handle address translation.
v2: rewrite the sanity check in a more natural way. (Matt)
v3: Add Fixes tag. (Jani)
Fixes: 07431945d8 ("drm/xe: Avoid 64-bit register reads")
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Link: https://patch.msgid.link/20260130165621.471408-2-shuicheng.lin@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Prevent GuC firmware DMA failures during GuC-only reset by disabling
idle flow and verifying SRAM handling completion. Without this, reset
can be issued while SRAM handler is copying WOPCM to SRAM,
causing GuC HW to get stuck.
v2: Modify error message (Badal)
Rename reg bit name (Daniele)
Update WA skip condition (Daniele)
Update SRAM handling logic (Daniele)
v3: Reorder WA call (Badal)
Wait for GuC ready status (Daniele)
v4: Update reg name (Badal)
Add comment (Daniele)
Add extended graphics version (Daniele)
Modify rules
Signed-off-by: Sk Anirban <sk.anirban@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Acked-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patch.msgid.link/20260202105313.3338094-4-sk.anirban@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
When user provides a bogus pat_index value through the madvise IOCTL, the
xe_pat_index_get_coh_mode() function performs an array access without
validating bounds. This allows a malicious user to trigger an out-of-bounds
kernel read from the xe->pat.table array.
The vulnerability exists because the validation in madvise_args_are_sane()
directly calls xe_pat_index_get_coh_mode(xe, args->pat_index.val) without
first checking if pat_index is within [0, xe->pat.n_entries).
Although xe_pat_index_get_coh_mode() has a WARN_ON to catch this in debug
builds, it still performs the unsafe array access in production kernels.
v2(Matthew Auld)
- Using array_index_nospec() to mitigate spectre attacks when the value
is used
v3(Matthew Auld)
- Put the declarations at the start of the block
Fixes: ada7486c56 ("drm/xe: Implement madvise ioctl for xe")
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Cc: <stable@vger.kernel.org> # v6.18+
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Shuicheng Lin <shuicheng.lin@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Jia Yao <jia.yao@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patch.msgid.link/20260205161529.1819276-1-jia.yao@intel.com
The number of registers used to express the XeCore mask has some
"special cases" that don't always get inherited by later IP versions so
it's cleaner and simpler to record the numbers in the IP descriptor
rather than adding extra conditions to the standalone get_num_dss_regs()
function.
Note that a minor change here is that we now always treat the number of
registers as 0 for the media GT. Technically a copy of these fuse
registers does exist in the media GT as well (at the usual
0x380000+$offset location), but the value of those is always supposed to
read back as 0 because media GTs never have any XeCores or EUs.
v2:
- Add a kunit assertion to catch descriptors that forget to initialize
either count. (Gustavo)
Cc: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20260205214139.48515-3-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Dump forcewake status and ref counts for all domains as part
of this debugfs. This is the sample output from gt1-
$ cat /sys/kernel/debug/dri//0/gt1/powergate_info
Media Power Gating Enabled: yes
Media Slice0 Power Gate Status: down
GSC Power Gate Status: down
GT.ref_count=0, GT.forcewake=0x10000
VDBox0.ref_count=0, VDBox0.forcewake=0x10000
VEBox0.ref_count=0, VEBox0.forcewake=0x10000
GSC.ref_count=0, GSC.forcewake=0x10000
v2: Fix checkpatch issues
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Signed-off-by: Vinay Belgaumkar<vinay.belgaumkar@intel.com>
Link: https://patch.msgid.link/20260204190314.2904009-3-vinay.belgaumkar@intel.com