GCC and clang warn (or error with CONFIG_WERROR=y / W=e) several times
when targeting 32-bit platforms along the lines of
drivers/gpu/drm/xe/xe_lrc.c: In function 'dump_mi_command':
drivers/gpu/drm/xe/xe_lrc.c:1921:40: error: format '%lx' expects argument of type 'long unsigned int', but argument 3 has type 'int' [-Werror=format=]
1921 | drm_printf(p, "LRC[%#5lx] = [%#010x] MI_NOOP (%d dwords)\n",
| ~~~~^
| |
| long unsigned int
| %#5x
1922 | dw - num_noop - start, inst_header, num_noop);
| ~~~~~~~~~~~~~~~~~~~~~
| |
| int
drivers/gpu/drm/xe/xe_lrc.c:1922:7: error: format specifies type 'unsigned long' but the argument has type '__ptrdiff_t' (aka 'int') [-Werror,-Wformat]
1921 | drm_printf(p, "LRC[%#5lx] = [%#010x] MI_NOOP (%d dwords)\n",
| ~~~~~
| %#5tx
1922 | dw - num_noop - start, inst_header, num_noop);
| ^~~~~~~~~~~~~~~~~~~~~
Use the '%tx' specifier for printing pointer differences, which clears
up the warnings for 32-bit platforms while introducing no regressions
for 64-bit platforms.
Fixes: 65fcf19cb3 ("drm/xe: Include running dword offset in default_lrc dumps")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patch.msgid.link/20260316-drm-xe-fix-32-bit-wformat-ptrdiff-v1-1-0108b10b2b6b@kernel.org
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Wa_14026781792 applies to all graphics versions from 30.00
through 35.10 (inclusive). Since there are no IPs between
30.05 and 35.10, consolidate the RTP rules into a single
GRAPHICS_VERSION_RANGE(3000, 3510).
v2: (Matt)
- There are no IPs between 30.05 and 35.10 either,
So, consolidate this into a single GRAPHICS_VERSION_RANGE(3000, 3510)
- Also move it up to the top part of the table
Signed-off-by: Nitin Gote <nitin.r.gote@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patch.msgid.link/20260317080059.1275116-2-nitin.r.gote@intel.com
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Getting engine specific CTX TIMESTAMP register can fail. In that case,
if the context is active, new_ts is uninitialized. Fix that case by
initializing new_ts to the last value that was sampled in SW -
lrc->ctx_timestamp.
Flagged by static analysis.
v2: Fix new_ts initialization (Ashutosh)
Fixes: bb63e7257e ("drm/xe: Avoid toggling schedule state to check LRC timestamp in TDR")
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patch.msgid.link/20260312125308.3126607-2-umesh.nerlige.ramappa@intel.com
The check using xe_child->base.children was insufficient in determining
if a pte was a leaf node. So explicitly skip over every non-leaf pt and
conditionally abort if there is a scenario where a non-leaf pt is
interleaved between leaf pt, which results in the page walker skipping
over some leaf pt.
Note that the behavior being targeted for abort is
PD[0] = 2M PTE
PD[1] = PT -> 512 4K PTEs
PD[2] = 2M PTE
results in abort, page walker won't descend PD[1].
With new abort, ensuring valid PRL before handling a second abort.
v2:
- Revert to previous assert.
- Revised non-leaf handling for interleaf child pt and leaf pte.
- Update comments to specifications. (Stuart)
- Remove unnecessary XE_PTE_PS64. (Matthew B)
v3:
- Modify secondary abort to only check non-leaf PTEs. (Matthew B)
Fixes: b912138df2 ("drm/xe: Create page reclaim list on unbind")
Signed-off-by: Brian Nguyen <brian3.nguyen@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Cc: Stuart Summers <stuart.summers@intel.com>
Link: https://patch.msgid.link/20260305171546.67691-6-brian3.nguyen@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Continue i915 and xe separation from display by moving the bo calls to
the display parent interface. Instead of adding all these functions to
intel_parent.[ch], reuse the now vacated intel_bo.[ch], and avoid mass
renames to calls of these functions. This is similar to
intel_display_rpm.[ch].
Make many of the hooks optional to avoid having to implement dummy
functions in xe. Indeed now we can remove many of the existing dummy
functions.
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://patch.msgid.link/7899eef2ccf0cd603df69099df065226a0df917b.1773238670.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
amd-drm-next-7.1-2026-03-12:
amdgpu:
- SMU13 fix
- SMU14 fix
- Fixes for bring up hw testing
- Kerneldoc fix
- GC12 idle power fix for compute workloads
- DCCG fixes
- UserQ fixes
- Move test for fbdev object to a generic helper
- GC 12.1 updates
- Use struct drm_edid in non-DC code
- Include IP discovery data in devcoredump
- SMU 13.x updates
- Misc cleanups
- DML 2.1 fixes
- Enable NV12/P010 support on primary planes
- Enable color encoding and color range on overlay planes
- DC underflow fixes
- HWSS fast path fixes
- Replay fixes
- DCN 4.2 updates
- Support newer IP discovery tables
- LSDMA 7.1 support
- IH 7.1 fixes
- SoC v1 updates
- GC12.1 updates
- PSP 15 updates
- XGMI fixes
- GPUVM locking fix
amdkfd:
- Fix missing BO unreserve in an error path
radeon:
- Move test for fbdev object to a generic helper
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patch.msgid.link/20260312184425.3875669-1-alexander.deucher@amd.com
Signed-off-by: Dave Airlie <airlied@redhat.com>
This enables support for Transparent Huge Pages (THP) for device pages by
using MIGRATE_VMA_SELECT_COMPOUND during migration. It removes the need to
split folios and loop multiple times over all pages to perform required
operations at page level. Instead, we rely on newly introduced support for
higher orders in drm_pagemap and folio-level API.
In Xe, this drastically improves performance when using SVM. The GT stats
below collected after a 2MB page fault show overall servicing is more than
7 times faster, and thanks to reduced CPU overhead the time spent on the
actual copy goes from 23% without THP to 80% with THP:
Without THP:
svm_2M_pagefault_us: 966
svm_2M_migrate_us: 942
svm_2M_device_copy_us: 223
svm_2M_get_pages_us: 9
svm_2M_bind_us: 10
With THP:
svm_2M_pagefault_us: 132
svm_2M_migrate_us: 128
svm_2M_device_copy_us: 106
svm_2M_get_pages_us: 1
svm_2M_bind_us: 2
v2:
- Fix one occurrence of drm_pagemap_get_devmem_page() (Matthew Brost)
v3:
- Remove migrate_device_split_page() and folio_split_lock, instead rely on
free_zone_device_folio() to split folios before freeing (Matthew Brost)
- Assert folio order is HPAGE_PMD_ORDER (Matthew Brost)
- Always use folio_set_zone_device_data() in split (Matthew Brost)
v4:
- Warn on compound device page, s/continue/goto next/ (Matthew Brost)
v5:
- Revert warn on compound device page
- s/zone_device_page_init()/zone_device_folio_init() (Matthew Brost)
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Michal Mrozek <michal.mrozek@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Balbir Singh <balbirs@nvidia.com>
Cc: linux-mm@kvack.org
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260312192126.2024853-5-francois.dugast@intel.com
GGTT MMIO access is currently protected by hotplug (drm_dev_enter),
which works correctly when the driver loads successfully and is later
unbound or unloaded. However, if driver load fails, this protection is
insufficient because drm_dev_unplug() is never called.
Additionally, devm release functions cannot guarantee that all BOs with
GGTT mappings are destroyed before the GGTT MMIO region is removed, as
some BOs may be freed asynchronously by worker threads.
To address this, introduce an open-coded flag, protected by the GGTT
lock, that guards GGTT MMIO access. The flag is cleared during the
dev_fini_ggtt devm release function to ensure MMIO access is disabled
once teardown begins.
Cc: stable@vger.kernel.org
Fixes: 919bb54e98 ("drm/xe: Fix missing runtime outer protection for ggtt_remove_node")
Reviewed-by: Zhanjun Dong <zhanjun.dong@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260310225039.1320161-8-zhanjun.dong@intel.com
In GuC submit fini, forcefully tear down any exec queues by disabling
CTs, stopping the scheduler (which cleans up lost G2H), killing all
remaining queues, and resuming scheduling to allow any remaining cleanup
actions to complete and signal any remaining fences.
Split guc_submit_fini into device related and software only part. Using
device-managed and drm-managed action guarantees the correct ordering of
cleanup.
Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: stable@vger.kernel.org
Reviewed-by: Zhanjun Dong <zhanjun.dong@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260310225039.1320161-3-zhanjun.dong@intel.com
By using the same variable for both the return of poll_timeout_us and
the return of the polled function guc_wait_ucode, the return value of
the latter is overwritten and lost after exiting the polling loop. Since
guc_wait_ucode returns -1 on GuC load failure, we lose that information
and always continue as if the GuC had been loaded correctly.
This is fixed by simply using 2 separate variables.
Fixes: a4916b4da4 ("drm/xe/guc: Refactor GuC load to use poll_timeout_us()")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://patch.msgid.link/20260303001732.2540493-2-daniele.ceraolospurio@intel.com
Since both the DP SST and MST HPD IRQ handlers call
intel_dp_handle_link_service_irq() with LINK_STATUS_CHANGED set in
irq_mask if intel_dp->link.force_retrain is set, checking for the former
flag is sufficient to determine if the link status needs to be checked
(which includes retraining the link if this is forced); remove checking
for the latter flag.
Since LINK_STATUS_CHANGED is currently set unconditionally for DP SST,
extend the related comment to note that it must be set if
intel_dp->link.force_retrain is set (in case setting LINK_STATUS_CHANGED
becomes conditional on DPCD_REV).
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patch.msgid.link/20260311153152.133744-2-imre.deak@intel.com
Handling of a forced link retraining debugfs request via the DP MST HPD
IRQ handler is incorrectly skipped, if the IRQ handler doesn't see any
HPD IRQs raised by the sink. Fix this by ensuring that the request is
always handled (in the Fixes: commit below by directly calling
intel_dp_check_link_state(), later by the same call moved to
intel_dp_handle_link_service_irq()).
Cc: Luca Coelho <luciano.coelho@intel.com>
Fixes: db4855d903 ("drm/i915/dp_mst: Reuse intel_dp_check_link_state() in the HPD IRQ handler")
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patch.msgid.link/20260311153152.133744-1-imre.deak@intel.com
During intel_hdcp_check_link phase we need to take into account
if we are currently forcing HDCP 1.4 or not. This is because
we check for HDCP 2.x Link first and only if HDCP 2.x is not being
used check for HDCP 1.4. With force_hdcp14 in picture we should not
be going into intel_hdcp2_check_link because of which we may end
up trying to disable HDCP2.x even if HDCP 1.4 was enabled causing
a lot of issues while IGT tests this.
Signed-off-by: Suraj Kandpal <suraj.kandpal@intel.com>
Reviewed-by: Ankit Nautiyal <ankit.k.nautiyal@intel.com>
Link: https://patch.msgid.link/20260225065045.3040787-1-suraj.kandpal@intel.com
The Xe2_HPM-specific RTP table entries for Wa_16021867713 and
Wa_14019449301 were removed by commit 941f538b0a ("drm/xe: Consolidate
workaround entries for Wa_16021867713") and commit aa0f0a6783
("drm/xe: Consolidate workaround entries for Wa_14019449301") in favor
of alternate entries earlier in the table that cover a wider range of IP
versions. However these Xe2_HPM-specific entries were accidentally
resurrected during a backmerge, which causes the Xe driver to complain
on probe about two entries trying to program the same registers+bits:
<3> [48.491155] xe 0000:03:00.0: [drm] *ERROR* Tile0: GT1: discarding save-restore reg 1c3f1c (clear: 00000008, set: 00000008, masked: no, mcr: no): ret=-22
<3> [48.491211] xe 0000:03:00.0: [drm] *ERROR* Tile0: GT1: discarding save-restore reg 1d3f1c (clear: 00000008, set: 00000008, masked: no, mcr: no): ret=-22
<3> [48.491225] xe 0000:03:00.0: [drm] *ERROR* Tile0: GT1: discarding save-restore reg 1c3f08 (clear: 00000020, set: 00000020, masked: no, mcr: no): ret=-22
<3> [48.491238] xe 0000:03:00.0: [drm] *ERROR* Tile0: GT1: discarding save-restore reg 1d3f08 (clear: 00000020, set: 00000020, masked: no, mcr: no): ret=-22
Re-drop the redundant Xe2_HPM-specific entries to eliminate the dmesg
errors.
Fixes: 58351f46de ("Merge v7.0-rc3 into drm-next")
Cc: Simona Vetter <simona.vetter@ffwll.ch>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/7608
Reviewed-by: Simona Vetter <simona.vetter@ffwll.ch>
Link: https://patch.msgid.link/20260312-wa_merge_fix-v1-1-2ec6607f1e0c@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Implement handling of VM_BIND(..., DECOMPRESS) in xe_vm_bind_ioctl.
Key changes:
- Parse and record per-op intent (op->map.request_decompress) when the
DECOMPRESS flag is present.
- Use xe_pat_index_get_comp_en() helper to check if a PAT index
has compression enabled via the XE2_COMP_EN bit.
- Validate DECOMPRESS preconditions in the ioctl path:
- Only valid for MAP ops.
- The provided pat_index must select the device's "no-compression" PAT.
- Only meaningful on devices with flat CCS and the required XE2+
otherwise return -EOPNOTSUPP.
- Use XE_IOCTL_DBG for uAPI sanity checks.
- Implement xe_bo_decompress():
For VRAM BOs run xe_bo_move_notify(), reserve one fence slot,
schedule xe_migrate_resolve(), and attach the returned fence
with DMA_RESV_USAGE_KERNEL. Non-VRAM cases are silent no-ops.
- Wire scheduling into vma_lock_and_validate() so VM_BIND will schedule
decompression when request_decompress is set.
- Handle fault-mode VMs by performing decompression synchronously during
the bind process, ensuring that the resolve is completed before the bind
finishes.
This schedules an in-place GPU resolve (xe_migrate_resolve) for
decompression.
Compute PR: https://github.com/intel/compute-runtime/pull/898
IGT PR: https://patchwork.freedesktop.org/series/157553/
v7: Rebase on latest drm-tip and add compute and igt pr info
v6: (Matt Auld)
- Rebase as xe_pat_index_get_comp_en() is added in separate
patch
- Drop vm param from xe_bo_decompress(), instead of it
extract tile from bo
- Reject decompression on igpu instead of silent skipping
to avoid any failure on Xe2+igpu as xe_device_has_flat_ccs()
can sometimes be false on igpu due some setting in the BIOS
to turn off compression on igpu.
- Nits
v5: (Matt)
- Correct the condition check of xe_pat_index_get_comp_en
v4: (Matt)
- Introduce xe_pat_index_get_comp_en(), which checks
XE2_COMP_EN for the pat_index
- .interruptible should be true, everything else false
v3: (Matt)
- s/xe_bo_schedule_decompress/xe_bo_decompress
- skip the decrompress step if the BO isn't in VRAM
- start/size not required in xe_bo_schedule_decompress
- Use xe_bo_move_notify instead of xe_vm_invalidate_vma
with respect to invalidation.
- Nits
v2:
- Move decompression work out of vm_bind ioctl. (Matt)
- Put that work in a small helper at the BO/migrate layer invoke it
from vma_lock_and_validate which already runs under drm_exec.
- Move lightweight checks to vm_bind_ioctl_check_args (Matthew Auld)
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Acked-by: Michal Mrozek <michal.mrozek@intel.com>
Signed-off-by: Nitin Gote <nitin.r.gote@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patch.msgid.link/20260304123758.3050386-8-nitin.r.gote@intel.com
Introduce an internal __xe_migrate_copy(..., is_vram_resolve) path and
expose a small wrapper xe_migrate_resolve() that calls it with
is_vram_resolve=true.
For resolve/decompression operations we must ensure the copy code uses
the compression PAT index when appropriate; this change centralizes that
behavior and allows callers to schedule a resolve (decompress) operation
via the migrate API.
v3: Fix kernel-doc warnings
v2: (Matt)
- Simplify xe_migrate_resolve(), use single BO/resource;
remove copy_only_ccs argument as it's always false.
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Nitin Gote <nitin.r.gote@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patch.msgid.link/20260304123758.3050386-7-nitin.r.gote@intel.com
It turned out that protecting the status of each bo_va with a
spinlock was just hiding problems instead of solving them.
Revert the whole approach, add a separate stats_lock and lockdep
assertions that the correct reservation lock is held all over the place.
This not only allows for better checks if a state transition is properly
protected by a lock, but also switching back to using list macros to
iterate over the state of lists protected by the dma_resv lock of the
root PD.
v2: re-add missing check
v3: split into two patches
v4: re-apply by fixing holding the VM lock at the right places.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If discovery has failed for any reason (such as no support for a block)
then there is no need to unwind all the IP blocks in fini. In this
condition there can actually be failures during the unwind too.
Reset num_ip_blocks to zero during failure path and skip the unnecessary
cleanup path.
Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fix XGMI max bitrate/width reporting on SMUv13.0.12 SOCs. The data
format got changed when moved to static table from dynamic metrics
table.
Fixes: 1bec2f2707 ("drm/amd/pm: Fetch SMUv13.0.12 xgmi max speed/width")
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>