Commit Graph

119558 Commits

Author SHA1 Message Date
Vitaly Prosyak
9dff2bb709 drm/amdgpu: disable peer-to-peer access for DCC-enabled GC12 VRAM surfaces
Certain multi-GPU configurations (especially GFX12) may hit
data corruption when a DCC-compressed VRAM surface is shared across GPUs
using peer-to-peer (P2P) DMA transfers.

Such surfaces rely on device-local metadata and cannot be safely accessed
through a remote GPU’s page tables. Attempting to import a DCC-enabled
surface through P2P leads to incorrect rendering or GPU faults.

This change disables P2P for DCC-enabled VRAM buffers that are contiguous
and allocated on GFX12+ hardware.  In these cases, the importer falls back
to the standard system-memory path, avoiding invalid access to compressed
surfaces.

Future work could consider optional migration (VRAM→System→VRAM) if a
performance regression is observed when `attach->peer2peer = false`.

Tested on:
 - Dual RX 9700 XT (Navi4x) setup
 - GNOME and Wayland compositor scenarios
 - Confirmed no corruption after disabling P2P under these conditions
v2: Remove check TTM_PL_VRAM & TTM_PL_FLAG_CONTIGUOUS.
v3: simplify for upsteam and fix ip version check (Alex)

Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-11 21:53:27 -05:00
Wenjing Liu
815e260a18 drm/amd/display: add macros to simplify code
[Why & How]
Adding macros to simplify the process of adding new error codes.
Currently, to add an error code, the developer needs to add both the
enum and the string translation. This is error prone and can lead to
inconsistencies. The refactor adds a macro to automatically add the
string translation based on the enum.

Reviewed-by: Aric Cyr <aric.cyr@amd.com>
Signed-off-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-11 21:53:26 -05:00
Tao Zhou
c154a96b55 drm/amdgpu: load RAS bad page from PMFW in page retirement
In legacy way, bad page is queried from MCA registers, switch to
getting it from PMFW when PMFW manages eeprom data.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-11 21:53:26 -05:00
Dave Airlie
2a084f4ad7 Merge tag 'amd-drm-next-6.19-2025-11-07' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-6.19-2025-11-07:

amdgpu:
- Misc fixes
- HMM cleanup
- HDP flush rework
- RAS updates
- SMU 13.x updates
- SI DPM cleanup
- Suspend rework
- UQ reset support
- Replay/PSR fixes
- HDCP updates
- DC PMO fixes
- DC pstate fixes
- DCN4 fixes
- GPUVM fixes
- SMU 13 parition metrics
- Fix possible fence leak in job cleanup
- Hibernation fix
- MST fix

amdkfd:
- HMM cleanup
- Process cleanup fix

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patch.msgid.link/20251107145938.26669-1-alexander.deucher@amd.com
2025-11-11 15:35:49 +10:00
Dave Airlie
e237dfe708 Merge tag 'drm-misc-next-2025-11-05-1' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
drm-misc-next for v6.19-rc1:

UAPI Changes:
- Add userptr support to ivpu.
- Add IOCTL's for resource and telemetry data in amdxdna.

Core Changes:
- Improve some atomic state checking handling.
- drm/client updates.
- Use forward declarations instead of including drm_print.h
- RUse allocation flags in ttm_pool/device_init and allow specifying max
  useful pool size and propagate ENOSPC.
- Updates and fixes to scheduler and bridge code.
- Add support for quirking DisplayID checksum errors.

Driver Changes:
- Assorted cleanups and fixes in rcar-du, accel/ivpu, panel/nv3052cf,
  sti, imxm, accel/qaic, accel/amdxdna, imagination, tidss, sti,
  panthor, vkms.
- Add Samsung S6E3FC2X01 DDIC/AMS641RW, Synaptics TDDI series DSI,
  TL121BVMS07-00 (IL79900A) panels.
- Add mali MediaTek MT8196 SoC gpu support.
- Add etnaviv GC8000 Nano Ultra VIP r6205 support.
- Document powervr ge7800 support in the devicetree.

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patch.msgid.link/5afae707-c9aa-4a47-b726-5e1f1aa7a106@linux.intel.com
2025-11-07 12:41:26 +10:00
Dave Airlie
8f037e11d0 Merge tag 'drm-intel-next-2025-11-04' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-next
drm/i915 feature pull for v6.19:

Features and functionality:
- Enable LNL+ content adaptive sharpness filter (CASF) (Nemesa)
- Use optimized VRR guardband (Ankit, Ville)
- Enable Xe3p LT PHY (Suraj)
- Enable FBC support for Xe3p_LPD display (Sai Teja, Vinod)
- Specify DMC firmware for display version 30.02 (Dnyaneshwar)
- Report reason for disabling PSR to debugfs (Michał)
- Extend i915_display_info with Type-C port details (Khaled)
- Log DSI send packet sequence errors and contents

Refactoring and cleanups:
- Refactoring to prepare for VRR guardband optimization (Ankit)
- Abstract VRR live status wait (Ankit)
- Refactor VRR and DSB timing to handle Set Context Latency explicitly (Ankit)
- Helpers for prefill latency calculations (Ville)
- Refactor SKL+ watermark latency setup (Ville)
- VRR refactoring and cleanups (Ville)
- SKL+ universal plane cleanups (Ville)
- Decouple CDCLK from state->modeset refactor (Ville)
- Refactor VLV/CHV clock functions (Jani)
- Refactor fbdev handling (Jani)
- Call i915 and xe runtime PM from display via function pointers (Jouni)
- IRQ code refactoring  (Jani)
- Drop display dependency on i915 feature check macros (Jani)
- Refactor and unify i915 and xe stolen memory interfaces towards display (Jani)
- Switch to driver agnostic drm to display pointer chase (Jani)
- Use display version over graphics version in display code (Matt A)
- GVT cleanups (Jonathan, Andi)
- Rename a VLV clock function to unify (Michał)
- Explicitly sanitize DMC package header num entries (Luca)
- Remove redundant port clock check from ALPM (Jouni)
- Use sysfs_emit() instead of sprintf() in PMU sysfs (Madhur Kumar)
- Clean up C20 PHY PLL register macros (Imre, Mika))
- Abstract "address in MMIO table" helper for general use (Matt A)
- Improve VRR platform abstractions (Ville)
- Move towards more standard PCI PM code usage (Ville)
- Framebuffer refactoring (Ville)
- Drop display dependency on i915_utils.h (Jani)
- Include cleanups (Jani)

Fixes:
- Workaround docking station DSC issues with high pixel clock and bpp (Imre)
- Fix Panel Replay in DSC mode (Imre)
- Disable tracepoints for PREEMPT_RT as a workaround (Maarten)
- Fix intel_crtc_get_vblank_counter() on PREEMPT_RT (Maarten)
- Fix C10 PHY identification on PTL/WCL (Dnyaneshwar)
- Take AS SDP into account with optimized guardband (Jouni)
- Fix panic structure allocation memory leak (Jani)
- Adjust an FBC workaround platforms (Vinod)
- Add fallback for CDCLK selection (Naladala)
- Avoid using invalid transcoder in MST transport select (Suraj)
- Don't use cursor size reduction on display version 14+ (Nemesa)
- Fix C20 PHY PLL register programming (Imre, Mika)
- Fix PSR frontbuffer flush handling (Jouni)
- Store ALPM parameters in crtc state (Jouni)
- Defeature DRRS on LNL+ (Ville)
- Fix the scope of the large DRAM DIMM workaround (Ville)
- Fix PICA vs. AUX power ordering issue (Gustavo)
- Fix pixel rate for computing watermark line time (Ville)
- Fix framebuffer set_tiling vs. addfb race (Ville)
- DMC event handler fixes (Ville)

DRM Core:
- CRTC sharpness strength property (Nemesa)
- DPCD DSC quirk for Synaptics Panamera devices (Imre)
- Helpers to query the branch DSC max throughput/line-width (Imre)

Merges:
- Backmerge drm-next for v6.18-rc and to sync with drm-xe-next (Jani)

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Jani Nikula <jani.nikula@intel.com>
Link: https://patch.msgid.link/ec5a05f2df6d597a62033ee2d57225cce707b320@intel.com
2025-11-07 09:47:56 +10:00
Asad Kamal
2e640e8e7b drm/amd/pm: Update default power1_cap
Update default power1_cap to max limit for smu_v13_0_6 and smu_v13_0_12

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06 10:02:19 -05:00
Tao Zhou
541414065c drm/amdgpu: skip writing eeprom when PMFW manages RAS data
Only update bad page number in legacy eeprom write path.

v2: add null pointer check for con.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06 10:02:15 -05:00
Wayne Lin
62320fb8d9 drm/amd/display: Enable mst when it's detected but yet to be initialized
[Why]
drm_dp_mst_topology_queue_probe() is used under the assumption that
mst is already initialized. If we connect system with SST first
then switch to the mst branch during suspend, we will fail probing
topology by calling the wrong API since the mst manager is yet to
be initialized.

[How]
At dm_resume(), once it's detected as mst branc connected, check if
the mst is initialized already. If not, call
dm_helpers_dp_mst_start_top_mgr() instead to initialize mst

V2: Adjust the commit msg a bit

Fixes: bc068194f5 ("drm/amd/display: Don't write DP_MSTM_CTRL after LT")
Cc: Fangzhi Zuo <jerry.zuo@amd.com>
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06 10:01:18 -05:00
Tao Zhou
e1ca536e17 drm/amdgpu: support to load RAS bad pages from PMFW
PMFW manages eeprom bad page records, update bad page loading
accrodingly.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06 10:01:14 -05:00
Lijo Lazar
1ad25fd272 drm/amdgpu: Fix wait after reset sequence in S3
For a mode-1 reset done at the end of S3 on PSPv11 dGPUs, only check if
TOS is unloaded.

Fixes: 32f73741d6 ("drm/amdgpu: Wait for bootloader after PSPv11 reset")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4649
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06 09:58:38 -05:00
Tao Zhou
7f34ddf77d drm/amdgpu: add ras_eeprom_read_idx interface
PMFW will manage RAS eeprom data by itself, add new interface to read
eeprom data via PMFW, we can read part of records by setting index.

v2: use IPID parse interface.
    pa is not used and set it to a fixed value.
v3: optimize the null pointer check for IPID parse interface.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06 09:58:35 -05:00
Tao Zhou
cd74132be8 drm/amdgpu: make MCA IPID parse global
So we can call it in other blocks.

v2: add a new IPID parse interface for umc and we can
    implement it for each ASIC.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06 09:58:31 -05:00
Mario Limonciello
4104c0a454 drm/amd: Fix suspend failure with secure display TA
commit c760bcda83 ("drm/amd: Check whether secure display TA loaded
successfully") attempted to fix extra messages, but failed to port the
cleanup that was in commit 5c6d52ff4b ("drm/amd: Don't try to enable
secure display TA multiple times") to prevent multiple tries.

Add that to the failure handling path even on a quick failure.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4679
Fixes: c760bcda83 ("drm/amd: Check whether secure display TA loaded successfully")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06 09:57:24 -05:00
YiPeng Chai
be031770bf drm/amd/ras: Fix the issue of incorrect function call
When amdgpu_device_health_check fails, amdgpu_ras_pre_reset
will not be called and therefore amdgpu_ras_post_reset
cannot be called either.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06 09:57:17 -05:00
Samuel Zhang
5d1b32cfe4 drm/amdgpu: fix gpu page fault after hibernation on PF passthrough
On PF passthrough environment, after hibernate and then resume, coralgemm
will cause gpu page fault.

Mode1 reset happens during hibernate, but partition mode is not restored
on resume, register mmCP_HYP_XCP_CTL and mmCP_PSP_XCP_CTL is not right
after resume. When CP access the MQD BO, wrong stride size is used,
this will cause out of bound access on the MQD BO, resulting page fault.

The fix is to ensure gfx_v9_4_3_switch_compute_partition() is called
when resume from a hibernation.
KFD resume is called separately during a reset recovery or resume from
suspend sequence. Hence it's not required to be called as part of
partition switch.

Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06 09:57:11 -05:00
YiPeng Chai
127cdd726f drm/amd/ras: ras supports i2c eeprom for mp1 v13_0_12
ras supports i2c eeprom for mp1 v13_0_12.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06 09:57:07 -05:00
Ahmad Rehman
07528f7d97 drm/amdkfd: Do not wait for queue op response during reset
This patch adds the condition to not wait for
the queue response for unmap, if the gpu is in reset.

Signed-off-by: Ahmad Rehman <Ahmad.Rehman@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06 09:56:30 -05:00
David (Ming Qiang) Wu
b665f29a2f drm/amdgpu/userq: need to unref bo
unref bo after amdgpu_bo_reserve() failure as it has
called amdgpu_bo_ref() already

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06 09:56:25 -05:00
Gangliang Xie
1349b31313 drm/amdgpu: initialize max record count after table reset
initialize max record count and record offset after table reset

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06 09:56:22 -05:00
Gangliang Xie
a448c40ff2 drm/amd/pm: check pmfw eeprom feature bit
get and check the pmfw eeprom feature bit to
decide if pmfw eeprom is supported

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06 09:56:19 -05:00
Gangliang Xie
cd5b28a040 drm/amdgpu: add check function for pmfw eeprom
add check function for pmfw eeprom

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06 09:56:15 -05:00
Gangliang Xie
19c815d516 drm/amdgpu: add initialization function for pmfw eeprom
add initialization function for pmfw eeprom

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06 09:56:04 -05:00
Gangliang Xie
9ce015e5fd drm/amdgpu: adapt reset function for pmfw eeprom
adapt reset function for pmfw eeprom

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06 09:55:58 -05:00
Alok Tiwari
c84d874615 drm: rcar-du: fix incorrect return in rcar_du_crtc_cleanup()
The rcar_du_crtc_cleanup() function has a void return type, but
incorrectly uses a return statement with a call to drm_crtc_cleanup(),
which also returns void.

Remove the return statement to ensure proper function semantics.
No functional change intended.

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Reviewed-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Link: https://patch.msgid.link/20251017191634.1454201-1-alok.a.tiwari@oracle.com
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
2025-11-05 11:17:26 +02:00
Alex Deucher
f903b85ed0 drm/amdgpu: fix possible fence leaks from job structure
If we don't end up initializing the fences, free them when
we free the job.  We can't set the hw_fence to NULL after
emitting it because we need it in the cleanup path for the
submit direct case.

v2: take a reference to the fences if we emit them
v3: handle non-job fence in error paths

Fixes: db36632ea5 ("drm/amdgpu: clean up and unify hw fence handling")
Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> (v1)
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:59 -05:00
YiPeng Chai
d95ca7f515 drm/amdgpu: suspend ras module before gpu reset
During gpu reset, all GPU-related resources are
inaccessible. To avoid affecting ras functionality,
suspend ras module before gpu reset and resume
it after gpu reset is complete.

V2:
  Rename functions to avoid misunderstanding.

V3:
  Move flush_delayed_work to amdgpu_ras_process_pause,
  Move schedule_delayed_work to amdgpu_ras_process_unpause.

V4:
  Rename functions.

V5:
  Move the function to amdgpu_ras.c.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:59 -05:00
Gangliang Xie
d4432f16d3 drm/amdgpu: add wrapper functions for pmfw eeprom interface
add wrapper functions for pmfw eeprom interface, for these interfaces
to be easily and safely called

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:58 -05:00
Gangliang Xie
f6cdcbd2c0 drm/amdgpu: add function to check if pmfw eeprom is supported
add function to check if pmfw is supported, skip eeprom
check and recover when pmfw eeprom is supported

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:58 -05:00
Gangliang Xie
f5346a176c drm/amd/pm: add smu ras driver framework
add functions to get smu ras driver

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:58 -05:00
Gangliang Xie
77dbd7c0a2 drm/amd/pm: implement ras_smu_drv interface for smu v13.0.12
implement ras_smu_drv interface for smu v13.0.12

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:58 -05:00
Gangliang Xie
0c6f09e65b drm/amd/pm: add new message definitions for pmfw eeprom interface
Add new message definitions for pmfw eeprom interface

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:58 -05:00
Philip Yang
88ef4de35f Revert "drm/amdkfd: Improve signal event slow path"
To fix regression report on gfx8, which requires the exhaustive search
path for signaled event.

The high CPU usage of KFD interrupt wq issue is gone after HIP/ROCr add
option to reduce HW event interrupts, safe to revert this optimization
patch now.

This reverts commit de844846f7.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:22 -05:00
Rong Zhang
f19bbecd34 drm/amd/display: Fix NULL deref in debugfs odm_combine_segments
When a connector is connected but inactive (e.g., disabled by desktop
environments), pipe_ctx->stream_res.tg will be destroyed. Then, reading
odm_combine_segments causes kernel NULL pointer dereference.

 BUG: kernel NULL pointer dereference, address: 0000000000000000
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: Oops: 0000 [#1] SMP NOPTI
 CPU: 16 UID: 0 PID: 26474 Comm: cat Not tainted 6.17.0+ #2 PREEMPT(lazy)  e6a17af9ee6db7c63e9d90dbe5b28ccab67520c6
 Hardware name: LENOVO 21Q4/LNVNB161216, BIOS PXCN25WW 03/27/2025
 RIP: 0010:odm_combine_segments_show+0x93/0xf0 [amdgpu]
 Code: 41 83 b8 b0 00 00 00 01 75 6e 48 98 ba a1 ff ff ff 48 c1 e0 0c 48 8d 8c 07 d8 02 00 00 48 85 c9 74 2d 48 8b bc 07 f0 08 00 00 <48> 8b 07 48 8b 80 08 02 00>
 RSP: 0018:ffffd1bf4b953c58 EFLAGS: 00010286
 RAX: 0000000000005000 RBX: ffff8e35976b02d0 RCX: ffff8e3aeed052d8
 RDX: 00000000ffffffa1 RSI: ffff8e35a3120800 RDI: 0000000000000000
 RBP: 0000000000000000 R08: ffff8e3580eb0000 R09: ffff8e35976b02d0
 R10: ffffd1bf4b953c78 R11: 0000000000000000 R12: ffffd1bf4b953d08
 R13: 0000000000040000 R14: 0000000000000001 R15: 0000000000000001
 FS:  00007f44d3f9f740(0000) GS:ffff8e3caa47f000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 00000006485c2000 CR4: 0000000000f50ef0
 PKRU: 55555554
 Call Trace:
  <TASK>
  seq_read_iter+0x125/0x490
  ? __alloc_frozen_pages_noprof+0x18f/0x350
  seq_read+0x12c/0x170
  full_proxy_read+0x51/0x80
  vfs_read+0xbc/0x390
  ? __handle_mm_fault+0xa46/0xef0
  ? do_syscall_64+0x71/0x900
  ksys_read+0x73/0xf0
  do_syscall_64+0x71/0x900
  ? count_memcg_events+0xc2/0x190
  ? handle_mm_fault+0x1d7/0x2d0
  ? do_user_addr_fault+0x21a/0x690
  ? exc_page_fault+0x7e/0x1a0
  entry_SYSCALL_64_after_hwframe+0x6c/0x74
 RIP: 0033:0x7f44d4031687
 Code: 48 89 fa 4c 89 df e8 58 b3 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00>
 RSP: 002b:00007ffdb4b5f0b0 EFLAGS: 00000202 ORIG_RAX: 0000000000000000
 RAX: ffffffffffffffda RBX: 00007f44d3f9f740 RCX: 00007f44d4031687
 RDX: 0000000000040000 RSI: 00007f44d3f5e000 RDI: 0000000000000003
 RBP: 0000000000040000 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000202 R12: 00007f44d3f5e000
 R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000040000
  </TASK>
 Modules linked in: tls tcp_diag inet_diag xt_mark ccm snd_hrtimer snd_seq_dummy snd_seq_midi snd_seq_oss snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device x>
  snd_hda_codec_atihdmi snd_hda_codec_realtek_lib lenovo_wmi_helpers think_lmi snd_hda_codec_generic snd_hda_codec_hdmi snd_soc_core kvm snd_compress uvcvideo sn>
  platform_profile joydev amd_pmc mousedev mac_hid sch_fq_codel uinput i2c_dev parport_pc ppdev lp parport nvme_fabrics loop nfnetlink ip_tables x_tables dm_cryp>
 CR2: 0000000000000000
 ---[ end trace 0000000000000000 ]---
 RIP: 0010:odm_combine_segments_show+0x93/0xf0 [amdgpu]
 Code: 41 83 b8 b0 00 00 00 01 75 6e 48 98 ba a1 ff ff ff 48 c1 e0 0c 48 8d 8c 07 d8 02 00 00 48 85 c9 74 2d 48 8b bc 07 f0 08 00 00 <48> 8b 07 48 8b 80 08 02 00>
 RSP: 0018:ffffd1bf4b953c58 EFLAGS: 00010286
 RAX: 0000000000005000 RBX: ffff8e35976b02d0 RCX: ffff8e3aeed052d8
 RDX: 00000000ffffffa1 RSI: ffff8e35a3120800 RDI: 0000000000000000
 RBP: 0000000000000000 R08: ffff8e3580eb0000 R09: ffff8e35976b02d0
 R10: ffffd1bf4b953c78 R11: 0000000000000000 R12: ffffd1bf4b953d08
 R13: 0000000000040000 R14: 0000000000000001 R15: 0000000000000001
 FS:  00007f44d3f9f740(0000) GS:ffff8e3caa47f000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 00000006485c2000 CR4: 0000000000f50ef0
 PKRU: 55555554

Fix this by checking pipe_ctx->stream_res.tg before dereferencing.

Fixes: 07926ba8a4 ("drm/amd/display: Add debugfs interface for ODM combine info")
Signed-off-by: Rong Zhang <i@rong.moe>
Reviewed-by: Mario Limoncello <mario.limonciello@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:22 -05:00
Philip Yang
10c382ec6c drm/amdkfd: Don't clear PT after process killed
If process is killed. the vm entity is stopped, submit pt update job
will trigger the error message "*ERROR* Trying to push to a killed
entity", job will not execute.

Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:22 -05:00
YiPeng Chai
3f16007d86 drm/amd/ras: Add ras support for umc v12_5_0
Add ras support for umc v12_5_0.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:22 -05:00
YiPeng Chai
d7f105a402 drm/amd/ras: Add ras support for nbio v7_9_1
Add ras support for nbio v7_9_1.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:22 -05:00
YiPeng Chai
2f46c547e4 drm/amdgpu: Add ras ip block name
Add ras ip block name.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:22 -05:00
YiPeng Chai
36265d2bcc drm/amd/ras: Increase ras switch control range
Increase ras switch control range.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:22 -05:00
Alex Deucher
fd39b5a583 drm/amdgpu/smu: Handle S0ix for vangogh
Fix the flows for S0ix.  There is no need to stop
rlc or reintialize PMFW in S0ix.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4659
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reported-by: Antheas Kapenekakis <lkml@antheas.dev>
Tested-by: Antheas Kapenekakis <lkml@antheas.dev>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:22 -05:00
Lijo Lazar
c83fd2a665 drm/amd/pm: Update SMUv13.0.12 partition metrics
Update SMUv13.0.12 partition metrics to partition metrics v1.1 schema.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:21 -05:00
Lijo Lazar
56aeca499a drm/amd/pm: Update SMUv13.0.6 partition metrics
For SMU v13.0.6 SOCs, move to partition metrics v1.1 schema

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:21 -05:00
Lijo Lazar
4f993e2309 drm/amd/pm: Add schema v1.1 for parition metrics
Use a schema similar to gpu metrics v1.9 for partition metrics also. It
will have field type encoded followed by the field value(s). The
attribute ids used will be shared with gpu metrics. The structure
definition is only to distinguish between gpu metrics and partition
metrics though both gpu metrics v1.9 and partition metrics v1.1 follow
the same definition.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:21 -05:00
Lijo Lazar
b480f573a8 drm/amd/pm: Use gpu metrics 1.9 for SMUv13.0.12
Fill and publish GPU metrics in v1.9 format for SMUv13.0.12 SOCs

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:21 -05:00
Sunil Khatri
cd6250f3ae drm/amdgpu: validate the bo from done list for NULL
Make sure the bo is valid before using it.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:21 -05:00
Pierre-Eric Pelloux-Prayer
36ffc58b8a drm/amdgpu: lock bo before calling amdgpu_vm_bo_update_shared
BO's reservation object must be locked before using
amdgpu_vm_bo_update_shared otherwise dma_resv_assert_held will
complain in amdgpu_vm_update_shared.

Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:21 -05:00
Christian König
c72d41a8f3 drm/amdgpu: grab a BO reference in vm_lock_done_list.
Otherwise it is possible that between dropping the status lock and
locking the BO that the BO is freed up.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:21 -05:00
Xiang Liu
7cf422ed33 drm/amd/ras: Fix format truncation
../ras/rascore/ras_cper.c: In function ‘cper_generate_fatal_record.isra’:
../ras/rascore/ras_cper.c:75:36: error: ‘%llX’ directive output may be truncated writing between 1 and 14 bytes into a region of size between 0 and 7 [-Werror=format-truncation=]
   75 |         snprintf(record_id, 9, "%d:%llX", dev_info.socket_id,
      |                                    ^~~~
../ras/rascore/ras_cper.c:75:32: note: directive argument in the range [0, 72057594037927935]
   75 |         snprintf(record_id, 9, "%d:%llX", dev_info.socket_id,
      |                                ^~~~~~~~~
../ras/rascore/ras_cper.c:75:9: note: ‘snprintf’ output between 4 and 27 bytes into a destination of size 9
   75 |         snprintf(record_id, 9, "%d:%llX", dev_info.socket_id,
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   76 |                     RAS_LOG_SEQNO_TO_BATCH_IDX(trace->seqno));
      |                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../ras/rascore/ras_cper.c: In function ‘cper_generate_runtime_record.isra’:
../ras/rascore/ras_cper.c:75:36: error: ‘%llX’ directive output may be truncated writing between 1 and 14 bytes into a region of size between 0 and 7 [-Werror=format-truncation=]
   75 |         snprintf(record_id, 9, "%d:%llX", dev_info.socket_id,
      |                                    ^~~~
../ras/rascore/ras_cper.c:75:32: note: directive argument in the range [0, 72057594037927935]
   75 |         snprintf(record_id, 9, "%d:%llX", dev_info.socket_id,
      |                                ^~~~~~~~~
../ras/rascore/ras_cper.c:75:9: note: ‘snprintf’ output between 4 and 27 bytes into a destination of size 9
   75 |         snprintf(record_id, 9, "%d:%llX", dev_info.socket_id,
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   76 |                     RAS_LOG_SEQNO_TO_BATCH_IDX(trace->seqno));
      |                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:21 -05:00
Taimur Hassan
1da571bdb2 drm/amd/display: Promote DC to 3.2.357
This version brings along following update:

- HDCP2 FW locality check refactors
- Fix black screen issue with HDMI output
- Increase IB mem size
- Revert max buffered cursor size to 64
- Extend inbox0 lock to run Replay / PSR
- Refactor VActive implementation
- Add Pstate viewport reduction
- Persist stream refcount through restore

Acked-by: Wayne Lin <Wayne.Lin@amd.com>
Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:21 -05:00
Taimur Hassan
3f0c27edd8 drm/amd/display: [FW Promotion] Release 0.1.34.0
Release hightlights

DCN35/36
    * Dynamically clock gate before and after prefetch

Acked-by: Wayne Lin <Wayne.Lin@amd.com>
Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:21 -05:00