Commit Graph

1369154 Commits

Author SHA1 Message Date
Lijo Lazar
e87577ef6d drm/amd/pm: Use cached metrics data on aldebaran
Cached metrics data validity is 1ms on aldebaran. It's not reasonable
for any client to query gpu_metrics at a faster rate and constantly
interrupt PMFW.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:40:59 -04:00
Lijo Lazar
582bf7c515 drm/amdgpu: Add NULL check for asic_funcs
If driver load fails too early, asic_funcs pointer remains unassigned.
Add NULL check to sanitize unwind path.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:40:39 -04:00
Taimur Hassan
712d98c7da drm/amd/display: Promote DC to 3.2.344
Summary:
* Add interface to log hw state when underflow happens
* Fix hubp programming of 3dlut fast load
* Avoid Read Remote DPCD Many Times
* More liberal vmin/vmax update for freesync
* Fix dmub access race condition

Acked-by: Sun peng (Leo) Li <sunpeng.li@amd.com>
Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:40:36 -04:00
Muhammad Ahmed
5dc0ec782e drm/amd/display: Adding interface to log hw state when underflow happens
[why]
Will help us better debug underflow issues.

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Muhammad Ahmed <Muhammad.Ahmed@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:40:32 -04:00
Ryan Seto
20ea30a793 drm/amd/display: Toggle for Disable Force Pstate Allow on Disable
[Why & How]
In theory, driver should be able to support disabling force pstate allow
after hardware release however this behavior is not tested yet.
Introducing a new toggle to disable the force on the fly.

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Ryan Seto <ryanseto@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:40:29 -04:00
Reza Amini
e63e9f8b3d drm/amd/display: Fixing hubp programming of 3dlut fast load
[why]
HUBP needs to know the size of the lut's destination in MPC.
This is currently defaulted to 17, and needs to be set for specific
lut size.

[how]
Define and apply the missing hubp field. Taking this opportunity
to consolidate the programming of 3dlut into a hubp and mpc function.

Reviewed-by: Krunoslav Kovac <krunoslav.kovac@amd.com>
Signed-off-by: Reza Amini <reza.amini@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:40:24 -04:00
Jingwen Zhu
3df957517f drm/amd/display: limited pll vco w/a v2
[Why/How]
The w/a will cause reboot black screen issue.

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Jingwen Zhu <Jingwen.Zhu@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:40:13 -04:00
Fangzhi Zuo
fa5f99ee72 drm/amd/display: Avoid Read Remote DPCD Many Times
Reading remote dpcd is time consuming. Instead of reading each byte
one by one, read 16 bytes together.

Reviewed-by: ChiaHsuan (Tom) Chung <chiahsuan.chung@amd.com>
Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com>
Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:40:06 -04:00
Mario Limonciello
6ec8a5cbec drm/amd/display: Revert "drm/amd/display: Fix AMDGPU_MAX_BL_LEVEL value"
This reverts commit 66abb99699.
This broke custom brightness curves but it wasn't obvious because
of other related changes. Custom brightness curves are always
from a 0-255 input signal. The correct fix was to fix the default
value which was done by [1].

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4412
Link: https://lore.kernel.org/amd-gfx/0f094c4b-d2a3-42cd-824c-dc2858a5618d@kernel.org/T/#m69f875a7e69aa22df3370b3e3a9e69f4a61fdaf2
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:39:06 -04:00
Paul Hsieh
2e72fdba8a drm/amd/display: update dpp/disp clock from smu clock table
[Why]
The reason some high-resolution monitors fail to display properly
is that this platform does not support sufficiently high DPP and
DISP clock frequencies

[How]
Update DISP and DPP clocks from the smu clock table then DML can
filter these mode if not support.

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Paul Hsieh <Paul.Hsieh@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:38:59 -04:00
Aurabindo Pillai
6d31602a9f drm/amd/display: more liberal vmin/vmax update for freesync
[Why]
FAMS2 expects vmin/vmax to be updated in the case when freesync is
off, but supported. But we only update it when freesync is enabled.

[How]
Change the vsync handler such that dc_stream_adjust_vmin_vmax() its called
irrespective of whether freesync is enabled. If freesync is supported,
then there is no harm in updating vmin/vmax registers.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3546
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Reviewed-by: ChiaHsuan Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:38:40 -04:00
Aurabindo Pillai
c210b757b4 drm/amd/display: fix dmub access race condition
Accessing DC from amdgpu_dm is usually preceded by acquisition of
dc_lock mutex. Most of the DC API that DM calls are under a DC lock.
However, there are a few that are not. Some DC API called from interrupt
context end up sending DMUB commands via a DC API, while other threads were
using DMUB. This was apparent from a race between calls for setting idle
optimization enable/disable and the DC API to set vmin/vmax.

Offload the call to dc_stream_adjust_vmin_vmax() to a thread instead
of directly calling them from the interrupt handler such that it waits
for dc_lock.

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:38:34 -04:00
Duncan Ma
fd20627c74 drm/amd/display: Adjust AUX-less ALPM setting
[Why & How]
Change ACDS period to support LTTPR.

Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Signed-off-by: Duncan Ma <Duncan.Ma@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:38:29 -04:00
Siyang Liu
9dd8e2ba26 drm/amd/display: fix a Null pointer dereference vulnerability
[Why]
A null pointer dereference vulnerability exists in the AMD display driver's
(DC module) cleanup function dc_destruct().
When display control context (dc->ctx) construction fails
(due to memory allocation failure), this pointer remains NULL.
During subsequent error handling when dc_destruct() is called,
there's no NULL check before dereferencing the perf_trace member
(dc->ctx->perf_trace), causing a kernel null pointer dereference crash.

[How]
Check if dc->ctx is non-NULL before dereferencing.

Link: https://lore.kernel.org/r/tencent_54FF4252EDFB6533090A491A25EEF3EDBF06@qq.com
Co-developed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
(Updated commit text and removed unnecessary error message)
Signed-off-by: Siyang Liu <Security@tencent.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:37:52 -04:00
Mangesh Gadre
82594ac858 drm/amdgpu: Initialize vcn v5_0_1 ras function
Initialize vcn v5_0_1 ras function

Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:37:48 -04:00
Michel Dänzer
cc7bfba959 drm/amd/display: Add primary plane to commits for correct VRR handling
amdgpu_dm_commit_planes calls update_freesync_state_on_stream only for
the primary plane. If a commit affects a CRTC but not its primary plane,
it would previously not trigger a refresh cycle or affect LFC, violating
current UAPI semantics.

Fixes e.g. atomic commits affecting only the cursor plane being limited
to the minimum refresh rate.

Don't do this for the legacy cursor ioctls though, it would break the
UAPI semantics for those.

Suggested-by: Xaver Hugl <xaver.hugl@kde.org>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3034
Signed-off-by: Michel Dänzer <mdaenzer@redhat.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:37:33 -04:00
Yunxiang Li
ba5e322b26 drm/amdgpu: skip mgpu fan boost for multi-vf
On multi-vf setup if the VM have two vf assigned, perhaps from two
different gpus, mgpu fan boost will fail.

Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:37:27 -04:00
Mangesh Gadre
01fa9758c8 drm/amdgpu: Initialize jpeg v5_0_1 ras function
Initialize jpeg v5_0_1 ras function

Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:37:24 -04:00
Xiang Liu
8e8e08c831 drm/amdgpu: Skip poison aca bank from UE channel
Avoid GFX poison consumption errors logged when fatal error occurs.

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:37:20 -04:00
Arnd Bergmann
4d22db6d07 drm/amdgpu: fix link error for !PM_SLEEP
When power management is not enabled in the kernel build, the newly
added hibernation changes cause a link failure:

arm-linux-gnueabi-ld: drivers/gpu/drm/amd/amdgpu/amdgpu_drv.o: in function `amdgpu_pmops_thaw':
amdgpu_drv.c:(.text+0x1514): undefined reference to `pm_hibernate_is_recovering'

Make the power management code in this driver conditional on
CONFIG_PM and CONFIG_PM_SLEEP

Fixes: 530694f54d ("drm/amdgpu: do not resume device in thaw for normal hibernation")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20250714081635.4071570-1-arnd@kernel.org
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:36:54 -04:00
Alex Deucher
3cf06bd4cf drm/amd/display: add more cyan skillfish devices
Add PCI IDs to support display probe for cyan skillfish
family of SOCs.

Acked-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:36:50 -04:00
Alex Deucher
e932f4779a drm/amdgpu: update mmhub 3.3 client id mappings
Update the client id mapping so the correct clients
get printed when there is a mmhub page fault.

v2: fix typos spotted by David Wu.
v3: fix additional typo spotted by David.

Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:36:47 -04:00
Alex Deucher
2a2681eda7 drm/amdgpu: update mmhub 3.0.1 client id mappings
Update the client id mapping so the correct clients
get printed when there is a mmhub page fault.

Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:36:40 -04:00
Sathishkumar S
26a63590fe drm/amdgpu/vcn: Register dump cleanup in VCN2_5
Use generic vcn devcoredump helper functions for VCN2_5 and VCN2_6

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:36:36 -04:00
Sathishkumar S
53c4be7a59 drm/amdgpu/vcn: Register dump cleanup in VCN2_0_0
Use generic vcn devcoredump helper functions for VCN2_0_0

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:36:31 -04:00
Sathishkumar S
b2d532b588 drm/amdgpu/vcn: Register dump cleanup in VCN3_0
Use generic vcn devcoredump helper functions for VCN3_0

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:36:28 -04:00
Sathishkumar S
69cc37647b drm/amdgpu/vcn: Register dump cleanup in VCN4_0_3
Use generic vcn devcoredump helper functions for VCN4_0_3

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:36:23 -04:00
Sathishkumar S
793b97c4ad drm/amdgpu/vcn: Register dump cleanup in VCN4_0_5
Use generic vcn devcoredump helper functions for VCN4_0_5

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:36:20 -04:00
Sathishkumar S
4e011af912 drm/amdgpu/vcn: Register dump cleanup in VCN4_0_0
Use generic vcn devcoredump helper functions for VCN4_0_0

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:36:17 -04:00
Sathishkumar S
f4c3be28d5 drm/amdgpu/vcn: Register dump cleanup in VCN5
Use generic vcn devcoredump helper functions for VCN5

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:36:14 -04:00
Stanley.Yang
08e27c9d92 drm/amdgpu: Add new error code for VCN/JPEG new chain
Add VIDS and JPEG8/9 S|D chain error code for VCN/JPEG v5.0.1.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:36:06 -04:00
Stanley.Yang
b1b29aa88f drm/amdgpu: Fix vcn v5.0.1 poison irq call trace
Why:
    [13014.890792] Call Trace:
    [13014.890793]  <TASK>
    [13014.890795]  ? show_trace_log_lvl+0x1d6/0x2ea
    [13014.890799]  ? show_trace_log_lvl+0x1d6/0x2ea
    [13014.890800]  ? vcn_v5_0_1_hw_fini+0xe9/0x110 [amdgpu]
    [13014.890872]  ? show_regs.part.0+0x23/0x29
    [13014.890873]  ? show_regs.cold+0x8/0xd
    [13014.890874]  ? amdgpu_irq_put+0xc6/0xe0 [amdgpu]
    [13014.890934]  ? __warn+0x8c/0x100
    [13014.890936]  ? amdgpu_irq_put+0xc6/0xe0 [amdgpu]
    [13014.890995]  ? report_bug+0xa4/0xd0
    [13014.890999]  ? handle_bug+0x39/0x90
    [13014.891001]  ? exc_invalid_op+0x19/0x70
    [13014.891003]  ? asm_exc_invalid_op+0x1b/0x20
    [13014.891005]  ? amdgpu_irq_put+0xc6/0xe0 [amdgpu]
    [13014.891065]  ? amdgpu_irq_put+0x63/0xe0 [amdgpu]
    [13014.891124]  vcn_v5_0_1_hw_fini+0xe9/0x110 [amdgpu]
    [13014.891189]  amdgpu_ip_block_hw_fini+0x3b/0x78 [amdgpu]
    [13014.891309]  amdgpu_device_fini_hw+0x3c1/0x479 [amdgpu]
How:
    Add omitted vcn poison irq get call.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:36:00 -04:00
Sathishkumar S
de55cbff5c drm/amdgpu/vcn: Add regdump helper functions
Add generic helper functions for vcn devcoredump support
which can be re-used for all vcn versions.

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:35:54 -04:00
Meng Li
e6c2b0f232 drm/amd/amdgpu: Release xcp drm memory after unplug
Add a new API amdgpu_xcp_drm_dev_free().
After unplug xcp device, need to release xcp drm memory etc.

Co-developed-by: Jiang Liu <gerry@linux.alibaba.com>
Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
Signed-off-by: Meng Li <li.meng@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:35:48 -04:00
YuanShang
ed76936c6b drm/amdgpu: Retain job->vm in amdgpu_job_prepare_job
The field job->vm is used in function amdgpu_job_run to get the page
table re-generation counter and decide whether the job should be skipped.

Specifically, function amdgpu_vm_generation checks if the VM is valid for this job to use.
For instance, if a gfx job depends on a cancelled sdma job from entity vm->delayed,
then the gfx job should be skipped.

Fixes: 26c95e838e ("drm/amdgpu: set the VM pointer to NULL in amdgpu_job_prepare")
Signed-off-by: YuanShang <YuanShang.Mao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:31:34 -04:00
Mario Limonciello
cc51bbc7d7 drm/amd: Use drm_*() macros instead of DRM_*() for amdgpu_cs
Some of the IOCTL messages can be called for different GPUs and it might
not be obvious which one called them from a problem.  Using the drm_*()
macros the correct device will be shown in the messages.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250715212420.2254925-1-superm1@kernel.org
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:31:25 -04:00
Yunshui Jiang
130c7ed88f drm/amdgpu: use kmalloc_array() instead of kmalloc()
Use kmalloc_array() instead of kmalloc() with multiplication.
kmalloc_array() is a safer way because of its multiply overflow check.

Signed-off-by: Yunshui Jiang <jiangyunshui@kylinos.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:30:31 -04:00
Sathishkumar S
46b0e6b9d7 drm/amdgpu: Fix unintended error log in VCN5_0_0
The error log is supposed to be gaurded under if failure condition.

Fixes: faab5ea083 ("drm/amdgpu: Check vcn sram load return value")
Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:29:14 -04:00
Timur Kristóf
35222b5934 drm/amd/display: Fix DCE 6.0 and 6.4 PLL programming.
Apparently, both DCE 6.0 and 6.4 have 3 PLLs, but PLL0 can only
be used for DP. Make sure to initialize the correct amount of PLLs
in DC for these DCE versions and use PLL0 only for DP.

Also, on DCE 6.0 and 6.4, the PLL0 needs to be powered on at
initialization as opposed to DCE 6.1 and 7.x which use a different
clock source for DFS.

The following functions were used as reference from the	old
radeon driver implementation of	DCE 6.x:
- radeon_atom_pick_pll
- atombios_crtc_set_disp_eng_pll

Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:28:48 -04:00
Timur Kristóf
bbddcbe36a drm/amd/display: Don't overwrite dce60_clk_mgr
dc_clk_mgr_create accidentally overwrites the dce60_clk_mgr
with the dce_clk_mgr, causing incorrect behaviour on DCE6.
Fix it by removing the extra dce_clk_mgr_construct.

Fixes: 62eab49faa ("drm/amd/display: hide VGH asic specific structs")
Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:27:59 -04:00
Ce Sun
da46735229 drm/amdgpu: Effective health check before reset
Move amdgpu_device_health_check into amdgpu_device_gpu_recover to
ensure that if the device is present can be checked before reset

The reason is:
1.During the dpc event, the device where the dpc event occurs is not
present on the bus
2.When both dpc event and ATHUB event occur simultaneously,the dpc thread
holds the reset domain lock when detecting error,and the gpu recover thread
acquires the hive lock.The device is simultaneously in the states of
amdgpu_ras_in_recovery and occurs_dpc,so gpu recover thread will not go to
amdgpu_device_health_check.It waits for the reset domain lock held by the
dpc thread, but dpc thread has not released the reset domain lock.In the dpc
callback slot_reset,to obtain the hive lock, the hive lock is held by the
gpu recover thread at this time.So a deadlock occurred

Signed-off-by: Ce Sun <cesun102@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:27:49 -04:00
Ce Sun
21c0ffa612 drm/amdgpu: Avoid rma causes GPU duplicate reset
Try to ensure poison creation handle is completed in time
to set device rma value.

Signed-off-by: Ce Sun <cesun102@amd.com>
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:27:47 -04:00
Xiang Liu
8f0245ee95 drm/amdgpu: Update IPID value for bad page threshold CPER
Update the IPID register value for bad page threshold CPER according to
the latest definition.

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:27:41 -04:00
Srinivasan Shanmugam
70e33073d9 drm/amdgpu: Fix kdoc style in amdgpu_fence.c
The initial comment block before
amdgpu_fence_driver_guilty_force_completion() incorrectly used '/**' but
is not a kernel-doc comment, causing build warnings.

Fixes the below with gcc W=1:
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c:742: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 * Kernel queue reset handling

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:27:28 -04:00
David Yat Sin
a578f2a58c drm/amdkfd: Fix checkpoint-restore on multi-xcc
GPUs with multi-xcc have multiple MQDs per queue. This patch saves and
restores all the MQDs within the partition.

Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:27:21 -04:00
Alex Deucher
8f249ba6ec Documentation: add RDNA4 dGPUs
Add RDNA4 dGPUs to the dGPU table.

Link: https://www.amd.com/en/products/graphics/desktops/radeon/9000-series/amd-radeon-rx-9070xt.html
Link: https://www.amd.com/en/products/graphics/desktops/radeon/9000-series/amd-radeon-rx-9070.html
Link: https://www.amd.com/en/products/graphics/desktops/radeon/9000-series/amd-radeon-rx-9060xt.html
Link: https://www.amd.com/en/products/graphics/desktops/radeon/9000-series/amd-radeon-rx-9060xt-8gb.html
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:27:14 -04:00
Alex Deucher
810a8809cc Documentation: update APU and dGPU tables with MP0/1 info
Add MP1 for APUs and MP0 and MP1 details for dGPUs.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3905
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:27:07 -04:00
Mario Limonciello
4e9526924d drm/amd: Restore cached manual clock settings during resume
If the SCLK limits have been set before S3 they will not
be restored. The limits are however cached in the driver and so
they can be restored by running a commit sequence during resume.

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250725031222.3015095-3-superm1@kernel.org
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:27:01 -04:00
Mario Limonciello
26a609e053 drm/amd: Restore cached power limit during resume
The power limit will be cached in smu->current_power_limit but
if the ASIC goes into S3 this value won't be restored.

Restore the value during SMU resume.

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250725031222.3015095-2-superm1@kernel.org
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:26:46 -04:00
Perry Yuan
8e3967a71e drm/amdgpu: Fix build error when CONFIG_SUSPEND is disabled
The variable `pm_suspend_target_state` is conditionally defined only when
`CONFIG_SUSPEND` is enabled (see `include/linux/suspend.h`). Directly
referencing it without guarding by `#ifdef CONFIG_SUSPEND` causes build
failures when suspend functionality is disabled (e.g., `CONFIG_SUSPEND=n`).

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Perry Yuan <perry.yuan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:26:38 -04:00