Commit Graph

1369172 Commits

Author SHA1 Message Date
James Zhu
bd6093e2f1 drm/amdkfd: return migration pages from copy function
dst MIGRATE_PFN_VALID bit and src MIGRATE_PFN_MIGRATE bit
should always be set when migration success. cpage includes
src MIGRATE_PFN_MIGRATE bit set and MIGRATE_PFN_VALID bit
unset pages for both ram and vram when memory is only allocated
without being populated before migration, those ram pages should
be counted as migrate pages and those vram pages should not be
counted as migrate pages. Here migration pages refer to how many
vram pages involved.

-v2 use dst to check MIGRATE_PFN_VALID bit (suggested-by Philip)
-v3 add warning when vram pages is less than migration pages
    return migration pages directly from copy function
-v4 correct comments and copy function return mpage (suggested-by Felix)

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-06 14:19:12 -04:00
James Zhu
5c15a05b52 drm/amdkfd: remove unused code
upages is assigned under cpages = 0, so it isn't really used in this function.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Philip.Yang<Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-06 14:19:01 -04:00
Lijo Lazar
6ec7120dec drm/amd/pm: Add priority messages for SMU v13.0.6
Certain messages will processed with high priority by PMFW even if it
hasn't responded to a previous message. Send the priority message
regardless of the success/fail status of the previous message. Add
support on SMUv13.0.6 and SMUv13.0.12

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-06 14:18:51 -04:00
Lijo Lazar
91c4fd4164 drm/amdgpu: Set dpc status appropriately
Set the dpc status based on hardware state. Also, clear the status before
reinitialization after a successful reset.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Ce Sun <cesun102@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-06 14:18:35 -04:00
Amber Lin
0333052d90 drm/amdkfd: Destroy KFD debugfs after destroy KFD wq
Since KFD proc content was moved to kernel debugfs, we can't destroy KFD
debugfs before kfd_process_destroy_wq. Move kfd_process_destroy_wq prior
to kfd_debugfs_fini to fix a kernel NULL pointer problem. It happens
when /sys/kernel/debug/kfd was already destroyed in kfd_debugfs_fini but
kfd_process_destroy_wq calls kfd_debugfs_remove_process. This line
    debugfs_remove_recursive(entry->proc_dentry);
tries to remove /sys/kernel/debug/kfd/proc/<pid> while
/sys/kernel/debug/kfd is already gone. It hangs the kernel by kernel
NULL pointer.

Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Eric Huang <jinhuieric.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-06 14:18:02 -04:00
Lijo Lazar
32f73741d6 drm/amdgpu: Wait for bootloader after PSPv11 reset
Some PSPv11 SOCs take a longer time for PSP based mode-1 reset. Instead
of checking for C2PMSG_33 status, add the callback wait_for_bootloader.
Wait for bootloader to be back to steady state is already part of the
generic mode-1 reset flow. Increase the retry count for bootloader wait
and also fix the mask to prevent fake pass.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-06 14:17:50 -04:00
Ethan Carter Edwards
66f92d1035 drm/amdgpu/gfx9.4.3: remove redundant repeated nested 0 check
The repeated checks on grbm_soft_reset are unnecessary. Remove them.

Signed-off-by: Ethan Carter Edwards <ethan@ethancedwards.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-06 14:17:47 -04:00
Ethan Carter Edwards
90e1d0324a drm/amdgpu/gfx9: remove redundant repeated nested 0 check
The repeated checks on grbm_soft_reset are unnecessary. Remove them.

Signed-off-by: Ethan Carter Edwards <ethan@ethancedwards.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-06 14:17:45 -04:00
Ethan Carter Edwards
8802ec0eba drm/amdgpu/gfx10: remove redundant repeated nested 0 check
The repeated checks on grbm_soft_reset are unnecessary. Remove them.

Signed-off-by: Ethan Carter Edwards <ethan@ethancedwards.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-06 14:17:40 -04:00
Xaver Hugl
9ed3d7bdf2 amdgpu/amdgpu_discovery: increase timeout limit for IFWI init
With a timeout of only 1 second, my rx 5700XT fails to initialize,
so this increases the timeout to 2s.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3697
Signed-off-by: Xaver Hugl <xaver.hugl@kde.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-06 14:17:03 -04:00
Alexandre Demers
1b392348de Documentation: Remove VCE support from OLAND's features
OLAND doesn't support VCE at all, but it does support UVD (3 or 4,
depending of the sources).

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-06 14:16:47 -04:00
Lijo Lazar
1c2efae2f8 drm/amd/pm: Make static table support conditional
Add PMFW version check for static table support on SMU v13.0.6 VFs.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:44:55 -04:00
Xiang Liu
58364f01db drm/amdgpu: Fix vcn v4.0.3 poison irq call trace on sriov guest
Sriov guest side doesn't init ras feature hence the poison irq shouldn't
be put during hw fini.

[25209.468816] Call Trace:
[25209.468817]  <TASK>
[25209.468818]  ? srso_alias_return_thunk+0x5/0x7f
[25209.468820]  ? show_trace_log_lvl+0x28e/0x2ea
[25209.468822]  ? show_trace_log_lvl+0x28e/0x2ea
[25209.468825]  ? vcn_v4_0_3_hw_fini+0xaf/0xe0 [amdgpu]
[25209.468936]  ? show_regs.part.0+0x23/0x29
[25209.468939]  ? show_regs.cold+0x8/0xd
[25209.468940]  ? amdgpu_irq_put+0x9e/0xc0 [amdgpu]
[25209.469038]  ? __warn+0x8c/0x100
[25209.469040]  ? amdgpu_irq_put+0x9e/0xc0 [amdgpu]
[25209.469135]  ? report_bug+0xa4/0xd0
[25209.469138]  ? handle_bug+0x39/0x90
[25209.469140]  ? exc_invalid_op+0x19/0x70
[25209.469142]  ? asm_exc_invalid_op+0x1b/0x20
[25209.469146]  ? amdgpu_irq_put+0x9e/0xc0 [amdgpu]
[25209.469241]  vcn_v4_0_3_hw_fini+0xaf/0xe0 [amdgpu]
[25209.469343]  amdgpu_ip_block_hw_fini+0x34/0x61 [amdgpu]
[25209.469511]  amdgpu_device_fini_hw+0x3b3/0x467 [amdgpu]

Fixes: 4c4a891496 ("drm/amdgpu: Register aqua vanjaram vcn poison irq")
Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:43:51 -04:00
Xiang Liu
d3d73bdb02 drm/amdgpu: Fix jpeg v4.0.3 poison irq call trace on sriov guest
Sriov guest side doesn't init ras feature hence the poison irq shouldn't
be put during hw fini.

[25209.467154] Call Trace:
[25209.467156]  <TASK>
[25209.467158]  ? srso_alias_return_thunk+0x5/0x7f
[25209.467162]  ? show_trace_log_lvl+0x28e/0x2ea
[25209.467166]  ? show_trace_log_lvl+0x28e/0x2ea
[25209.467171]  ? jpeg_v4_0_3_hw_fini+0x6f/0x90 [amdgpu]
[25209.467300]  ? show_regs.part.0+0x23/0x29
[25209.467303]  ? show_regs.cold+0x8/0xd
[25209.467304]  ? amdgpu_irq_put+0x9e/0xc0 [amdgpu]
[25209.467403]  ? __warn+0x8c/0x100
[25209.467407]  ? amdgpu_irq_put+0x9e/0xc0 [amdgpu]
[25209.467503]  ? report_bug+0xa4/0xd0
[25209.467508]  ? handle_bug+0x39/0x90
[25209.467511]  ? exc_invalid_op+0x19/0x70
[25209.467513]  ? asm_exc_invalid_op+0x1b/0x20
[25209.467518]  ? amdgpu_irq_put+0x9e/0xc0 [amdgpu]
[25209.467613]  ? amdgpu_irq_put+0x5f/0xc0 [amdgpu]
[25209.467709]  jpeg_v4_0_3_hw_fini+0x6f/0x90 [amdgpu]
[25209.467805]  amdgpu_ip_block_hw_fini+0x34/0x61 [amdgpu]
[25209.467971]  amdgpu_device_fini_hw+0x3b3/0x467 [amdgpu]

Fixes: 1b2231de41 ("drm/amdgpu: Register aqua vanjaram jpeg poison irq")
Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:42:25 -04:00
Lijo Lazar
5c2b3226d0 drm/amdgpu: Add wrapper function for dpc state
Use wrapper functions to set/indicate dpc status.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Ce Sun <cesun102@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:42:21 -04:00
Lijo Lazar
9b331f0f60 drm/amd/pm: Allow static metrics table query in VF
Allow statics metrics table to be queried on SMUv13.0.6 SOCs in VF mode.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:42:16 -04:00
Jesse.Zhang
92e2449241 drm/amdgpu: Update SDMA firmware version check for user queue support
This commit fixes a firmware version check for enabling user queue
support in SDMA v7.0. The previous version check (7836028) was
incorrect and could lead to issues with PROTECTED_FENCE_SIGNAL
commands causing register conflicts between MCU_DBG0 and MCU_DBG1.

Fixes: 8c011408ed ("drm/amdgpu/sdma7: add ucode version checks for userq support")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:41:06 -04:00
Lijo Lazar
2f3b1ccf83 drm/amd/pm: Use cached metrics data on arcturus
Cached metrics data validity is 1ms on arcturus. It's not reasonable for
any client to query gpu_metrics at a faster rate and constantly
interrupt PMFW.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:41:03 -04:00
Lijo Lazar
e87577ef6d drm/amd/pm: Use cached metrics data on aldebaran
Cached metrics data validity is 1ms on aldebaran. It's not reasonable
for any client to query gpu_metrics at a faster rate and constantly
interrupt PMFW.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:40:59 -04:00
Lijo Lazar
582bf7c515 drm/amdgpu: Add NULL check for asic_funcs
If driver load fails too early, asic_funcs pointer remains unassigned.
Add NULL check to sanitize unwind path.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:40:39 -04:00
Taimur Hassan
712d98c7da drm/amd/display: Promote DC to 3.2.344
Summary:
* Add interface to log hw state when underflow happens
* Fix hubp programming of 3dlut fast load
* Avoid Read Remote DPCD Many Times
* More liberal vmin/vmax update for freesync
* Fix dmub access race condition

Acked-by: Sun peng (Leo) Li <sunpeng.li@amd.com>
Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:40:36 -04:00
Muhammad Ahmed
5dc0ec782e drm/amd/display: Adding interface to log hw state when underflow happens
[why]
Will help us better debug underflow issues.

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Muhammad Ahmed <Muhammad.Ahmed@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:40:32 -04:00
Ryan Seto
20ea30a793 drm/amd/display: Toggle for Disable Force Pstate Allow on Disable
[Why & How]
In theory, driver should be able to support disabling force pstate allow
after hardware release however this behavior is not tested yet.
Introducing a new toggle to disable the force on the fly.

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Ryan Seto <ryanseto@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:40:29 -04:00
Reza Amini
e63e9f8b3d drm/amd/display: Fixing hubp programming of 3dlut fast load
[why]
HUBP needs to know the size of the lut's destination in MPC.
This is currently defaulted to 17, and needs to be set for specific
lut size.

[how]
Define and apply the missing hubp field. Taking this opportunity
to consolidate the programming of 3dlut into a hubp and mpc function.

Reviewed-by: Krunoslav Kovac <krunoslav.kovac@amd.com>
Signed-off-by: Reza Amini <reza.amini@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:40:24 -04:00
Jingwen Zhu
3df957517f drm/amd/display: limited pll vco w/a v2
[Why/How]
The w/a will cause reboot black screen issue.

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Jingwen Zhu <Jingwen.Zhu@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:40:13 -04:00
Fangzhi Zuo
fa5f99ee72 drm/amd/display: Avoid Read Remote DPCD Many Times
Reading remote dpcd is time consuming. Instead of reading each byte
one by one, read 16 bytes together.

Reviewed-by: ChiaHsuan (Tom) Chung <chiahsuan.chung@amd.com>
Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com>
Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:40:06 -04:00
Mario Limonciello
6ec8a5cbec drm/amd/display: Revert "drm/amd/display: Fix AMDGPU_MAX_BL_LEVEL value"
This reverts commit 66abb99699.
This broke custom brightness curves but it wasn't obvious because
of other related changes. Custom brightness curves are always
from a 0-255 input signal. The correct fix was to fix the default
value which was done by [1].

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4412
Link: https://lore.kernel.org/amd-gfx/0f094c4b-d2a3-42cd-824c-dc2858a5618d@kernel.org/T/#m69f875a7e69aa22df3370b3e3a9e69f4a61fdaf2
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:39:06 -04:00
Paul Hsieh
2e72fdba8a drm/amd/display: update dpp/disp clock from smu clock table
[Why]
The reason some high-resolution monitors fail to display properly
is that this platform does not support sufficiently high DPP and
DISP clock frequencies

[How]
Update DISP and DPP clocks from the smu clock table then DML can
filter these mode if not support.

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Paul Hsieh <Paul.Hsieh@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:38:59 -04:00
Aurabindo Pillai
6d31602a9f drm/amd/display: more liberal vmin/vmax update for freesync
[Why]
FAMS2 expects vmin/vmax to be updated in the case when freesync is
off, but supported. But we only update it when freesync is enabled.

[How]
Change the vsync handler such that dc_stream_adjust_vmin_vmax() its called
irrespective of whether freesync is enabled. If freesync is supported,
then there is no harm in updating vmin/vmax registers.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3546
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Reviewed-by: ChiaHsuan Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:38:40 -04:00
Aurabindo Pillai
c210b757b4 drm/amd/display: fix dmub access race condition
Accessing DC from amdgpu_dm is usually preceded by acquisition of
dc_lock mutex. Most of the DC API that DM calls are under a DC lock.
However, there are a few that are not. Some DC API called from interrupt
context end up sending DMUB commands via a DC API, while other threads were
using DMUB. This was apparent from a race between calls for setting idle
optimization enable/disable and the DC API to set vmin/vmax.

Offload the call to dc_stream_adjust_vmin_vmax() to a thread instead
of directly calling them from the interrupt handler such that it waits
for dc_lock.

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:38:34 -04:00
Duncan Ma
fd20627c74 drm/amd/display: Adjust AUX-less ALPM setting
[Why & How]
Change ACDS period to support LTTPR.

Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Signed-off-by: Duncan Ma <Duncan.Ma@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:38:29 -04:00
Siyang Liu
9dd8e2ba26 drm/amd/display: fix a Null pointer dereference vulnerability
[Why]
A null pointer dereference vulnerability exists in the AMD display driver's
(DC module) cleanup function dc_destruct().
When display control context (dc->ctx) construction fails
(due to memory allocation failure), this pointer remains NULL.
During subsequent error handling when dc_destruct() is called,
there's no NULL check before dereferencing the perf_trace member
(dc->ctx->perf_trace), causing a kernel null pointer dereference crash.

[How]
Check if dc->ctx is non-NULL before dereferencing.

Link: https://lore.kernel.org/r/tencent_54FF4252EDFB6533090A491A25EEF3EDBF06@qq.com
Co-developed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
(Updated commit text and removed unnecessary error message)
Signed-off-by: Siyang Liu <Security@tencent.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:37:52 -04:00
Mangesh Gadre
82594ac858 drm/amdgpu: Initialize vcn v5_0_1 ras function
Initialize vcn v5_0_1 ras function

Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:37:48 -04:00
Michel Dänzer
cc7bfba959 drm/amd/display: Add primary plane to commits for correct VRR handling
amdgpu_dm_commit_planes calls update_freesync_state_on_stream only for
the primary plane. If a commit affects a CRTC but not its primary plane,
it would previously not trigger a refresh cycle or affect LFC, violating
current UAPI semantics.

Fixes e.g. atomic commits affecting only the cursor plane being limited
to the minimum refresh rate.

Don't do this for the legacy cursor ioctls though, it would break the
UAPI semantics for those.

Suggested-by: Xaver Hugl <xaver.hugl@kde.org>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3034
Signed-off-by: Michel Dänzer <mdaenzer@redhat.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:37:33 -04:00
Yunxiang Li
ba5e322b26 drm/amdgpu: skip mgpu fan boost for multi-vf
On multi-vf setup if the VM have two vf assigned, perhaps from two
different gpus, mgpu fan boost will fail.

Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:37:27 -04:00
Mangesh Gadre
01fa9758c8 drm/amdgpu: Initialize jpeg v5_0_1 ras function
Initialize jpeg v5_0_1 ras function

Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:37:24 -04:00
Xiang Liu
8e8e08c831 drm/amdgpu: Skip poison aca bank from UE channel
Avoid GFX poison consumption errors logged when fatal error occurs.

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:37:20 -04:00
Arnd Bergmann
4d22db6d07 drm/amdgpu: fix link error for !PM_SLEEP
When power management is not enabled in the kernel build, the newly
added hibernation changes cause a link failure:

arm-linux-gnueabi-ld: drivers/gpu/drm/amd/amdgpu/amdgpu_drv.o: in function `amdgpu_pmops_thaw':
amdgpu_drv.c:(.text+0x1514): undefined reference to `pm_hibernate_is_recovering'

Make the power management code in this driver conditional on
CONFIG_PM and CONFIG_PM_SLEEP

Fixes: 530694f54d ("drm/amdgpu: do not resume device in thaw for normal hibernation")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20250714081635.4071570-1-arnd@kernel.org
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:36:54 -04:00
Alex Deucher
3cf06bd4cf drm/amd/display: add more cyan skillfish devices
Add PCI IDs to support display probe for cyan skillfish
family of SOCs.

Acked-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:36:50 -04:00
Alex Deucher
e932f4779a drm/amdgpu: update mmhub 3.3 client id mappings
Update the client id mapping so the correct clients
get printed when there is a mmhub page fault.

v2: fix typos spotted by David Wu.
v3: fix additional typo spotted by David.

Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:36:47 -04:00
Alex Deucher
2a2681eda7 drm/amdgpu: update mmhub 3.0.1 client id mappings
Update the client id mapping so the correct clients
get printed when there is a mmhub page fault.

Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:36:40 -04:00
Sathishkumar S
26a63590fe drm/amdgpu/vcn: Register dump cleanup in VCN2_5
Use generic vcn devcoredump helper functions for VCN2_5 and VCN2_6

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:36:36 -04:00
Sathishkumar S
53c4be7a59 drm/amdgpu/vcn: Register dump cleanup in VCN2_0_0
Use generic vcn devcoredump helper functions for VCN2_0_0

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:36:31 -04:00
Sathishkumar S
b2d532b588 drm/amdgpu/vcn: Register dump cleanup in VCN3_0
Use generic vcn devcoredump helper functions for VCN3_0

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:36:28 -04:00
Sathishkumar S
69cc37647b drm/amdgpu/vcn: Register dump cleanup in VCN4_0_3
Use generic vcn devcoredump helper functions for VCN4_0_3

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:36:23 -04:00
Sathishkumar S
793b97c4ad drm/amdgpu/vcn: Register dump cleanup in VCN4_0_5
Use generic vcn devcoredump helper functions for VCN4_0_5

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:36:20 -04:00
Sathishkumar S
4e011af912 drm/amdgpu/vcn: Register dump cleanup in VCN4_0_0
Use generic vcn devcoredump helper functions for VCN4_0_0

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:36:17 -04:00
Sathishkumar S
f4c3be28d5 drm/amdgpu/vcn: Register dump cleanup in VCN5
Use generic vcn devcoredump helper functions for VCN5

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:36:14 -04:00
Stanley.Yang
08e27c9d92 drm/amdgpu: Add new error code for VCN/JPEG new chain
Add VIDS and JPEG8/9 S|D chain error code for VCN/JPEG v5.0.1.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:36:06 -04:00
Stanley.Yang
b1b29aa88f drm/amdgpu: Fix vcn v5.0.1 poison irq call trace
Why:
    [13014.890792] Call Trace:
    [13014.890793]  <TASK>
    [13014.890795]  ? show_trace_log_lvl+0x1d6/0x2ea
    [13014.890799]  ? show_trace_log_lvl+0x1d6/0x2ea
    [13014.890800]  ? vcn_v5_0_1_hw_fini+0xe9/0x110 [amdgpu]
    [13014.890872]  ? show_regs.part.0+0x23/0x29
    [13014.890873]  ? show_regs.cold+0x8/0xd
    [13014.890874]  ? amdgpu_irq_put+0xc6/0xe0 [amdgpu]
    [13014.890934]  ? __warn+0x8c/0x100
    [13014.890936]  ? amdgpu_irq_put+0xc6/0xe0 [amdgpu]
    [13014.890995]  ? report_bug+0xa4/0xd0
    [13014.890999]  ? handle_bug+0x39/0x90
    [13014.891001]  ? exc_invalid_op+0x19/0x70
    [13014.891003]  ? asm_exc_invalid_op+0x1b/0x20
    [13014.891005]  ? amdgpu_irq_put+0xc6/0xe0 [amdgpu]
    [13014.891065]  ? amdgpu_irq_put+0x63/0xe0 [amdgpu]
    [13014.891124]  vcn_v5_0_1_hw_fini+0xe9/0x110 [amdgpu]
    [13014.891189]  amdgpu_ip_block_hw_fini+0x3b/0x78 [amdgpu]
    [13014.891309]  amdgpu_device_fini_hw+0x3c1/0x479 [amdgpu]
How:
    Add omitted vcn poison irq get call.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04 14:36:00 -04:00