Commit Graph

1397847 Commits

Author SHA1 Message Date
Wayne Lin
62320fb8d9 drm/amd/display: Enable mst when it's detected but yet to be initialized
[Why]
drm_dp_mst_topology_queue_probe() is used under the assumption that
mst is already initialized. If we connect system with SST first
then switch to the mst branch during suspend, we will fail probing
topology by calling the wrong API since the mst manager is yet to
be initialized.

[How]
At dm_resume(), once it's detected as mst branc connected, check if
the mst is initialized already. If not, call
dm_helpers_dp_mst_start_top_mgr() instead to initialize mst

V2: Adjust the commit msg a bit

Fixes: bc068194f5 ("drm/amd/display: Don't write DP_MSTM_CTRL after LT")
Cc: Fangzhi Zuo <jerry.zuo@amd.com>
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06 10:01:18 -05:00
Tao Zhou
e1ca536e17 drm/amdgpu: support to load RAS bad pages from PMFW
PMFW manages eeprom bad page records, update bad page loading
accrodingly.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06 10:01:14 -05:00
Lijo Lazar
1ad25fd272 drm/amdgpu: Fix wait after reset sequence in S3
For a mode-1 reset done at the end of S3 on PSPv11 dGPUs, only check if
TOS is unloaded.

Fixes: 32f73741d6 ("drm/amdgpu: Wait for bootloader after PSPv11 reset")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4649
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06 09:58:38 -05:00
Tao Zhou
7f34ddf77d drm/amdgpu: add ras_eeprom_read_idx interface
PMFW will manage RAS eeprom data by itself, add new interface to read
eeprom data via PMFW, we can read part of records by setting index.

v2: use IPID parse interface.
    pa is not used and set it to a fixed value.
v3: optimize the null pointer check for IPID parse interface.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06 09:58:35 -05:00
Tao Zhou
cd74132be8 drm/amdgpu: make MCA IPID parse global
So we can call it in other blocks.

v2: add a new IPID parse interface for umc and we can
    implement it for each ASIC.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06 09:58:31 -05:00
Mario Limonciello
4104c0a454 drm/amd: Fix suspend failure with secure display TA
commit c760bcda83 ("drm/amd: Check whether secure display TA loaded
successfully") attempted to fix extra messages, but failed to port the
cleanup that was in commit 5c6d52ff4b ("drm/amd: Don't try to enable
secure display TA multiple times") to prevent multiple tries.

Add that to the failure handling path even on a quick failure.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4679
Fixes: c760bcda83 ("drm/amd: Check whether secure display TA loaded successfully")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06 09:57:24 -05:00
YiPeng Chai
be031770bf drm/amd/ras: Fix the issue of incorrect function call
When amdgpu_device_health_check fails, amdgpu_ras_pre_reset
will not be called and therefore amdgpu_ras_post_reset
cannot be called either.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06 09:57:17 -05:00
Samuel Zhang
5d1b32cfe4 drm/amdgpu: fix gpu page fault after hibernation on PF passthrough
On PF passthrough environment, after hibernate and then resume, coralgemm
will cause gpu page fault.

Mode1 reset happens during hibernate, but partition mode is not restored
on resume, register mmCP_HYP_XCP_CTL and mmCP_PSP_XCP_CTL is not right
after resume. When CP access the MQD BO, wrong stride size is used,
this will cause out of bound access on the MQD BO, resulting page fault.

The fix is to ensure gfx_v9_4_3_switch_compute_partition() is called
when resume from a hibernation.
KFD resume is called separately during a reset recovery or resume from
suspend sequence. Hence it's not required to be called as part of
partition switch.

Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06 09:57:11 -05:00
YiPeng Chai
127cdd726f drm/amd/ras: ras supports i2c eeprom for mp1 v13_0_12
ras supports i2c eeprom for mp1 v13_0_12.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06 09:57:07 -05:00
Ahmad Rehman
07528f7d97 drm/amdkfd: Do not wait for queue op response during reset
This patch adds the condition to not wait for
the queue response for unmap, if the gpu is in reset.

Signed-off-by: Ahmad Rehman <Ahmad.Rehman@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06 09:56:30 -05:00
David (Ming Qiang) Wu
b665f29a2f drm/amdgpu/userq: need to unref bo
unref bo after amdgpu_bo_reserve() failure as it has
called amdgpu_bo_ref() already

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06 09:56:25 -05:00
Gangliang Xie
1349b31313 drm/amdgpu: initialize max record count after table reset
initialize max record count and record offset after table reset

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06 09:56:22 -05:00
Gangliang Xie
a448c40ff2 drm/amd/pm: check pmfw eeprom feature bit
get and check the pmfw eeprom feature bit to
decide if pmfw eeprom is supported

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06 09:56:19 -05:00
Gangliang Xie
cd5b28a040 drm/amdgpu: add check function for pmfw eeprom
add check function for pmfw eeprom

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06 09:56:15 -05:00
Gangliang Xie
19c815d516 drm/amdgpu: add initialization function for pmfw eeprom
add initialization function for pmfw eeprom

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06 09:56:04 -05:00
Gangliang Xie
9ce015e5fd drm/amdgpu: adapt reset function for pmfw eeprom
adapt reset function for pmfw eeprom

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06 09:55:58 -05:00
Alex Deucher
f903b85ed0 drm/amdgpu: fix possible fence leaks from job structure
If we don't end up initializing the fences, free them when
we free the job.  We can't set the hw_fence to NULL after
emitting it because we need it in the cleanup path for the
submit direct case.

v2: take a reference to the fences if we emit them
v3: handle non-job fence in error paths

Fixes: db36632ea5 ("drm/amdgpu: clean up and unify hw fence handling")
Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> (v1)
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:59 -05:00
YiPeng Chai
d95ca7f515 drm/amdgpu: suspend ras module before gpu reset
During gpu reset, all GPU-related resources are
inaccessible. To avoid affecting ras functionality,
suspend ras module before gpu reset and resume
it after gpu reset is complete.

V2:
  Rename functions to avoid misunderstanding.

V3:
  Move flush_delayed_work to amdgpu_ras_process_pause,
  Move schedule_delayed_work to amdgpu_ras_process_unpause.

V4:
  Rename functions.

V5:
  Move the function to amdgpu_ras.c.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:59 -05:00
Gangliang Xie
d4432f16d3 drm/amdgpu: add wrapper functions for pmfw eeprom interface
add wrapper functions for pmfw eeprom interface, for these interfaces
to be easily and safely called

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:58 -05:00
Gangliang Xie
f6cdcbd2c0 drm/amdgpu: add function to check if pmfw eeprom is supported
add function to check if pmfw is supported, skip eeprom
check and recover when pmfw eeprom is supported

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:58 -05:00
Gangliang Xie
f5346a176c drm/amd/pm: add smu ras driver framework
add functions to get smu ras driver

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:58 -05:00
Gangliang Xie
77dbd7c0a2 drm/amd/pm: implement ras_smu_drv interface for smu v13.0.12
implement ras_smu_drv interface for smu v13.0.12

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:58 -05:00
Gangliang Xie
0c6f09e65b drm/amd/pm: add new message definitions for pmfw eeprom interface
Add new message definitions for pmfw eeprom interface

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:58 -05:00
Philip Yang
88ef4de35f Revert "drm/amdkfd: Improve signal event slow path"
To fix regression report on gfx8, which requires the exhaustive search
path for signaled event.

The high CPU usage of KFD interrupt wq issue is gone after HIP/ROCr add
option to reduce HW event interrupts, safe to revert this optimization
patch now.

This reverts commit de844846f7.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:22 -05:00
Rong Zhang
f19bbecd34 drm/amd/display: Fix NULL deref in debugfs odm_combine_segments
When a connector is connected but inactive (e.g., disabled by desktop
environments), pipe_ctx->stream_res.tg will be destroyed. Then, reading
odm_combine_segments causes kernel NULL pointer dereference.

 BUG: kernel NULL pointer dereference, address: 0000000000000000
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: Oops: 0000 [#1] SMP NOPTI
 CPU: 16 UID: 0 PID: 26474 Comm: cat Not tainted 6.17.0+ #2 PREEMPT(lazy)  e6a17af9ee6db7c63e9d90dbe5b28ccab67520c6
 Hardware name: LENOVO 21Q4/LNVNB161216, BIOS PXCN25WW 03/27/2025
 RIP: 0010:odm_combine_segments_show+0x93/0xf0 [amdgpu]
 Code: 41 83 b8 b0 00 00 00 01 75 6e 48 98 ba a1 ff ff ff 48 c1 e0 0c 48 8d 8c 07 d8 02 00 00 48 85 c9 74 2d 48 8b bc 07 f0 08 00 00 <48> 8b 07 48 8b 80 08 02 00>
 RSP: 0018:ffffd1bf4b953c58 EFLAGS: 00010286
 RAX: 0000000000005000 RBX: ffff8e35976b02d0 RCX: ffff8e3aeed052d8
 RDX: 00000000ffffffa1 RSI: ffff8e35a3120800 RDI: 0000000000000000
 RBP: 0000000000000000 R08: ffff8e3580eb0000 R09: ffff8e35976b02d0
 R10: ffffd1bf4b953c78 R11: 0000000000000000 R12: ffffd1bf4b953d08
 R13: 0000000000040000 R14: 0000000000000001 R15: 0000000000000001
 FS:  00007f44d3f9f740(0000) GS:ffff8e3caa47f000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 00000006485c2000 CR4: 0000000000f50ef0
 PKRU: 55555554
 Call Trace:
  <TASK>
  seq_read_iter+0x125/0x490
  ? __alloc_frozen_pages_noprof+0x18f/0x350
  seq_read+0x12c/0x170
  full_proxy_read+0x51/0x80
  vfs_read+0xbc/0x390
  ? __handle_mm_fault+0xa46/0xef0
  ? do_syscall_64+0x71/0x900
  ksys_read+0x73/0xf0
  do_syscall_64+0x71/0x900
  ? count_memcg_events+0xc2/0x190
  ? handle_mm_fault+0x1d7/0x2d0
  ? do_user_addr_fault+0x21a/0x690
  ? exc_page_fault+0x7e/0x1a0
  entry_SYSCALL_64_after_hwframe+0x6c/0x74
 RIP: 0033:0x7f44d4031687
 Code: 48 89 fa 4c 89 df e8 58 b3 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00>
 RSP: 002b:00007ffdb4b5f0b0 EFLAGS: 00000202 ORIG_RAX: 0000000000000000
 RAX: ffffffffffffffda RBX: 00007f44d3f9f740 RCX: 00007f44d4031687
 RDX: 0000000000040000 RSI: 00007f44d3f5e000 RDI: 0000000000000003
 RBP: 0000000000040000 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000202 R12: 00007f44d3f5e000
 R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000040000
  </TASK>
 Modules linked in: tls tcp_diag inet_diag xt_mark ccm snd_hrtimer snd_seq_dummy snd_seq_midi snd_seq_oss snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device x>
  snd_hda_codec_atihdmi snd_hda_codec_realtek_lib lenovo_wmi_helpers think_lmi snd_hda_codec_generic snd_hda_codec_hdmi snd_soc_core kvm snd_compress uvcvideo sn>
  platform_profile joydev amd_pmc mousedev mac_hid sch_fq_codel uinput i2c_dev parport_pc ppdev lp parport nvme_fabrics loop nfnetlink ip_tables x_tables dm_cryp>
 CR2: 0000000000000000
 ---[ end trace 0000000000000000 ]---
 RIP: 0010:odm_combine_segments_show+0x93/0xf0 [amdgpu]
 Code: 41 83 b8 b0 00 00 00 01 75 6e 48 98 ba a1 ff ff ff 48 c1 e0 0c 48 8d 8c 07 d8 02 00 00 48 85 c9 74 2d 48 8b bc 07 f0 08 00 00 <48> 8b 07 48 8b 80 08 02 00>
 RSP: 0018:ffffd1bf4b953c58 EFLAGS: 00010286
 RAX: 0000000000005000 RBX: ffff8e35976b02d0 RCX: ffff8e3aeed052d8
 RDX: 00000000ffffffa1 RSI: ffff8e35a3120800 RDI: 0000000000000000
 RBP: 0000000000000000 R08: ffff8e3580eb0000 R09: ffff8e35976b02d0
 R10: ffffd1bf4b953c78 R11: 0000000000000000 R12: ffffd1bf4b953d08
 R13: 0000000000040000 R14: 0000000000000001 R15: 0000000000000001
 FS:  00007f44d3f9f740(0000) GS:ffff8e3caa47f000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 00000006485c2000 CR4: 0000000000f50ef0
 PKRU: 55555554

Fix this by checking pipe_ctx->stream_res.tg before dereferencing.

Fixes: 07926ba8a4 ("drm/amd/display: Add debugfs interface for ODM combine info")
Signed-off-by: Rong Zhang <i@rong.moe>
Reviewed-by: Mario Limoncello <mario.limonciello@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:22 -05:00
Philip Yang
10c382ec6c drm/amdkfd: Don't clear PT after process killed
If process is killed. the vm entity is stopped, submit pt update job
will trigger the error message "*ERROR* Trying to push to a killed
entity", job will not execute.

Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:22 -05:00
YiPeng Chai
3f16007d86 drm/amd/ras: Add ras support for umc v12_5_0
Add ras support for umc v12_5_0.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:22 -05:00
YiPeng Chai
d7f105a402 drm/amd/ras: Add ras support for nbio v7_9_1
Add ras support for nbio v7_9_1.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:22 -05:00
YiPeng Chai
2f46c547e4 drm/amdgpu: Add ras ip block name
Add ras ip block name.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:22 -05:00
YiPeng Chai
36265d2bcc drm/amd/ras: Increase ras switch control range
Increase ras switch control range.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:22 -05:00
Alex Deucher
fd39b5a583 drm/amdgpu/smu: Handle S0ix for vangogh
Fix the flows for S0ix.  There is no need to stop
rlc or reintialize PMFW in S0ix.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4659
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reported-by: Antheas Kapenekakis <lkml@antheas.dev>
Tested-by: Antheas Kapenekakis <lkml@antheas.dev>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:22 -05:00
Lijo Lazar
c83fd2a665 drm/amd/pm: Update SMUv13.0.12 partition metrics
Update SMUv13.0.12 partition metrics to partition metrics v1.1 schema.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:21 -05:00
Lijo Lazar
56aeca499a drm/amd/pm: Update SMUv13.0.6 partition metrics
For SMU v13.0.6 SOCs, move to partition metrics v1.1 schema

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:21 -05:00
Lijo Lazar
4f993e2309 drm/amd/pm: Add schema v1.1 for parition metrics
Use a schema similar to gpu metrics v1.9 for partition metrics also. It
will have field type encoded followed by the field value(s). The
attribute ids used will be shared with gpu metrics. The structure
definition is only to distinguish between gpu metrics and partition
metrics though both gpu metrics v1.9 and partition metrics v1.1 follow
the same definition.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:21 -05:00
Lijo Lazar
b480f573a8 drm/amd/pm: Use gpu metrics 1.9 for SMUv13.0.12
Fill and publish GPU metrics in v1.9 format for SMUv13.0.12 SOCs

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:21 -05:00
Sunil Khatri
cd6250f3ae drm/amdgpu: validate the bo from done list for NULL
Make sure the bo is valid before using it.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:21 -05:00
Pierre-Eric Pelloux-Prayer
36ffc58b8a drm/amdgpu: lock bo before calling amdgpu_vm_bo_update_shared
BO's reservation object must be locked before using
amdgpu_vm_bo_update_shared otherwise dma_resv_assert_held will
complain in amdgpu_vm_update_shared.

Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:21 -05:00
Christian König
c72d41a8f3 drm/amdgpu: grab a BO reference in vm_lock_done_list.
Otherwise it is possible that between dropping the status lock and
locking the BO that the BO is freed up.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:21 -05:00
Xiang Liu
7cf422ed33 drm/amd/ras: Fix format truncation
../ras/rascore/ras_cper.c: In function ‘cper_generate_fatal_record.isra’:
../ras/rascore/ras_cper.c:75:36: error: ‘%llX’ directive output may be truncated writing between 1 and 14 bytes into a region of size between 0 and 7 [-Werror=format-truncation=]
   75 |         snprintf(record_id, 9, "%d:%llX", dev_info.socket_id,
      |                                    ^~~~
../ras/rascore/ras_cper.c:75:32: note: directive argument in the range [0, 72057594037927935]
   75 |         snprintf(record_id, 9, "%d:%llX", dev_info.socket_id,
      |                                ^~~~~~~~~
../ras/rascore/ras_cper.c:75:9: note: ‘snprintf’ output between 4 and 27 bytes into a destination of size 9
   75 |         snprintf(record_id, 9, "%d:%llX", dev_info.socket_id,
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   76 |                     RAS_LOG_SEQNO_TO_BATCH_IDX(trace->seqno));
      |                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../ras/rascore/ras_cper.c: In function ‘cper_generate_runtime_record.isra’:
../ras/rascore/ras_cper.c:75:36: error: ‘%llX’ directive output may be truncated writing between 1 and 14 bytes into a region of size between 0 and 7 [-Werror=format-truncation=]
   75 |         snprintf(record_id, 9, "%d:%llX", dev_info.socket_id,
      |                                    ^~~~
../ras/rascore/ras_cper.c:75:32: note: directive argument in the range [0, 72057594037927935]
   75 |         snprintf(record_id, 9, "%d:%llX", dev_info.socket_id,
      |                                ^~~~~~~~~
../ras/rascore/ras_cper.c:75:9: note: ‘snprintf’ output between 4 and 27 bytes into a destination of size 9
   75 |         snprintf(record_id, 9, "%d:%llX", dev_info.socket_id,
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   76 |                     RAS_LOG_SEQNO_TO_BATCH_IDX(trace->seqno));
      |                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:21 -05:00
Taimur Hassan
1da571bdb2 drm/amd/display: Promote DC to 3.2.357
This version brings along following update:

- HDCP2 FW locality check refactors
- Fix black screen issue with HDMI output
- Increase IB mem size
- Revert max buffered cursor size to 64
- Extend inbox0 lock to run Replay / PSR
- Refactor VActive implementation
- Add Pstate viewport reduction
- Persist stream refcount through restore

Acked-by: Wayne Lin <Wayne.Lin@amd.com>
Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:21 -05:00
Taimur Hassan
3f0c27edd8 drm/amd/display: [FW Promotion] Release 0.1.34.0
Release hightlights

DCN35/36
    * Dynamically clock gate before and after prefetch

Acked-by: Wayne Lin <Wayne.Lin@amd.com>
Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:21 -05:00
Alex Hung
678c901443 drm/amd/display: Fix black screen with HDMI outputs
[Why & How]
This fixes the black screen issue on certain APUs with HDMI,
accompanied by the following messages:

amdgpu 0000:c4:00.0: amdgpu: [drm] Failed to setup vendor info
                     frame on connector DP-1: -22
amdgpu 0000:c4:00.0: [drm] Cannot find any crtc or sizes [drm]
                     Cannot find any crtc or sizes

Fixes: 489f0f600c ("drm/amd/display: Fix DVI-D/HDMI adapters")
Suggested-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:20 -05:00
Alvin Lee
54963d18a8 drm/amd/display: Increase IB mem size
[Why & How]
Increase IB mem size to match size of largest structure that will
use IB transfer between driver and DMU.

Reviewed-by: Oleh Kuzhylnyi <oleh.kuzhylnyi@amd.com>
Signed-off-by: Alvin Lee <Alvin.Lee2@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:20 -05:00
Dillon Varone
ac7f4fcc7b drm/amd/display: Revert DCN4 max buffered cursor size to 64
[Why & How]
The buffered cursor cap is expressed assuming a square cursor, and usage
of the cursor buffer is limited by the request size. For greater than 32
pixels, the request size is fixed at 256 bytes, so the maximum width
must be floored to the nearest 256th byte. At 4bpp this means even with
24kB DCN4 can only hold a 64x64 cursor in the buffer as even 65 pixels
would require 512 bytes per line instead of 256.

Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Dillon Varone <Dillon.Varone@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:20 -05:00
Joshua Aberback
b0ba3108e3 drm/amd/display: Persist stream refcount through restore
[Why & How]
Overwriting the refcount on stream restore can lead to double-free errors
or memory leaks if an unbalanced number of retains and releases occurs
between a backup and restore.

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Joshua Aberback <joshua.aberback@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:20 -05:00
Austin Zheng
717b836c27 drm/amd/display: Add Pstate viewport reduction
[Why/How]
Add struct to hold calculated reduced viewport pstate
recout reduction lines per plane

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Austin Zheng <Austin.Zheng@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:20 -05:00
Austin Zheng
1b3246352a drm/amd/display: Refactor VActive implementation
[Why & How]
Refactors VActive accounting in PMO, and breaks down fill time
requirement by P-State type as it can result in drasitcally different
bandwidth requirements depending on the blackout length.

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Austin Zheng <Austin.Zheng@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:20 -05:00
Austin Zheng
e44ee152a5 drm/amd/display: Update P-state naming for clarity.
[Why & How]
P-state can refer to different things like UCLK P-state, PPT, or temp read
Update naming for clarity

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Austin Zheng <Austin.Zheng@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:20 -05:00
Austin Zheng
9c83768e57 drm/amd/display: Remove old PMO options
[Why & How]
Removes deprecated or unused PMO options.

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Austin Zheng <Austin.Zheng@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:20 -05:00
Austin Zheng
e19cb97ea8 drm/amd/display: Add pte_buffer_mode and force_one_row_for_frame in dchub reg
[Why & How]
Update structs for rq regs

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Austin Zheng <Austin.Zheng@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:20 -05:00