Commit Graph

1428920 Commits

Author SHA1 Message Date
Srinivasan Shanmugam
a782576e28 drm/amdgpu: Drop unreachable return in amdgpu_reg_get_smn_base64()
amdgpu_reg_get_smn_base64() returns from all control-flow paths inside
the !adev->reg.smn.get_smn_base fallback path.

For version == 1, the function returns the base address from
amdgpu_reg_smn_v1_0_get_base(). For all other versions, the default
switch branch emits a dev_err_once() and returns 0.

The trailing return 0 after the switch is therefore unreachable and is
reported by Smatch as dead code:

  drivers/gpu/drm/amd/amdgpu/amdgpu_reg_access.c:317
  amdgpu_reg_get_smn_base64() warn: ignoring unreachable code

Remove the redundant return statement.
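
The control flow described above can be sketched in user-space C. This is only an illustration: smn_v1_0_get_base() and its return value are hypothetical stand-ins, not the driver's code.

```c
#include <stdint.h>

/* Hypothetical stand-in for amdgpu_reg_smn_v1_0_get_base(); the value is made up. */
static uint64_t smn_v1_0_get_base(int inst)
{
	return 0x100000000ULL * (uint64_t)inst;
}

/* Every branch of the switch returns, so nothing after it can execute. */
static uint64_t get_smn_base64_fallback(int version, int inst)
{
	switch (version) {
	case 1:
		return smn_v1_0_get_base(inst);
	default:
		/* The driver logs with dev_err_once() at this point. */
		return 0;
	}
	/* A "return 0;" placed here would be dead code. */
}
```

Calling get_smn_base64_fallback(7, 0) takes the default branch and yields 0, which is the same value the removed trailing return would have produced.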

Fixes: 467ebfe65f ("drm/amdgpu: Add smn callbacks to register block")
Cc: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Lijo Lazar <lijo.lazar@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:45:39 -04:00
Jesse.Zhang
2cef848812 drm/amdgpu: validate fence_count in wait_fences ioctl
Add an early parameter check in amdgpu_cs_wait_fences_ioctl() to reject
a zero fence_count with -EINVAL.

dma_fence_wait_any_timeout() requires count > 0. When userspace passes
fence_count == 0, the call propagates down to dma_fence core which does
not expect a zero-length array and triggers a WARN_ON.

Return -EINVAL immediately so the caller gets a clear error instead of
hitting an unexpected warning in the DMA fence subsystem.

No functional change for well-formed userspace callers.
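
A minimal sketch of this kind of early parameter check, assuming a simplified argument struct (wait_fences_args and check_wait_fences_args() are hypothetical names, not the ioctl's real types):

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical, simplified view of the ioctl input. */
struct wait_fences_args {
	uint32_t fence_count;
};

/* Reject an empty fence array before any lower layer sees it. */
static int check_wait_fences_args(const struct wait_fences_args *args)
{
	if (args->fence_count == 0)
		return -EINVAL;
	return 0;
}
```

Well-formed callers always pass at least one fence, so they never reach the error path.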

v2:
- Reworked commit message to clarify the parameter validation rationale
- Removed verbose crash log from commit description
- Simplified inline code comment

Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:45:30 -04:00
Pierre-Eric Pelloux-Prayer
4bbba79a7f drm/amdgpu: move devcoredump generation to a worker
Update the way drm_coredump_printer is used based on its documentation
and Xe's code: the main idea is to generate the final version in one go
and then use memcpy to return the chunks requested by the caller of
amdgpu_devcoredump_read.

The generation is moved to a separate worker thread.

This cuts the time to copy the dump from 40s to ~0s on my machine.
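
The "generate once, then memcpy chunks" idea can be sketched as follows. The struct and function names are hypothetical; the real read callback has a different signature.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical holder for a dump that was fully generated ahead of time. */
struct prebuilt_dump {
	const char *data;
	size_t size;
};

/* Serve one chunk of the prebuilt dump; returns the number of bytes copied. */
static size_t dump_read(const struct prebuilt_dump *d, char *buf,
			size_t offset, size_t count)
{
	size_t n;

	if (offset >= d->size)
		return 0;

	n = d->size - offset;
	if (n > count)
		n = count;

	memcpy(buf, d->data + offset, n);
	return n;
}
```

Each read is then a bounded copy whose cost does not depend on how the dump was produced.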

---
v3:
- removed adev->coredump_in_progress and instead use work as
  the synchronisation mechanism
- use kvfree instead of kfree
---

Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:45:20 -04:00
Jesse.Zhang
15e19d832b drm/amd/amdgpu: Fix build errors due to declarations after labels
Before C23, a label must be followed by a statement, and a declaration
is not a statement, so a declaration cannot directly follow a label. The
switch cases in amdgpu_discovery.c and gmc_v12_1.c contained variable
declarations immediately after case labels, causing the compiler to
error:

drivers/gpu/drm/amd/amdgpu/gmc_v12_1.c:533:3: error: a label can only be
part of a statement and a declaration is not a statement

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:42:47 -04:00
Sunil Khatri
f802f7b0bc drm/amdgpu/userq: unlock cancel_delayed_work_sync for hang_detect_work
cancel_delayed_work_sync() for hang_detect_work should not be called
with the mutex held, since amdgpu_userq_hang_detect_work() also needs
the same mutex, and the two can deadlock when they run together.

We do not need to hold the mutex for
cancel_delayed_work_sync(&queue->hang_detect_work). With this in place,
if the cancel and the worker thread run at the same time they will not
deadlock.

If any failure leads to hang detection and reset, there is a deadlock
scenario between the cancel and the running main thread:

[ 243.118276] task:kworker/9:0 state:D stack:0 pid:73 tgid:73 ppid:2 task_flags:0x4208060 flags:0x00080000
[ 243.118283] Workqueue: events amdgpu_userq_hang_detect_work [amdgpu]
[ 243.118636] Call Trace:
[ 243.118639] <TASK>
[ 243.118644] __schedule+0x581/0x1810
[ 243.118649] ? srso_return_thunk+0x5/0x5f
[ 243.118656] ? srso_return_thunk+0x5/0x5f
[ 243.118659] ? wake_up_process+0x15/0x20
[ 243.118665] schedule+0x64/0xe0
[ 243.118668] schedule_preempt_disabled+0x15/0x30
[ 243.118671] __mutex_lock+0x346/0x950
[ 243.118677] __mutex_lock_slowpath+0x13/0x20
[ 243.118681] mutex_lock+0x2c/0x40
[ 243.118684] amdgpu_userq_hang_detect_work+0x63/0x90 [amdgpu]
[ 243.118888] process_scheduled_works+0x1f0/0x450
[ 243.118894] worker_thread+0x27f/0x370
[ 243.118899] kthread+0x1ed/0x210
[ 243.118903] ? __pfx_worker_thread+0x10/0x10
[ 243.118906] ? srso_return_thunk+0x5/0x5f
[ 243.118909] ? __pfx_kthread+0x10/0x10
[ 243.118913] ret_from_fork+0x10f/0x1b0
[ 243.118916] ? __pfx_kthread+0x10/0x10
[ 243.118920] ret_from_fork_asm+0x1a/0x30

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:42:39 -04:00
Sunil Khatri
7a14a4e9b3 drm/amdgpu/userq: fix dma_fence refcount underflow in userq path
An extra dma_fence_put() can drop the last reference to a fence while it is
still attached to a dma_resv object. This frees the fence prematurely via
dma_fence_release() while other users still hold the pointer.

Later accesses through dma_resv iteration may then operate on the freed
fence object, leading to refcount underflow warnings and potential hangs
when walking reservation fences.

Fix this by correcting the fence lifetime so the dma_resv object retains a
valid reference until it is done with the fence.
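
The lifetime rule can be illustrated with a toy reference counter. This is not the dma_fence API: toy_fence, toy_resv and their helpers are hypothetical, and the sketch only assumes that the container takes its own reference when a fence is attached.

```c
#include <stddef.h>

/* Toy refcounted object; "freed" records when the last reference is dropped. */
struct toy_fence {
	int refcount;
	int freed;
};

static struct toy_fence *toy_fence_get(struct toy_fence *f)
{
	f->refcount++;
	return f;
}

static void toy_fence_put(struct toy_fence *f)
{
	if (--f->refcount == 0)
		f->freed = 1;
}

/* Toy container that keeps its own reference while the fence is attached. */
struct toy_resv {
	struct toy_fence *fence;
};

static void toy_resv_add_fence(struct toy_resv *r, struct toy_fence *f)
{
	r->fence = toy_fence_get(f);
}

static void toy_resv_release(struct toy_resv *r)
{
	toy_fence_put(r->fence);
	r->fence = NULL;
}
```

With balanced get/put pairs the creator can drop its reference and the fence stays alive until the container releases it. One extra put by the creator would free the fence while the container still points at it.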

[   31.133803] refcount_t: underflow; use-after-free.
[   31.133805] WARNING: lib/refcount.c:28 at refcount_warn_saturate+0x58/0x90, CPU#18: kworker/u96:1/188

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:42:27 -04:00
Hawking Zhang
311f8fc05c drm/amdgpu: fallback to default discovery offset/size in sriov guest
In SRIOV guest environment, if dynamic critical region
is not enabled, fallback to default discovery offset
and size to ensure proper initialization

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:42:22 -04:00
Sunil Khatri
e9405ce75e drm/amdgpu/userq: Use kvfree instead of kfree in amdgpu_userq_signal_ioctl
In function amdgpu_userq_signal_ioctl, drm_gem_objects_lookup allocates
memory via kvmalloc, and hence that memory must be freed via kvfree.

Fixes: 4ca06f6fb4 ("drm/amdgpu/userq: Use drm_gem_objects_lookup in amdgpu_userq_signal_ioctl")
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:41:10 -04:00
Tao Zhou
6b340cccf1 drm/amdgpu: update flip bit setting of RAS bad page
The flip bit setting is different if umc number is half of original
configuration.

v2: block the flip bit setting for unsupported umc configuration.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:41:05 -04:00
Yicong Hui
736ef29ed4 drm/amdgpu: Replace deprecated strcpy() in amdgpu_virt_write_vf2pf_data
strcpy() is deprecated as it does not do any bounds checking (as
specified in Documentation/process/deprecated.rst).

There is a risk of buffer overflow in the case that the value for
THIS_MODULE->version exceeds the 64 characters. This is unlikely, but
replacing the deprecated function will pre-emptively remove this risk
entirely.

Replace both instances of strcpy() with the safer strscpy() function.
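
For readers unfamiliar with the difference, here is a user-space sketch of a bounded copy in the spirit of strscpy(). bounded_copy() is a hypothetical helper written for illustration, not the kernel implementation.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Copy src into dst, never writing more than size bytes and always
 * NUL-terminating when size > 0. Returns the copied length, or -E2BIG
 * if src had to be truncated.
 */
static long bounded_copy(char *dst, const char *src, size_t size)
{
	size_t i;

	if (size == 0)
		return -E2BIG;

	for (i = 0; i < size - 1 && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0';

	return src[i] == '\0' ? (long)i : -E2BIG;
}
```

Unlike strcpy(), an over-long source is truncated and reported instead of overrunning the destination.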

Changes have been compile tested.

Reviewed-by: Kees Cook <kees@kernel.org>
Signed-off-by: Yicong Hui <yiconghui@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:40:51 -04:00
Andy Nguyen
9c7be0efa6 drm/amd: fix dcn 2.01 check
The ASICREV_IS_BEIGE_GOBY_P check always took precedence, because it includes all chip revisions up to NV_UNKNOWN.

Fixes: 54b822b3ea ("drm/amd/display: Use dce_version instead of chip_id")
Signed-off-by: Andy Nguyen <theofficialflow1996@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:40:23 -04:00
Srinivasan Shanmugam
91c7e6342e drm/amd/display: Fix DisplayID not-found handling in parse_edid_displayid_vrr()
parse_edid_displayid_vrr() searches the EDID extension blocks for a
DisplayID extension before parsing the dynamic video timing range.

The code previously checked whether edid_ext was NULL after the search
loop. However, edid_ext is assigned during each iteration of the loop,
so it will never be NULL once the loop has executed. If no DisplayID
extension is found, edid_ext ends up pointing to the last extension
block, and the NULL check does not correctly detect the failure case.

Instead, check whether the loop completed without finding a matching
DisplayID block by testing "i == edid->extensions". This ensures the
function exits early when no DisplayID extension is present and avoids
parsing an unrelated EDID extension block.
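
A simplified sketch of the corrected search, assuming 128-byte EDID blocks and a DisplayID extension tag of 0x70. find_displayid_ext() and the two macros are hypothetical names for this illustration, not the driver's code.

```c
#include <stddef.h>
#include <stdint.h>

#define EDID_BLOCK_SIZE   128   /* assumed block size */
#define DISPLAYID_EXT_TAG 0x70  /* assumed tag of a DisplayID extension */

/* Return the first DisplayID extension block, or NULL if there is none. */
static const uint8_t *find_displayid_ext(const uint8_t *edid, int extensions)
{
	const uint8_t *ext = NULL;
	int i;

	if (!edid || !extensions)
		return NULL;

	for (i = 0; i < extensions; i++) {
		ext = edid + EDID_BLOCK_SIZE * (i + 1);
		if (ext[0] == DISPLAYID_EXT_TAG)
			break;
	}

	/* ext is never NULL here, so the loop index is the reliable test. */
	if (i == extensions)
		return NULL;

	return ext;
}
```

When no block matches, ext still points at the last extension, which is why a NULL check on it cannot detect the not-found case.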

Also simplify the EDID validation check using "!edid ||
!edid->extensions".

Fixes the below:
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:13079 parse_edid_displayid_vrr() warn: variable dereferenced before check 'edid_ext' (see line 13075)

Fixes: a638b837d0 ("drm/amd/display: Fix refresh rate range for some panel")
Cc: Roman Li <roman.li@amd.com>
Cc: Alex Hung <alex.hung@amd.com>
Cc: Jerry Zuo <jerry.zuo@amd.com>
Cc: Sun peng Li <sunpeng.li@amd.com>
Cc: Tom Chung <chiahsuan.chung@amd.com>
Cc: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:39:33 -04:00
Erik Kurzinger
6736c8ff9d drm/amd/display: remove duplicate format modifier
amdgpu_dm_plane_get_plane_modifiers always adds DRM_FORMAT_MOD_LINEAR to
the list of modifiers. However, with gfx12,
amdgpu_dm_plane_add_gfx12_modifiers also adds that modifier to the list.
So we end up with two copies. Most apps just ignore this but some
(Weston) don't like it.

As a fix, we change amdgpu_dm_plane_add_gfx12_modifiers to not add
DRM_FORMAT_MOD_LINEAR to the list, matching the behavior of analogous
functions for other chips.

Signed-off-by: Erik Kurzinger <ekurzinger@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:39:13 -04:00
David Baum
c955e99a06 drm/amdgpu: switch XGMI sysfs show helpers to sysfs_emit_at()
The XGMI sysfs show helpers amdgpu_xgmi_show_num_hops() and
amdgpu_xgmi_show_num_links() currently populate the output buffer with
sprintf() and then call sysfs_emit(buf, "%s\n", buf) to append the final
newline.

Convert both helpers to use sysfs_emit_at() while tracking the current
offset. This keeps buffer construction in the sysfs helpers, avoids
feeding the output buffer back into the final formatted write, and
matches the style already used by
amdgpu_xgmi_show_connected_port_num().
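
The offset-tracking pattern can be sketched in user space with vsnprintf(). emit_at() is a hypothetical stand-in for sysfs_emit_at(), OUT_BUF_SIZE stands in for the page-sized sysfs buffer, and the "%02x " format is an assumption for this example.

```c
#include <stdarg.h>
#include <stdio.h>

#define OUT_BUF_SIZE 64  /* stand-in for the page-sized sysfs buffer */

/* Format into buf starting at offset "at"; returns the bytes written. */
static int emit_at(char *buf, int at, const char *fmt, ...)
{
	va_list ap;
	int n;

	if (at < 0 || at >= OUT_BUF_SIZE)
		return 0;

	va_start(ap, fmt);
	n = vsnprintf(buf + at, OUT_BUF_SIZE - at, fmt, ap);
	va_end(ap);

	if (n < 0)
		return 0;
	if (n >= OUT_BUF_SIZE - at)
		n = OUT_BUF_SIZE - at - 1;  /* output was truncated */
	return n;
}

/* Build the whole line by appending at a running offset. */
static int show_num_hops(char *buf, const int *hops, int count)
{
	int i, off = 0;

	for (i = 0; i < count; i++)
		off += emit_at(buf, off, "%02x ", hops[i]);
	off += emit_at(buf, off, "\n");

	return off;
}
```

The buffer is never passed back in as a format argument, and the returned offset is the total length of the output.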

Signed-off-by: David Baum <davidbaum461@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:38:59 -04:00
Nathan Chancellor
eb422f3bbd drm/amdgpu/discovery: Add braces to case statements in amdgpu_discovery_table_check()
When building with a version of clang that supports the narrower
'-fms-anonymous-structs' (as opposed to the wider '-fms-extensions')
along with the associated kernel support (such as in next-20260312 [1]),
there are warnings (or errors with CONFIG_WERROR=y / W=e) from the
switch statement added by commit 47ab777c16 ("drm/amdgpu/discovery:
use common function to check discovery table").

  drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c:560:3: error: label followed by a declaration is a C23 extension [-Werror,-Wc23-extensions]
    560 |                 struct ip_discovery_header *ihdr =
        |                 ^
  drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c:568:3: error: label followed by a declaration is a C23 extension [-Werror,-Wc23-extensions]
    568 |                 struct gpu_info_header *ghdr =
        |                 ^
  drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c:576:3: error: label followed by a declaration is a C23 extension [-Werror,-Wc23-extensions]
    576 |                 struct harvest_info_header *hhdr =
        |                 ^
  drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c:584:3: error: label followed by a declaration is a C23 extension [-Werror,-Wc23-extensions]
    584 |                 struct vcn_info_header *vhdr =
        |                 ^
  drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c:592:3: error: label followed by a declaration is a C23 extension [-Werror,-Wc23-extensions]
    592 |                 struct mall_info_header *mhdr =
        |                 ^

If '-fms-extensions' were not present, this would be a hard error in
older clang versions.

Add braces to the case statements that declare variables to clear up the
warnings.
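
The shape of the fix, sketched with hypothetical table types (the enum, structs and table_size() are not the driver's definitions):

```c
/* Hypothetical table identifiers and headers for illustration only. */
enum table_id { TABLE_IP, TABLE_GPU_INFO };

struct ip_hdr {
	unsigned short size;
};

struct gpu_hdr {
	unsigned int size;
};

static unsigned int table_size(enum table_id id, const void *blob)
{
	switch (id) {
	case TABLE_IP: {
		/* The braces open a block, so the label precedes a statement. */
		const struct ip_hdr *ihdr = blob;

		return ihdr->size;
	}
	case TABLE_GPU_INFO: {
		const struct gpu_hdr *ghdr = blob;

		return ghdr->size;
	}
	default:
		return 0;
	}
}
```

The braces also limit each variable's scope to its own case.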

Fixes: 47ab777c16 ("drm/amdgpu/discovery: use common function to check discovery table")
Link: 0d3fccf68d [1]
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:37:55 -04:00
YiPeng Chai
a0b2afa4c3 drm/amd/ras: Pass ras poison consumption message to sriov host
Pass ras poison consumption message to sriov host.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:37:51 -04:00
Sunil Khatri
087be0cd54 drm/amdgpu/userq: Use kvfree instead of kfree in amdgpu_userq_wait_ioctl
In function amdgpu_userq_wait_ioctl, drm_gem_objects_lookup allocates
memory via kvmalloc, and hence that memory must be freed via kvfree.

Fixes: 2de9353e19 ("drm/amdgpu/userq: Use drm_gem_objects_lookup in amdgpu_userq_wait_ioctl")
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:36:23 -04:00
Asad Kamal
3d0b7f5da0 drm/amd/pm: Use common smu fw check function for smu15
Use common smu fw check function for smu15 and remove dedicated ones

v2: Remove dedicated functions and directly use common one

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:36:20 -04:00
Asad Kamal
fba3ad6f93 drm/amd/pm: Use common smu fw check function for smu13
Use common smu fw check function for smu13 and remove dedicated ones

v2: Remove dedicated functions and directly use common one

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:36:15 -04:00
Taimur Hassan
927a16c216 drm/amd/display: Promote DC to 3.2.374
This version brings along the following updates:

- Clamp dc_cursor_position x_hotspot to prevent integer overflow
- Query DC for gfx handling when setting linear tiling
- Add a buffer for boot time crc
- Silence static analysis warnings
- Plumb MRQ programming out of DML for dml2_1
- Add dcn_mrq_present Field
- Fix number of opp
- Add debugfs to disallow eDP Replay entry

Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Chuanyu Tseng <chuanyu.tseng@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:36:12 -04:00
Benjamin Nwankwo
a2aa7987de drm/amd/display: Clamp dc_cursor_position x_hotspot to prevent integer overflow
why:
Workaround for duplicate cursor. Cursor offsetting via x_hotspot attempts
to write a 32 bit unsigned integer to the 8 bit field CURSOR_HOT_SPOT_X.
This wraps cursor position back into focus if x_hotspot exceeds 8 bits,
making duplicate cursors visible

how:
Clamp x_hotspot before writing to hardware
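
A small sketch of the difference between truncating and clamping, assuming an 8-bit hardware field. clamp_hotspot() and truncate_hotspot() are hypothetical helpers.

```c
#include <stdint.h>

/* Plain truncation keeps only the low 8 bits, so large values wrap. */
static uint8_t truncate_hotspot(uint32_t x_hotspot)
{
	return (uint8_t)x_hotspot;
}

/* Clamping saturates at the largest value the 8-bit field can hold. */
static uint8_t clamp_hotspot(uint32_t x_hotspot)
{
	return x_hotspot > 0xFF ? 0xFF : (uint8_t)x_hotspot;
}
```

For an input of 300, truncation yields 44 while clamping yields 255.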

Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Reviewed-by: Nevenko Stupar <nevenko.stupar@amd.com>
Signed-off-by: Benjamin Nwankwo <Benjamin.Nwankwo@amd.com>
Signed-off-by: Chuanyu Tseng <chuanyu.tseng@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:36:09 -04:00
Nicholas Carbones
8333f22e44 drm/amd/display: Query DC for gfx handling when setting linear tiling
[Why]
Post-driver cases always use linear tiling yet gfx handling for this
case is improper, allowing for incorrect gfx structs to be populated and
used.

[How]
Query DC for the appropriate linear tiling mode and populate the DCN
specific gfx version structs.

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Nicholas Carbones <Nicholas.Carbones@amd.com>
Signed-off-by: Chuanyu Tseng <chuanyu.tseng@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:36:06 -04:00
Tom Chung
b034c5b0d8 drm/amd/display: Add a buffer for boot time crc
[Why]
We need to reserve a memory buffer for boot time crc test
during resume.

[How]
Create a buffer during boot up and send the buffer info to
DMUB.

Reviewed-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Chuanyu Tseng <chuanyu.tseng@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:36:01 -04:00
Gaghik Khachatrian
cb0f6a16e2 drm/amd/display: Silence static analysis warning
Silence static analysis warnings by ensuring swath size temporaries are
initialized before use. No functional change intended.

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Gaghik Khachatrian <gaghik.khachatrian@amd.com>
Signed-off-by: Chuanyu Tseng <chuanyu.tseng@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:35:58 -04:00
Nicholas Kazlauskas
beb8e35e2b drm/amd/display: Plumb MRQ programming out of DML for dml2_1
[Why]
If the MRQ is present then these fields are also required to be
plumbed out to the requestor for programming.

[How]
Pipe the fields out through rq_dlg_get_rq_reg.

The implementation follows the previous generation in dml2_0 for DCN35
but adjusted for the new helpers and coding style of dml2_1.

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Chuanyu Tseng <chuanyu.tseng@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:35:54 -04:00
Austin Zheng
fabd89fc17 drm/amd/display: Add dcn_mrq_present Field
[Why/How]
Add MRQ flag so it can be passed from ip_caps to ip_params

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Austin Zheng <Austin.Zheng@amd.com>
Signed-off-by: Chuanyu Tseng <chuanyu.tseng@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:35:50 -04:00
Austin Zheng
2c5f15ee2c drm/amd/display: Fix number of opp
[Why/How]
Patch number of opp based on IP caps

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Austin Zheng <Austin.Zheng@amd.com>
Signed-off-by: Chuanyu Tseng <chuanyu.tseng@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:35:47 -04:00
Ray Wu
f7168d1a8d drm/amd/display: Add debugfs to disallow eDP Replay entry
[Why & How]
Test applications need to read CRC from the eDP sink side, but the sink
Replay feature prevents proper CRC reading and causes timeouts.

Add disallow_edp_enter_replay debugfs interface to allow test apps
to temporarily disable Replay for CRC operations.

Reviewed-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Signed-off-by: Chuanyu Tseng <chuanyu.tseng@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:35:40 -04:00
Xi Ruoyao
25bb1d54ba drm/amd/display: Wrap dcn32_override_min_req_memclk() in DC_FP_{START, END}
[Why]
The dcn32_override_min_req_memclk function is in dcn32_fpu.c, which is
compiled with CC_FLAGS_FPU into FP instructions.  So when we call it we
must use DC_FP_{START,END} to save and restore the FP context, and
prepare the FP unit on architectures like LoongArch where the FP unit
isn't always on.

Reported-by: LiarOnce <liaronce@hotmail.com>
Fixes: ee7be8f3de ("drm/amd/display: Limit DCN32 8 channel or less parts to DPM1 for FPO")
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:34:25 -04:00
Calvin Owens
b7f1402f6a drm/amd/display: Fix uninitialized variable use which breaks full LTO
Commit e1b385726f ("drm/amd/display: Add additional checks for PSP
footer size") introduced a use of an uninitialized stack variable
in dm_dmub_sw_init() (region_params.bss_data_size).

Interestingly, this seems to cause no issue on normal kernels. But when
full LTO is enabled, it causes the compiler to "optimize" out huge
swaths of amdgpu initialization code, and the driver is unusable:

    amdgpu 0000:03:00.0: [drm] Loading DMUB firmware via PSP: version=0x07002F00
    amdgpu 0000:03:00.0: sw_init of IP block <dm> failed 5
    amdgpu 0000:03:00.0: amdgpu_device_ip_init failed
    amdgpu 0000:03:00.0: Fatal error during GPU init

It surprises me that neither gcc nor clang emit a warning about this: I
only found it by bisecting the LTO breakage.

Fix by using the bss_data_size field from fw_meta_info_params, as was
presumably intended.

Fixes: e1b385726f ("drm/amd/display: Add additional checks for PSP footer size")
Signed-off-by: Calvin Owens <calvin@wbinvd.org>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:33:50 -04:00
Asad Kamal
febc4b4366 drm/amd/pm: Add common smu fw check function
Add common smu firmware version check function

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:33:47 -04:00
Christian König
98dc529a27 drm/amdgpu: fix amdgpu_userq_evict
Canceling the resume worker synchronously can deadlock because it can in
turn wait for the eviction worker through the userq_mutex.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:33:39 -04:00
Jesse.Zhang
688b87d39e drm/amdgpu: Limit BO list entry count to prevent resource exhaustion
Userspace can pass an arbitrary number of BO list entries via the
bo_number field. Although the previous multiplication overflow check
prevents out-of-bounds allocation, a large number of entries could still
cause excessive memory allocation (up to potentially gigabytes) and
unnecessarily long list processing times.

Introduce a hard limit of 128k entries per BO list, which is more than
sufficient for any realistic use case (e.g., a single list containing all
buffers in a large scene). This prevents memory exhaustion attacks and
ensures predictable performance.

Return -EINVAL if the requested entry count exceeds the limit.
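
A minimal sketch of such a limit check, assuming "128k" means 128 * 1024. BO_LIST_MAX_ENTRIES and check_bo_number() are hypothetical names.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed reading of the "128k entries" limit. */
#define BO_LIST_MAX_ENTRIES (128u * 1024u)

/* Reject oversized lists before any allocation is attempted. */
static int check_bo_number(uint32_t bo_number)
{
	if (bo_number > BO_LIST_MAX_ENTRIES)
		return -EINVAL;
	return 0;
}
```

The check runs before any memory is sized from bo_number, so an oversized request costs nothing beyond the comparison.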

Reviewed-by: Christian König <christian.koenig@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:33:14 -04:00
YiPeng Chai
df1f11fe14 drm/amdgpu: Add poison consumption handling for gfx v12_1
Add poison consumption handling for gfx v12_1.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:33:11 -04:00
YiPeng Chai
c32606c8c6 drm/amdgpu: Add umc ecc error handling for gmc v12_1
Add umc ecc error handling for gmc v12_1.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:33:08 -04:00
YiPeng Chai
22664436e6 drm/amd/ras: Add unified interface to handle ras interrupts
Add unified interface to handle ras interrupts, some redundant
interrupt function interfaces will be removed later.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:33:04 -04:00
Hawking Zhang
2886b43922 drm/amdgpu: Place firmware bo in vram for A + A
On A+A platforms, PSP requires the firmware bo
to be located in VRAM

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:33:01 -04:00
Feifei Xu
38c900e0b4 drm/amdgpu/mmhub_v4_2_0: expand gart aperture to gart_end on A+A
On A+A, sysvm aperture is used to access vram and gart. Gart is placed
right after vram. Adjust gart aperture range in mmhub for A+A.

Signed-off-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:32:57 -04:00
Hawking Zhang
25aa39a863 drm/amdgpu/gmc12: Init vram_size for A + A
Calculate vram_size using the XGMI node segment size
and node count for A+A configurations

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:32:54 -04:00
Hawking Zhang
93d82ed35d drm/amdgpu/gmc12: Update connected_to_cpu flag
Query the host–GPU interface in gmc early init
phase and set xgmi.connected_to_cpu accordingly

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:32:51 -04:00
Hawking Zhang
63d3dc9dc4 drm/amdgpu/gmc12: Fix VRAM base offset calculation
Include segment size when calculating vram base offset

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:32:48 -04:00
Hawking Zhang
ae19135340 drm/amdgpu/gmc12: Query host-gpu interface
Query host-gpu interconnect type for gmc v12 devices

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:32:44 -04:00
Hawking Zhang
ec5d2d2d55 drm/amdgpu: Retire get_xgmi_info callback for gfxhub v12_1
gfxhub v12_1 is not always on. Querying xgmi info
from it may not work consistently.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:32:41 -04:00
Hawking Zhang
be3f235bb6 drm/amdgpu: Query xgmi info from mmhub if available
Query xgmi info from mmhub if available

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:32:38 -04:00
Hawking Zhang
20fe5d020f drm/amdgpu: Implement get_xgmi_info callback for mmhub_v4_2
Query memory region assignment and address via mmhub

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:32:34 -04:00
Hawking Zhang
5aff5c6831 drm/amdgpu/gmc12: Update gmc aperture base for A + A
Query mmhub MC_VM_FB_OFFSET, XGMI_LFB_CNTL|SIZE
registers to calculate gmc aperture base address
for A + A configuration

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:32:30 -04:00
Hawking Zhang
c5f8454cd1 drm/amdgpu/gmc12: Bypass FB resize on A + A platform
Resizing fb bar is not needed/supported on A + A
platform.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:32:15 -04:00
Hawking Zhang
7c5ce459dd drm/amdgpu: Update gfxhub system aperture settings for A + A
Bypass the programming from SRIOV guest

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:32:12 -04:00
Hawking Zhang
23b4886a60 drm/amdgpu: Correct mmhub system aperture settings for A + A
Disable AGP and FB aperture on all available MMHUB
instances when vmid0 page table is enabled

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:32:07 -04:00
Hawking Zhang
8392ca2d7e drm/amdgpu/gmc12: Set up pdb0 for vmid0 page table
Allocate, initialize and free pdb0 for the vmid0 page table that
is used for fb translation on A + A platform

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:32:05 -04:00