Commit Graph

1382562 Commits

Author SHA1 Message Date
Liao Yuanhong
220c7a21cb drm/radeon/radeon_legacy_encoders: Remove redundant ternary operators
For ternary operators in the form of "a ? true : false", if 'a' itself
returns a boolean result, the ternary operator can be omitted. Remove
redundant ternary operators to clean up the code.

Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-05 17:38:40 -04:00
Liao Yuanhong
60d6f01b5a drm/radeon/dpm: Remove redundant ternary operators
For ternary operators in the form of "a ? true : false", if 'a' itself
returns a boolean result, the ternary operator can be omitted. Remove
redundant ternary operators to clean up the code.

Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-05 17:38:40 -04:00
Liao Yuanhong
7a50377cea drm/radeon/atom: Remove redundant ternary operators
For ternary operators in the form of "a ? true : false", if 'a' itself
returns a boolean result, the ternary operator can be omitted. Remove
redundant ternary operators to clean up the code.

Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-05 17:38:39 -04:00
Liao Yuanhong
9ab06ab36d drm/amd/pm/powerplay/smumgr: remove redundant ternary operators
For ternary operators in the form of "a ? true : false", if 'a' itself
returns a boolean result, the ternary operator can be omitted. Remove
redundant ternary operators to clean up the code. Swap variable positions
on either side of '==' to enhance readability.

Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-05 17:38:39 -04:00
Liao Yuanhong
fa740b115b drm/amd/pm/powerplay/hwmgr/ppatomctrl: Remove redundant ternary operators
For ternary operators in the form of "a ? true : false", if 'a' itself
returns a boolean result, the ternary operator can be omitted. Remove
redundant ternary operators to clean up the code. Swap variable positions
on either side of '!=' to enhance readability.

Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-05 17:38:39 -04:00
Liao Yuanhong
db51c5d98b amdgpu/pm/legacy: remove redundant ternary operators
For ternary operators in the form of "a ? true : false", if 'a' itself
returns a boolean result, the ternary operator can be omitted. Remove
redundant ternary operators to clean up the code.

Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-05 17:38:39 -04:00
Liao Yuanhong
53c271b9a0 drm/amd/display: Remove redundant ternary operators
For ternary operators in the form of "a ? true : false" or
"a ? false : true", if 'a' itself returns a boolean result, the ternary
operator can be omitted. Remove redundant ternary operators to clean up the
code.

Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-05 17:38:39 -04:00
Jesse.Zhang
54d18bc600 drm/amdgpu/userq: add a detect and reset callback
Add a detect and reset callback and add the implementation
for mes.  The callback will detect all hung queues of a
particular ip type (e.g., GFX or compute or SDMA) and
reset them.

v2: increase reset counter and set fence force completion
v3: Removed userq_mutex in mes_userq_detect_and_reset since the driver holds it when calling

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-05 17:38:39 -04:00
Qianfeng Rong
cbda64f3f5 drm/amdkfd: Fix error code sign for EINVAL in svm_ioctl()
Use negative error code -EINVAL instead of positive EINVAL in the default
case of svm_ioctl() to conform to Linux kernel error code conventions.

Fixes: 42de677f79 ("drm/amdkfd: register svm range")
Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-05 17:38:39 -04:00
Alex Deucher
94bd7bf2c9 drm/amdgpu: don't enable SMU on cyan skillfish
Cyan skillfish uses different SMU firmware.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-05 17:38:39 -04:00
Alex Deucher
fa819e3a7c drm/amdgpu: add support for cyan skillfish gpu_info
Some SOCs which are part of the cyan skillfish family
rely on an explicit firmware for IP discovery.  Add support
for the gpu_info firmware.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-05 17:38:39 -04:00
Alex Deucher
9e6a5cf1a2 drm/amdgpu: add support for cyan skillfish without IP discovery
For platforms without an IP discovery table.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-05 17:38:38 -04:00
Alex Deucher
e8529dbc75 drm/amdgpu: add ip offset support for cyan skillfish
For chips that don't have IP discovery tables.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-05 17:38:38 -04:00
Srinivasan Shanmugam
38ab33dbea drm/amdgpu: Fix function header names in amdgpu_connectors.c
Align the function headers for `amdgpu_max_hdmi_pixel_clock` and
`amdgpu_connector_dvi_mode_valid` with the function implementations so
they match the expected kdoc style.

Fixes the below:
drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c:1199: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 * Returns the maximum supported HDMI (TMDS) pixel clock in KHz.
drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c:1212: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 * Validates the given display mode on DVI and HDMI connectors.

Fixes: 585b2f685c ("drm/amdgpu: Respect max pixel clock for HDMI and DVI-D (v2)")
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-05 17:38:38 -04:00
Yifan Zhang
fa7c99f04f amd/amdkfd: correct mem limit calculation for small APUs
Current mem limit check leaks some GTT memory (reserved_for_pt
reserved_for_ras + adev->vram_pin_size) for small APUs.

Since carveout VRAM is tunable on APUs, there are three case
regarding the carveout VRAM size relative to GTT:

1. 0 < carveout < gtt
   apu_prefer_gtt = true, is_app_apu = false

2. carveout > gtt / 2
   apu_prefer_gtt = false, is_app_apu = false

3. 0 = carveout
   apu_prefer_gtt = true, is_app_apu = true

It doesn't make sense to check below limitation in case 1
(default case, small carveout) because the values in the below
expression are mixed with carveout and gtt.

adev->kfd.vram_used[xcp_id] + vram_needed >
    vram_size - reserved_for_pt - reserved_for_ras -
    atomic64_read(&adev->vram_pin_size)

gtt: kfd.vram_used, vram_needed, vram_size
carveout: reserved_for_pt, reserved_for_ras, adev->vram_pin_size

In case 1, vram allocation will go to gtt domain, skip vram check
since ttm_mem_limit check already cover this allocation.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-05 17:38:38 -04:00
Geoffrey McRae
89923fb7ea drm/amd/display: remove oem i2c adapter on finish
Fixes a bug where unbinding of the GPU would leave the oem i2c adapter
registered resulting in a null pointer dereference when applications try
to access the invalid device.

Fixes: 3d5470c973 ("drm/amd/display/dm: add support for OEM i2c bus")
Cc: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Geoffrey McRae <geoffrey.mcrae@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-05 17:38:38 -04:00
Alex Deucher
276e8beb2a drm/amdgpu/userq: add force completion helpers
Add support for forcing completion of userq fences.
This is needed for userq resets and asic resets so that we
can set the error on the fence and force completion.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-05 17:38:38 -04:00
Alex Deucher
c5da9e9c02 drm/amdgpu: add user queue reset source
Track resets from user queues.

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-05 17:38:38 -04:00
Jesse.Zhang
724471254e drm/amdgpu/mes12: implement detect and reset callback
Implement support for the hung queue detect and reset
functionality.

v2: Always use AMDGPU_MES_SCHED_PIPE

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-05 17:38:38 -04:00
Jesse.Zhang
b28cfc8305 drm/amdgpu/mes11: implement detect and reset callback
Implement support for the hung queue detect and reset
functionality.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-05 17:38:38 -04:00
Jesse.Zhang
78e1222fbf drm/amdgpu/mes: add front end for detect and reset hung queue
Helper function to detect and reset hung queues.  MES will
return an array of doorbell indices of which queues are hung
and were optionally reset.

v2:  Clear the doorbell array before detection

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-05 17:38:31 -04:00
Jesse.Zhang
6abd725fdf drm/amd/amdgpu: Implement MES suspend/resume gang functionality for v12
This commit implements the actual MES (Micro Engine Scheduler) suspend
and resume gang operations for version 12 hardware. Previously these
functions were just stubs returning success.

v2: Always use AMDGPU_MES_SCHED_PIPE

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
2025-09-05 17:38:07 -04:00
Jesse.Zhang
2318336573 drm/amdgpu: Add preempt and restore callbacks to userq funcs
Add two new function pointers to struct amdgpu_userq_funcs:
- preempt: To handle preemption of user mode queues
- restore: To restore preempted user mode queues

These callbacks will allow the driver to properly manage queue
preemption and restoration when needed, such as during context
switching or priority changes.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-05 17:37:50 -04:00
Sunil Khatri
878f33f390 drm/amdgpu: fix the formating for debugfs print
Fix the format of debugfs print in the mqd. Need to
add a colon so parser can parse it properly.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-05 16:07:02 -04:00
Alex Deucher
1e18746381 drm/amd: add more cyan skillfish PCI ids
Add additional PCI IDs to the cyan skillfish family.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-05 16:06:58 -04:00
Sunil Khatri
7e0fc7b2b7 drm/amdgpu: add more information in debugfs to pagetable dump
Add more information in the debugfs which is needed to dump
a pagetable correctly for userqueues where vmid is not known
in the kernel.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-05 16:06:52 -04:00
Xiang Liu
f320ed01cf drm/amdgpu: Correct info field of bad page threshold exceed CPER
Correct valid_bits and ms_chk_bits of section info field for bad page
threshold exceed CPER to match OOB's behavior.

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-05 16:06:49 -04:00
Eric Huang
36cc7d1317 drm/amdkfd: fix p2p links bug in topology
When creating p2p links, KFD needs to check XGMI link
with two conditions, hive_id and is_sharing_enabled,
but it is missing to check is_sharing_enabled, so add
it to fix the error.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-05 16:06:43 -04:00
Qianfeng Rong
3f36e712a8 drm/radeon/ci_dpm: Use int type to store negative error codes
Change the 'ret' variable in ci_populate_all_graphic_levels()
and ci_populate_all_memory_levels() from u32 to int, as it needs to store
either negative error codes or zero returned by other functions.

Storing the negative error codes in unsigned type, doesn't cause an issue
at runtime but can be confusing.  Additionally, assigning negative error
codes to unsigned type may trigger a GCC warning when the -Wsign-conversion
flag is enabled.

No effect on runtime.

Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-05 16:06:37 -04:00
Liao Yuanhong
086f66edf9 drm/amdgpu/vcn: Remove redundant ternary operators
For ternary operators in the form of "a ? true : false", if 'a' itself
returns a boolean result, the ternary operator can be omitted. Remove
redundant ternary operators to clean up the code.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-05 16:06:34 -04:00
Liao Yuanhong
9502b09933 drm/amdgpu/jpeg: Remove redundant ternary operators
For ternary operators in the form of "a ? true : false", if 'a' itself
returns a boolean result, the ternary operator can be omitted. Remove
redundant ternary operators to clean up the code.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-05 16:06:29 -04:00
Liao Yuanhong
8a4bc4508c drm/amdgpu/ih: Remove redundant ternary operators
For ternary operators in the form of "a ? false : true", if 'a' itself
returns a boolean result, the ternary operator can be omitted. Remove
redundant ternary operators to clean up the code.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-05 16:00:15 -04:00
Liao Yuanhong
60df3e81d7 drm/amdgpu/gmc: Remove redundant ternary operators
For ternary operators in the form of "a ? false : true", if 'a' itself
returns a boolean result, the ternary operator can be omitted. Remove
redundant ternary operators to clean up the code.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-05 16:00:13 -04:00
Liao Yuanhong
d261e744af drm/amdgpu/gfx: Remove redundant ternary operators
For ternary operators in the form of "a ? false : true", if 'a' itself
returns a boolean result, the ternary operator can be omitted. Remove
redundant ternary operators to clean up the code.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-05 16:00:09 -04:00
Liao Yuanhong
7670daf65a drm/amdgpu/amdgpu_cper: Remove redundant ternary operators
For ternary operators in the form of "a ? false : true", if 'a' itself
returns a boolean result, the ternary operator can be omitted. Remove
redundant ternary operators to clean up the code.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-05 16:00:04 -04:00
Colin Ian King
4320fd9e0d drm/amd/amdgpu: Fix a less than zero check on a uint32_t struct field
Currently the error check from the call to mes_v12_inv_tlb_convert_hub_id
is always false because a uint32_t struct field hub_id is being used to
to perform the less than zero error check. Fix this by using the int
variable ret to perform the check.

Fixes: 87e6505261 ("drm/amd/amdgpu : Use the MES INV_TLBS API for tlb invalidation on gfx12")
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-05 15:59:11 -04:00
Gustavo A. R. Silva
1e6d36e15b drm/amdgpu/amdkfd: Avoid a couple hundred -Wflex-array-member-not-at-end warnings
-Wflex-array-member-not-at-end was introduced in GCC-14, and we are
getting ready to enable it, globally.

Move the conflicting declarations to the end of the corresponding
structures. Notice that `struct dev_pagemap` is a flexible structure,
this is a structure that contains a flexible-array member.

struct dev_pagemap always has room for at least one range. amdgpu only
uses a single range. Therefore no change are needed to the allocation
of struct amdgpu_device.

Fix 283 of the following type of warnings:
    283 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h:111:28: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-02 17:36:20 -04:00
Colin Ian King
1ee9d1a096 drm/amd/amdgpu: Fix missing error return on kzalloc failure
Currently the kzalloc failure check just sets reports the failure
and sets the variable ret to -ENOMEM, which is not checked later
for this specific error. Fix this by just returning -ENOMEM rather
than setting ret.

Fixes: 4fb9307154 ("drm/amd/amdgpu: remove redundant host to psp cmd buf allocations")
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-02 15:57:10 -04:00
Timur Kristóf
98fb1d5596 drm/amd/pm: Print VCE clocks too in si_dpm (v3)
They are part of a power state too and should be printed
alongside the rest of the data from the power state.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-02 15:57:06 -04:00
Timur Kristóf
6df0768c0d drm/amd/pm: Remove wm_low and wm_high fields from amdgpu_crtc (v2)
These fields were only used by si_dpm and are not necessary
anymore. They also may have been incorrect because:
- wm_high was set to the LOW_WATERMARK field of watermark A.
- wm_low was not set on DCE 6 and was always zero.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-02 15:57:01 -04:00
Timur Kristóf
7009e3af04 drm/amd/pm: Disable SCLK switching on Oland with high pixel clocks (v3)
Port of commit 227545b9a0 ("drm/radeon/dpm: Disable sclk
switching on Oland when two 4K 60Hz monitors are connected")

This is an ad-hoc DPM fix, necessary because we don't have
proper bandwidth calculation for DCE 6.

We define "high pixelclock" for SI as higher than necessary
for 4K 30Hz. For example, 4K 60Hz and 1080p 144Hz fall into
this category.

When two high pixel clock displays are connected to Oland,
additionally disable shader clock switching, which results in
a higher voltage, thereby addressing some visible flickering.

v2:
Add more comments.
v3:
Split into two commits for easier review.

Fixes: 841686df9f ("drm/amdgpu: add SI DPM support (v4)")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-02 15:56:15 -04:00
Timur Kristóf
ed3803533c drm/amd/pm: Disable MCLK switching with non-DC at 120 Hz+ (v2)
According to pp_pm_compute_clocks the non-DC display code
has "issues with mclk switching with refresh rates over 120 hz".
The workaround is to disable MCLK switching in this case.

Do the same for legacy DPM.

Fixes: 6ddbd37f10 ("drm/amd/pm: optimize the amdgpu_pm_compute_clocks() implementations")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-02 15:55:44 -04:00
Timur Kristóf
9003a07468 drm/amd/pm: Treat zero vblank time as too short in si_dpm (v3)
Some parts of the code base expect that MCLK switching is turned
off when the vblank time is set to zero.

According to pp_pm_compute_clocks the non-DC code has issues
with MCLK switching with refresh rates over 120 Hz.

v3:
Add code comment to explain this better.
Add an if statement instead of changing the switch_limit.

Fixes: 841686df9f ("drm/amdgpu: add SI DPM support (v4)")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-02 15:55:34 -04:00
Timur Kristóf
ce02513012 drm/amd/pm: Adjust si_upload_smc_data register programming (v3)
Based on some comments in dm_pp_display_configuration
above the crtc_index and line_time fields, these values
are programmed to the SMC to work around an SMC hang
when it switches MCLK.

According to Alex, the Windows driver programs them to:
mclk_change_block_cp_min = 200 / line_time
mclk_change_block_cp_max = 100 / line_time
Let's use the same for the sake of consistency.

Previously we used the watermark values, but it seemed buggy
as the code was mixing up low/high and A/B watermarks, and
was not saving a low watermark value on DCE 6, so
mclk_change_block_cp_max would be always zero previously.

Split this change off from the previous si_upload_smc_data
to make it easier to bisect, in case it causes any issues.

Fixes: 841686df9f ("drm/amdgpu: add SI DPM support (v4)")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-02 15:55:28 -04:00
Timur Kristóf
a43b2cec04 drm/amd/pm: Fix si_upload_smc_data (v3)
The si_upload_smc_data function uses si_write_smc_soft_register
to set some register values in the SMC, and expects the result
to be PPSMC_Result_OK which is 1.

The PPSMC_Result_OK / PPSMC_Result_Failed values are used for
checking the result of a command sent to the SMC.
However, the si_write_smc_soft_register actually doesn't send
any commands to the SMC and returns zero on success,
so this check was incorrect.

Fix that by not checking the return value, just like other
calls to si_write_smc_soft_register.

v3:
Additionally, when no display is plugged in, there is no need
to restrict MCLK switching, so program the registers to zero.

Fixes: 841686df9f ("drm/amdgpu: add SI DPM support (v4)")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-02 15:55:10 -04:00
Timur Kristóf
813d13524a drm/amd/pm: Increase SMC timeout on SI and warn (v3)
The SMC can take an excessive amount of time to process some
messages under some conditions.

Background:
Sending a message to the SMC works by writing the message into
the mmSMC_MESSAGE_0 register and its optional parameter into
the mmSMC_SCRATCH0, and then polling mmSMC_RESP_0. Previously
the timeout was AMDGPU_MAX_USEC_TIMEOUT, ie. 100 ms.

Increase the timeout to 200 ms for all messages and to 1 sec for
a few messages which I've observed to be especially slow:
PPSMC_MSG_NoForcedLevel
PPSMC_MSG_SetEnabledLevels
PPSMC_MSG_SetForcedLevels
PPSMC_MSG_DisableULV
PPSMC_MSG_SwitchToSwState

This fixes the following problems on Tahiti when switching
from a lower clock power state to a higher clock state, such
as when DC turns on a display which was previously turned off.

* si_restrict_performance_levels_before_switch would fail
  (if the user previously forced high clocks using sysfs)
* si_set_sw_state would fail (always)

It turns out that both of those failures were SMC timeouts and
that the SMC actually didn't fail or hang, just needs more time
to process those.

Add a warning when there is an SMC timeout to make it easier to
identify this type of problem in the future.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-02 15:55:04 -04:00
Timur Kristóf
3a0c3a4035 drm/amd/pm: Disable ULV even if unsupported (v3)
Always send PPSMC_MSG_DisableULV to the SMC, even if ULV mode
is unsupported, to make sure it is properly turned off.

v3:
Simplify si_disable_ulv further.
Always check the return value of amdgpu_si_send_msg_to_smc.

Fixes: 841686df9f ("drm/amdgpu: add SI DPM support (v4)")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-02 15:54:34 -04:00
Timur Kristóf
c661219cd7 drm/amdgpu: Power up UVD 3 for FW validation (v2)
Unlike later versions, UVD 3 has firmware validation.
For this to work, the UVD should be powered up correctly.

When DPM is enabled and the display clock is off,
the SMU may choose a power state which doesn't power
the UVD, which can result in failure to initialize UVD.

v2:
Add code comments to explain about the UVD power state
and how UVD clock is turned on/off.

Fixes: b38f3e80ec ("drm amdgpu: SI UVD v3_1 (v2)")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-02 15:54:03 -04:00
David Francis
85705b18ae drm/amdgpu: Allow kfd CRIU with no buffer objects
The kfd CRIU checkpoint ioctl would return an error if trying
to checkpoint a process with no kfd buffer objects.

This is a normal case and should not be an error.

Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: David Francis <David.Francis@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-02 15:53:56 -04:00
David Francis
4d82724f7f drm/amdgpu: Add mapping info option for GEM_OP ioctl
Add new GEM_OP_IOCTL option GET_MAPPING_INFO, which
returns a list of mappings associated with a given bo, along with
their positions and offsets.

Userspace for this and the previous change can be found at:
https://github.com/checkpoint-restore/criu/pull/2613

Signed-off-by: David Francis <David.Francis@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-02 15:53:33 -04:00