2268 Commits

Author SHA1 Message Date
Alex Deucher
1987c79b4f drm/amdgpu/pm: align Hawaii mclk workaround with radeon
Align the hawaii mclk workaround with radeon and windows.

Link: https://gitlab.freedesktop.org/drm/amd/-/work_items/1816
Fixes: 9f4b35411c ("drm/amd/powerplay: add CI asics support to smumgr (v3)")
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 9649528b637f668c5af9f2b83ca4ad8576ae2121)
Cc: stable@vger.kernel.org
2026-05-05 10:15:11 -04:00
Alex Deucher
2a561b361b drm/amdgpu/pm: add missing revision check for CI
The ci_populate_all_memory_levels() workaround only
applies to revision 0 SKUs.

Link: https://gitlab.freedesktop.org/drm/amd/-/work_items/1816
Fixes: 9f4b35411c ("drm/amd/powerplay: add CI asics support to smumgr (v3)")
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 1db15ba8f72f400bbad8ae0ce24fafc43429d4bd)
Cc: stable@vger.kernel.org
2026-05-05 10:14:52 -04:00
Lijo Lazar
47a5dfc8ad drm/amd/pm: Add fine grained flag to SMU v13.0.6
Gfx clock is fine grained on SMU v13.0.6/12 SOCs. Add the flag to report
clock frequencies correctly.

Fixes: 7380228401 ("drm/amd/pm: Use generic dpm table for SMUv13 SOCs")
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit d4871d837bbf70173f63426a84fa80b39e408b9e)
2026-04-28 15:51:18 -04:00
Lijo Lazar
d6b99885b1 drm/amd/pm: Update emit clock logic
If only one level is enabled in clock table, there is no need to
follow the fine grained clock logic which expects a minimum of
two levels (min/max).

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 7f19097af1496dd908a044ca95862f32d05f02df)
2026-04-28 15:46:30 -04:00
Yang Wang
ccf8932ed8 drm/amd/pm: fix missing fine-grained dpm table flag on aldebaran
Add the missing SMU_DPM_TABLE_FINE_GRAINED flag to aldebaran DPM table.
This fixes the pp_dpm_sclk node issue caused by missing flag configuration.

Fixes: 7ea1c722fe ("drm/amd/pm: Use common helper for aldebaran dpm table")
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 3427dea3a48ebddb491a26093f3627384b3cb2c2)
2026-04-24 11:09:43 -04:00
Yang Wang
4f2c86c62a drm/amd/pm: add od table upload error message parsing for smu v14.0.x
parse and print detailed reasons for od table upload failures to
help users understand error causes.

example:
$ echo "0 30 40" | sudo tee fan_curve
$ echo "1 40 30" | sudo tee fan_curve
$ echo "c" | sudo tee fan_curve

kernel log:
[   75.040174] amdgpu 0000:0a:00.0: Failed to upload overdrive table, ret:-5
[   75.040178] amdgpu 0000:0a:00.0: Invalid overdrive table content: OD_FAN_CURVE_PWM_ERROR (13)
[   75.040181] amdgpu 0000:0a:00.0: Failed to upload overdrive table!

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-17 14:53:02 -04:00
Yang Wang
79d47bc4c7 drm/amd/pm: add read arg support to smu_cmn_update_table
Extend the smu_cmn_update_table function to support reading a 32-bit return
argument from the SMU firmware during table transfer operations.

- Rename the original function to smu_cmn_update_table_read_arg
- Add a uint32_t *read_arg output parameter to capture firmware response
- Pass the read_arg pointer to the SMU message command
- Keep full backward compatibility using a macro wrapper for the old API

This allows the driver to retrieve status codes, results, or configuration
feedback from the SMU firmware after table data transfer.

No functional changes for existing users of the original smu_cmn_update_table()
API.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-17 14:52:47 -04:00
Yang Wang
25fd8095a8 drm/amd/pm: fix runtime PM imbalance issue in amdgpu_pm.c
Fix runtime PM counter imbalance to prevent device from failing to enter low power state

Fixes: a50d32c41f ("drm/amd/pm: Deprecate print_clock_levels interface")
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-17 14:51:15 -04:00
Srinivasan Shanmugam
a6d561a88c drm/amd/pm: Fix mode2 reset ACK handling on aldebaran v2
aldebaran_mode2_reset() sends a mode2 reset message and waits for
an acknowledgment from the SMU.

The current ACK handling is incorrect.

The wait loop runs only when ret is -ETIME. But after a successful
async send, ret is 0. Because of this, the loop is skipped and the
code does not wait for the reset acknowledgment.

Also, the code checks for ret != 1 after calling
smu_msg_wait_response(). However, smu_msg_wait_response() returns
0 on success and negative error codes on failure. So checking
against 1 is wrong.

Return -EOPNOTSUPP when the firmware does not support this reset
message.

Fix this by setting ret to -ETIME before entering the wait loop,
checking for ret != 0 after getting the SMU response, and returning
-EOPNOTSUPP when the firmware does not support the message.

v2:
- Update ACK check to use ret != 0 instead of ret != 1, since
  smu_msg_wait_response() returns 0 on success (Feifei)
- Remove unnecessary handling for ret == 0

Fixes: e42569d02a ("drm/amd/pm: Modify mode2 msg sequence on aldebaran")
Reported-by: Dan Carpenter <error27@gmail.com>
Cc: Feifei Xu <Feifei.Xu@amd.com>
Cc: Lijo Lazar <lijo.lazar@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-17 14:50:26 -04:00
Srinivasan Shanmugam
e81a492d12 drm/amd/pm: smu7: Remove stale error check in smu7_hwmgr_backend_init
smu7_hwmgr_backend_init() is responsible for initializing the SMU7 power
management backend. It allocates and sets up the backend structure,
initializes voltage tables, configures dependency tables, and prepares
platform-specific power and clock parameters.

The function follows a typical pattern where each initialization step
returns a status in "result", and failures are handled via a common
"goto fail" path that performs cleanup.

Commit 2c21648bb814 ("drm/amd/pm/smu7: Remove non-functional SMU7
voltage dependency on DAL") removed a function call in this
initialization sequence, but left behind the corresponding error check.

As a result, "result" is checked twice without being updated in between:

    result = smu7_init_voltage_dependency_on_display_clock_table(hwmgr);
    if (result)
        goto fail;

    ...

    if (result)
        goto fail;

The second check is redundant and unreachable for any new failure, since
no operation modifies "result" between the two checks. This triggers a
Smatch warning about a duplicate zero check and reduces code clarity.

Remove the stale error check to keep the control flow correct and
readable.

Fixes: 9f49e3d4cb ("drm/amd/pm/smu7: Remove non-functional SMU7 voltage dependency on DAL")
Reported-by: Dan Carpenter <error27@gmail.com>
Cc: Timur Kristóf <timur.kristof@gmail.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-17 14:49:54 -04:00
Yang Wang
504f0098eb drm/amd/pm: fix incorrect FeatureCtrlMask setting on smu v14.0.x
OverDriveTable.FanMinimumPwm and FeatureCtrlMask.PP_OD_FEATURE_FAN_LEGACY_BIT
have a hard dependency.
Invalid handling of this dependency leads to disabled thermal monitoring
and temperature boundary validation.

v2: squash in typo fix (Yang)

Fixes: 9710b84e2a ("drm/amd/pm: add overdrive support on smu v14.0.2/3")
Cc: stable@vger.kernel.org
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-17 14:45:04 -04:00
Yang Wang
48d1a5b33a drm/amd/pm: fix memleak issue in smu_v15_0_8_get_gpu_metrics()
remove unsued code to avoid memleak issue.
(NOTE: This bug occurs during internal branch switching)

Fixes: 0a66ca3b35 ("drm/amd/pm: add get_gpu_metrics support for 15.0.8")
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-03 13:54:28 -04:00
Yang Wang
e4465c0464 drm/amd/pm: optimize logic and remove unnecessary checks in smu v15.0.8
the following two sets of logic are clearly mutually exclusive in
smu_v15_0_8_set_soft_freq_limited_range.
remove unnecessary code logic to keep the code logic clear.

e.g:

if (smu_dpm->dpm_level != AMD_DPM_FORCED_LEVEL_MANUAL)
	return -EINVAL;

if (smu_dpm->dpm_level == AMD_DPM_FORCED_LEVEL_MANUAL) {
	...
}

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-03 13:54:21 -04:00
Yang Wang
95e21dff47 drm/amd/pm: fix null pointer dereference issue in smu_v15_0_8_get_power_limit()
Fix null pointer issues caused by coding errors

Fixes: e20e47bcb3 ("drm/amd/pm: add set{get}_power_limit support for smu 15.0.8")
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-03 13:53:09 -04:00
Yang Wang
592713a896 drm/amd/pm: correct mem_busy_percent display due to calculation errors
PMFW may return invalid values due to internal calculation errors.
so, the kmd driver must validate and sanitize the returned values to
prevent issues caused by firmware calculation errors.

For example, values 0xfffe (-2) and 0xffff (-1) are treated
as invalid and clamped to 0.

this applies to devices with CAB (Cache As Buffer) functionality.

Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/4905
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-03 13:52:44 -04:00
Lijo Lazar
3ecee2073c drm/amd/pm: Use smu vram copy in SMUv15
Use smu vram copy wrapper function for vram copy operations in
SMUv15.0.8

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-03 13:52:10 -04:00
Lijo Lazar
3bec582562 drm/amd/pm: Use smu vram copy in SMUv13
Use smu vram copy wrapper function for vram copy operations in
SMUv13.0.6 and SMUv13.0.12.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-03 13:52:04 -04:00
Lijo Lazar
f2275ea90b drm/amd/pm: Add smu vram copy function
Add a wrapper function for copying data/to from vram. This additionally
checks for any RAS fatal error. Copy cannot be trusted if any RAS fatal
error happened as VRAM becomes inaccessible.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-03 13:51:54 -04:00
Timur Kristóf
4724bc5b8d drm/amd/pm/smu7: Add SCLK cap for quirky Hawaii board
On a specific Radeon R9 390X board, the GPU can "randomly" hang
while gaming. Initially I thought this was a RADV bug and tried
to work around this in Mesa:
commit 8ea08747b86b ("radv: Mitigate GPU hang on Hawaii in Dota 2 and RotTR")

However, I got some feedback from other users who are reporting
that the above mitigation causes a significant performance
regression for them, and they didn't experience the hang on their
GPU in the first place.

After some further investigation, it turns out that the problem
is that the highest SCLK DPM level on this board isn't stable.
Lowering SCLK to 1040 MHz (from 1070 MHz) works around the issue,
and has a negligible impact on performance compared to the Mesa
patch. (Note that increasing the voltage can also work around it,
but we felt that lowering the SCLK is the safer option.)

To solve the above issue, add an "sclk_cap" field to smu7_hwmgr
and set this field for the affected board. The capped SCLK value
correctly appears on the sysfs interface and shows up in GUI
tools such as LACT.

Fixes: 9f4b35411c ("drm/amd/powerplay: add CI asics support to smumgr (v3)")
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-30 16:49:01 -04:00
Timur Kristóf
baf28ec579 drm/amd/pm/ci: Fill DW8 fields from SMC
In ci_populate_dw8() we currently just read a value from the SMU
and then throw it away. Instead of throwing away the value,
we should use it to fill other fields in DW8 (like radeon).

Otherwise the value of the other fiels is just cleared when
we copy this data to the SMU later.

Fixes: 9f4b35411c ("drm/amd/powerplay: add CI asics support to smumgr (v3)")
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-30 16:48:50 -04:00
Timur Kristóf
5facfd4c4c drm/amd/pm/ci: Clear EnabledForActivity field for memory levels
Follow what radeon did and what amdgpu does for other GPUs with SMU7.

Fixes: 9f4b35411c ("drm/amd/powerplay: add CI asics support to smumgr (v3)")
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-30 16:48:41 -04:00
Timur Kristóf
d784759c07 drm/amd/pm/ci: Fix powertune defaults for Hawaii 0x67B0
There is no AMD GPU with the ID 0x66B0, this looks like a typo.
It should be 0x67B0 which is actually part of the PCI ID list,
and should use the Hawaii XT powertune defaults according to
the old radeon driver.

Fixes: 9f4b35411c ("drm/amd/powerplay: add CI asics support to smumgr (v3)")
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-30 16:48:18 -04:00
Timur Kristóf
9f49e3d4cb drm/amd/pm/smu7: Remove non-functional SMU7 voltage dependency on DAL
It looks like this was written for an old version of DC (DAL)
and was never adapted afterwards. This was non-functional
because it relied on the "dal_power_level" field which was
never assigned anywhere in the code base.

Also, it was not implemented for CI ASICs.

Now superseded by the newer voltage dependency on display
clock table added by the previous commit, let's remove.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-30 16:48:11 -04:00
Timur Kristóf
0138610c14 drm/amd/pm/smu7: Fix SMU7 voltage dependency on display clock
The DCE (display controller engine) requires a minimum voltage
in order to function correctly, depending on which clock level
it currently uses.

Add a new table that contains display clock frequency levels
and the corresponding required voltages. The clock frequency
levels are taken from DC (and the old radeon driver's voltage
dependency table for CI in cases where its values were lower).
The voltage levels are taken from the following function:
phm_initializa_dynamic_state_adjustment_rule_settings().
Furthermore, in case of CI, call smu7_patch_vddc() on the new
table to account for leakage voltage (like in radeon).

Use the display clock value from amd_pp_display_configuration
to look up the voltage level needed by the DCE. Send the
voltage to the SMU via the PPSMC_MSG_VddC_Request command.

The previous implementation of this feature was non-functional
because it relied on a "dal_power_level" field which was never
assigned; and it was not at all implemented for CI ASICs.

I verified this on a Radeon R9 M380 which previously booted to
a black screen with DC enabled (default since Linux 6.19), but
now works correctly.

Fixes: 599a7e9fe1 ("drm/amd/powerplay: implement smu7 hwmgr to manager asics with smu ip version 7.")
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-30 16:45:22 -04:00
Timur Kristóf
9851f29cb0 drm/amd/pm/ci: Disable MCLK DPM on problematic CI ASICs
There are two known cases where MCLK DPM can causes issues:

Radeon R9 M380 found in iMac computers from 2015.
The SMU in this GPU just hangs as soon as we send it the
PPSMC_MSG_MCLKDPM_Enable command, even when MCLK switching is
disabled, and even when we only populate one MCLK DPM level.
Apply workaround to all devices with the same subsystem ID.

Radeon R7 260X due to old memory controller microcode.
We only flash the MC ucode when it isn't set up by the VBIOS,
therefore there is no way to make sure that it has the correct
ucode version.

I verified that this patch fixes the SMU hang on the R9 M380
which would previously fail to boot. This also fixes the UVD
initialization error on that GPU which happened because the
SMU couldn't ungate the UVD after it hung.

Fixes: 86457c3b21 ("drm/amd/powerplay: Add support for CI asics to hwmgr")
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-30 16:44:54 -04:00
Timur Kristóf
894f0d34d6 drm/amd/pm/ci: Use highest MCLK on CI when MCLK DPM is disabled
When MCLK DPM is disabled for any reason, populate the MCLK
table with the highest MCLK DPM level, so that the ASIC can
use the highest possible memory clock to get good performance
even when MCLK DPM is disabled.

Fixes: 9f4b35411c ("drm/amd/powerplay: add CI asics support to smumgr (v3)")
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-30 16:44:16 -04:00
Lijo Lazar
964e532d58 drm/amd/pm: Unify version check in SMUv14
Use common helper function for firmware version check and logging in
SMUv14

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-30 15:14:38 -04:00
Lijo Lazar
353f200825 drm/amd/pm: Unify version check in SMUv12
Use common helper function for firmware version check and logging in
SMUv12.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-30 15:13:06 -04:00
Lijo Lazar
6b0a611628 drm/amd/pm: Unify version check in SMUv11
Use common helper function for firmware version check and logging in
SMUv11

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-30 15:12:56 -04:00
Asad Kamal
3cfe23b6b0 drm/amd/pm: Use str_enabled_disabled in amdgpu_pm sysfs
Coccinelle flags hand-rolled "enabled"/"disabled" strings; use the shared
str_enabled_disabled() helper from string_choices.h for npm_status and
thermal throttling logging sysfs text.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202603251434.zIN2QYWn-lkp@intel.com/
Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-30 14:33:48 -04:00
Jesse.Zhang
6728daa259 drm/amd/pm: Enable VCN reset for pgm=4 with appropriate FW version
Extend the VCN reset capability to include pgm=4 variants when the
firmware version meets the required threshold (>= 0x04557100). This
follows the existing pattern for pgm=0 and pgm=7, ensuring that VCN
reset is enabled only on configurations where it is supported by the
firmware.

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-24 13:32:04 -04:00
Yang Wang
bdb2b9e1e0 drm/amd/pm: add dedicated dram addr msg for smu v15
Add dedicated SMU Dram MSG mapping to avoid conflicts
in SMU IP v15 common code for upcoming ASICs.

add new smu msg:
- SMU_MSG_SetDriverDramAddr
- SMU_MSG_SetToolsDramAddr

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-24 13:30:28 -04:00
Yang Wang
ab4905d466 drm/amd/pm: disable OD_FAN_CURVE if temp or pwm range invalid for smu v14
Forcibly disable the OD_FAN_CURVE feature when temperature or PWM range is invalid,
otherwise PMFW will reject this configuration on smu v14.0.2/14.0.3.

example:
$ sudo cat /sys/bus/pci/devices/<BDF>/gpu_od/fan_ctrl/fan_curve

OD_FAN_CURVE:
0: 0C 0%
1: 0C 0%
2: 0C 0%
3: 0C 0%
4: 0C 0%
OD_RANGE:
FAN_CURVE(hotspot temp): 0C 0C
FAN_CURVE(fan speed): 0% 0%

$ echo "0 50 40" | sudo tee fan_curve

kernel log:
[  969.761627] amdgpu 0000:03:00.0: amdgpu: Fan curve temp setting(50) must be within [0, 0]!
[ 1010.897800] amdgpu 0000:03:00.0: amdgpu: Fan curve temp setting(50) must be within [0, 0]!

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-24 13:30:20 -04:00
Yang Wang
470891606c drm/amd/pm: disable OD_FAN_CURVE if temp or pwm range invalid for smu v13
Forcibly disable the OD_FAN_CURVE feature when temperature or PWM range is invalid,
otherwise PMFW will reject this configuration on smu v13.0.x

example:
$ sudo cat /sys/bus/pci/devices/<BDF>/gpu_od/fan_ctrl/fan_curve

OD_FAN_CURVE:
0: 0C 0%
1: 0C 0%
2: 0C 0%
3: 0C 0%
4: 0C 0%
OD_RANGE:
FAN_CURVE(hotspot temp): 0C 0C
FAN_CURVE(fan speed): 0% 0%

$ echo "0 50 40" | sudo tee fan_curve

kernel log:
[  756.442527] amdgpu 0000:03:00.0: amdgpu: Fan curve temp setting(50) must be within [0, 0]!
[  777.345800] amdgpu 0000:03:00.0: amdgpu: Fan curve temp setting(50) must be within [0, 0]!

Closes: https://github.com/ROCm/amdgpu/issues/208
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-23 14:21:39 -04:00
Asad Kamal
7aaa09ab30 drm/amd/pm: Enable user specified gfx clock ranges
Enable user specified gfx clock ranges for smu_15_0_8

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-23 14:19:12 -04:00
Hawking Zhang
df4929d76a drm/amdgpu: Add smu v15_0_8 ip block
Add smu v15_0_8 ip block

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-23 14:19:09 -04:00
Asad Kamal
6a609b800c drm/amd/pm: Add NPM support for smu_v15_0_8
Add node power management support for smu_v15_0_8

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-23 14:19:07 -04:00
Asad Kamal
8847d59969 drm/amd/pm: Add baseboard temperature metrics support
Add baseboard temperature metrics support via system metrics table for
smu_v15_0_8

v4: Add separate function to fill baseboard temperature, use 16, remove
casting

v5: Optimize to use single switch case (Lijo)

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-23 14:19:03 -04:00
Asad Kamal
e3b96f5b20 drm/amd/pm: Add gpuboard temperature metrics support
Add gpuboard temperature metrics support via system metrics table for
smu_v15_0_8

v3: Use per sensor attr id (Lijo)

v4: Use s16 for temp, remove cast, use separate function to fill
gpuboard temperature metrics data (Lijo)

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-23 14:18:58 -04:00
Asad Kamal
1415503db0 drm/amd/pm: Add read sensor support
Add read sensor support for smu_v15_0_8

v2: Remove gfx voltage support (Lijo)

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-23 14:18:55 -04:00
Asad Kamal
a6c2ecd95e drm/amd/pm: Add ppt1 support
Add ppt1 support for smu_v15_0_8

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-23 14:18:53 -04:00
Asad Kamal
bc296e6c95 drm/amd/pm: Add get_thermal_temperature_range support
Add get_thermal_temperature_range support smu_v15_0_8

v2: Remove sriov check (Lijo)

v3: Restrict to 1VF mode(Lijo)

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-23 14:18:46 -04:00
Asad Kamal
422b399b09 drm/amd/pm: Add od_edit_dpm_table support
Add od_edit_dpm_table support for smu_v15_0_8

v2: Skip Gl2clk/Fclk (Lijo)

v3: sqaush in set_performance_support (Asad)

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-23 14:18:43 -04:00
Asad Kamal
c7de5a863c drm/amd/pm: add populate_umd_state_clk support
add populate_umd_state_clk support for smu 15.0.8

v2: remove gl2clk/socclk/fclk, restrict to only current min/max (Lijo)

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-23 14:18:36 -04:00
Asad Kamal
bf39c461ad drm/amd/pm: Add emit clock support
Add emit clock support and fetching other metrics data like temperature,
clock for smu_v15_0_8

v2: Use umc count for hbm stack temperature (Lijo)

v3: Use correct logic for hbm stacks (Lijo)

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-23 14:18:11 -04:00
Yang Wang
e20e47bcb3 drm/amd/pm: add set{get}_power_limit support for smu 15.0.8
export .set_power_limit & .get_power_limit interface for smu 15.0.8

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-23 14:18:07 -04:00
Yang Wang
39e0a73bde drm/amd/pm: add get_unique_id support for smu 15.0.8
export .get_unique_id interface for smu 15.0.8

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-23 14:18:03 -04:00
Yang Wang
0a66ca3b35 drm/amd/pm: add get_gpu_metrics support for 15.0.8
export .get_gpu_metrics interface for 15.0.8

v2: Remove members already exposed by other interfaces, use mask,
logical conversion (Lijo)

v3: Use correct logic for hbm stacks loop (Lijo)
Remove buffer allocation

v4: Make out of bound check outside loop (Lijo)

v5: fix locking in error case (Alex)

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-23 14:17:59 -04:00
Asad Kamal
6bcea37ff2 drm/amd/pm: Add get_pm_metrics support for smu 15.0.8
export .get_pm_metrics interface for smu 15.0.8.

v2: Make tmo as unsigned (Lijo)

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-23 14:17:56 -04:00
Asad Kamal
34b19dab0c drm/amd/pm: Add default dpm table support for smu 15.0.8
Add default dpm table support for smu 15.0.8

v2: Remove lclk, move pptable check up, add missing clk (Lijo)

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-23 14:17:53 -04:00