[Why]
This change replaces older looping code in favor of these functions.
[How]
There are built in functions for extracting global sync params
during mode validation now.
Signed-off-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com>
Reviewed-by: Eric Bernstein <Eric.Bernstein@amd.com>
Acked-by: Eryk Brol <eryk.brol@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Update sienna_cichlid driver if header and related files.
Support new smu metrics for pre & postDS frequency.
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The current outputs of amdgpu_pm_info debugfs come with clock gating
status and followed by current clock/power information. However the
clock gating status retrieving may pull GFX out of CG status. That
will make the succeeding clock/power information retrieving inaccurate.
To overcome this and be with minimum impact, the outputs are updated
to show current clock/power information first.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This reverts commit 058c07201ec7d373fc6a0a570b38a8a9d62c29fb.
Chip NAVY_FLOUNDER uses vcn3.0, but it has only one VCN instance.
Signed-off-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
s_barrier triggers a debug exception when issued with PRIV=1,
DEBUG_EN=1. This causes spurious notifications to rocm-gdb.
Clear MODE before issuing s_barrier and restore MODE afterwards
in the context restore handler.
Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Tested-by: Laurent Morichetti <laurent.morichetti@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Take back patch:drop unnecessary message support check
Because the gpu reset fail problem on renoir can be fixed by:
drm/amd/powerplay: skip invalid msg when smu set mp1 state
It needs to remove SWSMU_CODE_LAYER_L1 in smu_cmn.h to guard a clear
code layer.
Signed-off-by: changfeng <Changfeng.Zhu@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add support for reporting thermal throttling events through SMI.
Also, add a counter to count the number of throttling interrupts
observed and report the count in the SMI event message.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
when GPU hang, driver has multi-paths to enter amdgpu_device_gpu_recover,
the atomic adev->in_gpu_reset and hive->in_reset are used to avoid
re-entering GPU recovery.
During GPU reset and resume, it is unsafe that other threads access GPU,
which maybe cause GPU reset failed. Therefore the new rw_semaphore
adev->reset_sem is introduced, which protect GPU from being accessed by
external threads during recovery.
v2:
1. add rwlock for some ioctls, debugfs and file-close function.
2. change to use dqm->is_resetting and dqm_lock for protection in kfd
driver.
3. remove try_lock and change adev->in_gpu_reset as atomic, to avoid
re-enter GPU recovery for the same GPU hang.
v3:
1. change back to use adev->reset_sem to protect kfd callback
functions, because dqm_lock couldn't protect all codes, for example:
free_mqd must be called outside of dqm_lock;
[ 1230.176199] Hardware name: Supermicro SYS-7049GP-TRT/X11DPG-QT, BIOS 3.1 05/23/2019
[ 1230.177221] Call Trace:
[ 1230.178249] dump_stack+0x98/0xd5
[ 1230.179443] amdgpu_virt_kiq_reg_write_reg_wait+0x181/0x190 [amdgpu]
[ 1230.180673] gmc_v9_0_flush_gpu_tlb+0xcc/0x310 [amdgpu]
[ 1230.181882] amdgpu_gart_unbind+0xa9/0xe0 [amdgpu]
[ 1230.183098] amdgpu_ttm_backend_unbind+0x46/0x180 [amdgpu]
[ 1230.184239] ? ttm_bo_put+0x171/0x5f0 [ttm]
[ 1230.185394] ttm_tt_unbind+0x21/0x40 [ttm]
[ 1230.186558] ttm_tt_destroy.part.12+0x12/0x60 [ttm]
[ 1230.187707] ttm_tt_destroy+0x13/0x20 [ttm]
[ 1230.188832] ttm_bo_cleanup_memtype_use+0x36/0x80 [ttm]
[ 1230.189979] ttm_bo_put+0x1be/0x5f0 [ttm]
[ 1230.191230] amdgpu_bo_unref+0x1e/0x30 [amdgpu]
[ 1230.192522] amdgpu_amdkfd_free_gtt_mem+0xaf/0x140 [amdgpu]
[ 1230.193833] free_mqd+0x25/0x40 [amdgpu]
[ 1230.195143] destroy_queue_cpsch+0x1a7/0x270 [amdgpu]
[ 1230.196475] pqm_destroy_queue+0x105/0x260 [amdgpu]
[ 1230.197819] kfd_ioctl_destroy_queue+0x37/0x70 [amdgpu]
[ 1230.199154] kfd_ioctl+0x277/0x500 [amdgpu]
[ 1230.200458] ? kfd_ioctl_get_clock_counters+0x60/0x60 [amdgpu]
[ 1230.201656] ? tomoyo_file_ioctl+0x19/0x20
[ 1230.202831] ksys_ioctl+0x98/0xb0
[ 1230.204004] __x64_sys_ioctl+0x1a/0x20
[ 1230.205174] do_syscall_64+0x5f/0x250
[ 1230.206339] entry_SYSCALL_64_after_hwframe+0x49/0xbe
2. remove try_lock and introduce atomic hive->in_reset, to avoid
re-enter GPU recovery.
v4:
1. remove an unnecessary whitespace change in kfd_chardev.c
2. remove comment codes in amdgpu_device.c
3. add more detailed comment in commit message
4. define a wrap function amdgpu_in_reset
v5:
1. Fix some style issues.
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Suggested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
Suggested-by: Lijo Lazar <Lijo.Lazar@amd.com>
Suggested-by: Luben Tukov <luben.tuikov@amd.com>
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The variable result is being initialized with a value that is never read
and it is being updated later with a new value. The initialization is
redundant and can be removed.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fix below warnings reported by coccicheck:
./drivers/gpu/drm/amd/display/dc/clk_mgr/dcn30/dcn30_clk_mgr.c:557:2-7: WARNING: NULL check before some freeing functions is not needed.
Fixes: 4d55b0dd1c ("drm/amd/display: Add DCN3 CLK_MGR")
Signed-off-by: Li Heng <liheng40@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Some asic may not support for some message of set mp1 state.
If the return value of smu_send_smc_msg is -EINVAL, that means it failed
to send msg to smc as it can not map an valid message for the ASIC. And
with that case, smu_set_mp1_state should be skipped as those ASIC was in
fact do not support for that.
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
It's not necessary to retrieve the power features status when the
asic is booted up the first time. This patch can have the features
enablement status still checked in suspend/resume case and removed
from the first boot up sequence.
Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fix this warning:
CC [M] drivers/gpu/drm/amd/amdgpu/gfxhub_v1_1.o
In file included from drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h:29,
from drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h:26,
from drivers/gpu/drm/amd/amdgpu/amdgpu.h:43,
from drivers/gpu/drm/amd/amdgpu/df_v3_6.c:23:
drivers/gpu/drm/amd/amdgpu/df_v3_6.c: In function ‘df_v3_6_pmc_get_count’:
./include/drm/drm_print.h:487:2: warning: ‘hi_base_addr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
487 | __drm_dbg(DRM_UT_DRIVER, fmt, ##__VA_ARGS__)
| ^~~~~~~~~
drivers/gpu/drm/amd/amdgpu/df_v3_6.c:649:25: note: ‘hi_base_addr’ was declared here
649 | uint32_t lo_base_addr, hi_base_addr, lo_val = 0, hi_val = 0;
| ^~~~~~~~~~~~
In file included from drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h:29,
from drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h:26,
from drivers/gpu/drm/amd/amdgpu/amdgpu.h:43,
from drivers/gpu/drm/amd/amdgpu/df_v3_6.c:23:
./include/drm/drm_print.h:487:2: warning: ‘lo_base_addr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
487 | __drm_dbg(DRM_UT_DRIVER, fmt, ##__VA_ARGS__)
| ^~~~~~~~~
drivers/gpu/drm/amd/amdgpu/df_v3_6.c:649:11: note: ‘lo_base_addr’ was declared here
649 | uint32_t lo_base_addr, hi_base_addr, lo_val = 0, hi_val = 0;
Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch is to move get_invalidate_req into gfxhub/mmhub level. It will avoid
mismatch of the different gfxhub/mmhub register offsets and fields in the same
gmc block.
Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch is to introduce vmhub funcs helper to add following callback
(print_l2_protection_fault_status). Each GC/MMHUB register specific programming
should be in gfxhub/mmhub level.
v2: remove the condition of funcs assignment.
Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch adds a member in vmhub structure to store the vm fault interrupt
masks for different version gfxhubs/mmhubs.
Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The below 3 messages are not supported on Renoir
SMU_MSG_PrepareMp1ForShutdown
SMU_MSG_PrepareMp1ForUnload
SMU_MSG_PrepareMp1ForReset
It needs to revert patch:
drm/amd/powerplay: drop unnecessary message support check
to avoid set mp1 state fail during gpu reset on renoir.
Signed-off-by: changfeng <Changfeng.Zhu@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The driver uses it for EEPROM access, but it's just an i2c bus.
v2: change the callback name as well.
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
It's not really ras related. It's just a lock for the
bus in general. This removes the ras dependency from
the smu i2c bus.
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
NULL dereference occurs when string that is not ended with space or
newline is written to some dpm sysfs interface (for example pp_dpm_sclk).
This happens because strsep replaces the tmp with NULL if the delimiter
is not present in string, which is then dereferenced by tmp[0].
Reproduction example:
sudo sh -c 'echo -n 1 > /sys/class/drm/card0/device/pp_dpm_sclk'
Signed-off-by: Paweł Gronowski <me@woland.xyz>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>