Print the name of the client rather than the number. This
makes it easier to debug what block is causing the fault.
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Print the name of the client rather than the number. This
makes it easier to debug what block is causing the fault.
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
It needs to load renoir_ta firmware because hdcp is enabled by default
for renoir now. This can avoid error:DTM TA is not initialized
Signed-off-by: Changfeng <Changfeng.Zhu@amd.com>
Reviewed-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Calculate the correct value for max_entries or we might run after the
page_address array.
v2: Xinhui pointed out we don't need the shift
v3: use local copy of start and simplify some calculation
v4: fix the case that we map less VA range than BO size
Signed-off-by: Christian König <christian.koenig@amd.com>
Fixes: 1e691e2444 drm/amdgpu: stop allocating dummy GTT nodes
Reviewed-by: xinhui pan <xinhui.pan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Enable multi-ring ih1 and ih2 for Arcturus only.
For Navi10 family multi-ring has been disabled.
Apparently, having multi-ring enabled in Navi was causing
continus page fault interrupts.
Further investigation is needed to get to the root cause.
Related issue link:
https://gitlab.freedesktop.org/drm/amd/-/issues/1279
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When GPU is in reset, its status isn't stable and ring buffer also need
be reset when resuming. Therefore driver should protect GPU recovery
thread from ring buffer accessed by other threads. Otherwise GPU will
randomly hang during recovery.
v2: correct indent
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Will be used to fetch the fan speeds when manual fan mode is
set.
v2: squash in a Coverity fix from Colin Ian King
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
No longer needed as we can calculate it based on
the fan's max rpm.
v2: minor code rework
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
No longer needed as we can calculate it based on
the fan's max rpm.
v2: rework code to avoid possible uninitialized
variable use.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
On hardware with multiple uvd instances, dependent uvd jobs
may get scheduled to different uvd instances. Because uvd_enc
jobs retain hw context, dependent jobs should always run on the
same uvd instance. This patch disables GPU scheduler's load balancer
for a context that binds jobs from the same context to a uvd
instance.
v2: Squash in uvd_enc fix
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add support for reporting GPU reset events through SMI. KFD
would report both pre and post GPU reset events.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fixes below compiler warnings:
CC [M] drivers/gpu/drm/amd/amdgpu/amdgpu_device.o
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:381:1: warning: ‘static’ is not at beginning of declaration [-Wold-style-declaration]
381 | void static inline amdgpu_mm_wreg_mmio(struct amdgpu_device *adev, uint32_t reg, uint32_t v, uint32_t acc_flags)
| ^~~~
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:381:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration]
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c: In function ‘amdgpu_device_fini’:
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:3381:6: warning: variable ‘r’ set but not used [-Wunused-but-set-variable]
3381 | int r;
| ^
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
On my R9 390, the voltage was reported as a constant 1000 mV.
This was due to a bug in smu7_hwmgr.c, in the smu7_read_sensor()
function, where some magic constants were used in a condition,
to determine whether the voltage should be read from PLANE2_VID
or PLANE1_VID. The VDDC mask was incorrectly used, instead of
the VDDGFX mask.
This patch changes the code to use the correct defined constants
(and apply the correct bitshift), thus resulting in correct voltage reporting.
Signed-off-by: Sandeep Raghuraman <sandy.8925@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Starting in Linux 5.8, the graphics and memory clock frequency were not being
reported for CIK cards. This is a regression, since they were reported correctly
in Linux 5.7.
After investigation, I discovered that the smum_send_msg_to_smc() function,
attempts to call the corresponding get_argument() function of ci_smu_funcs.
However, the get_argument() function is not defined in ci_smu_funcs.
This patch fixes the bug by specifying the correct get_argument() function.
Fixes: a0ec225633 ("drm/amd/powerplay: unified interfaces for message issuing and response checking")
Signed-off-by: Sandeep Raghuraman <sandy.8925@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Normally softwareshutdowntemp should be greater than Thotspotlimit.
However, on some VEGA10 ASIC, the softwareshutdowntemp is 91C while
Thotspotlimit is 105C. This seems not right and may trigger some
false alarms.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
v1:
the C type "unsigned long" size is 32bit on 32bit system,
it will cause code logic error, so replace it with "uint64_t".
v2:
remove duplicate cast operation.
Signed-off-by: Kevin <kevin1.wang@amd.com>
Suggest-by: Jiansong Chen <Jiansong.Chen@amd.com>
Reviewed-by: Jiansong Chen <Jiansong.Chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When resetting EDC related register, all CUs needs to be visited,
otherwise, garbage data from EDC register of missed SEs would present.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When amdgpu_display_modeset_create_props() fails, state and
state->context should be freed to prevent memleak. It's the
same when amdgpu_dm_audio_init() fails.
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
In dm_dp_aux_transfer() now, we forget to handle AUX_WR fail cases. We
suppose every write wil get done successfully and hence some AUX
commands might not sent out indeed.
[How]
Check if AUX_WR success. If not, retry it.
Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
Reviewed-by: Hersen Wu <hersenxs.wu@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The values for "se_num" and "sh_num" come from the user in the ioctl.
They can be in the 0-255 range but if they're more than
AMDGPU_GFX_MAX_SE (4) or AMDGPU_GFX_MAX_SH_PER_SE (2) then it results in
an out of bounds read.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Virtual display is non-atomic so report false to avoid checking
atomic state and other atomic things at runtime.
v2: squash into the sr-iov check
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Acked-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
DC uses these to raise the voltage as needed for higher dispclk/dppclk
and to ensure that we have enough bandwidth to drive the displays.
There's a bug preventing these from actuially sending messages since
it's checking the actual clock (which is 0) instead of the incoming
clock (which shouldn't be 0) when deciding to send the hardmin.
[How]
Check the clocks != 0 instead of the actual clocks.
Fixes: 9ed9203c3e ("drm/amd/powerplay: rv dal-pplib interface refactor powerplay part")
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The ctx->features are new RAS implementation which
is only available for Vega20 and onwards, it is not
available for vega10, vega10 should follow legacy
ECC implementation.
Changed from V1:
wrap function to initialize kfd node properties
Changed from V2:
remove wrap function and SDMA SRAM ECC check
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Do the maths in celsius degree. This can fix the issues caused
by the changes below:
drm/amd/pm: correct Vega20 swctf limit setting
drm/amd/pm: correct Vega12 swctf limit setting
drm/amd/pm: correct Vega10 swctf limit setting
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This allows us to add asic specific workarounds for atom
asic init while keeping the adev specifics out of the
atombios parser code.
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We need to restore some registers prior to running asic
init to work around a firmware bug.
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This callback can be used by asics that need to
do something special prior to calling atom asic init.
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Properly define this register using a relative offset rather
than an absolute offset and use the proper SOC15 macros to
access it. It's also DCN, not DCE, so remove it from the
DCE12 header.
No functional change.
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>