Update the client id mapping so the correct clients
get printed when there is a mmhub page fault.
v2: fix typos spotted by David Wu.
v3: fix additional typo spotted by David.
Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Update the client id mapping so the correct clients
get printed when there is a mmhub page fault.
Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The field job->vm is used in function amdgpu_job_run to get the page
table re-generation counter and decide whether the job should be skipped.
Specifically, function amdgpu_vm_generation checks if the VM is valid for this job to use.
For instance, if a gfx job depends on a cancelled sdma job from entity vm->delayed,
then the gfx job should be skipped.
Fixes: 26c95e838e ("drm/amdgpu: set the VM pointer to NULL in amdgpu_job_prepare")
Signed-off-by: YuanShang <YuanShang.Mao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use kmalloc_array() instead of kmalloc() with multiplication.
kmalloc_array() is a safer way because of its multiply overflow check.
Signed-off-by: Yunshui Jiang <jiangyunshui@kylinos.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Apparently, both DCE 6.0 and 6.4 have 3 PLLs, but PLL0 can only
be used for DP. Make sure to initialize the correct amount of PLLs
in DC for these DCE versions and use PLL0 only for DP.
Also, on DCE 6.0 and 6.4, the PLL0 needs to be powered on at
initialization as opposed to DCE 6.1 and 7.x which use a different
clock source for DFS.
The following functions were used as reference from the old
radeon driver implementation of DCE 6.x:
- radeon_atom_pick_pll
- atombios_crtc_set_disp_eng_pll
Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
dc_clk_mgr_create accidentally overwrites the dce60_clk_mgr
with the dce_clk_mgr, causing incorrect behaviour on DCE6.
Fix it by removing the extra dce_clk_mgr_construct.
Fixes: 62eab49faa ("drm/amd/display: hide VGH asic specific structs")
Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Move amdgpu_device_health_check into amdgpu_device_gpu_recover to
ensure that if the device is present can be checked before reset
The reason is:
1.During the dpc event, the device where the dpc event occurs is not
present on the bus
2.When both dpc event and ATHUB event occur simultaneously,the dpc thread
holds the reset domain lock when detecting error,and the gpu recover thread
acquires the hive lock.The device is simultaneously in the states of
amdgpu_ras_in_recovery and occurs_dpc,so gpu recover thread will not go to
amdgpu_device_health_check.It waits for the reset domain lock held by the
dpc thread, but dpc thread has not released the reset domain lock.In the dpc
callback slot_reset,to obtain the hive lock, the hive lock is held by the
gpu recover thread at this time.So a deadlock occurred
Signed-off-by: Ce Sun <cesun102@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The initial comment block before
amdgpu_fence_driver_guilty_force_completion() incorrectly used '/**' but
is not a kernel-doc comment, causing build warnings.
Fixes the below with gcc W=1:
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c:742: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* Kernel queue reset handling
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
GPUs with multi-xcc have multiple MQDs per queue. This patch saves and
restores all the MQDs within the partition.
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The variable `pm_suspend_target_state` is conditionally defined only when
`CONFIG_SUSPEND` is enabled (see `include/linux/suspend.h`). Directly
referencing it without guarding by `#ifdef CONFIG_SUSPEND` causes build
failures when suspend functionality is disabled (e.g., `CONFIG_SUSPEND=n`).
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Perry Yuan <perry.yuan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Previously we tried to keep the HW specific PTE flags in each mapping,
but for CRIU that isn't sufficient any more since the original value is
needed for the checkpoint procedure.
So rework the whole handling, nuke the early mapping function, keep the
UAPI flags in each mapping instead of the HW flags and translate them to
the HW flags while filling in the PTEs.
Only tested on Navi 23 for now, so probably needs quite a bit of more
work.
v2: fix KFD and SVN handling
v3: one more SVN fix pointed out by Felix
v4: squash in gfx12 fix from David
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Summary:
* Fix caching streams for LT automation
* Fix DMUB command alignment
* Disabling DSC power gating on DCN314
* Add debugfs for Replay
* Add debug option for BW allocation mode
* Removal of unnecessary includes for faster compilation
* Refactor of code, including adding SPDX license to amdgpu_dm
Acked-by: Sun peng (Leo) Li <sunpeng.li@amd.com>
Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[WHY]
Last LT automation update can cause crash by referencing current_state and
calling into dc_update_planes_and_stream which may clobber current_state.
[HOW]
Cache relevant stream pointers and iterate through them instead of relying
on the current_state.
Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Michael Strauss <michael.strauss@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[why & how]
Header misalignment in struct dmub_cmd_replay_copy_settings_data and
struct dmub_alpm_auxless_data causes incorrect data read between driver
and dmub.
Fix the misalignment and ensure that everything is aligned to 4-byte
boundaries.
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Ovidiu Bunea <ovidiu.bunea@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Users can access the replay residency to get PHY off percentage
[How]
Start capture residency:
sudo echo 1 /sys/kernel/debug/dri/0/eDP-1/replay_residency
Stop and Get replay residency:
sudo cat /sys/kernel/debug/dri/0/eDP-1/replay_residency
Reviewed-by: ChiaHsuan (Tom) Chung <chiahsuan.chung@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[WHY]
In the worst case, AUX intra-hop done can take hundreds of milliseconds as
each retimer in a link might have to wait a full AUX_RD_INTERVAL to send
LT abort downstream.
[HOW]
Wait 300ms for each retimer in a link to allow time to propagate a LT abort
without infinitely waiting on intra-hop done.
For no-retimer case, keep the max duration at 10ms.
Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Michael Strauss <michael.strauss@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why&How]
Allow for sharing of some link and audio link functions by removing static
keyword from function definitions.
Expose those functions in the HWSEQ header.
Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Relja Vojvodic <rvojvodi@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This reverts commit 82139760dc.
[why & how]
DMUB header changes should be submitted to firmware branch first and
allowed to propagate to driver. Currently, this change breaks linux
builds so need to revert it until it's ready.
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Ovidiu Bunea <ovidiu.bunea@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[WHY]
If symclk RCO is enabled, stream encoder may not be receiving an ungated
clock by the time we attempt to set stream attributes when setting dpms
on. Since the clock is gated, register writes to the stream encoder fail.
[HOW]
Move set_stream_attribute call into enable_stream, just after the point
where symclk32_se is ungated.
Logically there is no need to set stream attributes as early as is
currently done in link_set_dpms_on, so this should have no impact beyond
the RCO fix.
Reviewed-by: Ovidiu (Ovi) Bunea <ovidiu.bunea@amd.com>
Signed-off-by: Michael Strauss <michael.strauss@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>