Commit Graph

1237289 Commits

Author SHA1 Message Date
Nicholas Kazlauskas
e6f82bd44b drm/amd/display: Rework DC Z10 restore
[Why]
The call currently does two things:
1. Exits DMCUB from idle optimization if it was in
2. Checks DMCUB scratch register to determine if we need to call
   DMCUB to do deferred HW restore and then sends the command if it's
   ready for it.

By doing (1) we prevent driver idle from being renotified in the cases
where driver had previously allowed DC level idle optimizations via
dc_allow_idle_optimizations since it thinks:

allow == dc->idle_optimizations_allowed

...and that the operation is a no-op.

We want driver idle to be resent at the next opprotunity to do so
for video playback cases.

[How]
Migrate all usecases of dc_z10_restore to only perform (2).

Add extra calls to dc_allow_idle_optimizations to handle (1) and also
keep SW state matching with when we requested enter/exit of DMCUB
idle optimizations.

Ensure cursor idle optimizations false always get called when IPS
is supported.

Further rework/redesign is needed to decide whether we need a separate
level of DM allow vs DC allow and when to attempt re-entry.

Reviewed-by: Yihan Zhu <yihan.zhu@amd.com>
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Acked-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:39 -05:00
Tom Chung
5950efe25e drm/amd/display: Enable Panel Replay for static screen use case
[Why]
Enable the Panel Replay if eDP panel and ASIC support.
(prioritize Panel Replay over PSR)

[How]
- Setup the Panel Replay config during the device init
  (prioritize Panel Replay over PSR).
- Separate the Replay init function into two functions
  amdgpu_dm_link_setup_replay() and amdgpu_dm_set_replay_caps()
  to fix the issue in the earlier commit that cause PSR and Replay
  enabled at the same time.

Reviewed-by: Sun peng Li <sunpeng.li@amd.com>
Acked-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:39 -05:00
George Shen
12f72a1599 drm/amd/display: Add DP audio BW validation
[Why]
Timings with small HBlank (such as CVT RBv2) can result in insufficient
HBlank bandwidth for audio SDP transmission when DSC is active. This
will cause some higher bandwidth audio modes to fail.

The combination of CVT RBv2 timings + DSC can commonly be encountered
in MST scenarios.

[How]
Add DP audio bandwidth validation for 8b/10b MST and 128b/132b SST/MST
cases and filter out modes that cannot be supported with the current
timing config.

Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Acked-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: George Shen <george.shen@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:39 -05:00
Dmytro Laktyushkin
d451b534e0 drm/amd/display: Fix dml2 assigned pipe search
[Why & How]
DML2 currently finds assigned pipes in array order rather than the
existing linked list order. This results in rearranging pipe order
on flip and more importantly otg inst and pipe idx mismatch.

This change preserves the order of existing pipes and guarantees
the head pipe will have matching otg inst and pipe idx.

Reviewed-by: Gabe Teeger <gabe.teeger@amd.com>
Acked-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Dmytro Laktyushkin <dmytro.laktyushkin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:39 -05:00
Alvin Lee
f0ec30549a drm/amd/display: Ensure populate uclk in bb construction
[Description]
- For some SKUs, the optimal DCFCLK for each UCLK is less than the
  smallest DCFCLK STA target due to low memory bandwidth. There is
  an assumption that the DCFCLK STA targets will always be less
  than one of the optimal DCFCLK values, but this is not true for
  SKUs that have low memory bandwidth. In this case we need to
  populate the optimal UCLK for each DCFCLK STA targets as the max
  UCLK freq.
- Also fix a bug in DML where start_state is not assigned and used
  correctly.

Reviewed-by: Samson Tam <samson.tam@amd.com>
Reviewed-by: Chaitanya Dhere <chaitanya.dhere@amd.com>
Acked-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alvin Lee <alvin.lee2@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:39 -05:00
Charlene Liu
038c532346 drm/amd/display: Update P010 scaling cap
[Why]
Keep the same as previous APU and also insert clock dump

Reviewed-by: Ovidiu Bunea <ovidiu.bunea@amd.com>
Acked-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Charlene Liu <charlene.liu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:38 -05:00
Ovidiu Bunea
2254ab45da drm/amd/display: Fix DML2 watermark calculation
[Why]
core_mode_programming in DML2 should output watermark calculations
to locals, but it incorrectly uses mode_lib

[How]
update code to match HW DML2

Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Acked-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Ovidiu Bunea <ovidiu.bunea@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:38 -05:00
Ilya Bakoulin
b4e05bb1de drm/amd/display: Clear OPTC mem select on disable
[Why]
Not clearing the memory select bits prior to OPTC disable can cause DSC
corruption issues when attempting to reuse a memory instance for another
OPTC that enables ODM.

[How]
Clear the memory select bits prior to disabling an OPTC.

Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Acked-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Ilya Bakoulin <ilya.bakoulin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:38 -05:00
Wenjing Liu
3fc394111e drm/amd/display: Floor to mhz when requesting dpp disp clock changes to SMU
[Why]
SMU uses discrete dpp and disp clock levels. When we submit SMU request
for clock changes in Mhz we need to floor the requested value from Khz so
SMU will choose the next higher clock level in Khz to set. If we ceil to
Mhz, SMU will have to choose the next higher clock level after the ceil,
which could result in unnecessarily jumpping to the next level.

For example, we request 1911,111Khz which is exactly one of the SMU preset
level. If we pass 1912Mhz, SMU will choose 2150,000 khz. If we pass
1911Mhz, SMU will choose 1911,111kHz, which is the expected value.

Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Acked-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Wenjing Liu <wenjing.liu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:38 -05:00
Nicholas Kazlauskas
6c605f4408 drm/amd/display: Port DENTIST hang and TDR fixes to OTG disable W/A
[Why]
We can experience DENTIST hangs during optimize_bandwidth or TDRs if
FIFO is toggled and hangs.

[How]
Port the DCN35 fixes to DCN314.

Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Acked-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:38 -05:00
Charlene Liu
012fe0674a drm/amd/display: Add logging resource checks
[Why]
When mapping resources, resources could be unavailable.

Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Reviewed-by: Sung joon Kim <sungjoon.kim@amd.com>
Acked-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Charlene Liu <charlene.liu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:38 -05:00
chenxuebing
73888bad4d drm/amdgpu: Clean up errors in amdgpu_rlc.c
Fix the following errors reported by checkpatch:

ERROR: space prohibited before that '++' (ctx:WxB)

Signed-off-by: chenxuebing <chenxb_99091@126.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:38 -05:00
chenxuebing
1ed8ccf268 drm/amd: Clean up errors in sdma_v2_4.c
Fix the following errors reported by checkpatch:

ERROR: that open brace { should be on the previous line
ERROR: trailing statements should be on next line

Signed-off-by: chenxuebing <chenxb_99091@126.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:38 -05:00
chenxuebing
32a0a398fc drm/amd/amdgpu: Clean up errors in amdgpu_umr.h
Fix the following errors reported by checkpatch:

spaces required around that '=' (ctx:VxV)

Signed-off-by: chenxuebing <chenxb_99091@126.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:38 -05:00
chenxuebing
2bb012138d drm/amdgpu: Clean up errors in amdgpu_atomfirmware.h
Fix the following errors reported by checkpatch:

ERROR: "foo* bar" should be "foo *bar"

Signed-off-by: chenxuebing <chenxb_99091@126.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:38 -05:00
chenxuebing
05ec623147 drm/amdgpu: Clean up errors in clearstate_gfx9.h
Fix the following errors reported by checkpatch:

ERROR: that open brace { should be on the previous line

Signed-off-by: chenxuebing <chenxb_99091@126.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:38 -05:00
chenxuebing
2763da27f9 drm/amdgpu: Clean up errors in navi10_ih.c
Fix the following errors reported by checkpatch:

ERROR: that open brace { should be on the previous line

Signed-off-by: chenxuebing <chenxb_99091@126.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:38 -05:00
Alexander Richards
f7a16fa376 drm/radeon: check PS, WS index
Theoretically, it would be possible for a buggy or malicious VBIOS to
overwrite past the bounds of the passed parameters (or its own
workspace); add bounds checking to prevent this from happening.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3093
Signed-off-by: Alexander Richards <electrodeyt@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:37 -05:00
Alexander Richards
4630d5031c drm/amdgpu: check PS, WS index
Theoretically, it would be possible for a buggy or malicious VBIOS to
overwrite past the bounds of the passed parameters (or its own
workspace); add bounds checking to prevent this from happening.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3093
Signed-off-by: Alexander Richards <electrodeyt@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:37 -05:00
Alvin Lee
a25dea474a drm/amd/display: Add Replay IPS register for DMUB command table
- Introduce a new Replay mode for DMUB version 0.0.199.0

Reviewed-by: Martin Leung <martin.leung@amd.com>
Acked-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alvin Lee <alvin.lee2@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:37 -05:00
Dillon Varone
ca25a2b5f8 drm/amd/display: Init link enc resources in dc_state only if res_pool presents
[Why & How]
res_pool is not initialized in all situations such as virtual
environments, and therefore link encoder resources should not be
initialized if res_pool is NULL.

Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Reviewed-by: Martin Leung <martin.leung@amd.com>
Acked-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Dillon Varone <dillon.varone@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:37 -05:00
Nicholas Kazlauskas
ac9c748362 drm/amd/display: Allow IPS2 during Replay
[Why & How]
Add regkey to block video playback in IPS2 by default

Allow idle optimizations in the same spot we allow Replay for
video playback usecases.

Avoid sending it when there's an external display connected by
modifying the allow idle checks to check for active non-eDP screens.

Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Acked-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:37 -05:00
Srinivasan Shanmugam
166225e79c drm/amd/display: Fix late derefrence 'dsc' check in 'link_set_dsc_pps_packet()'
In link_set_dsc_pps_packet(), 'struct display_stream_compressor *dsc'
was dereferenced in a DC_LOGGER_INIT(dsc->ctx->logger); before the 'dsc'
NULL pointer check.

Fixes the below:
drivers/gpu/drm/amd/amdgpu/../display/dc/link/link_dpms.c:905 link_set_dsc_pps_packet() warn: variable dereferenced before check 'dsc' (see line 903)

Cc: stable@vger.kernel.org
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Cc: Hamza Mahfooz <hamza.mahfooz@amd.com>
Cc: Wenjing Liu <wenjing.liu@amd.com>
Cc: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:37 -05:00
Srinivasan Shanmugam
81d4b97068 drm/amdkfd: Fix variable dereferenced before NULL check in 'kfd_dbg_trap_device_snapshot()'
Fixes the below:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_debug.c:1024 kfd_dbg_trap_device_snapshot() warn: variable dereferenced before check 'entry_size' (see line 1021)

Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:37 -05:00
Hawking Zhang
30df05fb74 drm/amdgpu: Align ras block enum with firmware
Driver and firmware share the same ras block enum.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:37 -05:00
Yang Wang
1714a1ffaf drm/amdgpu: replace MCA macro with ACA for XGMI
use new ACA macro to instead of MCA

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:37 -05:00
Candice Li
46e2231ce0 drm/amdgpu: Log deferred error separately
Separate deferred error from UE and CE and log it
individually.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:37 -05:00
Candice Li
9c97bf88f4 drm/amdgpu: Do bad page retirement for deferred errors
Needs to do bad page retirement for deferred errors.

v2: Drop unused dev_info.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:37 -05:00
Yang Wang
bbcbfd4363 drm/amdgpu: add xgmi v6.4.0 ACA support
add xgmi v6.4.0 ACA driver support

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:37 -05:00
Yang Wang
f45e6f2b5c drm/amdgpu: add mmhub v1.8 ACA support
v1:
add mmhub v1.8 ACA driver support

v2:
use macro to define smn address value.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:36 -05:00
Yang Wang
373e970a4a drm/amdgpu: add sdma v4.4.2 ACA support
v1:
add sdma v4.4.2 ACA driver support

v2:
use macro to define smn address value.
v3: squash in fix for unbalanced irqs

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:36 -05:00
Yang Wang
0f3cd24e96 drm/amdgpu: add gfx v9.4.3 ACA support
v1:
add gfx v9.4.3 ACA driver support

v2:
use macro to define smn address value.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:36 -05:00
Ma Jun
e372baeb3d drm/amdgpu: Check extended configuration space register when system uses large bar
Some customer platforms do not enable mmconfig for various reasons,
such as bios bug, and therefore cannot access the GPU extend configuration
space through mmio.

When the system enters the d3cold state and resumes, the amdgpu driver
fails to resume because the extend configuration space registers of
GPU can't be restored. At this point, Usually we only see some failure
dmesg log printed by amdgpu driver, it is difficult to find the root
cause.

Therefor print a warnning message if the system can't access the
extended configuration space register when using large bar.

Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:36 -05:00
Yang Wang
f38765de83 drm/amdgpu: add umc v12.0 ACA support
add umc v12.0 ACA driver support

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:36 -05:00
Yang Wang
37973b69ea drm/amdgpu: add aca sysfs support
add aca sysfs node support

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:36 -05:00
Yang Wang
04c4fcd263 drm/amdgpu: add amdgpu ras aca query interface
v1:
add ACA error query interface

v2:
Add a new helper function to determine whether to use ACA or MCA.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:36 -05:00
Yang Wang
0c54e457ac drm/amd/pm: add aca smu backend support for smu v13.0.6
add aca smu backend support for smu v13.0.6.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:36 -05:00
Alex Deucher
26405ff430 drm/amdgpu: move kiq_reg_write_reg_wait() out of amdgpu_virt.c
It's used for more than just SR-IOV now, so move it to
amdgpu_gmc.c and rename it to better match the functionality and
update the comments in the code paths to better document
when each path is used and why.  No functional change.

Reviewed-by: Shaoyun.liu <Shaoyun.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Shaoyun.Liu@amd.com
Cc: Christian.Koenig@amd.com
2024-01-15 18:35:36 -05:00
Alex Deucher
d3f452f3a0 drm/amdgpu: add new INFO IOCTL query for input power
Some chips provide both average and input power.  Previously
we just exposed average power, add a new query for input
power.

Example userspace:
https://github.com/Umio-Yasuno/libdrm-amdgpu-sys-rs/tree/input_power

Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:36 -05:00
Hawking Zhang
d4b9cfe2c7 drm/amdgpu: Query boot status if boot failed
Check and report firmware boot status if it doesn't
reach steady status.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:36 -05:00
Yang Wang
33dcda51e9 drm/amdgpu: add ACA bank dump debugfs support
add ACA bank dump debugfs support

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:35 -05:00
Yang Wang
0599849c32 drm/amdgpu: add ACA kernel hardware error log support
add ACA kernel hardware error log support.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:35 -05:00
Hawking Zhang
c8cb7e09db drm/amdgpu: Query boot status if discovery failed
Check and report boot status if discovery failed.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:35 -05:00
Hawking Zhang
cce4febb27 drm/amdgpu: Add ras helper to query boot errors v2
Add ras helper function to query boot time gpu
errors.
v2: use aqua_vanjaram smn addressing pattern

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:35 -05:00
Yang Wang
f5e4cc8461 drm/amdgpu: implement RAS ACA driver framework
v1:
implement new RAS ACA driver code framework.

v2:
- rename aca_bank_set to aca_banks.
- rename aca_source_xxx to aca_handle_xxx.

v3:
Optimize some function implementation details. (from Hawking's suggestion)

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:35 -05:00
Hawking Zhang
ad390542ec drm/amdgpu: Init pcie_index/data address as fallback (v2)
To allow using this helper for indirect access when
nbio funcs is not available. For instance, in ip
discovery phase.

v2: define macro for pcie_index/data/index_hi fallback.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:35 -05:00
Hawking Zhang
ea0f6dfeec drm/amdgpu: drop psp v13 query_boot_status implementation
Will replace it with new implementation to cover
boot fails in ip discovery phase.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:35 -05:00
Hawking Zhang
ac3ff8a906 drm/amdgpu: Replace DRM_* with dev_* in amdgpu_psp.c
So kernel message has the device pcie bdf information,
which helps issue debugging especially in multiple GPU
system.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:35 -05:00
Felix Kuehling
4cabb2174d drm/amdkfd: Bump KFD ioctl version
This is not strictly a change in the IOCTL API. This version bump is meant
to indicate to user mode the presence of a number of changes and fixes
that enable the management of VA mappings in compute VMs using the GEM_VA
ioctl for DMABufs exported from KFD.

Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Xiaogang Chen<Xiaogang.Chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:35 -05:00
Felix Kuehling
50661eb1a2 drm/amdgpu: Auto-validate DMABuf imports in compute VMs
DMABuf imports in compute VMs are not wrapped in a kgd_mem object on the
process_info->kfd_bo_list. There is no explicit KFD API call to validate
them or add eviction fences to them.

This patch automatically validates and fences dymanic DMABuf imports when
they are added to a compute VM. Revalidation after evictions is handled
in the VM code.

v2:
* Renamed amdgpu_vm_validate_evicted_bos to amdgpu_vm_validate
* Eliminated evicted_user state, use evicted state for VM BOs and user BOs
* Fixed and simplified amdgpu_vm_fence_imports, depends on reserved BOs
* Moved dma_resv_reserve_fences for amdgpu_vm_fence_imports into
  amdgpu_vm_validate, outside the vm->status_lock
* Added dummy version of amdgpu_amdkfd_bo_validate_and_fence for builds
  without KFD

v4: Eliminate amdgpu_vm_fence_imports. It's not needed because the
reservation with its fences is shared with the export, as long as all
imports are from KFD, with the exports already reserved, validated and
fenced by the KFD restore worker.

v5: Reintroduced separate evicted_user state to simplify the state machine
and CS error handling when amdgpu_vm_validate is called without a ticket.

Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:35 -05:00