[Why]
If amdgpu_dm failed to initalize before amdgpu_dm_initialize_drm_device()
completed then freeing atomic_obj will lead to list corruption.
[How]
Check if atomic_obj state is initialized before trying to free.
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
If PSR-SU is disabled on the link, then configuring su_y granularity in
mod_power_calc_psr_configs() can lead to assertions in
psr_su_set_dsc_slice_height().
[How]
Check the PSR version in amdgpu_dm_link_setup_psr() to determine whether
or not to configure granularity.
Reviewed-by: Sun peng (Leo) Li <sunpeng.li@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
"REG_WAIT timeout 1us * 1000 tries - dcn314_dsc_pg_control line"
warnings seen after resuming from s2idle.
DCN314 has issues with DSC power gating that cause REG_WAIT timeouts
when attempting to power down DSC blocks.
[How]
Disable dsc_power_gate for dcn314 by default.
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Roman Li <Roman.Li@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If vm belongs to another process, this is fclose after fork,
wait may enable signaling KFD eviction fence and cause parent process queue evicted.
[677852.634569] amdkfd_fence_enable_signaling+0x56/0x70 [amdgpu]
[677852.634814] __dma_fence_enable_signaling+0x3e/0xe0
[677852.634820] dma_fence_wait_timeout+0x3a/0x140
[677852.634825] amddma_resv_wait_timeout+0x7f/0xf0 [amdkcl]
[677852.634831] amdgpu_vm_wait_idle+0x2d/0x60 [amdgpu]
[677852.635026] amdgpu_flush+0x34/0x50 [amdgpu]
[677852.635208] filp_flush+0x38/0x90
[677852.635213] filp_close+0x14/0x30
[677852.635216] do_close_on_exec+0xdd/0x130
[677852.635221] begin_new_exec+0x1da/0x490
[677852.635225] load_elf_binary+0x307/0xea0
[677852.635231] ? srso_alias_return_thunk+0x5/0xfbef5
[677852.635235] ? ima_bprm_check+0xa2/0xd0
[677852.635240] search_binary_handler+0xda/0x260
[677852.635245] exec_binprm+0x58/0x1a0
[677852.635249] bprm_execve.part.0+0x16f/0x210
[677852.635254] bprm_execve+0x45/0x80
[677852.635257] do_execveat_common.isra.0+0x190/0x200
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Gang Ba <Gang.Ba@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
HUBBUB structure is not initialized on DCE hardware, so check if it is NULL
to avoid null dereference while accessing amdgpu_dm_capabilities file in
debugfs.
Signed-off-by: Peter Shkenev <mustela@erminea.space>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
After a recent change in clang to expose uninitialized warnings from
const variables and pointers [1], there is a warning in
imu_v12_0_program_rlc_ram() because data is passed uninitialized to
program_imu_rlc_ram():
drivers/gpu/drm/amd/amdgpu/imu_v12_0.c:374:30: error: variable 'data' is uninitialized when used here [-Werror,-Wuninitialized]
374 | program_imu_rlc_ram(adev, data, (const u32)size);
| ^~~~
As this warning happens early in clang's frontend, it does not realize
that due to the assignment of r to -EINVAL, program_imu_rlc_ram() is
never actually called, and even if it were, data would not be
dereferenced because size is 0.
Just initialize data to NULL to silence the warning, as the commit that
added program_imu_rlc_ram() mentioned it would eventually be used over
the old method, at which point data can be properly initialized and
used.
Cc: stable@vger.kernel.org
Closes: https://github.com/ClangBuiltLinux/linux/issues/2107
Fixes: 56159fffaa ("drm/amdgpu: use new method to program rlc ram")
Link: 2464313eef [1]
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If the device is running older pcode firmware, it is possible that newer
mailbox commands are not supported by it. The sysfs attributes aren't
useful in that case, but we shouldn't fail driver probe because of it.
As of now, it is unknown if we can distinguish unsupported commands before
attempting them. But until we figure out a way to do that, fix the
regressions.
v2: Add debug message (Lucas)
Fixes: cdc36b66cd ("drm/xe: Expose fan control and voltage regulator version")
Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Tested-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://lore.kernel.org/r/20250714215503.2897748-1-raag.jadav@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit ed5461daa1)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Nova changes for v6.17
DMA:
- Merge topic/dma-features-2025-06-23 from alloc tree.
- Clarify wording and be consistent in 'coherent' nomenclature.
- Convert the read!() / write!() macros to return a Result.
- Add as_slice() / write() methods in CoherentAllocation.
- Fix doc-comment of dma_handle().
- Expose count() and size() in CoherentAllocation and add the
corresponding type invariants.
- Implement CoherentAllocation::dma_handle_with_offset().
nova-core:
- Various register!() macro improvements.
- Custom Sleep / Delay helpers (until the actual abstractions land).
- Add DMA object abstraction.
- VBIOS
- Image parser / iterator.
- PMU table look up in FWSEC.
- FWSEC ucode extraction.
- Register sysmem flush page.
- Falcon
- Generic falcon boot code and HAL (Ampere).
- GSP / SEC2 specific code.
- FWSEC-FRTS
- Compute layout of FRTS region (FbLayout and HAL).
- Load into GSP falcon and execute.
- Add Documentation for VBIOS layout, Devinit process, Fwsec operation
and layout, Falcon basics.
- Update and annotate TODO list.
- Add Alexandre Courbot as co-maintainer.
Rust:
- Make ETIMEDOUT error available.
- Add size constants up to SZ_2G.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: "Danilo Krummrich" <dakr@kernel.org>
Link: https://lore.kernel.org/r/DBFKLDMUGZD9.Z93GN2N5B0FI@kernel.org
Rather than checking in the callbacks, check if the reset
type is supported in the caller.
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Re-emit the unprocessed state after resetting the queue.
Drop the soft_recovery callbacks as the queue reset replaces
it.
Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Re-emit the unprocessed state after resetting the queue.
Drop the soft_recovery callbacks as the queue reset replaces
it.
Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Re-emit the unprocessed state after resetting the queue.
Drop the soft_recovery callbacks as the queue reset replaces
it.
Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Set the dirty bit when the memory resource is not cleared
during BO release.
v2(Christian):
- Drop the cleared flag set to false.
- Improve the amdgpu_vram_mgr_set_clear_state() function.
v3:
- Add back the resource clear flag set function call after
being cleared during eviction (Christian).
- Modified the patch subject name.
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cached metrics data validity is 1ms on SMUv13.0.6 SOCs. It's not
reasonable for any client to query gpu_metrics at a faster rate and
constantly interrupt PMFW.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If dpm tables are already populated on SMU v13.0.6 SOCs, use the cached
data. Otherwise, fetch values from firmware.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>