Summary for changes in firmware:
* Use panel_inst instead of otg_inst when getting fw state
* Contrast strength improves when HDR desktop mode
* Ensure pipes have no outstanding HUBP requests prior to IPS RCG entry
* Check for vm request and vm idle status in IPS1/2 entry sequence
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Fangzhi Zuo <jerry.zuo@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
An invalidation request arriving during prefetch can potentially hang
the system if dynamic clock gating is enabled and memory power requests
are disabled.
[How]
• Disable clock gating and enable memory power requests for the duration
of the prefetch.
• Turn on clock gating and disable memory power requests again after
prefetch is complete.
Limit the scope for DCN35 and DCN42 only.
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Leo Chen <leo.chen@amd.com>
Signed-off-by: Fangzhi Zuo <jerry.zuo@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
why:
dcn35 currently uses a hardcoded DSC display clock value which is incorrect
for some asic types. Newer DCN versions retrieve dsc display clock from
clk_mgr. The same can be done for dcn35.
how:
Refactor the DSC cap calculation using pre-existing logic.
Handle ODM combine requirements in dc_dsc.c.
Replace hardcoded display clock with actual value retrieved from clk_mgr.
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Mohit Bawa <Mohit.Bawa@amd.com>
Signed-off-by: Fangzhi Zuo <jerry.zuo@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
The status of various power features is often important information when
debugging certain issues, such as underflow. This info helps to
narrow down the potential sources of errors.
[How]
Add dc interface to capture power feature enablement status.
Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: George Shen <george.shen@amd.com>
Signed-off-by: Fangzhi Zuo <jerry.zuo@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
When changing resolution (e.g., 4K → FHD) in mirror/clone mode with
certain monitors, the monitor blanks and loses connection due to an early
exit in vrr_settings_require_update(). The function only checks if VRR
state, fixed refresh target, or min/max refresh rate range has changed.
During mode changes, if the calculated min/max refresh values remain the
same even though the stream's v_total changed, the function returns early
without updating vrr_params.adjust.v_total_min/max, leaving the monitor's
VRR timing parameters unsynced with the new mode, causing it to blank out.
[How]
Explicitly adjust VRR parameters to the stream's nominal v_total when VRR
is supported, but inactive.
Fixes: 6d31602a9f ("drm/amd/display: more liberal vmin/vmax update for freesync")
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Signed-off-by: Fangzhi Zuo <jerry.zuo@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
smu busy is a normal case when calling SMU_MSG_GetBadPageCount, so no need
to print error status at each time.Instead, only print error status when
timeout given by user is reached.
Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The busy status returned by ras_eeprom_update_record_num may not be
an error, increase timeout to exclude false busy status. Also add more
comments to make the code readable.
v2: define a macro for the timeout value.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Check if bad page threshold is reached and take actions accordingly.
v2: remove rma message sent to smu when pmfw manages eeprom.
v3: add null pointer check for con.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fix a potential deadlock caused by inconsistent spinlock usage
between interrupt and process contexts in the userq fence driver.
The issue occurs when amdgpu_userq_fence_driver_process() is called
from both:
- Interrupt context: gfx_v11_0_eop_irq() -> amdgpu_userq_fence_driver_process()
- Process context: amdgpu_eviction_fence_suspend_worker() ->
amdgpu_userq_fence_driver_force_completion() -> amdgpu_userq_fence_driver_process()
In interrupt context, the spinlock was acquired without disabling
interrupts, leaving it in {IN-HARDIRQ-W} state. When the same lock
is acquired in process context, the kernel detects inconsistent
locking since the process context acquisition would enable interrupts
while holding a lock previously acquired in interrupt context.
Kernel log shows:
[ 4039.310790] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
[ 4039.310804] kworker/7:2/409 [HC0[0]:SC0[0]:HE1:SE1] takes:
[ 4039.310818] ffff9284e1bed000 (&fence_drv->fence_list_lock){?...}-{3:3},
[ 4039.310993] {IN-HARDIRQ-W} state was registered at:
[ 4039.311004] lock_acquire+0xc6/0x300
[ 4039.311018] _raw_spin_lock+0x39/0x80
[ 4039.311031] amdgpu_userq_fence_driver_process.part.0+0x30/0x180 [amdgpu]
[ 4039.311146] amdgpu_userq_fence_driver_process+0x17/0x30 [amdgpu]
[ 4039.311257] gfx_v11_0_eop_irq+0x132/0x170 [amdgpu]
Fix by using spin_lock_irqsave()/spin_unlock_irqrestore() to properly
manage interrupt state regardless of calling context.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm_sched_entity_init wasn't called yet, so the only thing to
do is to release allocated memory.
This doesn't fix any bug since entity is zero allocated and
drm_sched_entity_fini does nothing in this case.
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch fixes the formatting in the patch
"amdkfd: Do not wait for queue op response during reset"
Signed-off-by: Ahmad Rehman <Ahmad.Rehman@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Certain multi-GPU configurations (especially GFX12) may hit
data corruption when a DCC-compressed VRAM surface is shared across GPUs
using peer-to-peer (P2P) DMA transfers.
Such surfaces rely on device-local metadata and cannot be safely accessed
through a remote GPU’s page tables. Attempting to import a DCC-enabled
surface through P2P leads to incorrect rendering or GPU faults.
This change disables P2P for DCC-enabled VRAM buffers that are contiguous
and allocated on GFX12+ hardware. In these cases, the importer falls back
to the standard system-memory path, avoiding invalid access to compressed
surfaces.
Future work could consider optional migration (VRAM→System→VRAM) if a
performance regression is observed when `attach->peer2peer = false`.
Tested on:
- Dual RX 9700 XT (Navi4x) setup
- GNOME and Wayland compositor scenarios
- Confirmed no corruption after disabling P2P under these conditions
v2: Remove check TTM_PL_VRAM & TTM_PL_FLAG_CONTIGUOUS.
v3: simplify for upsteam and fix ip version check (Alex)
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why & How]
Adding macros to simplify the process of adding new error codes.
Currently, to add an error code, the developer needs to add both the
enum and the string translation. This is error prone and can lead to
inconsistencies. The refactor adds a macro to automatically add the
string translation based on the enum.
Reviewed-by: Aric Cyr <aric.cyr@amd.com>
Signed-off-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In legacy way, bad page is queried from MCA registers, switch to
getting it from PMFW when PMFW manages eeprom data.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm-misc-next for v6.19-rc1:
UAPI Changes:
- Add userptr support to ivpu.
- Add IOCTL's for resource and telemetry data in amdxdna.
Core Changes:
- Improve some atomic state checking handling.
- drm/client updates.
- Use forward declarations instead of including drm_print.h
- RUse allocation flags in ttm_pool/device_init and allow specifying max
useful pool size and propagate ENOSPC.
- Updates and fixes to scheduler and bridge code.
- Add support for quirking DisplayID checksum errors.
Driver Changes:
- Assorted cleanups and fixes in rcar-du, accel/ivpu, panel/nv3052cf,
sti, imxm, accel/qaic, accel/amdxdna, imagination, tidss, sti,
panthor, vkms.
- Add Samsung S6E3FC2X01 DDIC/AMS641RW, Synaptics TDDI series DSI,
TL121BVMS07-00 (IL79900A) panels.
- Add mali MediaTek MT8196 SoC gpu support.
- Add etnaviv GC8000 Nano Ultra VIP r6205 support.
- Document powervr ge7800 support in the devicetree.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patch.msgid.link/5afae707-c9aa-4a47-b726-5e1f1aa7a106@linux.intel.com
drm/i915 feature pull for v6.19:
Features and functionality:
- Enable LNL+ content adaptive sharpness filter (CASF) (Nemesa)
- Use optimized VRR guardband (Ankit, Ville)
- Enable Xe3p LT PHY (Suraj)
- Enable FBC support for Xe3p_LPD display (Sai Teja, Vinod)
- Specify DMC firmware for display version 30.02 (Dnyaneshwar)
- Report reason for disabling PSR to debugfs (Michał)
- Extend i915_display_info with Type-C port details (Khaled)
- Log DSI send packet sequence errors and contents
Refactoring and cleanups:
- Refactoring to prepare for VRR guardband optimization (Ankit)
- Abstract VRR live status wait (Ankit)
- Refactor VRR and DSB timing to handle Set Context Latency explicitly (Ankit)
- Helpers for prefill latency calculations (Ville)
- Refactor SKL+ watermark latency setup (Ville)
- VRR refactoring and cleanups (Ville)
- SKL+ universal plane cleanups (Ville)
- Decouple CDCLK from state->modeset refactor (Ville)
- Refactor VLV/CHV clock functions (Jani)
- Refactor fbdev handling (Jani)
- Call i915 and xe runtime PM from display via function pointers (Jouni)
- IRQ code refactoring (Jani)
- Drop display dependency on i915 feature check macros (Jani)
- Refactor and unify i915 and xe stolen memory interfaces towards display (Jani)
- Switch to driver agnostic drm to display pointer chase (Jani)
- Use display version over graphics version in display code (Matt A)
- GVT cleanups (Jonathan, Andi)
- Rename a VLV clock function to unify (Michał)
- Explicitly sanitize DMC package header num entries (Luca)
- Remove redundant port clock check from ALPM (Jouni)
- Use sysfs_emit() instead of sprintf() in PMU sysfs (Madhur Kumar)
- Clean up C20 PHY PLL register macros (Imre, Mika))
- Abstract "address in MMIO table" helper for general use (Matt A)
- Improve VRR platform abstractions (Ville)
- Move towards more standard PCI PM code usage (Ville)
- Framebuffer refactoring (Ville)
- Drop display dependency on i915_utils.h (Jani)
- Include cleanups (Jani)
Fixes:
- Workaround docking station DSC issues with high pixel clock and bpp (Imre)
- Fix Panel Replay in DSC mode (Imre)
- Disable tracepoints for PREEMPT_RT as a workaround (Maarten)
- Fix intel_crtc_get_vblank_counter() on PREEMPT_RT (Maarten)
- Fix C10 PHY identification on PTL/WCL (Dnyaneshwar)
- Take AS SDP into account with optimized guardband (Jouni)
- Fix panic structure allocation memory leak (Jani)
- Adjust an FBC workaround platforms (Vinod)
- Add fallback for CDCLK selection (Naladala)
- Avoid using invalid transcoder in MST transport select (Suraj)
- Don't use cursor size reduction on display version 14+ (Nemesa)
- Fix C20 PHY PLL register programming (Imre, Mika)
- Fix PSR frontbuffer flush handling (Jouni)
- Store ALPM parameters in crtc state (Jouni)
- Defeature DRRS on LNL+ (Ville)
- Fix the scope of the large DRAM DIMM workaround (Ville)
- Fix PICA vs. AUX power ordering issue (Gustavo)
- Fix pixel rate for computing watermark line time (Ville)
- Fix framebuffer set_tiling vs. addfb race (Ville)
- DMC event handler fixes (Ville)
DRM Core:
- CRTC sharpness strength property (Nemesa)
- DPCD DSC quirk for Synaptics Panamera devices (Imre)
- Helpers to query the branch DSC max throughput/line-width (Imre)
Merges:
- Backmerge drm-next for v6.18-rc and to sync with drm-xe-next (Jani)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patch.msgid.link/ec5a05f2df6d597a62033ee2d57225cce707b320@intel.com
[Why]
drm_dp_mst_topology_queue_probe() is used under the assumption that
mst is already initialized. If we connect system with SST first
then switch to the mst branch during suspend, we will fail probing
topology by calling the wrong API since the mst manager is yet to
be initialized.
[How]
At dm_resume(), once it's detected as mst branc connected, check if
the mst is initialized already. If not, call
dm_helpers_dp_mst_start_top_mgr() instead to initialize mst
V2: Adjust the commit msg a bit
Fixes: bc068194f5 ("drm/amd/display: Don't write DP_MSTM_CTRL after LT")
Cc: Fangzhi Zuo <jerry.zuo@amd.com>
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
PMFW will manage RAS eeprom data by itself, add new interface to read
eeprom data via PMFW, we can read part of records by setting index.
v2: use IPID parse interface.
pa is not used and set it to a fixed value.
v3: optimize the null pointer check for IPID parse interface.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
So we can call it in other blocks.
v2: add a new IPID parse interface for umc and we can
implement it for each ASIC.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
commit c760bcda83 ("drm/amd: Check whether secure display TA loaded
successfully") attempted to fix extra messages, but failed to port the
cleanup that was in commit 5c6d52ff4b ("drm/amd: Don't try to enable
secure display TA multiple times") to prevent multiple tries.
Add that to the failure handling path even on a quick failure.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4679
Fixes: c760bcda83 ("drm/amd: Check whether secure display TA loaded successfully")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When amdgpu_device_health_check fails, amdgpu_ras_pre_reset
will not be called and therefore amdgpu_ras_post_reset
cannot be called either.
Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
On PF passthrough environment, after hibernate and then resume, coralgemm
will cause gpu page fault.
Mode1 reset happens during hibernate, but partition mode is not restored
on resume, register mmCP_HYP_XCP_CTL and mmCP_PSP_XCP_CTL is not right
after resume. When CP access the MQD BO, wrong stride size is used,
this will cause out of bound access on the MQD BO, resulting page fault.
The fix is to ensure gfx_v9_4_3_switch_compute_partition() is called
when resume from a hibernation.
KFD resume is called separately during a reset recovery or resume from
suspend sequence. Hence it's not required to be called as part of
partition switch.
Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add IOCTL debug bit for logging user provided parameter validation
errors.
Refactor several warning and error messages to better reflect fault
reason. User generated faults should not flood kernel messages with
warnings or errors, so change those to ivpu_dbg(). Add additional debug
logs for parameter validation in IOCTLs.
Check size provided by in metric streamer start and return -EINVAL
together with a debug message print.
Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Link: https://patch.msgid.link/20251104132418.970784-1-karol.wachowski@linux.intel.com