linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-04-04 17:05:12 -04:00

Author	SHA1	Message	Date
Harish Kasiviswanathan	6b61a54e68	drm/amdgpu: Fix double deletion of validate_list If amdgpu_amdkfd_gpuvm_free_memory_of_gpu() fails after kgd_mem is removed from validate_list, the mem handle still lingers in the KFD idr. This means when process is terminated, kfd_process_free_outstanding_kfd_bos() will call amdgpu_amdkfd_gpuvm_free_memory_of_gpu() again resulting in double deletion. To avoid this - (a) Check if list is empty before deleting it (b) Rearrange amdgpu_amdkfd_gpuvm_free_memory_of_gpu() such that it can be safely called again if it returns failure the first time. Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `6ba60345f4`)	2026-02-03 17:24:21 -05:00
Melissa Wen	84962445cd	drm/amd/display: remove assert around dpp_base replacement There is nothing wrong if in_shaper_func type is DISTRIBUTED POINTS. Remove the assert placed for a TODO to avoid misinterpretations. Signed-off-by: Melissa Wen <mwen@igalia.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `1714dcc4c2`)	2026-02-03 17:21:58 -05:00
Melissa Wen	d25b32aa82	drm/amd/display: extend delta clamping logic to CM3 LUT helper Commit `27fc10d109` ("drm/amd/display: Fix the delta clamping for shaper LUT") fixed banding when using plane shaper LUT in DCN10 CM helper. The problem is also present in DCN30 CM helper, fix banding by extending the same bug delta clamping fix to CM3. Signed-off-by: Melissa Wen <mwen@igalia.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `0274a54897`)	2026-02-03 17:21:50 -05:00
Melissa Wen	8f959d37c1	drm/amd/display: fix wrong color value mapping on MCM shaper LUT Some shimmer/colorful points appears when using the steamOS color pipeline for HDR on gaming with DCN32. These points look like black values being wrongly mapped to red/blue/green values. It was caused because the number of hw points in regular LUTs and in a shaper LUT was treated as the same. DCN3+ regular LUTs have 257 bases and implicit deltas (i.e. HW calculates them), but shaper LUT is a special case: it has 256 bases and 256 deltas, as in DCN1-2 regular LUTs, and outputs 14-bit values. Fix that by setting by decreasing in 1 the number of HW points computed in the LUT segmentation so that shaper LUT (i.e. fixpoint == true) keeps the same DCN10 CM logic and regular LUTs go with `hw_points + 1`. CC: Krunoslav Kovac <Krunoslav.Kovac@amd.com> Fixes: `4d5fd3d08e` ("drm/amd/display: PQ tail accuracy") Signed-off-by: Melissa Wen <mwen@igalia.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `5006505b19`)	2026-02-03 17:21:43 -05:00
Bert Karwatzki	243b467dea	Revert "drm/amd: Check if ASPM is enabled from PCIe subsystem" This reverts commit `7294863a6f`. This commit was erroneously applied again after commit `0ab5d711ec` ("drm/amd: Refactor `amdgpu_aspm` to be evaluated per device") removed it, leading to very hard to debug crashes, when used with a system with two AMD GPUs of which only one supports ASPM. Link: https://lore.kernel.org/linux-acpi/20251006120944.7880-1-spasswolf@web.de/ Link: https://github.com/acpica/acpica/issues/1060 Fixes: `0ab5d711ec` ("drm/amd: Refactor `amdgpu_aspm` to be evaluated per device") Signed-off-by: Bert Karwatzki <spasswolf@web.de> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `97a9689300`) Cc: stable@vger.kernel.org	2026-02-03 17:20:59 -05:00
Mario Limonciello	1478a34470	drm/amd: Set minimum version for set_hw_resource_1 on gfx11 to 0x52 commit `f81cd79311` ("drm/amd/amdgpu: Fix MES init sequence") caused a dependency on new enough MES firmware to use amdgpu. This was fixed on most gfx11 and gfx12 hardware with commit `0180e0a5dd` ("drm/amdgpu/mes: add compatibility checks for set_hw_resource_1"), but this left out that GC 11.0.4 had breakage at MES 0x51. Bump the requirement to 0x52 instead. Reported-by: danijel@nausys.com Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4576 Fixes: `f81cd79311` ("drm/amd/amdgpu: Fix MES init sequence") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `c2d2ccc85f`) Cc: stable@vger.kernel.org	2026-02-03 17:20:38 -05:00
Linus Torvalds	4327db89f5	Merge tag 'drm-fixes-2026-01-30' of https://gitlab.freedesktop.org/drm/kernel Pull drm fixes from Dave Airlie: "Seems to be a bit quieter this week, mostly xe and amdgpu, with msm and imx fixes and one WARN_ON from user blocked. Nothing of note outstanding either. uapi: - Fix a WARN_ON() when passing an invalid handle to drm_gem_change_handle_ioctl() msm: - GPU: - Fix bogus hwcg register update for a690 xe: - Skip address copy for sync-only execs - Fix a WA - Derive mem_copy cap from graphics version - Fix is_bound() pci_dev lifetime - xe nvm cleanup fixes amdgpu: - SMU 13 fixes - SMU 14 fixes - GPUVM fault filter fix - Powergating fix - HDMI debounce fix - Xclk fix for soc21 APUs - Fix COND_EXEC handling for GC 11 - GC 10-12 KGQ init fixes - GC 11-12 KGQ reset fixes imx/tve: - drop ddc device reference when unloading" * tag 'drm-fixes-2026-01-30' of https://gitlab.freedesktop.org/drm/kernel: (21 commits) drm/xe/nvm: Fix double-free on aux add failure drm/xe/nvm: Manage nvm aux cleanup with devres drm/amdgpu/gfx12: adjust KGQ reset sequence drm/amdgpu/gfx11: adjust KGQ reset sequence drm/amdgpu/gfx12: fix wptr reset in KGQ init drm/amdgpu/gfx11: fix wptr reset in KGQ init drm/amdgpu/gfx10: fix wptr reset in KGQ init drm/xe/configfs: Fix is_bound() pci_dev lifetime drm/amdgpu: Fix cond_exec handling in amdgpu_ib_schedule() drm/amdgpu/soc21: fix xclk for APUs drm/amd/display: Clear HDMI HPD pending work only if it is enabled drm/imx/tve: fix probe device leak drm/amd/pm: fix race in power state check before mutex lock drm/amdgpu: fix NULL pointer dereference in amdgpu_gmc_filter_faults_remove drm/amd/pm: fix smu v14 soft clock frequency setting issue drm/amd/pm: fix smu v13 soft clock frequency setting issue drm/xe: derive mem copy capability from graphics version drm/xe/xelp: Fix Wa_18022495364 drm/xe: Skip address copy for sync-only execs drm: Do not allow userspace to trigger kernel warnings in drm_gem_change_handle_ioctl() ...	2026-01-29 23:20:51 -08:00
Linus Torvalds	bcb6058a4b	Merge tag 'mm-hotfixes-stable-2026-01-29-09-41' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "16 hotfixes. 9 are cc:stable, 12 are for MM. There's a patch series from Pratyush Yadav which fixes a few things in the new-in-6.19 LUO memfd code. Plus the usual shower of singletons - please see the changelogs for details" * tag 'mm-hotfixes-stable-2026-01-29-09-41' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: vmcoreinfo: make hwerr_data visible for debugging mm/zone_device: reinitialize large zone device private folios mm/mm_init: don't cond_resched() in deferred_init_memmap_chunk() if called from deferred_grow_zone() mm/kfence: randomize the freelist on initialization kho: kho_preserve_vmalloc(): don't return 0 when ENOMEM kho: init alloc tags when restoring pages from reserved memory mm: memfd_luo: restore and free memfd_luo_ser on failure mm: memfd_luo: use memfd_alloc_file() instead of shmem_file_setup() memfd: export alloc_file() flex_proportions: make fprop_new_period() hardirq safe mailmap: add entry for Viacheslav Bocharov mm/memory-failure: teach kill_accessing_process to accept hugetlb tail page pfn mm/memory-failure: fix missing ->mf_stats count in hugetlb poison mm, swap: restore swap_space attr aviod kernel panic mm/kasan: fix KASAN poisoning in vrealloc() mm/shmem, swap: fix race of truncate and swap entry split	2026-01-29 11:09:13 -08:00
Alex Deucher	dfd64f6e8c	drm/amdgpu/gfx12: adjust KGQ reset sequence Kernel gfx queues do not need to be reinitialized or remapped after a reset. Align with gfx11. v2: preserve init and remap for MMIO case. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `0a6d6ed694`) Cc: stable@vger.kernel.org	2026-01-29 12:39:21 -05:00
Alex Deucher	3eb46fbb60	drm/amdgpu/gfx11: adjust KGQ reset sequence Kernel gfx queues do not need to be reinitialized or remapped after a reset. This fixes queue reset failures on APUs. v2: preserve init and remap for MMIO case. Fixes: `b3e9bfd866` ("drm/amdgpu/gfx11: add ring reset callbacks") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4789 Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `b340ff216f`) Cc: stable@vger.kernel.org	2026-01-29 12:39:21 -05:00
Alex Deucher	9077d32a4b	drm/amdgpu/gfx12: fix wptr reset in KGQ init wptr is a 64 bit value and we need to update the full value, not just 32 bits. Align with what we already do for KCQs. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `a2918f958d`) Cc: stable@vger.kernel.org	2026-01-29 12:39:15 -05:00
Alex Deucher	b1f810471c	drm/amdgpu/gfx11: fix wptr reset in KGQ init wptr is a 64 bit value and we need to update the full value, not just 32 bits. Align with what we already do for KCQs. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `1f16866bdb`) Cc: stable@vger.kernel.org	2026-01-29 12:39:09 -05:00
Alex Deucher	cc4f433b14	drm/amdgpu/gfx10: fix wptr reset in KGQ init wptr is a 64 bit value and we need to update the full value, not just 32 bits. Align with what we already do for KCQs. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `e80b1d1aa1`) Cc: stable@vger.kernel.org	2026-01-29 12:38:54 -05:00
Alex Deucher	b1defcdc44	drm/amdgpu: Fix cond_exec handling in amdgpu_ib_schedule() The EXEC_COUNT field must be > 0. In the gfx shadow handling we always emit a cond_exec packet after the gfx_shadow packet, but the EXEC_COUNT never gets patched. This leads to a hang when we try and reset queues on gfx11 APUs. Fixes: `c68cbbfd54` ("drm/amdgpu: cleanup conditional execution") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4789 Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `ba205ac3d6`) Cc: stable@vger.kernel.org	2026-01-28 16:35:00 -05:00
Alex Deucher	e7fbff9e76	drm/amdgpu/soc21: fix xclk for APUs The reference clock is supposed to be 100Mhz, but it appears to actually be slightly lower (99.81Mhz). Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14451 Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `637fee3954`) Cc: stable@vger.kernel.org	2026-01-28 16:34:24 -05:00
Ivan Lipski	acecfee885	drm/amd/display: Clear HDMI HPD pending work only if it is enabled [Why&How] On amdgpu_dm_connector_destroy(), the driver attempts to cancel pending HDMI HPD work without checking if the HDMI HPD is enabled. Added a check that it is enabled before clearing it. Fixes: `6a681cd903` ("drm/amd/display: Add an hdmi_hpd_debounce_delay_ms module") Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `17b2c526fd`)	2026-01-28 15:16:34 -05:00
Yang Wang	ee8d07cd57	drm/amd/pm: fix race in power state check before mutex lock The power state check in amdgpu_dpm_set_powergating_by_smu() is done before acquiring the pm mutex, leading to a race condition where: 1. Thread A checks state and thinks no change is needed 2. Thread B acquires mutex and modifies the state 3. Thread A returns without updating state, causing inconsistency Fix this by moving the mutex lock before the power state check, ensuring atomicity of the state check and modification. Fixes: `6ee27ee27b` ("drm/amd/pm: avoid duplicate powergate/ungate setting") Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `7a3fbdfd19`)	2026-01-27 18:25:15 -05:00
Jon Doron	8b1ecc9377	drm/amdgpu: fix NULL pointer dereference in amdgpu_gmc_filter_faults_remove On APUs such as Raven and Renoir (GC 9.1.0, 9.2.2, 9.3.0), the ih1 and ih2 interrupt ring buffers are not initialized. This is by design, as these secondary IH rings are only available on discrete GPUs. See vega10_ih_sw_init() which explicitly skips ih1/ih2 initialization when AMD_IS_APU is set. However, amdgpu_gmc_filter_faults_remove() unconditionally uses ih1 to get the timestamp of the last interrupt entry. When retry faults are enabled on APUs (noretry=0), this function is called from the SVM page fault recovery path, resulting in a NULL pointer dereference when amdgpu_ih_decode_iv_ts_helper() attempts to access ih->ring[]. The crash manifests as: BUG: kernel NULL pointer dereference, address: 0000000000000004 RIP: 0010:amdgpu_ih_decode_iv_ts_helper+0x22/0x40 [amdgpu] Call Trace: amdgpu_gmc_filter_faults_remove+0x60/0x130 [amdgpu] svm_range_restore_pages+0xae5/0x11c0 [amdgpu] amdgpu_vm_handle_fault+0xc8/0x340 [amdgpu] gmc_v9_0_process_interrupt+0x191/0x220 [amdgpu] amdgpu_irq_dispatch+0xed/0x2c0 [amdgpu] amdgpu_ih_process+0x84/0x100 [amdgpu] This issue was exposed by commit `1446226d32` ("drm/amdgpu: Remove GC HW IP 9.3.0 from noretry=1") which changed the default for Renoir APU from noretry=1 to noretry=0, enabling retry fault handling and thus exercising the buggy code path. Fix this by adding a check for ih1.ring_size before attempting to use it. Also restore the soft_ih support from commit `dd29944165` ("drm/amdgpu: Rework retry fault removal"). This is needed if the hardware doesn't support secondary HW IH rings. v2: additional updates (Alex) Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3814 Fixes: `dd29944165` ("drm/amdgpu: Rework retry fault removal") Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Jon Doron <jond@wiz.io> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `6ce8d536c8`) Cc: stable@vger.kernel.org	2026-01-27 18:24:39 -05:00
Yang Wang	239d0ccf56	drm/amd/pm: fix smu v14 soft clock frequency setting issue v1: resolve the issue where some freq frequencies cannot be set correctly due to insufficient floating-point precision. v2: patch this convert on 'max' value only. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `53868dd877`) Cc: stable@vger.kernel.org	2026-01-27 18:24:21 -05:00
Yang Wang	c764b7af15	drm/amd/pm: fix smu v13 soft clock frequency setting issue v1: resolve the issue where some freq frequencies cannot be set correctly due to insufficient floating-point precision. v2: patch this convert on 'max' value only. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `6194f60c70`) Cc: stable@vger.kernel.org	2026-01-27 18:24:00 -05:00
Matthew Brost	12b2285bf3	mm/zone_device: reinitialize large zone device private folios Reinitialize metadata for large zone device private folios in zone_device_page_init prior to creating a higher-order zone device private folio. This step is necessary when the folio's order changes dynamically between zone_device_page_init calls to avoid building a corrupt folio. As part of the metadata reinitialization, the dev_pagemap must be passed in from the caller because the pgmap stored in the folio page may have been overwritten with a compound head. Without this fix, individual pages could have invalid pgmap fields and flags (with PG_locked being notably problematic) due to prior different order allocations, which can, and will, result in kernel crashes. Link: https://lkml.kernel.org/r/20260116111325.1736137-2-francois.dugast@intel.com Fixes: `d245f9b4ab` ("mm/zone_device: support large zone device private folios") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Francois Dugast <francois.dugast@intel.com> Acked-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Balbir Singh <balbirs@nvidia.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Zi Yan <ziy@nvidia.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: David Airlie <airlied@gmail.com> Cc: Simona Vetter <simona@ffwll.ch> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Lyude Paul <lyude@redhat.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Leon Romanovsky <leon@kernel.org> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-26 19:03:48 -08:00
Dave Airlie	2312e0ab59	Merge tag 'amd-drm-fixes-6.19-2026-01-22' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.19-2026-01-22: amdgpu: - GC 12 fix - Misc error path fixes - DC analog fix - SMU 6 fixes - TLB flush fix - DC idle optimization fix amdkfd: - GC 11 cooperative launch fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patch.msgid.link/20260122204308.946339-1-alexander.deucher@amd.com	2026-01-23 08:12:39 +10:00
Alex Deucher	f377ea0561	Revert "drm/amd/display: pause the workload setting in dm" This reverts commit `bc6d54ac7e`. The workload profile needs to be in the default state when the dc idle optimization state is entered. However, when jobs come in for video or GFX or compute, the profile may be set to a non-default profile resulting in the dc idle optimizations not taking effect and resulting in higher power usage. As such we need to pause the workload profile changes during this transition. When this patch was originally committed, it caused a regression with a Dell U3224KB display, but no other problems were reported at the time. When it was reapplied (this patch) to address increased power usage, it seems to have caused additional regressions. This change seems to have a number of side effects (audio issues, stuttering, etc.). I suspect the pause should only happen when all displays are off or in static screen mode, but I think this call site gets called more often than that which results in idle state entry more often than intended. For now revert. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4894 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4717 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4725 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4517 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4806 Cc: Yang Wang <kevinyang.wang@amd.com> Cc: Kenneth Feng <kenneth.feng@amd.com> Cc: Roman Li <Roman.Li@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `1412482b71`)	2026-01-22 12:10:44 -05:00
Chaitanya Kumar Borah	7d8257fe25	drm/amd/display: Fix color pipeline enum name leak dm_plane_init_colorops() allocates enum names for color pipelines. These are eventually passed to drm_property_create_enum() which create its own copies of the string. Free the strings after initialization is done. Also, allocate color pipeline enum names only after successfully creating color pipeline. Fixes: `9ba25915ef` ("drm/amd/display: Add support for sRGB EOTF in DEGAM block") Signed-off-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com> Reviewed-by: Uma Shankar <uma.shankar@intel.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Maarten Lankhorst <dev@lankhorst.se> Acked-by: Alex Deucher <alexander.deucher@amd.com> #irc Link: https://patch.msgid.link/20260113102303.724205-3-chaitanya.kumar.borah@intel.com	2026-01-22 10:24:55 +01:00
Alex Deucher	095ca81517	drm/amdgpu: fix type for wptr in ring backup Needs to be a u64. Fixes: `77cc0da39c` ("drm/amdgpu: track ring state associated with a fence") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `56fff1941a`)	2026-01-21 14:55:56 -05:00
Timur Kristóf	fd2ac113a5	drm/amdgpu: Fix validating flush_gpu_tlb_pasid() When a function holds a lock and we return without unlocking it, it deadlocks the kernel. We should always unlock before returning. This commit fixes suspend/resume on SI. Tested on two Tahiti GPUs: FirePro W9000 and R9 280X. Fixes: `f4db9913e4` ("drm/amdgpu: validate the flush_gpu_tlb_pasid()") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202601190121.z9C0uml5-lkp@intel.com/ Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `e3a6eff92b`)	2026-01-21 14:55:44 -05:00
Timur Kristóf	764a90eb02	drm/amd/pm: Workaround SI powertune issue on Radeon 430 (v2) Radeon 430 and 520 are OEM GPUs from 2016~2017 They have the same device id: 0x6611 and revision: 0x87 On the Radeon 430, powertune is buggy and throttles the GPU, never allowing it to reach its maximum SCLK. Work around this bug by raising the TDP limits we program to the SMC from 24W (specified by the VBIOS on Radeon 430) to 32W. Disabling powertune entirely is not a viable workaround, because it causes the Radeon 520 to heat up above 100 C, which I prefer to avoid. Additionally, revise the maximum SCLK limit. Considering the above issue, these GPUs never reached a high SCLK on Linux, and the workarounds were added before the GPUs were released, so the workaround likely didn't target these specifically. Use 780 MHz (the maximum SCLK according to the VBIOS on the Radeon 430). Note that the Radeon 520 VBIOS has a higher maximum SCLK: 905 MHz, but in practice it doesn't seem to perform better with the higher clock, only heats up more. v2: Move the workaround to si_populate_smc_tdp_limits. Fixes: `841686df9f` ("drm/amdgpu: add SI DPM support (v4)") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `966d70f1e1`)	2026-01-21 14:55:33 -05:00
Timur Kristóf	d5077426e1	drm/amd/pm: Don't clear SI SMC table when setting power limit There is no reason to clear the SMC table. We also don't need to recalculate the power limit then. Fixes: `841686df9f` ("drm/amdgpu: add SI DPM support (v4)") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `e214d62625`)	2026-01-21 14:55:33 -05:00
Timur Kristóf	4ca284c6d1	drm/amd/pm: Fix si_dpm mmCG_THERMAL_INT setting Use WREG32 to write mmCG_THERMAL_INT. This is a direct access register. Fixes: `841686df9f` ("drm/amdgpu: add SI DPM support (v4)") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `2555f4e4a7`)	2026-01-21 14:53:51 -05:00
Timur Kristóf	f6cc7f1c11	drm/amd/display: Only poll analog connectors Analog connectors may be hot-plugged unlike other connector types that don't support HPD. Stop DRM from polling other connector types that don't support HPD, such as eDP, LVDS, etc. These were wrongly polled when analog connector support was added, causing issues with the seamless boot process. Fixes: `c4f3f114e7` ("drm/amd/display: Poll analog connectors (v3)") Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reported-by: Matthew Schwartz <matthew.schwartz@linux.dev> Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `e924c7004b`)	2026-01-20 21:53:34 -05:00
Alex Deucher	82a401ceff	drm/amdgpu: fix error handling in ib_schedule() If fence emit fails, free the fence if necessary. Fixes: `db36632ea5` ("drm/amdgpu: clean up and unify hw fence handling") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `5eb680a060`)	2026-01-20 21:53:18 -05:00
Jonathan Kim	b6aff8bb0c	drm/amdkfd: fix gfx11 restrictions on debugging cooperative launch Restrictions on debugging cooperative launch for GFX11 devices should align to CWSR work around requirements. i.e. devices without the need for the work around should not be subject to such restrictions. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: James Zhu <james.zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `230ef3977d`)	2026-01-20 21:50:12 -05:00
Jiqian Chen	8e96b36d9b	drm/amdgpu: free hw_vm_fence when fail in amdgpu_job_alloc If drm_sched_job_init fails, hw_vm_fence is not freed currently, then cause memory leak. Fixes: `db36632ea5` ("drm/amdgpu: clean up and unify hw fence handling") Link: https://lore.kernel.org/amd-gfx/a5a828cb-0e4a-41f0-94c3-df31e5ddad52@amd.com/T/#t Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com> Reviewed-by: Amos Kong <kongjianjun@gmail.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `5d42ee457c`)	2026-01-20 21:50:03 -05:00
Likun Gao	1034325332	drm/amdgpu: remove frame cntl for gfx v12 Remove emit_frame_cntl function for gfx v12, which is not supported. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `5aaa5058de`) Cc: stable@vger.kernel.org	2026-01-20 21:49:25 -05:00
Ivan Lipski	d04f73668b	drm/amd/display: Add an hdmi_hpd_debounce_delay_ms module [Why&How] Right now, the HDMI HPD filter is enabled by default at 1500ms. We want to disable it by default, as most modern displays with HDMI do not require it for DPMS mode. The HPD can instead be enabled as a driver parameter with a custom delay value in ms (up to 5000ms). Fixes: `c918e75e1e` ("drm/amd/display: Add an HPD filter for HDMI") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4859 Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `6a681cd903`)	2026-01-14 15:07:43 -05:00
Srinivasan Shanmugam	b2426a211d	drm/amdgpu/userq: Fix fence reference leak on queue teardown v2 The user mode queue keeps a pointer to the most recent fence in userq->last_fence. This pointer holds an extra dma_fence reference. When the queue is destroyed, we free the fence driver and its xarray, but we forgot to drop the last_fence reference. Because of the missing dma_fence_put(), the last fence object can stay alive when the driver unloads. This leaves an allocated object in the amdgpu_userq_fence slab cache and triggers This is visible during driver unload as: BUG amdgpu_userq_fence: Objects remaining on __kmem_cache_shutdown() kmem_cache_destroy amdgpu_userq_fence: Slab cache still has objects Call Trace: kmem_cache_destroy amdgpu_userq_fence_slab_fini amdgpu_exit __do_sys_delete_module Fix this by putting userq->last_fence and clearing the pointer during amdgpu_userq_fence_driver_free(). This makes sure the fence reference is released and the slab cache is empty when the module exits. v2: Update to only release userq->last_fence with dma_fence_put() (Christian) Fixes: `edc762a51c` ("drm/amdgpu/userq: move some code around") Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `8e051e38a8`)	2026-01-14 15:07:29 -05:00
Harish Kasiviswanathan	18dbcfb46f	drm/amdkfd: No need to suspend whole MES to evict process Each queue of the process is individually removed and there is no need to suspend whole mes. Suspending mes stops kernel mode queues also causing unnecessary timeouts when running mixed work loads Fixes: `079ae5118e` ("drm/amdkfd: fix suspend/resume all calls in mes based eviction path") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4765 Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `3fd20580b9`)	2026-01-14 15:07:05 -05:00
Prike Liang	808c2052f0	Revert "drm/amdgpu: don't attach the tlb fence for SI" This reverts commit `820b3d376e`. It’s better to validate VM TLB flushes in the flush‑TLB backend rather than in the generic VM layer. Reverting this patch depends on commit fa7c231fc2b0 ("drm/amdgpu: validate the flush_gpu_tlb_pasid()") being present in the tree. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `9163fe4d79`)	2026-01-14 15:06:51 -05:00
Prike Liang	0bea77b13b	drm/amdgpu: validate the flush_gpu_tlb_pasid() Validate flush_gpu_tlb_pasid() availability before flushing tlb. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `f4db9913e4`)	2026-01-14 15:06:43 -05:00
Yang Wang	90dbc0bc2a	drm/amd/pm: fix smu overdrive data type wrong issue on smu 14.0.2 resolving the issue of incorrect type definitions potentially causing calculation errors. Fixes: `54f7f3ca98` ("drm/amdgpu/swm14: Update power limit logic") Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `e3a03d0ae1`)	2026-01-14 15:05:52 -05:00
Vivek Das Mohapatra	52d3d115e9	drm/amd/display: Initialise backlight level values from hw Internal backlight levels are initialised from ACPI but the values are sometimes out of sync with the levels in effect until there has been a read from hardware (eg triggered by reading from sysfs). This means that the first drm_commit can cause the levels to be set to a different value than the actual starting one, which results in a sudden change in brightness. This path shows the problem (when the values are out of sync): amdgpu_dm_atomic_commit_tail() -> amdgpu_dm_commit_streams() -> amdgpu_dm_backlight_set_level(..., dm->brightness[n]) This patch calls the backlight ops get_brightness explicitly at the end of backlight registration to make sure dm->brightness[n] is in sync with the actual hardware levels. Fixes: `2fe87f54ab` ("drm/amd/display: Set default brightness according to ACPI") Signed-off-by: Vivek Das Mohapatra <vivek@collabora.com> Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `318b1c36d8`) Cc: stable@vger.kernel.org	2026-01-14 15:04:42 -05:00
Mario Limonciello	fee5007765	drm/amd/display: Bump the HDMI clock to 340MHz [Why] DP-HDMI dongles can exceed bandwidth requirements on high resolution monitors. This can lead to pruning the high resolution modes. HDMI 1.3 bumped the clock to 340MHz, but display code never matched it. [How] Set default to (DVI) 165MHz. Once HDMI display is identified update to 340MHz. Reported-by: Dianne Skoll <dianne@skoll.ca> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4780 Reviewed-by: Chris Park <chris.park@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Matthew Stewart <matthew.stewart2@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `ac1e65d8ad`) Cc: stable@vger.kernel.org	2026-01-14 15:00:39 -05:00
Mario Limonciello (AMD)	0a1253ba50	drm/amd/display: Show link name in PSR status message [Why] The PSR message was moved in commit `4321742c39` ("drm/amd/display: Move PSR support message into amdgpu_dm"). This message however shows for every single link without showing which link is which. This can send a confusing message to the user. [How] Add link name into the message. Fixes: `4321742c39` ("drm/amd/display: Move PSR support message into amdgpu_dm") Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Matthew Stewart <matthew.stewart2@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `99f77f6229`)	2026-01-14 14:59:38 -05:00
Haoxiang Li	80614c5098	drm/amdkfd: fix a memory leak in device_queue_manager_init() If dqm->ops.initialize() fails, add deallocate_hiq_sdma_mqd() to release the memory allocated by allocate_hiq_sdma_mqd(). Move deallocate_hiq_sdma_mqd() up to ensure proper function visibility at the point of use. Fixes: `11614c36bc` ("drm/amdkfd: Allocate MQD trunk for HIQ and SDMA") Signed-off-by: Haoxiang Li <lihaoxiang@isrc.iscas.ac.cn> Signed-off-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Oak Zeng <Oak.Zeng@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `b7cccc8286`) Cc: stable@vger.kernel.org	2026-01-14 14:58:24 -05:00
Alex Deucher	b6dff005fc	drm/amdgpu: make sure userqs are enabled in userq IOCTLs These IOCTLs shouldn't be called when userqs are not enabled. Make sure they are enabled before executing the IOCTLs. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `d967509651`) Cc: stable@vger.kernel.org	2026-01-14 14:57:55 -05:00
Xiaogang Chen	122b15cdbc	drm/amdgpu: Use correct address to setup gart page table for vram access Use dst input parameter to setup gart page table entries instead of using fixed location. Fixes: `237d623ae6` ("drm/amdgpu/gart: Add helper to bind VRAM pages (v2)") Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `ca5d4db8db`)	2026-01-14 14:57:34 -05:00
Peter Colberg	9c81200152	Revert duplicate "drm/amdgpu: disable peer-to-peer access for DCC-enabled GC12 VRAM surfaces" This reverts commit `22a36e660d` once, which was merged twice due to an incorrect backmerge resolution. Fixes: `ce0478b02e` ("Merge tag 'v6.18-rc6' into drm-next") Signed-off-by: Peter Colberg <pcolberg@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `38a0f4cf8c`)	2026-01-14 14:51:36 -05:00
Mario Limonciello (AMD)	28695ca09d	drm/amd: Clean up kfd node on surprise disconnect When an eGPU is unplugged the KFD topology should also be destroyed for that GPU. This never happens because the fini_sw callbacks never get to run. Run them manually before calling amdgpu_device_ip_fini_early() when a device has already been disconnected. This location is intentionally chosen to make sure that the kfd locking refcount doesn't get incremented unintentionally. Cc: kent.russell@amd.com Closes: https://community.frame.work/t/amd-egpu-on-linux/8691/33 Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `6a23e7b433`) Cc: stable@vger.kernel.org	2026-01-14 14:51:36 -05:00
Lu Yao	9cb6278b44	drm/amdgpu: fix drm panic null pointer when driver not support atomic When the driver does not support atomic, use plane->fb rather than plane->state->fb for the framebuffer. Fixes: `fe151ed7af` ("drm/amdgpu: add generic display panic helper code") Signed-off-by: Lu Yao <yaolu@kylinos.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `2f2a72de67`)	2026-01-14 14:51:36 -05:00
Philip Yang	292e5757b2	drm/amdgpu: Fix gfx9 update PTE mtype flag Fix copy&paste error, that should have been an assignment instead of an or, otherwise MTYPE_UC 0x3 can not be updated to MTYPE_RW 0x1. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `fc1366016a`) Cc: stable@vger.kernel.org	2026-01-14 14:51:36 -05:00
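
The "Fix gfx9 update PTE mtype flag" entry above describes a bit-field update where an OR was used in place of an assignment. A minimal sketch of why that matters follows. The field position (`MTYPE_SHIFT`), the helper names, and the flags word are hypothetical and do not reflect the real gfx9 PTE layout; only the values 0x3 (MTYPE_UC) and 0x1 (MTYPE_RW) come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 2-bit memory-type field inside a 64-bit flags word.
 * The shift is an assumption for illustration, not the gfx9 layout. */
#define MTYPE_SHIFT 4
#define MTYPE_MASK  (UINT64_C(0x3) << MTYPE_SHIFT)
#define MTYPE_RW    UINT64_C(0x1) /* value from the commit message */
#define MTYPE_UC    UINT64_C(0x3) /* value from the commit message */

/* Read the memory-type field back out of the flags word. */
static uint64_t get_mtype(uint64_t flags)
{
    return (flags & MTYPE_MASK) >> MTYPE_SHIFT;
}

/* Buggy pattern: OR-ing can only set bits, so an existing 0x3
 * stays 0x3 no matter which value is OR-ed in. */
static uint64_t set_mtype_or(uint64_t flags, uint64_t mtype)
{
    return flags | ((mtype << MTYPE_SHIFT) & MTYPE_MASK);
}

/* Assignment pattern: clear the field first, then write the new value.
 * Bits outside the field are preserved. */
static uint64_t set_mtype_assign(uint64_t flags, uint64_t mtype)
{
    return (flags & ~MTYPE_MASK) | ((mtype << MTYPE_SHIFT) & MTYPE_MASK);
}
```

Starting from a flags word whose field holds MTYPE_UC, `set_mtype_or(flags, MTYPE_RW)` leaves the field at 0x3 because 0x3 | 0x1 is still 0x3, while `set_mtype_assign(flags, MTYPE_RW)` yields 0x1 and keeps the unrelated bits intact.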

35192 Commits