Commit Graph

116970 Commits

Author SHA1 Message Date
Dave Airlie
9a3f210737 Merge tag 'drm-xe-fixes-2025-09-11' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
- Don't touch survivability_mode on fini (Michal)
- Fixes around eviction and suspend (Thomas)
- Extend Wa_13011645652 to PTL-H, WCL (Julia)

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/aMLq7QlaEPHGKXKX@intel.com
2025-09-12 09:44:07 +10:00
Dave Airlie
dab1f85526 Merge tag 'drm-misc-fixes-2025-09-11' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
A maintainer update, an out-of-bound check for panthor and a revert for
nouveau to fix a race.

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <mripard@redhat.com>
Link: https://lore.kernel.org/r/20250911-glistening-uakari-of-serendipity-06ceb1@houat
2025-09-12 09:34:48 +10:00
Dave Airlie
f2c8bbb6e9 Merge tag 'mediatek-drm-fixes-20250910' of https://git.kernel.org/pub/scm/linux/kernel/git/chunkuang.hu/linux into drm-fixes
Mediatek DRM Fixes - 20250910

1. fix potential OF node use-after-free

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Chun-Kuang Hu <chunkuang.hu@kernel.org>
Link: https://lore.kernel.org/r/20250910231813.3526-1-chunkuang.hu@kernel.org
2025-09-12 09:31:27 +10:00
Dave Airlie
1d00adb873 Merge tag 'amd-drm-fixes-6.17-2025-09-10' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.17-2025-09-10:

amdgpu:
- PSP 11.x fix
- DPCD quirk handing fix
- DCN 3.5 PG fix
- Audio suspend fix
- OEM i2c clean up fix
- Module unload memory leak fix
- DC delay fix
- ISP firmware fix
- VCN fixes

amdkfd:
- P2P topology fix
- APU mem limit calculation fix

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250910162855.2507853-1-alexander.deucher@amd.com
2025-09-12 09:24:57 +10:00
Dave Airlie
467360e295 Merge tag 'drm-intel-fixes-2025-09-10' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes
- Fix size for for_each_set_bit() in abox iteration [display] (Jani Nikula)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Tvrtko Ursulin <tursulin@igalia.com>
Link: https://lore.kernel.org/r/aMFUtRdJ46qK-EXl@linux
2025-09-12 09:21:20 +10:00
Dave Airlie
2c38074c36 Merge tag 'drm-rust-fixes-2025-09-05' of https://gitlab.freedesktop.org/drm/rust/kernel into drm-fixes
- Add drm-rust tree to MAINTAINERS
- Require CONFIG_64BIT for Nova

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/aLquN1YvdyI_6PJS@google.com
2025-09-12 09:05:42 +10:00
Johan Hovold
9ba2556cef drm/mediatek: clean up driver data initialisation
The platform and drm devices are only used to look up the drm device and
its driver data respectively when initialising the driver data during
bind().

Drop the reference counts as soon as they have been used to make the
code more readable.

Note that the crtc count is never incremented on lookup failures.

Signed-off-by: Johan Hovold <johan@kernel.org>
Reviewed-by: CK Hu <ck.hu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://patchwork.kernel.org/project/dri-devel/patch/20250829090345.21075-3-johan@kernel.org/
Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
2025-09-10 12:52:59 +00:00
Johan Hovold
4de37a48b6 drm/mediatek: fix potential OF node use-after-free
The for_each_child_of_node() helper drops the reference it takes to each
node as it iterates over children and an explicit of_node_put() is only
needed when exiting the loop early.

Drop the recently introduced bogus additional reference count decrement
at each iteration that could potentially lead to a use-after-free.

Fixes: 1f403699c4 ("drm/mediatek: Fix device/node reference count leaks in mtk_drm_get_all_drm_priv")
Cc: Ma Ke <make24@iscas.ac.cn>
Cc: stable@vger.kernel.org
Signed-off-by: Johan Hovold <johan@kernel.org>
Reviewed-by: CK Hu <ck.hu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://patchwork.kernel.org/project/dri-devel/patch/20250829090345.21075-2-johan@kernel.org/
Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
2025-09-10 12:49:37 +00:00
David Rosca
3318f2d20c drm/amdgpu/vcn: Allow limiting ctx to instance 0 for AV1 at any time
There is no reason to require this to happen on first submitted IB only.
We need to wait for the queue to be idle, but it can be done at any
time (including when there are multiple video sessions active).

Signed-off-by: David Rosca <david.rosca@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 8908fdce06)
Cc: stable@vger.kernel.org
2025-09-09 16:42:26 -04:00
David Rosca
2b10cb58d7 drm/amdgpu/vcn4: Fix IB parsing with multiple engine info packages
There can be multiple engine info packages in one IB and the first one
may be common engine, not decode/encode.
We need to parse the entire IB instead of stopping after finding first
engine info.

Signed-off-by: David Rosca <david.rosca@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit dc8f9f0f45)
Cc: stable@vger.kernel.org
2025-09-09 16:41:49 -04:00
Pratap Nirujogi
857ccfc19f drm/amd/amdgpu: Declare isp firmware binary file
Declare isp firmware file isp_4_1_1.bin required by isp4.1.1 device.

Suggested-by: Alexey Zagorodnikov <xglooom@gmail.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit d97b74a833)
Cc: stable@vger.kernel.org
2025-09-09 16:41:15 -04:00
Alex Deucher
1d66c3f2b8 drm/amd/display: use udelay rather than fsleep
This function can be called from an atomic context so we can't use
fsleep().

Fixes: 01f60348d8 ("drm/amd/display: Fix 'failed to blank crtc!'")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4549
Cc: Wen Chen <Wen.Chen3@amd.com>
Cc: Fangzhi Zuo <jerry.zuo@amd.com>
Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 27e4dc2c05)
2025-09-09 16:39:16 -04:00
Alex Deucher
7838fb5f11 drm/amdgpu: fix a memory leak in fence cleanup when unloading
Commit b61badd20b ("drm/amdgpu: fix usage slab after free")
reordered when amdgpu_fence_driver_sw_fini() was called after
that patch, amdgpu_fence_driver_sw_fini() effectively became
a no-op as the sched entities we never freed because the
ring pointers were already set to NULL.  Remove the NULL
setting.

Reported-by: Lin.Cao <lincao12@amd.com>
Cc: Vitaly Prosyak <vitaly.prosyak@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Fixes: b61badd20b ("drm/amdgpu: fix usage slab after free")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit a525fa37aa)
Cc: stable@vger.kernel.org
2025-09-09 16:38:26 -04:00
Julia Filipchuk
fd99415ec8 drm/xe: Extend Wa_13011645652 to PTL-H, WCL
Expand workaround to additional graphics architectures.

Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: intel-xe@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v6.17+
Signed-off-by: Julia Filipchuk <julia.filipchuk@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250903190122.1028373-2-julia.filipchuk@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 6fc957185e)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-09-09 13:20:36 -04:00
Thomas Hellström
eb5723a751 drm/xe: Block exec and rebind worker while evicting for suspend / hibernate
When the xe pm_notifier evicts for suspend / hibernate, there might be
racing tasks trying to re-validate again. This can lead to suspend taking
excessive time or get stuck in a live-lock. This behaviour becomes
much worse with the fix that actually makes re-validation bring back
bos to VRAM rather than letting them remain in TT.

Prevent that by having exec and the rebind worker waiting for a completion
that is set to block by the pm_notifier before suspend and is signaled
by the pm_notifier after resume / wakeup.

It's probably still possible to craft malicious applications that block
suspending. More work is pending to fix that.

v3:
- Avoid wait_for_completion() in the kernel worker since it could
  potentially cause work item flushes from freezable processes to
  wait forever. Instead terminate the rebind workers if needed and
  re-launch at resume. (Matt Auld)
v4:
- Fix some bad naming and leftover debug printouts.
- Fix kerneldoc.
- Use drmm_mutex_init() for the xe->rebind_resume_lock (Matt Auld).
- Rework the interface of xe_vm_rebind_resume_worker (Matt Auld).

Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4288
Fixes: c6a4d46ec1 ("drm/xe: evict user memory in PM notifier")
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: <stable@vger.kernel.org> # v6.16+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250904160715.2613-4-thomas.hellstrom@linux.intel.com
(cherry picked from commit 599334572a)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-09-09 13:20:31 -04:00
Thomas Hellström
d84820309e drm/xe: Allow the pm notifier to continue on failure
Its actions are opportunistic anyway and will be completed
on device suspend.

Marking as a fix to simplify backporting of the fix
that follows in the series.

v2:
- Keep the runtime pm reference over suspend / hibernate and
  document why. (Matt Auld, Rodrigo Vivi):

Fixes: c6a4d46ec1 ("drm/xe: evict user memory in PM notifier")
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: <stable@vger.kernel.org> # v6.16+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250904160715.2613-3-thomas.hellstrom@linux.intel.com
(cherry picked from commit ebd546fdff)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-09-09 13:20:26 -04:00
Thomas Hellström
5c87fee3c9 drm/xe: Attempt to bring bos back to VRAM after eviction
VRAM+TT bos that are evicted from VRAM to TT may remain in
TT also after a revalidation following eviction or suspend.

This manifests itself as applications becoming sluggish
after buffer objects get evicted or after a resume from
suspend or hibernation.

If the bo supports placement in both VRAM and TT, and
we are on DGFX, mark the TT placement as fallback. This means
that it is tried only after VRAM + eviction.

This flaw has probably been present since the xe module was
upstreamed but use a Fixes: commit below where backporting is
likely to be simple. For earlier versions we need to open-
code the fallback algorithm in the driver.

v2:
- Remove check for dgfx. (Matthew Auld)
- Update the xe_dma_buf kunit test for the new strategy (CI)
- Allow dma-buf to pin in current placement (CI)
- Make xe_bo_validate() for pinned bos a NOP.

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5995
Fixes: a78a8da51b ("drm/ttm: replace busy placement with flags v6")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: <stable@vger.kernel.org> # v6.9+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250904160715.2613-2-thomas.hellstrom@linux.intel.com
(cherry picked from commit cb3d7b3b46)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-09-09 13:20:22 -04:00
Michal Wajdeczko
7934fdc25a drm/xe/configfs: Don't touch survivability_mode on fini
This is a user controlled configfs attribute, we should not
modify that outside the configfs attr.store() implementation.

Fixes: bc417e54e2 ("drm/xe: Enable configfs support for survivability mode")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250904103521.7130-1-michal.wajdeczko@intel.com
(cherry picked from commit 079a5c83db)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-09-09 13:20:17 -04:00
Yifan Zhang
5350355627 amd/amdkfd: correct mem limit calculation for small APUs
Current mem limit check leaks some GTT memory (reserved_for_pt
reserved_for_ras + adev->vram_pin_size) for small APUs.

Since carveout VRAM is tunable on APUs, there are three case
regarding the carveout VRAM size relative to GTT:

1. 0 < carveout < gtt
   apu_prefer_gtt = true, is_app_apu = false

2. carveout > gtt / 2
   apu_prefer_gtt = false, is_app_apu = false

3. 0 = carveout
   apu_prefer_gtt = true, is_app_apu = true

It doesn't make sense to check below limitation in case 1
(default case, small carveout) because the values in the below
expression are mixed with carveout and gtt.

adev->kfd.vram_used[xcp_id] + vram_needed >
    vram_size - reserved_for_pt - reserved_for_ras -
    atomic64_read(&adev->vram_pin_size)

gtt: kfd.vram_used, vram_needed, vram_size
carveout: reserved_for_pt, reserved_for_ras, adev->vram_pin_size

In case 1, vram allocation will go to gtt domain, skip vram check
since ttm_mem_limit check already cover this allocation.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit fa7c99f04f)
2025-09-09 12:28:28 -04:00
Eric Huang
ce42a3b581 drm/amdkfd: fix p2p links bug in topology
When creating p2p links, KFD needs to check XGMI link
with two conditions, hive_id and is_sharing_enabled,
but it is missing to check is_sharing_enabled, so add
it to fix the error.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 36cc7d1317)
2025-09-09 12:28:18 -04:00
Geoffrey McRae
1dfd2864a1 drm/amd/display: remove oem i2c adapter on finish
Fixes a bug where unbinding of the GPU would leave the oem i2c adapter
registered resulting in a null pointer dereference when applications try
to access the invalid device.

Fixes: 3d5470c973 ("drm/amd/display/dm: add support for OEM i2c bus")
Cc: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Geoffrey McRae <geoffrey.mcrae@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 89923fb7ea)
Cc: stable@vger.kernel.org
2025-09-09 12:27:52 -04:00
Mario Limonciello (AMD)
60f71f0db7 drm/amd/display: Drop dm_prepare_suspend() and dm_complete()
[Why]
dm_prepare_suspend() was added in commit 50e0bae34f
("drm/amd/display: Add and use new dm_prepare_suspend() callback")
to allow display to turn off earlier in the suspend sequence.

This caused a regression that HDMI audio sometimes didn't work
properly after resume unless audio was playing during suspend.

[How]
Drop dm_prepare_suspend() callback. All code in it will still run
during dm_suspend(). Also drop unnecessary dm_complete() callback.
dm_complete() was used for failed prepare and also for any case
of successful resume.  The code in it already runs in dm_resume().

This change will introduce more time that the display is turned on
during suspend sequence. The compositor can turn it off sooner if
desired.

Cc: Harry Wentland <harry.wentland@amd.com>
Reported-by: Przemysław Kopa <prz.kopa@gmail.com>
Closes: https://lore.kernel.org/amd-gfx/1cea0d56-7739-4ad9-bf8e-c9330faea2bb@kernel.org/T/#m383d9c08397043a271b36c32b64bb80e524e4b0f
Reported-by: Kalvin <hikaph+oss@gmail.com>
Closes: https://github.com/alsa-project/alsa-lib/issues/465
Closes: https://gitlab.freedesktop.org/pipewire/pipewire/-/issues/4809
Fixes: 50e0bae34f ("drm/amd/display: Add and use new dm_prepare_suspend() callback")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 2fd653b9bb)
Cc: stable@vger.kernel.org
2025-09-09 12:26:28 -04:00
Ovidiu Bunea
70f0b051f8 drm/amd/display: Correct sequences and delays for DCN35 PG & RCG
[why]
The current PG & RCG programming in driver has some gaps and incorrect
sequences.

[how]
Added delays after ungating clocks to allow ramp up, increased polling
to allow more time for power up, and removed the incorrect sequences.

Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Signed-off-by: Ovidiu Bunea <ovidiu.bunea@amd.com>
Signed-off-by: Wayne Lin <wayne.lin@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 1bde5584e2)
Cc: stable@vger.kernel.org
2025-09-09 12:25:22 -04:00
Fangzhi Zuo
f5c32370db drm/amd/display: Disable DPCD Probe Quirk
Disable dpcd probe quirk to native aux.

Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4500
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20250904191351.746707-1-Jerry.Zuo@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit c5f4fb4058)
Cc: stable@vger.kernel.org # 6.16.y: 5281cbe0b5
Cc: stable@vger.kernel.org # 6.16.y: 0b4aa85e89
Cc: stable@vger.kernel.org # 6.16.y: b87ed522b3
Cc: stable@vger.kernel.org # 6.16.y
2025-09-09 12:23:05 -04:00
Jani Nikula
cfa7b76597 drm/i915/power: fix size for for_each_set_bit() in abox iteration
for_each_set_bit() expects size to be in bits, not bytes. The abox mask
iteration uses bytes, but it works by coincidence, because the local
variable holding the mask is unsigned long, and the mask only ever has
bit 2 as the highest bit. Using a smaller type could lead to subtle and
very hard to track bugs.

Fixes: 62afef2811 ("drm/i915/rkl: RKL uses ABOX0 for pixel transfers")
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: stable@vger.kernel.org # v5.9+
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20250905104149.1144751-1-jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
(cherry picked from commit 7ea3baa6ef)
Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
2025-09-09 09:08:37 +01:00
Lijo Lazar
440cec4ca1 drm/amdgpu: Wait for bootloader after PSPv11 reset
Some PSPv11 SOCs take a longer time for PSP based mode-1 reset. Instead
of checking for C2PMSG_33 status, add the callback wait_for_bootloader.
Wait for bootloader to be back to steady state is already part of the
generic mode-1 reset flow. Increase the retry count for bootloader wait
and also fix the mask to prevent fake pass.

Fixes: 8345a71fc5 ("drm/amdgpu: Add more checks to PSP mailbox")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4531
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 32f73741d6)
2025-09-08 11:05:53 -04:00
Dave Airlie
8b556ddeee Merge tag 'amd-drm-fixes-6.17-2025-09-03' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.17-2025-09-03:

amdgpu:
- UserQ fixes
- MES 11 fix
- eDP/LVDS fix
- Fix non-DC audio clean up
- Fix duplicate cursor issue
- Fix error path in PSP init

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250903221656.251254-1-alexander.deucher@amd.com
2025-09-05 08:06:34 +10:00
Chia-I Wu
a00f2015ac drm/panthor: validate group queue count
A panthor group can have at most MAX_CS_PER_CSG panthor queues.

Fixes: 4bdca11507 ("drm/panthor: Add the driver frontend block")
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> # v1
Reviewed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Link: https://lore.kernel.org/r/20250903192133.288477-1-olvaffe@gmail.com
2025-09-04 15:59:23 +01:00
Dave Airlie
40bcf6ecf9 Merge tag 'drm-xe-fixes-2025-09-03' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
- Fix incorrect migration of backed-up object to VRAM (Thomas)

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/aLiP26TiHkYxtBXL@intel.com
2025-09-04 12:52:19 +10:00
Dave Airlie
42e0a73bf7 Merge tag 'drm-misc-fixes-2025-09-03' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
Two nouveau interrupt handling fixes, one race fix for ivpu, a race fix
for drm_sched, and a clock fix for ti-sn65dsi86.

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <mripard@redhat.com>
Link: https://lore.kernel.org/r/qc2rd7bskgufjtyspbjflyjpswcnhyja6s7nm2yb67j7hezyey@yfn2w6n5trff
2025-09-04 12:36:11 +10:00
Philipp Stanner
d506703472 Revert "drm/nouveau: Remove waitque for sched teardown"
This reverts:

commit bead880022 ("drm/nouveau: Remove waitque for sched teardown")
commit 5f46f5c7af ("drm/nouveau: Add new callback for scheduler teardown")

from the drm/sched teardown leak fix series:

https://lore.kernel.org/dri-devel/20250710125412.128476-2-phasta@kernel.org/

The aforementioned series removed a blocking waitqueue from
nouveau_sched_fini(). It was mistakenly assumed that this waitqueue only
prevents jobs from leaking, which the series fixed.

The waitqueue, however, also guarantees that all VM_BIND related jobs
are finished in order, cleaning up mappings in the GPU's MMU. These jobs
must be executed sequentially. Without the waitqueue, this is no longer
guaranteed, because entity and scheduler teardown can race with each
other.

Revert all patches related to the waitqueue removal.

Fixes: bead880022 ("drm/nouveau: Remove waitque for sched teardown")
Suggested-by: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Philipp Stanner <phasta@kernel.org>
Link: https://lore.kernel.org/r/20250901083107.10206-2-phasta@kernel.org
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
2025-09-03 23:16:59 +02:00
Colin Ian King
467e00b30d drm/amd/amdgpu: Fix missing error return on kzalloc failure
Currently the kzalloc failure check just sets reports the failure
and sets the variable ret to -ENOMEM, which is not checked later
for this specific error. Fix this by just returning -ENOMEM rather
than setting ret.

Fixes: 4fb9307154 ("drm/amd/amdgpu: remove redundant host to psp cmd buf allocations")
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 1ee9d1a096)
2025-09-03 16:27:56 -04:00
Michael Walle
bdd5a14e66 drm/bridge: ti-sn65dsi86: fix REFCLK setting
The bridge has three bootstrap pins which are sampled to determine the
frequency of the external reference clock. The driver will also
(over)write that setting. But it seems this is racy after the bridge is
enabled. It was observed that although the driver write the correct
value (by sniffing on the I2C bus), the register has the wrong value.
The datasheet states that the GPIO lines have to be stable for at least
5us after asserting the EN signal. Thus, there seems to be some logic
which samples the GPIO lines and this logic appears to overwrite the
register value which was set by the driver. Waiting 20us after
asserting the EN line resolves this issue.

Fixes: a095f15c00 ("drm/bridge: add support for sn65dsi86 bridge driver")
Signed-off-by: Michael Walle <mwalle@kernel.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20250821122341.1257286-1-mwalle@kernel.org
2025-09-02 09:56:05 -07:00
Thomas Hellström
379b3c983f drm/xe: Fix incorrect migration of backed-up object to VRAM
If an object is backed up to shmem it is incorrectly identified
as not having valid data by the move code. This means moving
to VRAM skips the -EMULTIHOP step and the bo is cleared. This
causes all sorts of weird behaviour on DGFX if an already evicted
object is targeted by the shrinker.

Fix this by using ttm_tt_is_swapped() to identify backed-up
objects.

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5996
Fixes: 00c8efc318 ("drm/xe: Add a shrinker for xe bos")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: <stable@vger.kernel.org> # v6.15+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250828134837.5709-1-thomas.hellstrom@linux.intel.com
(cherry picked from commit 1047bd8279)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-09-02 09:00:47 -04:00
Pierre-Eric Pelloux-Prayer
232674e1a6 drm/sched: Fix racy access to drm_sched_entity.dependency
The drm_sched_job_unschedulable trace point can access
entity->dependency after it was cleared by the callback
installed in drm_sched_entity_add_dependency_cb, causing:

BUG: kernel NULL pointer dereference, address: 0000000000000020
[...]
Workqueue: comp_1.1.0 drm_sched_run_job_work [gpu_sched]
RIP: 0010:trace_event_raw_event_drm_sched_job_unschedulable+0x70/0xd0 [gpu_sched]

To fix this we either need to keep a reference to the fence before
setting up the callbacks, or move the trace_drm_sched_job_unschedulable
calls into drm_sched_entity_add_dependency_cb where they can be
done earlier.

Fixes: 76d97c870f ("drm/sched: Trace dependencies for GPU jobs")

Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Signed-off-by: Philipp Stanner <phasta@kernel.org>
Link: https://lore.kernel.org/r/20250901124032.1955-1-pierre-eric.pelloux-prayer@amd.com
(cherry picked from commit b2b8af21fe)
Signed-off-by: Maxime Ripard <mripard@kernel.org>
2025-09-02 12:58:56 +02:00
Danilo Krummrich
5c5a41a754 gpu: nova-core: depend on CONFIG_64BIT
If built on architectures with CONFIG_ARCH_DMA_ADDR_T_64BIT=y nova-core
produces that following build failures:

    error[E0308]: mismatched types
      --> drivers/gpu/nova-core/fb.rs:49:59
       |
    49 |         hal::fb_hal(chipset).write_sysmem_flush_page(bar, page.dma_handle())?;
       |                              -----------------------      ^^^^^^^^^^^^^^^^^ expected `u64`, found `u32`
       |                              |
       |                              arguments to this method are incorrect
       |
    note: method defined here
      --> drivers/gpu/nova-core/fb/hal.rs:19:8
       |
    19 |     fn write_sysmem_flush_page(&self, bar: &Bar0, addr: u64) -> Result;
       |        ^^^^^^^^^^^^^^^^^^^^^^^
    help: you can convert a `u32` to a `u64`
       |
    49 |         hal::fb_hal(chipset).write_sysmem_flush_page(bar, page.dma_handle().into())?;
       |                                                                            +++++++

    error[E0308]: mismatched types
      --> drivers/gpu/nova-core/fb.rs:65:47
       |
    65 |         if hal.read_sysmem_flush_page(bar) == self.page.dma_handle() {
       |            -------------------------------    ^^^^^^^^^^^^^^^^^^^^^^ expected `u64`, found `u32`
       |            |
       |            expected because this is `u64`
       |
    help: you can convert a `u32` to a `u64`
       |
    65 |         if hal.read_sysmem_flush_page(bar) == self.page.dma_handle().into() {
       |                                                                     +++++++

    error: this arithmetic operation will overflow
       --> drivers/gpu/nova-core/falcon.rs:469:23
        |
    469 |             .set_base((dma_start >> 40) as u16)
        |                       ^^^^^^^^^^^^^^^^^ attempt to shift right by `40_i32`, which would overflow
        |
        = note: `#[deny(arithmetic_overflow)]` on by default

This is due to the code making assumptions on the width of dma_addr_t to
be 64 bit.

While this could technically be handled, it is rather painful to deal
with, as the following example illustrates:

	pub(super) fn read_sysmem_flush_page_ga100(bar: &Bar0) -> DmaAddress {
	    let addr = u64::from(regs::NV_PFB_NISO_FLUSH_SYSMEM_ADDR::read(bar).adr_39_08())
	        << FLUSH_SYSMEM_ADDR_SHIFT
	        | u64::from(regs::NV_PFB_NISO_FLUSH_SYSMEM_ADDR_HI::read(bar).adr_63_40())
	            << FLUSH_SYSMEM_ADDR_SHIFT_HI;

	    addr.try_into().unwrap_or_else(|_| {
	        kernel::warn_on!(true);

	        0
	    })
	}

At the same time there's not much value for nova-core to support 32-bit,
given that the supported GPU architectures are Turing and later, hence
depend on CONFIG_64BIT.

Cc: John Hubbard <jhubbard@nvidia.com>
Reported-by: Miguel Ojeda <ojeda@kernel.org>
Closes: https://lore.kernel.org/lkml/20250828160247.37492-1-ojeda@kernel.org/
Fixes: 6554ad65b5 ("gpu: nova-core: register sysmem flush page")
Fixes: 69f5cd67ce ("gpu: nova-core: add falcon register definitions and base code")
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Link: https://lore.kernel.org/r/20250828223954.351348-1-dakr@kernel.org
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
2025-09-01 11:06:37 +02:00
Faith Ekstrand
2cb66ae604 nouveau: Membar before between semaphore writes and the interrupt
This ensures that the memory write and the interrupt are properly
ordered and we won't wake up the kernel before the semaphore write has
hit memory.

Fixes: b1ca384772 ("drm/nouveau/gv100-: switch to volta semaphore methods")
Cc: stable@vger.kernel.org
Signed-off-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://lore.kernel.org/r/20250829021633.1674524-2-airlied@gmail.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
2025-08-29 18:36:51 +02:00
Dave Airlie
0ef5c4e4db nouveau: fix disabling the nonstall irq due to storm code
Nouveau has code that when it gets an IRQ with no allowed handler
it disables it to avoid storms.

However with nonstall interrupts, we often disable them from
the drm driver, but still request their emission via the push submission.

Just don't disable nonstall irqs ever in normal operation, the
event handling code will filter them out, and the driver will
just enable/disable them at load time.

This fixes timeouts we've been seeing on/off for a long time,
but they became a lot more noticeable on Blackwell.

This doesn't fix all of them, there is a subsequent fence emission
fix to fix the last few.

Fixes: 3ebd64aa3c ("drm/nouveau/intr: support multiple trees, and explicit interfaces")
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://lore.kernel.org/r/20250829021633.1674524-1-airlied@gmail.com
[ Fix a typo and a minor checkpatch.pl warning; remove "v2" from commit
  subject. - Danilo ]
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
2025-08-29 18:36:23 +02:00
Ivan Lipski
3ebf766c35 drm/amd/display: Clear the CUR_ENABLE register on DCN314 w/out DPP PG
[Why&How]
ON DCN314, clearing DPP SW structure without power gating it can cause a
double cursor in full screen with non-native scaling.

A W/A that clears CURSOR0_CONTROL cursor_enable flag if
dcn10_plane_atomic_power_down is called and DPP power gating is disabled.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4168
Reviewed-by: Sun peng (Leo) Li <sunpeng.li@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 645f74f1dc)
Cc: stable@vger.kernel.org
2025-08-29 11:15:08 -04:00
Alex Deucher
71403f58b4 drm/amdgpu: drop hw access in non-DC audio fini
We already disable the audio pins in hw_fini so
there is no need to do it again in sw_fini.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4481
Cc: oushixiong <oushixiong1025@163.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 5eeb16ca72)
Cc: stable@vger.kernel.org
2025-08-29 11:13:38 -04:00
Mario Limonciello
a8b79b0918 drm/amd: Re-enable common modes for eDP and LVDS
[Why]
Although compositors will add their own modes, Xorg won't use it's own
modes and will only stick to modes advertised by the driver. This mean a
user that used to pick 1024x768 could no longer access it unless the
panel's native resolution was 1024x768.

[How]
Revert commit 6d396e7ac1 ("drm/amd/display: Disable common modes for
LVDS") and commit 7948afb46a ("drm/amd/display: Disable common modes
for eDP").

The panel will still use scaling for any non-native modes due to
commit 978fa2f6d0 ("drm/amd/display: Use scaling for non-native
resolutions on eDP")

Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4538
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250828140856.2887993-1-superm1@kernel.org
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit c2fbf72fe3)
2025-08-29 11:13:04 -04:00
Alex Deucher
5171848bdf drm/amdgpu/mes11: make MES_MISC_OP_CHANGE_CONFIG failure non-fatal
If the firmware is too old, just warn and return success.

Fixes: 27b7915147 ("drm/amdgpu/mes: keep enforce isolation up to date")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4414
Cc: shaoyun.Liu@amd.com
Reviewed-by: Shaoyun.liu <Shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 9f28af76fa)
Cc: stable@vger.kernel.org
2025-08-29 11:12:44 -04:00
Jesse.Zhang
2d41a4bfee drm/amdgpu/sdma: bump firmware version checks for user queue support
Using the previous firmware could lead to problems with
PROTECTED_FENCE_SIGNAL commands, specifically causing register
conflicts between MCU_DBG0 and MCU_DBG1.

The updated firmware versions ensure proper alignment
and unification of the SDMA_SUBOP_PROTECTED_FENCE_SIGNAL value with SDMA 7.x,
resolving these hardware coordination issues

Fixes: e8cca30d8b ("drm/amdgpu/sdma6: add ucode version checks for userq support")
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit aab8b689ad)
Cc: stable@vger.kernel.org
2025-08-29 11:11:45 -04:00
Dave Airlie
42d2abbfa8 Merge tag 'mediatek-drm-fixes-20250829' of https://git.kernel.org/pub/scm/linux/kernel/git/chunkuang.hu/linux into drm-fixes
Mediatek DRM Fixes - 20250829

1. Add error handling for old state CRTC in atomic_disable
2. Fix DSI host and panel bridge pre-enable order
3. Fix device/node reference count leaks in mtk_drm_get_all_drm_priv
4. mtk_hdmi: Fix inverted parameters in some regmap_update_bits calls

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Chun-Kuang Hu <chunkuang.hu@kernel.org>
Link: https://lore.kernel.org/r/20250828234116.4960-1-chunkuang.hu@kernel.org
2025-08-29 10:05:10 +10:00
Louis-Alexis Eyraud
c34414883f drm/mediatek: mtk_hdmi: Fix inverted parameters in some regmap_update_bits calls
In mtk_hdmi driver, a recent change replaced custom register access
function calls by regmap ones, but two replacements by regmap_update_bits
were done incorrectly, because original offset and mask parameters were
inverted, so fix them.

Fixes: d6e25b3590 ("drm/mediatek: hdmi: Use regmap instead of iomem for main registers")
Signed-off-by: Louis-Alexis Eyraud <louisalexis.eyraud@collabora.com>
Reviewed-by: CK Hu <ck.hu@mediatek.com>
Link: https://patchwork.kernel.org/project/dri-devel/patch/20250818-mt8173-fix-hdmi-issue-v1-1-55aff9b0295d@collabora.com/
Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
2025-08-28 23:15:41 +00:00
Dave Airlie
49862587fa Merge tag 'drm-msm-fixes-2025-08-26' of https://gitlab.freedesktop.org/drm/msm into drm-fixes
Fixes for v6.17-rc4

Core/GPU:
- fix comment doc warning in gpuvm
- fix build with KMS disabled
- fix pgtable setup/teardown race
- global fault counter fix
- various error path fixes
- GPU devcoredump snapshot fixes
- handle in-place VM_BIND remaps to solve turnip vm update race
- skip re-emitting IBs for unusable VMs
- Don't use %pK through printk
- moved display snapshot init earlier, fixing a crash

DPU:
- Fixed crash in virtual plane checking code
- Fixed mode comparison in virtual plane checking code

DSI:
- Adjusted width of resulution-related registers
- Fixed locking issue on 14nm PLLs

UBWC (per Bjorn's ack)
- Added UBWC configuration for several missing platforms (fixing
  regression)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rob Clark <rob.clark@oss.qualcomm.com>
Link: https://lore.kernel.org/r/CACSVV02+u1VW1dzuz6JWwVEfpgTj6Y-JXMH+vX43KsKTVsW+Yg@mail.gmail.com
2025-08-29 09:05:18 +10:00
Dave Airlie
4b1c24ef50 Merge tag 'amd-drm-fixes-6.17-2025-08-28' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.17-2025-08-28:

amdgpu:
- UserQ fixes
- Revert CSA fix
- SR-IOV fix

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250828173904.75850-1-alexander.deucher@amd.com
2025-08-29 08:50:44 +10:00
Dave Airlie
60d98e1a8d Merge tag 'drm-misc-fixes-2025-08-28' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
Several nouveau fixes to remove unused code, fix an error path and be
less restrictive with the formats it accepts. A fix for amdgpu to pin
vmapped dma-buf, and a revert for tegra for a regression in the dma-buf
/ GEM code.

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <mripard@redhat.com>
Link: https://lore.kernel.org/r/20250828-hypersonic-colorful-squirrel-64f04b@houat
2025-08-29 08:44:53 +10:00
Alex Deucher
c767d74a9c drm/amdgpu/userq: fix error handling of invalid doorbell
If the doorbell is invalid, be sure to set the r to an error
state so the function returns an error.

Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 7e2a5b0a9a)
Cc: stable@vger.kernel.org
2025-08-27 14:01:52 -04:00
Jesse.Zhang
ee38ea0ae4 drm/amdgpu: update firmware version checks for user queue support
The minimum firmware versions required for user queue functionality
have been increased to address an issue where the queue privilege
state was lost during queue connect operations.

The problem occurred because the privilege state was being restored
to its initial value at the beginning of the function, overwriting
the state that was properly set during the queue connect case.

This commit updates the minimum version requirements:
- ME firmware from 2390 to 2420
- PFP firmware from 2530 to 2580
- MEC firmware from 2600 to 2650
- MES firmware remains at 120

These updated firmware versions contain the necessary fixes to
properly maintain queue privilege state throughout connect operations.

Fixes: 61ca97e959 ("drm/amdgpu: Add fw minimum version check for usermode queue")
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 5f976c9939)
Cc: stable@vger.kernel.org
2025-08-27 14:01:32 -04:00