Commit Graph

115300 Commits

Author SHA1 Message Date
Dave Airlie
4d33ed640f Merge tag 'drm-xe-fixes-2025-07-17' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
Driver Changes:
- SR-IOV fixes for GT reset and TLB invalidation
- Fix memory copy direction during migration
- Fix alignment check on migration
- Fix MOCS and page fault init order to correctly
  account for topology

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/6jworkgupwstm4v7aohbuzod3dyz4u7pyfhshr5ifgf2xisgj3@cm5em5yupjiu
2025-07-18 14:04:06 +10:00
Dave Airlie
4399e3d84d Merge tag 'mediatek-drm-fixes-20250718' of https://git.kernel.org/pub/scm/linux/kernel/git/chunkuang.hu/linux into drm-fixes
Mediatek DRM Fixes - 20250718

1. Add wait_event_timeout when disabling plane
2. only announce AFBC if really supported
3. mtk_dpi: Reorder output formats on MT8195/88

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Chun-Kuang Hu <chunkuang.hu@kernel.org>
Link: https://lore.kernel.org/r/20250717232916.12372-1-chunkuang.hu@kernel.org
2025-07-18 12:03:00 +10:00
Dave Airlie
8d2ad05666 Merge tag 'amd-drm-fixes-6.16-2025-07-17' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.16-2025-07-17:

amdgpu:
- Fix a DC memory leak
- DCN 4.0.1 degamma LUT fix
- Fix reset counter handling for soft recovery
- GC 8 fix

radeon:
- Drop console locks when suspending/resuming

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250717171935.642380-1-alexander.deucher@amd.com
2025-07-18 11:46:16 +10:00
Dave Airlie
fbefd8adda Merge tag 'drm-intel-fixes-2025-07-17' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes
- DP AUX DPCD address fix (Imre)

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/aHkQmRhelb4Fzqau@intel.com
2025-07-18 09:59:58 +10:00
Dave Airlie
cbc3fa8288 Merge tag 'drm-misc-fixes-2025-07-16' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
drm-misc-fixes for v6.16 final?:
- nouveau ioctl validation fix.
- panfrost scheduler bug.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://lore.kernel.org/r/ee784a3a-30b4-489a-8503-b1be3b09268c@linux.intel.com
2025-07-18 09:42:22 +10:00
Louis-Alexis Eyraud
5ceed7a6d3 drm/mediatek: mtk_dpi: Reorder output formats on MT8195/88
Reorder output format arrays in both MT8195 DPI and DP_INTF block
configuration by decreasing preference order instead of alphanumeric
one, as expected by the atomic_get_output_bus_fmts callback function
of drm_bridge controls, so the RGB ones are used first during the
bus format negotiation process.

Fixes: 20fa6a8fc5 ("drm/mediatek: mtk_dpi: Allow additional output formats on MT8195/88")
Signed-off-by: Louis-Alexis Eyraud <louisalexis.eyraud@collabora.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: CK Hu <ck.hu@mediatek.com>
Link: https://patchwork.kernel.org/project/linux-mediatek/patch/20250606-mtk_dpi-mt8195-fix-wrong-color-v1-1-47988101b798@collabora.com/
Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
2025-07-17 23:19:05 +00:00
Icenowy Zheng
8d121a82fa drm/mediatek: only announce AFBC if really supported
Currently even the SoC's OVL does not declare the support of AFBC, AFBC
is still announced to the userspace within the IN_FORMATS blob, which
breaks modern Wayland compositors like KWin Wayland and others.

Gate passing modifiers to drm_universal_plane_init() behind querying the
driver of the hardware block for AFBC support.

Fixes: c410fa9b07 ("drm/mediatek: Add AFBC support to Mediatek DRM driver")
Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
Reviewed-by: CK Hu <ck.hu@medaitek.com>
Link: https://patchwork.kernel.org/project/linux-mediatek/patch/20250531121140.387661-1-uwu@icenowy.me/
Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
2025-07-17 23:19:05 +00:00
Jason-JH Lin
d208261e9f drm/mediatek: Add wait_event_timeout when disabling plane
Our hardware registers are set through GCE, not by the CPU.
DRM might assume the hardware is disabled immediately after calling
atomic_disable() of drm_plane, but it is only truly disabled after the
GCE IRQ is triggered.

Additionally, the cursor plane in DRM uses async_commit, so DRM will
not wait for vblank and will free the buffer immediately after calling
atomic_disable().

To prevent the framebuffer from being freed before the layer disable
settings are configured into the hardware, which can cause an IOMMU
fault error, a wait_event_timeout has been added to wait for the
ddp_cmdq_cb() callback,indicating that the GCE IRQ has been triggered.

Fixes: 2f965be7f9 ("drm/mediatek: apply CMDQ control flow")
Signed-off-by: Jason-JH Lin <jason-jh.lin@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: CK Hu <ck.hu@mediatek.com>
Link: https://patchwork.kernel.org/project/linux-mediatek/patch/20250624113223.443274-1-jason-jh.lin@mediatek.com/
Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
2025-07-17 23:18:53 +00:00
Michal Wajdeczko
5c244eeca5 drm/xe/pf: Resend PF provisioning after GT reset
If we reload the GuC due to suspend/resume or GT reset then we
have to resend not only any VFs provisioning data, but also PF
configuration, like scheduling parameters (EQ, PT), as otherwise
GuC will continue to use default values.

Fixes: 411220808c ("drm/xe/pf: Restart VFs provisioning after GT reset")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20250711193316.1920-3-michal.wajdeczko@intel.com
(cherry picked from commit 1c38dd6afa)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-17 09:51:51 -03:00
Michal Wajdeczko
81dccec448 drm/xe/pf: Prepare to stop SR-IOV support prior GT reset
As part of the resume or GT reset, the PF driver schedules work
which is then used to complete restarting of the SR-IOV support,
including resending to the GuC configurations of provisioned VFs.

However, in case of short delay between those two actions, which
could be seen by triggering a GT reset on the suspened device:

 $ echo 1 > /sys/kernel/debug/dri/0000:00:02.0/gt0/force_reset

this PF worker might be still busy, which lead to errors due to
just stopped or disabled GuC CTB communication:

 [ ] xe 0000:00:02.0: [drm:xe_gt_resume [xe]] GT0: resumed
 [ ] xe 0000:00:02.0: [drm] GT0: trying reset from force_reset_show [xe]
 [ ] xe 0000:00:02.0: [drm] GT0: reset queued
 [ ] xe 0000:00:02.0: [drm] GT0: reset started
 [ ] xe 0000:00:02.0: [drm:guc_ct_change_state [xe]] GT0: GuC CT communication channel stopped
 [ ] xe 0000:00:02.0: [drm:guc_ct_send_recv [xe]] GT0: H2G request 0x5503 canceled!
 [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF1 12 config KLVs (-ECANCELED)
 [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF1 configuration (-ECANCELED)
 [ ] xe 0000:00:02.0: [drm:guc_ct_change_state [xe]] GT0: GuC CT communication channel disabled
 [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF2 12 config KLVs (-ENODEV)
 [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF2 configuration (-ENODEV)
 [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push 2 of 2 VFs configurations
 [ ] xe 0000:00:02.0: [drm:pf_worker_restart_func [xe]] GT0: PF: restart completed

While this VFs reprovisioning will be successful during next spin
of the worker, to avoid those errors, make sure to cancel restart
worker if we are about to trigger next reset.

Fixes: 411220808c ("drm/xe/pf: Restart VFs provisioning after GT reset")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20250711193316.1920-2-michal.wajdeczko@intel.com
(cherry picked from commit 9f50b729dd)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-17 09:51:51 -03:00
Lucas De Marchi
057a7d66f9 drm/xe/migrate: Fix alignment check
The check would fail if the address is unaligned, but not when
accounting the offset. Instead of `buf | offset` it should have
been `buf + offset`. To make it more readable and also drop the
uintptr_t, just use the IS_ALIGNED() macro.

Fixes: 270172f64b ("drm/xe: Update xe_ttm_access_memory to use GPU for non-visible access")
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250710-migrate-aligned-v1-1-44003ef3c078@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 81e139db69)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-17 09:51:51 -03:00
Matthew Brost
3155ac8925 drm/xe: Move page fault init after topology init
We need the topology to determine GT page fault queue size, move page
fault init after topology init.

Cc: stable@vger.kernel.org
Fixes: 3338e4f90c ("drm/xe: Use topology to determine page fault queue size")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://lore.kernel.org/r/20250710191208.1040215-1-matthew.brost@intel.com
(cherry picked from commit beb72acb5b)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-17 09:51:27 -03:00
Balasubramani Vivekanandan
2a58b21ade drm/xe/mocs: Initialize MOCS index early
MOCS uc_index is used even before it is initialized in the following
callstack
    guc_prepare_xfer()
    __xe_guc_upload()
    xe_guc_min_load_for_hwconfig()
    xe_uc_init_hwconfig()
    xe_gt_init_hwconfig()

Do MOCS index initialization earlier in the device probe.

Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Reviewed-by: Ravi Kumar Vodapalli <ravi.kumar.vodapalli@intel.com>
Link: https://lore.kernel.org/r/20250520142445.2792824-1-balasubramani.vivekanandan@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
(cherry picked from commit 241cc827c0)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-17 09:51:12 -03:00
Matthew Auld
1381c231c1 drm/xe/migrate: fix copy direction in access_memory
After we do the modification on the host side, ensure we write the
result back to VRAM and not the other way around, otherwise the
modification will be lost if treated like a read.

Fixes: 270172f64b ("drm/xe: Update xe_ttm_access_memory to use GPU for non-visible access")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250710134128.800756-2-matthew.auld@intel.com
(cherry picked from commit c12fe703ca)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-17 09:45:19 -03:00
Tejas Upadhyay
fd25fa90ed drm/xe: Dont skip TLB invalidations on VF
Skipping TLB invalidations on VF causing unrecoverable
faults. Probable reason for skipping TLB invalidations
on SRIOV could be lack of support for instruction
MI_FLUSH_DW_STORE_INDEX. Add back TLB flush with some
additional handling.

Helps in resolving,
[  704.913454] xe 0000:00:02.1: [drm:pf_queue_work_func [xe]]
                ASID: 0
                VFID: 0
                PDATA: 0x0d92
                Faulted Address: 0x0000000002fa0000
                FaultType: 0
                AccessType: 1
                FaultLevel: 0
                EngineClass: 3 bcs
                EngineInstance: 8
[  704.913551] xe 0000:00:02.1: [drm:pf_queue_work_func [xe]] Fault response: Unsuccessful -22

V2:
 - Use Xmas tree (MichalW)

Suggested-by: Matthew Brost <matthew.brost@intel.com>
Fixes: 97515d0b3e ("drm/xe/vf: Don't emit access to Global HWSP if VF")
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250710045945.1023840-1-tejas.upadhyay@intel.com
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
(cherry picked from commit b528e896fa)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-17 09:45:19 -03:00
Eeli Haapalainen
8326193401 drm/amdgpu/gfx8: reset compute ring wptr on the GPU on resume
Commit 42cdf6f687 ("drm/amdgpu/gfx8: always restore kcq MQDs") made the
ring pointer always to be reset on resume from suspend. This caused compute
rings to fail since the reset was done without also resetting it for the
firmware. Reset wptr on the GPU to avoid a disconnect between the driver
and firmware wptr.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3911
Fixes: 42cdf6f687 ("drm/amdgpu/gfx8: always restore kcq MQDs")
Signed-off-by: Eeli Haapalainen <eeli.haapalainen@protonmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 2becafc319)
Cc: stable@vger.kernel.org
2025-07-16 16:50:45 -04:00
Lijo Lazar
86790e300d drm/amdgpu: Increase reset counter only on success
Increment the reset counter only if soft recovery succeeded. This is
consistent with a ring hard reset behaviour where counter gets
incremented only if hard reset succeeded.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 25c314aa3e)
Cc: stable@vger.kernel.org
2025-07-16 16:49:04 -04:00
Thomas Zimmermann
4c1a0fc2ae drm/radeon: Do not hold console lock during resume
The function radeon_resume_kms() acquires the console lock. It is
inconsistent, as it depends on the notify_client argument. That
lock then covers a number of suspend operations that are unrelated
to the console.

Remove the calls to console_lock() and console_unlock() from the
radeon function. The console lock is only required by DRM's fbdev
emulation, which acquires it as necessary.

Also fixes a possible circular dependency between the console lock
and the client-list mutex, where the mutex is supposed to be taken
first.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit fff8e05044)
2025-07-16 16:48:34 -04:00
Thomas Zimmermann
5dd0b96118 drm/radeon: Do not hold console lock while suspending clients
The radeon driver holds the console lock while suspending in-kernel
DRM clients. This creates a circular dependency with the client-list
mutex, which is supposed to be acquired first. Reported when combining
radeon with another DRM driver.

Therefore, do not take the console lock in radeon, but let the fbdev
DRM client acquire the lock when needed. This is what all other DRM
drivers so.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reported-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
Closes: https://lore.kernel.org/dri-devel/0a087cfd-bd4c-48f1-aa2f-4a3b12593935@oss.qualcomm.com/
Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 612ec7c69d)
2025-07-16 16:48:24 -04:00
Melissa Wen
97a0f2b5f4 drm/amd/display: Disable CRTC degamma LUT for DCN401
In DCN401 pre-blending degamma LUT isn't affecting cursor as in previous
DCN version. As this is not the behavior close to what is expected for
CRTC degamma LUT, disable CRTC degamma LUT property in this HW.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/4176
---

When enabling HDR on KDE, it takes the first CRTC 1D LUT available and
apply a color transformation (Gamma 2.2 -> PQ). AMD driver usually
advertises a CRTC degamma LUT as the first CRTC 1D LUT, but it's
actually applied pre-blending. In previous HW version, it seems to work
fine because the 1D LUT was applied to cursor too, but DCN401 presents a
different behavior and the 1D LUT isn't affecting the hardware cursor.

To address the wrong gamma on cursor with HDR (see the link), I came up
with this patch that disables CRTC degamma LUT in this hw, since it
presents a different behavior than others. With this KDE sees CRTC
regamma LUT as the first post-blending 1D LUT available. This is
actually more consistent with AMD color pipeline. It was tested by the
reporter, since I don't have the HW available for local testing and
debugging.

Melissa
---

Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 340231cdce)
Cc: stable@vger.kernel.org
2025-07-16 16:47:43 -04:00
Clayton King
b2ee9fa0fe drm/amd/display: Free memory allocation
[WHY]

Free memory to avoid memory leak

Reviewed-by: Joshua Aberback <joshua.aberback@amd.com>
Signed-off-by: Clayton King <clayton.king@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit fa699acb8e)
Cc: stable@vger.kernel.org
2025-07-16 16:46:50 -04:00
Imre Deak
d34d6feaf4 drm/dp: Change AUX DPCD probe address from LANE0_1_STATUS to TRAINING_PATTERN_SET
Commit a3ef3c2da6 ("drm/dp: Change AUX DPCD probe address from
DPCD_REV to LANE0_1_STATUS") stopped using the DPCD_REV register for
DPCD probing, since this results in link training failures at least when
using an Intel Barlow Ridge TBT hub at UHBR link rates (the
DP_INTRA_HOP_AUX_REPLY_INDICATION never getting cleared after the failed
link training). Since accessing DPCD_REV during link training is
prohibited by the DP Standard, LANE0_1_STATUS (0x202) was used instead,
as it falls within the Standard's valid register address range
(0x102-0x106, 0x202-0x207, 0x200c-0x200f, 0x2216) and it fixed the link
training on the above TBT hub.

However, reading the LANE0_1_STATUS register also has a side-effect at
least on a Novatek eDP panel, as reported on the Closes: link below,
resulting in screen flickering on that panel. One clear side-effect when
doing the 1-byte probe reads from LANE0_1_STATUS during link training
before reading out the full 6 byte link status starting at the same
address is that the panel will report the link training as completed
with voltage swing 0. This is different from the normal, flicker-free
scenario when no DPCD probing is done, the panel reporting the link
training complete with voltage swing 2.

Using the TRAINING_PATTERN_SET register for DPCD probing doesn't have
the above side-effect, the panel will link train with voltage swing 2 as
expected and it will stay flicker-free. This register is also in the
above valid register range and is unlikely to have a side-effect as that
of LANE0_1_STATUS: Reading LANE0_1_STATUS is part of the link training
CR/EQ sequences and so it may cause a state change in the sink - even if
inadvertently as I suspect in the case of the above Novatek panel. As
opposed to this, reading TRAINING_PATTERN_SET is not part of the link
training sequence (it must be only written once at the beginning of the
CR/EQ sequences), so it's unlikely to cause any state change in the
sink.

As a side-note, this Novatek panel also lacks support for TPS3, while
claiming support for HBR2, which violates the DP Standard (the Standard
mandating TPS3 for HBR2).

Besides the Novatek panel (PSR 1), which this change fixes, I also
verified the change on a Samsung (PSR 1) and an Analogix (PSR 2) eDP
panel as well as on the Intel Barlow Ridge TBT hub.

Note that in the drm-tip tree (targeting the v6.17 kernel version) the
i915 and xe drivers keep DPCD probing enabled only for the panel known
to require this (HP ZR24w), hence those drivers in drm-tip are not
affected by the above problem.

Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Fixes: a3ef3c2da6 ("drm/dp: Change AUX DPCD probe address from DPCD_REV to LANE0_1_STATUS")
Reported-and-tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/14558
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://lore.kernel.org/r/20250708212331.112898-1-imre.deak@intel.com
(cherry picked from commit bba9aa4165)
[Imre: Rebased on drm-intel-fixes]
Signed-off-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Rodrigo: Changed original commit hash to match with the one propagated in fixes]
2025-07-15 09:37:57 -04:00
Philipp Stanner
cb345f954e drm/panfrost: Fix scheduler workqueue bug
When the GPU scheduler was ported to using a struct for its
initialization parameters, it was overlooked that panfrost creates a
distinct workqueue for timeout handling.

The pointer to this new workqueue is not initialized to the struct,
resulting in NULL being passed to the scheduler, which then uses the
system_wq for timeout handling.

Set the correct workqueue to the init args struct.

Cc: stable@vger.kernel.org # 6.15+
Fixes: 796a9f55a8 ("drm/sched: Use struct for drm_sched_init() params")
Reported-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Closes: https://lore.kernel.org/dri-devel/b5d0921c-7cbf-4d55-aa47-c35cd7861c02@igalia.com/
Signed-off-by: Philipp Stanner <phasta@kernel.org>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Link: https://lore.kernel.org/r/20250709102957.100849-2-phasta@kernel.org
2025-07-14 16:49:10 +01:00
Arnd Bergmann
e5478166df drm/nouveau: check ioctl command codes better
nouveau_drm_ioctl() only checks the _IOC_NR() bits in the
DRM_NOUVEAU_NVIF command, but ignores the type and direction bits, so any
command with '7' in the low eight bits gets passed into
nouveau_abi16_ioctl() instead of drm_ioctl().

Check for all the bits except the size that is handled inside of the
handler.

Fixes: 27111a23d0 ("drm/nouveau: expose the full object/event interfaces to userspace")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
[ Fix up two checkpatch warnings and a typo. - Danilo ]
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://lore.kernel.org/r/20250711072458.2665325-1-arnd@kernel.org
2025-07-11 20:04:32 +02:00
Simona Vetter
b7dc79a633 Merge tag 'drm-misc-fixes-2025-07-10' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
drm-misc-fixes for v6.16-rc6 or final:
- Fix nouveau fail on debugfs errors.
- Magic 50 ms to fix nouveau suspend.
- Call rust destructor on drm device release.
- Fix DMA api error handling in tegra/nvdec.
- Fix PVR device reset.
- Habanalabs maintainer update.
- Small memory leak fix when nouveau acpi init fails.
- Do not attempt to bind to any PCI device with AGP capability.
- Make FB's acquire handles on backing object, same as i915/xe already does.
- Fix race in drm_gem_handle_create_tail.

Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/e522cdc7-1787-48f2-97e5-0f94783970ab@linux.intel.com
2025-07-11 14:11:19 +02:00
Simona Vetter
14e85fabee Merge tag 'drm-xe-fixes-2025-07-11' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
Driver Changes:
- Clear LMTT page to avoid leaking data from one VF to another
- Align PF queue size to power of 2
- Disable Indirect Ring State to avoid intermittent issues on context
  switch: feature is not currently needed, so can be disabled for now.
- Fix compression handling when the BO pages are very fragmented
- Restore display pm on error path
- Fix runtime pm handling in xe devcoredump
- Fix xe_pm_set_vram_threshold() doc
- Recommend new minor versions of GuC firmware
- Drop some workarounds on VF
- Do not use verbose GuC logging by default: it should be only for
  debugging

Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch>
From: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/s6jyd24mimbzb4vxtgc5vupvbyqplfep2c6eupue7znnlbhuxy@lmvzexfzhrnn
2025-07-11 11:35:39 +02:00
Lucas De Marchi
74806f69b8 drm/xe/guc: Default log level to non-verbose
Currently xe sets the guc log level to a verbose level since it's useful
to debug hangs and general development. However the verbose level may
already be too much and affect performance.

Michal Mrozek did some tests with the L0 compute stack for submission
latency with ULLS disabled. Below are the normalized numbers with log
level 3 (the current default) as baseline for each test:

                          Test \ Log Level                        3      0      1      2
 ----------------------------------------------------------- ------ ------ ------ ------
  BestWalkerNthCommandListSubmission(CmdListCount=2)           1.00   0.63   0.63   0.96
  BestWalkerNthSubmission(KernelCount=2)                       1.00   0.62   0.63   0.96
  BestWalkerNthSubmissionImmediate(KernelCount=2)              1.00   0.58   0.58   0.85
  BestWalkerSubmission                                         1.00   0.62   0.62   0.96
  BestWalkerSubmissionImmediate                                1.00   0.63   0.62   0.96
  BestWalkerSubmissionImmediateMultiCmdlists(cmdlistCount=2)   1.00   0.58   0.58   0.86
  BestWalkerSubmissionImmediateMultiCmdlists(cmdlistCount=4)   1.00   0.70   0.70   0.83
  BestWalkerSubmissionImmediateMultiCmdlists(cmdlistCount=8)   1.00   0.53   0.52   0.78

Log level 2 is the first "verbose level" for GuC, where the biggest
difference happens. Keep log level 3 for CONFIG_DRM_XE_DEBUG, but switch
to 1, i.e.  GUC_LOG_LEVEL_NON_VERBOSE, for "normal" builds.

Cc: Michal Mrozek <michal.mrozek@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://lore.kernel.org/r/20250613-guc-log-level-v2-1-cb84a63e49fe@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit a37128ba61)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-10 20:59:39 -07:00
Michal Wajdeczko
7a10175a42 drm/xe/bmg: Don't use WA 16023588340 and 22019338487 on VF
These workarounds are not applicable for use by the VFs.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Tested-by: Jakub Kolakowski <jakub1.kolakowski@intel.com>
Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Signed-off-by: Jakub Kolakowski <jakub1.kolakowski@intel.com>
Link: https://lore.kernel.org/r/20250710103040.375610-2-jakub1.kolakowski@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 1d2e2503e5)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-10 20:59:39 -07:00
Julia Filipchuk
8c01880509 drm/xe/guc: Recommend GuC v70.46.2 for BMG, LNL, DG2
UAPI compatibility version 1.22.2

Resolves various bugs. Recommend newer version.

Signed-off-by: Julia Filipchuk <julia.filipchuk@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250626182805.1701096-13-daniele.ceraolospurio@intel.com
(cherry picked from commit 0b64addcae)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-10 20:59:39 -07:00
Shuicheng Lin
0539c5eaf8 drm/xe/pm: Correct comment of xe_pm_set_vram_threshold()
The parameter threshold is with size in MiB, not in bits.
Correct it to avoid any confusion.

v2: s/mb/MiB, s/vram/VRAM, fix return section. (Michal)

Fixes: 30c399529f ("drm/xe: Document Xe PM component")
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Link: https://lore.kernel.org/r/20250708021450.3602087-2-shuicheng.lin@intel.com
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 0efec05001)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-10 20:59:38 -07:00
Shuicheng Lin
253a174c06 drm/xe: Release runtime pm for error path of xe_devcoredump_read()
xe_pm_runtime_put() is missed to be called for the error path in
xe_devcoredump_read().
Add function description comments for xe_devcoredump_read() to help
understand it.

v2: more detail function comments and refine goto logic (Matt)

Fixes: c4a2e5f865 ("drm/xe: Add devcoredump chunking")
Cc: stable@vger.kernel.org
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250707004911.3502904-6-shuicheng.lin@intel.com
(cherry picked from commit 017ef1228d)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-10 20:59:38 -07:00
Shuicheng Lin
6d33df611a drm/xe/pm: Restore display pm if there is error after display suspend
xe_bo_evict_all() is called after xe_display_pm_suspend(). So if there
is error with xe_bo_evict_all(), display pm should be restored.

Fixes: 51462211f4 ("drm/xe/pxp: add PXP PM support")
Fixes: cb8f81c175 ("drm/xe/display: Make display suspend/resume work on discrete")
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://lore.kernel.org/r/20250708035424.3608190-2-shuicheng.lin@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 83dcee1785)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-10 20:59:38 -07:00
Hans de Goede
e778689390 drm/i915/bios: Apply vlv_fixup_mipi_sequences() to v2 mipi-sequences too
It turns out that the fixup from vlv_fixup_mipi_sequences() is necessary
for some DSI panel's with version 2 mipi-sequences too.

Specifically the Acer Iconia One 8 A1-840 (not to be confused with the
A1-840FHD which is different) has the following sequences:

BDB block 53 (1284 bytes) - MIPI sequence block:
	Sequence block version v2
	Panel 0 *

Sequence 2 - MIPI_SEQ_INIT_OTP
	GPIO index 9, source 0, set 0 (0x00)
	Delay: 50000 us
	GPIO index 9, source 0, set 1 (0x01)
	Delay: 6000 us
	GPIO index 9, source 0, set 0 (0x00)
	Delay: 6000 us
	GPIO index 9, source 0, set 1 (0x01)
	Delay: 25000 us
	Send DCS: Port A, VC 0, LP, Type 39, Length 5, Data ff aa 55 a5 80
	Send DCS: Port A, VC 0, LP, Type 39, Length 3, Data 6f 11 00
	...
	Send DCS: Port A, VC 0, LP, Type 05, Length 1, Data 29
	Delay: 120000 us

Sequence 4 - MIPI_SEQ_DISPLAY_OFF
	Send DCS: Port A, VC 0, LP, Type 05, Length 1, Data 28
	Delay: 105000 us
	Send DCS: Port A, VC 0, LP, Type 05, Length 2, Data 10 00
	Delay: 10000 us

Sequence 5 - MIPI_SEQ_ASSERT_RESET
	Delay: 10000 us
	GPIO index 9, source 0, set 0 (0x00)

Notice how there is no MIPI_SEQ_DEASSERT_RESET, instead the deassert
is done at the beginning of MIPI_SEQ_INIT_OTP, which is exactly what
the fixup from vlv_fixup_mipi_sequences() fixes up.

Extend it to also apply to v2 sequences, this fixes the panel not working
on the Acer Iconia One 8 A1-840.

Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/14605
Signed-off-by: Hans de Goede <hansg@kernel.org>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Link: https://lore.kernel.org/r/20250703143824.7121-1-hansg@kernel.org
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 11895f3759)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-07-10 11:35:20 -04:00
Simona Vetter
bd46cece51 drm/gem: Fix race in drm_gem_handle_create_tail()
Object creation is a careful dance where we must guarantee that the
object is fully constructed before it is visible to other threads, and
GEM buffer objects are no difference.

Final publishing happens by calling drm_gem_handle_create(). After
that the only allowed thing to do is call drm_gem_object_put() because
a concurrent call to the GEM_CLOSE ioctl with a correctly guessed id
(which is trivial since we have a linear allocator) can already tear
down the object again.

Luckily most drivers get this right, the very few exceptions I've
pinged the relevant maintainers for. Unfortunately we also need
drm_gem_handle_create() when creating additional handles for an
already existing object (e.g. GETFB ioctl or the various bo import
ioctl), and hence we cannot have a drm_gem_handle_create_and_put() as
the only exported function to stop these issues from happening.

Now unfortunately the implementation of drm_gem_handle_create() isn't
living up to standards: It does correctly finishe object
initialization at the global level, and hence is safe against a
concurrent tear down. But it also sets up the file-private aspects of
the handle, and that part goes wrong: We fully register the object in
the drm_file.object_idr before calling drm_vma_node_allow() or
obj->funcs->open, which opens up races against concurrent removal of
that handle in drm_gem_handle_delete().

Fix this with the usual two-stage approach of first reserving the
handle id, and then only registering the object after we've completed
the file-private setup.

Jacek reported this with a testcase of concurrently calling GEM_CLOSE
on a freshly-created object (which also destroys the object), but it
should be possible to hit this with just additional handles created
through import or GETFB without completed destroying the underlying
object with the concurrent GEM_CLOSE ioctl calls.

Note that the close-side of this race was fixed in f6cd7daecf ("drm:
Release driver references to handle before making it available
again"), which means a cool 9 years have passed until someone noticed
that we need to make this symmetry or there's still gaps left :-/
Without the 2-stage close approach we'd still have a race, therefore
that's an integral part of this bugfix.

More importantly, this means we can have NULL pointers behind
allocated id in our drm_file.object_idr. We need to check for that
now:

- drm_gem_handle_delete() checks for ERR_OR_NULL already

- drm_gem.c:object_lookup() also chekcs for NULL

- drm_gem_release() should never be called if there's another thread
  still existing that could call into an IOCTL that creates a new
  handle, so cannot race. For paranoia I added a NULL check to
  drm_gem_object_release_handle() though.

- most drivers (etnaviv, i915, msm) are find because they use
  idr_find(), which maps both ENOENT and NULL to NULL.

- drivers using idr_for_each_entry() should also be fine, because
  idr_get_next does filter out NULL entries and continues the
  iteration.

- The same holds for drm_show_memory_stats().

v2: Use drm_WARN_ON (Thomas)

Reported-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Tested-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Cc: stable@vger.kernel.org
Cc: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Signed-off-by: Simona Vetter <simona.vetter@intel.com>
Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20250707151814.603897-1-simona.vetter@ffwll.ch
2025-07-09 15:53:34 +02:00
Thomas Zimmermann
f6bfc9afc7 drm/framebuffer: Acquire internal references on GEM handles
Acquire GEM handles in drm_framebuffer_init() and release them in
the corresponding drm_framebuffer_cleanup(). Ties the handle's
lifetime to the framebuffer. Not all GEM buffer objects have GEM
handles. If not set, no refcounting takes place. This is the case
for some fbdev emulation. This is not a problem as these GEM objects
do not use dma-bufs and drivers will not release them while fbdev
emulation is running. Framebuffer flags keep a bit per color plane
of which the framebuffer holds a GEM handle reference.

As all drivers use drm_framebuffer_init(), they will now all hold
dma-buf references as fixed in commit 5307dce878 ("drm/gem: Acquire
references on GEM handles for framebuffers").

In the GEM framebuffer helpers, restore the original ref counting
on buffer objects. As the helpers for handle refcounting are now
no longer called from outside the DRM core, unexport the symbols.

v3:
- don't mix internal flags with mode flags (Christian)
v2:
- track framebuffer handle refs by flag
- drop gma500 cleanup (Christian)

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Fixes: 5307dce878 ("drm/gem: Acquire references on GEM handles for framebuffers")
Reported-by: Bert Karwatzki <spasswolf@web.de>
Closes: https://lore.kernel.org/dri-devel/20250703115915.3096-1-spasswolf@web.de/
Tested-by: Bert Karwatzki <spasswolf@web.de>
Tested-by: Mario Limonciello <superm1@kernel.org>
Tested-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Anusha Srivatsa <asrivats@redhat.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: <stable@vger.kernel.org>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20250707131224.249496-1-tzimmermann@suse.de
2025-07-09 14:03:28 +02:00
Matthew Auld
fee58ca135 drm/xe/bmg: fix compressed VRAM handling
There looks to be an issue in our compression handling when the BO pages
are very fragmented, where we choose to skip the identity map and
instead fall back to emitting the PTEs by hand when migrating memory,
such that we can hopefully do more work per blit operation. However in
such a case we need to ensure the src PTEs are correctly tagged with a
compression enabled PAT index on dgpu xe2+, otherwise the copy will
simply treat the src memory as uncompressed, leading to corruption if
the memory was compressed by the user.

To fix this pass along use_comp_pat into emit_pte() on the src side, to
indicate that compression should be considered.

v2 (Jonathan): tweak the commit message

Fixes: 523f191cc0 ("drm/xe/xe_migrate: Handle migration logic for xe2+ dgfx")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Akshata Jahagirdar <akshata.jahagirdar@intel.com>
Cc: <stable@vger.kernel.org> # v6.12+
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://lore.kernel.org/r/20250701103949.83116-2-matthew.auld@intel.com
(cherry picked from commit f7a2fd776e)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-07 20:57:17 -07:00
Matthew Brost
daa099fed5 Revert "drm/xe/xe2: Enable Indirect Ring State support for Xe2"
This reverts commit fe0154cf82.

Seeing some unexplained random failures during LRC context switches with
indirect ring state enabled. The failures were always there, but the
repro rate increased with the addition of WA BB as a separate BO.
Commit 3a1edef8f4 ("drm/xe: Make WA BB part of LRC BO") helped to
reduce the issues in the context switches, but didn't eliminate them
completely.

Indirect ring state is not required for any current features, so disable
for now until failures can be root caused.

Cc: stable@vger.kernel.org
Fixes: fe0154cf82 ("drm/xe/xe2: Enable Indirect Ring State support for Xe2")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250702035846.3178344-1-matthew.brost@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 03d85ab36b)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-07 20:57:17 -07:00
Matthew Brost
c9a95dbe06 drm/xe: Allocate PF queue size on pow2 boundary
CIRC_SPACE does not work unless the size argument is a power of 2,
allocate PF queue size on power of 2 boundary.

Cc: stable@vger.kernel.org
Fixes: 3338e4f90c ("drm/xe: Use topology to determine page fault queue size")
Fixes: 29582e0ea7 ("drm/xe: Add page queue multiplier")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Link: https://lore.kernel.org/r/20250702213511.3226167-1-matthew.brost@intel.com
(cherry picked from commit 491b978312)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-07 20:57:17 -07:00
Michal Wajdeczko
705a412a36 drm/xe/pf: Clear all LMTT pages on alloc
Our LMEM buffer objects are not cleared by default on alloc
and during VF provisioning we only setup LMTT PTEs for the
actually provisioned LMEM range. But beyond that valid range
we might leave some stale data that could either point to some
other VFs allocations or even to the PF pages.

Explicitly clear all new LMTT page to avoid the risk that a
malicious VF would try to exploit that gap.

While around add asserts to catch any undesired PTE overwrites
and low-level debug traces to track LMTT PT life-cycle.

Fixes: b1d2040582 ("drm/xe/pf: Introduce Local Memory Translation Table")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Lukasz Laguna <lukasz.laguna@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20250701220052.1612-1-michal.wajdeczko@intel.com
(cherry picked from commit 3fae6918a3)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-07 20:57:07 -07:00
Ben Skeggs
d133036a0b drm/nouveau/gsp: fix potential leak of memory used during acpi init
If any of the ACPI calls fail, memory allocated for the input buffer
would be leaked.  Fix failure paths to free allocated memory.

Also add checks to ensure the allocations succeeded in the first place.

Reported-by: Danilo Krummrich <dakr@kernel.org>
Fixes: 176fdcbddf ("drm/nouveau/gsp/r535: add support for booting GSP-RM")
Signed-off-by: Ben Skeggs <bskeggs@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://lore.kernel.org/r/20250617040036.2932-1-bskeggs@nvidia.com
2025-07-07 16:32:44 +02:00
Tamir Duberstein
3d44147494 rust: drm: remove unnecessary imports
`kernel::str::CStr` is included in the prelude.

Signed-off-by: Tamir Duberstein <tamird@gmail.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://lore.kernel.org/r/20250704-cstr-include-drm-v1-1-a279dfc4d753@gmail.com
2025-07-05 13:01:59 +02:00
Alessio Belle
d38376b3ee drm/imagination: Fix kernel crash when hard resetting the GPU
The GPU hard reset sequence calls pm_runtime_force_suspend() and
pm_runtime_force_resume(), which according to their documentation should
only be used during system-wide PM transitions to sleep states.

The main issue though is that depending on some internal runtime PM
state as seen by pm_runtime_force_suspend() (whether the usage count is
<= 1), pm_runtime_force_resume() might not resume the device unless
needed. If that happens, the runtime PM resume callback
pvr_power_device_resume() is not called, the GPU clocks are not
re-enabled, and the kernel crashes on the next attempt to access GPU
registers as part of the power-on sequence.

Replace calls to pm_runtime_force_suspend() and
pm_runtime_force_resume() with direct calls to the driver's runtime PM
callbacks, pvr_power_device_suspend() and pvr_power_device_resume(),
to ensure clocks are re-enabled and avoid the kernel crash.

Fixes: cc1aeedb98 ("drm/imagination: Implement firmware infrastructure and META FW support")
Signed-off-by: Alessio Belle <alessio.belle@imgtec.com>
Reviewed-by: Matt Coster <matt.coster@imgtec.com>
Link: https://lore.kernel.org/r/20250624-fix-kernel-crash-gpu-hard-reset-v1-1-6d24810d72a6@imgtec.com
Cc: stable@vger.kernel.org
Signed-off-by: Matt Coster <matt.coster@imgtec.com>
2025-07-04 16:32:10 +01:00
Mikko Perttunen
44306a684c drm/tegra: nvdec: Fix dma_alloc_coherent error check
Check for NULL return value with dma_alloc_coherent, in line with
Robin's fix for vic.c in 'drm/tegra: vic: Fix DMA API misuse'.

Fixes: 46f226c93d ("drm/tegra: Add NVDEC driver")
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Link: https://lore.kernel.org/r/20250702-nvdec-dma-error-check-v1-1-c388b402c53a@nvidia.com
2025-07-04 11:15:07 +02:00
Dave Airlie
da8d8e9001 Merge tag 'drm-xe-fixes-2025-07-03' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
Driver Changes:
- Fix chunking the PTE updates and overflowing the maximum number of
  dwords with with MI_STORE_DATA_IMM (Jia Yao)
- Move WA BB to the LRC BO to mitigate hangs on context switch (Matthew
  Brost)
- Fix frequency/flush WAs for BMG (Vinay / Lucas)
- Fix kconfig prompt title and description (Lucas)
- Do not require kunit (Harry Austen / Lucas)
- Extend 14018094691 WA to BMG (Daniele)
- Fix wedging the device on signal (Matthew Brost)

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/o5662wz6nrlf6xt5sjgxq5oe6qoujefzywuwblm3m626hreifv@foqayqydd6ig
2025-07-04 10:01:53 +10:00
Dave Airlie
8f954c435f Merge tag 'samsung-dsim-fixes-for-v6.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-fixes
- Fixed raw pointer leakage and unsafe behavior in printk()
  . Switch from %pK to %p for pointer formatting, as %p is now safer
    and prevents issues like raw pointer leakage and acquiring sleeping
    locks in atomic contexts.

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Inki Dae <inki.dae@samsung.com>
Link: https://lore.kernel.org/r/20250629091742.29956-1-inki.dae@samsung.com
2025-07-04 09:40:20 +10:00
Dave Airlie
ac2ad73e75 Merge tag 'exynos-drm-fixes-for-v6.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-fixes
Fixups
- Fixed raw pointer leakage and unsafe behavior in printk()
  . Switch from %pK to %p for pointer formatting, as %p is now safer
    and prevents issues like raw pointer leakage and acquiring sleeping
    locks in atomic contexts.

- Fixed kernel panic during boot
  . A NULL pointer dereference issue occasionally occurred
    when the vblank interrupt handler was called before
    the DRM driver was fully initialized during boot.
    So this patch fixes the issue by adding a check in the interrupt handler
    to ensure the DRM driver is properly initialized.

- Fixed a lockup issue on Samsung Peach-Pit/Pi Chromebooks
  . The issue occurred after commit c9b1150a68 changed
    the call order of CRTC enable/disable and bridge pre_enable/post_disable
    methods, causing fimd_dp_clock_enable() to be called
    before the FIMD device was activated. To fix this,
    runtime PM guards were added to fimd_dp_clock_enable()
    to ensure proper operation even when CRTC is not enabled.

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Inki Dae <inki.dae@samsung.com>
Link: https://lore.kernel.org/r/20250629083554.28628-1-inki.dae@samsung.com
2025-07-04 09:38:01 +10:00
Dave Airlie
afd30ace71 Merge tag 'drm-intel-fixes-2025-07-03' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes
- Make mei interrupt top half irq disabled to fix RT builds
- Fix timeline left held on VMA alloc error
- Fix NULL pointer deref in vlv_dphy_param_init()
- Fix selftest mock_request() to avoid NULL deref

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://lore.kernel.org/r/aGYVPAA4KvsZqDFx@jlahtine-mobl
2025-07-04 09:26:57 +10:00
Dave Airlie
b91e11ec5c Merge tag 'drm-misc-fixes-2025-07-03' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
drm-misc-fixes for v6.16-rc5:
- Replace simple panel lookup hack with proper fix.
- nullpointer deref in vesadrm fix.
- fix dma_resv_wait_timeout.
- fix error handling in ttm_buffer_object_transfer.
- bridge fixes.
- Fix vmwgfx accidentally allocating encrypted memory.
- Fix race in spsc_queue_push()
- Add refcount on backing GEM objects during fb creation.
- Fix v3d irq's being enabled during gpu reset.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://lore.kernel.org/r/a7461418-08dc-4b7c-b2fa-264155f66d5e@linux.intel.com
2025-07-04 09:06:57 +10:00
Dave Airlie
e79d0ba605 nouveau/gsp: add a 50ms delay between fbsr and driver unload rpcs
This fixes a bunch of command hangs after runtime suspend/resume.

This fixes a regression caused by code movement in the commit below,
the commit seems to just change timings enough to cause this to happen
now, and adding the sleep seems to avoid it.

I've spent some time trying to root cause it to no great avail,
it seems like a bug on the firmware side, but it could be a bug
in our rpc handling that I can't find.

Either way, we should land the workaround to fix the problem,
while we continue to work out the root cause.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Cc: Ben Skeggs <bskeggs@nvidia.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Fixes: c21b039715 ("drm/nouveau/gsp: add hals for fbsr.suspend/resume()")
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://lore.kernel.org/r/20250702232707.175679-1-airlied@gmail.com
2025-07-04 00:22:12 +02:00
Aaron Thompson
78f88067d5 drm/nouveau: Do not fail module init on debugfs errors
If CONFIG_DEBUG_FS is enabled, nouveau_drm_init() returns an error if it
fails to create the "nouveau" directory in debugfs. One case where that
will happen is when debugfs access is restricted by
CONFIG_DEBUG_FS_ALLOW_NONE or by the boot parameter debugfs=off, which
cause the debugfs APIs to return -EPERM.

So just ignore errors from debugfs. Note that nouveau_debugfs_root may
be an error now, but that is a standard pattern for debugfs. From
include/linux/debugfs.h:

"NOTE: it's expected that most callers should _ignore_ the errors
returned by this function. Other debugfs functions handle the fact that
the "dentry" passed to them could be an error and they don't crash in
that case. Drivers should generally work fine even if debugfs fails to
init anyway."

Fixes: 97118a1816 ("drm/nouveau: create module debugfs root")
Cc: stable@vger.kernel.org
Signed-off-by: Aaron Thompson <dev@aaront.org>
Acked-by: Timur Tabi <ttabi@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://lore.kernel.org/r/20250703211949.9916-1-dev@aaront.org
2025-07-03 23:56:33 +02:00