linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-06-02 06:22:33 -04:00

Author	SHA1	Message	Date
Zhenghang Xiao	7164d78559	drm/gem: fix race between change_handle and handle_delete drm_gem_change_handle_ioctl leaves the old handle live in the IDR during the window between spin_unlock(table_lock) and the final spin_lock(table_lock). A concurrent drm_gem_handle_delete on the old handle succeeds in this window, decrements handle_count to 0, and frees the GEM object while the new handle's IDR entry still references it. NULL the old handle's IDR entry before dropping table_lock so that any concurrent GEM_CLOSE on the old handle sees NULL and returns -EINVAL. Restore the old entry on the prime-bookkeeping error path. Fixes: `5e28b7b944` ("drm: Set old handle to NULL before prime swap in change_handle") Signed-off-by: Zhenghang Xiao <kipreyyy@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patch.msgid.link/20260526085313.26791-1-kipreyyy@gmail.com	2026-05-30 07:01:39 +10:00
Dave Airlie	6e40c93789	Merge tag 'drm-misc-fixes-2026-05-29' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes Short summary of fixes pull: amdxdna: - require IOMMU on AIE2 dumb-buffer: - prevent overflows in dumb-buffer creation dma-buf: - fix UAF in dma_buf_fd() tracepoint hyperv: - improve protocol validation ivpu: - test write offset in debugfs rocket: - fix UAF in bo creation Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patch.msgid.link/20260529070009.GA313534@linux.fritz.box	2026-05-30 06:40:28 +10:00
Rajat Gupta	5ab62dd368	drm: prevent integer overflows in dumb buffer creation helpers Fix integer overflow issues in the dumb buffer creation path: 1. drm_mode_create_dumb() does not bound width, height, or bpp before passing them to driver callbacks. Downstream helpers (e.g. drm_gem_dma_dumb_create_internal) perform pitch/size alignment in u32 arithmetic that can overflow for extreme values. Add hard limits: width and height < 8192, bpp <= 32. No legitimate software rendering use case exceeds these. 2. drm_mode_align_dumb() uses roundup(pitch, hw_pitch_align) without checking for overflow. If pitch is near U32_MAX, roundup() wraps to a small value, making subsequent check_mul_overflow() pass with a much smaller pitch than intended. Add an overflow check after roundup. 3. drm_mode_align_dumb() uses ALIGN(size, hw_size_align) which only works correctly for power-of-two alignment values. Replace with roundup() which works for any alignment. Suggested-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>	2026-05-29 08:30:47 +02:00
Dave Airlie	e81d3b59f7	Merge tag 'amd-drm-fixes-7.1-2026-05-28' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-7.1-2026-05-28: amdgpu: - GEM_OP warning fix - GEM_OP locking fix - Userq fixes - DCN 2.1 refclk fix - SI fix - HMM fixes amdkfd: - svm_range_set_attr locking fix - CRIU restore fix - KFD debugger fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patch.msgid.link/20260528211843.893681-1-alexander.deucher@amd.com	2026-05-29 11:55:26 +10:00
Dave Airlie	4ab97280e8	Merge tag 'drm-xe-fixes-2026-05-28' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes - Restore IDLEDLY regiter on engine reset (Bala) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patch.msgid.link/ahhBUt8fDqjB-mQq@intel.com	2026-05-29 11:28:58 +10:00
Christian König	1c824497d8	drm/amdgpu: fix calling VM invalidation in amdgpu_hmm_invalidate_gfx Otherwise we don't invalidate page tables on next CS. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Tested-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit b6444d1bcbc34f6f2a31a3aab3059be082f3683e) Cc: stable@vger.kernel.org	2026-05-27 12:06:26 -04:00
Christian König	962d684b5d	drm/amdgpu: fix amdgpu_hmm_range_get_pages The notifier sequence must only be read once or otherwise we could work with invalid pages. While at it also fix the coding style, e.g. drop the pre-initialized return value and use the common define for 2G range. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Tested-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit c08972f555945cda57b0adb72272a37910153390) Cc: stable@vger.kernel.org	2026-05-27 12:02:23 -04:00
Sunil Khatri	181307acf8	drm/amdgpu/userq: use array instead of list for userq_vas Use arrays instead of list for userq_vas since we have fixed no of bos. Also, we dont have to worry to free that memory later since this array would be free along with queue only. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit ef7dc711a664b0c548ecfdf13a00436b7446b8e7)	2026-05-27 12:01:42 -04:00
Sunil Khatri	0b8a1600ab	drm/amdgpu/userq: move mqd_destroy to later stage to keep core obj valid mqd_destroy cleans up queue core objects like mqd and fw_object which are needed for any pending fence to signal properly. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 4ad65d610096498c8e265615aba42b3c47441bb5)	2026-05-27 12:01:35 -04:00
Eric Huang	93f5534b35	drm/amdkfd: fix a vulnerability of integer overflow in kfd debugger get_queue_ids() computes array_size = num_queues * sizeof(uint32_t), which could overflow on 32-bit size_t build. using array_size() instead, it saturates to SIZE_MAX on overflow. Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 2d57a0475f085c08b49312dfd8edcb461845f285) Cc: stable@vger.kernel.org	2026-05-27 12:01:13 -04:00
Sunil Khatri	ca8e7a119a	drm/amdgpu/userq: remove amdgpu_userq_create/destroy_object wrapper Remove the amdgpu_userq_create/destroy_object wrappers and use directly the kernel bo allocation function which does all the things which are done in wrapper. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit deb02080ca5d3f015cf71e56067a39ef2f141998)	2026-05-27 12:01:00 -04:00
Timur Kristóf	dd4f3ee535	drm/amd/pm/si: Disregard vblank time when no displays are connected When no displays are connected, there is no vblank happening so the power management code shouldn't worry about it. This fixes a regression that caused the memory clock to be stuck at maximum when there were no displays connected to a SI GPU. Fixes: `9003a07468` ("drm/amd/pm: Treat zero vblank time as too short in si_dpm (v3)") Fixes: `9d73b107a6` ("drm/amd/pm: Use pm_display_cfg in legacy DPM (v2)") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Tested-by: Jeremy Klarenbeek <jeremy.klarenbeek99@gmail.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 6d87e0199f7b83735b56e422d59f170a201897a8) Cc: stable@vger.kernel.org	2026-05-27 12:00:03 -04:00
David Francis	6842b6a4b7	drm/amdkfd: Check for pdd drm file first in CRIU restore path CRIU restore ioctls are meant to be called by CRIU with no existing drm file. There's an error path for if the drm file unexpectedly exists. It was positioned so it was missing a fput(drm_file). Do that check earlier, as soon as we have the pdd. Signed-off-by: David Francis <David.Francis@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 2bab781dac78916c5cc8de76345a4102449267d7) Cc: stable@vger.kernel.org	2026-05-27 11:59:24 -04:00
Stanley.Yang	0fb4a8e64a	drm/amdgpu: fix potential overflow in fs_info.debugfs_name Use snprintf() with sizeof(fs_info.debugfs_name) so a long RAS block name plus the "_err_inject" suffix cannot overflow the 32-byte buffer. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1a58070fda26857a8f6acc0ab05428e60d5c6844)	2026-05-27 11:59:00 -04:00
Sunil Khatri	cf4aafdcce	drm/amdgpu/userq: make sure queue is valid in the hang_detect_work Thread 1: Running amdgpu_userq_destroy which eventually remove the queue from door bell and set userq_mgr = NULL. Thread2: An interrupt might have scheduled the hang_detect_work which still need userq_mgr to be valid but could get an NULL ptrs. To fix that make sure we cancel the hang_detect_work again before setting userq_mgr to NULL. Along with that we also need all the queue va to remain valid till we could be running anything on the queue and hence moving the userq_va post hang_detect handler is cancelled. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1a66ceb98b137d18d303b9889f0e7d8c4db73943)	2026-05-27 11:58:31 -04:00
Sunil Khatri	a00caed230	drm/amdgpu/userq: reserve root bo without interruption Fix the code to make it an uninterruptible reservation for root bo. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit d409ab4e387d94b2e593d558b54b7bfd315e0e75)	2026-05-27 11:58:25 -04:00
Sunil Khatri	cedee93d43	drm/amdgpu/userq: add amdgpu_bo_unpin when amdgpu_ttm_alloc_gart fails Unpin the wptr_obj->obj when amdgpu_ttm_alloc_gart fails. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit d8145c437ccdc2d91c579787290f82788172bea0)	2026-05-27 11:58:17 -04:00
Sunil Khatri	d8d9c82040	drm/amdgpu: simplify return value in amdgpu_userq_get_doorbell_index amdgpu_userq_get_doorbell_index returns a uint64 type index as well as a int type failure values. Simplifying this and using a int type return value and getting the index in input pointer of type uint64 type. Also since it's used at once place making it static would be better. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit e947ec9d0529d5f93dbdb33cd197347f6a7b2922)	2026-05-27 11:58:10 -04:00
Eric Huang	e984d61d92	drm/amdkfd: fix NULL pointer bug in svm_range_set_attr The process_info could be NULL if user doesn't call kfd_ioctl_acquire_vm before calling kfd_ioctl_svm. Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 83a26c812e0529eb040d31a76f73e33e637243d4) Cc: stable@vger.kernel.org	2026-05-27 11:57:49 -04:00
Ivan Lipski	ec78a85d95	drm/amd/display: Write REFCLK to 48MHz on DCN21 [Why&How] dccg21_init() calls dccg2_init() which hardcodes 100MHz refclk values for MICROSECOND_TIME_BASE_DIV and MILLISECOND_TIME_BASE_DIV. DCN21 uses 48MHz refclk, so the wrong values corrupt DCCG timing and cause eDP link training failure on cold boot. Write the correct 48MHz values directly instead of calling dccg2_init(). v2: Fixed typo Fixes: `e6e2b956fc` ("drm/amd/display: Add missing DCCG register entries for DCN20-DCN316") Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5272 Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5311 Reported-by: Max Chernoff <git@maxchernoff.ca> Tested-by: Max Chernoff <git@maxchernoff.ca> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 08236c3ef284cd2d110e5e3d51fc9615e551f9dc) Cc: stable@vger.kernel.org	2026-05-27 11:57:22 -04:00
Sunil Khatri	ba4c0ff47e	drm/amdgpu/userq: Fix the mutex_init cleanup for fence_drv_lock mutex fence_drv_lock is destroyed in amdgpu_userq_fence_driver_free also in one of the jump condition mutex_destroy is also called leading to double mutex_destroy. So rearranging the code so amdgpu_userq_fence_driver_free takes care of the clean up along with mutex_destroy. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 384dbef269d101e5b671fc7b942c56734cd1d186)	2026-05-27 11:55:38 -04:00
Sunil Khatri	4a03d23ce6	drm/amdgpu/userq: Fix doorbell object cleanup of queue Unpin and unref the door bell obj if queue creation fails before initialization is complete. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 8c7506f7ba945f21e5abe7f8eac0a3acca6b5330)	2026-05-27 11:55:28 -04:00
Ziyi Guo	a1ba459423	drm/amdgpu: check num_entries in GEM_OP GET_MAPPING_INFO kvcalloc(args->num_entries, sizeof(*vm_entries), GFP_KERNEL) at amdgpu_gem.c:1050 uses the user-supplied num_entries directly without any upper bounds check. Since num_entries is a __u32 and sizeof(drm_amdgpu_gem_vm_entry) is 32 bytes, a large num_entries produces an allocation exceeding INT_MAX, triggering WARNING in __kvmalloc_node_noprof(), causing a kernel WARNING, TAINT_WARN, and panic on CONFIG_PANIC_ON_WARN=y systems. Add a size bounds check before we invoke the kvzalloc() to reject oversized num_entries early with -EINVAL. Fixes: `4d82724f7f` ("drm/amdgpu: Add mapping info option for GEM_OP ioctl") Signed-off-by: Ziyi Guo <n7l8m4@u.northwestern.edu> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1fe7bf5457f6efd7be60b17e23163ba54341d73d) Cc: stable@vger.kernel.org	2026-05-27 11:55:06 -04:00
Michael Bommarito	2e7f55eb40	drm/amdgpu: fix lock leak on ENOMEM in AMDGPU_GEM_OP_GET_MAPPING_INFO The AMDGPU_GEM_OP_GET_MAPPING_INFO branch of amdgpu_gem_op_ioctl() holds three cleanup-tracked resources before calling kvcalloc(): the drm_gem_object reference from drm_gem_object_lookup(), the drm_exec lock on the looked-up GEM via drm_exec_lock_obj(), and the drm_exec lock on the per-process VM root page directory via amdgpu_vm_lock_pd(). All three are released by the out_exec label that every other error path in this function jumps to. The kvcalloc() failure path returns -ENOMEM directly, skipping out_exec and leaking all three. The leaked per-process VM root PD dma_resv lock is the load-bearing leak: any subsequent operation on the same VM (further GEM ops, command-submission, eviction, TTM shrinker callbacks) blocks on the held lock. DRM_IOCTL_AMDGPU_GEM_OP is DRM_AUTH \| DRM_RENDER_ALLOW, so this is an unprivileged-local denial of service against the caller's GPU context, reachable by any process with /dev/dri/renderD* access. Route the failure through out_exec so drm_exec_fini() and drm_gem_object_put() run. Reproduced on stock 7.0.0-10, Ryzen 7 5700U / Radeon Vega (Lucienne): the failing ioctl returns -ENOMEM and a second GET_MAPPING_INFO on the same fd then blocks in drm_exec_lock_obj() on the leaked dma_resv. SIGKILL on the caller does not reap the task; the fd-release path during process exit goes through amdgpu_gem_object_close() -> drm_exec_prepare_obj() on the same lock, leaving the task in D state until the box is rebooted. The patched kernel was not rebuilt and re-tested on this hardware; the fix is mechanical. Tested on a single Lucienne / Vega box only. Ziyi Guo posted an independent INT_MAX-bound check for args->num_entries in the same branch [1]; the two patches are complementary and can land in either order. Fixes: `4d82724f7f` ("drm/amdgpu: Add mapping info option for GEM_OP ioctl") Link: https://lore.kernel.org/all/20260208000255.4073363-1-n7l8m4@u.northwestern.edu/ # [1] Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit b69d3256d79de15f54c322986ff4da68f1d65b0a) Cc: stable@vger.kernel.org	2026-05-27 11:54:18 -04:00
Balasubramani Vivekanandan	f657a6a3ba	drm/xe: Restore IDLEDLY regiter on engine reset Wa_16023105232 programs the register IDLEDLY. The register is reset whenever the engine is reset. Therefore it should be added to the GuC save-restore register list for it to be restored after reset. Fixes: `7c53ff050b` ("drm/xe: Apply Wa_16023105232") Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patch.msgid.link/20260522163531.1365540-2-balasubramani.vivekanandan@intel.com Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> (cherry picked from commit df1cfe24743a93b71eab27687e148ab8ae9b69e3) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2026-05-27 11:27:22 -04:00
Jouni Högander	3549a9649d	drm/i915/psr: Use DC_OFF wake reference to block DC6 on vblank enable We are observing following warnings: ERROR power well DC_off state mismatch (refcount 0/enabled 1) gen9_dc_off_power_well_enabled is considering target state DC_STATE_DISABLE as DC_OFF power well being enabled. Fix this by using wakeref for the purpose. To achieve this we need to modify notification code as well. Currently it is possible that PSR gets notified vblank enable/disable twice on same status. This is currently not a problem as it is just triggering call to intel_display_power_set_target_dc_state with same target state as a parameter. When using wakeref this becomes a problem due to reference counting. Fix this storing vbank status on last notification and use that to ensure there are no more than one notification with same vblank status. v2: ensure there is no subsequent notifications with same status Fixes: `aa451abcff` ("drm/i915/display: Prevent DC6 while vblank is enabled for Panel Replay") Cc: <stable@vger.kernel.org> # v6.13+ Signed-off-by: Jouni Högander <jouni.hogander@intel.com> Reviewed-by: Michał Grzelak <michal.grzelak@intel.com> Link: https://patch.msgid.link/20260520104944.239797-2-jouni.hogander@intel.com (cherry picked from commit 35485ac56d878192a3829a58cb26503125ec7104) Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>	2026-05-26 09:31:48 +01:00
Jouni Högander	8bb9093df5	drm/i915/psr: Block DC states on vblank enable when Panel Replay supported Currently we are blocking DC states only when Panel Replay is enabled on vblank enable. It may happen that Panel Replay is getting enabled when vblank is already enabled. Fix this by blocking DC states always if Panel Replay is supported. While at it take care of possible dual eDP case by looping all encoders supporting PSR. Fixes: `0c427ac78a` ("drm/i915/psr: Add interface to notify PSR of vblank enable/disable") Cc: <stable@vger.kernel.org> # v6.16+ Signed-off-by: Jouni Högander <jouni.hogander@intel.com> Reviewed-by: Michał Grzelak <michal.grzelak@intel.com> Link: https://patch.msgid.link/20260520104944.239797-1-jouni.hogander@intel.com (cherry picked from commit eb5911f990554f7ce947dd53df00c114362e4465) Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>	2026-05-26 09:31:46 +01:00
Pranay Samala	d196136a98	drm/i915/color: Fix HDR pre-CSC LUT programming loop The integer lut programming loop never executes completely due to incorrect condition (i++ > 130). Fix to properly program 129th+ entries for values > 1.0. Cc: <stable@vger.kernel.org> #v6.19 Fixes: `82caa1c881` ("drm/i915/color: Program Pre-CSC registers") Signed-off-by: Pranay Samala <pranay.samala@intel.com> Signed-off-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com> Reviewed-by: Uma Shankar <uma.shankar@intel.com> Signed-off-by: Suraj Kandpal <suraj.kandpal@intel.com> Link: https://patch.msgid.link/20260519075308.383877-1-pranay.samala@intel.com (cherry picked from commit f33862ec3e8849ad7c0a3dd46719083b13ade248) Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>	2026-05-26 09:31:44 +01:00
Michał Grzelak	202e77cf2e	drm/i915/aux: use polling when irqs are unavailable PTL with physically disconnected display was observed to have 40s longer execution time when testing xe_fault_injection@xe_guc_mmio_send_recv. The issue has not been seen when reverting commit `40a9f77a28` ("Revert "drm/i915/dp: change aux_ctl reg read to polling read""). Apparently the configuration suffers from not having AUX enabled when using interrupts. One probable cause can be xe enabling interrupts too late: interrupts need memory allocations which currently can't be done before the display FB takeover is done. As for now, use polling for AUX in case interrupts are unavailable. Fixes: `40a9f77a28` ("Revert "drm/i915/dp: change aux_ctl reg read to polling read"") Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Michał Grzelak <michal.grzelak@intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patch.msgid.link/20260416163744.288107-1-michal.grzelak@intel.com (cherry picked from commit 05e0550b65cd1604bd515fbc65f522bce4c10a87) Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>	2026-05-26 09:31:42 +01:00
Janusz Krzysztofik	5c4063c87a	drm/i915: Fix potential UAF in TTM object purge TLDR: The bo->ttm object might be changed by calling ttm_bo_validate(), move casting it to an i915_tt object later to actually get the right pointer. A user reported hitting the following bug under heavy use on DG2: [26620.095550] Oops: general protection fault, probably for non-canonical address 0xa56b6b6b6b6b6b8b: 0000 1 SMP NOPTI [26620.095556] CPU: 2 UID: 0 PID: 631 Comm: Xorg Not tainted 6.18.8 #1 PREEMPT(lazy) [26620.095558] Hardware name: ASRock B850M Steel Legend WiFi/B850M Steel Legend WiFi, BIOS 3.50 09/18/2025 [26620.095559] RIP: 0010:i915_ttm_purge+0x84/0x100 [i915] [26620.095604] Code: 00 00 00 48 8d 54 24 10 48 89 e6 48 89 fb e8 83 aa ae ff 85 c0 75 6f 48 83 bb a8 01 00 00 00 74 2c 48 8b 45 78 48 85 c0 74 23 <48> 8b 78 20 48 c7 c2 ff ff ff ff 31 f6 e8 7a 73 e3 e0 48 8b 7d 78 [26620.095605] RSP: 0018:ffffc90005fd7430 EFLAGS: 00010282 [26620.095607] RAX: a56b6b6b6b6b6b6b RBX: ffff8881f46c3dc0 RCX: 0000000000000000 [26620.095608] RDX: 0000000000000000 RSI: 0000000000000246 RDI: 00000000ffffffff [26620.095609] RBP: ffff888289610f00 R08: 0000000000000001 R09: ffff88823b022000 [26620.095609] R10: ffff888103029b28 R11: ffff8881fc7f3800 R12: ffff88810b6150d0 [26620.095609] R13: ffff888289610f00 R14: 0000000000000000 R15: ffff8881f46c3dc0 [26620.095610] FS: 00007f1004d86900(0000) GS:ffff88901c858000(0000) knlGS:0000000000000000 [26620.095611] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [26620.095611] CR2: 00007f0fdf489000 CR3: 000000035b0c1000 CR4: 0000000000750ef0 [26620.095612] PKRU: 55555554 [26620.095612] Call Trace: [26620.095615] <TASK> [26620.095615] i915_ttm_move+0x2b9/0x420 [i915] [26620.095642] ? ttm_tt_init+0x65/0x80 [ttm] [26620.095644] ? i915_ttm_tt_create+0xc6/0x150 [i915] [26620.095667] ttm_bo_handle_move_mem+0xb6/0x160 [ttm] [26620.095669] ttm_bo_evict+0x100/0x150 [ttm] [26620.095671] ? preempt_count_add+0x64/0xa0 [26620.095673] ? _raw_spin_lock+0xe/0x30 [26620.095675] ? _raw_spin_unlock+0xd/0x30 [26620.095675] ? i915_gem_object_evictable+0xb7/0xd0 [i915] [26620.095704] ttm_bo_evict_cb+0x6e/0xd0 [ttm] [26620.095705] ttm_lru_walk_for_evict+0xa6/0x200 [ttm] [26620.095708] ttm_bo_alloc_resource+0x185/0x4f0 [ttm] [26620.095709] ? init_object+0x62/0xd0 [26620.095712] ttm_bo_validate+0x7a/0x180 [ttm] [26620.095713] ? _raw_spin_unlock_irqrestore+0x16/0x30 [26620.095714] __i915_ttm_get_pages+0xb0/0x170 [i915] [26620.095737] i915_ttm_get_pages+0x9f/0x150 [i915] [26620.095759] ? i915_gem_do_execbuffer+0xedc/0x2b40 [i915] [26620.095786] ? alloc_debug_processing+0xd0/0x100 [26620.095787] ? _raw_spin_unlock_irqrestore+0x16/0x30 [26620.095788] ? i915_vma_instance+0xa0/0x4e0 [i915] [26620.095822] __i915_gem_object_get_pages+0x2f/0x40 [i915] [26620.095848] i915_vma_pin_ww+0x706/0x980 [i915] [26620.095875] ? i915_gem_do_execbuffer+0xedc/0x2b40 [i915] [26620.095904] eb_validate_vmas+0x170/0xa00 [i915] [26620.095930] i915_gem_do_execbuffer+0x1201/0x2b40 [i915] [26620.095953] ? alloc_debug_processing+0xd0/0x100 [26620.095954] ? _raw_spin_unlock_irqrestore+0x16/0x30 [26620.095955] ? i915_gem_execbuffer2_ioctl+0xc9/0x240 [i915] [26620.095977] ? __wake_up_sync_key+0x32/0x50 [26620.095979] ? i915_gem_execbuffer2_ioctl+0xc9/0x240 [i915] [26620.096001] ? __slab_alloc.isra.0+0x67/0xc0 [26620.096003] i915_gem_execbuffer2_ioctl+0x11a/0x240 [i915] Results from decode_stacktrace.sh pointed to dereference of a file pointer field of a i915 TTM page vector container associated with an object being purged on eviction. That path is taken when the object is marked as no longer needed. Code analysis revealed a possibility of the i915 TTM page vector container being replaced with a new instance inside a function that purges content of the object, should it be still busy. That function is called, indirectly via a more general function that changes the object's placement and caching policy, before the problematic dereference, but still after a pointer to the container is captured, rendering the pointer no longer valid. Fix the issue by capturing the pointer to the container only after its potential replacement. v2: Move the container_of() inside the if block (Sebastian), - a simplified version of the commit description that explains briefly why the change is necessary (Christian). Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/work_items/14882 Fixes: `7ae034590c` ("drm/i915/ttm: add tt shmem backend") Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Cc: stable@vger.kernel.org # v5.17+ Cc: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Sebastian Brzezinka <sebastian.brzezinka@intel.com> Cc: Christian König <christian.koenig@amd.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://lore.kernel.org/r/20260508122612.469227-2-janusz.krzysztofik@linux.intel.com (cherry picked from commit 4462966a93eb185849b7f174f0d0de53476d00a4) Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>	2026-05-26 09:31:40 +01:00
Berkant Koc	7f87763f47	drm/hyperv: validate VMBus packet size in receive callback hyperv_receive_sub() reads msg->vid_hdr.type and dispatches into one of four message-type branches without knowing how many bytes the host wrote into hv->recv_buf. The completion path then runs memcpy(hv->init_buf, msg, VMBUS_MAX_PACKET_SIZE), so the consumer that wakes on wait_for_completion_timeout() can read up to 16 KiB of residue from a prior message as if it were the response payload. Pass bytes_recvd into hyperv_receive_sub() and reject any packet that does not cover the pipe + synthvid header. A single switch on msg->vid_hdr.type then computes the type-specific payload size: the three completion-driving types (SYNTHVID_VERSION_RESPONSE, SYNTHVID_RESOLUTION_RESPONSE, SYNTHVID_VRAM_LOCATION_ACK) fall through to a shared exit that requires that size before memcpy/complete, while SYNTHVID_FEATURE_CHANGE validates its own payload and returns before reading is_dirt_needed. Unknown types are dropped. SYNTHVID_RESOLUTION_RESPONSE is variable length: the host fills resolution_count entries, not the full SYNTHVID_MAX_RESOLUTION_COUNT array. Validate the fixed prefix first so resolution_count can be read, bound it against the array, then require only the count-sized array, so the shorter responses the host actually sends are accepted. Only run the sub-handler when vmbus_recvpacket() returned success. The memcpy length is bytes_recvd, which is bounded by VMBUS_MAX_PACKET_SIZE only on a successful receive; on -ENOBUFS vmbus_recvpacket() instead reports the required length, which can exceed hv->recv_buf, so copying bytes_recvd would read and write past the 16 KiB buffers. Gating on the success return keeps the copy bounded. The nonzero-return path is itself a malformed-message case and is now logged rather than silently skipped; channel recovery is not attempted. Rejected packets are reported via drm_err_ratelimited() rather than silently dropped, matching the CoCo-hardened pattern in hv_kvp_onchannelcallback(). Fixes: `76c56a5aff` ("drm/hyperv: Add DRM driver for hyperv synthetic video device") Cc: stable@vger.kernel.org # 5.14+ Signed-off-by: Berkant Koc <me@berkoc.com> Assisted-by: Claude:claude-opus-4-7 berkoc-pipeline Reviewed-by: Michael Kelley <mhklinux@outlook.com> Tested-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Link: https://patch.msgid.link/8200dbc199c7a9b75ac7e8af6c748d2189b5ebd5.1779542874.git.me@berkoc.com	2026-05-25 07:31:53 -04:00
Berkant Koc	13d33b9ef6	drm/hyperv: validate resolution_count and fix WIN8 fallback A SYNTHVID_RESOLUTION_RESPONSE with resolution_count > 64 walks past the supported_resolution[SYNTHVID_MAX_RESOLUTION_COUNT] array in the parse loop. Bound resolution_count against the array size, folded into the existing zero-check. When the WIN10 resolution probe fails, the caller in hyperv_connect_vsp() left hv->screen__max / preferred_ unpopulated, which sets mode_config.max_width / max_height to 0 and makes drm_internal_framebuffer_create() reject every userspace framebuffer with -EINVAL. The pre-WIN10 branch had the same gap for preferred_width / preferred_height. Use a single post-probe fallback guarded by screen_width_max == 0 so both paths converge on the WIN8 defaults. Signed-off-by: Berkant Koc <me@berkoc.com> Assisted-by: Claude:claude-opus-4-7 berkoc-pipeline Fixes: `76c56a5aff` ("drm/hyperv: Add DRM driver for hyperv synthetic video device") Cc: stable@vger.kernel.org # 5.14+ Reviewed-by: Michael Kelley <mhklinux@outlook.com> Tested-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Link: https://patch.msgid.link/6945b22419c7d404b4954a113de2ac9c900dba93.1779542874.git.me@berkoc.com	2026-05-25 07:31:36 -04:00
Nathan Chancellor	53676e4d44	drm/msm: Restore second parameter name in purge() and evict() After commit `3392291fc5` ("drm/msm: Fix shrinker deadlock"), all supported versions of clang warn (or error with CONFIG_WERROR=y): drivers/gpu/drm/msm/msm_gem_shrinker.c:105:58: error: omitting the parameter name in a function definition is a C23 extension [-Werror,-Wc23-extensions] 105 \| purge(struct drm_gem_object obj, struct ww_acquire_ctx ) \| ^ drivers/gpu/drm/msm/msm_gem_shrinker.c:117:58: error: omitting the parameter name in a function definition is a C23 extension [-Werror,-Wc23-extensions] 117 \| evict(struct drm_gem_object obj, struct ww_acquire_ctx ) \| ^ 2 errors generated. With older but supported versions of GCC, this is an unconditional hard error: drivers/gpu/drm/msm/msm_gem_shrinker.c: In function 'purge': drivers/gpu/drm/msm/msm_gem_shrinker.c:105:35: error: parameter name omitted purge(struct drm_gem_object obj, struct ww_acquire_ctx ) ^~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/msm/msm_gem_shrinker.c: In function 'evict': drivers/gpu/drm/msm/msm_gem_shrinker.c:117:35: error: parameter name omitted evict(struct drm_gem_object obj, struct ww_acquire_ctx ) ^~~~~~~~~~~~~~~~~~~~~~~ Restore the parameter name to clear up the warnings, renaming it "unused" to make it clear it is only needed to satisfy the prototype of drm_gem_lru_scan(). Cc: stable@vger.kernel.org Fixes: `3392291fc5` ("drm/msm: Fix shrinker deadlock") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-05-24 10:31:24 -07:00
Dave Airlie	84335a9985	Merge tag 'drm-xe-fixes-2026-05-21' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes - SRIOV related fixes (Wajdeczko, Mohanram) - Fix leak and double-free (Lin) - Multi-cast register fixes (Gustavo) - Multi-queue fix (Niranjana) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patch.msgid.link/ag9rR5VwCdkA0lzI@intel.com	2026-05-23 07:57:08 +10:00
Dave Airlie	4378a41165	Merge tag 'mediatek-drm-fixes-20260521' of https://git.kernel.org/pub/scm/linux/kernel/git/chunkuang.hu/linux into drm-fixes Mediatek DRM Fixes - 20260521 1. fix sparse warnings Signed-off-by: Dave Airlie <airlied@redhat.com> From: Chun-Kuang Hu <chunkuang.hu@kernel.org> Link: https://patch.msgid.link/20260521135649.4681-1-chunkuang.hu@kernel.org	2026-05-22 08:31:08 +10:00
Dave Airlie	71d9e1561a	Merge tag 'drm-misc-fixes-2026-05-21' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes Short summary of fixes pull: amdxdna: - remove mmap and export for ubuf bridge: - chipone-icn6211: managed bridge cleanup - lt66121: acquire reset GPIO - megachips: fix clean up on failed IRQ requests gem: - clean up LRU locking v3d: - fix UAF in error code paths - release GEM-object ref on free'd jobs virtio: - use uninterruptible resv locking in plane updates Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patch.msgid.link/20260521071456.GA14644@localhost.localdomain	2026-05-22 07:01:04 +10:00
Shuicheng Lin	4d25342543	drm/xe/oa: Fix exec_queue leak on width check in stream open In xe_oa_stream_open_ioctl(), when param.exec_q->width > 1 the function returns -EOPNOTSUPP directly, skipping the existing err_exec_q cleanup path. The exec_queue reference obtained by xe_exec_queue_lookup() is leaked. The exec queue holds a reference on the xe_file, which is only dropped during queue teardown. The leaked lookup ref is not on the file's exec_queue xarray, so file close cannot release it. This keeps both the exec queue and the file private state pinned indefinitely. Jump to err_exec_q instead of returning directly so the reference is released. Fixes: `f0ed39830e` ("xe/oa: Fix query mode of operation for OAR/OAC") Assisted-by: Claude:claude-opus-4.6 Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patch.msgid.link/20260514203210.593488-1-shuicheng.lin@intel.com Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> (cherry picked from commit 339fa0be9e4a5d69fa47e91f4a36574224fb478f) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2026-05-21 09:56:49 -04:00
Dave Airlie	aee43aaf26	Merge tag 'amd-drm-fixes-7.1-2026-05-20' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-7.1-2026-05-20: amdgpu: - Userq fixes - VPE fix - SMU 15 fix - Misc fixes - VCE fixes - DC bios parsing fixes - DC aux fix - Mode1 reset fix - RAS fixes amdkfd: - Misc fixes radeon: - CS parser fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patch.msgid.link/20260520181359.28421-1-alexander.deucher@amd.com	2026-05-21 16:29:45 +10:00
Dave Airlie	30afd245e2	Merge tag 'drm-intel-fixes-2026-05-20' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes - Fix joiner color pipeline selection [display] (Chaitanya Kumar Borah) - Fix readback for target_rr in Adaptive Sync SDP [dp] (Ankit Nautiyal) - Apply Intel DPCD workaround when SDP on prior line used [psr] (Jouni Högander) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Tvrtko Ursulin <tursulin@igalia.com> Link: https://patch.msgid.link/ag1hKBRKwwv9JOMW@linux	2026-05-21 11:50:28 +10:00
Dave Airlie	5b4a47dc54	Merge tag 'drm-msm-fixes-2026-05-17' of https://gitlab.freedesktop.org/drm/msm into drm-fixes Fixes for v7.1: Core: - Fixed bindings for SM8650, SM8750 and Eliza - Don't use UTS_RELEASE directly - Fix typo in clock-names property DPU: - Fixed CWB description on Kaanapali - Fixed scanline strides for YUV UBWC formats - Stopped DSI register dumping to access past the end of region DSI: - Fix dumping unaligned regions GPU: - Fix GMEM_BASE for a6xx gen3 - Fix userspace reachable crash on a2xx-a4xx - Fix sysprof_active for counter collection with IFPC enabled GPUs - Fix shrinker lockdep Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rob Clark <rob.clark@oss.qualcomm.com> Link: https://patch.msgid.link/CACSVV02cTK7h=d0uqanRE-cj35THDqFjqsTB_2zQV1Mcw77aNw@mail.gmail.com	2026-05-21 10:12:22 +10:00
Deepanshu Kartikey	9af1b6e175	drm/virtio: use uninterruptible resv lock for plane updates virtio_gpu_cursor_plane_update() and virtio_gpu_resource_flush() lock the framebuffer BO's dma_resv via virtio_gpu_array_lock_resv() and ignore its return value. The function can fail with -EINTR from dma_resv_lock_interruptible() (signal during lock wait) or with -ENOMEM from dma_resv_reserve_fences() (fence slot allocation), leaving the resv lock not held. The queue path then walks the object array and calls dma_resv_add_fence(), which requires the lock held; with lockdep enabled this trips dma_resv_assert_held(): WARNING: drivers/dma-buf/dma-resv.c:296 at dma_resv_add_fence+0x71e/0x840 Call Trace: virtio_gpu_array_add_fence virtio_gpu_queue_ctrl_sgs virtio_gpu_queue_fenced_ctrl_buffer virtio_gpu_cursor_plane_update drm_atomic_helper_commit_planes drm_atomic_helper_commit_tail commit_tail drm_atomic_helper_commit drm_atomic_commit drm_atomic_helper_update_plane __setplane_atomic drm_mode_cursor_universal drm_mode_cursor_common drm_mode_cursor_ioctl drm_ioctl __x64_sys_ioctl Beyond the WARN, mutating the dma_resv fence list without the lock races with concurrent readers/writers and can corrupt the list. Both call sites run inside the .atomic_update plane callback, which DRM atomic helpers do not allow to fail (by the time it runs, the commit has been signed off to userspace and there is no clean rollback path). Moving the lock acquisition to .prepare_fb was rejected because the broader lock scope deadlocks against other BO locking paths in the same atomic commit. Introduce virtio_gpu_lock_one_resv_uninterruptible() that uses dma_resv_lock() instead of dma_resv_lock_interruptible(). This eliminates the -EINTR failure mode -- the realistic syzbot trigger -- without extending the lock hold across the commit. The helper locks a single BO and rejects nents > 1 with -EINVAL; both fix sites lock exactly one BO. Use it from virtio_gpu_cursor_plane_update() and virtio_gpu_resource_flush(); check the return value to handle the remaining -ENOMEM case from dma_resv_reserve_fences() by freeing the objs and skipping the plane update for that frame. The framebuffer BOs touched here are not shared with other contexts and lock contention is expected to be brief, so the loss of signal-interruptibility is acceptable. Other callers of virtio_gpu_array_lock_resv() (the ioctl paths) continue to use the interruptible variant. The bug was reported by syzbot, triggered via fault injection (fail_nth) on the DRM_IOCTL_MODE_CURSOR path, which forces the -ENOMEM branch in dma_resv_reserve_fences(). Reported-by: syzbot+72bd3dd3a5d5f39a0271@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=72bd3dd3a5d5f39a0271 Fixes: `5cfd31c5b3` ("drm/virtio: fix virtio_gpu_cursor_plane_update().") Cc: stable@vger.kernel.org Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com> Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Link: https://patch.msgid.link/20260519082247.34470-1-kartikey406@gmail.com	2026-05-20 18:12:11 +03:00
Christian König	b6fe4ff340	drm/amdgpu: fix handling in amdgpu_userq_create Well mostly the same issues the other code had as well: 1. Memory allocation while holding the userq_mutex lock is forbidden! 2. Things were created/started/published in the wrong order. 3. The reset lock was taken in the wrong order and seems to be unecessary in the first place. 4. Error messages on invalid input parameters can spam the logs. 5. Error messages on memory allocation failures are usually superflous as well. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Prike Liang <Prike.Liang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 89e50de5654dbe7a137e03d78629542e17ba7202)	2026-05-19 12:25:32 -04:00
Vitaliy Triang3l Kuzmin	d093c01d30	drm/radeon/evergreen_cs: Add missing NULL prefix check in surface check 'evergreen_surface_check' is called with a NULL warning prefix when handling potentially recoverable issues or just to compute the alignment requirements, and 'evergreen_surface_check' is called again in case of failure (with the correct prefix, as opposed to NULL), therefore, the initial check must not print a warning, because the surface may be accepted successfully after having been corrected, however if it isn't, the final check will print the warning anyway. The surface check functions specific to array modes already implement this behavior, but the 'evergreen_surface_check' function itself doesn't. This is also supposed to fix the "'%s' directive argument is null [-Werror=format-overflow=]" compiler warning. Fixes: `285484e2d5` ("drm/radeon: add support for evergreen/ni tiling informations v11") Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Vitaliy Triang3l Kuzmin <ml@triang3l.ru> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit e20ea411c99f6968af35fd03e9ee21f70d799144)	2026-05-19 12:16:16 -04:00
Sunil Khatri	b6a28b77b8	drm/amdgpu: userq_va_mapped should remain true once done Multiple queues needs these bo_va objects belonging to the same uq_mgr. So once they are mapped lets not unmap them as at any point of time any of the queues might be using it. Also userq_va_mapped should be a boolean than atomic. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 5c02889ea22575c3bcfdf212e65fac316cbc6c6a)	2026-05-19 12:15:49 -04:00
Ce Sun	cd7cfcdb4d	drm/amdgpu: avoid integer overflow in VA range check The original addition operation in 64-bit unsigned type may encounter overflow situations. To prevent such issues and safely reject invalid inputs, the check_add_overflow() function is used. Signed-off-by: Ce Sun <cesun102@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit cc768f4dd0bb9083c813683eeec44fc23921f771)	2026-05-19 12:15:41 -04:00
Xiang Liu	893fea60f8	drm/amd/ras: Fix UMC error address allocation leak amdgpu_umc_handle_bad_pages() allocates err_data->err_addr before querying UMC error information. In the direct and firmware query paths, the pointer is reassigned to a fresh allocation before the original buffer is released, so the initial allocation is leaked on each handled event. Free the existing buffer before replacing it in those query paths so the function exit cleanup only owns the active allocation. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 911b1bdd22c3712a22b60fcc58f7b9f2d07b0803)	2026-05-19 12:15:24 -04:00
Yifan Zhang	353f7430d1	drm/amdgpu: unmap all user mappings of framebuffer and doorbell before mode1 reset During Mode 1 reset, the ASIC undergoes a reset cycle and becomes temporarily inaccessible via PCIe. Any attempt to access framebuffer or MMIO registers during this window can result in uncompleted PCIe transactions, leading to NMI panics or system hangs. To prevent this, Unmap all of the applications mappings of the framebuffer and doorbell BARs before mode1 reset. Also prevent new mappings from coming in during the reset process. v2: remove inode in kfd_dev (Christian) v3: correct unmap offset (Felix), remove prevent new mappings part to avoid deadlock (Christian) Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 70cadefcc6160c575b04f763ada34c20e868d577)	2026-05-19 12:14:55 -04:00
Harry Wentland	6c92f6d960	drm/amd/display: Validate payload length and link_index in dc_process_dmub_aux_transfer_async [Why&How] dc_process_dmub_aux_transfer_async() copies payload->length bytes into a 16-byte stack buffer (dpaux.data[16]) guarded only by an ASSERT(), which is a no-op in release builds. If a caller ever passes length > 16 this results in a stack buffer overflow via memcpy. Additionally, link_index is used to dereference dc->links[] without bounds checking against dc->link_count, risking an out-of-bounds access. Replace the ASSERT with a hard runtime check that returns false when payload->length exceeds the destination buffer size, and add a bounds check for link_index before it is used. Assisted-by: GitHub Copilot:Claude claude-4-opus Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit ba4caa9fecdf7a38f98c878ad05a8a64148b6881) Cc: stable@vger.kernel.org	2026-05-19 12:13:56 -04:00
Harry Wentland	86d2b20644	drm/amd/display: Validate GPIO pin LUT table size before iterating [Why&How] The GPIO pin table parsers in get_gpio_i2c_info() and bios_parser_get_gpio_pin_info() derive an element count from the VBIOS table_header.structuresize field, then iterate over gpio_pin[] entries. However, GET_IMAGE() only validates that the table header itself fits within the BIOS image. If the VBIOS reports a structuresize larger than the actual mapped data, the loop reads past the end of the BIOS image, causing an out-of-bounds read. Fix this by calling bios_get_image() to validate that the full claimed structuresize is accessible within the BIOS image before entering the loop in both functions. Assisted-by: GitHub Copilot:claude-opus-4-6 Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit ba5e95b43b773ae1bf1f66ee6b31eb774e65afe3) Cc: stable@vger.kernel.org	2026-05-19 12:13:29 -04:00
Harry Wentland	cd86529ec6	drm/amd/display: Fix integer overflow in bios_get_image() [Why&How] The bounds check in bios_get_image() computes 'offset + size' using unsigned 32-bit arithmetic before comparing against bios_size. If a VBIOS image contains a near-UINT32_MAX offset the addition wraps to a small value, the comparison passes, and the function returns a wild pointer past the VBIOS mapping. Additionally, the comparison uses '<' (strict), which incorrectly rejects the valid exact-fit case where offset + size == bios_size. Fix both issues by restructuring the check to avoid the addition entirely: first reject if offset alone exceeds bios_size, then check size against the remaining space (bios_size - offset). This eliminates the overflow and correctly permits exact-fit accesses. Assisted-by: GitHub Copilot:claude-opus-4.6 Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit d40fb392af659c4a02b560319f226842f6ec1a95) Cc: stable@vger.kernel.org	2026-05-19 12:13:07 -04:00

1 2 3 4 5 ...

124636 Commits