linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-23 02:05:46 -04:00

Author	SHA1	Message	Date
YiPeng Chai	fa0b203cd9	drm/amd/ras: Add amdgpu ras management function. Add amdgpu system configuration parameters and functions needed by rascore. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:31 -04:00
YiPeng Chai	e221ac6f42	drm/amd/ras: Amdgpu preprocesses ras interrupts Amdgpu preprocesses ras interrupts. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:30 -04:00
YiPeng Chai	ffdab7f4e5	drm/amd/ras: Add amdgpu ras system functions Add amdgpu ras system functions. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:30 -04:00
YiPeng Chai	b658fadf1c	drm/amd/ras: Amdgpu handle ras ioctl command Amdgpu handle ras ioctl command. V2: Remove non-standard device information. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:30 -04:00
YiPeng Chai	764e868928	drm/amd/ras: Add amdgpu eeprom i2c configuration function Add amdgpu eeprom i2c configuration function. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:30 -04:00
YiPeng Chai	585fe8f3b3	drm/amd/ras: Add amdgpu mp1 v13_0 configuration function Add amdgpu mp1 v13_0 configuration function. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:30 -04:00
YiPeng Chai	960cc53ca8	drm/amd/ras: Add amdgpu nbio v7_9 configuration function Add amdgpu nbio v7_9 configuration function. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:30 -04:00
YiPeng Chai	cef10272e7	drm/amd/ras: Add files to ras core Makefile Add files to ras core Makefile. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:30 -04:00
YiPeng Chai	13c91b5b43	drm/amd/ras: Add rascore unified interface function 1. Complete the initialization call of all sub-functions. 2. Export common interfaces. V2: Remove the use of typedef to define function pointer. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:30 -04:00
YiPeng Chai	54ad42c23d	drm/amd/ras: Add cper conversion function Add cper conversion function. V3: Change commit message and update the calling function. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:30 -04:00
YiPeng Chai	0ec9ed84fb	drm/amd/ras: Use ring buffer to record ras ecc data Use ring buffer to record ras ecc data. V3: Change commit message and rename the file and function names. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:30 -04:00
YiPeng Chai	ea61341b90	drm/amd/ras: Add thread to handle ras events Add thread to handle ras events. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:30 -04:00
YiPeng Chai	19030244e1	drm/amd/ras: Add ras ioctl command handler Add ras ioctl command handler. V2: Remove ras global device list. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:29 -04:00
YiPeng Chai	c49ef01183	drm/amd/ras: Add psp ras common functions Add psp ras common functions. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:29 -04:00
YiPeng Chai	9f3083dc9f	drm/amd/ras: Add psp v13_0 ras functions Add psp v13_0 ras functions. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:29 -04:00
YiPeng Chai	5c3be5defc	drm/amd/ras: Add eeprom ras functions Add eeprom ras functions. V5: Remove duplicate data structure definition. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:29 -04:00
YiPeng Chai	a8f2352a41	drm/amd/ras: Add gfx common ras functions Add gfx common ras functions. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:29 -04:00
YiPeng Chai	4b23ebf7a0	drm/amd/ras: Add gfx v9_0 ras functions Add gfx v9_0 ras functions. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:29 -04:00
YiPeng Chai	7a3f9c0992	drm/amd/ras: Add umc common ras functions Add umc common ras functions. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:29 -04:00
YiPeng Chai	8bd7fe95a4	drm/amd/ras: Add umc v12_0 ras functions Add umc v12_0 ras functions. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:29 -04:00
YiPeng Chai	df2d8574c5	drm/amd/ras: Add nbio common ras functions Add nbio common ras functions. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:29 -04:00
YiPeng Chai	fa4fe20f45	drm/amd/ras: Add nbio v7_9 ras functions Add nbio v7_9 ras functions. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:29 -04:00
YiPeng Chai	adf0e0e089	drm/amd/ras: Add mp1 common ras functions Add mp1 common ras functions. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:28 -04:00
YiPeng Chai	71abe27a9a	drm/amd/ras: Add mp1 v13_0 ras functions Add mp1 v13_0 ras functions. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:28 -04:00
YiPeng Chai	88e379e5b8	drm/amd/ras: Add aca common ras functions Add aca common ras functions: 1. Aca hw init/fini. 2. Get ecc count of each ras block. 3. Update query ecc count from mp1. 4. Clear ras block ecc count. V3: Update the calling function. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:28 -04:00
YiPeng Chai	fd98319f73	drm/amd/ras: Add ras aca parser v1.0 Add ras aca parser v1.0. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:28 -04:00
Sunil Khatri	b1dd0db1c6	drm/amdgpu: clean up amdgpu hmm range functions Clean up the amdgpu hmm range functions for clearer definition of each. a. Split amdgpu_ttm_tt_get_user_pages_done into two: 1. amdgpu_hmm_range_valid: To check if the user pages are valid and update seq num 2. amdgpu_hmm_range_free: Clean up the hmm range and pfn memory. b. amdgpu_ttm_tt_get_user_pages_done and amdgpu_ttm_tt_discard_user_pages are similar function so remove discard and directly use amdgpu_hmm_range_free to clean up the hmm range and pfn memory. Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:28 -04:00
Sunil Khatri	e095b55155	drm/amdgpu: use user provided hmm_range buffer in amdgpu_ttm_tt_get_user_pages update the amdgpu_ttm_tt_get_user_pages and all dependent function along with it callers to use a user allocated hmm_range buffer instead hmm layer allocates the buffer. This is a need to get hmm_range pointers easily accessible without accessing the bo and that is a requirement for the userqueue to lock the userptrs effectively. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:28 -04:00
Jonathan Kim	079ae5118e	drm/amdkfd: fix suspend/resume all calls in mes based eviction path Suspend/resume all gangs should be done with the device lock is held. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:28 -04:00
Jonathan Kim	277bb0f83e	drm/amdgpu: enable suspend/resume all for gfx 12 Suspend/resume all gangs has been available for GFX12 for a while now so enable it. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:28 -04:00
Jonathan Kim	0ef930e1fa	drm/amdgpu: fix hung reset queue array memory allocation By design the MES will return an array result that is twice the number of hung doorbells it can report. i.e. if up k reported doorbells are supported, then the second half of the array, also of length k, holds the HQD information (type/queue/pipe) where queue 1 corresponds to index 0 and k, queue 2 corresponds to index 1 and k + 1 etc ... The driver will use the HDQ info to target queue/pipe reset for hardware scheduled user compute queues. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:28 -04:00
Jonathan Kim	8745ca5efb	drm/amdgpu: fix initialization of doorbell array for detect and hang Initialized doorbells should be set to invalid rather than 0 to prevent driver from over counting hung doorbells since it checks against the invalid value to begin with. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:28 -04:00
Jonathan Kim	d0de79f66a	drm/amdgpu: fix gfx12 mes packet status return check GFX12 MES uses low 32 bits of status return for success (1 or 0) and high bits for debug information if low bits are 0. GFX11 MES doesn't do this so checking full 64-bit status return for 1 or 0 is still valid. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2025-10-13 14:14:16 -04:00
Jesse.Zhang	883f309add	drm/amdgpu: Fix NULL pointer dereference in VRAM logic for APU devices Previously, APU platforms (and other scenarios with uninitialized VRAM managers) triggered a NULL pointer dereference in `ttm_resource_manager_usage()`. The root cause is not that the `struct ttm_resource_manager man` pointer itself is NULL, but that `man->bdev` (the backing device pointer within the manager) remains uninitialized (NULL) on APUs—since APUs lack dedicated VRAM and do not fully set up VRAM manager structures. When `ttm_resource_manager_usage()` attempts to acquire `man->bdev->lru_lock`, it dereferences the NULL `man->bdev`, leading to a kernel OOPS. 1. amdgpu_cs.c: Extend the existing bandwidth control check in `amdgpu_cs_get_threshold_for_moves()` to include a check for `ttm_resource_manager_used()`. If the manager is not used (uninitialized `bdev`), return 0 for migration thresholds immediately—skipping VRAM-specific logic that would trigger the NULL dereference. 2. amdgpu_kms.c: Update the `AMDGPU_INFO_VRAM_USAGE` ioctl and memory info reporting to use a conditional: if the manager is used, return the real VRAM usage; otherwise, return 0. This avoids accessing `man->bdev` when it is NULL. 3. amdgpu_virt.c*: Modify the vf2pf (virtual function to physical function) data write path. Use `ttm_resource_manager_used()` to check validity: if the manager is usable, calculate `fb_usage` from VRAM usage; otherwise, set `fb_usage` to 0 (APUs have no discrete framebuffer to report). This approach is more robust than APU-specific checks because it: - Works for all scenarios where the VRAM manager is uninitialized (not just APUs), - Aligns with TTM's design by using its native helper function, - Preserves correct behavior for discrete GPUs (which have fully initialized `man->bdev` and pass the `ttm_resource_manager_used()` check). v4: use ttm_resource_manager_used(&adev->mman.vram_mgr.manager) instead of checking the adev->gmc.is_app_apu flag (Christian) Reviewed-by: Christian König <christian.koenig@amd.com> Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:15 -04:00
Christian König	33cc891b56	drm/amdgpu: hide VRAM sysfs attributes on GPUs without VRAM Otherwise accessing them can cause a crash. Signed-off-by: Christian König <christian.koenig@amd.com> Tested-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:15 -04:00
Sathishkumar S	74de0eaa00	drm/amdgpu: fix bit shift logic BIT_ULL(n) sets nth bit, remove explicit shift and set the position Fixes: `a7a411e246` ("drm/amdgpu: fix shift-out-of-bounds in amdgpu_debugfs_jpeg_sched_mask_set") Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:15 -04:00
Timur Kristóf	6917112af2	drm/amd/powerplay: Fix CIK shutdown temperature Remove extra multiplication. CIK GPUs such as Hawaii appear to use PP_TABLE_V0 in which case the shutdown temperature is hardcoded in smu7_init_dpm_defaults and is already multiplied by 1000. The value was mistakenly multiplied another time by smu7_get_thermal_temperature_range. Fixes: `4ba082572a` ("drm/amd/powerplay: export the thermal ranges of VI asics (V2)") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1676 Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:15 -04:00
Gui-Dong Han	6df8e84aa6	drm/amdgpu: use atomic functions with memory barriers for vm fault info The atomic variable vm_fault_info_updated is used to synchronize access to adev->gmc.vm_fault_info between the interrupt handler and get_vm_fault_info(). The default atomic functions like atomic_set() and atomic_read() do not provide memory barriers. This allows for CPU instruction reordering, meaning the memory accesses to vm_fault_info and the vm_fault_info_updated flag are not guaranteed to occur in the intended order. This creates a race condition that can lead to inconsistent or stale data being used. The previous implementation, which used an explicit mb(), was incomplete and inefficient. It failed to account for all potential CPU reorderings, such as the access of vm_fault_info being reordered before the atomic_read of the flag. This approach is also more verbose and less performant than using the proper atomic functions with acquire/release semantics. Fix this by switching to atomic_set_release() and atomic_read_acquire(). These functions provide the necessary acquire and release semantics, which act as memory barriers to ensure the correct order of operations. It is also more efficient and idiomatic than using explicit full memory barriers. Fixes: `b97dfa27ef` ("drm/amdgpu: save vm fault information for amdkfd") Cc: stable@vger.kernel.org Signed-off-by: Gui-Dong Han <hanguidong02@gmail.com> Signed-off-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:15 -04:00
Alex Deucher	ff780f4f80	drm/amdgpu: set an error on all fences from a bad context When we backup ring contents to reemit after a queue reset, we don't backup ring contents from the bad context. When we signal the fences, we should set an error on those fences as well. v2: misc cleanups v3: add locking for fence error, fix comment (Christian) v4: fix wrap around, locking (Christian) Fixes: `77cc0da39c` ("drm/amdgpu: track ring state associated with a fence") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:15 -04:00
Alex Deucher	1f22fcb88b	drm/amdgpu: handle wrap around in reemit handling Compare the sequence numbers directly. Fixes: `77cc0da39c` ("drm/amdgpu: track ring state associated with a fence") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:15 -04:00
Alex Deucher	357d90be2c	drm/amdgpu: fix handling of harvesting for ip_discovery firmware Chips which use the IP discovery firmware loaded by the driver reported incorrect harvesting information in the ip discovery table in sysfs because the driver only uses the ip discovery firmware for populating sysfs and not for direct parsing for the driver itself as such, the fields that are used to print the harvesting info in sysfs report incorrect data for some IPs. Populate the relevant fields for this case as well. Fixes: `514678da56` ("drm/amdgpu/discovery: fix fw based ip discovery") Acked-by: Tom St Denis <tom.stdenis@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:15 -04:00
Christian König	8f74c70be5	drm/amdgpu: block CE CS if not explicitely allowed by module option The Constant Engine found on gfx6-gfx10 HW has been a notorious source of problems. RADV never used it in the first place, radeonsi only used it for a few releases around 2017 for gfx6-gfx9 before dropping support for it as well. While investigating another problem I just recently found that submitting to the CE seems to be completely broken on gfx9 for quite a while. Since nobody complained about that problem it most likely means that nobody is using any of the affected radeonsi versions on current Linux kernels any more. So to potentially phase out the support for the CE and eliminate another source of problems block submitting CE IBs unless it is enabled again using a debug flag. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:14 -04:00
Christian König	5d55ed19d4	drm/amdgpu: remove two invalid BUG_ON()s Those can be triggered trivially by userspace. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:14 -04:00
Timur Kristóf	7bdd91abf0	drm/amd: Disable ASPM on SI Enabling ASPM causes randoms hangs on Tahiti and Oland on Zen4. It's unclear if this is a platform-specific or GPU-specific issue. Disable ASPM on SI for the time being. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:14 -04:00
Timur Kristóf	5c05bcf6ae	drm/amd/pm: Disable MCLK switching on SI at high pixel clocks On various SI GPUs, a flickering can be observed near the bottom edge of the screen when using a single 4K 60Hz monitor over DP. Disabling MCLK switching works around this problem. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:14 -04:00
Matthew Schwartz	9858ea4c29	Revert "drm/amd/display: Only restore backlight after amdgpu_dm_init or dm_resume" This fix regressed the original issue that commit `7875afafba` ("drm/amd/display: Fix brightness level not retained over reboot") solved, so revert it until a different approach to solve the regression that it caused with AMD_PRIVATE_COLOR is found. Fixes: `a490c8d77d` ("drm/amd/display: Only restore backlight after amdgpu_dm_init or dm_resume") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4620 Cc: stable@vger.kernel.org Signed-off-by: Matthew Schwartz <matthew.schwartz@linux.dev> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:14 -04:00
Linus Torvalds	284fc30e66	Merge tag 'drm-next-2025-10-11-1' of https://gitlab.freedesktop.org/drm/kernel Pull more drm fixes from Dave Airlie: "Just the follow up fixes for rc1 from the next branch, amdgpu and xe mostly with a single v3d fix in there. amdgpu: - DC DCE6 fixes - GPU reset fixes - Secure diplay messaging cleanup - MES fix - GPUVM locking fixes - PMFW messaging cleanup - PCI US/DS switch handling fix - VCN queue reset fix - DC FPU handling fix - DCN 3.5 fix - DC mirroring fix amdkfd: - Fix kfd process ref leak - mmap write lock handling fix - Fix comments in IOCTL xe: - Fix build with clang 16 - Fix handling of invalid configfs syntax usage and spell out the expected syntax in the documentation - Do not try late bind firmware when running as VF since it shouldn't handle firmware loading - Fix idle assertion for local BOs - Fix uninitialized variable for late binding - Do not require perfmon_capable to expose free memory at page granularity. Handle it like other drm drivers do - Fix lock handling on suspend error path - Fix I2C controller resume after S3 v3d: - fix fence locking" * tag 'drm-next-2025-10-11-1' of https://gitlab.freedesktop.org/drm/kernel: (34 commits) drm/amd/display: Incorrect Mirror Cositing drm/amd/display: Enable Dynamic DTBCLK Switch drm/amdgpu: Report individual reset error drm/amdgpu: partially revert "revert to old status lock handling v3" drm/amd/display: Fix unsafe uses of kernel mode FPU drm/amd/pm: Disable VCN queue reset on SMU v13.0.6 due to regression drm/amdgpu: Fix general protection fault in amdgpu_vm_bo_reset_state_machine drm/amdgpu: Check swus/ds for switch state save drm/amdkfd: Fix two comments in kfd_ioctl.h drm/amd/pm: Avoid interface mismatch messaging drm/amdgpu: Merge amdgpu_vm_set_pasid into amdgpu_vm_init drm/amd/amdgpu: Fix the mes version that support inv_tlbs drm/amd: Check whether secure display TA loaded successfully drm/amdkfd: Fix mmap write lock not release drm/amdkfd: Fix kfd process ref leaking when userptr unmapping drm/amdgpu: Fix for GPU reset being blocked by KIQ I/O. drm/amd/display: Disable scaling on DCE6 for now drm/amd/display: Properly disable scaling on DCE6 drm/amd/display: Properly clear SCL__FILTER_CONTROL on DCE6 drm/amd/display: Add missing DCE6 SCL_HORZ_FILTER_INIT SRIs ...	2025-10-10 14:02:14 -07:00
Linus Torvalds	1e5d41b981	Merge tag 'drm-fixes-2025-10-11' of https://gitlab.freedesktop.org/drm/kernel Pull drm fixes from Dave Airlie: "Some fixes leftover from our fixes branch, just nouveau and vmwgfx: nouveau: - Return errno code from TTM move helper vmwgfx: - Fix null-ptr access in cursor code - Fix UAF in validation - Use correct iterator in validation" * tag 'drm-fixes-2025-10-11' of https://gitlab.freedesktop.org/drm/kernel: drm/nouveau: fix bad ret code in nouveau_bo_move_prep drm/vmwgfx: Fix copy-paste typo in validation drm/vmwgfx: Fix Use-after-free in validation drm/vmwgfx: Fix a null-ptr access in the cursor snooper	2025-10-10 13:59:38 -07:00
Dave Airlie	5ca5f00a16	Merge tag 'drm-misc-fixes-2025-10-09' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes Short summary of fixes pull: nouveau: - Return errno code from TTM move helper vmwgfx: - Fix null-ptr access in cursor code - Fix UAF in validation - Use correct iterator in validation Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://lore.kernel.org/r/20251009120004.GA17570@linux.fritz.box	2025-10-11 06:17:13 +10:00
Dave Airlie	c4b6ddcf01	Merge tag 'amd-drm-next-6.18-2025-10-09' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.18-2025-10-09: amdgpu: - DC DCE6 fixes - GPU reset fixes - Secure diplay messaging cleanup - MES fix - GPUVM locking fixes - PMFW messaging cleanup - PCI US/DS switch handling fix - VCN queue reset fix - DC FPU handling fix - DCN 3.5 fix - DC mirroring fix amdkfd: - Fix kfd process ref leak - mmap write lock handling fix - Fix comments in IOCTL Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20251009162915.981503-1-alexander.deucher@amd.com	2025-10-10 06:57:56 +10:00

1 2 3 4 5 ...

118453 Commits