linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-04-03 05:15:53 -04:00

Author	SHA1	Message	Date
Jonathan Kim	079ae5118e	drm/amdkfd: fix suspend/resume all calls in mes based eviction path Suspend/resume all gangs should be done with the device lock is held. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:28 -04:00
Jonathan Kim	277bb0f83e	drm/amdgpu: enable suspend/resume all for gfx 12 Suspend/resume all gangs has been available for GFX12 for a while now so enable it. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:28 -04:00
Jonathan Kim	0ef930e1fa	drm/amdgpu: fix hung reset queue array memory allocation By design the MES will return an array result that is twice the number of hung doorbells it can report. i.e. if up k reported doorbells are supported, then the second half of the array, also of length k, holds the HQD information (type/queue/pipe) where queue 1 corresponds to index 0 and k, queue 2 corresponds to index 1 and k + 1 etc ... The driver will use the HDQ info to target queue/pipe reset for hardware scheduled user compute queues. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:28 -04:00
Jonathan Kim	8745ca5efb	drm/amdgpu: fix initialization of doorbell array for detect and hang Initialized doorbells should be set to invalid rather than 0 to prevent driver from over counting hung doorbells since it checks against the invalid value to begin with. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:28 -04:00
Jonathan Kim	d0de79f66a	drm/amdgpu: fix gfx12 mes packet status return check GFX12 MES uses low 32 bits of status return for success (1 or 0) and high bits for debug information if low bits are 0. GFX11 MES doesn't do this so checking full 64-bit status return for 1 or 0 is still valid. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2025-10-13 14:14:16 -04:00
Jesse.Zhang	883f309add	drm/amdgpu: Fix NULL pointer dereference in VRAM logic for APU devices Previously, APU platforms (and other scenarios with uninitialized VRAM managers) triggered a NULL pointer dereference in `ttm_resource_manager_usage()`. The root cause is not that the `struct ttm_resource_manager man` pointer itself is NULL, but that `man->bdev` (the backing device pointer within the manager) remains uninitialized (NULL) on APUs—since APUs lack dedicated VRAM and do not fully set up VRAM manager structures. When `ttm_resource_manager_usage()` attempts to acquire `man->bdev->lru_lock`, it dereferences the NULL `man->bdev`, leading to a kernel OOPS. 1. amdgpu_cs.c: Extend the existing bandwidth control check in `amdgpu_cs_get_threshold_for_moves()` to include a check for `ttm_resource_manager_used()`. If the manager is not used (uninitialized `bdev`), return 0 for migration thresholds immediately—skipping VRAM-specific logic that would trigger the NULL dereference. 2. amdgpu_kms.c: Update the `AMDGPU_INFO_VRAM_USAGE` ioctl and memory info reporting to use a conditional: if the manager is used, return the real VRAM usage; otherwise, return 0. This avoids accessing `man->bdev` when it is NULL. 3. amdgpu_virt.c*: Modify the vf2pf (virtual function to physical function) data write path. Use `ttm_resource_manager_used()` to check validity: if the manager is usable, calculate `fb_usage` from VRAM usage; otherwise, set `fb_usage` to 0 (APUs have no discrete framebuffer to report). This approach is more robust than APU-specific checks because it: - Works for all scenarios where the VRAM manager is uninitialized (not just APUs), - Aligns with TTM's design by using its native helper function, - Preserves correct behavior for discrete GPUs (which have fully initialized `man->bdev` and pass the `ttm_resource_manager_used()` check). v4: use ttm_resource_manager_used(&adev->mman.vram_mgr.manager) instead of checking the adev->gmc.is_app_apu flag (Christian) Reviewed-by: Christian König <christian.koenig@amd.com> Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:15 -04:00
Christian König	33cc891b56	drm/amdgpu: hide VRAM sysfs attributes on GPUs without VRAM Otherwise accessing them can cause a crash. Signed-off-by: Christian König <christian.koenig@amd.com> Tested-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:15 -04:00
Sathishkumar S	74de0eaa00	drm/amdgpu: fix bit shift logic BIT_ULL(n) sets nth bit, remove explicit shift and set the position Fixes: `a7a411e246` ("drm/amdgpu: fix shift-out-of-bounds in amdgpu_debugfs_jpeg_sched_mask_set") Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:15 -04:00
Timur Kristóf	6917112af2	drm/amd/powerplay: Fix CIK shutdown temperature Remove extra multiplication. CIK GPUs such as Hawaii appear to use PP_TABLE_V0 in which case the shutdown temperature is hardcoded in smu7_init_dpm_defaults and is already multiplied by 1000. The value was mistakenly multiplied another time by smu7_get_thermal_temperature_range. Fixes: `4ba082572a` ("drm/amd/powerplay: export the thermal ranges of VI asics (V2)") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1676 Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:15 -04:00
Gui-Dong Han	6df8e84aa6	drm/amdgpu: use atomic functions with memory barriers for vm fault info The atomic variable vm_fault_info_updated is used to synchronize access to adev->gmc.vm_fault_info between the interrupt handler and get_vm_fault_info(). The default atomic functions like atomic_set() and atomic_read() do not provide memory barriers. This allows for CPU instruction reordering, meaning the memory accesses to vm_fault_info and the vm_fault_info_updated flag are not guaranteed to occur in the intended order. This creates a race condition that can lead to inconsistent or stale data being used. The previous implementation, which used an explicit mb(), was incomplete and inefficient. It failed to account for all potential CPU reorderings, such as the access of vm_fault_info being reordered before the atomic_read of the flag. This approach is also more verbose and less performant than using the proper atomic functions with acquire/release semantics. Fix this by switching to atomic_set_release() and atomic_read_acquire(). These functions provide the necessary acquire and release semantics, which act as memory barriers to ensure the correct order of operations. It is also more efficient and idiomatic than using explicit full memory barriers. Fixes: `b97dfa27ef` ("drm/amdgpu: save vm fault information for amdkfd") Cc: stable@vger.kernel.org Signed-off-by: Gui-Dong Han <hanguidong02@gmail.com> Signed-off-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:15 -04:00
Alex Deucher	ff780f4f80	drm/amdgpu: set an error on all fences from a bad context When we backup ring contents to reemit after a queue reset, we don't backup ring contents from the bad context. When we signal the fences, we should set an error on those fences as well. v2: misc cleanups v3: add locking for fence error, fix comment (Christian) v4: fix wrap around, locking (Christian) Fixes: `77cc0da39c` ("drm/amdgpu: track ring state associated with a fence") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:15 -04:00
Alex Deucher	1f22fcb88b	drm/amdgpu: handle wrap around in reemit handling Compare the sequence numbers directly. Fixes: `77cc0da39c` ("drm/amdgpu: track ring state associated with a fence") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:15 -04:00
Alex Deucher	357d90be2c	drm/amdgpu: fix handling of harvesting for ip_discovery firmware Chips which use the IP discovery firmware loaded by the driver reported incorrect harvesting information in the ip discovery table in sysfs because the driver only uses the ip discovery firmware for populating sysfs and not for direct parsing for the driver itself as such, the fields that are used to print the harvesting info in sysfs report incorrect data for some IPs. Populate the relevant fields for this case as well. Fixes: `514678da56` ("drm/amdgpu/discovery: fix fw based ip discovery") Acked-by: Tom St Denis <tom.stdenis@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:15 -04:00
Christian König	8f74c70be5	drm/amdgpu: block CE CS if not explicitely allowed by module option The Constant Engine found on gfx6-gfx10 HW has been a notorious source of problems. RADV never used it in the first place, radeonsi only used it for a few releases around 2017 for gfx6-gfx9 before dropping support for it as well. While investigating another problem I just recently found that submitting to the CE seems to be completely broken on gfx9 for quite a while. Since nobody complained about that problem it most likely means that nobody is using any of the affected radeonsi versions on current Linux kernels any more. So to potentially phase out the support for the CE and eliminate another source of problems block submitting CE IBs unless it is enabled again using a debug flag. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:14 -04:00
Christian König	5d55ed19d4	drm/amdgpu: remove two invalid BUG_ON()s Those can be triggered trivially by userspace. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:14 -04:00
Timur Kristóf	7bdd91abf0	drm/amd: Disable ASPM on SI Enabling ASPM causes randoms hangs on Tahiti and Oland on Zen4. It's unclear if this is a platform-specific or GPU-specific issue. Disable ASPM on SI for the time being. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:14 -04:00
Timur Kristóf	5c05bcf6ae	drm/amd/pm: Disable MCLK switching on SI at high pixel clocks On various SI GPUs, a flickering can be observed near the bottom edge of the screen when using a single 4K 60Hz monitor over DP. Disabling MCLK switching works around this problem. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:14 -04:00
Matthew Schwartz	9858ea4c29	Revert "drm/amd/display: Only restore backlight after amdgpu_dm_init or dm_resume" This fix regressed the original issue that commit `7875afafba` ("drm/amd/display: Fix brightness level not retained over reboot") solved, so revert it until a different approach to solve the regression that it caused with AMD_PRIVATE_COLOR is found. Fixes: `a490c8d77d` ("drm/amd/display: Only restore backlight after amdgpu_dm_init or dm_resume") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4620 Cc: stable@vger.kernel.org Signed-off-by: Matthew Schwartz <matthew.schwartz@linux.dev> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:14 -04:00
Linus Torvalds	284fc30e66	Merge tag 'drm-next-2025-10-11-1' of https://gitlab.freedesktop.org/drm/kernel Pull more drm fixes from Dave Airlie: "Just the follow up fixes for rc1 from the next branch, amdgpu and xe mostly with a single v3d fix in there. amdgpu: - DC DCE6 fixes - GPU reset fixes - Secure diplay messaging cleanup - MES fix - GPUVM locking fixes - PMFW messaging cleanup - PCI US/DS switch handling fix - VCN queue reset fix - DC FPU handling fix - DCN 3.5 fix - DC mirroring fix amdkfd: - Fix kfd process ref leak - mmap write lock handling fix - Fix comments in IOCTL xe: - Fix build with clang 16 - Fix handling of invalid configfs syntax usage and spell out the expected syntax in the documentation - Do not try late bind firmware when running as VF since it shouldn't handle firmware loading - Fix idle assertion for local BOs - Fix uninitialized variable for late binding - Do not require perfmon_capable to expose free memory at page granularity. Handle it like other drm drivers do - Fix lock handling on suspend error path - Fix I2C controller resume after S3 v3d: - fix fence locking" * tag 'drm-next-2025-10-11-1' of https://gitlab.freedesktop.org/drm/kernel: (34 commits) drm/amd/display: Incorrect Mirror Cositing drm/amd/display: Enable Dynamic DTBCLK Switch drm/amdgpu: Report individual reset error drm/amdgpu: partially revert "revert to old status lock handling v3" drm/amd/display: Fix unsafe uses of kernel mode FPU drm/amd/pm: Disable VCN queue reset on SMU v13.0.6 due to regression drm/amdgpu: Fix general protection fault in amdgpu_vm_bo_reset_state_machine drm/amdgpu: Check swus/ds for switch state save drm/amdkfd: Fix two comments in kfd_ioctl.h drm/amd/pm: Avoid interface mismatch messaging drm/amdgpu: Merge amdgpu_vm_set_pasid into amdgpu_vm_init drm/amd/amdgpu: Fix the mes version that support inv_tlbs drm/amd: Check whether secure display TA loaded successfully drm/amdkfd: Fix mmap write lock not release drm/amdkfd: Fix kfd process ref leaking when userptr unmapping drm/amdgpu: Fix for GPU reset being blocked by KIQ I/O. drm/amd/display: Disable scaling on DCE6 for now drm/amd/display: Properly disable scaling on DCE6 drm/amd/display: Properly clear SCL__FILTER_CONTROL on DCE6 drm/amd/display: Add missing DCE6 SCL_HORZ_FILTER_INIT SRIs ...	2025-10-10 14:02:14 -07:00
Jesse Agate	d07e142641	drm/amd/display: Incorrect Mirror Cositing [WHY] hinit/vinit are incorrect in the case of mirroring. [HOW] Cositing sign must be flipped when image is mirrored in the vertical or horizontal direction. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Reviewed-by: Samson Tam <samson.tam@amd.com> Signed-off-by: Jesse Agate <jesse.agate@amd.com> Signed-off-by: Brendan Leder <breleder@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-07 14:09:20 -04:00
Fangzhi Zuo	5949e7c489	drm/amd/display: Enable Dynamic DTBCLK Switch [WHAT] Since dcn35, DTBCLK can be disabled when no DP2 sink connected for power saving purpose. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-07 14:09:19 -04:00
Lijo Lazar	2e97663760	drm/amdgpu: Report individual reset error If reinitialization of one of the GPUs fails after reset, it logs failure on all subsequent GPUs eventhough they have resumed successfully. A sample log where only device at 0000:95:00.0 had a failure - amdgpu 0000:15:00.0: amdgpu: GPU reset(19) succeeded! amdgpu 0000:65:00.0: amdgpu: GPU reset(19) succeeded! amdgpu 0000:75:00.0: amdgpu: GPU reset(19) succeeded! amdgpu 0000:85:00.0: amdgpu: GPU reset(19) succeeded! amdgpu 0000:95:00.0: amdgpu: GPU reset(19) failed amdgpu 0000:e5:00.0: amdgpu: GPU reset(19) failed amdgpu 0000:f5:00.0: amdgpu: GPU reset(19) failed amdgpu 0000:05:00.0: amdgpu: GPU reset(19) failed amdgpu 0000:15:00.0: amdgpu: GPU reset end with ret = -5 To avoid confusion, report the error for each device separately and return the first error as the overall result. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-07 14:09:19 -04:00
Christian König	a107aeb6a2	drm/amdgpu: partially revert "revert to old status lock handling v3" The CI systems are pointing out list corruptions, so we still need to fix something here. Keep the asserts, but revert the lock changes for now. Fixes: `59e4405e9e` ("drm/amdgpu: revert to old status lock handling v3") Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-07 14:09:19 -04:00
Ard Biesheuvel	ddbfac1528	drm/amd/display: Fix unsafe uses of kernel mode FPU The point of isolating code that uses kernel mode FPU in separate compilation units is to ensure that even implicit uses of, e.g., SIMD registers for spilling occur only in a context where this is permitted, i.e., from inside a kernel_fpu_begin/end block. This is important on arm64, which uses -mgeneral-regs-only to build all kernel code, with the exception of such compilation units where FP or SIMD registers are expected to be used. Given that the compiler may invent uses of FP/SIMD anywhere in such a unit, none of its code may be accessible from outside a kernel_fpu_begin/end block. This means that all callers into such compilation units must use the DC_FP start/end macros, which must not occur there themselves. For robustness, all functions with external linkage that reside there should call dc_assert_fp_enabled() to assert that the FPU context was set up correctly. Fix this for the DCN35, DCN351 and DCN36 implementations. Cc: Austin Zheng <austin.zheng@amd.com> Cc: Jun Lei <jun.lei@amd.com> Cc: Harry Wentland <harry.wentland@amd.com> Cc: Leo Li <sunpeng.li@amd.com> Cc: Rodrigo Siqueira <siqueira@igalia.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-07 14:09:19 -04:00
Jesse.Zhang	bd8acfcfce	drm/amd/pm: Disable VCN queue reset on SMU v13.0.6 due to regression Disable VCN reset capability for the program 4 as it's causing regressions. Fixes: `9d20f37a10` ("drm/amd/pm: Add VCN reset support for SMU v13.0.6") Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-07 14:09:19 -04:00
Jesse.Zhang	8d557eab3a	drm/amdgpu: Fix general protection fault in amdgpu_vm_bo_reset_state_machine After GPU reset with VRAM loss, a general protection fault occurs during user queue restoration when accessing vm_bo->vm after spinlock release in amdgpu_vm_bo_reset_state_machine. The root cause is that vm_bo points to the last entry from the list_for_each_entry loop, but this becomes invalid after the spinlock is released. Accessing vm_bo->vm at this point leads to memory corruption. Crash log shows: [ 326.981811] Oops: general protection fault, probably for non-canonical address 0x4156415741e58ac8: 0000 [#1] SMP NOPTI [ 326.981820] CPU: 13 UID: 0 PID: 1035 Comm: kworker/13:3 Tainted: G E 6.16.0+ #25 PREEMPT(voluntary) [ 326.981826] Tainted: [E]=UNSIGNED_MODULE [ 326.981827] Hardware name: Gigabyte Technology Co., Ltd. X870E AORUS PRO ICE/X870E AORUS PRO ICE, BIOS F3i 12/19/2024 [ 326.981831] Workqueue: events amdgpu_userq_restore_worker [amdgpu] [ 326.981999] RIP: 0010:amdgpu_vm_assert_locked+0x16/0x70 [amdgpu] [ 326.982094] Code: 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 48 85 ff 74 45 48 8b 87 80 03 00 00 48 85 c0 74 40 <48> 8b b8 80 01 00 00 48 85 ff 74 3b 8b 05 0c b7 0e f0 85 c0 75 05 [ 326.982098] RSP: 0018:ffffaa91c2a6bc20 EFLAGS: 00010206 [ 326.982100] RAX: 4156415741e58948 RBX: ffff9e8f013e8330 RCX: 0000000000000000 [ 326.982102] RDX: 0000000000000005 RSI: 000000001d254e88 RDI: ffffffffc144814a [ 326.982104] RBP: ffffaa91c2a6bc68 R08: 0000004c21a25674 R09: 0000000000000001 [ 326.982106] R10: 0000000000000001 R11: dccaf3f2f82863fc R12: ffff9e8f013e8000 [ 326.982108] R13: ffff9e8f013e8000 R14: 0000000000000000 R15: ffff9e8f09980000 [ 326.982110] FS: 0000000000000000(0000) GS:ffff9e9e79995000(0000) knlGS:0000000000000000 [ 326.982112] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 326.982114] CR2: 000055ed6c9caa80 CR3: 0000000797060000 CR4: 0000000000750ef0 [ 326.982116] PKRU: 55555554 Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-07 14:09:19 -04:00
Lijo Lazar	9b608fe948	drm/amdgpu: Check swus/ds for switch state save For saving switch state, check if the GPU is having SWUS/DS architecture. Otherwise, skip saving. Reported-by: Roman Elshin <roman.elshin@gmail.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4602 Fixes: `1dd2fa0e00` ("drm/amdgpu: Save and restore switch state") Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-07 14:09:19 -04:00
Lijo Lazar	4538a93bbb	drm/amd/pm: Avoid interface mismatch messaging PMFW interface version is not used by some IP implementations like SMU v13.0.6/12, instead rely on PMFW version checks. Avoid the log if interface version is not used. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-07 14:09:19 -04:00
Jesse.Zhang	b809ca91a5	drm/amdgpu: Merge amdgpu_vm_set_pasid into amdgpu_vm_init As KFD no longer uses a separate PASID, the global amdgpu_vm_set_pasid()function is no longer necessary. Merge its functionality directly intoamdgpu_vm_init() to simplify code flow and eliminate redundant locking. v2: remove superflous check adjust amdgpu_vm_fin and remove amdgpu_vm_set_pasid (Chritian) v3: drop amdgpu_vm_assert_locked (Chritian) Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4614 Fixes: `59e4405e9e` ("drm/amdgpu: revert to old status lock handling v3") Reviewed-by: Christian König <christian.koenig@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-07 14:09:07 -04:00
Shaoyun Liu	8dbac5cf8b	drm/amd/amdgpu: Fix the mes version that support inv_tlbs MES version 0x83 is not stable to use the inv_tlbs API. Defer it to 0x84 vertsion. Fixes: `85442bac84` ("drm/amd/amdgpu: Fix the mes version that support inv_tlbs") Signed-off-by: Shaoyun Liu <shaoyun.liu@amd.com> Reviewed-by: Michael Chen <michael.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-07 14:09:06 -04:00
Mario Limonciello	c760bcda83	drm/amd: Check whether secure display TA loaded successfully [Why] Not all renoir hardware supports secure display. If the TA is present but the feature isn't supported it will fail to load or send commands. This shows ERR messages to the user that make it seems like there is a problem. [How] Check the resp_status of the context to see if there was an error before trying to send any secure display commands. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1415 Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-07 14:09:06 -04:00
Philip Yang	7574f30337	drm/amdkfd: Fix mmap write lock not release If mmap write lock is taken while draining retry fault, mmap write lock is not released because svm_range_restore_pages calls mmap_read_unlock then returns. This causes deadlock and system hangs later because mmap read or write lock cannot be taken. Downgrade mmap write lock to read lock if draining retry fault fix this bug. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-07 14:09:06 -04:00
Philip Yang	58e6fc2fb9	drm/amdkfd: Fix kfd process ref leaking when userptr unmapping kfd_lookup_process_by_pid hold the kfd process reference to ensure it doesn't get destroyed while sending the segfault event to user space. Calling kfd_lookup_process_by_pid as function parameter leaks the kfd process refcount and miss the NULL pointer check if app process is already destroyed. Fixes: `2d274bf709` ("amd/amdkfd: Trigger segfault for early userptr unmmapping") Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-07 14:09:06 -04:00
Heng Zhou	0c67342885	drm/amdgpu: Fix for GPU reset being blocked by KIQ I/O. There is some probability that reset workqueue is blocked by KIQ I/O for 10+ seconds after gpu hangs. So we need to add a in_reset check during each KIQ register poll. Signed-off-by: Heng Zhou <Heng.Zhou@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-07 14:09:06 -04:00
Timur Kristóf	0e190a0446	drm/amd/display: Disable scaling on DCE6 for now Scaling doesn't work on DCE6 at the moment, the current register programming produces incorrect output when using fractional scaling (between 100-200%) on resolutions higher than 1080p. Disable it until we figure out how to program it properly. Fixes: `7c15fd86aa` ("drm/amd/display: dc/dce: add initial DCE6 support (v10)") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-07 14:09:06 -04:00
Timur Kristóf	a7dc87f344	drm/amd/display: Properly disable scaling on DCE6 SCL_SCALER_ENABLE can be used to enable/disable the scaler on DCE6. Program it to 0 when scaling isn't used, 1 when used. Additionally, clear some other registers when scaling is disabled and program the SCL_UPDATE register as recommended. This fixes visible glitches for users whose BIOS sets up a mode with scaling at boot, which DC was unable to clean up. Fixes: `b70aaf5586` ("drm/amd/display: dce_transform: add DCE6 specific macros,functions") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-07 14:09:06 -04:00
Timur Kristóf	c0aa7cf49d	drm/amd/display: Properly clear SCL_*_FILTER_CONTROL on DCE6 Previously, the code would set a bit field which didn't exist on DCE6 so it would be effectively a no-op. Fixes: `b70aaf5586` ("drm/amd/display: dce_transform: add DCE6 specific macros,functions") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-07 14:09:06 -04:00
Timur Kristóf	d60f9c45d1	drm/amd/display: Add missing DCE6 SCL_HORZ_FILTER_INIT* SRIs Without these, it's impossible to program these registers. Fixes: `102b2f587a` ("drm/amd/display: dce_transform: DCE6 Scaling Horizontal Filter Init (v2)") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-07 14:09:06 -04:00
Alex Deucher	507296328b	drm/amdgpu: Add additional DCE6 SCL registers Fixes: `102b2f587a` ("drm/amd/display: dce_transform: DCE6 Scaling Horizontal Filter Init (v2)") Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-07 14:09:06 -04:00
Linus Torvalds	58809f614e	Merge tag 'drm-next-2025-10-01' of https://gitlab.freedesktop.org/drm/kernel Pull drm updates from Dave Airlie: "cross-subsystem: - i2c-hid: Make elan touch controllers power on after panel is enabled - dt bindings for STM32MP25 SoC - pci vgaarb: use screen_info helpers - rust pin-init updates - add MEI driver for late binding firmware update/load uapi: - add ioctl for reassigning GEM handles - provide boot_display attribute on boot-up devices core: - document DRM_MODE_PAGE_FLIP_EVENT - add vendor specific recovery method to drm device wedged uevent gem: - Simplify gpuvm locking ttm: - add interface to populate buffers sched: - Fix race condition in trace code atomic: - Reallow no-op async page flips display: - dp: Fix command length video: - Improve pixel-format handling for struct screen_info rust: - drop Opaque<> from ioctl args - Alloc: - BorrowedPage type and AsPageIter traits - Implement Vmalloc::to_page() and VmallocPageIter - DMA/Scatterlist: - Add dma::DataDirection and type alias for dma_addr_t - Abstraction for struct scatterlist and sg_table - DRM: - simplify use of generics - add DriverFile type alias - drop Object::SIZE - Rust: - pin-init tree merge - Various methods for AsBytes and FromBytes traits gpuvm: - Support madvice in Xe driver gpusvm: - fix hmm_pfn_to_map_order usage in gpusvm bridge: - Improve and fix ref counting on bridge management - cdns-dsi: Various improvements to mode setting - Support Solomon SSD2825 plus DT bindings - Support Waveshare DSI2DPI plus DT bindings - Support Content Protection property - display-connector: Improve DP display detection - Add support for Radxa Ra620 plus DT bindings - adv7511: Provide SPD and HDMI infoframes - it6505: Replace crypto_shash with sha() - synopsys: Add support for DW DPTX Controller plus DT bindings - adv7511: Write full Audio infoframe - ite6263: Support vendor-specific infoframes - simple: Add support for Realtek RTD2171 DP-to-HDMI plus DT bindings panel: - panel-edp: Support mt8189 Chromebooks; Support BOE NV140WUM-N64; Support SHP LQ134Z1; Fixes - panel-simple: Support Olimex LCD-OLinuXino-5CTS plus DT bindings - Support Samsung AMS561RA01 - Support Hydis HV101HD1 plus DT bindings - ilitek-ili9881c: Refactor mode setting; Add support for Bestar BSD1218-A101KL68 LCD plus DT bindings - lvds: Add support for Ampire AMP19201200B5TZQW-T03 to DT bindings - edp: Add support for additonal mt8189 Chromebook panels - lvds: Add DT bindings for EDT ETML0700Z8DHA amdgpu: - add CRIU support for gem objects - RAS updates - VCN SRAM load fixes - EDID read fixes - eDP ALPM support - Documentation updates - Rework PTE flag generation - DCE6 fixes - VCN devcoredump cleanup - MMHUB client id fixes - VCN 5.0.1 RAS support - SMU 13.0.x updates - Expanded PCIe DPC support - Expanded VCN reset support - VPE per queue reset support - give kernel jobs unique id for tracing - pre-populate exported buffers - cyan skillfish updates - make vbios build number available in sysfs - userq updates - HDCP updates - support MMIO remap page as ttm pool - JPEG parser updates - DCE6 DC updates - use devm for i2c buses - GPUVM locking updates - Drop non-DC DCE11 code - improve fallback handling for pixel encoding amdkfd: - SVM/page migration fixes - debugfs fixes - add CRIO support for gem objects - SVM updates radeon: - use dev_warn_once in CS parsers xe: - add madvise interface - add DRM_IOCTL_XE_VM_QUERY_MEMORY_RANGE_ATTRS to query VMA count and memory attributes - drop L# bank mask reporting from media GT3 on Xe3+. - add SLPC power_profile sysfs interface - add configs attribs to add post/mid context-switch commands - handle firmware reported hardware errors notifying userspace with device wedged uevent - use same dir structure across sysfs/debugfs - cleanup and future proof vram region init - add G-states and PCI link states to debugfs - Add SRIOV support for CCS surfaces on Xe2+ - Enable SRIOV PF mode by default on supported platforms - move flush to common code - extended core workarounds for Xe2/3 - use DRM scheduler for delayed GT TLB invalidations - configs improvements and allow VF device enablement - prep work to expose mmio regions to userspace - VF migration support added - prepare GPU SVM for THP migration - start fixing XE_PAGE_SIZE vs PAGE_SIZE - add PSMI support for hw validation - resize VF bars to max possible size according to number of VFs - Ensure GT is in C0 during resume - pre-populate exported buffers - replace xe_hmm with gpusvm - add more SVM GT stats to debugfs - improve fake pci and WA kunnit handle for new platform testing - Test GuC to GuC comms to add debugging - use attribute groups to simplify sysfs registration - add Late Binding firmware code to interact with MEI i915: - apply multiple JSL/EHL/Gen7/Gen6 workarounds properly - protect against overflow in active_engine() - Use try_cmpxchg64() in __active_lookup() - include GuC registers in error state - get rid of dev->struct_mutex - iopoll: generalize read_poll_timout - lots more display refactoring - Reject HBR3 in any eDP Panel - Prune modes for YUV420 - Display Wa fix, additions, and updates - DP: Fix 2.7 Gbps link training on g4x - DP: Adjust the idle pattern handling - DP: Shuffle the link training code a bit - Don't set/read the DSI C clock divider on GLK - Enable_psr kernel parameter changes - Type-C enabled/disconnected dp-alt sink - Wildcat Lake enabling - DP HDR updates - DRAM detection - wait PSR idle on dsb commit - Remove FBC modulo 4 restriction for ADL-P+ - panic: refactor framebuffer allocation habanalabs: - debug/visibility improvements - vmalloc-backed coherent mmap support - HLDIO infrastructure nova-core: - various register!() macro improvements - minor vbios/firmware fixes/refactoring - advance firmware boot stages; process Booter and patch signatures - process GSP and GSP bootloader - Add r570.144 firmware bindings and update to it - Move GSP boot code to own module - Use new pin-init features to store driver's private data in a single allocation - Update ARef import from sync::aref nova-drm: - Update ARef import from sync::aref tyr: - initial driver skeleton for a rust driver for ARM Mali GPUs - capable of powering up, query metadata and provide it to userspace. msm: - GPU and Core: - in DT bindings describe clocks per GPU type - GMU bandwidth voting for x1-85 - a623/a663 speedbins - cleanup some remaining no-iommu leftovers after VM_BIND conversion - fix GEM obj 32b size truncation - add missing VM_BIND param validation - IFPC for x1-85 and a750 - register xml and gen_header.py sync from mesa - Display: - add missing bindings for display on SC8180X - added DisplayPort MST bindings - conversion from round_rate() to determine_rate() amdxdna: - add IOCTL_AMDXDNA_GET_ARRAY - support user space allocated buffers - streamline PM interfaces - Refactoring wrt. hardware contexts - improve error reporting nouveau: - use GSP firmware by default - improve error reporting - Pre-populate exported buffers ast: - Clean up detection of DRAM config exynos: - add DSIM bridge driver support for Exynos7870 - Document Exynos7870 DSIM compatible in dt-binding panthor: - Print task/pid on errors - Add support for Mali G710, G510, G310, Gx15, Gx20, Gx25 - Improve cache flushing - Fail VM bind if BO has offset renesas: - convert to RUNTIME_PM_OPS rcar-du: - Make number of lanes configurable - Use RUNTIME_PM_OPS - Add support for DSI commands rocket: - Add driver for Rockchip NPU plus DT bindings - Use kfree() and sizeof() correctly - Test DMA status rockchip: - dsi2: Add support for RK3576 plus DT bindings - Add support for RK3588 DPTX output tidss: - Use crtc_ fields for programming display mode - Remove other drivers from aperture pixpaper: - Add support for Mayqueen Pixpaper plus DT bindings v3d: - Support querying nubmer of GPU resets for KHR_robustness stm: - Clean up logging - ltdc: Add support support for STM32MP257F-EV1 plus DT bindings sitronix: - st7571-i2c: Add support for inverted displays and 2-bit grayscale tidss: - Convert to kernel's FIELD_ macros vesadrm: - Support 8-bit palette mode imagination: - Improve power management - Add support for TH1520 GPU - Support Risc-V architectures v3d: - Improve job management and locking vkms: - Support variants of ARGB8888, ARGB16161616, RGB565, RGB888 and P01x - Spport YUV with 16-bit components" * tag 'drm-next-2025-10-01' of https://gitlab.freedesktop.org/drm/kernel: (1455 commits) drm/amd: Add name to modes from amdgpu_connector_add_common_modes() drm/amd: Drop some common modes from amdgpu_connector_add_common_modes() drm/amdgpu: update MODULE_PARM_DESC for freesync_video drm/amd: Use dynamic array size declaration for amdgpu_connector_add_common_modes() drm/amd/display: Share dce100_validate_global with DCE6-8 drm/amd/display: Share dce100_validate_bandwidth with DCE6-8 drm/amdgpu: Fix fence signaling race condition in userqueue amd/amdkfd: enhance kfd process check in switch partition amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw drm/amd/display: Reject modes with too high pixel clock on DCE6-10 drm/amd: Drop unnecessary check in amdgpu_connector_add_common_modes() drm/amd/display: Only enable common modes for eDP and LVDS drm/amdgpu: remove the redeclaration of variable i drm/amdgpu/userq: assign an error code for invalid userq va drm/amdgpu: revert "rework reserved VMID handling" v2 drm/amdgpu: remove leftover from enforcing isolation by VMID drm/amdgpu: Add fallback to pipe reset if KCQ ring reset fails accel/habanalabs: add Infineon version check accel/habanalabs/gaudi2: read preboot status after recovering from dirty state accel/habanalabs: add HL_GET_P_STATE passthrough type ...	2025-10-02 12:47:25 -07:00
Linus Torvalds	991053178e	Merge tag 'pm-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "The majority of these are cpufreq changes, which has been a recurring pattern for a few recent cycles. Those changes include new hardware support (AN7583 SoC support in the airoha cpufreq driver, ipq5424 support in the qcom-nvmem cpufreq driver, MT8196 support in the mediatek cpufreq driver, AM62D2 support in the ti cpufreq driver), DT bindings and Rust code updates, cleanups of the core and governors, and multiple driver fixes and cleanups. Beyond that, there are hibernation fixes (some remaining 6.16 cycle fallout and an issue related to hybrid suspend in the amdgpu driver), cleanups of the PM core code, runtime PM documentation update, cpuidle and power capping cleanups, and tooling updates. Specifics: - Rearrange variable declarations involving __free() in the cpufreq core and intel_pstate driver to follow common coding style (Rafael Wysocki) - Fix object lifecycle issue in update_qos_request(), rearrange freq QoS updates using __free(), and adjust frequency percentage computations in the intel_pstate driver (Rafael Wysocki) - Update intel_pstate to allow it to enable HWP without EPP if the new DEC (Dynamic Efficiency Control) HW feature is enabled (Rafael Wysocki) - Use on_each_cpu_mask() in drv_write() in the ACPI cpufreq driver to simplify the code (Rafael Wysocki) - Use likely() optimization in intel_pstate_sample() (Yaxiong Tian) - Remove dead EPB-related code from intel_pstate (Srinivas Pandruvada) - Use scope-based cleanup for cpufreq policy references in multiple cpufreq drivers (Zihuan Zhang) - Avoid calling get_governor() for the first policy in the cpufreq core to simplify the initial policy path (Zihuan Zhang) - Clean up the cpufreq core in multiple places (Zihuan Zhang) - Use int type to store negative error codes in the cpufreq core and update the speedstep-lib to use int for error codes (Qianfeng Rong) - Update the efficient idle check for Intel extended Families in the ondemand cpufreq governor (Sohil Mehta) - Replace sscanf() with kstrtouint() in the conservative cpufreq governor (Kaushlendra Kumar) - Rename CpumaskVar::as[_mut]_ref to from_raw[_mut] in the cpumask Rust code and mark CpumaskVar as transparent (Alice Ryhl, Baptiste Lepers) - Update ARef and AlwaysRefCounted imports from sync::aref in the OPP Rust code (Shankari Anand) - Add support for AN7583 SoC to the airoha cpufreq driver (Christian Marangi) - Enable cpufreq for ipq5424 in the qcom-nvmem cpufreq driver (Md Sadre Alam) - Add support for MT8196 to the mediatek-hw cpufreq driver, refactor that driver and add mediatek,mt8196-cpufreq-hw DT binding (Nicolas Frattaroli) - Avoid redundant conditions in the mediatek cpufreq driver (Liao Yuanhong) - Add support for AM62D2 to the ti cpufreq driver and blocklist ti,am62d2 SoC in dt-platdev (Paresh Bhagat) - Support more speed grades on AM62Px SoC in the ti cpufreq driver, allow all silicon revisions to support OPPs in it, and fix supported hardware for 1GHz OPP (Judith Mendez) - Add QCS615 compatible to DT bindings for cpufreq-qcom-hw (Taniya Das) - Minor assorted updates of the scmi, longhaul, CPPC, and armada-37xx cpufreq drivers (Akhilesh Patil, BowenYu, Dennis Beier, and Florian Fainelli) - Remove outdated cpufreq-dt.txt (Frank Li) - Fix python gnuplot package names in the amd_pstate_tracer utility (Kuan-Wei Chiu) - Saravana Kannan will maintain the virtual-cpufreq driver (Saravana Kannan) - Prevent CPU capacity updates after registering a perf domain from failing on a first CPU that is not present (Christian Loehle) - Add support for the cases in which frequency alone is not sufficient to uniquely identify an OPP (Krishna Chaitanya Chundru) - Use to_result() for OPP error handling in Rust (Onur Özkan) - Add support for LPDDR5 on Rockhip RK3588 SoC to rockchip-dfi devfreq driver (Nicolas Frattaroli) - Fix an issue where DDR cycle counts on RK3588/RK3528 with LPDDR4(X) are reported as half by adding a cycle multiplier to the DFI driver in rockchip-dfi devfreq-event driver (Nicolas Frattaroli) - Fix missing error pointer dereference check of regulator instance in the mtk-cci devfreq driver probe and remove a redundant condition from an if () statement in that driver (Dan Carpenter, Liao Yuanhong) - Fail cpuidle device registration if there is one already to avoid sysfs-related issues (Rafael Wysocki) - Use sysfs_emit()/sysfs_emit_at() instead of sprintf()/scnprintf() in cpuidle (Vivek Yadav) - Fix device and OF node leaks at probe in the qcom-spm cpuidle driver and drop unnecessary initialisations from it (Johan Hovold) - Remove unnecessary address-of operators from the intel_idle cpuidle driver (Kaushlendra Kumar) - Rearrange main loop in menu_select() to make the code in that funtion easier to follow (Rafael Wysocki) - Convert values in microseconds to ktime using us_to_ktime() where applicable in the intel_idle power capping driver (Xichao Zhao) - Annotate loops walking device links in the power management core code as _srcu and add macros for walking device links to reduce the likelihood of coding mistakes related to them (Rafael Wysocki) - Document time units for _time functions in the runtime PM API (Brian Norris) - Clear power.must_resume in noirq suspend error path to avoid resuming a dependant device under a suspended parent or supplier (Rafael Wysocki) - Fix GFP mask handling during hybrid suspend and make the amdgpu driver handle hybrid suspend correctly (Mario Limonciello, Rafael Wysocki) - Fix GFP mask handling after aborted hibernation in platform mode and combine exit paths in power_down() to avoid code duplication (Rafael Wysocki) - Use vmalloc_array() and vcalloc() in the hibernation core to avoid open-coded size computations (Qianfeng Rong) - Fix typo in hibernation core code comment (Li Jun) - Call pm_wakeup_clear() in the same place where other functions that do bookkeeping prior to suspend_prepare() are called (Samuel Wu) - Fix and clean up the x86_energy_perf_policy utility and update its documentation (Len Brown, Kaushlendra Kumar) - Fix incorrect sorting of PMT telemetry in turbostat (Kaushlendra Kumar) - Fix incorrect size in cpuidle_state_disable() and the error return value of cpupower_write_sysfs() in cpupower (Kaushlendra Kumar)" tag 'pm-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (86 commits) PM: hibernate: Combine return paths in power_down() PM: hibernate: Restrict GFP mask in power_down() PM: hibernate: Fix pm_hibernation_mode_is_suspend() build breakage PM: runtime: Documentation: ABI: Document time units for *_time tools/power x86_energy_perf_policy.8: Emphasize preference for SW interfaces tools/power x86_energy_perf_policy: Add make snapshot target tools/power x86_energy_perf_policy: Prefer driver HWP limits tools/power x86_energy_perf_policy: EPB access is only via sysfs tools/power x86_energy_perf_policy: Prepare for MSR/sysfs refactoring tools/power x86_energy_perf_policy: Enhance HWP enable tools/power x86_energy_perf_policy: Enhance HWP enabled check tools/power x86_energy_perf_policy: Fix incorrect fopen mode usage tools/power turbostat: Fix incorrect sorting of PMT telemetry drm/amd: Fix hybrid sleep PM: hibernate: Add pm_hibernation_mode_is_suspend() PM: hibernate: Fix hybrid-sleep tools/cpupower: Fix incorrect size in cpuidle_state_disable() tools/power/x86/amd_pstate_tracer: Fix python gnuplot package names cpufreq: Replace pointer subtraction with iteration macro cpuidle: Fail cpuidle device registration if there is one already ...	2025-10-01 16:08:37 -07:00
Rafael J. Wysocki	f58f86df6a	Merge branches 'pm-core', 'pm-runtime' and 'pm-sleep' Merge changes related to system sleep and runtime PM framework for 6.18-rc1: - Annotate loops walking device links in the power management core code as _srcu and add macros for walking device links to reduce the likelihood of coding mistakes related to them (Rafael Wysocki) - Document time units for _time functions in the runtime PM API (Brian Norris) - Clear power.must_resume in noirq suspend error path to avoid resuming a dependant device under a suspended parent or supplier (Rafael Wysocki) - Fix GFP mask handling during hybrid suspend and make the amdgpu driver handle hybrid suspend correctly (Mario Limonciello, Rafael Wysocki) - Fix GFP mask handling after aborted hibernation in platform mode and combine exit paths in power_down() to avoid code duplication (Rafael Wysocki) - Use vmalloc_array() and vcalloc() in the hibernation core to avoid open-coded size computations (Qianfeng Rong) - Fix typo in hibernation core code comment (Li Jun) - Call pm_wakeup_clear() in the same place where other functions that do bookkeeping prior to suspend_prepare() are called (Samuel Wu) pm-core: PM: core: Add two macros for walking device links PM: core: Annotate loops walking device links as _srcu * pm-runtime: PM: runtime: Documentation: ABI: Document time units for _time pm-sleep: PM: hibernate: Combine return paths in power_down() PM: hibernate: Restrict GFP mask in power_down() PM: hibernate: Fix pm_hibernation_mode_is_suspend() build breakage drm/amd: Fix hybrid sleep PM: hibernate: Add pm_hibernation_mode_is_suspend() PM: hibernate: Fix hybrid-sleep PM: sleep: core: Clear power.must_resume in noirq suspend error path PM: sleep: Make pm_wakeup_clear() call more clear PM: hibernate: Fix typo in memory bitmaps description comment PM: hibernate: Use vmalloc_array() and vcalloc() to improve code	2025-09-29 12:54:01 +02:00
Mario Limonciello	df2ba57094	drm/amd: Add name to modes from amdgpu_connector_add_common_modes() [Why] When DC adds common modes it adds modes with a string to match what they are. Non-DC doesn't. This can be inconsistent when turning on/off DC support. [How] Add a name member to common_modes[] and copy it into the drm display mode. Cc: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Link: https://lore.kernel.org/r/20250924161624.1975819-6-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-25 15:54:22 -04:00
Mario Limonciello	6d622755bc	drm/amd: Drop some common modes from amdgpu_connector_add_common_modes() [Why] DC and non-DC codepaths have different sets of common modes that are added for eDP and LVDS cases. This can cause different behaviors for turning on DC on hardware that can support both. [How] Drop extra modes from amdgpu_connector_add_common_modes() not present in amdgpu_dm_connector_add_common_modes(). Cc: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250924161624.1975819-5-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-25 15:54:12 -04:00
Alex Deucher	dbf2341569	drm/amdgpu: update MODULE_PARM_DESC for freesync_video To better describe what it does. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3756 Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-25 15:54:07 -04:00
Mario Limonciello	123a1750c5	drm/amd: Use dynamic array size declaration for amdgpu_connector_add_common_modes() [Why] Adding or removing a mode from common_modes[] can be fragile if a user forgot to update the for loop boundaries. [How] Use ARRAY_SIZE() to detect size of the array and use that instead. Cc: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Link: https://lore.kernel.org/r/20250924161624.1975819-4-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-25 15:53:55 -04:00
Timur Kristóf	1f721ebcf3	drm/amd/display: Share dce100_validate_global with DCE6-8 The dce100_validate_global function was verbatim exactly the same as dce60_validate_global and dce80_validate_global. Share dce100_validate_global between DCE6-10 to save code size. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-25 15:53:46 -04:00
Timur Kristóf	ee352f6c56	drm/amd/display: Share dce100_validate_bandwidth with DCE6-8 DCE6-8 have very similar capabilities to DCE10, they support the same DP and HDMI versions and work similarly. Share dce100_validate_bandwidth between DCE6-10 to reduce code duplication in the DC driver. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-25 15:53:33 -04:00
Jesse.Zhang	b8ae2640f9	drm/amdgpu: Fix fence signaling race condition in userqueue This commit fixes a potential race condition in the userqueue fence signaling mechanism by replacing dma_fence_is_signaled_locked() with dma_fence_is_signaled(). The issue occurred because: 1. dma_fence_is_signaled_locked() should only be used when holding the fence's individual lock, not just the fence list lock 2. Using the locked variant without the proper fence lock could lead to double-signaling scenarios: - Hardware completion signals the fence - Software path also tries to signal the same fence By using dma_fence_is_signaled() instead, we properly handle the locking hierarchy and avoid the race condition while still maintaining the necessary synchronization through the fence_list_lock. v2: drop the comment (Christian) Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-25 15:53:23 -04:00
Yifan Zhang	45da20e00d	amd/amdkfd: enhance kfd process check in switch partition current switch partition only check if kfd_processes_table is empty. kfd_prcesses_table entry is deleted in kfd_process_notifier_release, but kfd_process tear down is in kfd_process_wq_release. consider two processes: Process A (workqueue) -> kfd_process_wq_release -> Access kfd_node member Process B switch partition -> amdgpu_xcp_pre_partition_switch -> amdgpu_amdkfd_device_fini_sw -> kfd_node tear down. Process A and B may trigger a race as shown in dmesg log. This patch is to resolve the race by adding an atomic kfd_process counter kfd_processes_count, it increment as create kfd process, decrement as finish kfd_process_wq_release. v2: Put kfd_processes_count per kfd_dev, move decrement to kfd_process_destroy_pdds and bug fix. (Philip Yang) [3966658.307702] divide error: 0000 [#1] SMP NOPTI [3966658.350818] i10nm_edac [3966658.356318] CPU: 124 PID: 38435 Comm: kworker/124:0 Kdump: loaded Tainted [3966658.356890] Workqueue: kfd_process_wq kfd_process_wq_release [amdgpu] [3966658.362839] nfit [3966658.366457] RIP: 0010:kfd_get_num_sdma_engines+0x17/0x40 [amdgpu] [3966658.366460] Code: 00 00 e9 ac 81 02 00 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 48 8b 4f 08 48 8b b7 00 01 00 00 8b 81 58 26 03 00 99 <f7> be b8 01 00 00 80 b9 70 2e 00 00 00 74 0b 83 f8 02 ba 02 00 00 [3966658.380967] x86_pkg_temp_thermal [3966658.391529] RSP: 0018:ffffc900a0edfdd8 EFLAGS: 00010246 [3966658.391531] RAX: 0000000000000008 RBX: ffff8974e593b800 RCX: ffff888645900000 [3966658.391531] RDX: 0000000000000000 RSI: ffff888129154400 RDI: ffff888129151c00 [3966658.391532] RBP: ffff8883ad79d400 R08: 0000000000000000 R09: ffff8890d2750af4 [3966658.391532] R10: 0000000000000018 R11: 0000000000000018 R12: 0000000000000000 [3966658.391533] R13: ffff8883ad79d400 R14: ffffe87ff662ba00 R15: ffff8974e593b800 [3966658.391533] FS: 0000000000000000(0000) GS:ffff88fe7f600000(0000) knlGS:0000000000000000 [3966658.391534] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [3966658.391534] CR2: 0000000000d71000 CR3: 000000dd0e970004 CR4: 0000000002770ee0 [3966658.391535] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [3966658.391535] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 [3966658.391536] PKRU: 55555554 [3966658.391536] Call Trace: [3966658.391674] deallocate_sdma_queue+0x38/0xa0 [amdgpu] [3966658.391762] process_termination_cpsch+0x1ed/0x480 [amdgpu] [3966658.399754] intel_powerclamp [3966658.402831] kfd_process_dequeue_from_all_devices+0x5b/0xc0 [amdgpu] [3966658.402908] kfd_process_wq_release+0x1a/0x1a0 [amdgpu] [3966658.410516] coretemp [3966658.434016] process_one_work+0x1ad/0x380 [3966658.434021] worker_thread+0x49/0x310 [3966658.438963] kvm_intel [3966658.446041] ? process_one_work+0x380/0x380 [3966658.446045] kthread+0x118/0x140 [3966658.446047] ? __kthread_bind_mask+0x60/0x60 [3966658.446050] ret_from_fork+0x1f/0x30 [3966658.446053] Modules linked in: kpatch_20765354(OEK) [3966658.455310] kvm [3966658.464534] mptcp_diag xsk_diag raw_diag unix_diag af_packet_diag netlink_diag udp_diag act_pedit act_mirred act_vlan cls_flower kpatch_21951273(OEK) kpatch_18424469(OEK) kpatch_19749756(OEK) [3966658.473462] idxd_mdev [3966658.482306] kpatch_17971294(OEK) sch_ingress xt_conntrack amdgpu(OE) amdxcp(OE) amddrm_buddy(OE) amd_sched(OE) amdttm(OE) amdkcl(OE) intel_ifs iptable_mangle tcm_loop target_core_pscsi tcp_diag target_core_file inet_diag target_core_iblock target_core_user target_core_mod coldpgs kpatch_18383292(OEK) ip6table_nat ip6table_filter ip6_tables ip_set_hash_ipportip ip_set_hash_ipportnet ip_set_hash_ipport ip_set_bitmap_port xt_comment iptable_nat nf_nat iptable_filter ip_tables ip_set ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 sn_core_odd(OE) i40e overlay binfmt_misc tun bonding(OE) aisqos(OE) aisqos_hotfixes(OE) rfkill uio_pci_generic uio cuse fuse nf_tables nfnetlink intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common i10nm_edac nfit x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm idxd_mdev [3966658.491237] vfio_pci [3966658.501196] vfio_pci vfio_virqfd mdev vfio_iommu_type1 vfio iax_crypto intel_pmt_telemetry iTCO_wdt intel_pmt_class iTCO_vendor_support irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_hwdep snd_seq [3966658.508537] vfio_virqfd [3966658.517569] snd_seq_device ipmi_ssif isst_if_mbox_pci isst_if_mmio pcspkr snd_pcm idxd intel_uncore ses isst_if_common intel_vsec idxd_bus enclosure snd_timer mei_me snd i2c_i801 i2c_smbus mei i2c_ismt soundcore joydev acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad vfat fat [3966658.526851] mdev [3966658.536096] nfsd auth_rpcgss nfs_acl lockd grace slb_vtoa(OE) sunrpc dm_mod hookers mlx5_ib(OE) ast i2c_algo_bit drm_vram_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm_ttm_helper ttm mlx5_core(OE) mlxfw(OE) [3966658.540381] vfio_iommu_type1 [3966658.544341] nvme mpt3sas tls drm nvme_core pci_hyperv_intf raid_class psample libcrc32c crc32c_intel mlxdevm(OE) i2c_core [3966658.551254] vfio [3966658.558742] scsi_transport_sas wmi pinctrl_emmitsburg sd_mod t10_pi sg ahci libahci libata rdma_ucm(OE) ib_uverbs(OE) rdma_cm(OE) iw_cm(OE) ib_cm(OE) ib_umad(OE) ib_core(OE) ib_ucm(OE) mlx_compat(OE) [3966658.563004] iax_crypto [3966658.570988] [last unloaded: diagnose] [3966658.571027] ---[ end trace cc9dbb180f9ae537 ]--- Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Philip.Yang<Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-25 15:53:08 -04:00

1 2 3 4 5 ...

34584 Commits