linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-10 16:21:45 -04:00

Author	SHA1	Message	Date
Lijo Lazar	8a358aaa5d	drm/amd/pm: Free SMUv13.0.6 resources on failure Free the resources allocated if smu_v13_0_12_tables_init fails. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Fixes: `5bf93e1d6e` ("drm/amd/pm: Add caching to SMUv13.0.12 temp metric") Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-15 13:05:14 -04:00
Jesse.Zhang	655d6403ad	drm/amd/vcn: Add late_init callback for VCN v4.0.3 reset handling This change reorganizes VCN reset capability detection by: 1. Moving reset mask configuration from sw_init to new late_init phase 2. Adding vcn_v4_0_3_late_init() to properly check for per-queue reset support 3. Only setting soft full reset mask as fallback when per-queue reset isn't supported 4. Removing TODO comment now that queue reset support is implemented V2: Removed unrelated changes. Keep amdgpu_get_soft_full_reset_mask in place and remove TODO comment. (Alex) v3: set the flags at one place (all in late_init) (Lijo) Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Ruili Ji <ruiliji2@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-15 13:05:06 -04:00
Kent Russell	0ed704d058	drm/amdkfd: Handle lack of READ permissions in SVM mapping HMM assumes that pages have READ permissions by default. Inside svm_range_validate_and_map, we add READ permissions then add WRITE permissions if the VMA isn't read-only. This will conflict with regions that only have PROT_WRITE or have PROT_NONE. When that happens, svm_range_restore_work will continue to retry, silently, giving the impression of a hang if pr_debug isn't enabled to show the retries.. If pages don't have READ permissions, simply unmap them and continue. If they weren't mapped in the first place, this would be a no-op. Since x86 doesn't support write-only, and PROT_NONE doesn't allow reads or writes anyways, this will allow the svm range validation to continue without getting stuck in a loop forever on mappings we can't use with HMM. Signed-off-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-15 13:04:41 -04:00
Jesse.Zhang	9d20f37a10	drm/amd/pm: Add VCN reset support for SMU v13.0.6 This commit implements VCN reset capability for SMU v13.0.6 with the following changes: 1. Added new PPSMC message ID (0x5B) for VCN reset in SMU firmware interface 2. Extended SMU capabilities to include VCN_RESET support 3. Implemented VCN reset support check: - Added smu_v13_0_6_reset_vcn_is_supported() function 4. Updated SMU v13.0.6 PPT functions to include VCN reset operations v2: clean up debug info (Alex) v3: remove unsupported message and split smu v13.0.6 changes to a separate patch (Lijo) v4: simply the function (smu_v13_0_6_reset_vcn_is_supported) (Lijo) Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Ruili Ji <ruiliji2@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-15 13:04:33 -04:00
Jesse.Zhang	37b9257be7	drm/amd/pm: Add VCN reset support check capability This change introduces infrastructure to check whether VCN reset is supported by the SMU firmware. Key changes include: 1. Added new functions to query VCN reset support: - amdgpu_dpm_reset_vcn_is_supported() - smu_reset_vcn_is_supported() - pptable_funcs.reset_vcn_is_supported callback 2. Implemented proper locking in the DPM layer with mutex protection 3. Maintained consistency with existing SDMA reset support checks The new capability allows callers to check for VCN reset support before attempting the operation, preventing unnecessary attempts on unsupported platforms. v2: clean up debug info(Alex) Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Ruili Ji <ruiliji2@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-15 13:04:25 -04:00
Heng Zhou	859958a7fa	drm/amdgpu: fix nullptr err of vm_handle_moved If a amdgpu_bo_va is fpriv->prt_va, the bo of this one is always NULL. So, such kind of amdgpu_bo_va should be updated separately before amdgpu_vm_handle_moved. Signed-off-by: Heng Zhou <Heng.Zhou@amd.com> Reviewed-by: Kasiviswanathan, Harish <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-15 13:04:07 -04:00
Eric Huang	3a75edf93a	drm/amdkfd: set uuid for each partition in topology Currently each kfd compute partition/node is sharing the same uuid of AID, which doen't meet the CUDA spec for visible device, so corresponding XCD id for each partition in smu has been assigned to xcp, and exposed to kfd topology. v2: add NULL check (Lijo) Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-15 13:04:02 -04:00
Qianfeng Rong	5c8d5e2619	drm/amd/display: Use boolean context for pointer null checks Replace "out == 0" with "!out" for pointer comparison to improve code readability and conform to coding style. Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-15 13:03:57 -04:00
Liao Yuanhong	90b810dd85	drm/amd/display: Remove redundant semicolons Remove unnecessary semicolons. Fixes: `dda4fb85e4` ("drm/amd/display: DML changes for DCN32/321") Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-15 13:03:26 -04:00
Xichao Zhao	3e03525ce1	drm/radeon: replace min/max nesting with clamp() The clamp() macro explicitly expresses the intent of constraining a value within bounds.Therefore, replacing min(max(a, b), c) and max(min(a,b),c) with clamp(val, lo, hi) can improve code readability. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Xichao Zhao <zhao.xichao@vivo.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-15 13:03:21 -04:00
Liu01 Tong	f101c13a87	drm/amdgpu: fix task hang from failed job submission during process kill During process kill, drm_sched_entity_flush() will kill the vm entities. The following job submissions of this process will fail, and the resources of these jobs have not been released, nor have the fences been signalled, causing tasks to hang and timeout. Fix by check entity status in amdgpu_vm_ready() and avoid submit jobs to stopped entity. v2: add amdgpu_vm_ready() check before amdgpu_vm_clear_freed() in function amdgpu_cs_vm_handling(). Signed-off-by: Liu01 Tong <Tong.Liu01@amd.com> Signed-off-by: Lin.Cao <lincao12@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-12 14:22:54 -04:00
Jack Xiao	b08425fa77	drm/amdgpu: fix incorrect vm flags to map bo It should use vm flags instead of pte flags to specify bo vm attributes. Fixes: `7946340fa3` ("drm/amdgpu: Move csa related code to separate file") Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-12 14:19:57 -04:00
YiPeng Chai	d38eaf27de	drm/amdgpu: fix vram reservation issue The vram block allocation flag must be cleared before making vram reservation, otherwise reserving addresses within the currently freed memory range will always fail. Fixes: `c9cad937c0` ("drm/amdgpu: add drm buddy support to amdgpu") Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-12 14:15:03 -04:00
Geoffrey McRae	57af162bfc	drm/amdkfd: return -ENOTTY for unsupported IOCTLs Some kfd ioctls may not be available depending on the kernel version the user is running, as such we need to report -ENOTTY so userland can determine the cause of the ioctl failure. Signed-off-by: Geoffrey McRae <geoffrey.mcrae@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-12 14:14:54 -04:00
Frank Min	065e23170a	drm/amdgpu: Add PSP fw version check for fw reserve GFX command The fw reserved GFX command is only supported starting from PSP fw version 0x3a0e14 and 0x3b0e0d. Older versions do not support this command. Add a version guard to ensure the command is only used when the running PSP fw meets the minimum version requirement. This ensures backward compatibility and safe operation across fw revisions. Fixes: `a3b7f9c306` ("drm/amdgpu: reclaim psp fw reservation memory region") Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-12 14:13:03 -04:00
Lijo Lazar	d543489aa1	drm/amdgpu: Add description for partition commands Add string description for partition commands. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-12 14:12:57 -04:00
Brahmajit Das	260dcf5b06	drm/radeon/r600_cs: clean up of dead code in r600_cs GCC 16 enables -Werror=unused-but-set-variable= which results in build error with the following message. drivers/gpu/drm/radeon/r600_cs.c: In function ‘r600_texture_size’: drivers/gpu/drm/radeon/r600_cs.c:1411:29: error: variable ‘level’ set but not used [-Werror=unused-but-set-variable=] 1411 \| unsigned offset, i, level; \| ^~~~~ cc1: all warnings being treated as errors make[6]: *** [scripts/Makefile.build:287: drivers/gpu/drm/radeon/r600_cs.o] Error 1 level although is set, but in never used in the function r600_texture_size. Thus resulting in dead code and this error getting triggered. Fixes: `60b212f8dd` ("drm/radeon: overhaul texture checking. (v3)") Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Brahmajit Das <listout@listout.xyz> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-11 11:13:44 -04:00
Cryolitia PukNgae	388b68aef7	drm/amdgpu: fix incorrect comment format Comments should not have a leading plus sign. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Cryolitia PukNgae <cryolitia@uniontech.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-11 11:13:41 -04:00
Taimur Hassan	8d6593c192	drm/amd/display: Promote DC to 3.2.345 This version brings along following update: -Fix close and open lid may cause eDP remaining blank -Fix frequently disabling/enabling OTG may cause incorrect configuration of OTG Acked-by: Wayne Lin <wayne.lin@amd.com> Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-11 11:13:38 -04:00
Taimur Hassan	7552bee9dd	drm/amd/display: [FW Promotion] Release 0.1.22.0 Add a new command for Panel Replay. Acked-by: Wayne Lin <wayne.lin@amd.com> Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-11 11:13:35 -04:00
Danny Wang	ad335b5fc9	drm/amd/display: Reset apply_eamless_boot_optimization when dpms_off [WHY&HOW] The user closed the lid while the system was powering on and opened it again before the “apply_seamless_boot_optimization” was set to false, resulting in the eDP remaining blank. Reset the “apply_seamless_boot_optimization” to false when dpms off. Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Danny Wang <Danny.Wang@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-11 11:13:26 -04:00
TungYu Lu	e7496c15d8	drm/amd/display: Wait until OTG enable state is cleared [Why] Customer reported an issue that OS starts and stops device multiple times during driver installation. Frequently disabling and enabling OTG may prevent OTG from being safely disabled and cause incorrect configuration upon the next enablement. [How] Add a wait until OTG_CURRENT_MASTER_EN_STATE is cleared as a short term solution. Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: TungYu Lu <tungyu.lu@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-11 11:13:21 -04:00
Vitaly Prosyak	c31f486bc8	drm/amdgpu: add to custom amdgpu_drm_release drm_dev_enter/exit User queues are disabled before GEM objects are released (protecting against user app crashes). No races with PCI hot-unplug (because drm_dev_enter prevents cleanup if iewdevice is being removed). Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-11 11:13:03 -04:00
Lijo Lazar	1dd2fa0e00	drm/amdgpu: Save and restore switch state During a DPC error kernel waits for the link to be active before notifying downstream devices. On certain platforms with Broadcom switch in synthetiic mode, switch responds with values even though the link is not fully ready. The config space restoration done by pcie port driver for SWUS/DS of dGPU is thus not effective as the switch is still doing internal enumeration. As a workaround, save state of SWUS/DS device in driver. Add additional check to see if link is active and restore the values during DPC error callbacks. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-11 11:12:53 -04:00
Sathishkumar S	111821e4b5	drm/amdgpu/vcn: Hold pg_lock before vcn power off Acquire vcn_pg_lock before changes to vcn power state and release it after power off in idle work handler. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-06 14:34:54 -04:00
Sathishkumar S	0e7581eda8	drm/amdgpu/jpeg: Hold pg_lock before jpeg poweroff Acquire jpeg_pg_lock before changes to jpeg power state and release it after power off from idle work handler. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-06 14:34:50 -04:00
Lijo Lazar	0b4d79dafa	drm/amdgpu: Assign unique id to compute partition Assign unique id to compute partition. This is the unique id of the first XCD instance belonging to the partition. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-06 14:34:47 -04:00
Lijo Lazar	6fa8216854	drm/amd/pm: Add unique ids for SMUv13.0.12 SOCs Fetch and store the unique ids for AIDs/XCDs in SMUv13.0.12 SOCs. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-06 14:34:42 -04:00
Alex Deucher	aae94897b6	drm/amdgpu: add missing vram lost check for LEGACY RESET Legacy resets reset the memory controllers so VRAM contents may be unreliable after reset. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-06 14:34:14 -04:00
Alex Deucher	62eedd150f	drm/amdgpu/discovery: fix fw based ip discovery We only need the fw based discovery table for sysfs. No need to parse it. Additionally parsing some of the board specific tables may result in incorrect data on some boards. just load the binary and don't parse it on those boards. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4441 Fixes: `80a0e82829` ("drm/amdgpu/discovery: optionally use fw based ip discovery") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-06 14:32:38 -04:00
Srinivasan Shanmugam	9d69391794	drm/amd/display: Add NULL check for stream before dereference in 'dm_vupdate_high_irq' Add a NULL check for acrtc->dm_irq_params.stream before accessing its members. Fixes below: drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:623 dm_vupdate_high_irq() warn: variable dereferenced before check 'acrtc->dm_irq_params.stream' (see line 615) 614 if (vrr_active) { 615 bool replay_en = acrtc->dm_irq_params.stream->link->replay_settings.replay_feature_enabled; ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 616 bool psr_en = acrtc->dm_irq_params.stream->link->psr_settings.psr_feature_enabled; ^^^^^^^^^^^^^^^^^^^^^^^^^^^ New dereferences 617 bool fs_active_var_en = acrtc->dm_irq_params.freesync_config.state 618 == VRR_STATE_ACTIVE_VARIABLE; 619 620 amdgpu_dm_crtc_handle_vblank(acrtc); 621 622 /* BTR processing for pre-DCE12 ASICs */ 623 if (acrtc->dm_irq_params.stream && ^^^^^^^^^^^^^^^^^^^^^^^^^^^ But the existing code assumed it could be NULL. Someone is wrong. 624 adev->family < AMDGPU_FAMILY_AI) { 625 spin_lock_irqsave(&adev_to_drm(adev)->event_lock, flags); Fixes: `6d31602a9f` ("drm/amd/display: more liberal vmin/vmax update for freesync") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Cc: Alex Hung <alex.hung@amd.com> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Cc: Roman Li <roman.li@amd.com> Cc: ChiaHsuan Chung <chiahsuan.chung@amd.com> Cc: Harry Wentland <harry.wentland@amd.com> Cc: Ray Wu <ray.wu@amd.com> Cc: Daniel Wheeler <daniel.wheeler@amd.com> Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-06 14:30:54 -04:00
Lijo Lazar	5bf93e1d6e	drm/amd/pm: Add caching to SMUv13.0.12 temp metric Add table caching logic to temperature metrics tables in SMUv13.0.12 Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-06 14:30:51 -04:00
Lijo Lazar	476060020f	drm/amd/pm: Add cache logic for temperature metric Add caching logic for baseboard and gpuboard temperature metrics tables. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-06 14:30:47 -04:00
Lijo Lazar	615471b860	drm/amd/pm: Remove cache logic from SMUv13.0.12 Remove caching logic of temperature metrics from SMUv13.0.12. The caching logic needs to be moved to a higher level. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-06 14:30:43 -04:00
Lijo Lazar	10c12aae4e	drm/amd/pm: Add unique ids for SMUv13.0.6 SOCs Fetch and store the unique ids for AIDs/XCDs in SMUv13.0.6 SOCs. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-06 14:22:11 -04:00
Lijo Lazar	589ea8a1fd	drm/amdgpu: Add helpers to set/get unique ids Add a struct to store unique id information for each type. Add helper to fetch the unique id. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-06 14:22:08 -04:00
Lijo Lazar	892bac995b	drm/amdgpu: Prevent hardware access in dpc state Don't allow hardware access while in dpc state. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Ce Sun <cesun102@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-06 14:22:03 -04:00
Lijo Lazar	1a0e57eb96	drm/amdgpu/vcn: Fix double-free of vcn dump buffer The buffer is already freed as part of amdgpu_vcn_reg_dump_fini(). The issue is introduced by below patch series. Fixes: `de55cbff5c` ("drm/amdgpu/vcn: Add regdump helper functions") Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-06 14:21:16 -04:00
Lijo Lazar	c5c62160a5	drm/amdgpu: Log reset source during recovery To get more context, add reset source to identify the source of gpu recovery - job timeout, RAS, HWS hang etc. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-06 14:21:05 -04:00
Xiang Liu	b3505c2c48	drm/amdgpu: Generate BP threshold exceed CPER once threshold exceeded The bad pages threshold exceed CPER should be generated once threshold exceeded, no matter the bad_page_threshold setted or not. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-06 14:21:01 -04:00
Asad Kamal	d9f6a07043	drm/amd/pm: Enable temperature metrics caps Enable temperature metrics caps for smu_v13_0_12 Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-06 14:20:57 -04:00
Asad Kamal	25e82f2e2c	drm/amd/pm: Add temperature metrics sysfs entry Add temperature metrics sysfs entry to expose gpuboard/baseboard temperature metrics v2: Removed unused function, rename functions(Lijo) v3: Remove unnecessary initialization Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-06 14:20:55 -04:00
Asad Kamal	33074558ec	drm/amd/pm: Fetch and fill temperature metrics Fetch system metrics table to fill gpuboard/baseboard temperature metrics data for smu_v13_0_12 v2: Remove unnecessary checks, used separate metrics time for temperature metrics table(Lijo) v3: Use cached values for back to back system metrics query(Lijo) Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-06 14:20:51 -04:00
Asad Kamal	793ff2bafe	drm/amd/pm: Update pmfw header for smu_v13_0_12 Update pmfw header for smu_v13_0_12 with system temperature metrics table Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-06 14:20:47 -04:00
Asad Kamal	775c7e8e4d	drm/amd/pm: Add smu interface for temp metrics Add smu interface to get baseboard/gpuboard temperature metrics v2: Rename is_support to is_supported(Lijo) Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-06 14:20:44 -04:00
Asad Kamal	83953ec1fe	drm/amd/pm: Add dpm interface for temp metrics Add dpm interface to get gpuboard/baseboard temperature metrics v2: Add temperature metrics support check(Lijo) v3: Return error code in case of operation not supported(Lijo) Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-06 14:20:34 -04:00
Aurabindo Pillai	e9c840d450	drm/amd/display: Fix vupdate_offload_work doc Fix the following warning in struct documentation: drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h:168: warning: expecting prototype for struct dm_vupdate_work. Prototype was for struct vupdate_offload_work instead Fixes: `c210b757b4` ("drm/amd/display: fix dmub access race condition") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-06 14:19:44 -04:00
James Zhu	bd6093e2f1	drm/amdkfd: return migration pages from copy function dst MIGRATE_PFN_VALID bit and src MIGRATE_PFN_MIGRATE bit should always be set when migration success. cpage includes src MIGRATE_PFN_MIGRATE bit set and MIGRATE_PFN_VALID bit unset pages for both ram and vram when memory is only allocated without being populated before migration, those ram pages should be counted as migrate pages and those vram pages should not be counted as migrate pages. Here migration pages refer to how many vram pages involved. -v2 use dst to check MIGRATE_PFN_VALID bit (suggested-by Philip) -v3 add warning when vram pages is less than migration pages return migration pages directly from copy function -v4 correct comments and copy function return mpage (suggested-by Felix) Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-06 14:19:12 -04:00
James Zhu	5c15a05b52	drm/amdkfd: remove unused code upages is assigned under cpages = 0, so it isn't really used in this function. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Philip.Yang<Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-06 14:19:01 -04:00
Lijo Lazar	6ec7120dec	drm/amd/pm: Add priority messages for SMU v13.0.6 Certain messages will processed with high priority by PMFW even if it hasn't responded to a previous message. Send the priority message regardless of the success/fail status of the previous message. Add support on SMUv13.0.6 and SMUv13.0.12 Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-06 14:18:51 -04:00

1 2 3 4 5 ...

1369219 Commits