linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-04-06 07:55:28 -04:00

Author	SHA1	Message	Date
Vitaly Prosyak	9dff2bb709	drm/amdgpu: disable peer-to-peer access for DCC-enabled GC12 VRAM surfaces Certain multi-GPU configurations (especially GFX12) may hit data corruption when a DCC-compressed VRAM surface is shared across GPUs using peer-to-peer (P2P) DMA transfers. Such surfaces rely on device-local metadata and cannot be safely accessed through a remote GPU’s page tables. Attempting to import a DCC-enabled surface through P2P leads to incorrect rendering or GPU faults. This change disables P2P for DCC-enabled VRAM buffers that are contiguous and allocated on GFX12+ hardware. In these cases, the importer falls back to the standard system-memory path, avoiding invalid access to compressed surfaces. Future work could consider optional migration (VRAM→System→VRAM) if a performance regression is observed when `attach->peer2peer = false`. Tested on: - Dual RX 9700 XT (Navi4x) setup - GNOME and Wayland compositor scenarios - Confirmed no corruption after disabling P2P under these conditions v2: Remove check TTM_PL_VRAM & TTM_PL_FLAG_CONTIGUOUS. v3: simplify for upsteam and fix ip version check (Alex) Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-11 21:53:27 -05:00
Wenjing Liu	815e260a18	drm/amd/display: add macros to simplify code [Why & How] Adding macros to simplify the process of adding new error codes. Currently, to add an error code, the developer needs to add both the enum and the string translation. This is error prone and can lead to inconsistencies. The refactor adds a macro to automatically add the string translation based on the enum. Reviewed-by: Aric Cyr <aric.cyr@amd.com> Signed-off-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-11 21:53:26 -05:00
Tao Zhou	c154a96b55	drm/amdgpu: load RAS bad page from PMFW in page retirement In legacy way, bad page is queried from MCA registers, switch to getting it from PMFW when PMFW manages eeprom data. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-11 21:53:26 -05:00
Dave Airlie	2a084f4ad7	Merge tag 'amd-drm-next-6.19-2025-11-07' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.19-2025-11-07: amdgpu: - Misc fixes - HMM cleanup - HDP flush rework - RAS updates - SMU 13.x updates - SI DPM cleanup - Suspend rework - UQ reset support - Replay/PSR fixes - HDCP updates - DC PMO fixes - DC pstate fixes - DCN4 fixes - GPUVM fixes - SMU 13 parition metrics - Fix possible fence leak in job cleanup - Hibernation fix - MST fix amdkfd: - HMM cleanup - Process cleanup fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patch.msgid.link/20251107145938.26669-1-alexander.deucher@amd.com	2025-11-11 15:35:49 +10:00
Dave Airlie	e237dfe708	Merge tag 'drm-misc-next-2025-11-05-1' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for v6.19-rc1: UAPI Changes: - Add userptr support to ivpu. - Add IOCTL's for resource and telemetry data in amdxdna. Core Changes: - Improve some atomic state checking handling. - drm/client updates. - Use forward declarations instead of including drm_print.h - RUse allocation flags in ttm_pool/device_init and allow specifying max useful pool size and propagate ENOSPC. - Updates and fixes to scheduler and bridge code. - Add support for quirking DisplayID checksum errors. Driver Changes: - Assorted cleanups and fixes in rcar-du, accel/ivpu, panel/nv3052cf, sti, imxm, accel/qaic, accel/amdxdna, imagination, tidss, sti, panthor, vkms. - Add Samsung S6E3FC2X01 DDIC/AMS641RW, Synaptics TDDI series DSI, TL121BVMS07-00 (IL79900A) panels. - Add mali MediaTek MT8196 SoC gpu support. - Add etnaviv GC8000 Nano Ultra VIP r6205 support. - Document powervr ge7800 support in the devicetree. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patch.msgid.link/5afae707-c9aa-4a47-b726-5e1f1aa7a106@linux.intel.com	2025-11-07 12:41:26 +10:00
Dave Airlie	8f037e11d0	Merge tag 'drm-intel-next-2025-11-04' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-next drm/i915 feature pull for v6.19: Features and functionality: - Enable LNL+ content adaptive sharpness filter (CASF) (Nemesa) - Use optimized VRR guardband (Ankit, Ville) - Enable Xe3p LT PHY (Suraj) - Enable FBC support for Xe3p_LPD display (Sai Teja, Vinod) - Specify DMC firmware for display version 30.02 (Dnyaneshwar) - Report reason for disabling PSR to debugfs (Michał) - Extend i915_display_info with Type-C port details (Khaled) - Log DSI send packet sequence errors and contents Refactoring and cleanups: - Refactoring to prepare for VRR guardband optimization (Ankit) - Abstract VRR live status wait (Ankit) - Refactor VRR and DSB timing to handle Set Context Latency explicitly (Ankit) - Helpers for prefill latency calculations (Ville) - Refactor SKL+ watermark latency setup (Ville) - VRR refactoring and cleanups (Ville) - SKL+ universal plane cleanups (Ville) - Decouple CDCLK from state->modeset refactor (Ville) - Refactor VLV/CHV clock functions (Jani) - Refactor fbdev handling (Jani) - Call i915 and xe runtime PM from display via function pointers (Jouni) - IRQ code refactoring (Jani) - Drop display dependency on i915 feature check macros (Jani) - Refactor and unify i915 and xe stolen memory interfaces towards display (Jani) - Switch to driver agnostic drm to display pointer chase (Jani) - Use display version over graphics version in display code (Matt A) - GVT cleanups (Jonathan, Andi) - Rename a VLV clock function to unify (Michał) - Explicitly sanitize DMC package header num entries (Luca) - Remove redundant port clock check from ALPM (Jouni) - Use sysfs_emit() instead of sprintf() in PMU sysfs (Madhur Kumar) - Clean up C20 PHY PLL register macros (Imre, Mika)) - Abstract "address in MMIO table" helper for general use (Matt A) - Improve VRR platform abstractions (Ville) - Move towards more standard PCI PM code usage (Ville) - Framebuffer refactoring (Ville) - Drop display dependency on i915_utils.h (Jani) - Include cleanups (Jani) Fixes: - Workaround docking station DSC issues with high pixel clock and bpp (Imre) - Fix Panel Replay in DSC mode (Imre) - Disable tracepoints for PREEMPT_RT as a workaround (Maarten) - Fix intel_crtc_get_vblank_counter() on PREEMPT_RT (Maarten) - Fix C10 PHY identification on PTL/WCL (Dnyaneshwar) - Take AS SDP into account with optimized guardband (Jouni) - Fix panic structure allocation memory leak (Jani) - Adjust an FBC workaround platforms (Vinod) - Add fallback for CDCLK selection (Naladala) - Avoid using invalid transcoder in MST transport select (Suraj) - Don't use cursor size reduction on display version 14+ (Nemesa) - Fix C20 PHY PLL register programming (Imre, Mika) - Fix PSR frontbuffer flush handling (Jouni) - Store ALPM parameters in crtc state (Jouni) - Defeature DRRS on LNL+ (Ville) - Fix the scope of the large DRAM DIMM workaround (Ville) - Fix PICA vs. AUX power ordering issue (Gustavo) - Fix pixel rate for computing watermark line time (Ville) - Fix framebuffer set_tiling vs. addfb race (Ville) - DMC event handler fixes (Ville) DRM Core: - CRTC sharpness strength property (Nemesa) - DPCD DSC quirk for Synaptics Panamera devices (Imre) - Helpers to query the branch DSC max throughput/line-width (Imre) Merges: - Backmerge drm-next for v6.18-rc and to sync with drm-xe-next (Jani) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Jani Nikula <jani.nikula@intel.com> Link: https://patch.msgid.link/ec5a05f2df6d597a62033ee2d57225cce707b320@intel.com	2025-11-07 09:47:56 +10:00
Asad Kamal	2e640e8e7b	drm/amd/pm: Update default power1_cap Update default power1_cap to max limit for smu_v13_0_6 and smu_v13_0_12 Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-06 10:02:19 -05:00
Tao Zhou	541414065c	drm/amdgpu: skip writing eeprom when PMFW manages RAS data Only update bad page number in legacy eeprom write path. v2: add null pointer check for con. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-06 10:02:15 -05:00
Wayne Lin	62320fb8d9	drm/amd/display: Enable mst when it's detected but yet to be initialized [Why] drm_dp_mst_topology_queue_probe() is used under the assumption that mst is already initialized. If we connect system with SST first then switch to the mst branch during suspend, we will fail probing topology by calling the wrong API since the mst manager is yet to be initialized. [How] At dm_resume(), once it's detected as mst branc connected, check if the mst is initialized already. If not, call dm_helpers_dp_mst_start_top_mgr() instead to initialize mst V2: Adjust the commit msg a bit Fixes: `bc068194f5` ("drm/amd/display: Don't write DP_MSTM_CTRL after LT") Cc: Fangzhi Zuo <jerry.zuo@amd.com> Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-06 10:01:18 -05:00
Tao Zhou	e1ca536e17	drm/amdgpu: support to load RAS bad pages from PMFW PMFW manages eeprom bad page records, update bad page loading accrodingly. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-06 10:01:14 -05:00
Lijo Lazar	1ad25fd272	drm/amdgpu: Fix wait after reset sequence in S3 For a mode-1 reset done at the end of S3 on PSPv11 dGPUs, only check if TOS is unloaded. Fixes: `32f73741d6` ("drm/amdgpu: Wait for bootloader after PSPv11 reset") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4649 Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-06 09:58:38 -05:00
Tao Zhou	7f34ddf77d	drm/amdgpu: add ras_eeprom_read_idx interface PMFW will manage RAS eeprom data by itself, add new interface to read eeprom data via PMFW, we can read part of records by setting index. v2: use IPID parse interface. pa is not used and set it to a fixed value. v3: optimize the null pointer check for IPID parse interface. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-06 09:58:35 -05:00
Tao Zhou	cd74132be8	drm/amdgpu: make MCA IPID parse global So we can call it in other blocks. v2: add a new IPID parse interface for umc and we can implement it for each ASIC. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-06 09:58:31 -05:00
Mario Limonciello	4104c0a454	drm/amd: Fix suspend failure with secure display TA commit `c760bcda83` ("drm/amd: Check whether secure display TA loaded successfully") attempted to fix extra messages, but failed to port the cleanup that was in commit `5c6d52ff4b` ("drm/amd: Don't try to enable secure display TA multiple times") to prevent multiple tries. Add that to the failure handling path even on a quick failure. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4679 Fixes: `c760bcda83` ("drm/amd: Check whether secure display TA loaded successfully") Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-06 09:57:24 -05:00
YiPeng Chai	be031770bf	drm/amd/ras: Fix the issue of incorrect function call When amdgpu_device_health_check fails, amdgpu_ras_pre_reset will not be called and therefore amdgpu_ras_post_reset cannot be called either. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-06 09:57:17 -05:00
Samuel Zhang	5d1b32cfe4	drm/amdgpu: fix gpu page fault after hibernation on PF passthrough On PF passthrough environment, after hibernate and then resume, coralgemm will cause gpu page fault. Mode1 reset happens during hibernate, but partition mode is not restored on resume, register mmCP_HYP_XCP_CTL and mmCP_PSP_XCP_CTL is not right after resume. When CP access the MQD BO, wrong stride size is used, this will cause out of bound access on the MQD BO, resulting page fault. The fix is to ensure gfx_v9_4_3_switch_compute_partition() is called when resume from a hibernation. KFD resume is called separately during a reset recovery or resume from suspend sequence. Hence it's not required to be called as part of partition switch. Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-06 09:57:11 -05:00
YiPeng Chai	127cdd726f	drm/amd/ras: ras supports i2c eeprom for mp1 v13_0_12 ras supports i2c eeprom for mp1 v13_0_12. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-06 09:57:07 -05:00
Ahmad Rehman	07528f7d97	drm/amdkfd: Do not wait for queue op response during reset This patch adds the condition to not wait for the queue response for unmap, if the gpu is in reset. Signed-off-by: Ahmad Rehman <Ahmad.Rehman@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-06 09:56:30 -05:00
David (Ming Qiang) Wu	b665f29a2f	drm/amdgpu/userq: need to unref bo unref bo after amdgpu_bo_reserve() failure as it has called amdgpu_bo_ref() already Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-06 09:56:25 -05:00
Gangliang Xie	1349b31313	drm/amdgpu: initialize max record count after table reset initialize max record count and record offset after table reset Signed-off-by: Gangliang Xie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-06 09:56:22 -05:00
Gangliang Xie	a448c40ff2	drm/amd/pm: check pmfw eeprom feature bit get and check the pmfw eeprom feature bit to decide if pmfw eeprom is supported Signed-off-by: Gangliang Xie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-06 09:56:19 -05:00
Gangliang Xie	cd5b28a040	drm/amdgpu: add check function for pmfw eeprom add check function for pmfw eeprom Signed-off-by: Gangliang Xie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-06 09:56:15 -05:00
Gangliang Xie	19c815d516	drm/amdgpu: add initialization function for pmfw eeprom add initialization function for pmfw eeprom Signed-off-by: Gangliang Xie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-06 09:56:04 -05:00
Gangliang Xie	9ce015e5fd	drm/amdgpu: adapt reset function for pmfw eeprom adapt reset function for pmfw eeprom Signed-off-by: Gangliang Xie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-06 09:55:58 -05:00
Marek Vasut	6126a7f27f	dt-bindings: gpu: img,powervr-rogue: Document GE7800 GPU in Renesas R-Car M3-N Document Imagination Technologies PowerVR Rogue GE7800 BNVC 15.5.1.64 present in Renesas R-Car R8A77965 M3-N SoC. Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Marek Vasut <marek.vasut+renesas@mailbox.org> Reviewed-by: Matt Coster <matt.coster@imgtec.com> Link: https://patch.msgid.link/20251104135716.12497-2-marek.vasut+renesas@mailbox.org Signed-off-by: Matt Coster <matt.coster@imgtec.com>	2025-11-05 10:54:39 +00:00
Marek Vasut	cc2a5cae75	dt-bindings: gpu: img,powervr-rogue: Keep lists sorted alphabetically Sort the enum: list alphabetically. No functional change. Signed-off-by: Marek Vasut <marek.vasut+renesas@mailbox.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Matt Coster <matt.coster@imgtec.com> Link: https://patch.msgid.link/20251104135716.12497-1-marek.vasut+renesas@mailbox.org Signed-off-by: Matt Coster <matt.coster@imgtec.com>	2025-11-05 10:54:39 +00:00
Alok Tiwari	c84d874615	drm: rcar-du: fix incorrect return in rcar_du_crtc_cleanup() The rcar_du_crtc_cleanup() function has a void return type, but incorrectly uses a return statement with a call to drm_crtc_cleanup(), which also returns void. Remove the return statement to ensure proper function semantics. No functional change intended. Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Reviewed-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com> Link: https://patch.msgid.link/20251017191634.1454201-1-alok.a.tiwari@oracle.com Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>	2025-11-05 11:17:26 +02:00
Karol Wachowski	db892a9f7a	accel/ivpu: Improve debug and warning messages Add IOCTL debug bit for logging user provided parameter validation errors. Refactor several warning and error messages to better reflect fault reason. User generated faults should not flood kernel messages with warnings or errors, so change those to ivpu_dbg(). Add additional debug logs for parameter validation in IOCTLs. Check size provided by in metric streamer start and return -EINVAL together with a debug message print. Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com> Link: https://patch.msgid.link/20251104132418.970784-1-karol.wachowski@linux.intel.com	2025-11-05 08:35:33 +01:00
Lizhi Hou	e568dc3e62	accel/amdxdna: Add IOCTL parameter for telemetry data Extend DRM_IOCTL_AMDXDNA_GET_INFO to include additional parameters that allow collection of telemetry data. Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20251104062546.833771-3-lizhi.hou@amd.com	2025-11-04 09:04:21 -08:00
Lizhi Hou	1556c170d2	accel/amdxdna: Add IOCTL parameter for resource data Extend DRM_IOCTL_AMDXDNA_GET_INFO to include additional parameters that allow collection of resource data. Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20251104062546.833771-2-lizhi.hou@amd.com	2025-11-04 09:03:11 -08:00
Lizhi Hou	c48f1f459e	accel/amdxdna: Add hardware specific attributes Add three hardware specific attributes to describe device capabilities: hwctx_limit: The maximum number of hardware context supported. max_tops: The maximum TOPS supported. curr_tops: The TOPS achievable with the current power and frequency configuration. Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20251104062546.833771-1-lizhi.hou@amd.com	2025-11-04 09:01:44 -08:00
Alex Deucher	f903b85ed0	drm/amdgpu: fix possible fence leaks from job structure If we don't end up initializing the fences, free them when we free the job. We can't set the hw_fence to NULL after emitting it because we need it in the cleanup path for the submit direct case. v2: take a reference to the fences if we emit them v3: handle non-job fence in error paths Fixes: `db36632ea5` ("drm/amdgpu: clean up and unify hw fence handling") Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> (v1) Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-04 11:53:59 -05:00
YiPeng Chai	d95ca7f515	drm/amdgpu: suspend ras module before gpu reset During gpu reset, all GPU-related resources are inaccessible. To avoid affecting ras functionality, suspend ras module before gpu reset and resume it after gpu reset is complete. V2: Rename functions to avoid misunderstanding. V3: Move flush_delayed_work to amdgpu_ras_process_pause, Move schedule_delayed_work to amdgpu_ras_process_unpause. V4: Rename functions. V5: Move the function to amdgpu_ras.c. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-04 11:53:59 -05:00
Gangliang Xie	d4432f16d3	drm/amdgpu: add wrapper functions for pmfw eeprom interface add wrapper functions for pmfw eeprom interface, for these interfaces to be easily and safely called Signed-off-by: Gangliang Xie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-04 11:53:58 -05:00
Gangliang Xie	f6cdcbd2c0	drm/amdgpu: add function to check if pmfw eeprom is supported add function to check if pmfw is supported, skip eeprom check and recover when pmfw eeprom is supported Signed-off-by: Gangliang Xie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-04 11:53:58 -05:00
Gangliang Xie	f5346a176c	drm/amd/pm: add smu ras driver framework add functions to get smu ras driver Signed-off-by: Gangliang Xie <ganglxie@amd.com> Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-04 11:53:58 -05:00
Gangliang Xie	77dbd7c0a2	drm/amd/pm: implement ras_smu_drv interface for smu v13.0.12 implement ras_smu_drv interface for smu v13.0.12 Signed-off-by: Gangliang Xie <ganglxie@amd.com> Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-04 11:53:58 -05:00
Gangliang Xie	0c6f09e65b	drm/amd/pm: add new message definitions for pmfw eeprom interface Add new message definitions for pmfw eeprom interface Signed-off-by: Gangliang Xie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-04 11:53:58 -05:00
Philip Yang	88ef4de35f	Revert "drm/amdkfd: Improve signal event slow path" To fix regression report on gfx8, which requires the exhaustive search path for signaled event. The high CPU usage of KFD interrupt wq issue is gone after HIP/ROCr add option to reduce HW event interrupts, safe to revert this optimization patch now. This reverts commit `de844846f7`. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-04 11:53:22 -05:00
Rong Zhang	f19bbecd34	drm/amd/display: Fix NULL deref in debugfs odm_combine_segments When a connector is connected but inactive (e.g., disabled by desktop environments), pipe_ctx->stream_res.tg will be destroyed. Then, reading odm_combine_segments causes kernel NULL pointer dereference. BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP NOPTI CPU: 16 UID: 0 PID: 26474 Comm: cat Not tainted 6.17.0+ #2 PREEMPT(lazy) e6a17af9ee6db7c63e9d90dbe5b28ccab67520c6 Hardware name: LENOVO 21Q4/LNVNB161216, BIOS PXCN25WW 03/27/2025 RIP: 0010:odm_combine_segments_show+0x93/0xf0 [amdgpu] Code: 41 83 b8 b0 00 00 00 01 75 6e 48 98 ba a1 ff ff ff 48 c1 e0 0c 48 8d 8c 07 d8 02 00 00 48 85 c9 74 2d 48 8b bc 07 f0 08 00 00 <48> 8b 07 48 8b 80 08 02 00> RSP: 0018:ffffd1bf4b953c58 EFLAGS: 00010286 RAX: 0000000000005000 RBX: ffff8e35976b02d0 RCX: ffff8e3aeed052d8 RDX: 00000000ffffffa1 RSI: ffff8e35a3120800 RDI: 0000000000000000 RBP: 0000000000000000 R08: ffff8e3580eb0000 R09: ffff8e35976b02d0 R10: ffffd1bf4b953c78 R11: 0000000000000000 R12: ffffd1bf4b953d08 R13: 0000000000040000 R14: 0000000000000001 R15: 0000000000000001 FS: 00007f44d3f9f740(0000) GS:ffff8e3caa47f000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000006485c2000 CR4: 0000000000f50ef0 PKRU: 55555554 Call Trace: <TASK> seq_read_iter+0x125/0x490 ? __alloc_frozen_pages_noprof+0x18f/0x350 seq_read+0x12c/0x170 full_proxy_read+0x51/0x80 vfs_read+0xbc/0x390 ? __handle_mm_fault+0xa46/0xef0 ? do_syscall_64+0x71/0x900 ksys_read+0x73/0xf0 do_syscall_64+0x71/0x900 ? count_memcg_events+0xc2/0x190 ? handle_mm_fault+0x1d7/0x2d0 ? do_user_addr_fault+0x21a/0x690 ? exc_page_fault+0x7e/0x1a0 entry_SYSCALL_64_after_hwframe+0x6c/0x74 RIP: 0033:0x7f44d4031687 Code: 48 89 fa 4c 89 df e8 58 b3 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00> RSP: 002b:00007ffdb4b5f0b0 EFLAGS: 00000202 ORIG_RAX: 0000000000000000 RAX: ffffffffffffffda RBX: 00007f44d3f9f740 RCX: 00007f44d4031687 RDX: 0000000000040000 RSI: 00007f44d3f5e000 RDI: 0000000000000003 RBP: 0000000000040000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000202 R12: 00007f44d3f5e000 R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000040000 </TASK> Modules linked in: tls tcp_diag inet_diag xt_mark ccm snd_hrtimer snd_seq_dummy snd_seq_midi snd_seq_oss snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device x> snd_hda_codec_atihdmi snd_hda_codec_realtek_lib lenovo_wmi_helpers think_lmi snd_hda_codec_generic snd_hda_codec_hdmi snd_soc_core kvm snd_compress uvcvideo sn> platform_profile joydev amd_pmc mousedev mac_hid sch_fq_codel uinput i2c_dev parport_pc ppdev lp parport nvme_fabrics loop nfnetlink ip_tables x_tables dm_cryp> CR2: 0000000000000000 ---[ end trace 0000000000000000 ]--- RIP: 0010:odm_combine_segments_show+0x93/0xf0 [amdgpu] Code: 41 83 b8 b0 00 00 00 01 75 6e 48 98 ba a1 ff ff ff 48 c1 e0 0c 48 8d 8c 07 d8 02 00 00 48 85 c9 74 2d 48 8b bc 07 f0 08 00 00 <48> 8b 07 48 8b 80 08 02 00> RSP: 0018:ffffd1bf4b953c58 EFLAGS: 00010286 RAX: 0000000000005000 RBX: ffff8e35976b02d0 RCX: ffff8e3aeed052d8 RDX: 00000000ffffffa1 RSI: ffff8e35a3120800 RDI: 0000000000000000 RBP: 0000000000000000 R08: ffff8e3580eb0000 R09: ffff8e35976b02d0 R10: ffffd1bf4b953c78 R11: 0000000000000000 R12: ffffd1bf4b953d08 R13: 0000000000040000 R14: 0000000000000001 R15: 0000000000000001 FS: 00007f44d3f9f740(0000) GS:ffff8e3caa47f000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000006485c2000 CR4: 0000000000f50ef0 PKRU: 55555554 Fix this by checking pipe_ctx->stream_res.tg before dereferencing. Fixes: `07926ba8a4` ("drm/amd/display: Add debugfs interface for ODM combine info") Signed-off-by: Rong Zhang <i@rong.moe> Reviewed-by: Mario Limoncello <mario.limonciello@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-04 11:53:22 -05:00
Philip Yang	10c382ec6c	drm/amdkfd: Don't clear PT after process killed If process is killed. the vm entity is stopped, submit pt update job will trigger the error message "ERROR Trying to push to a killed entity", job will not execute. Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-04 11:53:22 -05:00
YiPeng Chai	3f16007d86	drm/amd/ras: Add ras support for umc v12_5_0 Add ras support for umc v12_5_0. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-04 11:53:22 -05:00
YiPeng Chai	d7f105a402	drm/amd/ras: Add ras support for nbio v7_9_1 Add ras support for nbio v7_9_1. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-04 11:53:22 -05:00
YiPeng Chai	2f46c547e4	drm/amdgpu: Add ras ip block name Add ras ip block name. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-04 11:53:22 -05:00
YiPeng Chai	36265d2bcc	drm/amd/ras: Increase ras switch control range Increase ras switch control range. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-04 11:53:22 -05:00
Alex Deucher	fd39b5a583	drm/amdgpu/smu: Handle S0ix for vangogh Fix the flows for S0ix. There is no need to stop rlc or reintialize PMFW in S0ix. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4659 Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Reported-by: Antheas Kapenekakis <lkml@antheas.dev> Tested-by: Antheas Kapenekakis <lkml@antheas.dev> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-04 11:53:22 -05:00
Lijo Lazar	c83fd2a665	drm/amd/pm: Update SMUv13.0.12 partition metrics Update SMUv13.0.12 partition metrics to partition metrics v1.1 schema. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-04 11:53:21 -05:00
Lijo Lazar	56aeca499a	drm/amd/pm: Update SMUv13.0.6 partition metrics For SMU v13.0.6 SOCs, move to partition metrics v1.1 schema Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-04 11:53:21 -05:00
Lijo Lazar	4f993e2309	drm/amd/pm: Add schema v1.1 for parition metrics Use a schema similar to gpu metrics v1.9 for partition metrics also. It will have field type encoded followed by the field value(s). The attribute ids used will be shared with gpu metrics. The structure definition is only to distinguish between gpu metrics and partition metrics though both gpu metrics v1.9 and partition metrics v1.1 follow the same definition. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-04 11:53:21 -05:00
Lijo Lazar	b480f573a8	drm/amd/pm: Use gpu metrics 1.9 for SMUv13.0.12 Fill and publish GPU metrics in v1.9 format for SMUv13.0.12 SOCs Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-04 11:53:21 -05:00

1 2 3 4 5 ...

1398252 Commits