The conversion of all GPIO drivers to using the .set_rv() and
.set_multiple_rv() callbacks from struct gpio_chip (which - unlike their
predecessors - return an integer and allow the controller drivers to
indicate failures to users) is now complete and the legacy ones have
been removed. Rename the new callbacks back to their original names in
one sweeping change.
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
We only need the fw based discovery table for sysfs. No
need to parse it. Additionally parsing some of the board
specific tables may result in incorrect data on some boards.
just load the binary and don't parse it on those boards.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4441
Fixes: 80a0e82829 ("drm/amdgpu/discovery: optionally use fw based ip discovery")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 62eedd150f)
Cc: stable@vger.kernel.org
Since KFD proc content was moved to kernel debugfs, we can't destroy KFD
debugfs before kfd_process_destroy_wq. Move kfd_process_destroy_wq prior
to kfd_debugfs_fini to fix a kernel NULL pointer problem. It happens
when /sys/kernel/debug/kfd was already destroyed in kfd_debugfs_fini but
kfd_process_destroy_wq calls kfd_debugfs_remove_process. This line
debugfs_remove_recursive(entry->proc_dentry);
tries to remove /sys/kernel/debug/kfd/proc/<pid> while
/sys/kernel/debug/kfd is already gone. It hangs the kernel by kernel
NULL pointer.
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Eric Huang <jinhuieric.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 0333052d90)
Cc: stable@vger.kernel.org
Assign unique id to compute partition. This is the unique id of the
first XCD instance belonging to the partition.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Legacy resets reset the memory controllers so VRAM contents
may be unreliable after reset.
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We only need the fw based discovery table for sysfs. No
need to parse it. Additionally parsing some of the board
specific tables may result in incorrect data on some boards.
just load the binary and don't parse it on those boards.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4441
Fixes: 80a0e82829 ("drm/amdgpu/discovery: optionally use fw based ip discovery")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add a NULL check for acrtc->dm_irq_params.stream before
accessing its members.
Fixes below:
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:623
dm_vupdate_high_irq() warn: variable dereferenced before check
'acrtc->dm_irq_params.stream' (see line 615)
614 if (vrr_active) {
615 bool replay_en = acrtc->dm_irq_params.stream->link->replay_settings.replay_feature_enabled;
^^^^^^^^^^^^^^^^^^^^^^^^^^^
616 bool psr_en = acrtc->dm_irq_params.stream->link->psr_settings.psr_feature_enabled;
^^^^^^^^^^^^^^^^^^^^^^^^^^^ New dereferences
617 bool fs_active_var_en = acrtc->dm_irq_params.freesync_config.state
618 == VRR_STATE_ACTIVE_VARIABLE;
619
620 amdgpu_dm_crtc_handle_vblank(acrtc);
621
622 /* BTR processing for pre-DCE12 ASICs */
623 if (acrtc->dm_irq_params.stream &&
^^^^^^^^^^^^^^^^^^^^^^^^^^^ But the existing code assumed it could be NULL. Someone is wrong.
624 adev->family < AMDGPU_FAMILY_AI) {
625 spin_lock_irqsave(&adev_to_drm(adev)->event_lock, flags);
Fixes: 6d31602a9f ("drm/amd/display: more liberal vmin/vmax update for freesync")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Alex Hung <alex.hung@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Cc: Roman Li <roman.li@amd.com>
Cc: ChiaHsuan Chung <chiahsuan.chung@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Ray Wu <ray.wu@amd.com>
Cc: Daniel Wheeler <daniel.wheeler@amd.com>
Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Remove caching logic of temperature metrics from SMUv13.0.12. The
caching logic needs to be moved to a higher level.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The buffer is already freed as part of amdgpu_vcn_reg_dump_fini(). The
issue is introduced by below patch series.
Fixes: de55cbff5c ("drm/amdgpu/vcn: Add regdump helper functions")
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The bad pages threshold exceed CPER should be generated once threshold
exceeded, no matter the bad_page_threshold setted or not.
Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fetch system metrics table to fill gpuboard/baseboard temperature
metrics data for smu_v13_0_12
v2: Remove unnecessary checks, used separate metrics time for
temperature metrics table(Lijo)
v3: Use cached values for back to back system metrics query(Lijo)
Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add smu interface to get baseboard/gpuboard temperature metrics
v2: Rename is_support to is_supported(Lijo)
Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add dpm interface to get gpuboard/baseboard temperature metrics
v2: Add temperature metrics support check(Lijo)
v3: Return error code in case of operation not supported(Lijo)
Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fix the following warning in struct documentation:
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h:168: warning: expecting prototype for struct dm_vupdate_work. Prototype was for struct vupdate_offload_work instead
Fixes: c210b757b4 ("drm/amd/display: fix dmub access race condition")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
dst MIGRATE_PFN_VALID bit and src MIGRATE_PFN_MIGRATE bit
should always be set when migration success. cpage includes
src MIGRATE_PFN_MIGRATE bit set and MIGRATE_PFN_VALID bit
unset pages for both ram and vram when memory is only allocated
without being populated before migration, those ram pages should
be counted as migrate pages and those vram pages should not be
counted as migrate pages. Here migration pages refer to how many
vram pages involved.
-v2 use dst to check MIGRATE_PFN_VALID bit (suggested-by Philip)
-v3 add warning when vram pages is less than migration pages
return migration pages directly from copy function
-v4 correct comments and copy function return mpage (suggested-by Felix)
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Certain messages will processed with high priority by PMFW even if it
hasn't responded to a previous message. Send the priority message
regardless of the success/fail status of the previous message. Add
support on SMUv13.0.6 and SMUv13.0.12
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Set the dpc status based on hardware state. Also, clear the status before
reinitialization after a successful reset.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Ce Sun <cesun102@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Since KFD proc content was moved to kernel debugfs, we can't destroy KFD
debugfs before kfd_process_destroy_wq. Move kfd_process_destroy_wq prior
to kfd_debugfs_fini to fix a kernel NULL pointer problem. It happens
when /sys/kernel/debug/kfd was already destroyed in kfd_debugfs_fini but
kfd_process_destroy_wq calls kfd_debugfs_remove_process. This line
debugfs_remove_recursive(entry->proc_dentry);
tries to remove /sys/kernel/debug/kfd/proc/<pid> while
/sys/kernel/debug/kfd is already gone. It hangs the kernel by kernel
NULL pointer.
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Eric Huang <jinhuieric.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Some PSPv11 SOCs take a longer time for PSP based mode-1 reset. Instead
of checking for C2PMSG_33 status, add the callback wait_for_bootloader.
Wait for bootloader to be back to steady state is already part of the
generic mode-1 reset flow. Increase the retry count for bootloader wait
and also fix the mask to prevent fake pass.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The DMA mapping can now correspond to a folio (order > 0), so move
the iterator by the number of pages in the folio in order to migrate
all pages at once.
This requires forcing contiguous memory for SVM BOs, which greatly
simplifies the code and enables 2MB device page support, allowing a
major performance improvement. Negative effects like extra eviction
are unlikely as SVM BOs have a maximal size of 2MB.
v2:
- Improve commit message (Matthew Brost)
- Fix increment, chunk, assert match (Matthew Brost)
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250805140028.599361-7-francois.dugast@intel.com
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
If the order is greater than zero, allocate a folio when populating the
RAM PFNs instead of allocating individual pages one after the other. For
example if 2MB folios are used instead of 4KB pages, this reduces the
number of calls to the allocation API by 512.
v2:
- Use page order instead of extra argument (Matthew Brost)
- Allocate with folio_alloc() (Matthew Brost)
- Loop for mpages and free_pages based on order (Matthew Brost)
v3:
- Fix loops in drm_pagemap_migrate_populate_ram_pfn() (Matthew Brost)
v4:
- Use folio_trylock(), set local variable to NULL (Matthew Brost)
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://lore.kernel.org/r/20250805140028.599361-5-francois.dugast@intel.com
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
If the page is part of a folio, DMA map the whole folio at once instead of
mapping individual pages one after the other. For example if 2MB folios
are used instead of 4KB pages, this reduces the number of DMA mappings by
512.
The folio order (and consequently, the size) is persisted in the struct
drm_pagemap_device_addr to be available at the time of unmapping.
v2:
- Initialize order variable (Matthew Brost)
- Set proto and dir for completeness (Matthew Brost)
- Do not populate drm_pagemap_addr, document it (Matthew Brost)
- Add and use macro NR_PAGES(order) (Matthew Brost)
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://lore.kernel.org/r/20250805140028.599361-4-francois.dugast@intel.com
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
This struct embeds more information than just the DMA address. This will
help later to support folio orders greater than zero. At this point, there
is no functional change as the only struct member used is addr.
In Xe, adapt to the new drm_gpusvm_devmem_ops type signatures using struct
drm_pagemap_addr, as well as the internal xe SVM functions implementing
those operations. The use of this struct is propagated to xe_migrate as it
makes indexed accesses to the next DMA address but they are no longer
contiguous.
v2:
- Rename drm_pagemap_device_addr to drm_pagemap_addr (Matthew Brost)
- Squash with patch for Xe (Matthew Brost)
- Set proto and dir for completeness (Matthew Brost)
- Assess DMA map protocol (Matthew Brost)
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://lore.kernel.org/r/20250805140028.599361-3-francois.dugast@intel.com
Signed-off-by: Francois Dugast <francois.dugast@intel.com>