Commit Graph

33543 Commits

Author SHA1 Message Date
Linus Torvalds
b08494a8f7 Merge tag 'drm-next-2025-05-28' of https://gitlab.freedesktop.org/drm/kernel
Pull drm updates from Dave Airlie:
 "As part of building up nova-core/nova-drm pieces we've brought in some
  rust abstractions through this tree, aux bus being the main one, with
  devres changes also in the driver-core tree. Along with the drm core
  abstractions and enough nova-core/nova-drm to use them. This is still
  all stub work under construction, to build the nova driver upstream.

  The other big NVIDIA related one is nouveau adds support for
  Hopper/Blackwell GPUs, this required a new GSP firmware update to
  570.144, and a bunch of rework in order to support multiple fw
  interfaces.

  There is also the introduction of an asahi uapi header file as a
  precursor to getting the real driver in later, but to unblock
  userspace mesa packages while the driver is trapped behind rust
  enablement.

  Otherwise it's the usual mixture of stuff all over, amdgpu, i915/xe,
  and msm being the main ones, and some changes to vsprintf.

  new drivers:
   - bring in the asahi uapi header standalone
   - nova-drm: stub driver

  rust dependencies (for nova-core):
   - auxiliary
       - bus abstractions
       - driver registration
       - sample driver
   - devres changes from driver-core
   - revocable changes

  core:
   - add Apple fourcc modifiers
   - add virtio capset definitions
   - extend EXPORT_SYNC_FILE for timeline syncobjs
   - convert to devm_platform_ioremap_resource
   - refactor shmem helper page pinning
   - DP powerup/down link helpers
   - extended %p4cc in vsprintf.c to support fourcc prints
   - change vsprintf %p4cn to %p4chR, remove %p4cn
   - Add drm_file_err function
   - IN_FORMATS_ASYNC property
   - move sitronix from tiny to their own subdir

  rust:
   - add drm core infrastructure rust abstractions
     (device/driver, ioctl, file, gem)

  dma-buf:
   - adjust sg handling to not cache map on attach
   - allow setting dma-device for import
   - Add a helper to sort and deduplicate dma_fence arrays

  docs:
   - updated drm scheduler docs
   - fbdev todo update
   - fb rendering
   - actual brightness

  ttm:
   - fix delayed destroy resv object

  bridge:
   - add kunit tests
   - convert tc358775 to atomic
   - convert drivers to devm_drm_bridge_alloc
   - convert rk3066_hdmi to bridge driver

  scheduler:
   - add kunit tests

  panel:
   - refcount panels to improve lifetime handling
   - Powertip PH128800T004-ZZA01
   - NLT NL13676BC25-03F, Tianma TM070JDHG34-00
   - Himax HX8279/HX8279-D DDIC
   - Visionox G2647FB105
   - Sitronix ST7571
   - ZOTAC rotation quirk

  vkms:
   - allow attaching more displays

  i915:
   - xe3lpd display updates
   - vrr refactor
   - intel_display struct conversions
   - xe2hpd memory type identification
   - add link rate/count to i915_display_info
   - cleanup VGA plane handling
   - refactor HDCP GSC
   - fix SLPC wait boosting reference counting
   - add 20ms delay to engine reset
   - fix fence release on early probe errors

  xe:
   - SRIOV updates
   - BMG PCI ID update
   - support separate firmware for each GT
   - SVM fix, prelim SVM multi-device work
   - export fan speed
   - temp disable d3cold on BMG
   - backup VRAM in PM notifier instead of suspend/freeze
   - update xe_ttm_access_memory to use GPU for non-visible access
   - fix guc_info debugfs for VFs
   - use copy_from_user instead of __copy_from_user
   - append PCIe gen5 limitations to xe_firmware document

  amdgpu:
   - DSC cleanup
   - DC Scaling updates
   - Fused I2C-over-AUX updates
   - DMUB updates
   - Use drm_file_err in amdgpu
   - Enforce isolation updates
   - Use new dma_fence helpers
   - USERQ fixes
   - Documentation updates
   - SR-IOV updates
   - RAS updates
   - PSP 12 cleanups
   - GC 9.5 updates
   - SMU 13.x updates
   - VCN / JPEG SR-IOV updates

  amdkfd:
   - Update error messages for SDMA
   - Userptr updates
   - XNACK fixes

  radeon:
   - CIK doorbell cleanup

  nouveau:
   - add support for NVIDIA r570 GSP firmware
   - enable Hopper/Blackwell support

  nova-core:
   - fix task list
   - register definition infrastructure
   - move firmware into own rust module
   - register auxiliary device for nova-drm

  nova-drm:
   - initial driver skeleton

  msm:
   - GPU:
       - ACD (adaptive clock distribution) for X1-85
       - drop fictional address_space_size
       - improve GMU HFI response time out robustness
       - fix crash when throttling during boot
   - DPU:
       - use single CTL path for flushing on DPU 5.x+
       - improve SSPP allocation code for better sharing
       - Enabled SmartDMA on SM8150, SC8180X, SC8280XP, SM8550
       - Added SAR2130P support
       - Disabled DSC support on MSM8937, MSM8917, MSM8953, SDM660
   - DP:
       - switch to new audio helpers
       - better LTTPR handling
   - DSI:
       - Added support for SA8775P
       - Added SAR2130P support
   - HDMI:
       - Switched to use new helpers for ACR data
       - Fixed old standing issue of HPD not working in some cases

  amdxdna:
   - add dma-buf support
   - allow empty command submits

  renesas:
   - add dma-buf support
   - add zpos, alpha, blend support

  panthor:
   - fail properly for NO_MMAP bos
   - add SET_LABEL ioctl
   - debugfs BO dumping support

  imagination:
   - update DT bindings
   - support TI AM68 GPU

  hibmc:
   - improve interrupt handling and HPD support

  virtio:
   - add panic handler support

  rockchip:
   - add RK3588 support
   - add DP AUX bus panel support

  ivpu:
   - add heartbeat based hangcheck

  mediatek:
   - prepares support for MT8195/99 HDMIv2/DDCv2

  anx7625:
   - improve HPD

  tegra:
   - speed up firmware loading

* tag 'drm-next-2025-05-28' of https://gitlab.freedesktop.org/drm/kernel: (1627 commits)
  drm/nouveau/tegra: Fix error pointer vs NULL return in nvkm_device_tegra_resource_addr()
  drm/xe: Default auto_link_downgrade status to false
  drm/xe/guc: Make creation of SLPC debugfs files conditional
  drm/i915/display: Add check for alloc_ordered_workqueue() and alloc_workqueue()
  drm/i915/dp_mst: Work around Thunderbolt sink disconnect after SINK_COUNT_ESI read
  drm/i915/ptl: Use everywhere the correct DDI port clock select mask
  drm/nouveau/kms: add support for GB20x
  drm/dp: add option to disable zero sized address only transactions.
  drm/nouveau: add support for GB20x
  drm/nouveau/gsp: add hal for fifo.chan.doorbell_handle
  drm/nouveau: add support for GB10x
  drm/nouveau/gf100-: track chan progress with non-WFI semaphore release
  drm/nouveau/nv50-: separate CHANNEL_GPFIFO handling out from CHANNEL_DMA
  drm/nouveau: add helper functions for allocating pinned/cpu-mapped bos
  drm/nouveau: add support for GH100
  drm/nouveau: improve handling of 64-bit BARs
  drm/nouveau/gv100-: switch to volta semaphore methods
  drm/nouveau/gsp: support deeper page tables in COPY_SERVER_RESERVED_PDES
  drm/nouveau/gsp: init client VMMs with NV0080_CTRL_DMA_SET_PAGE_DIRECTORY
  drm/nouveau/gsp: fetch level shift and PDE from BAR2 VMM
  ...
2025-05-28 09:46:39 -07:00
Linus Torvalds
97851c6016 Merge tag 'ratelimit.2025.05.25a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull rate-limit updates from Paul McKenney:
 "lib/ratelimit: Reduce false-positive and silent misses:

   - Reduce open-coded use of ratelimit_state structure fields.

   - Convert the ->missed field to atomic_t.

   - Count misses that are due to lock contention.

   - Eliminate jiffies=0 special case.

   - Reduce ___ratelimit() false-positive rate limiting (Petr Mladek).

   - Allow zero ->burst to hard-disable rate limiting.

   - Optimize away atomic operations when a miss is guaranteed.

   - Warn if ->interval or ->burst are negative (Petr Mladek).

   - Simplify the resulting code.

  A smoke test and stress test have been created, but they are not yet
  ready for mainline. With luck, we will offer them for the v6.17 merge
  window"

* tag 'ratelimit.2025.05.25a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
  ratelimit: Drop redundant accesses to burst
  ratelimit: Use nolock_ret restructuring to collapse common case code
  ratelimit: Use nolock_ret label to collapse lock-failure code
  ratelimit: Use nolock_ret label to save a couple of lines of code
  ratelimit: Simplify common-case exit path
  ratelimit: Warn if ->interval or ->burst are negative
  ratelimit: Avoid atomic decrement under lock if already rate-limited
  ratelimit: Avoid atomic decrement if already rate-limited
  ratelimit: Don't flush misses counter if RATELIMIT_MSG_ON_RELEASE
  ratelimit: Force re-initialization when rate-limiting re-enabled
  ratelimit: Allow zero ->burst to disable ratelimiting
  ratelimit: Reduce ___ratelimit() false-positive rate limiting
  ratelimit: Avoid jiffies=0 special case
  ratelimit: Count misses due to lock contention
  ratelimit: Convert the ->missed field to atomic_t
  drm/amd/pm: Avoid open-coded use of ratelimit_state structure's internals
  drm/i915: Avoid open-coded use of ratelimit_state structure's ->missed field
  random: Avoid open-coded use of ratelimit_state structure's ->missed field
  ratelimit: Create functions to handle ratelimit_state internals
2025-05-27 10:48:36 -07:00
Linus Torvalds
2bd1bea5fa Merge tag 'irq-cleanups-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq cleanups from Thomas Gleixner:
 "A set of cleanups for the generic interrupt subsystem:

   - Consolidate on one set of functions for the interrupt domain code
     to get rid of pointlessly duplicated code with only marginal
     different semantics.

   - Update the documentation accordingly and consolidate the coding
     style of the irqdomain header"

* tag 'irq-cleanups-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
  irqdomain: Consolidate coding style
  irqdomain: Fix kernel-doc and add it to Documentation
  Documentation: irqdomain: Update it
  Documentation: irq-domain.rst: Simple improvements
  Documentation: irq/concepts: Minor improvements
  Documentation: irq/concepts: Add commas and reflow
  irqdomain: Improve kernel-docs of functions
  irqdomain: Make struct irq_domain_info variables const
  irqdomain: Use irq_domain_instantiate()'s return value as initializers
  irqdomain: Drop irq_linear_revmap()
  pinctrl: keembay: Switch to irq_find_mapping()
  irqchip/armada-370-xp: Switch to irq_find_mapping()
  gpu: ipu-v3: Switch to irq_find_mapping()
  gpio: idt3243x: Switch to irq_find_mapping()
  sh: Switch to irq_find_mapping()
  powerpc: Switch to irq_find_mapping()
  irqdomain: Drop irq_domain_add_*() functions
  powerpc: Switch irq_domain_add_nomap() to use fwnode
  thermal: Switch to irq_domain_create_linear()
  soc: Switch to irq_domain_create_*()
  ...
2025-05-27 08:07:32 -07:00
Mario Limonciello
7e7cb7a13c Revert "drm/amd: Keep display off while going into S4"
commit 68bfdc8dc0 ("drm/amd: Keep display off while going into S4")
attempted to keep displays off during the S4 sequence by not resuming
display IP.  This however leads to hangs because DRM clients such as the
console can try to access registers and cause a hang.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4155
Fixes: 68bfdc8dc0 ("drm/amd: Keep display off while going into S4")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250522141328.115095-1-mario.limonciello@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit e485502c37)
Cc: stable@vger.kernel.org
2025-05-22 12:13:51 -04:00
Jiri Slaby (SUSE)
493e109267 gpu: Switch to irq_domain_create_linear()
irq_domain_add_linear() is going away as being obsolete now. Switch to
the preferred irq_domain_create_linear(). That differs in the first
parameter: It takes more generic struct fwnode_handle instead of struct
device_node. Therefore, of_fwnode_handle() is added around the
parameter.

Note some of the users can likely use dev->fwnode directly instead of
indirect of_fwnode_handle(dev->of_node). But dev->fwnode is not
guaranteed to be set for all, so this has to be investigated on case to
case basis (by people who can actually test with the HW).

[ tglx: Fix up subject prefix ]

Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250319092951.37667-19-jirislaby@kernel.org
2025-05-16 21:06:09 +02:00
fanhuang
2f0268ca1c drm/amdgpu/jpeg: sriov support for jpeg_v5_0_1
initialization table handshake with mmsch

Signed-off-by: fanhuang <FangSheng.Huang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-16 13:39:14 -04:00
fanhuang
56fc141a5c drm/amdgpu/vcn: sriov support for vcn_v5_0_1
initialization table handshake with mmsch

Signed-off-by: fanhuang <FangSheng.Huang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-16 13:39:10 -04:00
Taimur Hassan
7ce316620d drm/amd/display: Promote DAL to 3.2.334
This version brings along following update:
-Support external tunneling feature
-Modify DCN401 DMUB reset & halt sequence
-Fix the typo in dcn401 Hubp block
-Skip backend validation for virtual monitors

Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-16 13:39:07 -04:00
Taimur Hassan
81fc9ca25f drm/amd/display: [FW Promotion] Release 0.1.11.0
Refactoring some DMUB related structs and enum.

Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-16 13:39:04 -04:00
Ovidiu Bunea
29e82a2716 drm/amd/display: Add GPINT retries to ips_query_residency_info
[why & how]
GPINTs can timeout without returning any data. Since this path is
only for testing purposes, it should retry several times to ensure
data is collected.

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Ovidiu Bunea <Ovidiu.Bunea@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-16 13:38:59 -04:00
Dillon Varone
6d8f73885e drm/amd/display: Modify DCN401 DMUB reset & halt sequence
[WHY&HOW]
If DMCUB is already disabled or reset, no need to send the halt command
again.

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-16 13:38:56 -04:00
Samson Tam
4a16285aa1 drm/amd/display: add support for 2nd sharpening range
[Why & How]
Add support for 2nd sharpening range for cases where we want
override existing DCN sharpening range.

Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Samson Tam <Samson.Tam@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-16 13:38:51 -04:00
Nevenko Stupar
5dd63a0bfc drm/amd/display: Fix the typo in dcn401 Hubp block
[Why & How]
Fix the typo for hubp_clear_tiling, currently calls hubp2_clear_tiling
for dcn401 instead of intended hubp401_clear_tiling.

Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Nevenko Stupar <Nevenko.Stupar@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-16 13:38:47 -04:00
Chiawen Huang
eee5e5b358 drm/amd/display: Skip backend validation for virtual monitors
[Why&How]
Virtual monitors are now being validated during set_mode.
Virtual monitors should not undergo backend validation,
as the backend is intended only for physical monitors.
Virtual sinks have no real backend part information and
should be excluded from this validation.

Reviewed-by: Aric Cyr <aric.cyr@amd.com>
Signed-off-by: Chiawen Huang <chiawen.huang@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-16 13:38:44 -04:00
Karthi Kandasamy
40bae1aea0 drm/amd/display: Move mcache allocation programming from DML to resource
[Why]
mcache allocation programming is not part of DML's core responsibilities.
Keeping this logic in DML leads to poor separation of concerns and complicates maintenance.

[How]
Refactored code to move mcache parameter preparation and mcache ID assignment
into the resource file.

Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Karthi Kandasamy <karthi.kandasamy@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-16 13:38:36 -04:00
Cruise Hung
17accf4f22 drm/amd/display: Support external tunneling feature
[Why & How]
The original code only supports the tunneling for embedded one.
To support external tunneling feature, it needs to check
Tunneling_Support bit register.

Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Reviewed-by: Jun Lei <jun.lei@amd.com>
Signed-off-by: Cruise Hung <Cruise.Hung@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-16 13:38:32 -04:00
Yihan Zhu
fe1903bc95 drm/amd/display: init local variable to fix format errors
[WHY & HOW]
Uninitialized local variables will cause format checker complain
about them.

Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Signed-off-by: Yihan Zhu <Yihan.Zhu@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-16 13:38:29 -04:00
Tomasz Siemek
c9b4fa034c drm/amd/display: Extend dc_plane_get_status with flags
[WHY]
dc_plane_get_status may be used for reading other plane properties
in the future.

[HOW]
Provide API for choosing plane properties to read.

Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Reviewed-by: Aric Cyr <aric.cyr@amd.com>
Reviewed-by: Swapnil Patel <swapnil.patel@amd.com>
Signed-off-by: Tomasz Siemek <Tomasz.Siemek@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-16 13:38:25 -04:00
Arvind Yadav
13d0724f0f drm/amdgpu: fix use-after-unlock in eviction fence destroy
The eviction fence destroy path incorrectly calls dma_fence_put() on
evf_mgr->ev_fence after releasing the ev_fence_lock. This introduces a
potential use-after-unlock or race because another thread concurrently
modifies evf_mgr->ev_fence.

Fix this by grabbing a local reference to evf_mgr->ev_fence under the
lock and using that for dma_fence_put() after waiting.

Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Arvind Yadav <Arvind.Yadav@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-16 13:38:10 -04:00
Lijo Lazar
cc473057bb drm/amdgpu: Allow NPS2-CPX combination for VFs
CPX partition mode is compatible with NPS2 on aquavanjaram VFs.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-16 13:38:05 -04:00
fanhuang
67cc7f9096 drm/amdgpu/mmsch: Add MMSCH v5_0 support for sriov
These structures are basically ported from MMSCH v4_0
The structures are the same as v4_0 except for the
init header

Signed-off-by: fanhuang <FangSheng.Huang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-16 13:37:57 -04:00
Lijo Lazar
3aa37922c6 drm/amdgpu: Use compatible NPS mode info
Compatible NPS modes for a partition mode are exposed through xcp_config
interface. To determine if a compute partition mode is valid, check if
the current NPS mode is part of compatible NPS modes.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-16 13:37:50 -04:00
Lijo Lazar
9a343a64fa drm/amd/pm: Move SMUv13.0.12 function declarations
Move them to SMUv13.0.6 header file as they are used only in SMU
v13.0.6.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-16 13:37:44 -04:00
Asad Kamal
58c397890f drm/amdgpu: Add pldm version reporting
Add pldm version reporting through sysfs node

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-16 13:37:38 -04:00
Amber Lin
e3d0870a90 drm/amdkfd: Support chain runlists of XNACK+/XNACK-
If the MEC firmware supports chaining runlists of XNACK+/XNACK-
processes, set SQ_CONFIG1 chicken bit and SET_RESOURCES bit 28.

When the MEC/HWS supports it, KFD checks the XNACK+/XNACK- processes mix
happens or not. If it does, enter over-subscription.

Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-16 13:37:29 -04:00
David (Ming Qiang) Wu
ee7360fc27 drm/amdgpu: read back register after written for VCN v4.0.5
On VCN v4.0.5 there is a race condition where the WPTR is not
updated after starting from idle when doorbell is used. Adding
register read-back after written at function end is to ensure
all register writes are done before they can be used.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12528
Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Tested-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 07c9db090b)
Cc: stable@vger.kernel.org
2025-05-14 11:51:31 -04:00
Melissa Wen
fe14c0f096 Revert "drm/amd/display: Hardware cursor changes color when switched to software cursor"
This reverts commit 272e6aab14.

Applying degamma curve to the cursor by default breaks Linux userspace
expectation.

On Linux, AMD display manager enables cursor degamma ROM just for
implict sRGB on HW versions where degamma is split into two blocks:
degamma ROM for pre-defined TFs and `gamma correction` for user/custom
curves, and degamma ROM settings doesn't apply to cursor plane.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1513
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2803
Reported-by: Michel Dänzer <michel.daenzer@mailbox.org>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4144
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit f6a305d474)
Cc: stable@vger.kernel.org
2025-05-14 11:51:04 -04:00
Shiwu Zhang
72ea78335e drm/amdgpu: add debugfs for spirom IFWI dump
Expose the debugfs file node for user space to dump the IFWI image
on spirom.

For one transaction between PSP and host, it will read out the
images on both active and inactive partitions so a buffer with two
times the size of maximum IFWI image (currently 16MByte) is needed.

v2: move the vbios gfl macros to the common header and rename the
    bo triplet struct to spirom_bo for this specific usage (Hawking)

v3: return directly the result of last command execution (Lijo)

Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-14 11:30:15 -04:00
Prike Liang
64db767013 drm/amdgpu: fix userq resource double freed
As the userq resource was already freed at the drm_release
early phase, it should avoid freeing userq resource again
at the later kms postclose callback.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-14 11:30:03 -04:00
Jesse.Zhang
96a86dcb5b drm/amdgpu: Fix circular locking in userq creation
A circular locking dependency was detected between the global
`adev->userq_mutex` and per-file `userq_mgr->userq_mutex` when
creating user queues. The issue occurs because:

1. `amdgpu_userq_suspend()` and `amdgpu_userq_resume` take `adev->userq_mutex` first, then
   `userq_mgr->userq_mutex`
2. While `amdgpu_userq_create()` takes them in reverse order

This patch resolves the issue by:
1. Moving the `adev->userq_mutex` lock earlier in `amdgpu_userq_create()`
   to cover the `amdgpu_userq_ensure_ev_fence()` call
2. Releasing it after we're done with both queue creation and the
   scheduling halt check

v2: remove unused adev->userq_mutex lock (Prike)

Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-14 11:29:38 -04:00
David (Ming Qiang) Wu
07c9db090b drm/amdgpu: read back register after written for VCN v4.0.5
On VCN v4.0.5 there is a race condition where the WPTR is not
updated after starting from idle when doorbell is used. Adding
register read-back after written at function end is to ensure
all register writes are done before they can be used.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12528
Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Tested-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-14 11:28:39 -04:00
Melissa Wen
f6a305d474 Revert "drm/amd/display: Hardware cursor changes color when switched to software cursor"
This reverts commit 272e6aab14.

Applying degamma curve to the cursor by default breaks Linux userspace
expectation.

On Linux, AMD display manager enables cursor degamma ROM just for
implict sRGB on HW versions where degamma is split into two blocks:
degamma ROM for pre-defined TFs and `gamma correction` for user/custom
curves, and degamma ROM settings doesn't apply to cursor plane.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1513
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2803
Reported-by: Michel Dänzer <michel.daenzer@mailbox.org>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4144
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-14 11:27:09 -04:00
Wayne Lin
d33724ffb7 drm/amd/display: Avoid flooding unnecessary info messages
It's expected that we'll encounter temporary exceptions
during aux transactions. Adjust logging from drm_info to
drm_dbg_dp to prevent flooding with unnecessary log messages.

Fixes: 3637e457eb ("drm/amd/display: Fix wrong handling for AUX_DEFER case")
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250513032026.838036-1-Wayne.Lin@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 9a9c3e1fe5)
Cc: stable@vger.kernel.org
2025-05-13 14:36:27 -04:00
Melissa Wen
a3b7e65b6b drm/amd/display: Fix null check of pipe_ctx->plane_state for update_dchubp_dpp
Similar to commit 6a057072dd ("drm/amd/display: Fix null check for
pipe_ctx->plane_state in dcn20_program_pipe") that addresses a null
pointer dereference on dcn20_update_dchubp_dpp. This is the same
function hooked for update_dchubp_dpp in dcn401, with the same issue.
Fix possible null pointer deference on dcn401_program_pipe too.

Fixes: 63ab80d9ac ("drm/amd/display: DML2.1 Post-Si Cleanup")
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit d8d47f7397)
2025-05-13 14:35:10 -04:00
Aurabindo Pillai
2ddac70fed drm/amd/display: check stream id dml21 wrapper to get plane_id
[Why & How]
Fix a false positive warning which occurs due to lack of correct checks
when querying plane_id in DML21. This fixes the warning when performing a
mode1 reset (cat /sys/kernel/debug/dri/1/amdgpu_gpu_recover):

[   35.751250] WARNING: CPU: 11 PID: 326 at /tmp/amd.PHpyAl7v/amd/amdgpu/../display/dc/dml2/dml2_dc_resource_mgmt.c:91 dml2_map_dc_pipes+0x243d/0x3f40 [amdgpu]
[   35.751434] Modules linked in: amdgpu(OE) amddrm_ttm_helper(OE) amdttm(OE) amddrm_buddy(OE) amdxcp(OE) amddrm_exec(OE) amd_sched(OE) amdkcl(OE) drm_suballoc_helper drm_ttm_helper ttm drm_display_helper cec rc_core i2c_algo_bit rfcomm qrtr cmac algif_hash algif_skcipher af_alg bnep amd_atl intel_rapl_msr intel_rapl_common snd_hda_codec_hdmi snd_hda_intel edac_mce_amd snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec kvm_amd snd_hda_core snd_hwdep snd_pcm kvm snd_seq_midi snd_seq_midi_event snd_rawmidi crct10dif_pclmul polyval_clmulni polyval_generic btusb ghash_clmulni_intel sha256_ssse3 btrtl sha1_ssse3 snd_seq btintel aesni_intel btbcm btmtk snd_seq_device crypto_simd sunrpc cryptd bluetooth snd_timer ccp binfmt_misc rapl snd i2c_piix4 wmi_bmof gigabyte_wmi k10temp i2c_smbus soundcore gpio_amdpt mac_hid sch_fq_codel msr parport_pc ppdev lp parport efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 hid_generic usbhid hid crc32_pclmul igc ahci xhci_pci libahci xhci_pci_renesas video wmi
[   35.751501] CPU: 11 UID: 0 PID: 326 Comm: kworker/u64:9 Tainted: G           OE      6.11.0-21-generic #21~24.04.1-Ubuntu
[   35.751504] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[   35.751505] Hardware name: Gigabyte Technology Co., Ltd. X670E AORUS PRO X/X670E AORUS PRO X, BIOS F30 05/22/2024
[   35.751506] Workqueue: amdgpu-reset-dev amdgpu_debugfs_reset_work [amdgpu]
[   35.751638] RIP: 0010:dml2_map_dc_pipes+0x243d/0x3f40 [amdgpu]
[   35.751794] Code: 6d 0c 00 00 8b 84 24 88 00 00 00 41 3b 44 9c 20 0f 84 fc 07 00 00 48 83 c3 01 48 83 fb 06 75 b3 4c 8b 64 24 68 4c 8b 6c 24 40 <0f> 0b b8 06 00 00 00 49 8b 94 24 a0 49 00 00 89 c3 83 f8 07 0f 87
[   35.751796] RSP: 0018:ffffbfa3805d7680 EFLAGS: 00010246
[   35.751798] RAX: 0000000000010000 RBX: 0000000000000006 RCX: 0000000000000000
[   35.751799] RDX: 0000000000000000 RSI: 0000000000000005 RDI: 0000000000000000
[   35.751800] RBP: ffffbfa3805d78f0 R08: 0000000000000000 R09: 0000000000000000
[   35.751801] R10: 0000000000000000 R11: 0000000000000000 R12: ffffbfa383249000
[   35.751802] R13: ffffa0e68f280000 R14: ffffbfa383249658 R15: 0000000000000000
[   35.751803] FS:  0000000000000000(0000) GS:ffffa0edbe580000(0000) knlGS:0000000000000000
[   35.751804] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   35.751805] CR2: 00005d847ef96c58 CR3: 000000041de3e000 CR4: 0000000000f50ef0
[   35.751806] PKRU: 55555554
[   35.751807] Call Trace:
[   35.751810]  <TASK>
[   35.751816]  ? show_regs+0x6c/0x80
[   35.751820]  ? __warn+0x88/0x140
[   35.751822]  ? dml2_map_dc_pipes+0x243d/0x3f40 [amdgpu]
[   35.751964]  ? report_bug+0x182/0x1b0
[   35.751969]  ? handle_bug+0x6e/0xb0
[   35.751972]  ? exc_invalid_op+0x18/0x80
[   35.751974]  ? asm_exc_invalid_op+0x1b/0x20
[   35.751978]  ? dml2_map_dc_pipes+0x243d/0x3f40 [amdgpu]
[   35.752117]  ? math_pow+0x48/0xa0 [amdgpu]
[   35.752256]  ? srso_alias_return_thunk+0x5/0xfbef5
[   35.752260]  ? math_pow+0x48/0xa0 [amdgpu]
[   35.752400]  ? srso_alias_return_thunk+0x5/0xfbef5
[   35.752403]  ? math_pow+0x11/0xa0 [amdgpu]
[   35.752524]  ? srso_alias_return_thunk+0x5/0xfbef5
[   35.752526]  ? core_dcn4_mode_programming+0xe4d/0x20d0 [amdgpu]
[   35.752663]  ? srso_alias_return_thunk+0x5/0xfbef5
[   35.752669]  dml21_validate+0x3d4/0x980 [amdgpu]

Reviewed-by: Austin Zheng <austin.zheng@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit f8ad62c0a9)
2025-05-13 14:24:12 -04:00
George Shen
3c1a467372 drm/amd/display: fix link_set_dpms_off multi-display MST corner case
[Why & How]
When MST config is unplugged/replugged too quickly, it can potentially
result in a scenario where previous DC state has not been reset before
the HPD link detection sequence begins. In this case, driver will
disable the streams/link prior to re-enabling the link for link
training.

There is a bug in the current logic that does not account for the fact
that current_state can be released and cleared prior to swapping to a
new state (resulting in the pipe_ctx stream pointers to be cleared) in
between disabling streams.

To resolve this, cache the original streams prior to committing any
stream updates.

Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: George Shen <george.shen@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 1561782686)
2025-05-13 14:22:03 -04:00
John Olender
874697e127 drm/amd/display: Defer BW-optimization-blocked DRR adjustments
[Why & How]
Instead of dropping DRR updates, defer them. This fixes issues where
monitor continues to see incorrect refresh rate after VRR was turned off
by userspace.

Fixes: 32953485c5 ("drm/amd/display: Do not update DRR while BW optimizations pending")
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3546
Reviewed-by: Sun peng Li <sunpeng.li@amd.com>
Signed-off-by: John Olender <john.olender@gmail.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 53761b7ecd)
Cc: stable@vger.kernel.org
2025-05-13 14:21:24 -04:00
Gabe Teeger
190818d1b6 Revert: "drm/amd/display: Enable urgent latency adjustment on DCN35"
This reverts commit 756c85e4d0 ("drm/amd/display: Enable urgent latency adjustment on DCN35")

Reason for revert: Negative power impact.

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Gabe Teeger <Gabe.Teeger@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 9334c491cd)
2025-05-13 14:20:55 -04:00
Wayne Lin
d433981385 drm/amd/display: Correct the reply value when AUX write incomplete
[Why]
Now forcing aux->transfer to return 0 when incomplete AUX write is
inappropriate. It should return bytes have been transferred.

[How]
aux->transfer is asked not to change original msg except reply field of
drm_dp_aux_msg structure. Copy the msg->buffer when it's write request,
and overwrite the first byte when sink reply 1 byte indicating partially
written byte number. Then we can return the correct value without
changing the original msg.

Fixes: 3637e457eb ("drm/amd/display: Fix wrong handling for AUX_DEFER case")
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Ray Wu <ray.wu@amd.com>
Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 7ac37f0dcd)
Cc: stable@vger.kernel.org
2025-05-13 14:18:09 -04:00
Tim Huang
2d73b0845a drm/amdgpu: fix incorrect MALL size for GFX1151
On GFX1151, the reported MALL cache size reflects only
half of its actual size; this adjustment corrects the discrepancy.

Signed-off-by: Tim Huang <tim.huang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 0a5c060b59)
Cc: stable@vger.kernel.org
2025-05-13 14:16:43 -04:00
Philip Yang
a0fa7873f2 drm/amdgpu: csa unmap use uninterruptible lock
After process exit to unmap csa and free GPU vm, if signal is accepted
and then waiting to take vm lock is interrupted and return, it causes
memory leaking and below warning backtrace.

Change to use uninterruptible wait lock fix the issue.

WARNING: CPU: 69 PID: 167800 at amd/amdgpu/amdgpu_kms.c:1525
 amdgpu_driver_postclose_kms+0x294/0x2a0 [amdgpu]
 Call Trace:
  <TASK>
  drm_file_free.part.0+0x1da/0x230 [drm]
  drm_close_helper.isra.0+0x65/0x70 [drm]
  drm_release+0x6a/0x120 [drm]
  amdgpu_drm_release+0x51/0x60 [amdgpu]
  __fput+0x9f/0x280
  ____fput+0xe/0x20
  task_work_run+0x67/0xa0
  do_exit+0x217/0x3c0
  do_group_exit+0x3b/0xb0
  get_signal+0x14a/0x8d0
  arch_do_signal_or_restart+0xde/0x100
  exit_to_user_mode_loop+0xc1/0x1a0
  exit_to_user_mode_prepare+0xf4/0x100
  syscall_exit_to_user_mode+0x17/0x40
  do_syscall_64+0x69/0xc0

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 7dbbfb3c17)
Cc: stable@vger.kernel.org
2025-05-13 14:16:30 -04:00
Arunpravin Paneer Selvam
553ad6fc2b drm/amdgpu/userq: Fix DEBUG_LOCKS_WARN_ON(lock->magic != lock)
Fix DEBUG_LOCKS_WARN_ON(lock->magic != lock) warning logs.

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-13 13:39:52 -04:00
Arunpravin Paneer Selvam
bc5bab82d3 drm/amdgpu: Fix userq ttm_bo_pin and ttm_bo_unpin lockdep warnings
The ttm_bo_pin and ttm_bo_unpin warnings are resolved by moving the
doorbell bo reserve up before pin/unpin.

WARNING: CPU: 11 PID: 1818 at drivers/gpu/drm/ttm/ttm_bo.c:592 ttm_bo_pin+0x1f6/0x270 [ttm]
[  +0.000277] CPU: 11 UID: 1000 PID: 1818 Comm: Xwayland Tainted: G        W          6.12.0+ #15
[  +0.000006] Tainted: [W]=WARN
[  +0.000004] Hardware name: ASUS System Product Name/TUF GAMING B650-PLUS, BIOS 3072 12/20/2024
[  +0.000004] RIP: 0010:ttm_bo_pin+0x1f6/0x270 [ttm]
[  +0.000005] RSP: 0018:ffff88846ca879d0 EFLAGS: 00010246
[  +0.000007] RAX: 0000000000000000 RBX: ffff88810b7ca848 RCX: 0000000000000000
[  +0.000004] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[  +0.000005] RBP: ffff88846ca879e8 R08: 0000000000000000 R09: 0000000000000000
[  +0.000004] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88810b7ca848
[  +0.000004] R13: ffff88846c666250 R14: 1ffff1108d950f44 R15: ffff88846ca87aa0
[  +0.000005] FS:  00007c45ff436d00(0000) GS:ffff888409580000(0000) knlGS:0000000000000000
[  +0.000004] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.000005] CR2: 00005b0c142a60e0 CR3: 000000012ce5a000 CR4: 0000000000f50ef0
[  +0.000004] PKRU: 55555554
[  +0.000004] Call Trace:
[  +0.000004]  <TASK>
[  +0.000005]  ? show_regs+0x6c/0x80
[  +0.000007]  ? __warn+0xd2/0x2d0
[  +0.000007]  ? ttm_bo_pin+0x1f6/0x270 [ttm]
[  +0.000031]  ? report_bug+0x282/0x2f0
[  +0.000012]  ? handle_bug+0x6e/0xc0
[  +0.000007]  ? exc_invalid_op+0x18/0x50
[  +0.000007]  ? asm_exc_invalid_op+0x1b/0x20
[  +0.000017]  ? ttm_bo_pin+0x1f6/0x270 [ttm]
[  +0.000014]  amdgpu_bo_pin+0x365/0x9d0 [amdgpu]
[  +0.000191]  ? __pfx_amdgpu_bo_pin+0x10/0x10 [amdgpu]
[  +0.000185]  ? drm_gem_object_lookup+0x81/0xc0
[  +0.000008]  ? kasan_save_alloc_info+0x37/0x60
[  +0.000007]  ? __kasan_kmalloc+0xc3/0xd0
[  +0.000013]  amdgpu_userqueue_get_doorbell_index+0xee/0x5f0 [amdgpu]
[  +0.000209]  amdgpu_userq_ioctl+0x6b4/0xd40 [amdgpu]
[  +0.000193]  ? __pfx_amdgpu_userq_ioctl+0x10/0x10 [amdgpu]
[  +0.000211]  ? lock_acquire+0x7c/0xc0
[  +0.000006]  ? drm_dev_enter+0x51/0x190
[  +0.000015]  drm_ioctl_kernel+0x18b/0x330
[  +0.000007]  ? __pfx_amdgpu_userq_ioctl+0x10/0x10 [amdgpu]
[  +0.000190]  ? __pfx_drm_ioctl_kernel+0x10/0x10
[  +0.000005]  ? lock_acquire+0x7c/0xc0
[  +0.000009]  ? srso_alias_return_thunk+0x5/0xfbef5
[  +0.000005]  ? __kasan_check_write+0x14/0x30
[  +0.000005]  ? srso_alias_return_thunk+0x5/0xfbef5
[  +0.000011]  drm_ioctl+0x589/0xd00
[  +0.000005]  ? srso_alias_return_thunk+0x5/0xfbef5
[  +0.000006]  ? __pfx_amdgpu_userq_ioctl+0x10/0x10 [amdgpu]
[  +0.000194]  ? __pfx_drm_ioctl+0x10/0x10
[  +0.000006]  ? __pm_runtime_resume+0x80/0x110
[  +0.000021]  ? srso_alias_return_thunk+0x5/0xfbef5
[  +0.000005]  ? trace_hardirqs_on+0x53/0x60
[  +0.000005]  ? srso_alias_return_thunk+0x5/0xfbef5
[  +0.000005]  ? _raw_spin_unlock_irqrestore+0x51/0x80
[  +0.000013]  amdgpu_drm_ioctl+0xd2/0x1c0 [amdgpu]
[  +0.000185]  __x64_sys_ioctl+0x13a/0x1c0
[  +0.000010]  x64_sys_call+0x11ad/0x25f0
[  +0.000007]  do_syscall_64+0x91/0x180
[  +0.000007]  ? srso_alias_return_thunk+0x5/0xfbef5
[  +0.000005]  ? irqentry_exit+0x77/0xb0
[  +0.000005]  ? srso_alias_return_thunk+0x5/0xfbef5
[  +0.000005]  ? exc_page_fault+0x93/0x150
[  +0.000009]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  +0.000005] RIP: 0033:0x7c45ff924ded
[  +0.000005] RSP: 002b:00007ffff7167810 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  +0.000008] RAX: ffffffffffffffda RBX: 00000000c0486456 RCX: 00007c45ff924ded
[  +0.000004] RDX: 00007ffff7167870 RSI: 00000000c0486456 RDI: 000000000000000b
[  +0.000004] RBP: 00007ffff7167860 R08: ffff800100000000 R09: 0000000000010000
[  +0.000005] R10: 00007ffff7167950 R11: 0000000000000246 R12: 00005b0c2a51bc48
[  +0.000004] R13: 000000000000000b R14: 0000000000000000 R15: 00007ffff7167950
[  +0.000022]  </TASK>
[  +0.000004] irq event stamp: 80693
[  +0.000004] hardirqs last  enabled at (80699): [<ffffffff86a693a9>] __up_console_sem+0x79/0xa0
[  +0.000005] hardirqs last disabled at (80704): [<ffffffff86a6938e>] __up_console_sem+0x5e/0xa0
[  +0.000005] softirqs last  enabled at (80390): [<ffffffff8687377e>] __irq_exit_rcu+0x17e/0x1d0
[  +0.000005] softirqs last disabled at (80385): [<ffffffff8687377e>] __irq_exit_rcu+0x17e/0x1d0
[  +0.000006] ---[ end trace 0000000000000000 ]---
------------------------------------------------------------------------------------------------------

[  +0.000006] WARNING: CPU: 10 PID: 1818 at drivers/gpu/drm/ttm/ttm_bo.c:611 ttm_bo_unpin+0x21f/0x2c0 [ttm]
[  +0.000280] CPU: 10 UID: 1000 PID: 1818 Comm: Xwayland Tainted: G        W          6.12.0+ #15
[  +0.000006] Tainted: [W]=WARN
[  +0.000004] Hardware name: ASUS System Product Name/TUF GAMING B650-PLUS, BIOS 3072 12/20/2024
[  +0.000004] RIP: 0010:ttm_bo_unpin+0x21f/0x2c0 [ttm]
[  +0.000005] RSP: 0018:ffff88846ca87888 EFLAGS: 00010246
[  +0.000007] RAX: 0000000000000000 RBX: ffff88810b7ca848 RCX: 0000000000000000
[  +0.000005] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[  +0.000004] RBP: ffff88846ca878a0 R08: 0000000000000000 R09: 0000000000000000
[  +0.000004] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888164e90050
[  +0.000005] R13: ffff88846c666200 R14: 0000000000000001 R15: ffff888168402d28
[  +0.000004] FS:  00007c45ff436d00(0000) GS:ffff888409500000(0000) knlGS:0000000000000000
[  +0.000005] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.000004] CR2: 00007c45f7373b20 CR3: 000000012ce5a000 CR4: 0000000000f50ef0
[  +0.000005] PKRU: 55555554
[  +0.000004] Call Trace:
[  +0.000004]  <TASK>
[  +0.000005]  ? show_regs+0x6c/0x80
[  +0.000008]  ? __warn+0xd2/0x2d0
[  +0.000007]  ? ttm_bo_unpin+0x21f/0x2c0 [ttm]
[  +0.000012]  ? report_bug+0x282/0x2f0
[  +0.000013]  ? handle_bug+0x6e/0xc0
[  +0.000006]  ? exc_invalid_op+0x18/0x50
[  +0.000008]  ? asm_exc_invalid_op+0x1b/0x20
[  +0.000017]  ? ttm_bo_unpin+0x21f/0x2c0 [ttm]
[  +0.000011]  ? ttm_bo_unpin+0x217/0x2c0 [ttm]
[  +0.000011]  amdgpu_bo_unpin+0x45/0x250 [amdgpu]
[  +0.000216]  amdgpu_userq_ioctl+0x2c3/0xd40 [amdgpu]
[  +0.000226]  ? drm_dev_exit+0x2d/0x60
[  +0.000010]  ? __pfx_amdgpu_userq_ioctl+0x10/0x10 [amdgpu]
[  +0.000201]  ? srso_alias_return_thunk+0x5/0xfbef5
[  +0.000005]  ? lock_acquire+0x7c/0xc0
[  +0.000006]  ? drm_dev_enter+0x51/0x190
[  +0.000015]  drm_ioctl_kernel+0x18b/0x330
[  +0.000007]  ? __pfx_amdgpu_userq_ioctl+0x10/0x10 [amdgpu]
[  +0.000188]  ? __pfx_drm_ioctl_kernel+0x10/0x10
[  +0.000006]  ? lock_acquire+0x7c/0xc0
[  +0.000008]  ? srso_alias_return_thunk+0x5/0xfbef5
[  +0.000005]  ? __kasan_check_write+0x14/0x30
[  +0.000006]  ? srso_alias_return_thunk+0x5/0xfbef5
[  +0.000010]  drm_ioctl+0x589/0xd00
[  +0.000005]  ? srso_alias_return_thunk+0x5/0xfbef5
[  +0.000006]  ? __pfx_amdgpu_userq_ioctl+0x10/0x10 [amdgpu]
[  +0.000211]  ? __pfx_drm_ioctl+0x10/0x10
[  +0.000006]  ? __pm_runtime_resume+0x80/0x110
[  +0.000020]  ? srso_alias_return_thunk+0x5/0xfbef5
[  +0.000006]  ? trace_hardirqs_on+0x53/0x60
[  +0.000005]  ? srso_alias_return_thunk+0x5/0xfbef5
[  +0.000005]  ? _raw_spin_unlock_irqrestore+0x51/0x80
[  +0.000013]  amdgpu_drm_ioctl+0xd2/0x1c0 [amdgpu]
[  +0.000186]  __x64_sys_ioctl+0x13a/0x1c0
[  +0.000010]  x64_sys_call+0x11ad/0x25f0
[  +0.000007]  do_syscall_64+0x91/0x180
[  +0.000007]  ? srso_alias_return_thunk+0x5/0xfbef5
[  +0.000005]  ? do_syscall_64+0x9d/0x180
[  +0.000007]  ? srso_alias_return_thunk+0x5/0xfbef5
[  +0.000010]  ? __pfx___rseq_handle_notify_resume+0x10/0x10
[  +0.000005]  ? __pfx_blkcg_maybe_throttle_current+0x10/0x10
[  +0.000013]  ? srso_alias_return_thunk+0x5/0xfbef5
[  +0.000009]  ? srso_alias_return_thunk+0x5/0xfbef5
[  +0.000008]  ? srso_alias_return_thunk+0x5/0xfbef5
[  +0.000005]  ? syscall_exit_to_user_mode+0x95/0x260
[  +0.000008]  ? srso_alias_return_thunk+0x5/0xfbef5
[  +0.000005]  ? do_syscall_64+0x9d/0x180
[  +0.000007]  ? srso_alias_return_thunk+0x5/0xfbef5
[  +0.000005]  ? do_syscall_64+0x9d/0x180
[  +0.000011]  ? srso_alias_return_thunk+0x5/0xfbef5
[  +0.000010]  ? srso_alias_return_thunk+0x5/0xfbef5
[  +0.000009]  ? srso_alias_return_thunk+0x5/0xfbef5
[  +0.000008]  ? srso_alias_return_thunk+0x5/0xfbef5
[  +0.000005]  ? irqentry_exit_to_user_mode+0x8b/0x260
[  +0.000007]  ? srso_alias_return_thunk+0x5/0xfbef5
[  +0.000006]  ? irqentry_exit+0x77/0xb0
[  +0.000004]  ? srso_alias_return_thunk+0x5/0xfbef5
[  +0.000005]  ? exc_page_fault+0x93/0x150
[  +0.000010]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  +0.000005] RIP: 0033:0x7c45ff924ded
[  +0.000005] RSP: 002b:00007ffff7168790 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  +0.000008] RAX: ffffffffffffffda RBX: 00000000c0486456 RCX: 00007c45ff924ded
[  +0.000005] RDX: 00007ffff71687f0 RSI: 00000000c0486456 RDI: 000000000000000b
[  +0.000004] RBP: 00007ffff71687e0 R08: 00005b0c2a49b010 R09: 0000000000000007
[  +0.000004] R10: 00005b0c2a4d7140 R11: 0000000000000246 R12: 000000000000000b
[  +0.000004] R13: 00007c45ff19e5cc R14: 00005b0c2a51c538 R15: 00005b0c2a51bbd8
[  +0.000022]  </TASK>
[  +0.000005] irq event stamp: 87419
[  +0.000004] hardirqs last  enabled at (87425): [<ffffffff86a693a9>] __up_console_sem+0x79/0xa0
[  +0.000005] hardirqs last disabled at (87430): [<ffffffff86a6938e>] __up_console_sem+0x5e/0xa0
[  +0.000005] softirqs last  enabled at (87058): [<ffffffff8687377e>] __irq_exit_rcu+0x17e/0x1d0
[  +0.000006] softirqs last disabled at (87053): [<ffffffff8687377e>] __irq_exit_rcu+0x17e/0x1d0
[  +0.000005] ---[ end trace 0000000000000000 ]---

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-13 13:39:46 -04:00
Arunpravin Paneer Selvam
218caca4ba drm/amdgpu/userq: Fix lock contention in userq fence
Fix lockdep warnings.

[  +0.000637] ================================
[  +0.000004] WARNING: inconsistent lock state
[  +0.000004] 6.12.0+ #18 Tainted: G        W  OE
[  +0.000004] --------------------------------
[  +0.000004] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
[  +0.000004] Xwayland/1952 [HC0[0]:SC0[0]:HE1:SE1] takes:
[  +0.000005] ffff8884636f4740 (&fence_drv->fence_list_lock){?...}-{2:2}, at: amdgpu_userq_fence_driver_destroy+0xb8/0x540 [amdgpu]
[  +0.000208] {IN-HARDIRQ-W} state was registered at:
[  +0.000004]   lock_acquire.part.0+0x116/0x360
[  +0.000005]   lock_acquire+0x7c/0xc0
[  +0.000005]   _raw_spin_lock+0x2f/0x60
[  +0.000005]   amdgpu_userq_fence_driver_process+0x75/0x400 [amdgpu]
[  +0.000185]   gfx_v12_0_eop_irq+0x29f/0x420 [amdgpu]
[  +0.000210]   amdgpu_irq_dispatch+0x2a4/0x7b0 [amdgpu]
[  +0.000191]   amdgpu_ih_process+0x1e1/0x3d0 [amdgpu]
[  +0.000185]   amdgpu_irq_handler+0x28/0xc0 [amdgpu]
[  +0.000183]   __handle_irq_event_percpu+0x1bb/0x590
[  +0.000005]   handle_irq_event+0xab/0x1d0
[  +0.000005]   handle_edge_irq+0x1fd/0xc10
[  +0.000005]   __common_interrupt+0x83/0x190
[  +0.000004]   common_interrupt+0xb1/0xe0
[  +0.000005]   asm_common_interrupt+0x27/0x40
[  +0.000004]   cpuidle_enter_state+0x2ba/0x530
[  +0.000005]   cpuidle_enter+0x4f/0xb0
[  +0.000006]   call_cpuidle+0x46/0xd0
[  +0.000005]   do_idle+0x367/0x430
[  +0.000004]   cpu_startup_entry+0x58/0x70
[  +0.000005]   start_secondary+0x224/0x2b0
[  +0.000005]   common_startup_64+0x13e/0x141
[  +0.000005] irq event stamp: 88271
[  +0.000004] hardirqs last  enabled at (88271): [<ffffffffad9ca7a1>] _raw_spin_unlock_irqrestore+0x51/0x80
[  +0.000005] hardirqs last disabled at (88270): [<ffffffffad9ca424>] _raw_spin_lock_irqsave+0x74/0x80
[  +0.000005] softirqs last  enabled at (87858): [<ffffffffaa67377e>] __irq_exit_rcu+0x17e/0x1d0
[  +0.000005] softirqs last disabled at (87849): [<ffffffffaa67377e>] __irq_exit_rcu+0x17e/0x1d0
[  +0.000005]
              other info that might help us debug this:
[  +0.000004]  Possible unsafe locking scenario:

[  +0.000003]        CPU0
[  +0.000004]        ----
[  +0.000003]   lock(&fence_drv->fence_list_lock);

v2:
  Drop fence_list_flags and use xa_lock_irqsave() flags parameter (Christian)
  Fix merge conflicts.

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-13 13:39:20 -04:00
Wayne Lin
9a9c3e1fe5 drm/amd/display: Avoid flooding unnecessary info messages
It's expected that we'll encounter temporary exceptions
during aux transactions. Adjust logging from drm_info to
drm_dbg_dp to prevent flooding with unnecessary log messages.

Fixes: 3637e457eb ("drm/amd/display: Fix wrong handling for AUX_DEFER case")
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250513032026.838036-1-Wayne.Lin@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-13 13:38:20 -04:00
Jesse.Zhang
3b63602614 drm/amdgpu: Add GFX 9.5.0 support for per-queue/pipe reset
This patch enables per-queue and per-pipe reset functionality for
GFX IP v9.5.0 when using MEC firmware version 21 (0x15) or later.

This change:
1. Refactors the pipe reset support check in gfx_v9_4_3_pipe_reset_support()
   to use the compute_supported_reset flags instead of hardcoding
   version checks.
2. Adds support for GFX9.5.0 (IP 9.5.0) with MEC firmware version >= 21
   to enable per-queue and per-pipe reset capabilities.

v2: Replaced mec version check with !!(adev->gfx.compute_supported_reset & AMDGPU_RESET_TYPE_PER_PIPE) (Lijo)

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-13 09:37:32 -04:00
Tao Zhou
5d6fddac55 drm/amdgpu: set vram type for GC 9.5.0
Set vram type so we can take different actions according to the type.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-13 09:37:28 -04:00
Tao Zhou
dc111f8fb1 drm/amdgpu: set flip bits for RAS bad pages
Make the code more general, user doesn't need to pay attention to the
detail of flip bits setting.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-13 09:37:19 -04:00
Sebastian Aguilera Novoa
7e340d3cea drm/amd/display/dc/irq: Remove duplications of hpd_ack function from IRQ
The major of dcn and dce irqs share a copy-pasted collection
of copy-pasted function, which is: hpd_ack.

This patch removes the multiple copy-pasted by moving them to
the irq_service.c and make the irq_service's
calls the functions implemented by the irq_service.c
instead.

The hpd_ack function is replaced by hpd0_ack and hpd1_ack, the
required constants are also added.

The changes were not tested on actual hardware. I am only able
to verify that the changes keep the code compileable and do my
best to look repeatedly if I am not actually changing any code.

Signed-off-by: Sebastian Aguilera Novoa <saguileran@ime.usp.br>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-13 09:37:09 -04:00
Melissa Wen
d8d47f7397 drm/amd/display: Fix null check of pipe_ctx->plane_state for update_dchubp_dpp
Similar to commit 6a057072dd ("drm/amd/display: Fix null check for
pipe_ctx->plane_state in dcn20_program_pipe") that addresses a null
pointer dereference on dcn20_update_dchubp_dpp. This is the same
function hooked for update_dchubp_dpp in dcn401, with the same issue.
Fix possible null pointer deference on dcn401_program_pipe too.

Fixes: 63ab80d9ac ("drm/amd/display: DML2.1 Post-Si Cleanup")
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-13 09:35:17 -04:00