Commit Graph

1384687 Commits

Author SHA1 Message Date
Dave Airlie
62bea0e1d5 Merge tag 'drm-habanalabs-next-2025-09-25' of https://github.com/HabanaAI/drivers.accel.habanalabs.kernel into drm-next
This tag contains habanalabs driver changes for v6.18.
It continues the previous upstream work from tags/drm-habanalabs-next-2024-06-23,
Including improvements in debug and visibility, alongside general code cleanups,
and new features such as vmalloc-backed coherent mmap, HLDIO infrastructure, etc.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: "Elbaz, Koby" <koby.elbaz@intel.com>
Link: https://lore.kernel.org/r/da02d370-9967-49d2-9eef-7aeaa40c987c@intel.com
2025-09-26 13:26:51 +10:00
Dave Airlie
a2caae58f8 Merge tag 'drm-misc-next-fixes-2025-09-25' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
Short summary of fixes pull:

bridge:
- waveshare-dsi: Fix error handling in probe function

pixpaper:
- select GEM SHMEM helpers

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://lore.kernel.org/r/20250925064257.GA9107@linux.fritz.box
2025-09-26 13:06:39 +10:00
Pavan S
6ca282c3e6 accel/habanalabs: add Infineon version check
On HL338 ASICs, the Infineon first‑stage firmware is not present and
the reported version is 0. In this case printing a version number is
misleading, as it suggests valid firmware when it does not exist.

Fix this by printing the first‑stage Infineon firmware version only
if the reported value is non‑zero. This avoids confusing or incorrect
log messages on devices where the first stage is not applicable.

Signed-off-by: Pavan S <pavan.sreenivas@intel.com>
Reviewed-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
2025-09-25 09:14:45 +03:00
Konstantin Sinyuk
a0d866bab1 accel/habanalabs/gaudi2: read preboot status after recovering from dirty state
Dirty state can occur when the host VM undergoes a reset while the
device does not. In such a case, the driver must reset the device before
it can be used again. As part of this reset, the device capabilities
are zeroed. Therefore, the driver must read the Preboot status again to
learn the Preboot state, capabilities, and security configuration.

Signed-off-by: Konstantin Sinyuk <konstantin.sinyuk@intel.com>
Reviewed-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
2025-09-25 09:09:32 +03:00
Ariel Aviad
65a3f5bc33 accel/habanalabs: add HL_GET_P_STATE passthrough type
Add a new passthrough type HL_GET_P_STATE to the cpucp generic ioctl
to allow userspace to read the device performance state via firmware.

Signed-off-by: Ariel Aviad <ariel.aviad@intel.com>
Reviewed-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
2025-09-25 09:09:31 +03:00
Konstantin Sinyuk
eeb38d0e91 accel/habanalabs: add debugfs interface for HLDIO testing
Add debugfs files for NVMe Direct I/O (HLDIO) functionality.
This interface allows userspace access to direct SSD ↔ device transfers
through debugfs nodes.

Four debugfs files are created under /sys/kernel/debug/habanalabs/hlN/:

  - dio_ssd2hl : trigger SSD-to-device transfers
  - dio_hl2ssd : trigger device-to-SSD transfers
    (placeholder, not yet implemented)
  - dio_stats  : show transfer statistics
  - dio_reset  : reset statistics counters

Usage examples:

  # Perform SSD → device transfer
  echo "fd=3 va=0x10000 off=0 len=4096" > \
    /sys/kernel/debug/habanalabs/hl0/dio_ssd2hl

  # View statistics
  cat /sys/kernel/debug/habanalabs/hl0/dio_stats

  # Reset counters
  echo 1 > /sys/kernel/debug/habanalabs/hl0/dio_reset

This interface provides access to HLDIO functionality for validation
and diagnostics.

Signed-off-by: Konstantin Sinyuk <konstantin.sinyuk@intel.com>
Reviewed-by: Farah Kassabri <farah.kassabri@intel.com>
Reviewed-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
2025-09-25 09:09:31 +03:00
Konstantin Sinyuk
8cbacc9a27 accel/habanalabs: add NVMe Direct I/O (HLDIO) infrastructure
Introduce NVMe Direct I/O (HLDIO) infrastructure to support
peer‑to‑peer DMA in the habanalabs driver. This adds internal helpers
and data structures to enable direct transfers between NVMe storage
and device memory.

The feature is built only when CONFIG_HL_HLDIO is enabled. A debugfs
interface is also provided for functional validation.

Signed-off-by: Konstantin Sinyuk <konstantin.sinyuk@intel.com>
Reviewed-by: Farah Kassabri <farah.kassabri@intel.com>
Reviewed-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
2025-09-25 09:09:30 +03:00
Moti Haimovski
513024d5a0 accel/habanalabs: support mapping cb with vmalloc-backed coherent memory
When IOMMU is enabled, dma_alloc_coherent() with GFP_USER may return
addresses from the vmalloc range. If such an address is mapped without
VM_MIXEDMAP, vm_insert_page() will trigger a BUG_ON due to the
VM_PFNMAP restriction.

Fix this by checking for vmalloc addresses and setting VM_MIXEDMAP
in the VMA before mapping. This ensures safe mapping and avoids kernel
crashes. The memory is still driver-allocated and cannot be accessed
directly by userspace.

Signed-off-by: Moti Haimovski  <moti.haimovski@intel.com>
Reviewed-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
2025-09-25 09:09:30 +03:00
Ilia Levi
0668db41b5 accel/habanalabs: remove old interface variation of 'access_ok()'
The access_ok() API no longer requires the VERIFY_WRITE argument,
and the use of the old interface with VERIFY_WRITE is deprecated.

Clean up the habanalabs memory manager to use the modern access_ok()
interface consistently. This removes old #ifdef guards and aligns the
driver with current upstream kernel APIs.

Signed-off-by: Ilia Levi <ilia.levi@intel.com>
Reviewed-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
2025-09-25 09:09:29 +03:00
Konstantin Sinyuk
0529b191ac accel/habanalabs/gaudi2: use the CPLD_SHUTDOWN event handler
After CPLD shutdown event the device is not usable anymore. The common
CPLD_SHUTDOWN event handler disables any subsequent device access.

Signed-off-by: Konstantin Sinyuk <konstantin.sinyuk@intel.com>
Reviewed-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
2025-09-25 09:09:29 +03:00
Konstantin Sinyuk
083c53a854 accel/habanalabs: disable device access after CPLD_SHUTDOWN
After a CPLD shutdown event the device becomes unusable. Prevent further
device access once this event is received.

Signed-off-by: Konstantin Sinyuk <konstantin.sinyuk@intel.com>
Reviewed-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
2025-09-25 09:09:28 +03:00
Tomer Tayar
cade027efa accel/habanalabs: fix typo in trace output (cms -> cmd)
Fix a typo in TP_printk format string of habanalabs tracepoint:
replace "cms" with "cmd".

Signed-off-by: Tomer Tayar <tomer.tayar@intel.com>
Reviewed-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
2025-09-25 09:09:28 +03:00
Tomer Tayar
d0dd796bec accel/habanalabs: clarify ctx use after hl_ctx_put() in dmabuf release
In hl_release_dmabuf(), ctx is dereferenced after calling hl_ctx_put()
to obtain the compute device file.

This is safe because the dma-buf object holds a file reference taken in
export_dmabuf(), and the file release (which drops another ctx reference)
can only happen after we drop that file reference via fput(). Thus, this
hl_ctx_put() call cannot be the last one at this point.

Add a comment explaining this to avoid confusion.

Signed-off-by: Tomer Tayar <tomer.tayar@intel.com>
Reviewed-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
2025-09-25 09:09:27 +03:00
Sharley Calzolari
b5cddeb0dc accel/habanalabs/gaudi2: add support for logging register accesses from debugfs
Add infrastructure for logging the last configuration register accesses
that occur via debugfs read/write operations. At interrupt time, these
log entries can be dumped to dmesg, which helps in diagnosing the cause
of RAZWI and ADDR_DEC interrupts.

The logging is implemented as a ring buffer of access entries, with each
entry recording timestamp and access details. To ensure correctness
under concurrent access, operations are now protected using spinlocks.
Entries are copied under lock and then printed after releasing it, which
minimizes time spent in the critical section.

Signed-off-by: Sharley Calzolari <sharley.calzolari@intel.com>
Reviewed-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
2025-09-25 09:09:26 +03:00
Ariel Suller
214e26a43f accel/habanalabs/gaudi2: stringify engine/queue ids
Print engine/queue names instead of numerical engine/queue IDs to make
logs and debug output more readable.

Signed-off-by: Ariel Suller <ariel.suller@intel.com>
Reviewed-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
2025-09-25 09:09:25 +03:00
Vitaly Margolin
5295be6c4e accel/habanalabs: add generic message type to get error counters
Add a new CPUCP generic message type to retrieve HBM, SRAM and critical
error counters from the device.

Signed-off-by: Vitaly Margolin <vitaly.margolin@intel.com>
Reviewed-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
2025-09-25 09:09:25 +03:00
Vered Yavniely
b4fd8e56c9 accel/habanalabs/gaudi2: fix BMON disable configuration
Change the BMON_CR register value back to its original state before
enabling, so that BMON does not continue to collect information
after being disabled.

Signed-off-by: Vered Yavniely <vered.yavniely@intel.com>
Reviewed-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
2025-09-25 09:09:24 +03:00
Tomer Tayar
9f5067531c accel/habanalabs: return ENOMEM if less than requested pages were pinned
EFAULT is currently returned if less than requested user pages are
pinned. This value means a "bad address" which might be confusing to
the user, as the address of the given user memory is not necessarily
"bad".

Modify the return value to ENOMEM, as "out of memory" is more suitable
in this case.

Signed-off-by: Tomer Tayar <tomer.tayar@intel.com>
Reviewed-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
2025-09-25 09:09:22 +03:00
LiangCheng Wang
0c4932f6dd drm/tiny: pixpaper: Fix missing dependency on DRM_GEM_SHMEM_HELPER
The driver uses drm_gem_shmem_prime_import_no_map() and
drm_gem_shmem_dumb_create(), but the Kconfig currently selects
DRM_GEM_DMA_HELPER instead of DRM_GEM_SHMEM_HELPER. This causes
link failures when DRM_GEM_SHMEM_HELPER is not enabled.

Select DRM_GEM_SHMEM_HELPER to fix the build.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202509220320.gfFZjmyg-lkp@intel.com/
Fixes: c9e70639f5 ("drm: tiny: Add support for Mayqueen Pixpaper e-ink panel")
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: LiangCheng Wang <zaq14760@gmail.com>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://lore.kernel.org/r/20250922-bar-v1-1-b2a1f54ace82@gmail.com
2025-09-23 14:07:04 +02:00
Dave Airlie
342f141ba9 Merge tag 'amd-drm-next-6.18-2025-09-19' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-6.18-2025-09-19:

amdgpu:
- Fence drv clean up fix
- DPC fixes
- Misc display fixes
- Support the MMIO remap page as a ttm pool
- JPEG parser updates
- UserQ updates
- VCN ctx handling fixes
- Documentation updates
- Misc cleanups
- SMU 13.0.x updates
- SI DPM updates
- GC 11.x cleaner shader updates
- DMCUB updates
- DML fixes
- Improve fallback handling for pixel encoding
- VCN reset improvements
- DCE6 DC updates
- DSC fixes
- Use devm for i2c buses
- GPUVM locking updates
- GPUVM documentation improvements
- Drop non-DC DCE11 code
- S0ix fixes
- Backlight fix
- SR-IOV fixes

amdkfd:
- SVM updates

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250919193354.2989255-1-alexander.deucher@amd.com
2025-09-22 08:45:51 +10:00
Dave Airlie
0faeb8cf99 Merge tag 'drm-xe-next-2025-09-19' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next
UAPI Changes:
 - Drop L3 bank mask reporting from the media GT on Xe3 and later. Only
   do that for the primary GT. No userspace needs or uses it for media
   and some platforms may report bogus values.
 - Add SLPC power_profile sysfs interface with support for base and
   power_saving modes (Vinay Belgaumkar, Rodrigo Vivi)
 - Add configfs attributes to add post/mid context-switch commands
   (Lucas De Marchi)

Cross-subsystem Changes:
 - Fix hmm_pfn_to_map_order() usage in gpusvm and refactor APIs to
   align with pieces previous handled by xe_hmm (Matthew Auld)

Core Changes:
 - Add MEI driver for Late Binding Firmware Update/Upload
   (Alexander Usyskin)

Driver Changes:
 - Fix GuC CT teardown wrt TLB invalidation (Satyanarayana)
 - Fix CCS save/restore on VF (Satyanarayana)
 - Increase default GuC crash buffer size (Zhanjun)
 - Allow to clear GT stats in debugfs to aid debugging (Matthew Brost)
 - Add more SVM GT stats to debugfs (Matthew Brost)
 - Fix error handling in VMA attr query (Himal)
 - Move sa_info in debugfs to be per tile (Michal Wajdeczko)
 - Limit number of retries upon receiving NO_RESPONSE_RETRY from GuC to
   avoid endless loop (Michal Wajdeczko)
 - Fix configfs handling for survivability_mode undoing user choice when
   unbinding the module (Michal Wajdeczko)
 - Refactor configfs attribute visibility to future-proof it and stop
   exposing survivability_mode if not applicable (Michal Wajdeczko)
 - Constify some functions (Harish Chegondi, Michal Wajdeczko)
 - Add/extend more HW workarounds for Xe2 and Xe3
   (Harish Chegondi, Tangudu Tilak Tirumalesh)
 - Replace xe_hmm with gpusvm (Matthew Auld)
 - Improve fake pci and WA kunit handling for testing new platforms
   (Michal Wajdeczko)
 - Reduce unnecessary PTE writes when migrating (Sanjay Yadav)
 - Cleanup GuC interface definitions and log message (John Harrison)
 - Small improvements around VF CCS (Michal Wajdeczko)
 - Enable bus mastering for the I2C controller (Raag Jadav)
 - Prefer devm_mutex of hand rolling it (Christophe JAILLET)
 - Drop sysfs and debugfs attributes not available for VF (Michal Wajdeczko)
 - GuC CT devm actions improvements (Michal Wajdeczko)
 - Recommend new GuC versions for PTL and BMG (Julia Filipchuk)
 - Improveme driver handling for exhaustive eviction using new
   xe_validation wrapper around drm_exec (Thomas Hellström)
 - Add and use printk wrappers for tile and device (Michal Wajdeczko)
 - Better document workaround handling in Xe (Lucas De Marchi)
 - Improvements on ARRAY_SIZE  and ERR_CAST usage (Lucas De Marchi,
   Fushuai Wang)
 - Align CSS firmware headers with the GuC APIs (John Harrison)
 - Test GuC to GuC (G2G) communication to aid debug in pre-production
   firmware (John Harrison)
 - Bail out driver probing if GuC fails to load (John Harrison)
 - Allow error injection in xe_pxp_exec_queue_add()
   (Daniele Ceraolo Spurio)
 - Minor refactors in xe_svm (Shuicheng Lin)
 - Fix madvise ioctl error handling (Shuicheng Lin)
 - Use attribute groups to simplify sysfs registration
   (Michal Wajdeczko)
 - Add Late Binding Firmware implementation in Xe to work together with
   the MEI component (Badal Nilawar, Daniele Ceraolo Spurio, Rodrigo
   Vivi)
 - Fix build with CONFIG_MODULES=n (Lucas De Marchi)

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/c2et6dnkst2apsgt46dklej4nprqdukjosb55grpaknf3pvcxy@t7gtn3hqtp6n
2025-09-22 08:21:42 +10:00
Lucas De Marchi
d9b2623319 drm/xe: Fix build with CONFIG_MODULES=n
When building with CONFIG_MODULES=n, the __exit functions are dropped.
However our init functions may call them for error handling, so they are
not good candidates for the exit sections.

Fix this error reported by 0day:

	ld.lld: error: relocation refers to a symbol in a discarded section: xe_configfs_exit
	>>> defined in vmlinux.a(drivers/gpu/drm/xe/xe_configfs.o)
	>>> referenced by xe_module.c
	>>>               drivers/gpu/drm/xe/xe_module.o:(init_funcs) in archive vmlinux.a

This is the only exit function using __exit. Drop it to fix the build.

Cc: Riana Tauro <riana.tauro@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202506092221.1FmUQmI8-lkp@intel.com/
Fixes: 16280ded45 ("drm/xe: Add configfs to enable survivability mode")
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://lore.kernel.org/r/20250912-fix-nomodule-build-v1-1-d11b70a92516@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-19 05:41:01 -07:00
Dave Airlie
748f41f353 Merge tag 'drm-intel-next-2025-09-12' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-next
Cross-subsystem Changes:
- Overflow: add range_overflows and range_end_overflows (Jani)

Core Changes:
- Get rid of dev->struct_mutex (Luiz)

Non-display related:
 - GVT: Remove redundant ternary operators (Liao)
 - Various i915_utils clean-ups (Jani)

 Display related:
 - Wait PSR idle before on dsb commit (Jouni)
 - Fix size for for_each_set_bit() in abox iteration (Jani)
 - Abstract figuring out encoder name (Jani)
 - Remove FBC modulo 4 restriction for ADL-P+ (Uma)
 - Panic: refactor framebuffer allocation (Jani)
 - Backlight luminance control improvements (Suraj, Aaron)
 - Add intel_display_device_present (Jani)

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/aMxX_lBxm7wd5wmi@intel.com
2025-09-19 13:02:46 +10:00
Liu Ying
3e6339a19c drm/bridge: waveshare-dsi: Fix bailout for devm_drm_bridge_alloc()
devm_drm_bridge_alloc() returns ERR_PTR on failure instead of a
NULL pointer, so use IS_ERR() to check the returned pointer and
turn proper error code on failure by using PTR_ERR().

Fixes: dbdea37add ("drm: bridge: Add waveshare DSI2DPI unit driver")
Signed-off-by: Liu Ying <victor.liu@nxp.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250806084121.510207-1-victor.liu@nxp.com
2025-09-19 10:52:45 +08:00
Dave Airlie
124076705c Merge tag 'drm-misc-next-fixes-2025-09-18' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
Short summary of fixes pull:

pixpaper:
- Fix mode_valid function signature

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://lore.kernel.org/r/20250918064558.GA10017@linux.fritz.box
2025-09-19 12:50:35 +10:00
Lucas De Marchi
b30d5de3d4 drm/xe/configfs: Add mid context restore bb
Like done for post context restore, allow the user to add commands to
the middle of context restore, at the beginning of engine restore
commands.

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250916-wa-bb-cmds-v5-7-306bddbc15da@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-18 14:20:39 -07:00
Lucas De Marchi
7a4756b2fd drm/xe/lrc: Allow to add user commands mid context switch
Like done for post-context-restore commands, allow to add commands from
configfs in the middle of context restore. Since currently the indirect
ctx hardcodes the offset to CTX_INDIRECT_CTX_OFFSET_DEFAULT, this is
executed in the very beginning of engine context restore.

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250916-wa-bb-cmds-v5-6-306bddbc15da@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-18 14:20:39 -07:00
Lucas De Marchi
c9dfd66cb9 drm/xe/lrc: Allow INDIRECT_CTX for more engine classes
Currently it's only allowed for render and compute. Going forward we
want to enable it for more engine classes. Let the XE_LRC_FLAG_INDIRECT_CTX
flag (and thus gt_engine_needs_indirect_ctx()) be the deciding factor
for its availability.

While at it, add the missing const to rcs_funcs array. Since
CTX_INDIRECT_CTX_OFFSET_DEFAULT already matches the HW default and
gt_engine_needs_indirect_ctx() only ever enables it for rcs/ccs, there
is no change in behavior, it's only preparation for future use case.

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250916-wa-bb-cmds-v5-5-306bddbc15da@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-18 14:20:39 -07:00
Lucas De Marchi
39ac06f700 drm/xe/configfs: Add post context restore bb
Allow the user to specify commands to execute during a context restore.
Currently it's possible to parse 2 types of actions:

	- cmd: the instructions are added as is to the bb
	- reg: just use the address and value, without worrying about
	  encoding the right LRI instruction. This is possibly the most
	  useful use case, so added a dedicated action for that.

This also prepares for future BBs: mid context restore and rc6 context
restore that can re-use the same parsing functions.

Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://lore.kernel.org/r/20250916-wa-bb-cmds-v5-4-306bddbc15da@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-18 14:20:39 -07:00
Lucas De Marchi
6c6988c5e0 drm/xe/lrc: Allow to add user commands on context switch
During validation it's useful to allows additional commands to be
executed on context switch. Fetch the commands from configfs (to be
added) and add them to the WA BB.

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250916-wa-bb-cmds-v5-3-306bddbc15da@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-18 14:20:39 -07:00
Lucas De Marchi
e2a9854d80 drm/xe/configfs: Allow to select by class only
For a future configfs attribute, it's desirable to select by engine mask
only as the instance doesn't make sense.

Rename the function lookup_engine_mask() to lookup_engine_info() and
make it return the entry. This allows parse_engine() to still return an
item if the caller wants to allow parsing a class-only string like
"rcs", "bcs", "ccs", etc.

Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://lore.kernel.org/r/20250916-wa-bb-cmds-v5-2-306bddbc15da@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-18 14:20:39 -07:00
Lucas De Marchi
7166cc3a6a drm/xe/configfs: Extract function to parse engine
Move the part that copies the engine to a local buffer so it can be
shared in future for other configfs attributes parsing an engine.

Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://lore.kernel.org/r/20250916-wa-bb-cmds-v5-1-306bddbc15da@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-18 14:20:38 -07:00
Matthew Schwartz
a490c8d77d drm/amd/display: Only restore backlight after amdgpu_dm_init or dm_resume
On clients that utilize AMD_PRIVATE_COLOR properties for HDR support,
brightness sliders can include a hardware controlled portion and a
gamma-based portion. This is the case on the Steam Deck OLED when using
gamescope with Steam as a client.

When a user sets a brightness level while HDR is active, the gamma-based
portion and/or hardware portion are adjusted to achieve the desired
brightness. However, when a modeset takes place while the gamma-based
portion is in-use, restoring the hardware brightness level overrides the
user's overall brightness level and results in a mismatch between what
the slider reports and the display's current brightness.

To avoid overriding gamma-based brightness, only restore HW backlight
level after boot or resume. This ensures that the backlight level is
set correctly after the DC layer resets it while avoiding interference
with subsequent modesets.

Fixes: 7875afafba ("drm/amd/display: Fix brightness level not retained over reboot")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4551
Signed-off-by: Matthew Schwartz <matthew.schwartz@linux.dev>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-18 16:59:34 -04:00
Guangshuo Li
cc9a8e238e drm/amdgpu/atom: Check kcalloc() for WS buffer in amdgpu_atom_execute_table_locked()
kcalloc() may fail. When WS is non-zero and allocation fails, ectx.ws
remains NULL while ectx.ws_size is set, leading to a potential NULL
pointer dereference in atom_get_src_int() when accessing WS entries.

Return -ENOMEM on allocation failure to avoid the NULL dereference.

Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-18 16:59:27 -04:00
Christian König
59e4405e9e drm/amdgpu: revert to old status lock handling v3
It turned out that protecting the status of each bo_va with a
spinlock was just hiding problems instead of solving them.

Revert the whole approach, add a separate stats_lock and lockdep
assertions that the correct reservation lock is held all over the place.

This not only allows for better checks if a state transition is properly
protected by a lock, but also switching back to using list macros to
iterate over the state of lists protected by the dma_resv lock of the
root PD.

v2: re-add missing check
v3: split into two patches

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-18 16:59:14 -04:00
Badal Nilawar
efa29317a5 drm/xe/xe_late_bind_fw: Extract and print version info
Extract and print version info of the late binding binary.

v2: Some refinements (Daniele)

Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250905154953.3974335-10-badal.nilawar@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-18 09:32:01 -07:00
Badal Nilawar
67de7982d5 drm/xe/xe_late_bind_fw: Introduce debug fs node to disable late binding
Introduce a debug filesystem node to disable late binding fw reload
during the system or runtime resume. This is intended for situations
where the late binding fw needs to be loaded from user mode,
perticularly for validation purpose.
Note that xe kmd doesn't participate in late binding flow from user
space. Binary loaded from the userspace will be lost upon entering to
D3 cold hence user space app need to handle this situation.

v2:
  - s/(uval == 1) ? true : false/!!uval/ (Daniele)
v3:
  - Refine the commit message (Daniele)

Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250905154953.3974335-9-badal.nilawar@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-18 09:32:01 -07:00
Badal Nilawar
02f52f6d92 drm/xe/xe_late_bind_fw: Reload late binding fw during system resume
Reload late binding fw during resume from system suspend

v2:
  - Unconditionally reload late binding fw (Rodrigo)
  - Flush worker during system suspend

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250905154953.3974335-8-badal.nilawar@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-18 09:32:01 -07:00
Badal Nilawar
69ac1bb8fc drm/xe/xe_late_bind_fw: Reload late binding fw in rpm resume
Reload late binding fw during runtime resume.

Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250905154953.3974335-7-badal.nilawar@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-18 09:32:01 -07:00
Badal Nilawar
691a54ad94 drm/xe/xe_late_bind_fw: Load late binding firmware
Load late binding firmware

v2:
 - s/EAGAIN/EBUSY/
 - Flush worker in suspend and driver unload (Daniele)
v3:
 - Use retry interval of 6s, in steps of 200ms, to allow
   other OS components release MEI CL handle (Sasha)
v4:
 - return -ENODEV if component not added (Daniele)
 - parse and print status returned by csc
v5:
 - Use payload to check firmware valid (Daniele)
 - Obtain the RPM reference before scheduling the worker to
   ensure the device remains awake until the worker completes
   firmware loading (Rodrigo)
v6:
 - In case of error donot re-attempt fw download (Daniele)
v7 (Rodrigo):
 - Rename of mei structs and callback.

Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250905154953.3974335-6-badal.nilawar@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-18 09:32:01 -07:00
Badal Nilawar
45832bf9c1 drm/xe/xe_late_bind_fw: Initialize late binding firmware
Search for late binding firmware binaries and populate the meta data of
firmware structures.

v2 (Daniele):
 - drm_err if firmware size is more than max pay load size
 - s/request_firmware/firmware_request_nowarn/ as firmware will
   not be available for all possible cards
v3 (Daniele):
 - init firmware from within xe_late_bind_init, propagate error
 - switch late_bind_fw to array to handle multiple firmware types
v4 (Daniele):
 - Alloc payload dynamically, fix nits
v6 (Daniele)
 - %s/MAX_PAYLOAD_SIZE/XE_LB_MAX_PAYLOAD_SIZE/

Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250905154953.3974335-5-badal.nilawar@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-18 09:32:01 -07:00
Badal Nilawar
918bd789d6 drm/xe/xe_late_bind_fw: Introduce xe_late_bind_fw
Introduce xe_late_bind_fw to enable firmware loading for the devices,
such as the fan controller, during the driver probe. Typically,
firmware for such devices are part of IFWI flash image but can be
replaced at probe after OEM tuning.
This patch binds mei late binding component to enable firmware loading.

v2:
 - Add devm_add_action_or_reset to remove the component (Daniele)
 - Add INTEL_MEI_GSC check in xe_late_bind_init() (Daniele)
v3:
 - Fail driver probe if late bind initialization fails,
   add has_late_bind flag (Daniele)
v4:
 - %s/I915_COMPONENT_LATE_BIND/INTEL_COMPONENT_LATE_BIND/
v6:
 - rebased
v7:
 - rebased
 - In xe_late_bind_init, use drm_err when returning an error to
   stop the probe (Lucas)
 - Use imperative mode in commit message (Lucas)

Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250905154953.3974335-4-badal.nilawar@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-18 09:32:00 -07:00
Alexander Usyskin
741eeabb7c mei: late_bind: add late binding component driver
Introduce a new MEI client driver to support Late Binding firmware
upload/update for Intel discrete graphics platforms.

Late Binding is a runtime firmware upload/update mechanism that allows
payloads, such as fan control and voltage regulator, to be securely
delivered and applied without requiring SPI flash updates or
system reboots. This driver enables the Xe graphics driver and other
user-space tools to push such firmware blobs to the authentication
firmware via the MEI interface.

The driver handles authentication, versioning, and communication
with the authentication firmware, which in turn coordinates with
the PUnit/PCODE to apply the payload.

This is a foundational component for enabling dynamic, secure,
and re-entrant configuration updates on platforms like Battlemage.

Cc: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20250905154953.3974335-3-badal.nilawar@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-18 09:32:00 -07:00
Alexander Usyskin
8d5b7009aa mei: bus: add mei_cldev_mtu interface
Add a new helper function that allows MEI client drivers
to query the maximum transmission unit (MTU) for a connected
MEI client.

This is useful for clients that need to transmit large payloads,
such as firmware blobs, allowing them to determine the maximum
message size that can be safely sent before starting transmission and
size of the buffer to allocate when receiving data.

Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20250905154953.3974335-2-badal.nilawar@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-18 09:31:50 -07:00
Sunil Khatri
0aa09d8a6c drm/amdgpu: add missing comment for the new argument
In function 'amdgpu_vm_lock_done_list' update the comment
for the new argument 'vm'.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202509180211.UAqME0zj-lkp@intel.com/
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-18 09:43:38 -04:00
Alex Deucher
f8b367e6fa drm/amdgpu: suspend KFD and KGD user queues for S0ix
We need to make sure the user queues are preempted so
GFX can enter gfxoff.

Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Tested-by: David Perry <david.perry@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-18 09:43:30 -04:00
Alex Deucher
846de1384a drm/amdgpu/userq: Optimize S0ix handling
In S0i3, GFX state is retained, so it's preferrable to
preempt queues rather than unmapping them as the overhead
is lower.

Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Tested-by: David Perry <david.perry@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-18 09:43:23 -04:00
Joe.Wang
f05c03ffc7 drm/amdgpu: Fix PRT flag for gfx12
AMDGPU_PTE_PRT_GFX12 flag is missed during pageTable rework, add it back.

Fixes: 6716a823d1 ("drm/amdgpu: rework how PTE flags are generated v3")
Signed-off-by: Joe Wang <joe.wang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-18 09:43:10 -04:00
Xiang Liu
1ed511fb76 drm/amdgpu: Check VF critical region before RAS poison injection
Check VF critical region before RAS poison injection to ensure that the
poison injection will not hit the VF critical region.

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Shravan Kumar Gande <Shravankumar.Gande@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-18 09:43:10 -04:00
Alex Deucher
4bfa860993 drm/amdkfd: add proper handling for S0ix
When in S0i3, the GFX state is retained, so all we need to do
is stop the runlist so GFX can enter gfxoff.

Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Tested-by: David Perry <david.perry@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-18 09:43:02 -04:00