Commit Graph

1412895 Commits

Author SHA1 Message Date
Osama Abdelkader
351fa2ff09 drm/xe: Add missing newlines to drm_warn messages
The drm_warn() calls in the default cases of various switch statements
in xe_vm.c were missing trailing newlines, which can cause log messages
to be concatenated with subsequent output. Add '\n' to all affected
messages.

Signed-off-by: Osama Abdelkader <osama.abdelkader@gmail.com>
Link: https://patch.msgid.link/20251224212116.59021-1-osama.abdelkader@gmail.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-01-08 16:16:02 -05:00
Lukasz Laguna
96d45e34f8 drm/xe/pf: Allow upon-any-hang wedged mode only in debug config
The GuC reset policy is global, so disabling it on PF can affect all
running VFs. To avoid unintended side effects, restrict setting
upon-any-hang (2) wedged mode on the PF to debug builds only.

Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20260107174741.29163-5-lukasz.laguna@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-01-08 16:07:59 -05:00
Lukasz Laguna
43d78aca8e drm/xe/vf: Disallow setting wedged mode to upon-any-hang
In upon-any-hang (2) wedged mode, engine resets need to be disabled,
which requires changing the GuC reset policy. VFs are not permitted to
do that.

Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20260107174741.29163-4-lukasz.laguna@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-01-08 16:07:53 -05:00
Lukasz Laguna
0f13dead4e drm/xe: Update wedged.mode only after successful reset policy change
Previously, the driver's internal wedged.mode state was updated without
verifying whether the corresponding engine reset policy update in GuC
succeeded. This could leave the driver reporting a wedged.mode state
that doesn't match the actual reset behavior programmed in GuC.

With this change, the reset policy is updated first, and the driver's
wedged.mode state is modified only if the policy update succeeds on all
available GTs.

This patch also introduces two functional improvements:

 - The policy is sent to GuC only when a change is required. An update
   is needed only when entering or leaving XE_WEDGED_MODE_UPON_ANY_HANG,
   because only in that case the reset policy changes. For example,
   switching between XE_WEDGED_MODE_UPON_CRITICAL_ERROR and
   XE_WEDGED_MODE_NEVER doesn't affect the reset policy, so there is no
   need to send the same value to GuC.

 - An inconsistent_reset flag is added to track cases where reset policy
   update succeeds only on a subset of GTs. If such inconsistency is
   detected, future wedged mode configuration will force a retry of the
   reset policy update to restore a consistent state across all GTs.

Fixes: 6b8ef44cc0 ("drm/xe: Introduce the wedged_mode debugfs")
Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com>
Link: https://patch.msgid.link/20260107174741.29163-3-lukasz.laguna@intel.com
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-01-08 16:07:42 -05:00
Lukasz Laguna
17d3c3365b drm/xe: Validate wedged_mode parameter and define enum for modes
Check correctness of the wedged_mode parameter input to ensure only
supported values are accepted. Additionally, replace magic numbers with
a clearly defined enum.

Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/20260107174741.29163-2-lukasz.laguna@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-01-08 16:07:07 -05:00
Raag Jadav
644673a69f drm/xe/pm: Handle GT resume failure
We've been historically ignoring GT resume failure. Since the function
can return error, handle it properly.

v2: Bring up display before bailing (Matt Roper, Rodrigo)

Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Link: https://patch.msgid.link/20251220073657.166810-1-raag.jadav@intel.com
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-01-08 15:35:36 -05:00
Matt Roper
4e88de313f drm/xe/nvls: Define GuC firmware for NVL-S
Although NVL-S has a similar Xe3 to PTL/WCL, it requires a unique GuC
firmware.

Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20251016-xe3p-v3-12-3dd173a3097a@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Julia Filipchuk <julia.filipchuk@intel.com>
Link: https://patch.msgid.link/20260108181956.1254908-9-julia.filipchuk@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-01-08 15:15:55 -05:00
Matthew Brost
10dd1eaa80 drm/pagemap: Disable device-to-device migration
Device-to-device migration is causing xe_exec_system_allocator --r
*race*no* to intermittently fail with engine resets and a kernel hang on
a page lock. This should work but is clearly buggy somewhere. Disable
device-to-device migration in the interim until the issue can be
root-caused.

The only downside of disabling device-to-device migration is that memory
will bounce through system memory during migration. However, this path
should be rare, as it only occurs when madvise attributes are changed or
atomics are used.

Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Fixes: ec265e1f1c ("drm/pagemap: Support source migration over interconnect")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Link: https://patch.msgid.link/20260107182716.2236607-3-matthew.brost@intel.com
2026-01-07 21:29:40 -08:00
Matthew Brost
3902846af3 drm/pagemap Fix error paths in drm_pagemap_migrate_to_devmem
Avoid unlocking and putting device pages unless they were successfully
locked, and do not calculate migrated_pages on error paths.

Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Fixes: 75af93b3f5 ("drm/pagemap, drm/xe: Support destination migration over interconnect")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Link: https://patch.msgid.link/20260107182716.2236607-2-matthew.brost@intel.com
2026-01-07 21:29:39 -08:00
Matthew Brost
cc54eabdfb drm/xe: Adjust page count tracepoints in shrinker
Page accounting can change via the shrinker without calling
xe_ttm_tt_unpopulate(), which normally updates page count tracepoints
through update_global_total_pages. Add a call to
update_global_total_pages when the shrinker successfully shrinks a BO.

v2:
 - Don't adjust global accounting when pinning (Stuart)

Cc: stable@vger.kernel.org
Fixes: ce3d39fae3 ("drm/xe/bo: add GPU memory trace points")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://patch.msgid.link/20260107205732.2267541-1-matthew.brost@intel.com
2026-01-07 21:29:38 -08:00
Rodrigo Vivi
3f0e3af468 Merge drm/drm-next into drm-xe-next
Bring some drm-scheduler patches to Xe.

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-01-07 16:49:20 -05:00
Matthew Brost
7c0c19c076 drm/xe: Validate preferred system memory placement in xe_svm_range_validate
Ensure preferred system memory placement is checked in
xe_svm_range_validate when dpagemap is NULL. Without this check, a
prefetch to system memory may become a no-op because device memory is
considered a valid placement.

Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Fixes: 238dbc9d9f ("drm/xe: Use the vma attibute drm_pagemap to select where to migrate")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patch.msgid.link/20260106213443.1866797-1-matthew.brost@intel.com
2026-01-07 09:27:58 -08:00
Niranjana Vishwanathapura
051114652b drm/xe/doc: Remove KEEP_ACTIVE feature
The KEEP_ACTIVE feature is being reverted, update documentation.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260106191051.2866538-6-niranjana.vishwanathapura@intel.com
2026-01-06 11:13:56 -08:00
Niranjana Vishwanathapura
caaed1dda7 Revert "drm/xe/multi_queue: Support active group after primary is destroyed"
This reverts commit 3131a43ecb.

There is no must have requirement for this feature from Compute UMD.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260106191051.2866538-5-niranjana.vishwanathapura@intel.com
2026-01-06 11:13:54 -08:00
Raag Jadav
e70711be0d drm/xe/i2c: Force polling mode in survivability
SGUnit interrupts are not initialized in survivability. Force I2C
controller to polling mode while in survivability.

v2: Use helper function instead of manual check (Riana)

Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://patch.msgid.link/20260105080750.16605-1-raag.jadav@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2026-01-05 07:43:22 -08:00
Dave Airlie
59260fe582 Merge tag 'drm-xe-next-2025-12-30' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next
Core Changes:
- Dynamic pagemaps and multi-device SVM (Thomas)

Driver Changes:
- Introduce SRIOV scheduler Groups (Daniele)
- Configure migration queue as low latency (Francois)
- Don't use absolute path in generated header comment (Calvin Owens)
- Add SoC remapper support for system controller (Umesh)
- Insert compiler barriers in GuC code (Jonathan)
- Rebar updates (Lucas)

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/aVOiULyYdnFbq-JB@fedora
2026-01-01 17:00:59 +10:00
Dave Airlie
9ec3c8ee16 Merge tag 'drm-xe-next-2025-12-19' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next
[airlied: fix guc submit double definition]
UAPI Changes:
- Multi-Queue support (Niranjana)
- Add DRM_XE_EXEC_QUEUE_SET_HANG_REPLAY_STATE (Brost)
- Add NO_COMPRESSION BO flag and query capability (Sanjay)
- Add gt_id to struct drm_xe_oa_unit (Ashutosh)
- Expose MERT OA unit (Ashutosh)
- Sysfs Survivability refactor (Riana)

Cross-subsystem Changes:
- VFIO: Add device specific vfio_pci driver variant for Intel graphics (Winiarski)

Driver Changes:
- MAINTAINERS update (Lucas -> Matt)
- Add helper to query compression enable status (Xin)
- Xe_VM fixes and updates (Shuicheng, Himal)
- Documentation fixes (Winiarski, Swaraj, Niranjana)
- Kunit fix (Roper)
- Fix potential leaks, uaf, null derref, and oversized
  allocations (Shuicheng, Sanjay, Mika, Tapani)
- Other minor fixes like kbuild duplication and sysfs_emit (Shuicheng, Madhur)
- Handle msix vector0 interrupt (Venkata)
- Scope-based forcewake and runtime PM (Roper, Raag)
- GuC/HuC related fixes and refactors (Lucas, Zhanjun, Brost, Julia, Wajdeczko)
- Fix conversion from clock ticks to milliseconds (Harish)
- SRIOV PF PF: Add support for MERT (Lukasz)
- Enable SR-IOV VF migration and other SRIOV updates (Winiarski,
  Satya, Brost, Wajdeczko, Piotr, Tomasz, Daniele)
- Optimize runtime suspend/resume and other PM improvements (Raag)
- Some W/a additions and updates (Bala, Harish, Roper)
- Use for_each_tlb_inval() to calculate invalidation fences (Roper)
- Fix VFIO link error (Arnd)
- Fix ix drm_gpusvm_init() arguments (Arnd)
- Other OA refactor (Ashutosh)
- Refactor PAT and expose debugfs (Xin)
- Enable Indirect Ring State for xe3p_xpc (Niranjana)
- MEI interrupt fix (Junxiao)
- Add stats for mode switching on hw_engine_group (Francois)
- DMA-Buf related changes (Thomas)
- Multi Queue feature support (Niranjana)
- Enable I2C controller for Crescent Island (Raag)
- Enable NVM for Crescent Island (Sasha)
- Increase TDF timeout (Jagmeet)
- Restore engine registers before restarting schedulers after GT reset (Jan)
- Page Reclamation Support for Xe3p Platforms (Brian, Brost, Oak)
- Fix performance when pagefaults and 3d/display share resources (Brost)
- More OA MERT work (Ashutosh)
- Fix return values (Dan)
- Some log level and messages improvements (Brost)

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/aUXUhEgzs6hDLQuu@intel.com
2025-12-27 17:22:23 +10:00
Dave Airlie
c5fb82d113 Merge tag 'drm-intel-next-2025-12-19' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-next
Beyond Display related:
 - Switch to use kernel standard fault injection in i915 (Juha-Pekka)

 Display uAPI related:
 - Display uapi vs. hw state fixes (Ville)
 - Expose sharpness only if num_scalers is >= 2 (Nemesa)

 Display related:
 - More display driver refactor and clean-ups, specially towards separation (Jani)
 - Add initial support Xe3p_LPD for NVL (Gustavo, Sai, )
 - BMG FBC W/a (Vinod)
 - RPM fix (Dibin)
 - Add MTL+ platforms to support dpll framework (Mika, Imre)
 - Other PLL related fixes (Imre)
 - Fix DIMM_S DRAM decoding on ICL (Ville)
 - Async flip refactor (Ville, Jouni)
 - Go back to using AUX interrupts (Ville)
 - Reduce severity of failed DII FEC enabling (Grzelak)
 - Enable system cache support for FBC (Vinod)
 - Move PSR/Panel Replay sink data into intel_connector and other PSR changes (Jouni)
 - Detect AuxCCS support via display parent interface (Tvrtko)
 - Clean up link BW/DSC slice config computation(Imre)
 - Toggle powerdown states for C10 on HDMI (Gustavo)
 - Add parent interface for PC8 forcewake tricks (Ville)

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/aUW3bVDdE63aSFOJ@intel.com
2025-12-27 16:27:04 +10:00
Dave Airlie
7bc0f871f9 Merge tag 'drm-misc-next-2025-12-19' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
drm-misc-next for 6.20:

Core Changes:

  - dma-buf: Add tracepoints
  - sched: Introduce new helpers

Driver Changes:

  - amdxdna: Enable hardware context priority, Remove (obsolete and
    never public) NPU2 Support, Race condition fix
  - rockchip: Add RK3368 HDMI Support
  - rz-du: Add RZ/V2H(P) MIPI-DSI Support

  - panels:
    - st7571: Introduce SPI support
    - New panels: Sitronix ST7920, Samsung LTL106HL02, LG LH546WF1-ED01, HannStar HSD156JUW2

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <mripard@redhat.com>
Link: https://patch.msgid.link/20251219-arcane-quaint-skunk-e383b0@houat
2025-12-26 19:00:41 +10:00
Dave Airlie
6c8e404891 Merge tag 'drm-misc-next-2025-12-12' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
drm-misc-next for 6.19:

UAPI Changes:

  - panfrost: Add PANFROST_BO_SYNC ioctl
  - panthor: Add PANTHOR_BO_SYNC ioctl

Core Changes:

  - atomic: Add drm_device pointer to drm_private_obj
  - bridge: Introduce drm_bridge_unplug, drm_bridge_enter, and
    drm_bridge_exit
  - dma-buf: Improve sg_table debugging
  - dma-fence: Add new helpers, and use them when needed
  - dp_mst: Avoid out-of-bounds access with VCPI==0
  - gem: Reduce page table overhead with transparent huge pages
  - panic: Report invalid panic modes
  - sched: Add TODO entries
  - ttm: Various cleanups
  - vblank: Various refactoring and cleanups

  - Kconfig cleanups
  - Removed support for kdb

Driver Changes:

  - amdxdna: Fix race conditions at suspend, Improve handling of zero
    tail pointers, Fix cu_idx being overwritten during command setup
  - ast: Support imported cursor buffers
  -
  - panthor: Enable timestamp propagation, Multiple improvements and
    fixes to improve the overall robustness, notably of the scheduler.

  - panels:
    - panel-edp: Support for CSW MNE007QB3-1, AUO B140HAN06.4, AUO B140QAX01.H

Signed-off-by: Dave Airlie <airlied@redhat.com>

[airlied: fix mm conflict]
From: Maxime Ripard <mripard@redhat.com>
Link: https://patch.msgid.link/20251212-spectacular-agama-of-abracadabra-aaef32@penduick
2025-12-26 18:15:33 +10:00
Lucas De Marchi
0b075f8293 drm/xe: Improve rebar log messages
Some minor improvements to the log messages in the rebar logic:
use xe-oriented printk, switch unit from M to MiB in a few places for
consistency and use ilog2(SZ_1M) for clarity.

Suggested-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Suggested-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patch.msgid.link/20251219211650.1908961-6-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-12-24 07:59:35 -08:00
Lucas De Marchi
382876afa7 drm/xe: Move rebar to its own file
Now that xe_pci.c calls the rebar directly, it doesn't make sense to
keep it in xe_vram.c since it's closer to the PCI initialization than to
the VRAM. Move it to its own file.

While at it, add a better comment to document the possible values for
the vram_bar_size module parameter.

Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patch.msgid.link/20251219211650.1908961-5-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-12-24 07:59:35 -08:00
Jonathan Cavitt
ac1317df03 drm/xe/guc: READ/WRITE_ONCE ct->state
Use READ_ONCE and WRITE_ONCE when operating on ct->state
to prevent the compiler form ignoring important modifications
to its value.

Suggested-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251222201957.63245-6-jonathan.cavitt@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-12-23 16:43:49 -05:00
Jonathan Cavitt
b5179dbd1c drm/xe/guc: READ/WRITE_ONCE g2h_fence->done
Use READ_ONCE and WRITE_ONCE when operating on g2h_fence->done
to prevent the compiler from ignoring important modifications
to its value.

Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Suggested-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251222201957.63245-5-jonathan.cavitt@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-12-23 16:43:49 -05:00
Umesh Nerlige Ramappa
c3a613a039 drm/xe/soc_remapper: Add system controller config for SoC remapper
Define system controller config bits and helpers for SoC remapper.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patch.msgid.link/20251223183943.3175941-8-umesh.nerlige.ramappa@intel.com
2025-12-23 11:43:51 -08:00
Umesh Nerlige Ramappa
32eab46a61 drm/xe/soc_remapper: Use SoC remapper helper from VSEC code
Since different drivers can use SoC remapper, modify VSEC code to
access SoC remapper via a helper that would synchronize such accesses.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patch.msgid.link/20251223183943.3175941-7-umesh.nerlige.ramappa@intel.com
2025-12-23 11:43:49 -08:00
Umesh Nerlige Ramappa
a9f88c68f8 drm/xe/soc_remapper: Initialize SoC remapper during Xe probe
SoC remapper is used to map different HW functions in the SoC to their
respective drivers. Initialize SoC remapper during driver load.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patch.msgid.link/20251223183943.3175941-6-umesh.nerlige.ramappa@intel.com
2025-12-23 11:43:46 -08:00
Calvin Owens
e67870321a drm/xe: Don't use absolute path in generated header comment
Building the XE driver through Yocto throws this QA warning:

    WARNING: mc🏠linux-stable-6.17-r0 do_package_qa: QA Issue: File /usr/src/debug/linux-stable/6.17/drivers/gpu/drm/xe/generated/xe_device_wa_oob.h in package linux-stable-src contains reference to TMPDIR [buildpaths]
    WARNING: mc🏠linux-stable-6.17-r0 do_package_qa: QA Issue: File /usr/src/debug/linux-stable/6.17/drivers/gpu/drm/xe/generated/xe_wa_oob.h in package linux-stable-src contains reference to TMPDIR [buildpaths]

...because the comment at the top of the generated header contains the
absolute path to the rules file at build time:

    * This file was generated from rules: /home/calvinow/git/meta-house/build/tmp-house/work-shared/nuc14rvhu7/kernel-source/drivers/gpu/drm/xe/xe_device_wa_oob.rules

Fix this minor annoyance by putting the basename of the rules file in
the generated comment instead of the absolute path, so the generated
header contents no longer depend on the location of the kernel source.

Signed-off-by: Calvin Owens <calvin@wbinvd.org>
Link: https://patch.msgid.link/20251222165441.516102-2-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-12-23 10:05:06 -05:00
Francois Dugast
15e096960a drm/xe/migrate: Configure migration queue as low latency
Commit 5488bec96b ("drm/xe/uapi: Use hint for guc to set GT frequency")
introduced low latency hint for use by user space when creating an exec
queue. This instructs SLPC to ramp the GT frequency aggressively.

SVM relies on an internal exec queue to migrate memory upon page faults.
This change creates this exec queue with the low latency hint to speed up
migration.

This should not impact systems where GT frequency is set over sysfs, or
with long running workloads which give enough time for the frequency to
ramp up. An example of memory access pattern that shows an improvement of
SVM performance is running hundreds of times IGT eu-fault-2m-once-device
in xe_exec_system_allocator. The copy duration provided by GT stats in
svm_2M_device_copy_us shows per GPU page fault:
    ~ 165 μs without low latency hint
    ~ 130 μs with low latency hint

Suggested-by: Matthew Brost <matthew.brost@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
Link: https://patch.msgid.link/20251223115327.49555-1-francois.dugast@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-12-23 10:03:48 -05:00
Thomas Hellström
0620837490 drm/xe/svm: Serialize migration to device if racing
Introduce an rw-semaphore to serialize migration to device if
it's likely that migration races with another device migration
of the same CPU address space range.
This is a temporary fix to attempt to mitigate a livelock that
might happen if many devices try to migrate a range at the same
time, and it affects only devices using the xe driver.
A longer term fix is probably improvements in the core mm
migration layer.

Suggested-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251219113320.183860-25-thomas.hellstrom@linux.intel.com
2025-12-23 10:00:49 +01:00
Thomas Hellström
ec265e1f1c drm/pagemap: Support source migration over interconnect
Support source interconnect migration by using the copy_to_ram() op
of the source device private pages.

Source interconnect migration is required to flush the L2 cache of
the source device, which among other things is a requirement for
correct global atomic operation. It also enables the source GPU to
potentially decompress any compressed content which is not
understood by peers, and finally for the PCIe case, it's expected
that writes over PCIe will be faster than reads.

The implementation can probably be improved by coalescing subregions
with the same source.

v5:
- Update waiting for the pre_migrate_fence and comments around that,
  previously in another patch. (Himal).
- Actually select device private pages to migrate when
  source_peer_migrates is true.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> # For merging through drm-xe.
Link: https://patch.msgid.link/20251219113320.183860-24-thomas.hellstrom@linux.intel.com
2025-12-23 10:00:49 +01:00
Thomas Hellström
75af93b3f5 drm/pagemap, drm/xe: Support destination migration over interconnect
Support destination migration over interconnect when migrating from
device-private pages with the same dev_pagemap owner.

Since we now also collect device-private pages to migrate,
also abort migration if the range to migrate is already
fully populated with pages from the desired pagemap.

Finally return -EBUSY from drm_pagemap_populate_mm()
if the migration can't be completed without first migrating all
pages in the range to system. It is expected that the caller
will perform that before retrying the call to
drm_pagemap_populate_mm().

v3:
- Fix a bug where the p2p dma-address was never used.
- Postpone enabling destination interconnect migration,
  since xe devices require source interconnect migration to
  ensure the source L2 cache is flushed at migration time.
- Update the drm_pagemap_migrate_to_devmem() interface to
  pass migration details.
v4:
- Define XE_INTERCONNECT_P2P unconditionally (CI)
- Include a missing header (CI)
v5:
- Use page order increments where possible (Matt Brost).
- Fix a negated value of can_migrate_same_pagemap.
- Move removal of some dead code to a separate patch (Matt Brost).
- Remove an unnecessary zdd get() and put() (Matt Brost).

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> # For merging through drm-xe.
Link: https://patch.msgid.link/20251219113320.183860-23-thomas.hellstrom@linux.intel.com
2025-12-23 10:00:49 +01:00
Thomas Hellström
0471ed20df drm/xe: Use drm_gpusvm_scan_mm()
Use drm_gpusvm_scan_mm() to avoid unnecessarily calling into
drm_pagemap_populate_mm();

v3:
- New patch.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patch.msgid.link/20251219113320.183860-22-thomas.hellstrom@linux.intel.com
2025-12-23 10:00:48 +01:00
Thomas Hellström
f1d08a5864 drm/gpusvm: Introduce a function to scan the current migration state
With multi-device we are much more likely to have multiple
drm-gpusvm ranges pointing to the same struct mm range.

To avoid calling into drm_pagemap_populate_mm(), which is always
very costly, introduce a much less costly drm_gpusvm function,
drm_gpusvm_scan_mm() to scan the current migration state.
The device fault-handler and prefetcher can use this function to
determine whether migration is really necessary.

There are a couple of performance improvements that can be done
for this function if it turns out to be too costly. Those are
documented in the code.

v3:
- New patch.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> # For merging through drm-xe.
Link: https://patch.msgid.link/20251219113320.183860-21-thomas.hellstrom@linux.intel.com
2025-12-23 10:00:48 +01:00
Thomas Hellström
5b64b23f6f drm/pagemap, drm/xe: Clean up the use of the device-private page owner
Use the dev_pagemap->owner field wherever possible, simplifying
the code slightly.

v3: New patch

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> # For merging through drm-xe.
Link: https://patch.msgid.link/20251219113320.183860-20-thomas.hellstrom@linux.intel.com
2025-12-23 10:00:48 +01:00
Thomas Hellström
1f430b8d68 drm/xe/svm: Document how xe keeps drm_pagemap references
As an aid to understanding the lifetime of the drm_pagemaps used
by the xe driver, document how the xe driver keeps the
drm_pagemap references.

v3:
- Fix formatting (Matt Brost)

Suggested-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251219113320.183860-19-thomas.hellstrom@linux.intel.com
2025-12-23 10:00:48 +01:00
Thomas Hellström
54dc5842a8 drm/xe/vm: Add a couple of VM debug printouts
Add debug printouts that are valueable for pagemap prefetch,
migration and page collection.

v2:
- Add additional debug prinouts around migration and page collection.
- Require CONFIG_DRM_XE_DEBUG_VM.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com> #v1
Link: https://patch.msgid.link/20251219113320.183860-18-thomas.hellstrom@linux.intel.com
2025-12-23 10:00:48 +01:00
Thomas Hellström
2df55d9e66 drm/xe: Support pcie p2p dma as a fast interconnect
Mimic the dma-buf method using dma_[map|unmap]_resource to map
for pcie-p2p dma.

There's an ongoing area of work upstream to sort out how this best
should be done. One method proposed is to add an additional
pci_p2p_dma_pagemap aliasing the device_private pagemap and use
the corresponding pci_p2p_dma_pagemap page as input for
dma_map_page(). However, that would incur double the amount of
memory and latency to set up the drm_pagemap and given the huge
amount of memory present on modern GPUs, that would really not work.
Hence the simple approach used in this patch.

v2:
- Simplify xe_page_to_pcie(). (Matt Brost)

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251219113320.183860-17-thomas.hellstrom@linux.intel.com
2025-12-23 10:00:48 +01:00
Thomas Hellström
dff547e137 drm/xe/uapi: Extend the madvise functionality to support foreign pagemap placement for svm
Use device file descriptors and regions to represent pagemaps on
foreign or local devices.

The underlying files are type-checked at madvise time, and
references are kept on the drm_pagemap as long as there is are
madvises pointing to it.

Extend the madvise preferred_location UAPI to support the region
instance to identify the foreign placement.

v2:
- Improve UAPI documentation. (Matt Brost)
- Sanitize preferred_mem_loc.region_instance madvise. (Matt Brost)
- Clarify madvise drm_pagemap vs xe_pagemap refcounting. (Matt Brost)
- Don't allow a foreign drm_pagemap madvise without a fast
  interconnect.
v3:
- Add a comment about reference-counting in xe_devmem_open() and
  remove the reference-count get-and-put. (Matt Brost)

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251219113320.183860-16-thomas.hellstrom@linux.intel.com
2025-12-23 10:00:48 +01:00
Thomas Hellström
4be5f2bc81 drm/xe: Simplify madvise_preferred_mem_loc()
Simplify madvise_preferred_mem_loc by removing repetitive patterns
in favour of local variables.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251219113320.183860-15-thomas.hellstrom@linux.intel.com
2025-12-23 10:00:48 +01:00
Thomas Hellström
238dbc9d9f drm/xe: Use the vma attibute drm_pagemap to select where to migrate
Honor the drm_pagemap vma attribute when migrating SVM pages.
Ensure that when the desired placement is validated as device
memory, that we also check that the requested drm_pagemap is
consistent with the current.

v2:
- Initialize a struct drm_pagemap pointer to NULL that could
  otherwise be dereferenced uninitialized. (CI)
- Remove a redundant assignment (Matt Brost)
- Slightly improved commit message (Matt Brost)
- Extended drm_pagemap validation.

v3:
- Fix a compilation error if CONFIG_DRM_GPUSVM is not enabled.
  (kernel test robot <lkp@intel.com>)

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patch.msgid.link/20251219113320.183860-14-thomas.hellstrom@linux.intel.com
2025-12-23 10:00:48 +01:00
Thomas Hellström
eb9db59d96 drm/xe: Pass a drm_pagemap pointer around with the memory advise attributes
As a consequence, struct xe_vma_mem_attr() can't simply be assigned
or freed without taking the reference count of individual members
into account. Also add helpers to do that.

v2:
- Move some calls to xe_vma_mem_attr_fini() to xe_vma_free(). (Matt Brost)
v3:
- Rebase.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com> #v2
Link: https://patch.msgid.link/20251219113320.183860-13-thomas.hellstrom@linux.intel.com
2025-12-23 10:00:47 +01:00
Thomas Hellström
14b60874c9 drm/xe: Use the drm_pagemap_util helper to get a svm pagemap owner
Register a driver-wide owner list, provide a callback to identify
fast interconnects and use the drm_pagemap_util helper to allocate
or reuse a suitable owner struct. For now we consider pagemaps on
different tiles on the same device as having fast interconnect and
thus the same owner.

v2:
- Fix up the error onion unwind in xe_pagemap_create(). (Matt Brost)

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251219113320.183860-12-thomas.hellstrom@linux.intel.com
2025-12-23 10:00:47 +01:00
Thomas Hellström
e44f47a9bf drm/pagemap_util: Add a utility to assign an owner to a set of interconnected gpus
The hmm_range_fault() and the migration helpers currently need a common
"owner" to identify pagemaps and clients with fast interconnect.
Add a drm_pagemap utility to setup such owners by registering
drm_pagemaps, in a registry, and for each new drm_pagemap,
query which existing drm_pagemaps have fast interconnects with the new
drm_pagemap.

The "owner" scheme is limited in that it is static at drm_pagemap creation.
Ideally one would want the owner to be adjusted at run-time, but that
requires changes to hmm. If the proposed scheme becomes too limited,
we need to revisit.

v2:
- Improve documentation of DRM_PAGEMAP_OWNER_LIST_DEFINE(). (Matt Brost)

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> # For merging through drm-xe.
Link: https://patch.msgid.link/20251219113320.183860-11-thomas.hellstrom@linux.intel.com
2025-12-23 10:00:47 +01:00
Thomas Hellström
33ac8d150a drm/pagemap: Remove the drm_pagemap_create() interface
With the drm_pagemap_init() interface, drm_pagemap_create() is not
used anymore.

v2:
- Slightly more verbose commit message. (Matt Brost)

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> # For merging through drm-xe.
Link: https://patch.msgid.link/20251219113320.183860-10-thomas.hellstrom@linux.intel.com
2025-12-23 10:00:47 +01:00
Thomas Hellström
8a52f4d9b1 drm/xe: Use the drm_pagemap cache and shrinker
Define a struct xe_pagemap that embeds all pagemap-related
data used by xekmd, and use the drm_pagemap cache- and
shrinker to manage lifetime.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251219113320.183860-9-thomas.hellstrom@linux.intel.com
2025-12-23 09:58:33 +01:00
Thomas Hellström
77f14f2f2d drm/pagemap: Add a drm_pagemap cache and shrinker
Pagemaps are costly to set up and tear down, and they consume a lot
of system memory for the struct pages. Ideally they should be
created only when needed.

Add a caching mechanism to allow doing just that: Create the drm_pagemaps
when needed for migration. Keep them around to avoid destruction and
re-creation latencies and destroy inactive/unused drm_pagemaps on memory
pressure using a shrinker.

Only add the helper functions. They will be hooked up to the xe driver
in the upcoming patch.

v2:
- Add lockdep checking for drm_pagemap_put(). (Matt Brost)
- Add a copyright notice. (Matt Brost)

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> # For merging through drm-xe.
Link: https://patch.msgid.link/20251219113320.183860-8-thomas.hellstrom@linux.intel.com
2025-12-23 09:37:33 +01:00
Thomas Hellström
a26084328a drm/pagemap, drm/xe: Manage drm_pagemap provider lifetimes
If a device holds a reference on a foregin device's drm_pagemap,
and a device unbind is executed on the foreign device,
Typically that foreign device would evict its device-private
pages and then continue its device-managed cleanup eventually
releasing its drm device and possibly allow for module unload.
However, since we're still holding a reference on a drm_pagemap,
when that reference is released and the provider module is
unloaded we'd execute out of undefined memory.

Therefore keep a reference on the provider device and module until
the last drm_pagemap reference is gone.

Note that in theory, the drm_gpusvm_helper module may be unloaded
as soon as the final module_put() of the provider driver module is
executed, so we need to add a module_exit() function that waits
for the work item executing the module_put() has completed.

v2:
- Better commit message (Matt Brost)

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> # For merging through drm-xe.
Link: https://patch.msgid.link/20251219113320.183860-7-thomas.hellstrom@linux.intel.com
2025-12-23 09:36:55 +01:00
Thomas Hellström
565477dbca drm/pagemap: Add a refcounted drm_pagemap backpointer to struct drm_pagemap_zdd
To be able to keep track of drm_pagemap usage, add a refcounted
backpointer to struct drm_pagemap_zdd. This will keep the drm_pagemap
reference count from dropping to zero as long as there are drm_pagemap
pages present in a CPU address space.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> # For merging through drm-xe.
Link: https://patch.msgid.link/20251219113320.183860-6-thomas.hellstrom@linux.intel.com
2025-12-23 09:36:29 +01:00
Thomas Hellström
a599b98607 drm/pagemap, drm/xe: Add refcounting to struct drm_pagemap
With the end goal of being able to free unused pagemaps
and allocate them on demand, add a refcount to struct drm_pagemap,
remove the xe embedded drm_pagemap, allocating and freeing it
explicitly.

v2:
- Make the drm_pagemap pointer in drm_gpusvm_pages reference-counted.
v3:
- Call drm_pagemap_get() before drm_pagemap_put() in drm_gpusvm_pages
  (Himal Prasad Ghimiray)

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com> #v1
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> # For merging through drm-xe.
Link: https://patch.msgid.link/20251219113320.183860-5-thomas.hellstrom@linux.intel.com
2025-12-23 09:35:53 +01:00