PSMI capture requires a physically contiguous buffer. All of the
needed configuration is done by the userspace tool directly to the
GPU via mmio access.
This interface only supports allocating from VRAM regions. For
integrated devices, the PSMI buffer is in SYSTEM memory and should be
allocated by userspace using hugetlbfs.
Here we add the ability to allocate a region of physically contiguous
memory by writing to a debugfs file (listed below). For multi-tile
devices, the capture tool requires the ability to allocate a capture
buffer per tile (VRAM region), so the user can specify a region_mask.
The tool can then mmap the buffers via direct mmap of the PCI BAR
through sysfs.
To support the capture tool, 3 new debugfs entries are added:
psmi_capture_addr - physical address per VRAM region's capture buffer
psmi_capture_region_mask - select which region(s) to allocate a buffer
psmi_capture_size - size of current capture buffer
Writing psmi_capture_size frees any current buffers and then allocates
a new buffer of the requested size in each selected region.
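A minimal userspace sketch of this flow, assuming the entries live in
the device's DRM debugfs directory (the path, region choice, size and
read format here are illustrative):

  #include <stdio.h>

  int main(void)
  {
          const char *base = "/sys/kernel/debug/dri/0"; /* assumed path */
          unsigned long long addr;
          char path[256];
          FILE *f;

          /* Select VRAM region 0 via the region mask. */
          snprintf(path, sizeof(path), "%s/psmi_capture_region_mask", base);
          f = fopen(path, "w");
          if (!f)
                  return 1;
          fprintf(f, "0x1");
          fclose(f);

          /* Writing the size frees any current buffers, then allocates
           * a buffer of the requested size in each selected region. */
          snprintf(path, sizeof(path), "%s/psmi_capture_size", base);
          f = fopen(path, "w");
          if (!f)
                  return 1;
          fprintf(f, "%llu", 64ULL << 20); /* 64 MiB */
          fclose(f);

          /* Read back the physical address of the capture buffer. */
          snprintf(path, sizeof(path), "%s/psmi_capture_addr", base);
          f = fopen(path, "r");
          if (f && fscanf(f, "%llx", &addr) == 1)
                  printf("capture buffer at %#llx\n", addr);
          return 0;
  }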
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Original-author: Brian Welty <brian.welty@intel.com>
Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> # v2
Link: https://lore.kernel.org/r/20250821-psmi-v5-2-34ab7550d3d8@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
There are two registers filled in when reading data from pcode,
besides the mailbox itself. Currently, we allow a NULL value for the
second of these (data1) and assume the first is provided. However,
many of the routines calling this function assume that pcode will
ignore the value being passed in, and so leave that first value
(data0) declared but uninitialized. To be safe, make sure this value
is always initialized (generally to 0) in case pcode behavior changes
and starts consuming it.
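A sketch of the call pattern this change hardens; xe_pcode_read() is
the driver helper, while the surrounding fragment is illustrative:

  /* Before: data0 was declared but left uninitialized on the
   * assumption that pcode ignores the incoming value. After: zero
   * both values up front in case pcode ever starts consuming them. */
  u32 data0 = 0;
  u32 data1 = 0;
  int err;

  err = xe_pcode_read(tile, mbox, &data0, &data1);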
v2: Fix sob/author
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://lore.kernel.org/r/20250819201054.393220-1-stuart.summers@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
We are currently bumping gt_count as we define GTs for each tile. While
that works with our current code, there are reasons to improve that:
* That section of the code is dedicated to define each tile's set of GTs
and their respective info. The gt_count can be seen as a device level
property. While it is fair to bump it as we define each GT, we can
also focus on the GT themselves and count after we are done with the
definitions.
* More *importantly*, gt_count should reflect the number of GTs that we
  would get when looping over them with for_each_gt(). Bumping gt_count
  the way we currently do leaves that value disconnected from
  for_each_gt(). This could bite us in the future if the loop gains
  extra checks that are not accounted for in each existing "gt_count++".
As such, let's use for_each_gt() and extract the calculation of gt_count
into a separate block, just after we define the set of GTs for each
tile.
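A sketch of the resulting pattern, assuming the count is taken right
after the per-tile GT definitions (placement details may differ):

  struct xe_gt *gt;
  u8 id;

  /* Count exactly the GTs that for_each_gt() will visit. */
  xe->info.gt_count = 0;
  for_each_gt(gt, xe, id)
          xe->info.gt_count++;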
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250818-gt_count-improvements-v4-2-ee12870c6f57@intel.com
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
The function mmio_multi_tile_setup() does not look like the proper
location for probing for the number of existing tiles in the hardware.
It should not be that function's responsibility and such information
should instead be already available when it gets called.
The fact that we need to adjust gt_count is a symptom of that.
Move probing of available tile count to a dedicated function named
xe_info_probe_tile_count() and call it from xe_info_init(), which seems
like a more appropriate place for that.
With that move, we no longer need to (and shouldn't) adjust gt_count as
a part of xe_info_probe_tile_count(), as that field will be initialized
later in xe_info_init().
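A hedged sketch of the new function's shape, reusing the register
logic that lived in mmio_multi_tile_setup(); details are illustrative:

  static void xe_info_probe_tile_count(struct xe_device *xe)
  {
          struct xe_mmio *mmio = xe_root_tile_mmio(xe);
          u64 mtcfg;

          /* Only probe HW if the descriptor allows multiple tiles. */
          if (xe->info.tile_count == 1)
                  return;

          mtcfg = xe_mmio_read64_2x32(mmio, XEHP_MTCFG_ADDR);
          xe->info.tile_count = REG_FIELD_GET(TILE_COUNT, mtcfg) + 1;
  }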
v4:
- Only probe for tile count if the default tile_count != 1 (just like
was done in mmio_multi_tile_setup()). (CI)
v3:
- Unchanged.
v2:
- Use KUnit static stub so that we do not try to query hardware when
running KUnit tests. (CI)
- Tweak last paragraph of commit message to make it clearer.
(Jonathan)
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250818-gt_count-improvements-v4-1-ee12870c6f57@intel.com
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
VFs without native PCIe Power Management (PM) capabilities inherit their
PF's power state, per the PCIe specification (§5.10.1, PCIe Base Spec
7.0). Enabling Runtime Power Management (RPM) for these VFs triggers
unnecessary driver suspend/resume operations that ultimately perform no
PCI-level power transition.
Since VFs without PM capabilities cannot independently enter low-power
states, the existing RPM workflow becomes redundant:
1. Driver executes full suspend/resume sequence
2. PCI PM transition step becomes no-op
3. VF power state remains tied to PF's status
Disabling RPM for VFs eliminates this redundant processing while
maintaining proper power management through PF dependency. This
optimization ensures VFs follow their PF's power state without superfluous
runtime handling.
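A hedged sketch of the resulting check; the exact call site in the xe
runtime-PM init path may differ:

  /* VFs lacking a PCI PM capability just follow the PF's power
   * state, so there is nothing for runtime PM to do: skip it. */
  if (IS_SRIOV_VF(xe) && !to_pci_dev(xe->drm.dev)->pm_cap)
          return;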
Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Anshuman Gupta <anshuman.gupta@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Maarten Lankhorst <dev@lankhorst.se>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250812163613.9954-1-satyanarayana.k.v.p@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
LMEM is partitioned between multiple VFs and we expect that the more
VFs we have, the less LMEM is assigned to each VF.
This means that we can achieve full LMEM BAR access without the need to
attempt full VF LMEM BAR resize via pci_resize_resource().
Always try to set the largest possible BAR size that fits the number
of enabled VFs, and inform the user if the resize attempt is not
successful.
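A hedged sketch of the size-selection loop built on the PCI core's
resizable-BAR helpers; the fallback message is illustrative:

  u32 sizes = pci_rebar_get_possible_sizes(pdev, bar);
  int size = fls(sizes) - 1; /* largest size the BAR supports */

  /* Walk down from the largest size until the PCI core accepts
   * one that fits the available resources. */
  while (size >= 0 && pci_resize_resource(pdev, bar, size))
          size--;

  if (size < 0)
          pci_info(pdev, "Failed to resize VF LMEM BAR\n");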
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/20250527120637.665506-7-michal.winiarski@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Bring v6.17-rc1 to propagate commits from other subsystems, particularly
PCI, which has some new functions needed for SR-IOV integration.
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
In several code paths, such as xe_pt_create(), the vm->xef field is used
to determine whether a VM originates from userspace or the kernel.
Previously, this field was only assigned in xe_vm_create_ioctl(),
after the VM was created by xe_vm_create(). However, xe_vm_create()
triggers page table creation, and that code assumes vm->xef is
already set. This could lead to incorrect origin detection.
To fix this problem and ensure consistent initialization of the VM
object, move the assignment of this field into xe_vm_create().
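A hedged sketch of the fix's shape; the argument list and surrounding
code of xe_vm_create() are abbreviated:

  struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags,
                             struct xe_file *xef)
  {
          /* ... allocation and basic init ... */

          /* Assign the origin before page tables are created so that
           * xe_pt_create() sees it; take a reference only when xef is
           * non-NULL (kernel VMs pass NULL). */
          if (xef)
                  vm->xef = xe_file_get(xef);

          /* ... page-table creation follows, error paths drop the
           * reference again ... */
  }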
v2:
- take reference to the xe file object only when xef is not NULL
- release the reference to the xe file object on the error path (Matthew)
Fixes: 7f387e6012 ("drm/xe: add XE_BO_FLAG_PINNED_LATE_RESTORE")
Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250811104358.2064150-2-piotr.piorkowski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
For non-leaf paging structures we end up selecting a random index in
[0, 3], depending on the first user if the page-table is shared.
Non-leaf structures only have two bits in the HW for encoding the PAT
index, yet here we just pass along the full user-provided index, which
can be as large as ~31 on xe2+. The user-provided index is meant for
the leaf node, which maps the actual BO pages and has more PAT bits;
non-leaf nodes only map other paging structures and so need only a
minimal PAT index range. Also, the chosen index might need to account
for how the driver mapped the paging structures on the host side, like
wc vs wb, which is separate from the user-provided index.
With that in mind, move the PDE PAT index selection under driver
control. For now just use a coherent index on platforms whose
page-tables are cached on the host side, and an incoherent index
otherwise. Using a coherent index could potentially be expensive, and
would be overkill if we know the page-table is always uncached on the
host side.
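A hedged sketch of the helper split out in v2; its name and the
host-caching predicate are assumptions, while xe->pat.idx[] is the
driver's existing PAT lookup table:

  static u16 pde_pat_index(struct xe_device *xe)
  {
          /* Non-leaf entries only encode two PAT bits, so pick a
           * small, driver-controlled index: coherent when the
           * page-tables are cached on the host side, incoherent
           * otherwise. */
          if (page_tables_host_cached(xe)) /* hypothetical predicate */
                  return xe->pat.idx[XE_CACHE_WB];

          return xe->pat.idx[XE_CACHE_NONE];
  }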
v2 (Stuart):
- Add some documentation and split into separate helper.
BSpec: 59510
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://lore.kernel.org/r/20250808103455.462424-2-matthew.auld@intel.com
Clamp writes to the power limits powerX_crit/currX_crit, powerX_cap,
and powerX_max to the maximum supported by the pcode mailbox when
sysfs-provided values exceed this limit.
Although the pcode already performs clamping, values beyond the pcode
mailbox's supported range get truncated, leading to incorrect critical
power settings. Ensure proper clamping to prevent such truncation.
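A hedged sketch of the clamp; max_supp_power_limit follows the v4
naming below and the log text is illustrative:

  if (value > max_supp_power_limit) {
          value = max_supp_power_limit;
          drm_info(&hwmon->xe->drm,
                   "Power limit clamped to maximum supported: %llu\n",
                   value);
  }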
v2:
- Address below review comments. (Riana)
- Split comments into multiple sentences.
- Use local variables for readability.
- Add a debug log.
- Use u64 instead of unsigned long.
v3:
- Change drm_dbg logs to drm_info. (Badal)
v4:
- Rephrase the drm_info log. (Rodrigo, Riana)
- Rename variable max_mbx_power_limit to max_supp_power_limit, as
limit is same for platforms with and without mailbox power limit
support.
Signed-off-by: Karthik Poosa <karthik.poosa@intel.com>
Fixes: 92d44a422d ("drm/xe/hwmon: Expose card reactive critical power")
Fixes: fb1b70607f ("drm/xe/hwmon: Expose power attributes")
Reviewed-by: Riana Tauro <riana.tauro@intel.com>
Link: https://lore.kernel.org/r/20250808185310.3466529-1-karthik.poosa@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Pull gpio updates from Bartosz Golaszewski:
"As discussed: there's a small commit that removes the legacy GPIO line
value setter callbacks as they're no longer used and a big, treewide
commit that renames the new ones to the old names across all GPIO
drivers at once.
While at it: there are also two fixes that I picked up over the course
of the merge window:
- remove unused, legacy GPIO line value setters from struct gpio_chip
- rename the new set callbacks back to the original names treewide
- fix interrupt handling in gpio-mlxbf2
- revert a buggy immutable irqchip conversion"
* tag 'gpio-updates-for-v6.17-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
treewide: rename GPIO set callbacks back to their original names
gpio: remove legacy GPIO line value setter callbacks
gpio: mlxbf2: use platform_get_irq_optional()
Revert "gpio: pxa: Make irq_chip immutable"
Pull drm fixes from Dave Airlie:
"This is the fixes that built up in the merge window, mostly amdgpu and
xe with one i915 display fix, seems like things are pretty good for
rc1.
i915:
- DP LFPS fixes
xe:
- SRIOV: PF fixes and removal of need of module param
- Fix driver unbind around Devcoredump
- Mark xe driver as BROKEN if kernel page size is not 4kB
amdgpu:
- GC 9.5.0 fixes
- SMU fix
- DCE 6 DC fixes
- mmhub client ID fixes
- VRR fix
- Backlight fix
- UserQ fix
- Legacy reset fix
- Misc fixes
amdkfd:
- CRIU fix
- Debugfs fix"
* tag 'drm-next-2025-08-08' of https://gitlab.freedesktop.org/drm/kernel: (28 commits)
drm/amdgpu: add missing vram lost check for LEGACY RESET
drm/amdgpu/discovery: fix fw based ip discovery
drm/amdkfd: Destroy KFD debugfs after destroy KFD wq
amdgpu/amdgpu_discovery: increase timeout limit for IFWI init
drm/amdgpu: Update SDMA firmware version check for user queue support
drm/amdgpu: Add NULL check for asic_funcs
drm/amd/display: Revert "drm/amd/display: Fix AMDGPU_MAX_BL_LEVEL value"
drm/amd/display: fix a Null pointer dereference vulnerability
drm/amd/display: Add primary plane to commits for correct VRR handling
drm/amdgpu: update mmhub 3.3 client id mappings
drm/amdgpu: update mmhub 3.0.1 client id mappings
drm/amdgpu: Retain job->vm in amdgpu_job_prepare_job
drm/amd/display: Fix DCE 6.0 and 6.4 PLL programming.
drm/amd/display: Don't overwrite dce60_clk_mgr
drm/amdkfd: Fix checkpoint-restore on multi-xcc
drm/amd: Restore cached manual clock settings during resume
drm/amd: Restore cached power limit during resume
drm/amdgpu: Update external revid for GC v9.5.0
drm/amdgpu: Update supported modes for GC v9.5.0
Mark xe driver as BROKEN if kernel page size is not 4kB
...
With a non-page-aligned copy, we need to use a 4-byte-aligned pitch,
however the size itself might still be close to our maximum of ~8M,
and so the dimensions of the copy can easily exceed the S16_MAX limit
of the copy command, leading to the following assert:
xe 0000:03:00.0: [drm] Assertion `size / pitch <= ((s16)(((u16)~0U) >> 1))` failed!
platform: BATTLEMAGE subplatform: 1
graphics: Xe2_HPG 20.01 step A0
media: Xe2_HPM 13.01 step A1
tile: 0 VRAM 10.0 GiB
GT: 0 type 1
WARNING: CPU: 23 PID: 10605 at drivers/gpu/drm/xe/xe_migrate.c:673 emit_copy+0x4b5/0x4e0 [xe]
To fix this, account for the pitch when calculating the current number
of bytes to copy.
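A hedged sketch of the fix: since the copy command encodes rows as
size / pitch in an s16, cap each chunk accordingly (variable names are
illustrative):

  /* Never let size / pitch exceed S16_MAX rows per emitted copy. */
  current_bytes = min_t(u64, bytes_left, (u64)S16_MAX * pitch);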
Fixes: 270172f64b ("drm/xe: Update xe_ttm_access_memory to use GPU for non-visible access")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Maciej Patelczyk <maciej.patelczyk@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://lore.kernel.org/r/20250731093807.207572-7-matthew.auld@intel.com
If buf + offset is not aligned to XE_CACHELINE_BYTES we fall back to
using a bounce buffer. However, the bounce buffer here is allocated on
the stack, and its only alignment guarantee is the natural alignment
of u8, not XE_CACHELINE_BYTES. If the bounce buffer is also misaligned
we recurse back into the function, but the new bounce buffer might not
be aligned either, and might never be, so we keep recursing until we
eventually blow through the stack.
Instead of using the stack, use kmalloc, which should respect the
power-of-two alignment request here. This fixes a kernel panic when
triggering this path through eudebug.
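A hedged sketch of the v2 shape: kmalloc() returns naturally aligned
memory for power-of-two sizes, and the build-time check locks that
precondition in:

  BUILD_BUG_ON(!is_power_of_2(XE_CACHELINE_BYTES));

  /* Heap-allocated, so alignment is guaranteed and recursion is
   * impossible. */
  bounce = kmalloc(XE_CACHELINE_BYTES, GFP_KERNEL);
  if (!bounce)
          return -ENOMEM;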
v2 (Stuart):
- Add build bug check for power-of-two restriction
- s/EINVAL/ENOMEM/
Fixes: 270172f64b ("drm/xe: Update xe_ttm_access_memory to use GPU for non-visible access")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Maciej Patelczyk <maciej.patelczyk@intel.com>
Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://lore.kernel.org/r/20250731093807.207572-6-matthew.auld@intel.com
The conversion of all GPIO drivers to using the .set_rv() and
.set_multiple_rv() callbacks from struct gpio_chip (which - unlike their
predecessors - return an integer and allow the controller drivers to
indicate failures to users) is now complete and the legacy ones have
been removed. Rename the new callbacks back to their original names in
one sweeping change.
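A sketch of what a driver callback looks like after the rename; the
chip type and register are hypothetical:

  /* .set() now returns int, so failures propagate to the caller. */
  static int mychip_set(struct gpio_chip *gc, unsigned int offset,
                        int value)
  {
          struct mychip *chip = gpiochip_get_data(gc);

          return regmap_update_bits(chip->regmap, MYCHIP_OUT_REG,
                                    BIT(offset),
                                    value ? BIT(offset) : 0);
  }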
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
We only need the fw-based discovery table for sysfs; there is no need
to parse it. Additionally, parsing some of the board-specific tables
may result in incorrect data on some boards. Just load the binary and
don't parse it on those boards.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4441
Fixes: 80a0e82829 ("drm/amdgpu/discovery: optionally use fw based ip discovery")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 62eedd150f)
Cc: stable@vger.kernel.org
Since KFD proc content was moved to kernel debugfs, we can't destroy
KFD debugfs before kfd_process_destroy_wq. Move kfd_process_destroy_wq
prior to kfd_debugfs_fini to fix a kernel NULL pointer problem. It
happens when /sys/kernel/debug/kfd was already destroyed in
kfd_debugfs_fini but kfd_process_destroy_wq calls
kfd_debugfs_remove_process. This line
  debugfs_remove_recursive(entry->proc_dentry);
tries to remove /sys/kernel/debug/kfd/proc/<pid> while
/sys/kernel/debug/kfd is already gone, which hangs the kernel with a
NULL pointer dereference.
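A sketch of the corrected teardown ordering in module exit;
surrounding calls are abbreviated:

  /* Destroy the process workqueue first: it may still invoke
   * kfd_debugfs_remove_process() on entries under
   * /sys/kernel/debug/kfd. */
  kfd_process_destroy_wq();
  kfd_debugfs_fini();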
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Eric Huang <jinhuieric.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 0333052d90)
Cc: stable@vger.kernel.org
The DMA mapping can now correspond to a folio (order > 0), so move
the iterator by the number of pages in the folio in order to migrate
all pages at once.
This requires forcing contiguous memory for SVM BOs, which greatly
simplifies the code and enables 2MB device-page support, allowing a
major performance improvement. Negative effects like extra eviction
are unlikely, as SVM BOs have a maximum size of 2MB.
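A hedged sketch of the iterator change; NR_PAGES() is the helper macro
introduced earlier in this series, and the loop body is illustrative:

  for (i = 0; i < npages;) {
          struct folio *folio = page_folio(pages[i]);

          /* Migrate the whole folio's worth of pages in one step,
           * then advance by its page count. */
          i += NR_PAGES(folio_order(folio));
  }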
v2:
- Improve commit message (Matthew Brost)
- Fix increment, chunk, assert match (Matthew Brost)
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250805140028.599361-7-francois.dugast@intel.com
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
If the order is greater than zero, allocate a folio when populating the
RAM PFNs instead of allocating individual pages one after the other. For
example if 2MB folios are used instead of 4KB pages, this reduces the
number of calls to the allocation API by 512.
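A hedged sketch of the order-aware allocation; GFP flags and the
surrounding bookkeeping are illustrative:

  if (order) {
          /* One call covers 1 << order PFNs instead of 512 separate
           * page allocations for a 2MB chunk. */
          struct folio *folio = folio_alloc(GFP_HIGHUSER, order);

          page = folio ? folio_page(folio, 0) : NULL;
  } else {
          page = alloc_page(GFP_HIGHUSER);
  }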
v2:
- Use page order instead of extra argument (Matthew Brost)
- Allocate with folio_alloc() (Matthew Brost)
- Loop for mpages and free_pages based on order (Matthew Brost)
v3:
- Fix loops in drm_pagemap_migrate_populate_ram_pfn() (Matthew Brost)
v4:
- Use folio_trylock(), set local variable to NULL (Matthew Brost)
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://lore.kernel.org/r/20250805140028.599361-5-francois.dugast@intel.com
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
If the page is part of a folio, DMA map the whole folio at once instead of
mapping individual pages one after the other. For example if 2MB folios
are used instead of 4KB pages, this reduces the number of DMA mappings by
512.
The folio order (and consequently, the size) is persisted in the struct
drm_pagemap_device_addr to be available at the time of unmapping.
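A hedged sketch of the order-aware mapping; error handling and the
surrounding loop are omitted:

  unsigned int order = folio_order(page_folio(page));
  dma_addr_t dma = dma_map_page(dev, page, 0,
                                PAGE_SIZE << order,
                                DMA_BIDIRECTIONAL);

  /* The order is stored next to the address so unmapping can pass
   * the same size back to dma_unmap_page(). */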
v2:
- Initialize order variable (Matthew Brost)
- Set proto and dir for completeness (Matthew Brost)
- Do not populate drm_pagemap_addr, document it (Matthew Brost)
- Add and use macro NR_PAGES(order) (Matthew Brost)
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://lore.kernel.org/r/20250805140028.599361-4-francois.dugast@intel.com
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
This struct embeds more information than just the DMA address. This will
help later to support folio orders greater than zero. At this point, there
is no functional change as the only struct member used is addr.
In Xe, adapt to the new drm_gpusvm_devmem_ops type signatures using
struct drm_pagemap_addr, as well as the internal xe SVM functions
implementing those operations. The use of this struct is propagated to
xe_migrate, since it makes indexed accesses to the next DMA address,
and those addresses are no longer contiguous.
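A hedged sketch of the struct's shape as described here and in the
following patches; fields other than addr are assumptions:

  struct drm_pagemap_addr {
          dma_addr_t addr;  /* device DMA address */
          u64 proto : 54;   /* interconnect protocol */
          u64 order : 8;    /* folio order, kept for unmapping */
          u64 dir : 2;      /* enum dma_data_direction */
  };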
v2:
- Rename drm_pagemap_device_addr to drm_pagemap_addr (Matthew Brost)
- Squash with patch for Xe (Matthew Brost)
- Set proto and dir for completeness (Matthew Brost)
- Assess DMA map protocol (Matthew Brost)
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://lore.kernel.org/r/20250805140028.599361-3-francois.dugast@intel.com
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
Since we expect that all configuration directory names will match some
existing device, we can't provide any configuration for the VFs until
they are actually enabled.
But we can relax that restriction by just checking whether there is a
PF device that could create a given VF. This is easy since all our PF
devices are always present at function 0, and we can query the PF
device for the number of VFs it can support.
Then for some system with PF device at 0000:00:02.0 we can add
configs for all VFs:
/sys/kernel/config/xe/
├── 0000:00:02.0
│ └── ...
├── 0000:00:02.1
│ └── ...
├── 0000:00:02.2
│ └── ...
:
└── 0000:00:02.7
└── ...
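A hedged sketch of the relaxed check; helper usage is illustrative and
vf_number is the function-derived VF index:

  /* The PF always sits at function 0 of the same slot; allow a VF
   * config directory only if that PF could create the given VF. */
  struct pci_dev *pf = pci_get_domain_bus_and_slot(domain, bus,
                                                   PCI_DEVFN(slot, 0));
  bool allowed = pf && vf_number <= pci_sriov_get_totalvfs(pf);

  pci_dev_put(pf);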
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250731212145.179898-1-michal.wajdeczko@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>