Add a new VM_BIND flag, DRM_XE_VM_BIND_FLAG_DECOMPRESS, that lets userspace
express intent for the driver to perform on-device in-place decompression
for the GPU mapping created by a MAP bind operation.
This flag is used by subsequent driver changes to trigger scheduling of
GPU work that resolves compressed VRAM pages into an uncompressed PAT
VM mapping.
Behavior and semantics:
- Valid only for DRM_XE_VM_BIND_OP_MAP. IOCTLs using this flag on other ops
are rejected (-EINVAL).
- The bind's pat_index must select the device "no-compression" PAT entry;
otherwise the ioctl is rejected (-EINVAL).
- Only meaningful for VRAM-backed BOs on devices that support Flat CCS and
the required hardware generation (driver will return -EOPNOTSUPP if not).
- On success the driver schedules a migrate/resolve and installs the
returned dma_fence into the BO's kernel reservation
(DMA_RESV_USAGE_KERNEL).
Compute PR: https://github.com/intel/compute-runtime/pull/898
v3: Rebase on latest drm-tip and add compute pr info
v2: Add kernel doc (Matt)
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Mrozek, Michal <michal.mrozek@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Nitin Gote <nitin.r.gote@intel.com>
Acked-by: Michal Mrozek <michal.mrozek@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patch.msgid.link/20260304123758.3050386-6-nitin.r.gote@intel.com
The AMD PMF driver provides realtime column utilization (npu_busy)
metrics for the NPU. Extend the DRM_IOCTL_AMDXDNA_GET_INFO sensor
query to expose these metrics to userspace.
Add AMDXDNA_SENSOR_TYPE_COLUMN_UTILIZATION to the sensor type enum
and update aie2_get_sensors() to return both the total power and up
to 8 column utilization sensors if the user buffer permits.
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Reviewed-by: Lizhi Hou <lizhi.hou@amd.com>
[lizhi: support legacy tool which uses small buffer. checkpatch cleanup]
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20260311171842.473453-1-lizhi.hou@amd.com
Similar to i915's commit cebc13de7e
("drm/i915: Whitelist COMMON_SLICE_CHICKEN3 for UMD access"), except
that instead of putting the register on the allowlist for UMD to
program, the KMD is doing the programming at context initialization
based on a queue creation flag.
This is a recommended tuning setting for both gen12 and Xe_HP
platforms.
If a render queue is created with
DRM_XE_EXEC_QUEUE_SET_STATE_CACHE_PERF_FIX, COMMON_SLICE_CHICKEN3 will
be programmed at initialization to enable the render color cache to
key with BTP+BTI (binding table pool + binding table entry) instead of
just BTI (binding table entry). This enables the UMD to avoid emitting
render-target-cache-flush + stall-at-pixel-scoreboard every time a
binding table entry pointing to a render target is changed.
v2: Use xe_lrc_write_ring()
v3: Update xe_query.c to report availability
v4: Rename defines to add DISABLE_
v5: update commit message
v6: rebase
Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39982
Bspec: 73993, 73994, 72161, 31870, 68331
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patch.msgid.link/20260306075504.1288676-1-lionel.g.landwerlin@intel.com
amd-drm-next-7.1-2026-03-04:
amdgpu:
- FAMS2 updates
- Refactor DC I2C
- Rework ttm handling to allow for multiple engines
- UserQ updates
- Ring reset improvements
- DC DCE 6.x cleanups
- DC support for NUTMEG and TRAVIS DP bridges
- Enable DC by default on CIK APUs
- Add DCN 4.2 support
- IPS fixes
- Overlay fixes for DCN4
- SDMA Limit updates
- Misc fixes
- RAS updates
- Register access callback rework
- GC 12.1 updates
amdkfd:
- Misc cleanups
UAPI:
- UserQ fence IOCTL parameter size fixes. The change is backwards compatible on LE, but not BE.
UserQs are still not considered stable and are disabled by default.
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patch.msgid.link/20260304213233.1938311-1-alexander.deucher@amd.com
Signed-off-by: Dave Airlie <airlied@redhat.com>
Introduces the DRM RAS infrastructure over generic netlink.
The new interface allows drivers to expose RAS nodes and their
associated error counters to userspace in a structured and extensible
way. Each drm_ras node can register its own set of error counters, which
are then discoverable and queryable through netlink operations. This
lays the groundwork for reporting and managing hardware error states
in a unified manner across different DRM drivers.
Currently it only supports error-counter nodes. But it can be
extended later.
The registration is also not tied to any drm node, so it can be
used by accel devices as well.
It uses the new and mandatory YAML description format stored in
Documentation/netlink/specs/. This forces a single generic netlink
family namespace for the entire drm: "drm-ras".
But multiple-endpoints are supported within the single family.
Any modification to this API needs to be applied to
Documentation/netlink/specs/drm_ras.yaml before regenerating the
code:
$ tools/net/ynl/pyynl/ynl_gen_c.py --spec \
Documentation/netlink/specs/drm_ras.yaml --mode uapi --header \
-o include/uapi/drm/drm_ras.h
$ tools/net/ynl/pyynl/ynl_gen_c.py --spec \
Documentation/netlink/specs/drm_ras.yaml --mode kernel \
--header -o drivers/gpu/drm/drm_ras_nl.h
$ tools/net/ynl/pyynl/ynl_gen_c.py --spec \
Documentation/netlink/specs/drm_ras.yaml \
--mode kernel --source -o drivers/gpu/drm/drm_ras_nl.c
Cc: Zack McKevitt <zachary.mckevitt@oss.qualcomm.com>
Cc: Lijo Lazar <lijo.lazar@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: netdev@vger.kernel.org
Co-developed-by: Aravind Iddamsetty <aravind.iddamsetty@linux.intel.com>
Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@linux.intel.com>
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Zack McKevitt <zachary.mckevitt@oss.qualcomm.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patch.msgid.link/20260304074412.464435-8-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
update the type for num_syncobj_handles from __u32 to _u16 with
required padding.
This breaks the UAPI for big-endian platforms but this is deliberate
and harmless since userqueues is still a beta feature. It is enabled
via module parameter and need the right fw support to work.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
update the type for num_syncobj_handles from __u64 to _u16 with
required padding.
This breaks the UAPI for big-endian platforms but this is deliberate
and harmless since userqueues is still a beta feature. It is enabled
via module parameter and need the right fw support to work.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm-misc-next for v7.1:
UAPI Changes:
connector:
- Add panel_type property
fourcc:
- Add ARM interleaved 64k modifier
nouveau:
- Query Z-Cull info with DRM_IOCTL_NOUVEAU_GET_ZCULL_INFO
Cross-subsystem Changes:
coreboot:
- Clean up coreboot framebuffer support
dma-buf:
- Provide revoke mechanism for shared buffers
- Rename move_notify callback to invalidate_mappings and update users.
- Always enable move_notify
- Support dma_fence_was_initialized() test
- Protect dma_fence_ops by RCU and improve locking
- Fix sparse warnings
Core Changes:
atomic:
- Allocate drm_private_state via callback and convert drivers
atomic-helper:
- Use system_percpu_wq
buddy:
- Make buddy allocator available to all DRM drivers
- Document flags and structures
colorop:
- Add destroy helper and convert drivers
fbdev-emulation:
- Clean up
gem:
- Fix drm_gem_objects_lookup() error cleanup
Driver Changes:
amdgpu:
- Set panel_type to OELD for eDP
atmel-hlcdc:
- Support sana5d65 LCD controller
bridge:
- anx7625: Support USB-C plus DT bindings
- connector: Fix EDID detection
- dw-hdmi-qp: Support Vendor-Specfic and SDP Infoframes; improve others
- fsl-ldb: Fix visual artifacts plus related DT property 'enable-termination-resistor'
- imx8qxp-pixel-link: Improve bridge reference handling
- lt9611: Support Port-B-only input plus DT bindings
- tda998x: Support DRM_BRIDGE_ATTACH_NO_CONNECTOR; Clean up
- Support TH1520 HDMI plus DT bindings
- Clean up
imagination:
- Clean up
komeda:
- Fix integer overflow in AFBC checks
mcde:
- Improve bridge handling
nouveau:
- Provide Z-cull info to user space
- gsp: Support GA100
- Shutdown on PCI device shutdown
- Clean up
panel:
- panel-jdi-lt070me05000: Use mipi-dsi multi functions
- panel-edp: Support Add AUO B116XAT04.1 (HW: 1A); Support CMN N116BCL-EAK (C2); Support FriendlyELEC plus DT changes
- Fix Kconfig dependencies
panthor:
- Add tracepoints for power and IRQs
rcar-du:
- dsi: fix VCLK calculation
rockchip:
- vop2: Use drm_ logging functions
- Support DisplayPort on RK3576
sysfb:
- corebootdrm: Support system framebuffer on coreboot firmware; detect orientation
- Clean up pixel-format lookup
sun4i:
- Clean up
tilcdc:
- Use DT bindings scheme
- Use managed DRM interfaces
- Support DRM_BRIDGE_ATTACH_NO_CONNECTOR
- Clean up a lot of obsolete code
v3d:
- Clean up
vc4:
- Use system_percpu_wq
- Clean up
verisilicon:
- Support DC8200 plus DT bindings
virtgpu:
- Support PRIME imports with enabled 3D
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patch.msgid.link/20260226143615.GA47200@linux.fritz.box
Some compute applications may try to allocate device memory to probe
how much device memory is actually available, assuming that the
application will be the only one running on the particular GPU.
That strategy fails in fault mode since it allows VM overcommit.
While this could be resolved in user-space it's further complicated
by cgroups potentially restricting the amount of memory available
to the application.
Introduce a vm create flag, DRM_XE_VM_CREATE_NO_VM_OVERCOMMIT, that
allows fault mode to mimic the behaviour of !fault mode WRT this. It
blocks evicting same vm bos during VM_BIND processing. However,
it does *not* block evicting same-vm bos during pagefault
processing, preferring eviction rather than VM banning in
OOM situations.
Cc: John Falkowski <john.falkowski@intel.com>
Cc: Michal Mrozek <michal.mrozek@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260204153320.17989-1-thomas.hellstrom@linux.intel.com
Add kernel-side support for using the zcull hardware in nvidia gpus.
zcull aims to improve memory bandwidth by using an early approximate
depth test, similar to hierarchical Z on an AMD card.
Add a new ioctl that exposes zcull information that has been read
from the hardware. Userspace uses each of these parameters either
in a heuristic for determining zcull region parameters or in the
calculation of a buffer size.
It appears the hardware hasn't changed its structure for these
values since FERMI_C (circa 2011), so the assumption is that it
won't change on us too quickly, and is therefore reasonable to
include in UAPI.
This bypasses the nvif layer and instead accesses nvkm_gr directly,
which mirrors existing usage of nvkm_gr_units(). There is no nvif
object for nvkm_gr yet, and adding one is not trivial.
Signed-off-by: Mel Henning <mhenning@darkrefraction.com>
Link: https://patch.msgid.link/20260219-zcull3-v3-2-dbe6a716f104@darkrefraction.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
This modifier is primarily intended to be used by panvk to implement
sparse partially-resident images with better map and unmap
performance, and no worse access performance, compared to
implementing them in terms of U-interleaved.
With this modifier, the plane is divided into 64k byte 1:1 or 2:1
-sided tiles. The 64k tiles are laid out linearly. Each 64k tile
is divided into blocks of 16x16 texel blocks each, which themselves
are laid out linearly within a 64k tile. Then within each such
16x16 block, texel blocks are laid out according to U order,
similar to 16X16_BLOCK_U_INTERLEAVED.
Unlike 16X16_BLOCK_U_INTERLEAVED, the layout does not depend on
whether a format is compressed or not.
The hardware features corresponding to this modifier are available
starting with v10 (second gen Valhall.)
The corresponding panvk MR can be found at:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38986
Previous version:
https://lists.freedesktop.org/archives/dri-devel/2026-January/547072.html
No changes since v2
Changes since v1:
* Rewrite the description of the modifier to be hopefully unambiguous.
Signed-off-by: Caterina Shablia <caterina.shablia@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Link: https://patch.msgid.link/20260128184058.807213-1-caterina.shablia@collabora.com
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
UAPI Changes:
- Remove unused KEEP_ACTIVE flag in the new multi queue uAPI (Niranjana)
- Expose new temperature attributes in HWMON (Karthik)
Driver Changes:
- Force i2c into polling mode when in survivability (Raag)
- Validate preferred system memory placement in xe_svm_range_validate (Brost)
- Adjust page count tracepoints in shrinker (Brost)
- Fix a couple drm_pagemap issues with multi-GPU (Brost)
- Define GuC firmware for NVL-S (Roper)
- Handle GT resume failure (Raag)
- Improve wedged mode handling (Lukasz)
- Add missing newlines to drm_warn messages (Osama)
- Fix WQ_MEM_RECLAIM passed as max_active to alloc_workqueue (Marco)
- Page-reclaim fixes and PRL stats addition (Brian)
- Fix struct guc_lfd_file_header kernel-doc (Jani)
- Allow compressible surfaces to be 1-way coherent (Xin)
- Fix DRM scheduler layering violations in Xe (Brost)
- Minor improvements to MERT code (Michal)
- Privatize struct xe_ggtt_node (Maarten)
- Convert wait for lmem init into an assert (Bala)
- Enable GSC loading and PXP for PTL (Daniele)
- Replace use of system_wq with tlb_inval->timeout_wq (Marco)
- VRAM addr range bit expansion (Fei)
- Cleanup unused header includes (Roper)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/aWkSxRQK7VhTlP32@intel.com
"AMDGPU_GEM_DOMAIN_MMIO_REMAP" - Never activated as UAPI and it turned
out that this was to inflexible.
Allocate the MMIO_REMAP buffer object as a regular GEM BO and explicitly
move it into the fixed AMDGPU_PL_MMIO_REMAP placement at the TTM level.
This avoids relying on GEM domain bits for MMIO_REMAP, keeps the
placement purely internal, and makes the lifetime and pinning of the
global MMIO_REMAP BO explicit. The BO is pinned in TTM so it cannot be
migrated or evicted.
The corresponding free path relies on normal DRM teardown ordering,
where no further user ioctls can access the global BO once TTM teardown
begins.
v2 (Srini):
- Updated patch title.
- Drop use of AMDGPU_GEM_DOMAIN_MMIO_REMAP in amdgpu_ttm.c. The
MMIO_REMAP domain bit is removed from UAPI, so keep the MMIO_REMAP BO
allocation domain-less (bp.domain = 0) and rely on the TTM placement
(AMDGPU_PL_MMIO_REMAP) for backing/pinning.
- Keep fdinfo/mem-stats visibility for MMIO_REMAP by classifying BOs
based on bo->tbo.resource->mem_type == AMDGPU_PL_MMIO_REMAP, since the
domain bit is removed.
v3: Squash patches #1 & #3
Fixes: 0561324837 ("drm/amdgpu/uapi: Introduce AMDGPU_GEM_DOMAIN_MMIO_REMAP")
Fixes: 2a7a794eb8 ("drm/amdgpu/ttm: Allocate/Free 4K MMIO_REMAP Singleton")
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Leo Liu <leo.liu@amd.com>
Cc: Ruijing Dong <ruijing.dong@amd.com>
Cc: David (Ming Qiang) Wu <David.Wu3@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fix all kernel-doc warnings in rocket_accel.h:
Warning: include/uapi/drm/rocket_accel.h:35 Incorrect use of kernel-doc
format: * Output: DMA address for the BO in the NPU address space.
This address
and 22 warnings like these:
Warning: include/uapi/drm/rocket_accel.h:43 struct member 'size'
not described in 'drm_rocket_create_bo'
Warning: include/uapi/drm/rocket_accel.h:60 struct member 'handle'
not described in 'drm_rocket_prep_bo'
Warning: include/uapi/drm/rocket_accel.h:73 struct member 'handle'
not described in 'drm_rocket_fini_bo'
Warning: include/uapi/drm/rocket_accel.h:86 struct member 'regcmd'
not described in 'drm_rocket_task'
Warning: include/uapi/drm/rocket_accel.h:116 struct member 'tasks'
not described in 'drm_rocket_job'
Warning: include/uapi/drm/rocket_accel.h:135 struct member 'jobs'
not described in 'drm_rocket_submit'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
Link: https://patch.msgid.link/20251023062440.4093661-1-rdunlap@infradead.org
Use device file descriptors and regions to represent pagemaps on
foreign or local devices.
The underlying files are type-checked at madvise time, and
references are kept on the drm_pagemap as long as there is are
madvises pointing to it.
Extend the madvise preferred_location UAPI to support the region
instance to identify the foreign placement.
v2:
- Improve UAPI documentation. (Matt Brost)
- Sanitize preferred_mem_loc.region_instance madvise. (Matt Brost)
- Clarify madvise drm_pagemap vs xe_pagemap refcounting. (Matt Brost)
- Don't allow a foreign drm_pagemap madvise without a fast
interconnect.
v3:
- Add a comment about reference-counting in xe_devmem_open() and
remove the reference-count get-and-put. (Matt Brost)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251219113320.183860-16-thomas.hellstrom@linux.intel.com
The exec and vm_bind ioctl allow userspace to specify an arbitrary
num_syncs value. Without bounds checking, a very large num_syncs
can force an excessively large allocation, leading to kernel warnings
from the page allocator as below.
Introduce DRM_XE_MAX_SYNCS (set to 1024) and reject any request
exceeding this limit.
"
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1217 at mm/page_alloc.c:5124 __alloc_frozen_pages_noprof+0x2f8/0x2180 mm/page_alloc.c:5124
...
Call Trace:
<TASK>
alloc_pages_mpol+0xe4/0x330 mm/mempolicy.c:2416
___kmalloc_large_node+0xd8/0x110 mm/slub.c:4317
__kmalloc_large_node_noprof+0x18/0xe0 mm/slub.c:4348
__do_kmalloc_node mm/slub.c:4364 [inline]
__kmalloc_noprof+0x3d4/0x4b0 mm/slub.c:4388
kmalloc_noprof include/linux/slab.h:909 [inline]
kmalloc_array_noprof include/linux/slab.h:948 [inline]
xe_exec_ioctl+0xa47/0x1e70 drivers/gpu/drm/xe/xe_exec.c:158
drm_ioctl_kernel+0x1f1/0x3e0 drivers/gpu/drm/drm_ioctl.c:797
drm_ioctl+0x5e7/0xc50 drivers/gpu/drm/drm_ioctl.c:894
xe_drm_ioctl+0x10b/0x170 drivers/gpu/drm/xe/xe_device.c:224
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:598 [inline]
__se_sys_ioctl fs/ioctl.c:584 [inline]
__x64_sys_ioctl+0x18b/0x210 fs/ioctl.c:584
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xbb/0x380 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
...
"
v2: Add "Reported-by" and Cc stable kernels.
v3: Change XE_MAX_SYNCS from 64 to 1024. (Matt & Ashutosh)
v4: s/XE_MAX_SYNCS/DRM_XE_MAX_SYNCS/ (Matt)
v5: Do the check at the top of the exec func. (Matt)
Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Reported-by: Koen Koning <koen.koning@intel.com>
Reported-by: Peter Senna Tschudin <peter.senna@linux.intel.com>
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6450
Cc: <stable@vger.kernel.org> # v6.12+
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Michal Mrozek <michal.mrozek@intel.com>
Cc: Carl Zhang <carl.zhang@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Ivan Briano <ivan.briano@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251205234715.2476561-5-shuicheng.lin@intel.com
(cherry picked from commit b07bac9bd7)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
The exec and vm_bind ioctl allow userspace to specify an arbitrary
num_syncs value. Without bounds checking, a very large num_syncs
can force an excessively large allocation, leading to kernel warnings
from the page allocator as below.
Introduce DRM_XE_MAX_SYNCS (set to 1024) and reject any request
exceeding this limit.
"
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1217 at mm/page_alloc.c:5124 __alloc_frozen_pages_noprof+0x2f8/0x2180 mm/page_alloc.c:5124
...
Call Trace:
<TASK>
alloc_pages_mpol+0xe4/0x330 mm/mempolicy.c:2416
___kmalloc_large_node+0xd8/0x110 mm/slub.c:4317
__kmalloc_large_node_noprof+0x18/0xe0 mm/slub.c:4348
__do_kmalloc_node mm/slub.c:4364 [inline]
__kmalloc_noprof+0x3d4/0x4b0 mm/slub.c:4388
kmalloc_noprof include/linux/slab.h:909 [inline]
kmalloc_array_noprof include/linux/slab.h:948 [inline]
xe_exec_ioctl+0xa47/0x1e70 drivers/gpu/drm/xe/xe_exec.c:158
drm_ioctl_kernel+0x1f1/0x3e0 drivers/gpu/drm/drm_ioctl.c:797
drm_ioctl+0x5e7/0xc50 drivers/gpu/drm/drm_ioctl.c:894
xe_drm_ioctl+0x10b/0x170 drivers/gpu/drm/xe/xe_device.c:224
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:598 [inline]
__se_sys_ioctl fs/ioctl.c:584 [inline]
__x64_sys_ioctl+0x18b/0x210 fs/ioctl.c:584
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xbb/0x380 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
...
"
v2: Add "Reported-by" and Cc stable kernels.
v3: Change XE_MAX_SYNCS from 64 to 1024. (Matt & Ashutosh)
v4: s/XE_MAX_SYNCS/DRM_XE_MAX_SYNCS/ (Matt)
v5: Do the check at the top of the exec func. (Matt)
Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Reported-by: Koen Koning <koen.koning@intel.com>
Reported-by: Peter Senna Tschudin <peter.senna@linux.intel.com>
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6450
Cc: <stable@vger.kernel.org> # v6.12+
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Michal Mrozek <michal.mrozek@intel.com>
Cc: Carl Zhang <carl.zhang@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Ivan Briano <ivan.briano@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251205234715.2476561-5-shuicheng.lin@intel.com
Add support for queues of a multi queue group to set
their priority within the queue group by adding property
DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY.
This is the only other property supported by secondary
queues of a multi queue group, other than
DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE.
v2: Add kernel doc for enum xe_multi_queue_priority,
Add assert for priority values, fix includes and
declarations (Matt Brost)
v3: update uapi kernel-doc (Matt Brost)
v4: uapi change due to rebase
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251211010249.1647839-23-niranjana.vishwanathapura@intel.com
Multi Queue is a new mode of execution supported by the compute and
blitter copy command streamers (CCS and BCS, respectively). It is an
enhancement of the existing hardware architecture and leverages the
same submission model. It enables support for efficient, parallel
execution of multiple queues within a single context. All the queues
of a group must use the same address space (VM).
The new DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE execution queue
property supports creating a multi queue group and adding queues to
a queue group. All queues of a multi queue group share the same
context.
A exec queue create ioctl call with above property specified with value
DRM_XE_SUPER_GROUP_CREATE will create a new multi queue group with the
queue being created as the primary queue (aka q0) of the group. To add
secondary queues to the group, they need to be created with the above
property with id of the primary queue as the value. The properties of
the primary queue (like priority, timeslice) applies to the whole group.
So, these properties can't be set for secondary queues of a group.
Once destroyed, the secondary queues of a multi queue group can't be
replaced. However, they can be dynamically added to the group up to a
total of 64 queues per group. Once the primary queue is destroyed,
secondary queues can't be added to the queue group.
v2: Remove group->lock, fix xe_exec_queue_group_add()/delete()
function semantics, add additional comments, remove unused
group->list_lock, add XE_BO_FLAG_GGTT_INVALIDATE for cgp bo,
Assert LRC is valid, update uapi kernel doc.
(Matt Brost)
v3: Use XE_BO_FLAG_PINNED_LATE_RESTORE/USER_VRAM/GGTT_INVALIDATE
flags for cgp bo (Matt)
v4: Ensure queue is not a vm_bind queue
uapi change due to rebase
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251211010249.1647839-21-niranjana.vishwanathapura@intel.com
Will be used by the UMD to optimize CPU accesses to buffers
that are frequently read by the CPU, or on which the access
pattern makes non-cacheable mappings inefficient.
Mapping buffers CPU-cached implies taking care of the CPU
cache maintenance in the UMD, unless the GPU is IO coherent.
v2:
- Add more to the commit message
v3:
- No changes
v4:
- Fix the map_wc test in panfrost_ioctl_query_bo_info()
v5:
- Drop Steve's R-b (enough has changed to justify a new review)
v6:
- Collect R-b
v7:
- No changes
v8:
- Fix double drm_gem_object_funcs::export assignment
Signed-off-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Link: https://patch.msgid.link/20251208100841.730527-13-boris.brezillon@collabora.com
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>