There different types of BOs are supported:
- shmem
A user application uses shmem BOs as input/output for its workload running
on NPU.
- device memory heap
The fixed size buffer dedicated to the device.
- device buffer
The buffer object allocated from device memory heap.
- command buffer
The buffer object created for delivering commands. The command buffer
object is small and pinned on creation.
New IOCTLs are added: CREATE_BO, GET_BO_INFO, SYNC_BO. SYNC_BO is used
to explicitly flush CPU cache for BO memory.
Co-developed-by: Min Ma <min.ma@amd.com>
Signed-off-by: Min Ma <min.ma@amd.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241118172942.2014541-7-lizhi.hou@amd.com
The hardware can be shared among multiple user applications. The
hardware resources are allocated/freed based on the request from
user application via driver IOCTLs.
DRM_IOCTL_AMDXDNA_CREATE_HWCTX
Allocate tile columns and create a hardware context structure to track the
usage and status of the resources. A hardware context ID is returned for
XDNA command execution.
DRM_IOCTL_AMDXDNA_DESTROY_HWCTX
Release hardware context based on its ID. The tile columns belong to
this hardware context will be reclaimed.
DRM_IOCTL_AMDXDNA_CONFIG_HWCTX
Config hardware context. Bind the hardware context to the required
resources.
Co-developed-by: Min Ma <min.ma@amd.com>
Signed-off-by: Min Ma <min.ma@amd.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241118172942.2014541-6-lizhi.hou@amd.com
The hardware mailboxes are used by the driver to submit requests to
firmware and receive the completion notices from hardware.
Initially, a management mailbox channel is up and running. The driver may
request firmware to create/destroy more channels dynamically through
management channel.
Add driver internal mailbox interfaces.
- create/destroy a mailbox channel instance
- send a message to the firmware through a specific channel
- wait for a notification from the specific channel
Co-developed-by: George Yang <George.Yang@amd.com>
Signed-off-by: George Yang <George.Yang@amd.com>
Co-developed-by: Min Ma <min.ma@amd.com>
Signed-off-by: Min Ma <min.ma@amd.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241118172942.2014541-4-lizhi.hou@amd.com
Re-introduce a line-by-line composition algorithm for each pixel format.
This allows more performance by not requiring an indirection per pixel
read. This patch is focused on readability of the code.
Line-by-line composition was introduced by [1] but rewritten back to
pixel-by-pixel algorithm in [2]. At this time, nobody noticed the impact
on performance, and it was merged.
This patch is almost a revert of [2], but in addition efforts have been
made to increase readability and maintainability of the rotation handling.
The blend function is now divided in two parts:
- Transformation of coordinates from the output referential to the source
referential
- Line conversion and blending
Most of the complexity of the rotation management is avoided by using
drm_rect_* helpers. The remaining complexity is around the clipping, to
avoid reading/writing outside source/destination buffers.
The pixel conversion is now done line-by-line, so the read_pixel_t was
replaced with read_pixel_line_t callback. This way the indirection is only
required once per line and per plane, instead of once per pixel and per
plane.
The read_line_t callbacks are very similar for most pixel format, but it
is required to avoid performance impact. Some helpers for color
conversion were introduced to avoid code repetition:
- *_to_argb_u16: perform colors conversion. They should be inlined by the
compiler, and they are used to avoid repetition between multiple variants
of the same format (argb/xrgb and maybe in the future for formats like
bgr formats).
This new algorithm was tested with:
- kms_plane (for color conversions)
- kms_rotation_crc (for rotations of planes)
- kms_cursor_crc (for translations of planes)
- kms_rotation (for all rotations and formats combinations) [3]
The performance gain was mesured with kms_fb_stress [4] with some
modification to fix the writeback format.
The performance improvement is around 5 to 10%.
[1]: commit 8ba1648567 ("drm: vkms: Refactor the plane composer to accept
new formats")
https://lore.kernel.org/all/20220905190811.25024-7-igormtorrente@gmail.com/
[2]: commit 322d716a3e ("drm/vkms: isolate pixel conversion
functionality")
https://lore.kernel.org/all/20230418130525.128733-2-mcanal@igalia.com/
[3]: https://lore.kernel.org/igt-dev/20240313-new_rotation-v2-0-6230fd5cae59@bootlin.com/
[4]: https://lore.kernel.org/all/20240422-kms_fb_stress-dev-v5-0-0c577163dc88@riseup.net/
Reviewed-by: José Expósito <jose.exposito89@gmail.com>
Acked-by: Pekka Paalanen <pekka.paalanen@collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241118-yuv-v14-8-2dbc2f1e222c@bootlin.com
Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
The pixel_read_direction enum is useful to describe the reading direction
in a plane. It avoids using the rotation property of DRM, which not
practical to know the direction of reading.
This patch also introduce two helpers, one to compute the
pixel_read_direction from the DRM rotation property, and one to compute
the step, in byte, between two successive pixel in a specific direction.
Acked-by: Maíra Canal <mairacanal@riseup.net>
Reviewed-by: José Expósito <jose.exposito89@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241118-yuv-v14-7-2dbc2f1e222c@bootlin.com
Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
Introduce the usage of block_h/block_w to compute the offset and the
pointer of a pixel. The previous implementation was specialized for
planes with block_h == block_w == 1. To avoid confusion and allow easier
implementation of tiled formats. It also remove the usage of the
deprecated format field `cpp`.
Introduce the plane_index parameter to get an offset/pointer on a
different plane.
Acked-by: Maíra Canal <mairacanal@riseup.net>
Reviewed-by: José Expósito <jose.exposito89@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241118-yuv-v14-5-2dbc2f1e222c@bootlin.com
Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
Introduce two typedefs: pixel_read_t and pixel_write_t. It allows the
compiler to check if the passed functions take the correct arguments.
Such typedefs will help ensuring consistency across the code base in
case of update of these prototypes.
Rename input/output variable in a consistent way between read_line and
write_line.
A warn has been added in get_pixel_*_function to alert when an unsupported
pixel format is requested. As those formats are checked before
atomic_update callbacks, it should never happen.
Document for those typedefs.
Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.com>
Reviewed-by: José Expósito <jose.exposito89@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241118-yuv-v14-3-2dbc2f1e222c@bootlin.com
Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
If the active performance monitor (`v3d->active_perfmon`) is being
destroyed, stop it first. Currently, the active perfmon is not
stopped during destruction, leaving the `v3d->active_perfmon` pointer
stale. This can lead to undefined behavior and instability.
This patch ensures that the active perfmon is stopped before being
destroyed, aligning with the behavior introduced in commit
7d1fd3638e ("drm/v3d: Stop the active perfmon before being destroyed").
Cc: stable@vger.kernel.org # v5.15+
Fixes: 26a4dc29b7 ("drm/v3d: Expose performance counters to userspace")
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241118221948.1758130-1-christian.gmeiner@gmail.com
VKMS currently supports only one CRTC, so it make no sense to have this
index configurable. To avoid issues, replace this hardcoded index by
drm_crtc_mask when applicable.
There is no need to manually set a crtc mask on primary and cursor plane
as it is automatically set by drmm_crtc_alloc_with_planes.
In addition, this will remove the use of an uninitialized structure in
vkms_add_overlay_plane. This currently works by chance because two things:
- vkms_plane_init always set a possible_crtcs value, so the problematic
branch is never used;
- drm_crtc_mask on an kzalloc'd drm_crtc returns BIT(0), and the VKMS CRTC
always have this id.
Reviewed-by: José Expósito <jose.exposito89@gmail.com>
Reviewed-by: Maíra Canal <mcanal@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241119-vkms-remove-index-v3-1-976321a3f801@bootlin.com
Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
Stop checking the FW halt_status as MCU_STATUS should be sufficient.
This should make the check for successful FW halt and subsequently
setting fast_reset to true more robust.
We should also clear GLB_REQ.GLB_HALT bit only on post-reset prior
to starting the FW and only if we're doing a fast reset, because
the slow reset will re-initialize all FW sections, including the
global interface.
Signed-off-by: Karunika Choo <karunika.choo@arm.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Link: https://lore.kernel.org/r/20241119135030.3352939-1-karunika.choo@arm.com
Signed-off-by: Liviu Dudau <liviu.dudau@arm.com>
This commit fixes the potential misalignment between the value of device
tree property "dma-coherent" and default value of COHERENCY_ENABLE
register.
Panthor driver didn't explicitly program the COHERENCY_ENABLE register
with the desired coherency mode. The default value of COHERENCY_ENABLE
register is implementation defined, so it may not be always aligned with
the "dma-coherent" property value.
The commit also checks the COHERENCY_FEATURES register to confirm that
the coherency protocol is actually supported or not.
v2:
- Added R-b tags
Signed-off-by: Akash Goel <akash.goel@arm.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Link: https://lore.kernel.org/r/20241030225407.4077513-3-akash.goel@arm.com
Signed-off-by: Liviu Dudau <liviu.dudau@arm.com>
Mali GPU Arch spec forbids the GPU PTEs to indicate Inner or Outer
shareability when no_coherency protocol is selected. Doing so results in
unexpected or undesired snooping of the CPU caches on some platforms,
such as Juno FPGA, causing functional issues. For example the boot of
MCU firmware fails as GPU ends up reading stale data for the FW memory
pages from the CPU's cache. The FW memory pages are initialized with
uncached mapping when the device is not reported to be dma-coherent.
The shareability bits are set to inner-shareable when IOMMU_CACHE flag
is passed to map_pages() callback and IOMMU_CACHE flag is passed by
Panthor driver when memory needs to be mapped as cached on the GPU side.
IOMMU_CACHE seems to imply cache coherent and is probably not fit for
purpose for the memory that is mapped as cached on GPU side but doesn't
need to remain coherent with the CPU.
This commit updates the programming of MEMATTR register to use
MIDGARD_INNER instead of CPU_INNER when coherency is disabled. That way
the inner-shareability specified in the GPU PTEs would map to Mali's
internal-shareable mode, which is always supported by the GPU regardless
of the coherency protocal and is required by the Userspace driver to
ensure coherency between the shader cores.
v2:
- Added R-b tags
Signed-off-by: Akash Goel <akash.goel@arm.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Link: https://lore.kernel.org/r/20241030225407.4077513-2-akash.goel@arm.com
Signed-off-by: Liviu Dudau <liviu.dudau@arm.com>
Having a fence linked to a virtio_gpu_framebuffer in the plane update
sequence would cause conflict when several planes referencing the same
framebuffer (e.g. Xorg screen covering multi-displays configured for an
extended mode) and those planes are updated concurrently. So it is needed
to allocate a fence for every plane state instead of the framebuffer.
Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
[dmitry.osipenko@collabora.com: rebase, fix up, edit commit message]
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241020230803.247419-2-dmitry.osipenko@collabora.com
Use drm_gem_plane_helper_prepare_fb() helper for explicit framebuffer
synchronization. We need to wait for explicit fences in a case of
Venus and native contexts when guest user space uses explicit fencing.
Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
[dmitry.osipenko@collabora.com: edit commit message]
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241020230803.247419-1-dmitry.osipenko@collabora.com
Currently, virtio uses its own dumb_map_offset implementation,
virtio_gpu_mode_dumb_mmap. It works similarly to generic implementation,
drm_gem_dumb_map_offset, and using the generic implementation is
preferable (and making drivers to do so is a task stated on the DRM
subsystem's TODO list).
Thus, make driver use the generic implementation. This includes
VIRTGPU_MAP ioctl so it cannot be used to circumvent rules imposed by
drm_gem_dumb_map_offset (imported objects cannot be mapped).
Signed-off-by: Peter Shkenev <mustela@erminea.space>
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
[dmitry.osipenko@collabora.com: cosmetic code improvements]
Link: https://patchwork.freedesktop.org/patch/msgid/20241107141133.13624-1-mustela@erminea.space
When the new register addresses were introduced for V3D 7.x, we added
new masks for performance counter sources on V3D 7.x. Nevertheless,
we never apply these new masks when setting the sources.
Fix the performance counter source settings on V3D 7.x by introducing
a new macro, `V3D_SET_FIELD_VER`, which allows fields setting to vary
by version. Using this macro, we can provide different values for
source mask based on the V3D version, ensuring that sources are
correctly configure on V3D 7.x.
Fixes: 0ad5bc1ce4 ("drm/v3d: fix up register addresses for V3D 7.x")
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241106121736.5707-1-mcanal@igalia.com
Add support for gamma LUT in VOP2 driver. The implementation was inspired
by one found in VOP1 driver. Blue and red channels in gamma LUT register
write were swapped with respect to how gamma LUT values are written in
VOP1. Gamma LUT port selection was added before the write of new gamma LUT
table.
If the current SoC is rk356x, check if no other CRTC has gamma LUT enabled
in atomic_check (only one video port can use gamma LUT at a time) and
disable gamma LUT before the LUT table write.
If the current SoC isn't rk356x, "seamless" gamma lut update is performed
similarly to how it was done in the case of RK3399 in VOP1[1]. In seamless
update gamma LUT disable before the write isn't necessary, check if no
other CRTC has gamma LUT enabled is also not necessary, different register
is being used to select gamma LUT port[2] and after setting DSP_LUT_EN bit,
GAMMA_UPDATE_EN bit is set[3].
Gamma size is set and drm color management is enabled for each video port's
CRTC except ones which have no associated device.
Patch was tested on RK3566 (Pinetab2). When using userspace tools
which set eg. constant color temperature no issues were noticed. When
using userspace tools which adjust eg. color temperature the slight screen
flicker is visible probably because of gamma LUT disable needed in the
case of RK356x before gamma LUT write.
Compare behaviour of eg.:
```
gammastep -O 3000
```
To eg.:
```
gammastep -l 53:23 -t 6000:3000
```
In latter case color temperature is slowly adjusted at the beginning which
causes screen to slighly flicker. Then it adjusts every few seconds which
also causes slight screen flicker.
[1] https://lists.infradead.org/pipermail/linux-rockchip/2021-October/028132.html
[2] https://lore.kernel.org/linux-rockchip/48249708-8c05-40d2-a5d8-23de960c5a77@rock-chips.com/
[3] https://github.com/radxa/kernel/blob/linux-6.1-stan-rkr1/drivers/gpu/drm/rockchip/rockchip_drm_vop2.c#L3437
Helped-by: Daniel Stone <daniel@fooishbar.org>
Helped-by: Dragan Simic <dsimic@manjaro.org>
Helped-by: Diederik de Haas <didi.debian@cknow.org>
Helped-by: Andy Yan <andy.yan@rock-chips.com>
Signed-off-by: Piotr Zalewski <pZ010001011111@proton.me>
Reviewed-by: Andy Yan <andyshrk@163.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20241101185545.559090-3-pZ010001011111@proton.me
If jobs are still enqueued in struct drm_gpu_scheduler.pending_list
when drm_sched_fini() gets called, those jobs will be leaked since that
function stops both job-submission and (automatic) job-cleanup. It is,
thus, up to the driver to take care of preventing leaks.
The related function drm_sched_wqueue_stop() also prevents automatic job
cleanup.
Those pitfals are not reflected in the documentation, currently.
Explicitly inform about the leak problem in the docstring of
drm_sched_fini().
Additionally, detail the purpose of drm_sched_wqueue_{start,stop} and
hint at the consequences for automatic cleanup.
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241105143137.71893-2-pstanner@redhat.com