Dumb buffers were not being freed because the GEM reference that was
acquired in gb_surface_define was not dropped like it is in the 2D case.
Dropping this ref uncovered a few additional issues with freeing the
resources associated with dirty tracking in vmw_bo_free/release.
Additionally the TTM object associated with the surface were also leaking
which meant that when the ttm_object_file was closed at process exit the
destructor unreferenced an already destroyed surface.
The solution is to remove the destructor from the vmw_user_surface
associated with the dumb_buffer and immediately unreferencing the TTM
object which his removes it from the ttm_object_file.
This also allows the early return in vmw_user_surface_base_release for the
dumb buffer case to be removed as it should no longer occur.
The chain of references now has the GEM handle(s) owning the dumb buffer.
The GEM handles have a singular GEM reference to the vmw_bo which is
dropped when all handles are closed. When the GEM reference count hits
zero the vmw_bo is freed which then unreferences the surface via
vmw_resource_release in vmw_bo_release.
Fixes: d6667f0ddf ("drm/vmwgfx: Fix handling of dumb buffers")
Signed-off-by: Ian Forbes <ian.forbes@broadcom.com>
Reviewed-by: Zack Rusin <zack.rusin@broadcom.com>
Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250123204424.836896-1-ian.forbes@broadcom.com
crtc->mode is legacy junk and shouldn't really be used with
atomic drivers.
Most (all?) atomic drivers do end up still calling
drm_atomic_helper_update_legacy_modeset_state() at some
point, so crtc->mode does still get populated, and this
does work for now. But now that the modes[] lifetime issues
have been sorted out we can just switch over to the
proper crtc->state->mode.
v2: Rebase
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250228211454.8138-6-ville.syrjala@linux.intel.com
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
drm_client_firmware_config() is currently picking up the current
mode of the crtc via the legacy crtc->mode, which is not supposed
to be used by atomic drivers at all. We can't simply switch over
to the proper crtc->state->mode because we drop the crtc->mutex
(which protects crtc->state) before the mode gets used.
The most straightforward solution to extend the lifetime of
modes[] seem to be to make full copies of the modes.
And with this we can undo also commit 3eadd887db
("drm/client:Fully protect modes[] with dev->mode_config.mutex")
as the lifetime of modes[] no longer has anything to do with
that lock.
v2: Don't try to copy NULL modes
v3: Keep storing pointers and use drm_mode_{duplicate,destroy}()
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250228211454.8138-5-ville.syrjala@linux.intel.com
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
The new audio code fails to build when sounds support is in a loadable
module but the GPU driver is built-in:
x86_64-linux-ld: zynqmp_dp_audio.c:(.text+0x6a8): undefined reference to `devm_snd_soc_register_card'
x86_64-linux-ld: drivers/gpu/drm/xlnx/zynqmp_dp_audio.o:(.rodata+0x1bc): undefined reference to `snd_soc_info_volsw'
x86_64-linux-ld: drivers/gpu/drm/xlnx/zynqmp_dp_audio.o:(.rodata+0x1f0): undefined reference to `snd_soc_get_volsw'
x86_64-linux-ld: drivers/gpu/drm/xlnx/zynqmp_dp_audio.o:(.rodata+0x1f4): undefined reference to `snd_soc_put_volsw'
Change the Kconfig dependency to disallow the sound support in this
configuration.
Fixes: 3ec5c15793 ("drm: xlnx: zynqmp_dpsub: Add DP audio support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250227132036.1136600-1-arnd@kernel.org
This reverts commit 44d2f310f0.
The function drm_sched_job_arm() is indeed the point of no return. The
background is that it is nearly impossible for the driver to correctly
retract the fence and signal it in the order enforced by the dma_fence
framework.
The code in drm_sched_job_cleanup() is for the purpose to cleanup after
the job was armed through drm_sched_job_arm() *and* processed by the
scheduler.
We can certainly improve the documentation, but removing the warning is
clearly not a good idea.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Philipp Stanner <phasta@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250312134400.2176393-1-christian.koenig@amd.com
Importing dma-bufs via PRIME requires a DMA-capable device. Devices on
peripheral busses, such as USB, often cannot perform DMA by themselves.
Without DMA-capable device PRIME import fails. DRM drivers for USB
devices already use a separate DMA device for dma-buf imports. Make the
mechanism generally available.
Besides the case of USB, there are embedded DRM devices without DMA
capability. DMA is performed by a separate controller. DRM drivers should
set this accordingly.
Add the field dma_dev to struct drm_device to refer to the device's DMA
device. For USB this should be the USB controller. Use dma_dev in the
PRIME import helpers, if set.
v2:
- acquire internal reference on dma_dev (Jani)
- add DMA-controller usecase to docs (Maxime)
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Maxime Ripard <mripard@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250307080836.42848-2-tzimmermann@suse.de
We have enabled PROVE_LOCKING (which enables LOCKDEP) in drm-ci.
This will output warnings when kernel locking errors are encountered
and will continue executing tests. To detect if lockdep has been
triggered, check the debug_locks value in /proc/lockdep_stats after
the tests have run. When debug_locks is 0, it indicates that lockdep
has detected issues and turned itself off. Check this value, and if
lockdep is detected, exit with an error and configure it as a warning
in GitLab CI.
GitLab CI ignores exit codes other than 1 by default. Pass the correct
exit code with variable FF_USE_NEW_BASH_EVAL_STRATEGY set to true or
exit on failure.
Also update the documentation.
Acked-by: Helen Koike <helen.fornazier@gmail.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250217053719.442644-4-vignesh.raman@collabora.com
Signed-off-by: Vignesh Raman <vignesh.raman@collabora.com>
The xa_store() may fail due to memory allocation failure because there
is no guarantee that the index is already used. This fix introduces new
paths to handle the error.
This patch also aligns the order of function calls by calling
vmw_bo_add_detached_resource() before ttm_prime_object_init() in order
to allow consistent error handling.
Fixes: d6667f0ddf ("drm/vmwgfx: Fix handling of dumb buffers")
Signed-off-by: Keisuke Nishimura <keisuke.nishimura@inria.fr>
Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250225145223.34773-1-keisuke.nishimura@inria.fr
Because sync_files are passive waiters they do not participate in
the processing of fences like the traditional vmw_fence_wait IOCTL.
If userspace exclusively uses sync_files for synchronization then
nothing in the kernel actually processes fence updates as interrupts
for fences are masked and ignored if the kernel does not indicate to the
SVGA device that there are active waiters.
This oversight results in a bug where the entire GUI can freeze waiting
on a sync_file that will never be signalled as we've masked the interrupts
to signal its completion. This bug is incredibly racy as any process which
interacts with the fencing code via the 3D stack can process the stuck
fences on behalf of the stuck process causing it to run again. Even a
simple app like eglinfo is enough to resume the stuck process. Usually
this bug is seen at a login screen like GDM because there are no other
3D apps running.
By adding a seqno waiter we re-enable interrupt based processing of the
dma_fences associated with the sync_file which is signalled as part of a
dma_fence_callback.
This has likely been broken since it was initially added to the kernel in
2017 but has gone unnoticed until mutter recently started using sync_files
heavily over the course of 2024 as part of their explicit sync support.
Fixes: c906965dee ("drm/vmwgfx: Add export fence to file descriptor support")
Signed-off-by: Ian Forbes <ian.forbes@broadcom.com>
Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250228200633.642417-1-ian.forbes@broadcom.com
Refactor cursor handling to make the code maintainable again. Over the
last 12 years the svga device improved support for virtualized cursors
and at the same time the drm interfaces evolved quite a bit from
pre-atomic to current atomic ones. vmwgfx only added new code over
the years, instead of adjusting/refactoring the paths.
Export the cursor plane handling to its own file. Remove special
handling of the legacy cursor support to make it fit within the global
cursor plane mechanism.
Finally redo dirty tracking because memcmp never worked correctly
resulting in the cursor not being properly updated in the guest.
Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Reviewed-by: Maaz Mombasawala <maaz.mombasawala@broadcom.com>
Reviewed-by: Martin Krastev <martin.krastev@broadcom.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250307125836.3877138-2-zack.rusin@broadcom.com
Some GSP RPC commands need a new reply policy: "caller don't care about
the message content but want to make sure a reply is received". To
support this case, a new reply policy is introduced.
NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY is a large GSP RPC command. The actual
required policy is NVKM_GSP_RPC_REPLY_POLL. This can be observed from the
dump of the GSP message queue. After the large GSP RPC command is issued,
GSP will write only an empty RPC header in the queue as the reply.
Without this change, the policy "receiving the entire message" is used
for NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY. This causes the timeout of receiving
the returned GSP message in the suspend/resume path.
Introduce the new reply policy NVKM_GSP_RPC_REPLY_POLL, which waits for
the returned GSP message but discards it for the caller. Use the new policy
NVKM_GSP_RPC_REPLY_POLL on the GSP RPC command
NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY.
Fixes: 50f290053d ("drm/nouveau: support handling the return of large GSP message")
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Alexandre Courbot <acourbot@nvidia.com>
Tested-by: Ben Skeggs <bskeggs@nvidia.com>
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250227013554.8269-3-zhiw@nvidia.com
There can be multiple cases of handling the GSP RPC messages, which are
the reply of GSP RPC commands according to the requirement of the
callers and the nature of the GSP RPC commands.
The current supported reply policies are "callers don't care" and "receive
the entire message" according to the requirement of the callers. To
introduce a new policy, factor out the current RPC command reply polices.
Also, centralize the handling of the reply in a single function.
Factor out NVKM_GSP_RPC_REPLY_NOWAIT as "callers don't care" and
NVKM_GSP_RPC_REPLY_RECV as "receive the entire message". Introduce a
kernel doc to document the policies. Factor out
r535_gsp_rpc_handle_reply().
No functional change is intended for small GSP RPC commands. For large GSP
commands, the caller decides the policy of how to handle the returned GSP
RPC message.
Cc: Ben Skeggs <bskeggs@nvidia.com>
Cc: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250227013554.8269-2-zhiw@nvidia.com