Currently, wait_event_timeout() in drm_wait_one_vblank() uses a 100ms
timeout. Under heavy scheduling pressure or rare delayed vblank
handling, this can trigger WARNs unnecessarily.
Increase the timeout to 1000ms to reduce spurious WARNs, while still
catching genuine issues.
Reported-by: syzbot+147ba789658184f0ce04@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=147ba789658184f0ce04
Tested-by: syzbot+147ba789658184f0ce04@syzkaller.appspotmail.com
Signed-off-by: Chintan Patel <chintanlike@gmail.com>
v2:
- Dropped unnecessary in-code comment (suggested by Thomas Zimmermann)
- Removed else branch, only log timeout case
v3:
- Replaced drm_dbg_kms()/manual logging with drm_err() (suggested by Ville Syrjälä)
- Removed unnecessary curr = drm_vblank_count() (suggested by Thomas Zimmermann)
- Fixed commit message wording ("invalid userspace calls" → "delayed vblank handling")
v4:
- Keep the original drm_WARN() to catch genuine kernel issues
- Increased timeout from 100ms → 1000ms to reduce spurious WARNs (suggested by Thomas Zimmermann)
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patch.msgid.link/20251028034337.6341-1-chintanlike@gmail.com
In the general workqueue implementation, if a user enqueues a work item
using schedule_delayed_work() the used wq is "system_wq" (per-cpu wq)
while queue_delayed_work() use WORK_CPU_UNBOUND (used when a cpu is not
specified). The same applies to schedule_work() that is using system_wq
and queue_work(), that makes use again of WORK_CPU_UNBOUND.
This lack of consistency cannot be addressed without refactoring the API.
For more details see the Link tag below.
This continues the effort to refactor worqueue APIs, which has begun
with the change introducing new workqueues and a new alloc_workqueue flag:
commit 128ea9f6cc ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566 ("workqueue: Add new WQ_PERCPU flag")
Use the successor of system_wq, system_percpu_wq, for the scheduler's
default timeout_wq. system_wq will be removed in a few release cycles.
Link: https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Signed-off-by: Philipp Stanner <phasta@kernel.org>
Link: https://patch.msgid.link/20251106150121.256367-1-marco.crivellari@suse.com
Currently, dma_fence_put(job->fence) is called in job notification
callback. However, if a job is canceled, the notification callback is never
invoked, leading to a memory leak. Move dma_fence_put(job->fence)
to the job cleanup function to ensure the fence is always released.
Fixes: aac243092b ("accel/amdxdna: Add command execution")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20251105194140.1004314-1-lizhi.hou@amd.com
Fix all kernel-doc warnings in include/uapi/drm/panfrost_drm.h.
This mostly means modifying existing comments to conform to
kernel-doc format, but there also some additions of missing
kernel-doc comments and changing non-kernel-doc comments to
use "/*" to begin them.
Warning: panfrost_drm.h:83 struct member 'jc' not described
in 'drm_panfrost_submit'
Warning: panfrost_drm.h:83 struct member 'in_syncs' not described
in 'drm_panfrost_submit'
Warning: panfrost_drm.h:83 struct member 'in_sync_count' not described
in 'drm_panfrost_submit'
Warning: panfrost_drm.h:83 struct member 'out_sync' not described
in 'drm_panfrost_submit'
Warning: panfrost_drm.h:83 struct member 'bo_handles' not described
in 'drm_panfrost_submit'
Warning: panfrost_drm.h:83 struct member 'bo_handle_count' not described
in 'drm_panfrost_submit'
Warning: panfrost_drm.h:83 struct member 'requirements' not described
in 'drm_panfrost_submit'
Warning: panfrost_drm.h:83 struct member 'jm_ctx_handle' not described
in 'drm_panfrost_submit'
Warning: panfrost_drm.h:83 struct member 'pad' not described
in 'drm_panfrost_submit'
Warning: panfrost_drm.h:116 Incorrect use of kernel-doc format:
* Returned offset for the BO in the GPU address space. This offset
Warning: panfrost_drm.h:124 struct member 'size' not described
in 'drm_panfrost_create_bo'
Warning: panfrost_drm.h:124 struct member 'flags' not described
in 'drm_panfrost_create_bo'
Warning: panfrost_drm.h:124 struct member 'handle' not described
in 'drm_panfrost_create_bo'
Warning: panfrost_drm.h:124 struct member 'pad' not described
in 'drm_panfrost_create_bo'
Warning: panfrost_drm.h:124 struct member 'nonzero' not described
in 'drm_panfrost_create_bo'
Warning: panfrost_drm.h:143 struct member 'handle' not described
in 'drm_panfrost_mmap_bo'
Warning: panfrost_drm.h:143 struct member 'flags' not described
in 'drm_panfrost_mmap_bo'
Warning: panfrost_drm.h:143 struct member 'offset' not described
in 'drm_panfrost_mmap_bo'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Link: https://patch.msgid.link/20251031054152.1406764-1-rdunlap@infradead.org
The driver checks the firmware version during initialization.If preemption
is supported, the driver configures preemption accordingly and handles
userspace preemption requests. Otherwise, the driver returns an error for
userspace preemption requests.
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20251104185340.897560-1-lizhi.hou@amd.com
Add IOCTL debug bit for logging user provided parameter validation
errors.
Refactor several warning and error messages to better reflect fault
reason. User generated faults should not flood kernel messages with
warnings or errors, so change those to ivpu_dbg(). Add additional debug
logs for parameter validation in IOCTLs.
Check size provided by in metric streamer start and return -EINVAL
together with a debug message print.
Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Link: https://patch.msgid.link/20251104132418.970784-1-karol.wachowski@linux.intel.com
Add three hardware specific attributes to describe device capabilities:
hwctx_limit: The maximum number of hardware context supported.
max_tops: The maximum TOPS supported.
curr_tops: The TOPS achievable with the current power and frequency
configuration.
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20251104062546.833771-1-lizhi.hou@amd.com
These error paths free a pointer and then dereference it on the next line
to get the error code. Save the error code first and then free the
memory.
Fixes: 3e4d5b30d2 ("drm/vkms: Allow to configure multiple CRTCs via configfs")
Fixes: 2f1734ba27 ("drm/vkms: Allow to configure multiple planes via configfs")
Fixes: 67d8cf92e1 ("drm/vkms: Allow to configure multiple encoders via configfs")
Fixes: 272acbca96 ("drm/vkms: Allow to configure multiple connectors via configfs")
Fixes: 13fc9b9745 ("drm/vkms: Add and remove VKMS instances via configfs")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: José Expósito <jose.exposito89@gmail.com>
Link: https://lore.kernel.org/r/aPtfy2jCI_kb3Df7@stanley.mountain
Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
MSG_OP_CHAIN_EXEC_NPU is a unified mailbox message that replaces
MSG_OP_CHAIN_EXEC_BUFFER_CF and MSG_OP_CHAIN_EXEC_DPU.
Add driver logic to check firmware version, and if MSG_OP_CHAIN_EXEC_NPU
is supported, uses it to submit firmware commands.
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20251031014700.2919349-1-lizhi.hou@amd.com
There is a race between panthor_device_unplug() and
panthor_device_suspend() which can lead to IRQ handlers running on a
powered down GPU. This is how it can happen:
- unplug routine calls drm_dev_unplug()
- panthor_device_suspend() can now execute, and will skip a lot of
important work because the device is currently marked as unplugged.
- IRQs will remain active in this case and IRQ handlers can therefore
try to access a powered down GPU.
The fix is simply to take the PM ref in panthor_device_unplug() a
little bit earlier, before drm_dev_unplug().
Signed-off-by: Ketil Johnsen <ketil.johnsen@arm.com>
Fixes: 5fe909cae1 ("drm/panthor: Add the device logical block")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Link: https://patch.msgid.link/20251022103242.1083311-1-ketil.johnsen@arm.com
Signed-off-by: Liviu Dudau <liviu.dudau@arm.com>
A previous change, "drm/panthor: Fix UAF race between device unplug and
FW event processing", fixes a real issue where new work was unexpectedly
queued after cancellation. This was fixed by a disable instead.
Apply the same disable logic to other device level async work on device
unplug as a precaution.
Signed-off-by: Ketil Johnsen <ketil.johnsen@arm.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Link: https://patch.msgid.link/20251029111412.924104-1-ketil.johnsen@arm.com
Signed-off-by: Liviu Dudau <liviu.dudau@arm.com>
The function panthor_fw_unplug() will free the FW memory sections.
The problem is that there could still be pending FW events which are yet
not handled at this point. process_fw_events_work() can in this case try
to access said freed memory.
Simply call disable_work_sync() to both drain and prevent future
invocation of process_fw_events_work().
Signed-off-by: Ketil Johnsen <ketil.johnsen@arm.com>
Fixes: de85488138 ("drm/panthor: Add the scheduler logical block")
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Link: https://patch.msgid.link/20251027140217.121274-1-ketil.johnsen@arm.com
Signed-off-by: Liviu Dudau <liviu.dudau@arm.com>
drm_bridge_connector_init() takes eight pointers to various bridges, some
of which can be identical, and stores them in pointers inside struct
drm_bridge_connector. Get a reference to each of the taken bridges and put
it on cleanup.
Achieve this by adding a drmm cleanup callback whic puts all the non-NULL
bridges. Using drmm ensures the cleanup happens on drm_device teardown,
whichever is the return value of this function.
Four of these pointers (edid, hpd, detect and modes) can be written
multiple times (up to once per loop iterations), in order to eventually
store the last matching bridge. So when one of those pointers is
overwritten, we need to put the reference that we got during the previous
assignment. Add a drm_bridge_put() before writing them to handle this.
Reviewed-by: Louis Chauvet <louis.chauvet@bootlin.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Tested-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> # db410c
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
Tested-by: Tommaso Merciai <tommaso.merciai.xr@bp.renesas.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://patch.msgid.link/20251017-drm-bridge-alloc-getput-bridge-connector-fix-hdmi_cec-v2-2-667abf6d47c0@bootlin.com
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
It was reported that Weston stops at an assert, which checks that the
page flip event timestamp is the same or newer than the previous
timestamp:
weston_output_finish_frame: Assertion `timespec_sub_to_nsec(stamp, &output->frame_time) >= 0' failed.
With manual tests, I can see that when I enable the CRTC, I get a page
flip event with a timestamp of 0. Tracking this down led to
drm_reset_vblank_timestamp() which does "t_vblank = 0" if
"high-precision query" is not available.
TI DSS does not have any hardware timestamping, and thus the default
ktime_get() is used in the DRM framework to get the vblank timestamp,
and ktime_get() is not "high precision" here.
It is not quite clear why the framework behaves this way, but I assume
the idea is that drm_crtc_vblank_on(), which calls
drm_reset_vblank_timestamp(), can be called at any time, and thus
ktime_get() wouldn't give a good timestamp. And, the idea is that the
driver would wait until next vblank after the CRTC enable, and then we
could get a good timestamp. This is hinted in the comment: "reinitialize
delayed at next vblank interrupt and assign 0 for now".
I think that makes sense. However, when we enable the CRTC in TI DSS,
i.e. we write the enable bit to the hardware, that's the exact moment
when the "vblank cycle" starts. It is the zero point in the cycle, and
thus ktime_get() would give a good timestamp.
I am not sure if this is applicable to other hardware, and if so, how
should it be solved in the framework. So, let's fix this in the tidss
driver at least for now.
This patch updates the vblank->time manually to ktime_get() just before
sending the vblank event, and we enable the crtc just before calling
ktime_get(). To get even more exact timing, the dispc_vp_enable() is
moved inside the event_lock spinlock.
With this, we get a proper timestamp for the page flip event from
enabling the CRTC, and Weston is happy.
Reviewed-by: Devarsh Thakkar <devarsht@ti.com>
Link: https://patch.msgid.link/20250905-tidss-fix-timestamp-v1-2-c2aedf31e2c9@ideasonboard.com
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
Closes: https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1553964/processor-sdk-am62x-weston-fails-to-wake-from-idle-time-sleep-restarts-after-sigterm
Closes: https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1416342/am625-am625-doesn-t-wake-up-from-standy-when-idle-time-is-configured-in-weston-ini