Mali's CSF firmware triggers the job IRQ whenever there's new firmware
events for processing. While this can be a global event (BIT(31) of the
status register), it's usually an event relating to a command stream
group (the other bit indices).
Panthor throws these events onto a workqueue for processing outside the
IRQ handler. It's therefore useful to have an instrumented tracepoint
that goes beyond the generic IRQ tracepoint for this specific case, as
it can be augmented with additional data, namely the events bit mask.
This can then be used to debug problems relating to GPU jobs events not
being processed quickly enough. The duration_ns field can be used to
work backwards from when the tracepoint fires (at the end of the IRQ
handler) to figure out when the interrupt itself landed, providing not
just information on how long the work queueing took, but also when the
actual interrupt itself arrived.
With this information in hand, the IRQ handler itself being slow can be
excluded as a possible source of problems, and attention can be directed
to the workqueue processing instead.
Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Link: https://patch.msgid.link/20260116-panthor-tracepoints-v10-4-d925986e3d1b@collabora.com
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Mali GPUs have three registers that indicate which parts of the hardware
are powered at any moment. These take the form of bitmaps. In the case
of SHADER_READY for example, a high bit indicates that the shader core
corresponding to that bit index is powered on. These bitmaps aren't
solely contiguous bits, as it's common to have holes in the sequence of
shader core indices, and the actual set of which cores are present is
defined by the "shader present" register.
When the GPU finishes a power state transition, it fires a
GPU_IRQ_POWER_CHANGED_ALL interrupt. After such an interrupt is
received, the _READY registers will contain new interesting data. During
power transitions, the GPU_IRQ_POWER_CHANGED interrupt will fire, and
the registers will likewise contain potentially changed data.
This is not to be confused with the PWR_IRQ_POWER_CHANGED_ALL interrupt,
which is something related to Mali v14+'s power control logic. The
_READY registers and corresponding interrupts are already available in
v9 and onwards.
Expose the data as a tracepoint to userspace. This allows users to debug
various scenarios and gather interesting information, such as: knowing
how much hardware is lit up at any given time, correlating graphics
corruption with a specific powered shader core, measuring when hardware
is allowed to go to a powered off state again, and so on.
The registration/unregistration functions for the tracepoint go through
a wrapper in panthor_hw.c, so that v14+ can implement the same
tracepoint by adding its hardware specific IRQ on/off callbacks to the
panthor_hw.ops member.
Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Link: https://patch.msgid.link/20260116-panthor-tracepoints-v10-3-d925986e3d1b@collabora.com
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
The current IRQ helpers do not guarantee mutual exclusion that covers
the entire transaction from accessing the mask member and modifying the
mask register.
This makes it hard, if not impossible, to implement mask modification
helpers that may change one of these outside the normal
suspend/resume/isr code paths.
Add a spinlock to struct panthor_irq that protects both the mask member
and register. Acquire it in all code paths that access these, but drop
it before processing the threaded handler function. Then, add the
aforementioned new helpers: enable_events, and disable_events. They work
by ORing and NANDing the mask bits.
resume is changed to no longer have a mask passed, as pirq->mask is
supposed to be the user-requested mask now, rather than a mirror of the
INT_MASK register contents. Users of the resume helper are adjusted
accordingly, including a rather painful refactor in panthor_mmu.c.
In panthor_mmu.c, the bespoke mask modification is excised, and replaced
with enable_events/disable_events in as_enable/as_disable.
Co-developed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Link: https://patch.msgid.link/20260116-panthor-tracepoints-v10-2-d925986e3d1b@collabora.com
Since we can only inspect dmabuf by iterating over process FDs or the
dmabuf_list, we need to add our own tracepoints to track its status in
real time in production.
For example:
binder:3016_1-3102 [006] ...1. 255.126521: dma_buf_export: exp_name=qcom,system size=12685312 ino=2738
binder:3016_1-3102 [006] ...1. 255.126528: dma_buf_fd: exp_name=qcom,system size=12685312 ino=2738 fd=8
binder:3016_1-3102 [006] ...1. 255.126642: dma_buf_mmap_internal: exp_name=qcom,system size=28672 ino=2739
kworker/6:1-86 [006] ...1. 255.127194: dma_buf_put: exp_name=qcom,system size=12685312 ino=2738
RenderThread-9293 [006] ...1. 316.618179: dma_buf_get: exp_name=qcom,system size=12771328 ino=2762 fd=176
RenderThread-9293 [006] ...1. 316.618195: dma_buf_dynamic_attach: exp_name=qcom,system size=12771328 ino=2762 attachment:ffffff880a18dd00 is_dynamic=0 dev_name=kgsl-3d0
RenderThread-9293 [006] ...1. 318.878220: dma_buf_detach: exp_name=qcom,system size=12771328 ino=2762 attachment:ffffff880a18dd00 is_dynamic=0 dev_name=kgsl-3d0
Signed-off-by: Xiang Gao <gaoxiang17@xiaomi.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20260109115411.115270-1-gxxa03070307@gmail.com
Add kunit tests that exercise edge cases where allocation requests
exceed mm->max_order after rounding. This can happen with
non-power-of-two VRAM sizes when the allocator rounds up requests.
For example, with 10G VRAM (8G + 2G roots), mm->max_order represents
the 8G block. A 9G allocation can round up to 16G in multiple ways:
CONTIGUOUS allocation rounds to next power-of-two, or non-CONTIGUOUS
with 8G min_block_size rounds to next alignment boundary.
The test validates CONTIGUOUS and RANGE flag combinations, ensuring that
only CONTIGUOUS-alone allocations use try_harder fallback, while other
combinations return -EINVAL when rounded size exceeds memory, preventing
BUG_ON assertions.
Cc: Christian König <christian.koenig@amd.com>
Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Suggested-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Sanjay Yadav <sanjay.kumar.yadav@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Link: https://patch.msgid.link/20260108113227.2101872-6-sanjay.kumar.yadav@intel.com
When DRM_BUDDY_CONTIGUOUS_ALLOCATION is set, the requested size is
rounded up to the next power-of-two via roundup_pow_of_two().
Similarly, for non-contiguous allocations with large min_block_size,
the size is aligned up via round_up(). Both operations can produce a
rounded size that exceeds mm->size, which later triggers
BUG_ON(order > mm->max_order).
Example scenarios:
- 9G CONTIGUOUS allocation on 10G VRAM memory:
roundup_pow_of_two(9G) = 16G > 10G
- 9G allocation with 8G min_block_size on 10G VRAM memory:
round_up(9G, 8G) = 16G > 10G
Fix this by checking the rounded size against mm->size. For
non-contiguous or range allocations where size > mm->size is invalid,
return -EINVAL immediately. For contiguous allocations without range
restrictions, allow the request to fall through to the existing
__alloc_contig_try_harder() fallback.
This ensures invalid user input returns an error or uses the fallback
path instead of hitting BUG_ON.
v2: (Matt A)
- Add Fixes, Cc stable, and Closes tags for context
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6712
Fixes: 0a1844bf0b ("drm/buddy: Improve contiguous memory allocation")
Cc: <stable@vger.kernel.org> # v6.7+
Cc: Christian König <christian.koenig@amd.com>
Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Suggested-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Sanjay Yadav <sanjay.kumar.yadav@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Link: https://patch.msgid.link/20260108113227.2101872-5-sanjay.kumar.yadav@intel.com
The atmel_hlcdc_plane_atomic_duplicate_state() callback was copying
the atmel_hlcdc_plane state structure without properly duplicating the
drm_plane_state. In particular, state->commit remained set to the old
state commit, which can lead to a use-after-free in the next
drm_atomic_commit() call.
Fix this by calling
__drm_atomic_helper_duplicate_plane_state(), which correctly clones
the base drm_plane_state (including the ->commit pointer).
It has been seen when closing and re-opening the device node while
another DRM client (e.g. fbdev) is still attached:
=============================================================================
BUG kmalloc-64 (Not tainted): Poison overwritten
-----------------------------------------------------------------------------
0xc611b344-0xc611b344 @offset=836. First byte 0x6a instead of 0x6b
FIX kmalloc-64: Restoring Poison 0xc611b344-0xc611b344=0x6b
Allocated in drm_atomic_helper_setup_commit+0x1e8/0x7bc age=178 cpu=0
pid=29
drm_atomic_helper_setup_commit+0x1e8/0x7bc
drm_atomic_helper_commit+0x3c/0x15c
drm_atomic_commit+0xc0/0xf4
drm_framebuffer_remove+0x4cc/0x5a8
drm_mode_rmfb_work_fn+0x6c/0x80
process_one_work+0x12c/0x2cc
worker_thread+0x2a8/0x400
kthread+0xc0/0xdc
ret_from_fork+0x14/0x28
Freed in drm_atomic_helper_commit_hw_done+0x100/0x150 age=8 cpu=0
pid=169
drm_atomic_helper_commit_hw_done+0x100/0x150
drm_atomic_helper_commit_tail+0x64/0x8c
commit_tail+0x168/0x18c
drm_atomic_helper_commit+0x138/0x15c
drm_atomic_commit+0xc0/0xf4
drm_atomic_helper_set_config+0x84/0xb8
drm_mode_setcrtc+0x32c/0x810
drm_ioctl+0x20c/0x488
sys_ioctl+0x14c/0xc20
ret_fast_syscall+0x0/0x54
Slab 0xef8bc360 objects=21 used=16 fp=0xc611b7c0
flags=0x200(workingset|zone=0)
Object 0xc611b340 @offset=832 fp=0xc611b7c0
Signed-off-by: Ludovic Desroches <ludovic.desroches@microchip.com>
Reviewed-by: Manikandan Muralidharan <manikandan.m@microchip.com>
Link: https://patch.msgid.link/20251024-lcd_fixes_mainlining-v1-2-79b615130dc3@microchip.com
Signed-off-by: Manikandan Muralidharan <manikandan.m@microchip.com>
After several commits, the slab memory increases. Some drm_crtc_commit
objects are not freed. The atomic_destroy_state callback only put the
framebuffer. Use the __drm_atomic_helper_plane_destroy_state() function
to put all the objects that are no longer needed.
It has been seen after hours of usage of a graphics application or using
kmemleak:
unreferenced object 0xc63a6580 (size 64):
comm "egt_basic", pid 171, jiffies 4294940784
hex dump (first 32 bytes):
40 50 34 c5 01 00 00 00 ff ff ff ff 8c 65 3a c6 @P4..........e:.
8c 65 3a c6 ff ff ff ff 98 65 3a c6 98 65 3a c6 .e:......e:..e:.
backtrace (crc c25aa925):
kmemleak_alloc+0x34/0x3c
__kmalloc_cache_noprof+0x150/0x1a4
drm_atomic_helper_setup_commit+0x1e8/0x7bc
drm_atomic_helper_commit+0x3c/0x15c
drm_atomic_commit+0xc0/0xf4
drm_atomic_helper_set_config+0x84/0xb8
drm_mode_setcrtc+0x32c/0x810
drm_ioctl+0x20c/0x488
sys_ioctl+0x14c/0xc20
ret_fast_syscall+0x0/0x54
Signed-off-by: Ludovic Desroches <ludovic.desroches@microchip.com>
Reviewed-by: Manikandan Muralidharan <manikandan.m@microchip.com>
Link: https://patch.msgid.link/20251024-lcd_fixes_mainlining-v1-1-79b615130dc3@microchip.com
Signed-off-by: Manikandan Muralidharan <manikandan.m@microchip.com>
of_drm_find_bridge() is deprecated. Move to its replacement
of_drm_find_and_get_bridge() which gets a bridge reference, and ensure it
is put when done. Also switch to the drm_bridge::next_bridge pointer.
This needs to handle both cases: when of_drm_find_panel() succeeds and when
it fails.
In the 'else' case (i.e. when of_drm_find_panel() fails), just switch to
of_drm_find_and_get_bridge() to ensure the bridge is not freed while in use
in the function tail, when it is stored in dsi->bridge.next_bridge.
In the 'then' case (i.e. when of_drm_find_panel() succeeds),
devm_drm_panel_bridge_add() already increments the refcount using devres
which ties the bridge allocation lifetime to the device lifetime, so we
would not need to do anything. However to have the same behaviour in both
branches take an additional reference here, so that the bridge needs to be
put whichever branch is taken without more complicated logic. Ensure to
clear the bridge pointer however, to avoid calling drm_bridge_put() on an
ERR_PTR.
Acked-by: Maxime Ripard <mripard@kernel.org>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://patch.msgid.link/20260109-drm-bridge-alloc-getput-drm_of_find_bridge-2-v2-12-8bad3ef90b9f@bootlin.com
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
In preparation to handle refcounting of the out_bridge, we need to ensure
the out_bridge pointer contains either a valid bridge pointer or NULL, not
an ERR_PTR. Otherwise calls such as drm_bridge_get/put() would try to
redeference an ERR_PTR.
As a preliminary cleanup, add a temporary local 'next_bridge' pointer and
only copy it in dsi->out_bridge as late as possible, i.e. just before
calling pdata->host_ops->attach() which uses it (only in the exynos
driver).
Not strictly needed, but for symmetry move the clearing of dsi->out_bridge
in samsung_dsim_host_detach() to after pdata->host_ops->detach().
Acked-by: Maxime Ripard <mripard@kernel.org>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://patch.msgid.link/20260109-drm-bridge-alloc-getput-drm_of_find_bridge-2-v2-10-8bad3ef90b9f@bootlin.com
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>