CPU jobs, like Cache Clean jobs, execute synchronously once the DRM
scheduler starts running them. Consequently, a global `v3d->cpu_job`
variable is unnecessary, as everything is managed within the
`v3d_cpu_job_run()` function.
This commit removes the `v3d->cpu_job` pointer, as it is not needed.
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250113154741.67520-2-mcanal@igalia.com
If buddy manager have more than one roots and each root have sub-block
need to be free. When drm_buddy_fini called, the first loop of
force_merge will merge and free all of the sub block of first root,
which offset is 0x0 and size is biggest(more than have of the mm size).
In subsequent force_merge rounds, if we use 0 as start and use remaining
mm size as end, the block of other roots will be skipped in
__force_merge function. It will cause the other roots can not be freed.
Solution: use roots' offset as the start could fix this issue.
Signed-off-by: Lin.Cao <lincao12@amd.com>
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241226070116.309290-1-Arunpravin.PaneerSelvam@amd.com
We have a few reports of sc7180-trogdor-pompom devices that have a
panel in them that IDs as STA 0x0004 and has the following raw EDID:
00 ff ff ff ff ff ff 00 4e 81 04 00 00 00 00 00
10 20 01 04 a5 1a 0e 78 0a dc dd 96 5b 5b 91 28
1f 52 54 00 00 00 01 01 01 01 01 01 01 01 01 01
01 01 01 01 01 01 8e 1c 56 a0 50 00 1e 30 28 20
55 00 00 90 10 00 00 18 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fe
00 31 31 36 4b 48 44 30 32 34 30 30 36 0a 00 e6
We've been unable to locate a datasheet for this panel and our partner
has not been responsive, but all Starry eDP datasheets that we can
find agree on the same timing (delay_100_500_e200) so it should be
safe to use that here instead of the super conservative timings. We'll
still go a little extra conservative and allow `hpd_absent` of 200
instead of 100 because that won't add any real-world delay in most
cases.
We'll associate the string from the EDID ("116KHD024006") with this
panel. Given that the ID is the suspicious value of 0x0004 it seems
likely that Starry doesn't always update their IDs but the string will
still work to differentiate if we ever need to in the future.
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250109142853.1.Ibcc3009933fd19507cc9c713ad0c99c7a9e4fe17@changeid
Simplify the pool allocation code somewhat by merging loop arguments
used by multiple functions together in a struct and simplifying the
loop. Also add documentation.
This hopefully makes the behaviour of the allocation loop
simplier to understand, but above all paves the way for upcoming
restore-while-allocating functionality.
There are no functional changes, but the "allow_pools" bool
introduced to keep current functionality could be removed as a
follow up, which would enable using write-back cached pools when
allocating memory for other caching modes, rather than to resort
to allocating from the system directly.
v15:
- Introduce this patch to simplify the upcoming patch that introduces
restore while allocating.
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241217145852.37342-4-thomas.hellstrom@linux.intel.com
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
If a reset is scheduled when the suspend happens, we drop the
reset-pending info on the floor assuming the resume will fix things,
but the resume logic might try a fast reset. If we're lucky, the
fast reset fails and we fallback to a slow reset, but if the FW was
corrupted in a way that makes it partially functional (it boots but
doesn't quite do what it's expected to do), we won't notice immediately
that things are not working correctly, leading to a new reset further
down the road.
Fixes: 5fe909cae1 ("drm/panthor: Add the device logical block")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241217092457.1582053-1-boris.brezillon@collabora.com
The code that changes hdmi->ref_clk was accidentally copied from
downstream code that sets a different clock. We don't actually
want to set any clock here at all.
Setting this clock incorrectly leads to incorrect timings for
DDC, CEC, and HDCP signal generation.
No Fixes listed, as the theoretical timing error in DDC appears to
still be within tolerances and harmless - and HDCP and CEC are not
yet supported.
Signed-off-by: Derek Foreman <derek.foreman@collabora.com>
Reviewed-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20241217201708.3320673-1-derek.foreman@collabora.com
Testing some workloads in two different scenarios, such as games running
under Gamescope on a Steam Deck, or vkcube under a Plasma desktop, shows
that in a significant portion of calls the dma_fence_unwrap_merge helper
is called with just a single unsignalled fence.
Therefore it is worthile to add a fast path for that case and so bypass
the memory allocation and insertion sort attempts.
Tested scenarios:
1) Hogwarts Legacy under Gamescope
~1500 calls per second to __dma_fence_unwrap_merge.
Percentages per number of fences buckets, before and after checking for
signalled status, sorting and flattening:
N Before After
0 0.85%
1 69.80% -> The new fast path.
2-9 29.36% 9% (Ie. 91% of this bucket flattened to 1 fence)
10-19
20-40
50+
2) Cyberpunk 2077 under Gamescope
~2400 calls per second.
N Before After
0 0.71%
1 52.53% -> The new fast path.
2-9 44.38% 50.60% (Ie. half resolved to a single fence)
10-19 2.34%
20-40 0.06%
50+
3) vkcube under Plasma
90 calls per second.
N Before After
0
1
2-9 100% 0% (Ie. all resolved to a single fence)
10-19
20-40
50+
In the case of vkcube all invocations in the 2-9 bucket were actually
just two input fences.
v2:
* Correct local variable name and hold on to unsignaled reference. (Chistian)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Friedrich Vock <friedrich.vock@gmx.de>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241115102153.1980-4-tursulin@igalia.com
If another driver for a VGA compatible GPU (that is passthrough'd)
locks the VGA resources (by calling vga_get()), then virtio_gpu
driver would encounter the following errors and fail to load during
probe and initialization:
Invalid read at addr 0x7200005014, size 1, region '(null)', reason: rejected
Invalid write at addr 0x7200005014, size 1, region '(null)', reason: rejected
virtio_gpu virtio0: virtio: device uses modern interface but does not have VIRTIO_F_VERSION_1
virtio_gpu virtio0: probe with driver virtio_gpu failed with error -22
This issue is only seen if virtio-gpu and the other GPU are on
different PCI buses, which can happen if the user includes an
additional PCIe port and associates the VGA compatible GPU with
it while launching Qemu:
qemu-system-x86_64...
-device virtio-vga,max_outputs=1,xres=1920,yres=1080,blob=true
-device pcie-root-port,id=pcie.1,bus=pcie.0,addr=1c.0,slot=1,chassis=1,multifunction=on
-device vfio-pci,host=03:00.0,bus=pcie.1,addr=00.0 ...
In the above example, the device 03:00.0 is an Intel DG2 card and
this issue is seen when both i915 driver and virtio_gpu driver are
loading (or initializing) concurrently or when i915 is loaded first.
Note that during initalization, i915 driver does the following in
intel_vga_reset_io_mem():
vga_get_uninterruptible(pdev, VGA_RSRC_LEGACY_IO);
outb(inb(VGA_MIS_R), VGA_MIS_W);
vga_put(pdev, VGA_RSRC_LEGACY_IO);
Although, virtio-gpu might own the VGA resources initially, the
above call (in i915) to vga_get_uninterruptible() would result in
these resources being taken away, which means that virtio-gpu would
not be able to decode VGA anymore. This happens in __vga_tryget()
when it calls
pci_set_vga_state(conflict->pdev, false, pci_bits, flags);
where
pci_bits = PCI_COMMAND_MEMORY | PCI_COMMAND_IO
flags = PCI_VGA_STATE_CHANGE_DECODES | PCI_VGA_STATE_CHANGE_BRIDGE
Therefore, to solve this issue, virtio-gpu driver needs to call
vga_get() whenever it needs to reclaim and access VGA resources,
which is during initial probe and setup. After that, a call to
vga_put() would release the lock to allow other VGA compatible
devices to access these shared VGA resources.
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Gurchetan Singh <gurchetansingh@chromium.org>
Cc: Chia-I Wu <olvaffe@gmail.com>
Reported-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241211064343.550153-1-vivek.kasireddy@intel.com
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Fix deadlock in job submission and abort handling.
When a thread aborts currently executing jobs due to a fault,
it first locks the global lock protecting submitted_jobs (#1).
After the last job is destroyed, it proceeds to release the related context
and locks file_priv (#2). Meanwhile, in the job submission thread,
the file_priv lock (#2) is taken first, and then the submitted_jobs
lock (#1) is obtained when a job is added to the submitted jobs list.
CPU0 CPU1
---- ----
(for example due to a fault) (jobs submissions keep coming)
lock(&vdev->submitted_jobs_lock) #1
ivpu_jobs_abort_all()
job_destroy()
lock(&file_priv->lock) #2
lock(&vdev->submitted_jobs_lock) #1
file_priv_release()
lock(&vdev->context_list_lock)
lock(&file_priv->lock) #2
This order of locking causes a deadlock. To resolve this issue,
change the order of locking in ivpu_job_submit().
Signed-off-by: Karol Wachowski <karol.wachowski@intel.com>
Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250107173238.381120-12-maciej.falkowski@linux.intel.com
To prevent looping infinitely in MMU event handler we stop
generating new events by removing 'R' (record) bit from context
descriptor, but to ensure this change has effect KMD has to perform
configuration invalidation followed by sync command.
Because of that move parts of the interrupt handler that can take longer
to a thread not to block in interrupt handler for too long.
This includes:
* disabling event queue for the time KMD updates MMU event queue consumer
to ensure proper synchronization between MMU and KMD
* removal of 'R' (record) bit from context descriptor to ensure no more
faults are recorded until that context is destroyed
Signed-off-by: Karol Wachowski <karol.wachowski@intel.com>
Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250107173238.381120-8-maciej.falkowski@linux.intel.com
Stop dumping consecutive faults from an already faulty context immediately,
instead of waiting for the context abort thread handler (IRQ handler bottom
half) to abort currently executing jobs.
Remove 'R' (record events) bit from context descriptor of a faulty
context to prevent future faults generation.
This change speeds up the IRQ handler by eliminating the need to print the
fault content repeatedly. Additionally, it prevents flooding dmesg with
errors, which was occurring due to the delay in the bottom half of the
handler stopping fault-generating jobs.
Signed-off-by: Karol Wachowski <karol.wachowski@intel.com>
Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250107173238.381120-7-maciej.falkowski@linux.intel.com
With hardware scheduler it is not expected to receive JOB_DONE
notifications from NPU FW for the jobs aborted due to command queue destroy
JSM command.
Remove jobs submitted to unregistered command queue from submitted_jobs_xa
to avoid triggering a TDR in such case.
Add explicit submitted_jobs_lock that protects access to list of submitted
jobs which is now used to find jobs to abort.
Move context abort procedure to separate work queue not to slow down
handling of IPCs or DCT requests in case where job abort takes longer,
especially when destruction of the last job of a specific context results
in context release.
Signed-off-by: Karol Wachowski <karol.wachowski@intel.com>
Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250107173238.381120-4-maciej.falkowski@linux.intel.com