The original intention was to avoid CPU page table unmaps
when BOs move between the GTT and SYSTEM domain.
The problem is that this never correctly handled changes
in the caching attributes or backing pages.
Just drop this for now and simply unmap the CPU page
tables in all cases.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/378240/
Two in one go:
- it is allowed to call dma_fence_wait() while holding a
dma_resv_lock(). This is fundamental to how eviction works with ttm,
so required.
- it is allowed to call dma_fence_wait() from memory reclaim contexts,
specifically from shrinker callbacks (which i915 does), and from mmu
notifier callbacks (which amdgpu does, and which i915 sometimes also
does, and probably always should, but that's kinda a debate). Also
for stuff like HMM we really need to be able to do this, or things
get real dicey.
Consequence is that any critical path necessary to get to a
dma_fence_signal for a fence must never a) call dma_resv_lock nor b)
allocate memory with GFP_KERNEL. Also by implication of
dma_resv_lock(), no userspace faulting allowed. Those are some supremely
obnoxious limitations, which is why we need to sprinkle the right
annotations over all relevant paths.
The one big locking context we're leaving out here is mmu notifiers,
added in
    commit 23b68395c7
    Author: Daniel Vetter <daniel.vetter@ffwll.ch>
    Date: Mon Aug 26 22:14:21 2019 +0200
        mm/mmu_notifiers: add a lockdep map for invalidate_range_start/end
that one covers a lot of other callsites, and it's also allowed to
wait on dma-fences from mmu notifiers. But there's no ready-made
functions exposed to prime this, so I've left it out for now.
v2: Also track against mmu notifier context.
v3: kerneldoc to spec the cross-driver contract. Note that currently
i915 throws in a hard-coded 10s timeout on foreign fences (not sure
why that was done, but it's there), which is why that rule is worded
with SHOULD instead of MUST.
Also some of the mmu_notifier/shrinker rules might surprise SoC
drivers, I haven't fully audited them all. Which is infeasible anyway,
we'll need to run them with lockdep and dma-fence annotations and see
what goes boom.
v4: A spelling fix from Mika
v5: #ifdef for CONFIG_MMU_NOTIFIER. Reported by 0day. Unfortunately
this means lockdep enforcement is slightly inconsistent, it won't spot
GFP_NOIO and GFP_NOFS allocations in the wrong spot if
CONFIG_MMU_NOTIFIER is disabled in the kernel config. Oh well.
v5: Note that only drivers/gpu has a reasonable (or at least
historical) excuse to use dma_fence_wait() from shrinker and mmu
notifier callbacks. Everyone else should either have a better memory
manager model, or better hardware. This reflects discussions with
Jason Gunthorpe.
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: kernel test robot <lkp@intel.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> (v4)
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Thomas Hellstrom <thomas.hellstrom@intel.com>
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: linux-rdma@vger.kernel.org
Cc: amd-gfx@lists.freedesktop.org
Cc: intel-gfx@lists.freedesktop.org
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200707201229.472834-3-daniel.vetter@ffwll.ch
Design is similar to the lockdep annotations for workers, but with
some twists:
- We use a read-lock for the execution/worker/completion side, so that
this explicit annotation can be more liberally sprinkled around.
With read locks lockdep isn't going to complain if the read-side
isn't nested the same way under all circumstances, so ABBA deadlocks
are ok. Which they are, since this is an annotation only.
- We're using non-recursive lockdep read lock mode, since in recursive
read lock mode lockdep does not catch read side hazards. And we
_very_ much want read side hazards to be caught. For full details of
this limitation see
    commit e914985897
    Author: Peter Zijlstra <peterz@infradead.org>
    Date: Wed Aug 23 13:13:11 2017 +0200
        locking/lockdep/selftests: Add mixed read-write ABBA tests
- To allow nesting of the read-side explicit annotations we explicitly
keep track of the nesting. lock_is_held() allows us to do that.
- The wait-side annotation is a write lock, and entirely done within
dma_fence_wait() for everyone by default.
- To be able to freely annotate helper functions I want to make it ok
to call dma_fence_begin/end_signalling from soft/hardirq context.
First attempt was using the hardirq locking context for the write
side in lockdep, but this forces all normal spinlocks nested within
dma_fence_begin/end_signalling to be irq-safe spinlocks. That's bollocks.
The approach now is to simply check in_atomic(), and for these cases
entirely rely on the might_sleep() check in dma_fence_wait(). That
will catch any wrong nesting against spinlocks from soft/hardirq
contexts.
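The nesting and atomic-context rules from the list above can be modelled in
user space. This is only a sketch under stated assumptions: a thread-local
flag stands in for lock_is_held() on the annotation's lockdep map, the
in_atomic() check is passed in as a parameter, and the model_* names and the
cookie-style return value are hypothetical, not the kernel interface.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for lock_is_held() on the annotation's lockdep map. */
static _Thread_local bool annotation_held;

/*
 * Model of the begin side. The returned cookie is true when nothing was
 * acquired, either because we are nested or because we are in atomic context.
 */
static bool model_begin_signalling(bool in_atomic_ctx)
{
	if (annotation_held)
		return true;	/* nested: the outer section owns the annotation */
	if (in_atomic_ctx)
		return true;	/* atomic: rely on might_sleep() in the wait side */
	annotation_held = true;	/* models taking the non-recursive read lock */
	return false;
}

/* Model of the end side: only the outermost section releases. */
static void model_end_signalling(bool cookie)
{
	if (cookie)
		return;
	assert(annotation_held);
	annotation_held = false;
}

static bool model_annotation_held(void)
{
	return annotation_held;
}
```

An outer section gets a false cookie and releases the annotation when it
ends, while a nested section or one entered from atomic context gets a true
cookie and its end call does nothing.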
The idea here is that every code path that's critical for eventually
signalling a dma_fence should be annotated with
dma_fence_begin/end_signalling. The annotation ideally starts right
after a dma_fence is published (added to a dma_resv, exposed as a
sync_file fd, attached to a drm_syncobj fd, or anything else that
makes the dma_fence visible to other kernel threads), up to and
including the dma_fence_signal(). Examples are irq handlers, the
scheduler rt threads, the tail of execbuf (after the corresponding
fences are visible), any workers that end up signalling dma_fences and
really anything else. Not annotated should be code paths that only
complete fences opportunistically as the gpu progresses, like e.g.
shrinker/eviction code.
The main class of deadlocks this is supposed to catch are:
Thread A:
	mutex_lock(A);
	mutex_unlock(A);
	dma_fence_signal();
Thread B:
	mutex_lock(A);
	dma_fence_wait();
	mutex_unlock(A);
Thread B is blocked on A signalling the fence, but A never gets around
to that because it cannot acquire the lock A.
Note that dma_fence_wait() is allowed to be nested within
dma_fence_begin/end_signalling sections. To allow this to happen the
read lock needs to be upgraded to a write lock, which means that if any
other lock is acquired between the dma_fence_begin_signalling() call and
the call to dma_fence_wait(), and is still held, this will result in an
immediate lockdep complaint. The only other option would be to not
annotate such calls, defeating the point. Therefore these annotations
cannot be sprinkled over the code entirely mindlessly, or they will
produce false positives.
Originally I hoped that the cross-release lockdep extensions would
alleviate the need for explicit annotations:
https://lwn.net/Articles/709849/
But there's a few reasons why that's not an option:
- It's not happening in upstream, since it got reverted due to too
many false positives:
    commit e966eaeeb6
    Author: Ingo Molnar <mingo@kernel.org>
    Date: Tue Dec 12 12:31:16 2017 +0100
        locking/lockdep: Remove the cross-release locking checks
        This code (CONFIG_LOCKDEP_CROSSRELEASE=y and CONFIG_LOCKDEP_COMPLETIONS=y),
        while it found a number of old bugs initially, was also causing too many
        false positives that caused people to disable lockdep - which is arguably
        a worse overall outcome.
- cross-release uses the complete() call to annotate the end of
critical sections, for dma_fence that would be dma_fence_signal().
But we do not want all dma_fence_signal() calls to be treated as
critical, since many are opportunistic cleanup of gpu requests. If
these get stuck there's still the main completion interrupt and
workers who can unblock everyone. Automatically annotating all
dma_fence_signal() calls would hence cause false positives.
- cross-release had some educated guesses for when a critical section
starts, like fresh syscall or fresh work callback. This would again
cause false positives without explicit annotations, since for
dma_fence the critical section only starts when we publish a fence.
- Furthermore there can be cases where a thread never does a
dma_fence_signal, but is still critical for reaching completion of
fences. One example would be a scheduler kthread which picks up jobs
and pushes them into hardware, where the interrupt handler or
another completion thread calls dma_fence_signal(). But if the
scheduler thread hangs, then all the fences hang, hence we need to
manually annotate it. cross-release aimed to solve this by chaining
cross-release dependencies, but the dependency from scheduler thread
to the completion interrupt handler goes through hw where
cross-release code can't observe it.
In short, without manual annotations and careful review of the start
and end of critical sections, cross-release dependency tracking doesn't
work. We need explicit annotations.
v2: handle soft/hardirq ctx better against write side and don't forget
EXPORT_SYMBOL, drivers can't use this otherwise.
v3: Kerneldoc.
v4: Some spelling fixes from Mika
v5: Amend commit message to explain in detail why cross-release isn't
the solution.
v6: Pull out misplaced .rst hunk.
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Thomas Hellstrom <thomas.hellstrom@intel.com>
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: linux-rdma@vger.kernel.org
Cc: amd-gfx@lists.freedesktop.org
Cc: intel-gfx@lists.freedesktop.org
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200707201229.472834-2-daniel.vetter@ffwll.ch
Rationale:
Reduces attack surface on kernel devs opening the links for MITM
as HTTPS traffic is much harder to manipulate.
Deterministic algorithm:
For each file:
  If not .svg:
    For each line:
      If doesn't contain `\bxmlns\b`:
        For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
          If neither `\bgnu\.org/license`, nor `\bmozilla\.org/MPL\b`:
            If both the HTTP and HTTPS versions
            return 200 OK and serve the same content:
              Replace HTTP with HTTPS.
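The per-line part of the algorithm above can be sketched in C. This is an
illustration only, not the script that produced the patch: the regexes are
hand-translated into string scanning, the exclusion patterns are tested
against each link rather than the whole line (an assumption), the "200 OK and
same content" check is left to a caller-supplied callback, and all names
(upgrade_line, link_check_fn, assume_verified) are hypothetical. The .svg
file filter is left to the caller.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for "HTTP and HTTPS both return 200 OK and serve the same content". */
typedef bool (*link_check_fn)(const char *url, size_t len);

static bool is_word(char c)
{
	return isalnum((unsigned char)c) || c == '_';
}

/* Find needle in s[0..len) with a \b before it and, if asked, a \b after it. */
static bool has_bounded(const char *s, size_t len, const char *needle, bool bound_after)
{
	size_t n = strlen(needle);

	for (size_t i = 0; i + n <= len; i++) {
		if (memcmp(s + i, needle, n) != 0)
			continue;
		if (i > 0 && is_word(s[i - 1]))
			continue;
		if (bound_after && i + n < len && is_word(s[i + n]))
			continue;
		return true;
	}
	return false;
}

/* Stub check that accepts every link; a real run would query both URLs. */
static bool assume_verified(const char *url, size_t len)
{
	(void)url;
	(void)len;
	return true;
}

/* Copy line to out, upgrading eligible links. Returns the number of links
 * upgraded, or -1 if out is too small. */
static int upgrade_line(const char *line, char *out, size_t out_size, link_check_fn check)
{
	size_t len = strlen(line), o = 0, i = 0;
	int upgraded = 0;
	bool skip_line = has_bounded(line, len, "xmlns", true);

	if (out_size == 0)
		return -1;
	while (i < len) {
		size_t end = i + 7;
		bool link = !skip_line && strncmp(line + i, "http://", 7) == 0 &&
			    (i == 0 || !is_word(line[i - 1]));

		if (link) {
			/* [^# \t\r\n]*, then back off to the last \w or '/' */
			while (end < len && !strchr("# \t\r\n", line[end]))
				end++;
			while (end > i + 7 && !is_word(line[end - 1]) && line[end - 1] != '/')
				end--;
			link = end > i + 7 &&
			       !has_bounded(line + i, end - i, "gnu.org/license", false) &&
			       !has_bounded(line + i, end - i, "mozilla.org/MPL", true) &&
			       check(line + i, end - i);
		}
		if (!link) {
			if (o + 1 >= out_size)
				return -1;
			out[o++] = line[i++];
			continue;
		}
		if (o + (end - i) + 1 >= out_size)
			return -1;
		memcpy(out + o, "https://", 8);
		memcpy(out + o + 8, line + i + 7, end - i - 7);
		o += end - i + 1;
		i = end;
		upgraded++;
	}
	out[o] = '\0';
	return upgraded;
}
```

Trailing punctuation such as a full stop after a link stays outside the
match, because the link pattern must end in a word character or a slash.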
Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20200719171428.60470-1-grandmaster@al2klimov.de
Calling drmm_vram_helper_init() sets up a managed instance of
VRAM MM. Releasing the DRM device also frees the memory manager.
The patch also updates the DRM documentation for VRAM helpers. The
tutorial now describes the new managed interface. The old interfaces
are deprecated and should not be used in new code.
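The "managed" idea described above, where releasing the device also frees the
memory manager, can be modelled in user space. The sketch below is not the DRM
implementation; every model_* name is hypothetical. It only shows the pattern:
init registers a release action on the device and returns an errno-style
code, and device release runs the registered actions.

```c
#include <stddef.h>
#include <stdlib.h>

struct model_device;
typedef void (*model_release_fn)(struct model_device *dev, void *data);

struct model_action {
	model_release_fn fn;
	void *data;
	struct model_action *next;
};

struct model_device {
	struct model_action *actions;	/* release actions, newest first */
	void *vram_mm;			/* stands in for the VRAM manager */
};

static int model_add_action(struct model_device *dev, model_release_fn fn, void *data)
{
	struct model_action *a = malloc(sizeof(*a));

	if (!a)
		return -1;
	a->fn = fn;
	a->data = data;
	a->next = dev->actions;
	dev->actions = a;
	return 0;
}

static void model_vram_release(struct model_device *dev, void *data)
{
	free(data);
	dev->vram_mm = NULL;
}

/* Managed init: returns 0 or a negative error; the caller never frees anything. */
static int model_managed_vram_init(struct model_device *dev, size_t vram_size)
{
	size_t *mm = malloc(sizeof(*mm));

	if (!mm)
		return -1;
	*mm = vram_size;
	if (model_add_action(dev, model_vram_release, mm)) {
		free(mm);
		return -1;
	}
	dev->vram_mm = mm;
	return 0;
}

/* Device release runs all registered actions, newest first. */
static void model_device_release(struct model_device *dev)
{
	while (dev->actions) {
		struct model_action *a = dev->actions;

		dev->actions = a->next;
		a->fn(dev, a->data);
		free(a);
	}
}
```

A driver modelled this way calls the init function once and has no matching
cleanup call of its own, which is the property the managed interface is meant
to provide.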
v2:
* rename init function to drmm_vram_helper_init()
* return errno code from init function; caller does not
need vram_mm anyway
* update documentation and remove docs for deprecated
un-managed functions
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20200716125353.31512-2-tzimmermann@suse.de
Add some kind of vblank workers. The interface is similar to regular
delayed works, and is mostly based off kthread_work. It allows for
scheduling delayed works that execute once a particular vblank sequence
has passed. It also allows for accurate flushing of scheduled vblank
works - in that flushing waits for both the vblank sequence and job
execution to complete, or for the work to get cancelled - whichever
comes first.
Whatever hardware programming we do in the work must be fast (must at
least complete during the vblank or scanout period, sometimes during the
first few scanlines of the vblank). As such we use a high-priority
per-CRTC thread to accomplish this.
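The scheduling idea above, a work item that runs once a given vblank sequence
has passed, can be sketched as a small user-space model. It leaves out the
per-CRTC worker thread, locking and flushing entirely. The model_* names, the
return convention of the schedule function and the 2^23 wrap window are
assumptions of this sketch, not the interface added by the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct model_vblank_work {
	uint64_t target;	/* vblank count the work waits for */
	bool pending;
	unsigned int runs;	/* how often the work has executed */
};

/* Wrap-safe test for "seq has reached ref"; the window size is an assumption. */
static bool model_vblank_passed(uint64_t seq, uint64_t ref)
{
	return (seq - ref) <= (UINT64_C(1) << 23);
}

/* Returns 1 if the work was queued, 0 if it was already pending. */
static int model_work_schedule(struct model_vblank_work *work, uint64_t target)
{
	if (work->pending)
		return 0;
	work->target = target;
	work->pending = true;
	return 1;
}

/* Called once per vblank: runs every pending work whose target has passed. */
static size_t model_handle_vblank(struct model_vblank_work *works, size_t n, uint64_t count)
{
	size_t executed = 0;

	for (size_t i = 0; i < n; i++) {
		if (!works[i].pending || !model_vblank_passed(count, works[i].target))
			continue;
		works[i].pending = false;
		works[i].runs++;	/* stands in for running the callback */
		executed++;
	}
	return executed;
}
```

In this model a work scheduled for count 10 does not run at count 9 and runs
exactly once at count 10 or later.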
Changes since v7:
* Stuff drm_vblank_internal.h and drm_vblank_work_internal.h contents
into drm_internal.h
* Get rid of unnecessary spinlock in drm_crtc_vblank_on()
* Remove !vblank->worker check
* Grab vbl_lock in drm_vblank_work_schedule()
* Mention self-rearming work items in drm_vblank_work_schedule() kdocs
* Return 1 from drm_vblank_work_schedule() if the work was scheduled
successfully, 0 or error code otherwise
* Use drm_dbg_core() instead of DRM_DEV_ERROR() in
drm_vblank_work_schedule()
* Remove vblank->worker checks in drm_vblank_destroy_worker() and
drm_vblank_flush_worker()
Changes since v6:
* Get rid of ->pending and seqcounts, and implement flushing through
simpler means - danvet
* Get rid of work_lock, just use drm_device->event_lock
* Move drm_vblank_work item cleanup into drm_crtc_vblank_off() so that
we ensure that all vblank work has finished before disabling vblanks
* Add checks into drm_crtc_vblank_reset() so we yell if it gets called
while there's vblank workers active
* Grab event_lock in both drm_crtc_vblank_on()/drm_crtc_vblank_off(),
the main reason for this is so that other threads calling
drm_vblank_work_schedule() are blocked from attempting to schedule
while we're in the middle of enabling/disabling vblanks.
* Move drm_handle_vblank_works() call below drm_handle_vblank_events()
* Simplify drm_vblank_work_cancel_sync()
* Fix drm_vblank_work_cancel_sync() documentation
* Move wake_up_all() calls out of spinlock where we can. The only one I
left was the call to wake_up_all() in drm_vblank_handle_works() as
this seemed like it made more sense just living in that function
(which is all technically under lock)
* Move drm_vblank_work related functions into their own source files
* Add drm_vblank_internal.h so we can export some functions we don't
want drivers using, but that we do need to use in drm_vblank_work.c
* Add a bunch of documentation
Changes since v4:
* Get rid of kthread interfaces we tried adding and move all of the
locking into drm_vblank.c. For implementing drm_vblank_work_flush(),
we now use a wait_queue and sequence counters in order to
differentiate between multiple work item executions.
* Get rid of drm_vblank_work_cancel() - this would have been pretty
difficult to actually reimplement and it occurred to me that neither
nouveau nor i915 are even planning to use this function. Since there's
also no async cancel function for most of the work interfaces in the
kernel, it seems a bit unnecessary anyway.
* Get rid of to_drm_vblank_work() since we now are also able to just
pass the struct drm_vblank_work to work item callbacks anyway
Changes since v3:
* Use our own spinlocks, don't integrate so tightly with kthread_works
Changes since v2:
* Use kthread_workers instead of reinventing the wheel.
Cc: Tejun Heo <tj@kernel.org>
Cc: dri-devel@lists.freedesktop.org
Cc: nouveau@lists.freedesktop.org
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Co-developed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Acked-by: Dave Airlie <airlied@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200627194657.156514-4-lyude@redhat.com
In cases such as DRM_FORMAT_MOD_SAMSUNG_16_16_TILE, the modifier
describes a generic pixel re-ordering which can be applicable to
multiple vendors.
Define an alias: DRM_FORMAT_MOD_GENERIC_16_16_TILE, which can be
used to describe this layout in a vendor-neutral way, and add a
comment about the expected usage of such "generic" modifiers.
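How such an alias behaves can be shown with a self-contained sketch. The
MODEL_* macros below only mimic the shape of the modifier encoding (vendor ID
in the top 8 bits, value in the low 56 bits); the numeric vendor ID and value
are illustrative assumptions and should be checked against drm_fourcc.h, and
model_is_16_16_tile() is a hypothetical helper.

```c
#include <stdint.h>

/* Illustrative vendor ID; check the real header for the actual value. */
#define MODEL_FORMAT_MOD_VENDOR_SAMSUNG 0x04

/* Vendor ID in the top 8 bits, vendor-specific value in the low 56 bits. */
#define model_fourcc_mod_code(vendor, val) \
	((((uint64_t)MODEL_FORMAT_MOD_VENDOR_##vendor) << 56) | \
	 ((val) & UINT64_C(0x00ffffffffffffff)))

#define MODEL_FORMAT_MOD_SAMSUNG_16_16_TILE model_fourcc_mod_code(SAMSUNG, 2)

/* The generic name is a plain alias: same 64-bit value, vendor-neutral spelling. */
#define MODEL_FORMAT_MOD_GENERIC_16_16_TILE MODEL_FORMAT_MOD_SAMSUNG_16_16_TILE

/* A driver from any vendor can test for the layout through the generic name. */
static int model_is_16_16_tile(uint64_t modifier)
{
	return modifier == MODEL_FORMAT_MOD_GENERIC_16_16_TILE;
}
```

Because the alias expands to the same value, existing users of the
vendor-specific name keep working and the two names compare equal.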
Changes in v2:
- Move note about future cases to comment (Daniel)
Signed-off-by: Brian Starkey <brian.starkey@arm.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200626164800.11595-1-brian.starkey@arm.com
Amlogic uses a proprietary lossless image compression protocol and format
for their hardware video codec accelerators, either video decoders or
video input encoders.
It considerably reduces memory bandwidth while writing and reading
frames in memory.
The underlying storage is considered to be 3 components, 8-bit or 10-bit
per component, YCbCr 420, single plane:
- DRM_FORMAT_YUV420_8BIT
- DRM_FORMAT_YUV420_10BIT
This modifier will be notably added to DMA-BUF frames imported from the V4L2
Amlogic VDEC decoder.
This introduces the basic layout composed of:
- a body content organized in 64x32 superblocks with 4096 bytes per
superblock in default mode.
- a header of 32 bytes per 128x64 block
This layout is transferable between Amlogic SoCs supporting this modifier.
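As a worked example of the basic layout, the sizes can be computed from the
numbers above. This sketch assumes default mode, that partially covered
blocks are rounded up to whole blocks, and no further alignment; those
assumptions and the model_* names are not taken from the patch.

```c
#include <stdint.h>

static uint64_t model_div_round_up(uint64_t a, uint64_t b)
{
	return (a + b - 1) / b;
}

/* Body: 64x32 superblocks, 4096 bytes each (default mode). */
static uint64_t model_fbc_body_size(uint32_t width, uint32_t height)
{
	return model_div_round_up(width, 64) * model_div_round_up(height, 32) * 4096;
}

/* Header: 32 bytes per 128x64 block. */
static uint64_t model_fbc_header_size(uint32_t width, uint32_t height)
{
	return model_div_round_up(width, 128) * model_div_round_up(height, 64) * 32;
}
```

Under these assumptions a 1920x1080 frame needs 30 x 34 superblocks, so
4177920 bytes of body, and 15 x 17 header entries, so 8160 bytes of header.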
The Memory Saving option changes the superblock size of the layout to save
memory when using 8-bit component pixels.
Finally, it also adds the Scatter Memory layout, meaning the header contains IOMMU
references to the compressed frames content to optimize memory access
and layout.
In this mode, only the header memory address is needed, thus the content
memory organization is tied to the current producer execution and can
neither be saved/dumped nor transferred between Amlogic SoCs supporting
this modifier.
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Tested-by: Kevin Hilman <khilman@baylibre.com>
Reviewed-by: Kevin Hilman <khilman@baylibre.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20200703080728.25207-2-narmstrong@baylibre.com
This counter will be used by the drm_helper_probe_detect caller to determine
whether anything has changed (including EDID, connection status, etc.).
Hardware-specific driver detect hooks are responsible for updating this
counter when some change is detected, to notify the drm part,
which can then trigger, for example, a hotplug event.
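The contract described above can be illustrated with a user-space model: the
detect hook bumps the counter whenever it sees a change, and the caller
compares the counter before and after the call. All types and names here are
hypothetical stand-ins, not the DRM structures.

```c
#include <stdbool.h>
#include <stdint.h>

enum model_status { MODEL_DISCONNECTED, MODEL_CONNECTED };

struct model_connector {
	uint64_t epoch_counter;
	enum model_status status;
};

typedef void (*model_detect_fn)(struct model_connector *c);

/* Driver side: record the new status and bump the counter only on a change. */
static void model_set_status(struct model_connector *c, enum model_status s)
{
	if (c->status != s) {
		c->status = s;
		c->epoch_counter++;
	}
}

/* Example detect hook that always finds a sink plugged in. */
static void model_detect_plugged(struct model_connector *c)
{
	model_set_status(c, MODEL_CONNECTED);
}

/* Caller side: a changed counter means something changed, e.g. send a hotplug event. */
static bool model_probe_detect_changed(struct model_connector *c, model_detect_fn detect)
{
	uint64_t old_epoch = c->epoch_counter;

	detect(c);
	return c->epoch_counter != old_epoch;
}
```

The caller does not need to know what changed; it only needs to see that the
counter moved, which keeps the check uniform across drivers.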
Also, drm_connector_update_edid_property is now always called
right after we get the EDID, so that there is a
unified way to handle EDID changes without having to
change tons of source code. Currently
drm_connector_update_edid_property is called only in
certain cases like reprobing, and not right after the EDID is
actually updated.
v2: Added documentation for the new counter. Rename change_counter to
epoch_counter.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105540
Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200630002700.5451-3-kunal1.joshi@intel.com