The current way of handling refcounting in the DP MST helpers is really
confusing and probably just plain wrong because it's been hacked up many
times over the years without anyone actually going over the code and
seeing if things could be simplified.
To the best of my understanding, the current scheme works like this:
drm_dp_mst_port and drm_dp_mst_branch both have a single refcount. When
this refcount hits 0 for either of the two, they're removed from the
topology state, but not immediately freed. Both ports and branch devices
will reinitialize their kref once it's hit 0 before actually destroying
themselves. The intended purpose behind this is so that we can avoid
problems like not being able to free a remote payload that might still
be active, due to us having removed all of the port/branch device
structures in memory, as per:
commit 91a25e4631 ("drm/dp/mst: deallocate payload on port destruction")
Which may have worked, but then it caused use-after-free errors. Being
new to MST at the time, I tried fixing it;
commit 263efde31f ("drm/dp/mst: Get validated port ref in drm_dp_update_payload_part1()")
But, that was broken: both drm_dp_mst_port and drm_dp_mst_branch structs
are validated in almost every DP MST helper function. Simply put, this
means we go through the topology and try to see if the given
drm_dp_mst_branch or drm_dp_mst_port is still attached to something
before trying to use it in order to avoid dereferencing freed memory
(something that has happened a LOT in the past with this library).
Because of this it doesn't actually matter whether or not we keep keep
the ports and branches around in memory as that's not enough, because
any function that validates the branches and ports passed to it will
still reject them anyway since they're no longer in the topology
structure. So, use-after-free errors were fixed but payload deallocation
was completely broken.
Two years later, AMD informed me about this issue and I attempted to
come up with a temporary fix, pending a long-overdue cleanup of this
library:
commit c54c7374ff ("drm/dp_mst: Skip validating ports during destruction, just ref")
But then that introduced use-after-free errors, so I quickly reverted
it:
commit 9765635b30 ("Revert "drm/dp_mst: Skip validating ports during destruction, just ref"")
And in the process, learned that there is just no simple fix for this:
the design is just broken. Unfortunately, the usage of these helpers are
quite broken as well. Some drivers like i915 have been smart enough to
avoid accessing any kind of information from MST port structures, but
others like nouveau have assumed, understandably so, that
drm_dp_mst_port structures are normal and can just be accessed at any
time without worrying about use-after-free errors.
After a lot of discussion, me and Daniel Vetter came up with a better
idea to replace all of this.
To summarize, since this is documented far more indepth in the
documentation this patch introduces, we make it so that drm_dp_mst_port
and drm_dp_mst_branch structures have two different classes of
refcounts: topology_kref, and malloc_kref. topology_kref corresponds to
the lifetime of the given drm_dp_mst_port or drm_dp_mst_branch in it's
given topology. Once it hits zero, any associated connectors are removed
and the branch or port can no longer be validated. malloc_kref
corresponds to the lifetime of the memory allocation for the actual
structure, and will always be non-zero so long as the topology_kref is
non-zero. This gives us a way to allow callers to hold onto port and
branch device structures past their topology lifetime, and dramatically
simplifies the lifetimes of both structures. This also finally fixes the
port deallocation problem, properly.
Additionally: since this now means that we can keep ports and branch
devices allocated in memory for however long we need, we no longer need
a significant amount of the port validation that we currently do.
Additionally, there is one last scenario that this fixes, which couldn't
have been fixed properly beforehand:
- CPU1 unrefs port from topology (refcount 1->0)
- CPU2 refs port in topology(refcount 0->1)
Since we now can guarantee memory safety for ports and branches
as-needed, we also can make our main reference counting functions fix
this problem by using kref_get_unless_zero() internally so that topology
refcounts can only ever reach 0 once.
Changes since v4:
* Change the kernel-figure summary for dp-mst/topology-figure-1.dot a
bit - danvet
* Remove figure numbers - danvet
Changes since v3:
* Remove rebase detritus - danvet
* Split out purely style changes into separate patches - hwentlan
Changes since v2:
* Fix commit message - checkpatch
* s/)-1/) - 1/g - checkpatch
Changes since v1:
* Remove forward declarations - danvet
* Move "Branch device and port refcounting" section from documentation
into kernel-doc comments - danvet
* Export internal topology lifetime functions into their own section in
the kernel-docs - danvet
* s/@/&/g for struct references in kernel-docs - danvet
* Drop the "when they are no longer being used" bits from the kernel
docs - danvet
* Modify diagrams to show how the DRM driver interacts with the topology
and payloads - danvet
* Make suggested documentation changes for
drm_dp_mst_topology_get_mstb() and drm_dp_mst_topology_get_port() -
danvet
* Better explain the relationship between malloc refs and topology krefs
in the documentation for drm_dp_mst_topology_get_port() and
drm_dp_mst_topology_get_mstb() - danvet
* Fix "See also" in drm_dp_mst_topology_get_mstb() - danvet
* Rename drm_dp_mst_topology_get_(port|mstb)() ->
drm_dp_mst_topology_try_get_(port|mstb)() and
drm_dp_mst_topology_ref_(port|mstb)() ->
drm_dp_mst_topology_get_(port|mstb)() - danvet
* s/should/must in docs - danvet
* WARN_ON(refcount == 0) in topology_get_(mstb|port) - danvet
* Move kdocs for mstb/port structs inline - danvet
* Split drm_dp_get_last_connected_port_and_mstb() changes into their own
commit - danvet
Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Daniel Vetter <daniel@ffwll.ch>
Cc: David Airlie <airlied@redhat.com>
Cc: Jerry Zuo <Jerry.Zuo@amd.com>
Cc: Juston Li <juston.li@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190111005343.17443-7-lyude@redhat.com
This reverts commit 9a09a42369.
The whole interface isn't thought through. Since this function can't
fail we actually can't allocate an object to store the sync point.
Sorry, I should have taken the lead on this from the very beginning and
reviewed it more thoughtfully. Going to propose a new interface as a
follow up change.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Link: https://patchwork.freedesktop.org/patch/265580/
This patch adds a couple of helpers to remove the boilerplate involved
in grabbing all of the modeset locks.
I've also converted the obvious cases in drm core to use the helpers.
The only remaining instance of drm_modeset_lock_all_ctx() is in
drm_framebuffer. It's complicated by the state clear that occurs on
deadlock. ATM, there's no way to inject code in the deadlock path with
the helpers, so it's unfit for conversion.
Changes in v2:
- Relocate ret argument to the end of the list (Daniel)
- Incorporate Daniel's doc suggestions (Daniel)
Suggested-by: Daniel Vetter <daniel@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20181129150423.239081-4-sean@poorly.run
drm-misc-next for v4.21:
Core Changes:
- Merge drm_info.c into drm_debugfs.c
- Complete the fake drm_crtc_commit's hw_done/flip_done sooner.
- Remove deprecated drm_obj_ref/unref functions. All drivers use get/put now.
- Decrease stack use of drm_gem_prime_mmap.
- Improve documentation for dumb callbacks.
Driver Changes:
- Add edid support to virtio.
- Wait on implicit fence in meson and sun4i.
- Add support for BGRX8888 to sun4i.
- Preparation patches for sun4i driver to start supporting linear and tiled YUV formats.
- Add support for HDMI 1.4 4k modes to meson, and support for VIC alternate timings.
- Drop custom dumb_map in vkms.
- Small fixes and cleanups to v3d.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/151a3270-b1be-ed75-bd58-6b29d741f592@linux.intel.com
drm-misc-next for v4.21, part 2:
UAPI Changes:
- Remove syncobj timeline support from drm.
Cross-subsystem Changes:
- Document canvas provider node in the DT bindings.
- Improve documentation for TPO TPG110 DT bindings.
Core Changes:
- Use explicit state in drm atomic functions.
- Add panel quirk for new GPD Win2 firmware.
- Add DRM_FORMAT_XYUV8888.
- Set the default import/export function in prime to drm_gem_prime_import/export.
- Add a separate drm_gem_object_funcs, to stop relying on dev->driver->*gem* functions.
- Make sure that tinydrm sets the virtual address also on imported buffers.
Driver Changes:
- Support active-low data enable signal in sun4i.
- Fix scaling in vc4.
- Use canvas provider node in meson.
- Remove unused variables in sti and qxl and cirrus.
- Add overlay plane support and primary plane scaling to meson.
- i2c fixes in drm/bridge/sii902x
- Fix mailbox read size in rockchip.
- Spelling fix in panel/s6d16d0.
- Remove unnecessary null check from qxl_bo_unref.
- Remove unused arguments from qxl_bo_pin.
- Fix qxl cursor pinning.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/9c0409e3-a85f-d2af-b4eb-baf1eb8bbae4@linux.intel.com
This adds an optional function table on GEM objects.
The main benefit is for drivers that support more than one type of
memory (shmem,vram,cma) for their buffers depending on the hardware it
runs on. With the callbacks attached to the GEM object itself, it is
easier to have core helpers for the the various buffer types. The driver
only has to make the decision about buffer type on GEM object creation
and all other callbacks can be handled by the chosen helper.
drm_driver->gem_prime_res_obj has not been added since there's a todo to
put a reservation_object into drm_gem_object.
v3: Add todo entry
v2: Drop drm_gem_object_funcs->prime_mmap in favour of
drm_gem_prime_mmap() (Daniel Vetter)
v1:
- drm_gem_object_funcs.map -> .prime_map let it only do PRIME mmap like
the function it superseeds (Daniel Vetter)
- Flip around the if ladders and make obj->funcs the first choice
highlighting the fact that this the new default way of doing it
(Daniel Vetter)
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181110145647.17580-4-noralf@tronnes.org
New features for 4.21:
amdgpu:
- Support for SDMA paging queue on vega
- Put compute EOP buffers into vram for better performance
- Share more code with amdkfd
- Support for scanout with DCC on gfx9
- Initial kerneldoc for DC
- Updated SMU firmware support for gfx8 chips
- Rework CSA handling for eventual support for preemption
- XGMI PSP support
- Clean up RLC handling
- Enable GPU reset by default on VI, SOC15 dGPUs
- Ring and IB test cleanups
amdkfd:
- Share more code with amdgpu
ttm:
- Move global init out of the drivers
scheduler:
- Track if schedulers are ready for work
- Timeout/fault handling changes to facilitate GPU recovery
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181114165113.3751-1-alexander.deucher@amd.com
drm-misc-next for v4.21, part 1:
UAPI Changes:
- Add syncobj timeline support to drm.
Cross-subsystem Changes:
- Remove shared fence staging in dma-buf's fence object, and allow
reserving more than 1 fence and add more paranoia when debugging.
- Constify infoframe functions in video/hdmi.
Core Changes:
- Add vkms todo, and a lot of assorted doc fixes.
- Drop transitional helpers and convert drivers to use drm_atomic_helper_shutdown().
- Move atomic state helper functions to drm_atomic_state_helper.[ch]
- Refactor drm selftests, and add new tests.
- DP MST atomic state cleanups.
- Drop EXPORT_SYMBOL from drm leases.
- Lease cleanups and fixes.
- Create render node for vgem.
Driver Changes:
- Fix build failure in imx without fbdev emulation.
- Add rotation quirk for GPD win2 panel.
- Add support for various CDTech panels, Banana Pi Panel, DLC1010GIG,
Olimex LCD-O-LinuXino, Samsung S6D16D0, Truly NT35597 WQXGA,
Himax HX8357D, simulated RTSM AEMv8.
- Add dw_hdmi support to rockchip driver.
- Fix YUV support in vc4.
- Fix resource id handling in virtio.
- Make rockchip use dw-mipi-dsi bridge driver, and add dual dsi support.
- Advertise that tinydrm only supports DRM_FORMAT_MOD_LINEAR.
- Convert many drivers to use atomic helpers, and drm_fbdev_generic_setup().
- Add Mali linear tiled formats, and enable them in the Mali-DP driver.
- Add support for H6 DE3 mixer 0, DW HDMI, HDMI PHY and TCON TOP.
- Assorted driver cleanups and fixes.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/be7ebd91-edd9-8fa4-4286-1c57e3165113@linux.intel.com
drm-next is forwarded to v4.20-rc1, and we need this to make
a patch series apply.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Add a function to check whether there is at least one plane that
supports a specific format and modifier combination. Drivers can
use this to reject unsupported formats/modifiers in .fb_create().
v2: Accept anyformat if the driver doesn't do planes (Eric)
s/planes_have_format/any_plane_has_format/ (Eric)
Check the modifier as well since we already have a function
that does both
v3: Don't do the check in the core since we may not know the
modifier yet, instead export the function and let drivers
call it themselves
Cc: Eric Anholt <eric@anholt.net>
Cc: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181029183453.28541-1-ville.syrjala@linux.intel.com
This patch adds a new API to clean up the scheduler job resources. This
is primarliy needed in cases the job was created but was not queued to
the scheduler queue. Additionally with this change, the layer which
creates the scheduler job also gets to free up the job's resources and
this entails moving the dma_fence_put(finished_fence) to the drivers
ops free handler routines.
Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
Reviewed-by: Christian König <christian.koenig@amd.com>
Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Problem:
A particular scheduler may become unsuable (underlying HW) after
some event (e.g. GPU reset). If it's later chosen by
the get free sched. policy a command will fail to be
submitted.
Fix:
Add a driver specific callback to report the sched status so
rq with bad sched can be avoided in favor of working one or
none in which case job init will fail.
v2: Switch from driver callback to flag in scheduler.
v3: rebase
v4: Remove ready paramter from drm_sched_init, set
uncoditionally to true once init done.
v5: fix missed change in v3d in v4 (Alex)
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Make sure that the global BO state is always correctly initialized.
This allows removing all the device code to initialize it.
v2: fix up vbox (Alex)
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>