From inside an active timeline in the execbuf ioctl, we may try to
reclaim some space in the GGTT. We need GGTT space for all objects on
!full-ppgtt platforms, and for context images everywhere. However, to
free up space in the GGTT we may need to remove some pinned objects
(e.g. context images) that require flushing the idle barriers to remove.
For this we use the big hammer of intel_gt_wait_for_idle()
However, commit 7936a22dd4 ("drm/i915/gt: Wait for new requests in
intel_gt_retire_requests()") will continue spinning on the wait if a
timeline is active but lacks requests, as is the case during execbuf
reservation. Spinning forever is quite time consuming, so revert that
commit and start again.
In practice, the effect commit 7936a22dd4 was trying to achieve is
accomplished by commit 1683d24c14 ("drm/i915/gt: Move new timelines
to the end of active_list"), so there is no immediate rush to replace
the looping.
Testcase: igt/gem_exec_reloc/basic-range
Fixes: a46bfdc83f ("drm/i915/gt: Wait for new requests in intel_gt_retire_requests()")
References: 1683d24c14 ("drm/i915/gt: Move new timelines to the end of active_list")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191121071044.97798-1-chris@chris-wilson.co.uk
(cherry picked from commit 689122dcc3)
[Joonas: Corrected Fixes: tag ref to match drm-intel-next-fixes]
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Bonded request submission is designed to allow requests to execute in
parallel as laid out by the user. If the master request is already
finished before its bonded pair is submitted, the pair were not destined
to run in parallel and we lose the information about the master engine
to dictate selection of the secondary. If the second request was
required to be run on a particular engine in a virtual set, that should
have been specified, rather than left to the whims of a random
unconnected requests!
In the selftest, I made the mistake of not ensuring the master would
overlap with its bonded pairs, meaning that it could indeed complete
before we submitted the bonds. Those bonds were then free to select any
available engine in their virtual set, and not the one expected by the
test.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191122112152.660743-1-chris@chris-wilson.co.uk
As we start peeking into requests for longer and longer, e.g.
incorporating use of spinlocks when only protected by an
rcu_read_lock(), we need to be careful in how we reset the request when
recycling and need to preserve any barriers that may still be in use as
the request is reset for reuse.
Quoting Linus Torvalds:
> If there is refcounting going on then why use SLAB_TYPESAFE_BY_RCU?
.. because the object can be accessed (by RCU) after the refcount has
gone down to zero, and the thing has been released.
That's the whole and only point of SLAB_TYPESAFE_BY_RCU.
That flag basically says:
"I may end up accessing this object *after* it has been free'd,
because there may be RCU lookups in flight"
This has nothing to do with constructors. It's ok if the object gets
reused as an object of the same type and does *not* get
re-initialized, because we're perfectly fine seeing old stale data.
What it guarantees is that the slab isn't shared with any other kind
of object, _and_ that the underlying pages are free'd after an RCU
quiescent period (so the pages aren't shared with another kind of
object either during an RCU walk).
And it doesn't necessarily have to have a constructor, because the
thing that a RCU walk will care about is
(a) guaranteed to be an object that *has* been on some RCU list (so
it's not a "new" object)
(b) the RCU walk needs to have logic to verify that it's still the
*same* object and hasn't been re-used as something else.
In contrast, a SLAB_TYPESAFE_BY_RCU memory gets free'd and re-used
immediately, but because it gets reused as the same kind of object,
the RCU walker can "know" what parts have meaning for re-use, in a way
it couidn't if the re-use was random.
That said, it *is* subtle, and people should be careful.
> So the re-use might initialize the fields lazily, not necessarily using a ctor.
If you have a well-defined refcount, and use "atomic_inc_not_zero()"
to guard the speculative RCU access section, and use
"atomic_dec_and_test()" in the freeing section, then you should be
safe wrt new allocations.
If you have a completely new allocation that has "random stale
content", you know that it cannot be on the RCU list, so there is no
speculative access that can ever see that random content.
So the only case you need to worry about is a re-use allocation, and
you know that the refcount will start out as zero even if you don't
have a constructor.
So you can think of the refcount itself as always having a zero
constructor, *BUT* you need to be careful with ordering.
In particular, whoever does the allocation needs to then set the
refcount to a non-zero value *after* it has initialized all the other
fields. And in particular, it needs to make sure that it uses the
proper memory ordering to do so.
NOTE! One thing to be very worried about is that re-initializing
whatever RCU lists means that now the RCU walker may be walking on the
wrong list so the walker may do the right thing for this particular
entry, but it may miss walking *other* entries. So then you can get
spurious lookup failures, because the RCU walker never walked all the
way to the end of the right list. That ends up being a much more
subtle bug.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191122094924.629690-1-chris@chris-wilson.co.uk
- Fix kernel oops on dumb_create ioctl on no crtc situation
- Fix bad ugly colored flash on VLV/CHV related to gamma LUT update
- Fix unity of the frequencies reported on PMU
- Fix kernel oops on set_page_dirty using better locks around it
- Protect the request pointer with RCU to prevent it being freed while we might need still
- Make pool objects read-only
- Restore physical addresses for fb_map to avoid corrupted page table
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191121165339.GA23920@intel.com
Commit 8c0d3a02c1 ("PCI: Add accessors for PCI Express Capability")
added accessors for the PCI Express Capability so that drivers didn't
need to be aware of differences between v1 and v2 of the PCI
Express Capability.
Replace pci_read_config_word() and pci_write_config_word() calls with
pcie_capability_read_word() and pcie_capability_write_word().
Link: https://lore.kernel.org/r/20191118003513.10852-1-fred@fredlawl.com
Signed-off-by: Frederick Lawler <fred@fredlawl.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Previously we masked PCIe Link Control 2 register values with "7 << 9",
which was apparently intended to be the Transmit Margin field, but instead
was the high order bit of Transmit Margin, the Enter Modified Compliance
bit, and the Compliance SOS bit.
Correct the mask to "7 << 7", which is the Transmit Margin field.
Link: https://lore.kernel.org/r/20191112173503.176611-3-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Commit 8c0d3a02c1 ("PCI: Add accessors for PCI Express Capability")
added accessors for the PCI Express Capability so that drivers didn't
need to be aware of differences between v1 and v2 of the PCI
Express Capability.
Replace pci_read_config_word() and pci_write_config_word() calls with
pcie_capability_read_word() and pcie_capability_write_word().
[bhelgaas: fix a couple remaining instances in cik.c]
Link: https://lore.kernel.org/r/20191118003513.10852-1-fred@fredlawl.com
Signed-off-by: Frederick Lawler <fred@fredlawl.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Previously we masked PCIe Link Control 2 register values with "7 << 9",
which was apparently intended to be the Transmit Margin field, but instead
was the high order bit of Transmit Margin, the Enter Modified Compliance
bit, and the Compliance SOS bit.
Correct the mask to "7 << 7", which is the Transmit Margin field.
Link: https://lore.kernel.org/r/20191112173503.176611-3-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Since retirement may be running in a worker on another CPU, it may be
skipped in the local intel_gt_wait_for_idle(). To ensure the state is
consistent for our sanity checks upon load, serialise with the remote
retirer by waiting on the timeline->mutex.
Outside of this use case, e.g. on suspend or module unload, we expect the
slack to be picked up by intel_gt_pm_wait_for_idle() and so prefer to
put the special case serialisation with retirement in its single user,
for now at least.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191121071044.97798-2-chris@chris-wilson.co.uk
For locking semantics it really doesn't matter when we grab the
ticket. But for lockdep validation it does: the acquire ctx is a fake
lockdep. Since other drivers might want to do a full multi-lock dance
in their fault-handler, not just lock a single dma_resv. Therefore we
must init the acquire_ctx only after we've done all the copy_*_user or
anything else that might trigger a pagefault. For msm this means we
need to move it past submit_lookup_objects.
Aside: Why is msm still using struct_mutex, it seems to be using
dma_resv_lock for general buffer state protection?
v2:
- Add comment to explain why the ww ticket setup is separate (Rob)
- Fix up error handling, we need to make sure we don't call
ww_acquire_fini without _init.
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-and-tested-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Sean Paul <sean@poorly.run>
Cc: linux-arm-msm@vger.kernel.org
Cc: freedreno@lists.freedesktop.org
Link: https://patchwork.freedesktop.org/patch/msgid/20191120105607.3023-1-daniel.vetter@ffwll.ch
From inside an active timeline in the execbuf ioctl, we may try to
reclaim some space in the GGTT. We need GGTT space for all objects on
!full-ppgtt platforms, and for context images everywhere. However, to
free up space in the GGTT we may need to remove some pinned objects
(e.g. context images) that require flushing the idle barriers to remove.
For this we use the big hammer of intel_gt_wait_for_idle()
However, commit 7936a22dd4 ("drm/i915/gt: Wait for new requests in
intel_gt_retire_requests()") will continue spinning on the wait if a
timeline is active but lacks requests, as is the case during execbuf
reservation. Spinning forever is quite time consuming, so revert that
commit and start again.
In practice, the effect commit 7936a22dd4 was trying to achieve is
accomplished by commit 1683d24c14 ("drm/i915/gt: Move new timelines
to the end of active_list"), so there is no immediate rush to replace
the looping.
Testcase: igt/gem_exec_reloc/basic-range
Fixes: 7936a22dd4 ("drm/i915/gt: Wait for new requests in intel_gt_retire_requests()")
References: 1683d24c14 ("drm/i915/gt: Move new timelines to the end of active_list")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191121071044.97798-1-chris@chris-wilson.co.uk
For fine grained dpm, there is only two levels supported. However
to reflect correctly the current clock frequency, there is an
intermediate level faked. Thus on forcing level setting, we
need to treat level 2 correctly as level 1.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The "panel" pointer is not reset to NULL if of_drm_find_panel()
returns an error. Therefore we later assume that a panel was found,
and try to dereference the error pointer, resulting in:
mcde-dsi a0351000.dsi: failed to find panel try bridge (4294966779)
Unable to handle kernel paging request at virtual address fffffe03
PC is at drm_panel_bridge_add.part.0+0x10/0x5c
LR is at mcde_dsi_bind+0x120/0x464
...
Reset "panel" to NULL to avoid this problem.
Also change the format string of the error to %ld to print
the negative errors correctly. The crash above then becomes:
mcde-dsi a0351000.dsi: failed to find panel try bridge (-517)
mcde-dsi a0351000.dsi: no panel or bridge
...
Fixes: 5fc537bfd0 ("drm/mcde: Add new driver for ST-Ericsson MCDE")
Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20191118130252.170324-1-stephan@gerhold.net
In commit a79ca656b6 ("drm/i915: Push the wakeref->count deferral to
the backend"), I erroneously concluded that we last modify the engine
inside __i915_request_commit() meaning that we could enable concurrent
submission for userspace as we enqueued this request. However, this
falls into a trap with other users of the engine->kernel_context waking
up and submitting their request before the idle-switch is queued, with
the result that the kernel_context is executed out-of-sequence most
likely upsetting the GPU and certainly ourselves when we try to retire
the out-of-sequence requests.
As such we need to hold onto the effective engine->kernel_context mutex
lock (via the engine pm mutex proxy) until we have finish queuing the
request to the engine.
v2: Serialise against concurrent intel_gt_retire_requests()
v3: Describe the hairy locking scheme with intel_gt_retire_requests()
for future reference.
v4: Combine timeline->lock and engine pm release; it's hairy.
Fixes: a79ca656b6 ("drm/i915: Push the wakeref->count deferral to the backend")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191120165514.3955081-2-chris@chris-wilson.co.uk
The general concept was that intel_timeline.active_count was locked by
the intel_timeline.mutex. The exception was for power management, where
the engine->kernel_context->timeline could be manipulated under the
global wakeref.mutex.
This was quite solid, as we always manipulated the timeline only while
we held an engine wakeref.
And then we started retiring requests outside of struct_mutex, only
using the timelines.active_list and the timeline->mutex. There we
started manipulating intel_timeline.active_count outside of an engine
wakeref, and so introduced a race between __engine_park() and
intel_gt_retire_requests(), a race that could result in the
engine->kernel_context not being added to the active timelines and so
losing requests, which caused us to keep the system permanently powered
up [and unloadable].
The race would be easy to close if we could take the engine wakeref for
the timeline before we retire -- except timelines are not bound to any
engine and so we would need to keep all active engines awake. The
alternative is to guard intel_timeline_enter/intel_timeline_exit for use
outside of the timeline->mutex.
Fixes: e5dadff4b0 ("drm/i915: Protect request retirement with timeline->mutex")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191120165514.3955081-1-chris@chris-wilson.co.uk