If our interrupt handler gets called and we don't really handle the
interrupt then we should return IRQ_NONE. The current interrupt
handler didn't do this, so let's fix it.
NOTE: for some of the cases it's clear that we should return IRQ_NONE
and some cases it's clear that we should return IRQ_HANDLED. However,
there are a few that fall somewhere in between. Specifically, the
documentation for when to return IRQ_NONE vs. IRQ_HANDLED is probably
best spelled out in the commit message of commit d9e4ad5bad ("Document
that IRQ_NONE should be returned when IRQ not actually handled"). That
commit makes it clear that we should return IRQ_HANDLED if we've done
something to make the interrupt stop happening.
The case where it's unclear is, for instance, in dp_aux_isr() after
we've read the interrupt using dp_catalog_aux_get_irq() and confirmed
that "isr" is non-zero. The function dp_catalog_aux_get_irq() not only
reads the interrupts but it also "ack"s all the interrupts that are
returned. For an "unknown" interrupt this has a very good chance of
actually stopping the interrupt from happening. That would mean we've
identified that it's our device and done something to stop them from
happening and should return IRQ_HANDLED. Specifically, it should be
noted that most interrupts that need "ack"ing are ones that are
one-time events and doing an "ack" is enough to clear them. However,
since these interrupts are unknown then, by definition, it's unknown
if "ack"ing them is truly enough to clear them. It's possible that we
also need to remove the original source of the interrupt. In this
case, IRQ_NONE would be a better choice.
Given that returning an occasional IRQ_NONE isn't the absolute end of
the world, however, let's choose that course of action. The IRQ
framework will forgive a few IRQ_NONE returns now and again (and it
won't even log them, which is why we have to log them ourselves). This
means that if we _do_ end hitting an interrupt where "ack"ing isn't
enough the kernel will eventually detect the problem and shut our
device down.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Reviewed-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Patchwork: https://patchwork.freedesktop.org/patch/520660/
Link: https://lore.kernel.org/r/20230126170745.v2.2.I2d7aec2fadb9c237cd0090a47d6a8ba2054bf0f8@changeid
[DB: reformatted commit message to make checkpatch happy]
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
The DP AUX interrupt handling was a bit of a mess.
* There were two functions (one for "native" transfers and one for
"i2c" transfers) that were quite similar. It was hard to say how
many of the differences between the two functions were on purpose
and how many of them were just an accident of how they were coded.
* Each function sometimes used "else if" to test for error bits and
sometimes didn't and again it was hard to say if this was on purpose
or just an accident.
* The two functions wouldn't notice whether "unknown" bits were
set. For instance, there seems to be a bit "DP_INTR_PLL_UNLOCKED"
and if it was set there would be no indication.
* The two functions wouldn't notice if more than one error was set.
Let's fix this by being more consistent / explicit about what we're
doing.
By design this could cause different handling for AUX transfers,
though I'm not actually aware of any bug fixed as a result of
this patch (this patch was created because we simply noticed how odd
the old code was by code inspection). Specific notes here:
1. In the old native transfer case if we got "done + wrong address"
we'd ignore the "wrong address" (because of the "else if"). Now we
won't.
2. In the old native transfer case if we got "done + timeout" we'd
ignore the "timeout" (because of the "else if"). Now we won't.
3. In the old native transfer case we'd see "nack_defer" and translate
it to the error number for "nack". This differed from the i2c
transfer case where "nack_defer" was given the error number for
"nack_defer". This 100% can't matter because the only user of this
error number treats "nack defer" the same as "nack", so it's clear
that the difference between the "native" and "i2c" was pointless
here.
4. In the old i2c transfer case if we got "done" plus any error
besides "nack" or "defer" then we'd ignore the error. Now we don't.
5. If there is more than one error signaled by the hardware it's
possible that we'll report a different one than we used to. I don't
know if this matters. If someone is aware of a case this matters we
should document it and change the code to make it explicit.
6. One quirk we keep (I don't know if this is important) is that in
the i2c transfer case if we see "done + defer" we report that as a
"nack". That seemed too intentional in the old code to just drop.
After this change we will add extra logging, including:
* A warning if we see more than one error bit set.
* A warning if we see an unexpected interrupt.
* A warning if we get an AUX transfer interrupt when shouldn't.
It actually turns out that as a result of this change then at boot we
sometimes see an error:
[drm:dp_aux_isr] *ERROR* Unexpected DP AUX IRQ 0x01000000 when not busy
That means that, during init, we are seeing DP_INTR_PLL_UNLOCKED. For
now I'm going to say that leaving this error reported in the logs is
OK-ish and hopefully it will encourage someone to track down what's
going on at init time.
One last note here is that this change renames one of the interrupt
bits. The bit named "i2c done" clearly was used for native transfers
being done too, so I renamed it to indicate this.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Reviewed-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Patchwork: https://patchwork.freedesktop.org/patch/520658/
Link: https://lore.kernel.org/r/20230126170745.v2.1.I90ffed3ddd21e818ae534f820cb4d6d8638859ab@changeid
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
The adreno_load_gpu() path is guarded by an error check on
adreno_load_fw(). This function is responsible for loading
Qualcomm-only-signed binaries (e.g. SQE and GMU FW for A6XX), but it
does not take the vendor-signed ZAP blob into account.
By embedding the SQE (and GMU, if necessary) firmware into the
initrd/kernel, we can trigger and unfortunate path that would not bail
out early and proceed with gpu->hw_init(). That will fail, as the ZAP
loader path will not find the firmware and return back to
adreno_load_gpu().
This error path involves pm_runtime_put_sync() which then calls idle()
instead of suspend(). This is suboptimal, as it means that we're not
going through the clean shutdown sequence. With at least A619_holi, this
makes the GPU not wake up until it goes through at least one more
start-fail-stop cycle. The pm_runtime_put_sync that appears in the error
path actually does not guarantee that because of the earlier enabling of
runtime autosuspend.
Fix that by using pm_runtime_put_sync_suspend to force a clean shutdown.
Test cases:
1. All firmware baked into kernel
2. error loading ZAP fw in initrd -> load from rootfs at DE start
Both succeed on A619_holi (SM6375) and A630 (SDM845).
Fixes: 0d997f95b7 ("drm/msm/adreno: fix runtime PM imbalance at gpu load")
Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Reviewed-by: Johan Hovold <johan+linaro@kernel.org>
Patchwork: https://patchwork.freedesktop.org/patch/530001/
Link: https://lore.kernel.org/r/20230330231517.2747024-1-konrad.dybcio@linaro.org
Signed-off-by: Rob Clark <robdclark@chromium.org>
This series adds a deadline hint to fences, so realtime deadlines
such as vblank can be communicated to the fence signaller for power/
frequency management decisions.
This is partially inspired by a trick i915 does, but implemented
via dma-fence for a couple of reasons:
1) To continue to be able to use the atomic helpers
2) To support cases where display and gpu are different drivers
See https://patchwork.freedesktop.org/series/93035/
This does not yet add any UAPI, although this will be needed in
a number of cases:
1) Workloads "ping-ponging" between CPU and GPU, where we don't
want the GPU freq governor to interpret time stalled waiting
for GPU as "idle" time
2) Cases where the compositor is waiting for fences to be signaled
before issuing the atomic ioctl, for example to maintain 60fps
cursor updates even when the GPU is not able to maintain that
framerate.
Signed-off-by: Rob Clark <robdclark@chromium.org>
For an atomic commit updating a single CRTC (ie. a pageflip) calculate
the next vblank time, and inform the fence(s) of that deadline.
v2: Comment typo fix (danvet)
v3: If there are multiple CRTCs, consider the time of the soonest vblank
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Will be used in the next commit to set a deadline on fences that an
atomic update is waiting on.
v2: Calculate time at *start* of vblank period, not end
v3: Fix kbuild complaints
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Mario Kleiner <mario.kleiner.de@gmail.com>
As the finished fence is the one that is exposed to userspace, and
therefore the one that other operations, like atomic update, would
block on, we need to propagate the deadline from from the finished
fence to the actual hw fence.
v2: Split into drm_sched_fence_set_parent() (ckoenig)
v3: Ensure a thread calling drm_sched_fence_set_deadline_finished() sees
fence->parent set before drm_sched_fence_set_parent() does this
test_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT).
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
We had all of the internal driver APIs, but not the all important
userspace uABI, in the dma-buf doc. Fix that. And re-arrange the
comments slightly as otherwise the comments for the ioctl nr defines
would not show up.
v2: Fix docs build warning coming from newly including the uabi header
in the docs build
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Pekka Paalanen <pekka.paalanen@collabora.com>
Add a way to set a deadline on remaining resv fences according to the
requested usage.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Christian König <christian.koenig@amd.com>
Propagate the deadline to all the fences in the chain.
v2: Use dma_fence_chain_contained [Tvrtko]
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Christian König <christian.koenig@amd.com>
Now that everything that controls which LRU an obj lives in *except* the
backing pages is protected by the LRU lock, add a special path to unpin
in the job_run() path, where we are assured that we already have backing
pages and will not be racing against eviction (because the GEM object's
dma_resv contains the fence that will be signaled when the submit/job
completes).
Signed-off-by: Rob Clark <robdclark@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/527845/
Link: https://lore.kernel.org/r/20230320144356.803762-10-robdclark@gmail.com
Since the LRU lock is already acquired when moving an obj between LRUs,
we can use it to protect pin_count and madv, without any significant
change in locking (ie. it just expands the scope of the lock by a hand-
ful of instructions). This prepares the way to decrement the pin_count
in the job_run() path without needing to hold the obj lock, to avoid a
potential deadlock (or rather stall) caused by the fence-signaling path
(job_run()) blocking on shrinker/reclaim. (Only a stall because the
wait for fence signaling wait_for_idle() is not infinite.)
Signed-off-by: Rob Clark <robdclark@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/527843/
Link: https://lore.kernel.org/r/20230320144356.803762-9-robdclark@gmail.com
The flags are only accessed (1) when submit is constructed, before
enqueuing to gpu sched (ie. when still visible to only the task calling
the submit ioctl), (2) here, where we own a reference to the submit and
are serialized on the gpu sched thread, and (3) after the submit is
retired and last reference is dropped, which is serialized on the
submit's reference count. Hence locking is unneeded here.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/527830/
Link: https://lore.kernel.org/r/20230320144356.803762-3-robdclark@gmail.com
Add a way to hint to the fence signaler of an upcoming deadline, such as
vblank, which the fence waiter would prefer not to miss. This is to aid
the fence signaler in making power management decisions, like boosting
frequency as the deadline approaches and awareness of missing deadlines
so that can be factored in to the frequency scaling.
v2: Drop dma_fence::deadline and related logic to filter duplicate
deadlines, to avoid increasing dma_fence size. The fence-context
implementation will need similar logic to track deadlines of all
the fences on the same timeline. [ckoenig]
v3: Clarify locking wrt. set_deadline callback
v4: Clarify in docs comment that this is a hint
v5: Drop DMA_FENCE_FLAG_HAS_DEADLINE_BIT.
v6: More docs
v7: Fix typo, clarify past deadlines
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Christian König <christian.koenig@amd.com>
Acked-by: Pekka Paalanen <pekka.paalanen@collabora.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
A recent commit moved enabling of runtime PM to GPU load time (first
open()) but failed to update the error paths so that runtime PM is
disabled if initialisation of the GPU fails. This would trigger a
warning about the unbalanced disable count on the next open() attempt.
Note that pm_runtime_put_noidle() is sufficient to balance the usage
count when pm_runtime_put_sync() fails (and is chosen over
pm_runtime_resume_and_get() for consistency reasons).
Fixes: 4b18299b33 ("drm/msm/adreno: Defer enabling runpm until hw_init()")
Cc: stable@vger.kernel.org # 6.0
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Patchwork: https://patchwork.freedesktop.org/patch/524971/
Link: https://lore.kernel.org/r/20230303164807.13124-3-johan+linaro@kernel.org
Signed-off-by: Rob Clark <robdclark@chromium.org>