Our simulator supports idle check so no need anymore to check if pdev
exists.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Ofir Bitton <obitton@habana.ai>
When the driver needs to abort waiters for interrupts, for cases
such as critical events that occur and driver need to do hard reset,
in such scenario the driver will complete the fence to wake up the
waiting thread, and will set the fence error indication.
The return value of the completion API will be greater than 0
since it will return the timeout, but as this indicates successful
completion, the driver should mark it as aborted.
Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Protect against concurrency of user requesting to register a timestamp
offset (where the driver fills the timestamp when the command submission
has finished executing) to a specific user interrupt ID. The
protection is basically to allow only one timestamp registration
request to be handled at a time.
This is needed because the user can decide to re-use a timestamp
offset (register an already registered offset, to a different
interrupt ID). This means the request will cause the timestamp node to
move from one interrupt list to another interrupt list. In such
scenario, without proper protection, we could end up adding the same
node twice to the interrupts wait lists.
Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Every time an FD is returned to the user, the driver adds
a corresponding private structure to the list.
Yet, it's still a list of private structures rather than of FDs.
Remove, as well, an unnecessary comment.
Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Because we might still be using related resources, decrementing PID's
reference count should be done at later stages of the device release.
A good place is right after the representing private structure is
removed from LKD's list.
Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
As part of driver teardown, we attempt to kill all user processes.
It shouldn't fail, but if it does we want to print the error code that
the kapi returned to us.
Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
The preboot used to statically allocate memory for the comms descriptor
on the device memory when driver requested the descriptor information.
Now preboot moved to dynamic memory allocation where it wants to check
the size the driver expects vs. what the f/w expects.
Note there are no backward compatibility issues as older f/w versions
simply ignore this value.
Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Because in this case we have only a single possible cause, we can
safely stop fetching the cause from firmware.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
hl_device_status() returns the status of an acquired device.
If a device is going down (following an rmmod cmd),
it should be marked as an unusable/malfunctioning device, and
hence should not be acquired.
However, since this was not the case so far (i.e., a device going
down would inaccurately return 'in reset' status allowing the user
to acquire the device) it introduced a bug where as part of a reset
flow, the driver could not kill processes that have not run yet, and
since those processes aren't blocked from reacquiring a device,
we get eventually a new flow of a driver attempting to kill all
processes in a list that can't be ever really empty.
Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
It is useful for debug to know which user process have acquired the
device.
Add this info to the relevant debug print, in addition to the already
printed user context's ASID.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
When an ioctl fails, it is useful to know what is the task command name
and the full ioctl request code, in addition to the task pid and the
ioctl number.
Add the additional information to the relevant debug error prints.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
In order for user to be aware of undefined opcode events, we must
store all relevant information and notify user about the failure.
The user will fetch the stored info via info ioctl.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
If hl_device_cond_reset() is called while a reset is already pending but
hasn't started, the reset request will be dropped.
If the flags of the new request are more severe, e.g. a hard reset while
the pending reset is a compute reset, the eventual reset won't be
suitable for the device status.
To prevent such cases, update the pending reset flags with the new
requests flags before the requests are dropped.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
When a H/W event is received while a user is registered to events, no
immediate hard reset will happen, and instead the user will be notified
and will have some time to handle it and eventually release the
device, after which the reset will be done.
If a user, as part of the handling and as part of the cleanup steps
towards releasing the device, unregisters from receiving those events,
and at that time an adjacent H/W event is received, it will be assumed
that the user is not registered to events and thus an immediate hard
reset is required.
To prevent such an unwanted immediate reset, modify the driver to
perform it if the user is not registered to events AND we don't already
have a pending reset for a previous H/W event.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
drm-misc-next for v6.7-rc1:
UAPI Changes:
- drm_file owner is now updated during use, in the case of a drm fd
opened by the display server for a client, the correct owner is
displayed.
- Qaic gains support for the QAIC_DETACH_SLICE_BO ioctl to allow bo
recycling.
Cross-subsystem Changes:
- Disable boot logo for au1200fb, mmpfb and unexport logo helpers.
Only fbcon should manage display of logo.
- Update freescale in MAINTAINERS.
- Add some bridge files to bridge in MAINTAINERS.
- Update gma500 driver repo in MAINTAINERS to point to drm-misc.
Core Changes:
- Move size computations to drm buddy allocator.
- Make drm_atomic_helper_shutdown(NULL) a nop.
- Assorted small fixes in drm_debugfs, DP-MST payload addition error handling.
- Fix DRM_BRIDGE_ATTACH_NO_CONNECTOR handling.
- Handle bad (h/v)sync_end in EDID by clipping to htotal.
- Build GPUVM as a module.
Driver Changes:
- Simple drivers don't need to cache prepared result.
- Call drm_atomic_helper_shutdown() in shutdown/unbind for a whole lot
more drm drivers.
- Assorted small fixes in amdgpu, ssd130x, bridge/it6621, accel/qaic,
nouveau, tc358768.
- Add NV12 for komeda writeback.
- Add arbitration lost event to synopsis/dw-hdmi-cec.
- Speed up s/r in nouveau by not restoring some big bo's.
- Assorted nouveau display rework in preparation for GSP-RM,
especially related to how the modeset sequence works and
the DP sequence in relation to link training.
- Update anx7816 panel.
- Support NVSYNC and NHSYNC in tegra.
- Allow multiple power domains in simple driver.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/f1fae5eb-25b8-192a-9a53-215e1184ce81@linux.intel.com
Recompute the state of all CRTCs on an FDI link during a modeset that
may be affected by the modeset of other CRTCs on the same link. This
ensures that each CRTC on the link maximizes its BW use (after another
CRTC is disabled).
In practice this means recomputing pipe B's config on IVB if pipe C gets
disabled.
v2:
- Add the change recomputing affected CRTC states in a separate patch.
(Ville)
v3: (Ville)
- Constify old and new crtc states.
- Check for fused off pipe C.
- Fix new vs. old crtc state mixup.
- Drop check for pipe C's enabled state.
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230921195159.2646027-12-imre.deak@intel.com
At the moment modesetting pipe C on IVB will fail if pipe B uses 4 FDI
lanes. Make the BW sharing more dynamic by trying to reduce pipe B's
link bpp in this case, until pipe B uses only up to 2 FDI lanes.
For this instead of the encoder compute config retry loop - which
reduced link bpp only for the encoder's pipe - reduce the maximum link
bpp for pipe B/C as required after all CRTC states are computed and
recompute the CRTC states with the new bpp limit.
Atm, all FDI encoder's compute config function returns an error if a BW
constrain prevents increasing the pipe bpp value. The corresponding
crtc_state->bw_constrained check can be replaced with checking
crtc_state->max_link_bpp_x16, add TODO comments for this. SDVO is an
exception where this case is only handled in the outer config retry
loop, failing the modeset with a WARN, add a FIXME comment to handle
this in the encoder code similarly to other encoders.
v2:
- Don't assume that a CRTC is already in the atomic state, while
reducing its link bpp.
- Add DocBook description to intel_fdi_atomic_check_link().
v3:
- Enable BW management for FDI links in a separate patch. (Ville)
v4: (Ville)
- Fail the SDVO encoder config computation if it doesn't support the
link bpp limit.
- Add TODO: comments about checking link_bpp_x16 instead of
bw_constrained.
v5:
- Replace link bpp limit check with a FIXME: comment in
intel_sdvo_compute_config(). (Ville)
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
[Amended commit message wrt. changes in v5]
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230921195159.2646027-11-imre.deak@intel.com
At the moment a modeset fails if the config computation of a pipe can't
fit its required BW to the available link BW even though the limitation
may be resolved by reducing the BW requirement of other pipes.
To improve the above this patch adds helper functions checking the
overall BW limits after all CRTC states have been computed. If the check
fails the maximum link bpp for a selected pipe will be reduced and all
the CRTC states will be recomputed until either the overall BW limit
check passes, or further bpp reduction is not possible (because all
pipes/encoders sharing the link BW reached their minimum link bpp).
Atm, the MST encoder allocates twice the required BW for YUV420 format
streams. A follow-up patchset will fix that, add a code comment about
this.
This change prepares for upcoming patches enabling the above BW
management on FDI and MST links.
v2:
- Rename intel_crtc_state::max_link_bpp to max_link_bpp_x16 and
intel_link_bw_limits::max_bpp to max_bpp_x16. (Jani)
v3:
- Add the helper functions in a separate patch. (Ville)
- Add the functions to intel_link_bw.c instead of intel_atomic.c (Ville)
- Return -ENOSPC instead of -EINVAL to userspace in case of a link BW
limit failure.
v4:
- Make intel_atomic_check_config() static.
v5: (Ville)
- Rename intel_link_bw_limits::min_bpp_pipes to min_bpp_reached_pipes
and intel_link_bw_reset_pipe_limit_to_min() to
intel_link_bw_set_min_bpp_for_pipe().
- Rename pipe_bpp to link_bpp in intel_link_bw_reduce_bpp().
- Add FIXME: comment about MST encoder's YUV420 BW allocation and
tracking the link bpp limit accordingly.
v6:
- Move intel_link_bw_compute_pipe_bpp() to intel_fdi.c (Ville)
- WARN_ON(BIT(pipe) & min_bpp_reached_pipes) in
intel_link_bw_set_bpp_limit_for_pipe(). (Ville)
- Rename intel_link_bw_set_min_bpp_for_pipe() to
intel_link_bw_set_bpp_limit_for_pipe() and
intel_link_bw_limits::min_bpp_reached_pipes to
bpp_limit_reached_pipes. (Ville)
- Remove unused header includes.
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230921195159.2646027-10-imre.deak@intel.com
Add intel_modeset_pipes_in_mask_early() to modeset a provided set of
pipes, used in a follow-up patch.
As opposed to intel_modeset_all_pipes() which modesets only the active
pipes - others don't requiring programming the HW - modeset all enabled
pipes in intel_modeset_pipes_in_mask_early() which may need to recompute
their state even if they are not active (that is in the DPMS off state).
While at it add DocBook descriptions for the two exported functions.
v2:
- Add a flag controlling if active planes are force updated as well.
- Add DockBook descriptions.
v3:
- For clarity use _early/_late suffixes for the exported functions
instead of the update_active_planes parameter. (Ville)
v4:
- In intel_modeset_pipes_in_mask_early() update only the crtc
flags relevant to the early phase. (Ville)
- Rename intel_modeset_all_pipes() in a separate patch.
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230921195159.2646027-7-imre.deak@intel.com
In non-DSC mode the link bpp can be set in 2*3 bpp steps in the pipe bpp
range, while in DSC mode it can be set in 1/16 bpp steps to any value
up to the maximum pipe bpp. Update the limits accordingly in both modes
to prepare for a follow-up patch which may need to reduce the max link
bpp value and starts to check the link bpp limits in DSC mode as well.
While at it add more detail to the link limit debug print and print it
also for DSC mode.
v2:
- Add to_bpp_frac_dec() instead of open coding it. (Jani)
v3: (Ville)
- Add BPP_X16_FMT / BPP_X16_ARG.
- Add TODO: comment about initializing the DSC link bpp limits earlier.
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230921195159.2646027-5-imre.deak@intel.com
A follow-up patch will need to limit the output link bpp both in the
non-DSC and DSC configuration, so track the pipe and link bpp limits
separately in the link_config_limits struct.
Use .4 fixed point format for link bpp matching the 1/16 bpp granularity
in DSC mode and for now keep this limit matching the pipe bpp limit.
v2: (Jani)
- Add to_bpp_int(), to_bpp_x16() helpers instead of opencoding them.
- Rename link_config_limits::link.min/max_bpp to min/max_bpp_x16.
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Reviewed-by: Luca Coelho <luciano.coelho@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230921195159.2646027-3-imre.deak@intel.com
Normally we could be in a deep PkgC state all the way up to the
point when DSB starts its execution at the transcoders undelayed
vblank. The DSB will then have to wait for the hardware to
wake up before it can execute anything. This will waste a huge
chunk of the vblank time just waiting, and risks the DSB execution
spilling into the vertical active period. That will be very bad,
especially when programming the LUTs as the anti-collision logic
will cause DSB to corrupt LUT writes during vertical active.
To avoid these problems we can instruct the DSB to pre-wake the
display engine on a specific scanline so that everything will
be 100% ready to go when we hit the transcoder's undelayed vblank.
One annoyance is that the scanline is specified as just that,
a single scanline. So if we happen to start the DSB execution
after passing said scanline no DEwake will happen and we may drop
back into some PkgC state before reaching the transcoder's undelayed
vblank. To prevent that we'll use the "force DEwake" bit to manually
force the display engine to stay awake. We'll then have to clear
the force bit again after the DSB is done (the force bit remains
effective even when the DSB is otherwise disabled).
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230606191504.18099-18-ville.syrjala@linux.intel.com
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
We want to start the DSB execution from the transcoder's undelayed
vblank, so in order to guarantee atomicity with the all the other
mmio register writes we need to evade both vblanks.
Note that currently we don't add any vblank delay, so this is
effectively a nop. But in the future when we start to program
double buffered registers from the DSB we'll need to delay the
pipe's vblank to provide the register programming "window2"
for the DSB.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230606191504.18099-15-ville.syrjala@linux.intel.com
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Loading LUTs with the DSB outside of vblank doesn't really
work due to the palette anti-collision logic. Apparently the
DSB register writes don't get stalled like CPU mmio writes
do and instead we end up corrupting the LUT entries. Disabling
the anti-collision logic would allow us to successfully load
the LUT outside of vblank, but presumably that risks the LUT
reads from the scanout (temporarily) getting corrupted data
from the LUT instead.
The anti-collision logic isn't active during vblank so that
is when we can successfully load the LUT with the DSB. That is
what we want to do anyway to avoid tearing.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230606191504.18099-13-ville.syrjala@linux.intel.com
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Commit ade8a0f598 ("drm/i915: Make all GPU resets atomic") added a
preempt disable section over the hardware reset callback to prepare the
driver for being able to reset from atomic contexts.
In retrospect I can see that the work item at a time was about removing
the struct mutex from the reset path. Code base also briefly entertained
the idea of doing the reset under stop_machine in order to serialize
userspace mmap and temporary glitch in the fence registers (see
eb8d0f5af4 ("drm/i915: Remove GPU reset dependence on struct_mutex"),
but that never materialized and was soon removed in 2caffbf117
("drm/i915: Revoke mmaps and prevent access to fence registers across
reset") and replaced with a SRCU based solution.
As such, as far as I can see, today we still have a requirement that
resets must not sleep (invoked from submission tasklets), but no need to
support invoking them from a truly atomic context.
Given that the preemption section is problematic on RT kernels, since the
uncore lock becomes a sleeping lock and so is invalid in such section,
lets try and remove it. Potential downside is that our short waits on GPU
to complete the reset may get extended if CPU scheduling interferes, but
in practice that probably isn't a deal breaker.
In terms of mechanics, since the preemption disabled block is being
removed we just need to replace a few of the wait_for_atomic macros into
busy looping versions which will work (and not complain) when called from
non-atomic sections.
v2:
* Fix timeouts which are now in us. (Andi)
* Update one comment as a drive by. (Andi)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris.p.wilson@intel.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230926100855.61722-1-tvrtko.ursulin@linux.intel.com
Ideally the busyness worker should take a gt pm wakeref because the
worker only needs to be active while gt is awake. However, the gt_park
path cancels the worker synchronously and this complicates the flow if
the worker is also running at the same time. The cancel waits for the
worker and when the worker releases the wakeref, that would call gt_park
and would lead to a deadlock.
The resolution is to take the global pm wakeref if runtime pm is already
active. If not, we don't need to update the busyness stats as the stats
would already be updated when the gt was parked.
Note:
- We do not requeue the worker if we cannot take a reference to runtime
pm since intel_guc_busyness_unpark would requeue the worker in the
resume path.
- If the gt was parked longer than time taken for GT timestamp to roll
over, we ignore those rollovers since we don't care about tracking the
exact GT time. We only care about roll overs when the gt is active and
running workloads.
- There is a window of time between gt_park and runtime suspend, where
the worker may run. This is acceptable since the worker will not find
any new data to update busyness.
v2: (Daniele)
- Edit commit message and code comment
- Use runtime pm in the worker
- Put runtime pm after enabling the worker
- Use Link tag and add Fixes tag
v3: (Daniele)
- Reword commit and comments and add details
Link: https://gitlab.freedesktop.org/drm/intel/-/issues/7077
Fixes: 77cdd054dd ("drm/i915/pmu: Connect engine busyness stats from GuC to pmu")
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230925192117.2497058-1-umesh.nerlige.ramappa@intel.com
There is no reason to add gtt_offset to the cached head/tail pointers
stream->oa_buffer.head and stream->oa_buffer.tail. This causes the code to
constantly add gtt_offset and subtract gtt_offset and is error
prone.
It is much simpler to maintain stream->oa_buffer.head and
stream->oa_buffer.tail without adding gtt_offset to them and just allow for
the gtt_offset when reading/writing from/to HW registers.
v2: Minor tweak to commit message due to dropping patch in previous series
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230920040211.2351279-1-ashutosh.dixit@intel.com