The WA states that we need to alert the GSC FW before doing a GSC engine
reset and then wait for 200ms. The GuC owns engine reset, so on the i915
side we only need to apply this for full GT reset.
Given that we do full GT resets in the resume paths to cleanup the HW
state and that a long wait in those scenarios would not be acceptable,
a faster path has been introduced where, if the GSC is idle, we try first
to individually reset the GuC and all engines except the GSC and only fall
back to full reset if that fails.
Note: according to the WA specs, if the GSC is idle it should be possible
to only wait for the uC wakeup time (~15ms) instead of the whole 200ms.
However, the GSC FW team have mentioned that the wakeup time can change
based on other things going on in the HW and pcode, so a good security
margin would be required. Given that when the GSC is idle we already
skip the wait & reset entirely and that this reduced wait would still
likely be too long to use in resume paths, it's not worth adding support
for this reduced wait.
v2: add comment to explain why it is safe to skip the GSC reset (John)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: John Harrison <john.c.harrison@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230323231857.2194435-2-daniele.ceraolospurio@intel.com
Commit 3db9d59055 ("drm/i915/gt: Reset twice") modified the code to
always hit the GDRST register twice when doing a reset, with the
reported aim to fix invalid post-reset engine state on some platforms
(Jasperlake being the only one actually mentioned).
This is a problem on MTL, due to the fact that we have to apply a time
consuming WA (coming in the next patch) every time we hit the GDRST
register in a way that can include the GSC engine. Even post MTL, the
expectation is that we'll have some work to do before and after hitting
the GDRST if the GSC is involved.
Since the issue requiring the double reset seems to be limited to older
platforms, instead of trying to handle the double-reset on MTL and
future platforms it is just easier to turn it off. The default on MTL is
also for GuC to own engine reset, with i915 only covering full-GT reset.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230323231857.2194435-1-daniele.ceraolospurio@intel.com
Now that we may have multiple OA units in a single GT as well as on
separate GTs, create an engine group that maps to a single OA unit.
v2: (Jani)
- Drop warning on ENOMEM
- Reorder patch in the series
v3: (Ashutosh)
- Remove unused members from perf structs
- Update comments
- Update engine_supports_oa check
- Just return 1 in num_perf_groups_per_gt for now
- Set engine->oa_group to NULL to begin with
v4: Use engine_supports_oa() check in oa_init_reg_state (Ashutosh)
v5: Rebase after dropping engine_supports_oa helper
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230323225901.3743681-5-umesh.nerlige.ramappa@intel.com
There are multiple ways in which the GuC load can fail. The driver was
reporting the status register as is, but not everyone can read the
matrix unfiltered. So add decoding of the common error cases.
Also, remove the comment about interrupt based load completion
checking being not recommended. The interrupt was removed from the GuC
firmware some time ago so it is no longer an option anyway. While at
it, also abort the timeout if a known error code is reported. No need
to keep waiting if the GuC has already given up the load.
v2: Fix mis-matched case and confusing 'success' variable (Daniele).
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230316220632.3312218-2-John.C.Harrison@Intel.com
The commit 82a149a62b ("drm/i915/gt: move remaining debugfs
interfaces into gt") moved gt-related debugfs files in the gtX/
directories to operate on individual gt's.
However, the original files were only functioning on the root
GT (GT 0) and have been left in the same location to maintain
compatibility with userspace users.
Add multiplexing functionality to the higher directories' files.
This enables the operations to be performed on all the GTs with
a single write. In the case of reads, the files provide an or'ed
value across all the tiles.
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Maciej Patelczyk <maciej.patelczyk@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230318203616.183765-4-andi.shyti@linux.intel.com
To support multi-GT configurations, we need to generate
independent debug files for each GT.
To achieve this create a separate directory for each GT under the
debugfs directory. For instance, in a system with two GTs, the
debugfs structure would look like this:
/sys/kernel/debug/dri
└── 0
├── gt0
│ ├── drpc
│ ├── engines
│ ├── forcewake
│ ├── frequency
│ └── rps_boost
└── gt1
: ├── drpc
: ├── engines
: ├── forcewake
├── frequency
└── rps_boost
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230318203616.183765-2-andi.shyti@linux.intel.com
Zero-length arrays as fake flexible arrays are deprecated and we are
moving towards adopting C99 flexible-array members instead.
Address the following warning found with GCC-13 and
-fstrict-flex-arrays=3 enabled:
drivers/gpu/drm/i915/gem/i915_gem_context.c: In function ‘set_proto_ctx_engines.isra’:
drivers/gpu/drm/i915/gem/i915_gem_context.c:769:41: warning: array subscript n is outside array bounds of ‘struct i915_engine_class_instance[0]’ [-Warray-bounds=]
769 | if (copy_from_user(&ci, &user->engines[n], sizeof(ci))) {
| ^~~~~~~~~~~~~~~~~
./include/uapi/drm/i915_drm.h:2494:43: note: while referencing ‘engines’
2494 | struct i915_engine_class_instance engines[0];
This helps with the ongoing efforts to tighten the FORTIFY_SOURCE
routines on memcpy() and help us make progress towards globally
enabling -fstrict-flex-arrays=3 [1].
Link: https://github.com/KSPP/linux/issues/21
Link: https://github.com/KSPP/linux/issues/271
Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602902.html [1]
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZBSu2QsUJy31kjSE@work
Expose intel_rps_read_actual_frequency_fw to read the actual freq without
taking forcewake for use by PMU. The code is refactored to use a common set
of functions across sysfs and PMU. Using common functions with sysfs in PMU
solves the issues of missing support for MTL and missing support for older
generations (prior to Gen6). It also future proofs the PMU where sometimes
code has been updated for sysfs and PMU has been missed.
v2: Remove runtime_pm_if_in_use from read_actual_frequency_fw (Tvrtko)
v3: (Tvrtko)
- Remove goto in __read_cagf
- Unexport intel_rps_get_cagf and intel_rps_read_punit_req
Fixes: 22009b6dad ("drm/i915/mtl: Modify CAGF functions for MTL")
Link: https://gitlab.freedesktop.org/drm/intel/-/issues/8280
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230316004800.2539753-1-ashutosh.dixit@intel.com
On MTL, objects allocated through i915_gem_object_create_internal() are
mapped as uncached in GPU by default because HAS_LLC is false. However
in the live_hwsp_read selftest these watcher objects are mapped as WB
on CPU side. The conseqence is that the updates done by the GPU are not
immediately visible to CPU, thus the selftest is randomly failing due to
the stale data in CPU cache. Solution can be either setting WC for CPU +
UC for GPU, or WB for CPU + 1-way coherent WB for GPU.
To keep the consistency, let's simply inherit the same cache settings
from the timeline, which is WB for CPU + 1-way coherent WB for GPU,
because this test is supposed to emulate the behavior of the timeline
anyway.
Signed-off-by: Fei Yang <fei.yang@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230315180800.2632766-1-fei.yang@intel.com
Write-combining memory allows speculative reads by CPU.
ggtt->error_capture is WC mapped to CPU, so CPU/MMU can try
to prefetch memory beyond the error_capture, ie it tries
to read memory pointed by next PTE in GGTT.
If this PTE points to invalid address DMAR errors will occur.
This behaviour was observed on ADL and RPL platforms.
To avoid it, guard scratch page should be added after error_capture.
The patch fixes the most annoying issue with error capture but
since WC reads are used also in other places there is a risk similar
problem can affect them as well.
v2:
- modified commit message (I hope the diagnosis is correct),
- added bug checks to ensure scratch is initialized on gen3 platforms.
CI produces strange stacktrace for it suggesting scratch[0] is NULL,
to be removed after resolving the issue with gen3 platforms.
v3:
- removed bug checks, replaced with gen check.
v4:
- change code for scratch page insertion to support all platforms,
- add info in commit message there could be more similar issues
v5:
- check for nop_clear_range instead of gen8 (Tvrtko),
- re-insert scratch pages on resume (Tvrtko)
v6:
- use scratch_range callback to set scratch pages (Chris)
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230308-guard_error_capture-v6-2-1b5f31422563@intel.com
The Wa_14017073508 require to send Media Busy/Idle mailbox while
accessing Media tile. As of now it is getting handled while __gt_unpark,
__gt_park. But there are various corner cases where forcewakes are taken
without __gt_unpark i.e. without sending Busy Mailbox especially during
register reads. Forcewakes are taken without busy mailbox leads to
GPU HANG. So bringing mailbox calls under forcewake calls are no feasible
option as forcewake calls are atomic and mailbox calls are blocking.
The issue already fixed in B step so disabling MC6 on A step and
reverting previous commit which handles Wa_14017073508
Fixes: 8f70f1ec58 ("drm/i915/mtl: Add Wa_14017073508 for SAMedia")
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230310061339.2495416-2-badal.nilawar@intel.com
The CI results for the 'fast request' patch set (enables error return
codes for fire-and-forget H2G messages) hit an issue with the KMD
sending context submission requests on an invalid context. That was
caused by a fault injection probe failing the context creation of a
kernel context. However, there was no return code checking on any of
the kernel context registration paths. So the driver kept going and
tried to use the kernel context for the record defaults process.
This would not cause any actual problems. The invalid requests would
be rejected by GuC and ultimately the start up sequence would
correctly wedge due to the context creation failure. But fixing the
issue correctly rather ignoring it means we won't get CI complaining
when the fast request patch lands and enables the extra error checking.
So fix it by checking for errors and aborting as appropriate when
creating kernel contexts. While at it, clean up some other submission
init related failure cleanup paths. Also, rename guc_init_lrc_mapping
to guc_init_submission as the former name hasn't been valid in a long
time.
v2: Add another wrapper to keep the flow balanced (Daniele)
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230217223308.3449737-3-John.C.Harrison@Intel.com
The GSC FW load is a slow process (up to 250 ms), so we defer it to a
dedicated worker to avoid stalling the init flow for that long. However,
we currently start this worker before the HW init is complete, so there
is a chance that the GSC loading code submits to the HW before the
engine initialization has completed. We can easily fix this by starting
the thread later in the gt_resume flow.
From this later spot, the GSC code can still race with the default
submission code; we functionally don't care who wins the race (the GSC
load doesn't need any state), but since the whole point of the separate
worker is to make the main thread faster, we prefer the default
submission code to run first. Therefore, make an exception for driver
probe and only and start the gsc load from uc_init_late.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230223172120.3304293-3-daniele.ceraolospurio@intel.com
If we unload the driver and wedge before the GSC worker is complete,
the worker will hit an error on its submission to the GSC engine and
then exit. This is hard to hit for a user, but it is reproducible
with skipping selftests. The error is handled gracefully by the
worker, so there are no functional issues, but we still end up with
an error message in dmesg, which is something we want to avoid as
this is a supported scenario. We could modify the worker to better
handle a wedging occurring during its execution, but that gets
complicated for a couple of reasons:
- We do want the error on runtime wedging, because there are
implications for subsystems outside of GT (i.e., PXP, HDCP), it's
only the error on driver unload that we want to silence.
- The worker is responsible for multiple submissions (GSC FW load,
HuC auth, SW proxy), so all of those will have to be adapted to
handle the wedged_on_fini scenario.
Therefore, it's much simpler to just wait for the worker to be done
before wedging on driver removal, also considering that the worker
will likely already be idle in the great majority of non-selftest
scenarios.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230223172120.3304293-2-daniele.ceraolospurio@intel.com