Most of the context workarounds tweak masked registers, but not all. For
masked registers, when writing the value it's sufficient to just write
the wa->set_bits since that will take care of both the clr and set bits
as well as not overwriting other bits.
However there are some workarounds, the registers are non-masked. Up
until now the driver was simply emitting a MI_LOAD_REGISTER_IMM with the
set_bits to program the register via the GPU in the WA bb. This has the
side effect of overwriting the content of the register outside of bits
that should be set and also doesn't handle the bits that should be
cleared.
Kenneth reported that on DG2, mesa was seeing a weird behavior due to
the kernel programming of L3SQCREG5 in dg2_ctx_gt_tuning_init(). With
the GPU idle, that register could be read via intel_reg as 0x00e001ff,
but during a 3D workload it would change to 0x0000007f. So the
programming of that tuning was affecting more than the bits in
L3_PWM_TIMER_INIT_VAL_MASK. Matt Roper noticed the lack of rmw for the
context workarounds due to the use of MI_LOAD_REGISTER_IMM.
So, for registers that are not masked, read its value via mmio, modify
and then set it in the buffer to be written by the GPU. This should take
care in a simple way of programming just the bits required by the
tuning/workaround. If in future there are registers that involved that
can't be read by the CPU, a more complex approach may be required like
a) issuing additional instructions to read and modify; or b) scan the
golden context and patch it in place before saving it; or something
else. But for now this should suffice.
Scanning the context workarounds for all platforms, these are the
impacted ones with the respective registers
mtl: DRAW_WATERMARK
mtl/dg2: XEHP_L3SQCREG5, XEHP_FF_MODE2
ICL has some non-masked registers in the context workarounds:
GEN8_L3CNTLREG, IVB_FBC_RT_BASE and VB_FBC_RT_BASE_UPPER, but there
shouldn't be an impact. The first is already being manually read and the
other 2 are intentionally overwriting the entire register. Same
reasoning applies to GEN12_FF_MODE2: the WA is intentionally
overwriting all the bits to avoid a read-modify-write.
v2: Reword commit message wrt GEN12_FF_MODE2 and the changed behavior
on preparatory patches.
v3: Also skip reading if clear|set bits covers everything
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Matt Roper <matthew.d.roper@intel.com>
Link: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23783#note_1968971
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20230630203509.1635216-4-lucas.demarchi@intel.com
Right now context workarounds don't do a rmw and instead only write to
the register. Since 2 separate programmings to the same register are
coalesced into a single write, this is not problematic for
GEN12_FF_MODE2 since both TDS and GS timer are going to be written
together and the other remaining bits be zeroed.
However in order to fix other workarounds that may want to preserve the
unrelated bits in the same register, context workarounds need to
be changed to a rmw. To prepare for that, move the programming of
GEN12_FF_MODE2 to a single place so the value passed for "clear" can
be all the bits. Otherwise the second workaround would be dropped as
it'd be detected as overwriting a previously programmed workaround.
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230630203509.1635216-3-lucas.demarchi@intel.com
kmap() has been deprecated in favor of the kmap_local_page()
due to high cost, restricted mapping space, the overhead of a
global lock for synchronization, and making the process sleep
in the absence of free slots.
kmap_local_page() is faster than kmap() and offers thread-local
and CPU-local mappings, can take pagefaults in a local kmap region
and preserves preemption by saving the mappings of outgoing tasks
and restoring those of the incoming one during a context switch.
The mapping is kept thread local in the function
“i915_vma_coredump_create” in i915_gpu_error.c
Therefore, replace kmap() with kmap_local_page().
Suggested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Sumitra Sharma <sumitraartsy@gmail.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230617180420.GA410966@sumitra.com
[tursulin: Removed blank line within tags. Fixup commit text.]
This workaround was already implemented for DG2, PVC, and some steppings
of MTL, but the workaround database has now been updated to extend this
workaround to TGL, RKL, DG1, and ADL.
v2:
- Skip readback verification for these extra gen12lp platforms. On
some of the platforms, the firmware locks this register, preventing
the driver from making any modifications. We should still try to
apply the workaround, but if the register is locked and the value
doesn't stick, that's semi-expected and not something we want to flag
as a driver error on debug builds.
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Haridhar Kalvala <haridhar.kalvala@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230616225041.3922719-1-matthew.d.roper@intel.com
The scenario being fixed here is depicted in the following sequence-
modprobe i915
echo 1 > /sys/class/drm/card0/gt/gt0/slpc_ignore_eff_freq
echo 300 > /sys/class/drm/card0/gt_min_freq_mhz (RPn)
cat /sys/class/drm/card0/gt_cur_freq_mhz --> cur == RPn as expected
echo 1 > /sys/kernel/debug/dri/0/gt0/reset --> reset
cat /sys/class/drm/card0/gt_min_freq_mhz --> cached freq is RPn
cat /sys/class/drm/card0/gt_cur_freq_mhz --> it's not RPn, but RPe!!
When SLPC reinitializes, it sets SLPC min freq to efficient frequency.
Even if we disable efficient freq post that, we should restore the cached
min freq (via H2G) for it to take effect.
v2: Clarify commit message (Ashutosh)
Fixes: 95ccf312a1 ("drm/i915/guc/slpc: Allow SLPC to use efficient frequency")
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230621014257.1769564-1-vinay.belgaumkar@intel.com
intel_gsc_uc_fw_proxy_init_done is used by a few code paths
and usages. However, certain paths need a wakeref while others
can't take a wakeref such as from the runtime_pm_resume callstack.
Add a param into this helper to allow callers to direct whether
to take the wakeref or not. This resolves the following bug:
INFO: task sh:2607 blocked for more than 61 seconds.
Not tainted 6.3.0-pxp-gsc-final-jun14+ #297
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:sh state:D stack:13016 pid:2607 ppid:2602 flags:0x00004000
Call Trace:
<TASK>
__schedule+0x47b/0xe10
schedule+0x58/0xd0
rpm_resume+0x1cc/0x800
? __pfx_autoremove_wake_function+0x10/0x10
__pm_runtime_resume+0x42/0x80
__intel_runtime_pm_get+0x19/0x80 [i915]
gsc_uc_get_fw_status+0x10/0x50 [i915]
intel_gsc_uc_fw_init_done+0x9/0x20 [i915]
intel_gsc_uc_load_start+0x5b/0x130 [i915]
__uc_resume+0xa5/0x280 [i915]
intel_runtime_resume+0xd4/0x250 [i915]
? __pfx_pci_pm_runtime_resume+0x10/0x10
__rpm_callback+0x3c/0x160
Fixes: 8c33c3755b ("drm/i915/gsc: take a wakeref for the proxy-init-completion check")
Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230615211940.4061378-1-alan.previn.teres.alexis@intel.com
The release and security versions of the GSC binary are not used at
runtime to decide interface compatibility (there is a separate version
for that), but they're still useful for debug, so it is still worth
extracting them and printing them out in dmesg.
To get to these version, we need to navigate through various headers in
the binary. See in-code comment for details.
v2: fix and improve size checks when crawling the binary header, add
comment about the different version, wrap the partition base/offset
pairs in the GSC header in a struct (Alan)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230612181529.2222451-3-daniele.ceraolospurio@intel.com
A few fixes/updates are required around the GSC memory allocation and it
is easier to do them all at the same time. The changes are as follows:
1 - Switch the memory allocation to stolen memory. We need to avoid
accesses from GSC FW to normal memory after the suspend function has
completed and to do so we can either switch to using stolen or make sure
the GSC is gone to sleep before the end of the suspend function. Given
that the GSC waits for a bit before going idle even if there are no
pending operations, it is easier and quicker to just use stolen memory.
2 - Reduce the GSC allocation size to 4MBs, which is the POR requirement.
The 8MBs were needed only for early FW and I had misunderstood that as
being the expected POR size when I sent the original patch.
3 - Perma-map the GSC allocation. This isn't required immediately, but it
will be needed later to be able to quickly extract the GSC logs, which are
inside the allocation. Since the mapping code needs to be rewritten due to
switching to stolen, it makes sense to do the switch immediately to avoid
having to change it again later.
Note that the explicit setting of CACHE_NONE for Wa_22016122933 has been
dropped because that's the default setting for stolen memory on !LLC
platforms.
v2: only memset the memory we're not overwriting (Alan)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230612181529.2222451-2-daniele.ceraolospurio@intel.com
On DG2, capturing OA reports while running heavy render workloads
sometimes results in invalid OA reports where 64-byte chunks inside
reports have stale values. Under memory pressure, high OA sampling rates
(13.3 us) and heavy render workload, occasionally, the OA HW TAIL
pointer does not progress as fast as the sampling rate. When these
glitches occur, the TAIL pointer takes approx. 200us to progress. While
this is expected behavior from the HW perspective, invalid reports are
not expected.
In oa_buffer_check_unlocked(), when we execute the if condition, we are
updating the oa_buffer.tail to the aging tail and then setting pollin
based on this tail value, however, we do not have a chance to rewind and
validate the reports prior to setting pollin. The validation happens
in a subsequent call to oa_buffer_check_unlocked(). If a read occurs
before this validation, then we end up reading reports up until this
oa_buffer.tail value which includes invalid reports. Though found on
DG2, this affects all platforms.
The aging tail logic is no longer necessary since we are explicitly
checking for landed reports.
Start by dropping the aging tail logic.
v2:
- Drop extra blank line
- Add reason to drop aging logic (Ashutosh)
- Add bug links (Ashutosh)
- rename aged_tail to read_tail
- Squash patches 3 and 1
v3: (Ashutosh)
- Remove extra spaces
- Remove gtt_offset from the pollin calculation
- s/Bug:/Link/ in commit message (checkpatch)
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/7484
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/7757
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230605193923.1836048-2-umesh.nerlige.ramappa@intel.com
Before we add the second step of the MTL HuC auth (via GSC), we need to
have the ability to differentiate between them. To do so, the huc
authentication check is duplicated for GuC and GSC auth, with
GSC-enabled binaries being considered fully authenticated only after
the GSC auth step.
To report the difference between the 2 auth steps, a new case is added
to the HuC getparam. This way, the clear media driver can start
submitting before full auth, as partial auth is enough for those
workloads.
v2: fix authentication status check for DG2
v3: add a better comment at the top of the HuC file to explain the
different approaches to load and auth (John)
v4: update call to intel_huc_is_authenticated in the pxp code to check
for GSC authentication
v5: drop references to meu and esclamation mark in huc_auth print (John)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com> #v2
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230531235415.1467475-5-daniele.ceraolospurio@intel.com
In the previous patch we extracted the offset of the legacy-style HuC
binary located within the GSC-enabled blob, so now we can use that to
load the HuC via DMA if the fuse is set that way.
Note that we now need to differentiate between "GSC-enabled binary" and
"loaded by GSC", so the former case has been renamed to "has GSC headers"
for clarity, while the latter is now based on the fuse instead of the
binary format. This way, all the legacy load paths are automatically
taken (including the auth by GuC) without having to implement further
code changes.
v2: s/is_meu_binary/has_gsc_headers/, clearer logs (John)
v3: split check for GSC access, better comments (John)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230531235415.1467475-4-daniele.ceraolospurio@intel.com
The new binaries that support the 2-step authentication contain the
legacy-style binary, which we can use for loading the HuC via DMA. To
find out where this is located in the image, we need to parse the
manifest of the GSC-enabled HuC binary. The manifest consist of a
partition header followed by entries, one of which contains the offset
we're looking for.
Note that the DG2 GSC binary contains entries with the same names, but
it doesn't contain a full legacy binary, so we need to skip assigning
the dma offset in that case (which we can do by checking the ccs).
Also, since we're now parsing the entries, we can extract the HuC
version that way instead of using hardcoded offsets.
Note that the GSC binary uses the same structures in its binary header,
so they've been added in their own header file.
v2: fix structure names to match meu defines (s/CPT/CPD/), update commit
message, check ccs validity, drop old version location defines.
v3: drop references to the MEU tool to reduce confusion, fix log (John)
v4: fix log for real (John)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com> #v2
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230531235415.1467475-3-daniele.ceraolospurio@intel.com
Now that each FW has its own reserved area, we can keep them always
pinned and skip the pin/unpin dance on reset. This will make things
easier for the 2-step HuC authentication, which requires the FW to be
pinned in GGTT after the xfer is completed.
Since the vma is now valid for a long time and not just for the quick
pin-load-unpin dance, the name "dummy" is no longer appropriare and has
been replaced with vma_res. All the functions have also been updated to
operate on vma_res for consistency.
Given that we pin the vma behind the allocator's back (which is ok
because we do the pinning in an area that was previously reserved for
thus purpose), we do need to explicitly re-pin on resume because the
automated helper won't cover us.
v2: better comments and commit message, s/dummy/vma_res/
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230531235415.1467475-2-daniele.ceraolospurio@intel.com
For conflict avoidance we need the following commit:
c9a9f18d3a drm/i915/huc: use const struct bus_type pointers
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>