Add the registers to get C6 residency of MTL SAMedia and
C6 status of MTL gts
v2:
- move register definitions to regs header (Anshuman)
- correct reg definition for mtl rc status
- make idle_status function common (Badal)
v3:
- remove extra line in commit message
- use only media type check in initialization
- use graphics ver check (Anshuman)
v4:
- remove extra lines (Anshuman)
Bspec: 66300
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
1) Add a new sysfs directory under devices/gt#/ called gtidle
to contain idle properties of GT such as name, idle_status,
idle_residency_ms
2) Remove forcewake calls for residency counter
v2:
- abstract using function pointers (Anshuman)
- remove forcewake calls for residency counter
- use device_attr (Badal)
- move rc functions to guc_pc
- change name to gt_idle (Rodrigo)
v3:
- return error for drmm_add_action_or_reset
- replace file and functions with gt_idle prefix
to gt_idle_sysfs (Himal)
- use enum for gt idle state
- move multiplier to gt idle and initialize (Anshuman)
- correct doc annotation (Rodrigo)
- remove return variable
- use kobj_gt instead of new gtidle kobj
- move residency_ms to gtidle file
- retain xe_guc_pc prefix for functions in guc_rc file (Michal)
v4:
- fix doc errors in xe_guc_pc file
- change u64 to u32 for reading residency counter
- keep gtidle states generic GT_IDLE_C[0/6] (Anshuman)
v5:
- update commit message to include removal of
forcewake calls (Anshuman)
- return void from sysfs initialization function and add warnings
(Andi)
v6:
- remove extra lines (Anshuman)
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
For VRAM allocations the bo->flags can control some characteristics of
the underlying memory, like whether it needs to be contiguous, and in
the future whether it needs to be in the CPU visible portion. Rather use
add_vram() in xe_bo_migrate() which should take care of such things for
us.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
When moving between PL_VRAM <-> PL_SYSTEM we have to have use PL_TT in
the middle as a temporary resource for the actual copy. In some GL
workloads it can be seen that once the resource has been moved to the
PL_TT we might have to bail out of the ttm_bo_validate(), before
finishing the final hop. If this happens the resource is left as
TTM_PL_FLAG_TEMPORARY, and when the ttm_bo_validate() is restarted the
current placement is always seen as incompatible, requiring us to
complete the move. However if the BO allows PL_TT as a possible
placement we can end up attempting a PL_TT -> PL_TT move (like when
running out of VRAM) which leads to explosions in xe_bo_move(), like
triggering the XE_BUG_ON(!tile).
Going from TTM_PL_FLAG_TEMPORARY with PL_TT -> PL_VRAM should already
work as-is, so it looks like we only need to worry about PL_TT -> PL_TT
and it looks like we can just treat it as a dummy move, since no real
move is needed.
Reported-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Use the TTM LRU bulk move for BOs tied to a VM. Update the bulk moves
LRU position on every exec.
v2: Bulk move for compute VMs, use WARN rather than BUG
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
We only need to try to lock a BO if it's external as non-external BOs
share the dma-resv with the already locked VM. Trying to lock
non-external BOs caused an issue (list corruption) in an uncoming patch
which adds bulk LRU move. Since this code isn't needed, remove it.
v2: New commit message, s/mattthew/matthew/
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
With our ref counting scheme long running (LR) engines only close
properly if not persistent, ensure that LR engines are non-persistent.
v2: spell out LR
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
For long running (LR) jobs with the DRM scheduler we must return NULL in
run_job which results in signaling the job's finished fence immediately.
This prevents LR jobs from creating infinite dma-fences.
Signaling job's finished fence immediately breaks flow controlling ring
with the DRM scheduler. To work around this, the ring is flow controlled
and written in the exec IOCTL. Signaling job's finished fence
immediately also breaks the TDR which is used in reset / cleanup entity
paths so write a new path for LR entities.
v2: Better commit, white space, remove rmb(), better comment next to
emit_job()
v3 (Thomas): Change LR reference counting, fix working in commit
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Add uAPI and implementation for NULL bindings. A NULL binding is defined
as writes dropped and read zero. A single bit in the uAPI has been added
which results in a single bit in the PTEs being set.
NULL bindings are intendedd to be used to implement VK sparse bindings,
in particular residencyNonResidentStrict property.
v2: Fix BUG_ON shown in VK testing, fix check patch warning, fix
xe_pt_scan_64K, update __gen8_pte_encode to understand NULL bindings,
remove else if vma_addr
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Suggested-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
The GuC can't access addresses above GUC_GGTT_TOP, so any GuC-accessible
objects can't be mapped above that offset. Instead of checking each
object to see if GuC may access it or not before mapping it, we just
limit the GGTT size to GUC_GGTT_TOP. This wastes a bit of address space
(about ~18 MBs, which is in addition to what already removed at the bottom
of the GGTT), but it is a good tradeoff to keep the code simple.
The in-code comment has also been updated to explain the limitation.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20230615002521.2587250-1-daniele.ceraolospurio@intel.com/
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
This adds a handful of workarounds that apply to production steppings of
MTL:
- Wa_14018575942
- Wa_22016670082
- Wa_14017856879
- Wa_18019271663
Wa_22016670082 is currently only applied to the primary GT at the
moment, but may need to be extended to the media GT in the future if a
pending update to the workaround database gets finalized.
OOB workarounds will need to be implemented separately in future patches
for Wa_14016712196, Wa_16018063123, and Wa_18013179988.
Reviewed-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Link: https://lore.kernel.org/r/20230608181217.2385932-1-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
It's not possible for the condition checking if we're running on
platform without geometry pipeline to ever be true, since
gt->fuse_topo.g_dss_mask is an array.
It also breaks the build:
../drivers/gpu/drm/xe/xe_rtp.c:183:50: error: address of array 'gt->fuse_topo.g_dss_mask' will always evaluate to 'true' [-Werror,-Wpointer-bool-conversion]
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230523135020.345596-2-michal@hardline.pl
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Using uninitialized variables leads to undefined behavior.
Moreover, it causes the compiler to complain with:
../drivers/gpu/drm/xe/xe_vm.c:3265:40: error: variable 'vma' is uninitialized when used here [-Werror,-Wuninitialized]
../drivers/gpu/drm/xe/xe_rtp.c:118:36: error: variable 'i' is uninitialized when used here [-Werror,-Wuninitialized]
../drivers/gpu/drm/xe/xe_mocs.c:449:3: error: variable 'flags' is uninitialized when used here [-Werror,-Wuninitialized]
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230523135020.345596-1-michal@hardline.pl
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
gt_count is only being incremented when initializing the primary GT;
since the media GT sets the ID directly, gt_count is not incremented
again, resulting in an incorrect count on MTL. Use autoincrement while
assigning the media GTs ID to ensure gt_count is correct on MTL and
other future platforms with standalone media.
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Link: https://lore.kernel.org/r/20230613094232.3703549-1-riana.tauro@intel.com
[mattrope: Tweaked commit message to focus on gt_count importance]
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
RPSTAT1 is an sgunit register and thus doesn't need forcewake.
MTL_MIRROR_TARGET_WP1 is within an "always on" power domain and thus
doesn't require any forcewake to ensure the register is powered
up and usable. When GT is RC6 the actual frequency reported will be 0.
v2:
- Add bspec index (Anshuman)
- %s/GEN12_RPSTAT1/GT_PERF_STATUS as per bspec
v3: Update Fixes tag
Bspec: 51837, 67651
Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230609024954.987039-1-badal.nilawar@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
This define is for internal PTE flags rather than fields in the hardware
PTEs, rename as such. This will help in an upcoming patch to avoid
further confusion.
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
A mix of the system unbound wq and Xe ordered wq was used for the
rebind, only use the Xe ordered wq. This will ensure only 1 rebind is
occuring at a time providing a somewhat clunky work around for short
comings in TTM wrt to memory contention. Once the TTM memory contention
is resolved we should be able to use a dedicated non-ordered workqueue.
Also add helper to queue rebind worker to avoid using wrong workqueue
going forward.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
For scratch table mode we need to cover the case where a scratch PTE might
have been pre-fetched and cached and used instead of that of the newly
bound vma.
For compute vms, invalidate TLB globally using GuC before signalling
bind complete. For !long-running vms, invalidate TLB at batch start.
Also document how TLB invalidation works.
v2:
- Fix a pointer to the comment about TLB invalidation (Jose Souza).
- Add a bool to the vm whether we want to invalidate TLB at batch start.
- Invalidate TLB also on BCS- and video engines at batch start where
needed.
- Use BIT() macro instead of explicit shift.
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Tested-by: José Roberto de Souza <jose.souza@intel.com> #v1
Reported-by: José Roberto de Souza <jose.souza@intel.com> #v1
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/291
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/291
Acked-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
The resizing of the PCI BAR is a best effort feature. If it is
not available, it should not fail the driver probe.
Rework the resize to not exit on failure.
Fixes: 7f075300a3 ("drm/xe: Simplify rebar sizing")
Acked-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reformat the GuC register header according to the same rules used by
other register headers:
- Register definitions are ordered by offset
- Value of #define's start on column 49
- Lowercase used for hex values
No functional change.
This header has some things that aren't directly related to register
definitions (e.g., number of doorbells, doorbell info structure, GuC
interrupt vector layout, etc. These items have been moved to the bottom
of the header.
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20230602235210.1314028-1-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Now that a higher GT count can result from either multiple tiles (with
one GT each) or an extra media GT within the root tile, we need to
update the query code slightly to stop looking at tile_count.
FIXME: As noted previously, we need to decide on a formal direction for
exposing tiles and/or GTs to userspace.
v2:
- Drop num_gt() function in favor of stored xe->info.gt_count. (Brian)
v3:
- Keep XE_QUERY_GT_TYPE_REMOTE around for now. Userspace probably
doesn't actually need this, and we may remove it in the future, but
for now let's avoid changing uapi. (Brian)
Cc: Brian Welty <brian.welty@intel.com>
Reviewed-by: Brian Welty <brian.welty@intel.com>
Link: https://lore.kernel.org/r/20230601215244.678611-30-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Allow xe_device_get_gt() and for_each_gt() to operate as expected on
platforms with standalone media.
FIXME: We need to figure out a consistent ID scheme for GTs. This patch
keeps the pre-existing behavior of 0/1 being the GT IDs for both PVC
(multi-tile) and MTL (multi-GT), but depending on the direction we
decide to go with uapi, we may change this in the future (e.g., to
return 0/1 on PVC and 0/2 on MTL). Or if we decide we only need to
expose tiles to userspace and not GTs, we may not even need ID numbers
for the GTs anymore.
v2:
- Restructure a bit to make the assertions more clear.
- Clarify in commit message that the goal here is to preserve existing
behavior; UAPI-visible changes may be introduced in the future once
we settle on what we really want.
v3:
- Store total GT count in xe_device for ease of lookup. (Brian)
- s/(id__++)/(id__)++/ (Gustavo)
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Gustavo Sousa <gustavo.sousa@intel.com>
Cc: Brian Welty <brian.welty@intel.com>
Acked-by: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20230601215244.678611-29-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
The majority of xe_gt_irq_postinstall() is really focused on the
hardware engine interrupts; other GT-related interrupts such as the GuC
are enabled/disabled independently. Renaming the function and making it
truly GT-specific will make it more clear what the intended focus is.
Disabling/masking of other interrupts (such as GuC interrupts) is
unnecessary since that has already happened during the irq_reset stage,
and doing so will become harmful once the media GT is re-enabled since
calls to xe_gt_irq_postinstall during media GT initialization would
incorrectly disable the primary GT's GuC interrupts.
Also, since this function is called from gt_fw_domain_init(), it's not
necessary to also call it earlier during xe_irq_postinstall; just
xe_irq_resume to handle runtime resume should be sufficient.
v2:
- Drop unnecessary !gt check. (Lucas)
- Reword some comments about enable/unmask for clarity. (Lucas)
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20230601215244.678611-26-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
The xe_irq_postinstall() never actually gets called after installing the
interrupt handler. This oversight seems to get papered over due to the
fact that the (misnamed) xe_gt_irq_postinstall does more than it really
should and gets called in the middle of the GT initialization. The
callstack for postinstall is also a bit muddled with top-level device
interrupt enablement happening within platform-specific functions called
from the per-tile xe_gt_irq_postinstall() function.
Clean this all up by adding the missing call to xe_irq_postinstall()
after installing the interrupt handler and pull top-level irq enablement
up to xe_irq_postinstall where we'd expect it to be.
The xe_gt_irq_postinstall() function is still a bit misnamed here; an
upcoming patch will refocus its purpose and rename it.
v2:
- Squash in patch to actually call xe_irq_postinstall() after
installing the interrupt handler.
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20230601215244.678611-25-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Although primary and media GuC share a single interrupt enable bit, they
each have distinct bits in the mask register. Although we always enable
interrupts for the primary GuC before the media GuC today (and never
disable either of them), this might not always be the case in the
future, so use a RMW when updating the mask register to ensure the other
GuC's mask doesn't get clobbered.
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20230601215244.678611-24-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
IRQ delivery and handling needs to be handled on a per-tile basis. Note
that this is true even for the "GT interrupts" relating to engines and
GuCs --- the interrupts relating to both GTs get raised through a single
set of registers in the tile's sgunit range.
On true multi-tile platforms, interrupts on remote tiles are internally
forwarded to the root tile; the first thing the top-level interrupt
handler should do is consult the root tile's instance of
DG1_MSTR_TILE_INTR to determine which tile(s) had interrupts. This
register is also responsible for enabling/disabling top-level reporting
of any interrupts to the OS. Although this register technically exists
on all tiles, it should only be used on the root tile.
The (mis)use of struct xe_gt as a target for MMIO operations in the
driver makes the code somewhat confusing since we wind up needing a GT
pointer to handle programming that's unrelated to the GT. To mitigate
this confusion, all of the xe_gt structures used solely as an MMIO
target in interrupt code are renamed to 'mmio' so that it's clear that
the structure being passed does not necessarily relate to any specific
GT (primary or media) that we might be dealing with interrupts for.
Reworking the driver's MMIO handling to not be dependent on xe_gt is
planned as a future patch series.
Note that GT initialization code currently calls xe_gt_irq_postinstall()
in an attempt to enable the HWE interrupts for the GT being initialized.
Unfortunately xe_gt_irq_postinstall() doesn't really match its name and
does a bunch of other stuff unrelated to the GT interrupts (such as
enabling the top-level device interrupts). That will be addressed in
future patches.
v2:
- Clarify commit message with explanation of why DG1_MSTR_TILE_INTR is
only used on the root tile, even though it's an sgunit register that
is technically present in each tile's MMIO space. (Aravind)
- Also clarify that the xe_gt used as a target for MMIO operations may
or may not relate to the GT we're dealing with for interrupts.
(Lucas)
Cc: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Link: https://lore.kernel.org/r/20230601215244.678611-22-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>