Instead of creating ad-hoc new register definitions with altered
register addresses to mimic the VF's access to these registers,
prepare new MMIO instance per required VF, with shifted internal
location of the register map. This will allow to use unmodified
register definitions in all calls to xe_mmio() functions.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patch.msgid.link/20251024205826.4652-1-michal.wajdeczko@intel.com
If subsequent VF FLR request is triggered when previous VF FLR
sequence is still being processed, we ignore it as not needed.
But in case of the multi-GT platforms, one GT may already finish
its VF FLR processing and will start a new sequence, which includes
new cross-GT synchronization point. However, since other GT may
be still busy with post-sync cleanup steps, this will put on hold
this new FLR sequence, which might never finish due to lack of any
future synchronization checkouts.
Add additional cross-GT FLR synchronization point when each GT
ends processing its own FLR sequence. This should also help to
cover the case when one GT fails FLR processing before reaching
the first synchronization point.
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6287
Fixes: 2a8fcf7cc9 ("drm/xe/pf: Synchronize VF FLR between all GTs")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patch.msgid.link/20251025124906.5264-1-michal.wajdeczko@intel.com
Early revisions of commit 7abd69278b ("drm/xe/configfs: Add attribute
to disable GT types") used MAX_GT_TYPE_CHARS not only to size the
constant name field, but also for some of the string matching logic. By
the time the patch finally landed, the constant was no longer needed for
parsing. Stop using it for the string field definition as well; this
eliminates the risk that we forget to update the constant if we ever add
a GT type name longer than seven characters.
Suggested-by: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251024200834.1512329-2-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
The bspec was originally missing the information related to steering of
L3-related ranges. Now that a late-breaking spec update has added the
necessary information, implement the steering rules in the code. Note
that the sole L3BANK range is the same as the one used on Xe_LPG, so we
can re-use the existing table for that MCR type.
Bspec: 74418
Fixes: be614ea19d ("drm/xe/xe3p_xpc: Add MCR steering")
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20251021224556.437970-3-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Early versions of the B-spec originally indicated that Xe3p_XPC had two
ranges of PSMI registers requiring MCR steering (one starting at 0xB500,
one starting at 0xB600), and that reads of registers in these ranges
required different grpid values to ensure that a non-terminated value is
obtained. A late-breaking spec update has simplified this; both ranges
can be safely steered to grpid=0 for reads.
Drop the "PSMI19" replication type and related code, and consolidate
both register ranges into a single entry in the "INSTANCE0" steering
table.
Bspec: 74418
Fixes: be614ea19d ("drm/xe/xe3p_xpc: Add MCR steering")
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20251021224556.437970-2-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Make this the default on xe2+ when doing a copy. This has a few
advantages over the exiting copy instruction:
1) It has a special PAGE_COPY mode that claims to be optimised for
page-in/page-out, which is the vast majority of current users.
2) It also has a simple BYTE_COPY mode that supports byte granularity
copying without any restrictions.
With 2) we can now easily skip the bounce buffer flow when copying
buffers with strange sizing/alignment, like for memory_access. But that
is left for the next patch.
v2 (Matt Brost):
- Use device info to check whether device should use the MEM_COPY
path. This should fit better with making this a configfs tunable.
- And with that also keep old path still functional on xe2 for possible
experimentation.
- Add a define for PAGE_COPY page-size.
v3 (Matt Brost):
- Fallback to an actual linear copy for pitch=1.
- Also update NVL.
BSpec: 57561
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20251022163836.191405-7-matthew.auld@intel.com
Our current support for the VF migration depends on the availability
of the MEMIRQ rather than specific graphics version 20.
Relax our early migration support checks to allow also use some older
platforms like ATS-M for experiments and testing.
Do not allow ADL, as supporting VF migration through MMIO interrupts
would require additional changes in order to achieve reliability.
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://lore.kernel.org/r/20251021224817.1593817-5-tomasz.lis@intel.com
Most BOs do not care at which offset they will be accessed within
GGTT or PPGTT. The few which do care, should be only created
on PF, and mapped within GGTT. On VFs, mapping at fixed offset
is prohibited, as each VF is granted access to a range of
GGTT address space.
Since fixed addresses of GGTT mapping can only be used on PF,
add an assert which makes sure no attempt of fixed placement
will happen for a driver probed on a VF.
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://lore.kernel.org/r/20251021224817.1593817-4-tomasz.lis@intel.com
We already have check_graphics_ip() and check_media_ip() as general
functions to check the IP descriptors. The check in
check_platform_gt_count() is simple enough such that we can convert the
function to a more general device check. In an upcoming change, we will
also add some checks for other members of struct xe_device_desc. As
such, rename check_platform_gt_count() to check_platform_desc().
While at it, use inline (unsigned int) casting of max_gt_per_tile to
keep checks for each member localized; and use KUNIT_EXPECT_*() variants
of the macros to allow multiple issues to be reported.
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20251020-xe-kunit-dma_mask_size-va_bits-vm_max_level-v2-1-27b03971bc7e@intel.com
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
For Xe3p arch some subunits of an IP may be different. The GMD_ID
register returns the Xe3p arch and dedicates the reserved field to mark
possible subunit differences. Generally this is an under-the-hood
implementation detail that drivers don't need to worry about, but the
new Main_GAMCTRL may be enabled or not depending on those.
Those reserved bits are described for Xe3p as: "If Zero, No special case
to be handled. If Non-Zero, special case to be handled by Software
agent.". That special case is defined per Arch. So if media version is
35, also check the additional reserved bits. To avoid confusion with the
usual meaning of "reserved", define them as GMD_ID_SUBIP_FLAG_MASK.
Bspec: 74201
Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20251019-xe3p-gamctrl-v1-2-ad66d3c1908f@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Starting from Xe3p, there are two different copies of some of the GAM
registers: the traditional MCR variant at their old locations, and a
new unicast copy known as "main_gamctrl." The Xe driver doesn't use
these registers directly, but we need to instruct the GuC on which set
it should use. Since the new, unicast registers are preferred (since
they avoid the need for unnecessary MCR synchronization), set a new GuC
feature flag, GUC_CTL_MAIN_GAMCTRL_QUEUES to convey this decision. A
new helper function, xe_guc_using_main_gamctrl_queues(), is added for
use in the 3 independent places that need to handle configuration of the
new reporting queues.
The mmio write to enable the main gamctl is only done during the general
GuC upload. The gamctrl registers are not accessed by the GuC during
hwconfig load.
Last, the ADS blob for communicating the queue addresses contains both a
DPA and GGTT offset. The GuC documentation states that DPA is now MBZ
when using the MAIN_GAMCTRL queues.
Bspec: 76445, 73540
Signed-off-by: Brian Welty <brian.welty@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20251019-xe3p-gamctrl-v1-1-ad66d3c1908f@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
The compression overfetch tuning settings only apply to platforms that
support FlatCCS. In Xe3p_XPC (and any future IPs that also lack
compression) some of the registers being adjusted by this tuning will
not exist or may have been repurposed for something else, so we should
take care not to try to program them.
Note that our xe_rtp_match_has_flatccs() function will also return false
on platforms that do have FlatCCS in the hardware design, but have
compression manually disabled in the BIOS. On such platforms the
registers still exist (and it would be fine to continue programming
them), but they would have no effect, so skipping that tuning is also
safe.
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Shekhar Chauhan <shekhar.chauhan@intel.com>
Link: https://lore.kernel.org/r/20251016-xe3p-v3-22-3dd173a3097a@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Current implementation of compute walker has dependency on GPU/SW Stack
which requires SW/UMD to wait for event from KMD to indicate
PIPE_CONTROL interrupt was done. This created latency on SW stack.
This feature adds support to generate completion interrupt from GPGPU
walker which does not support MSIx and avoid software using Pipe control
drain/idle latency.
The only thing needed for the kernel driver to do here is to wakeup the
thread waiting on the ufence, which is already handled by the irq
handler. Before waiting on this event, the userspace side can opt-in to
this interrupt being generated by the HW by selecting the flag in the
POST_SYNC_DATA_2 substructure's dw0[3] of COMPUTE_WALKER_2 instruction.
Bspec: 62346, 74334
Suggested-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: S A Muqthyar Ahmed <syed.abdul.muqthyar.ahmed@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20251016-xe3p-v3-21-3dd173a3097a@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
It's confusing to refer to some masks as the interrupt masks and others
as the fuse masks. Rename the fuse one to make it clearer.
Note that the most important role they play here is that the call
to xe_hw_engine_mask_per_class() will not only limit the engines
according to the fuses, but also by what is available in the specific
architecture - the latter is more important information to know what
interrupts should be enabled. Add a comment about that.
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20251016-xe3p-v3-17-3dd173a3097a@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Xe3p_XPC's steering has a few changes from Xe3. Aside from
minor changes to the XeCore (the new name for what used to be "DSS") and
INSTANCE0 tables, different rules apply to different subranges of type
"GAM." Certain GAM subranges require steering to grp/instance (0,0)
(and thus use the INSTANCE0 table), while others require special
steering to (1,0) instead. Similarly, there are multiple classes of
"PSMI" steering, with some requiring steering to (0,0) while others
require (19,0).
There are some ranges in Bspec missing the termination. Add a TODO
comment so they can eventually be updated.
Bspec: 74418
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://lore.kernel.org/r/20251016-xe3p-v3-16-3dd173a3097a@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Xe3p_LPM's MCR steering has the same ranges and behavior as Xe3_LPM.
However one register range that was reserved on Xe3_LPM has now become a
unicast range (0x384200-0x38427F), so we need to stop consolidating the
adjacent MCR ranges into a single table entry in the table. With this
change to the Xe3_LPM table, we can continue to use the same table for
both IP families.
While we're touching this table, take the opportunity to fix a
whitespace mistake and clarify that one of the other consolidated range
entries includes a reserved range.
Bspec: 76445
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://lore.kernel.org/r/20251016-xe3p-v3-6-3dd173a3097a@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>