Although all current Xe2 platforms support FlatCCS, we probably
shouldn't assume that will be universally true forever. In the past
we've had platforms like PVC that didn't support compression, and the
same could show up again at some point in the future. Future-proof the
migration code by adding an explicit check for FlatCCS support to the
condition that decides whether to use a compressed PAT index for
migration.
While we're at it, we can drop the IS_DGFX check since it's redundant
with the src_is_vram check (only dGPUs have VRAM).
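As a rough illustration of the resulting condition (a minimal sketch, not the driver code: struct demo_device, its fields, and demo_use_comp_pat() are hypothetical names, and treating Xe2 as graphics version 20 or later is an assumption):

```c
#include <stdbool.h>

/* Hypothetical stand-in for the device capabilities, not the real struct. */
struct demo_device {
	int graphics_ver;	/* assumed: 20 and up means Xe2 or later */
	bool has_flat_ccs;	/* platform supports FlatCCS compression */
};

/*
 * Decide whether migration should use a compressed PAT index for the
 * source. There is no separate discrete-GPU check: only dGPUs have VRAM,
 * so src_is_vram already implies it.
 */
bool demo_use_comp_pat(const struct demo_device *dev, bool src_is_vram)
{
	return dev->has_flat_ccs && dev->graphics_ver >= 20 && src_is_vram;
}
```

With this shape, a future Xe2-or-later platform without FlatCCS simply never selects the compressed PAT index.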
Cc: Akshata Jahagirdar <akshata.jahagirdar@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Akshata Jahagirdar <akshata.jahagirdar@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240726171757.2728819-2-matthew.d.roper@intel.com
Gustavo noticed an odd "+ 2" in rtp_mark_active() while processing
rtp rules and pointed out that it should be "+ 1". In fact, while processing
entries without actions (OOB workarounds), if the WA is activated and
has OR rules, it will also inadvertently activate the very next
workaround.
Testing on an LNL B0 platform, moving 18024947630 on top of 16020292621
makes the latter become active:
$ cat /sys/kernel/debug/dri/0/gt0/workarounds
...
OOB Workarounds
18024947630
16020292621
14018094691
16022287689
13011645652
22019338487_display
In the future, a kunit test will be added to cover the rtp checks for
entries without actions.
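The off-by-one can be sketched in isolation. This is a simplified model that assumes active entries are tracked as bits in a bitmap and that an inclusive range first..last gets marked; demo_bitmap_set() and demo_mark_active() are hypothetical helpers, not the driver functions:

```c
#include <stdint.h>

/* Set `count` consecutive bits starting at `start` (start + count < 64). */
uint64_t demo_bitmap_set(uint64_t map, unsigned int start, unsigned int count)
{
	uint64_t mask = ((UINT64_C(1) << count) - 1) << start;

	return map | mask;
}

/*
 * Mark entries first..last (inclusive) as active. An inclusive range
 * holds last - first + 1 entries; using "+ 2" would also set the bit of
 * the entry right after `last`.
 */
uint64_t demo_mark_active(uint64_t map, unsigned int first, unsigned int last)
{
	return demo_bitmap_set(map, first, last - first + 1);
}
```

In this model, marking entries 0..1 with a count of last - first + 2 sets bits 0 through 2, so the entry at index 2 shows up as active even though it never matched.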
Fixes: fe19328b90 ("drm/xe/rtp: Add support for entries with no action")
Cc: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240726064337.797576-6-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
An xe file can outlive the associated process as the GPU cleanup is just
triggered upon file close (process kill) and completes sometime later.
If the file close triggers error conditions (GPU hangs) the process
cannot be safely referenced to retrieve the name and pid for debug
information. Store the process name and pid directly in the xe file to
be safe.
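A minimal sketch of the idea, assuming a fixed-size name buffer; struct demo_file, demo_file_open() and demo_report_hang() are hypothetical and only model copying the identity at open time so that later reporting never touches the process:

```c
#include <stdio.h>

#define DEMO_COMM_LEN 16	/* assumed buffer size for this sketch */

/* Hypothetical per-file state holding its own copy of the identity. */
struct demo_file {
	char process_name[DEMO_COMM_LEN];
	int pid;
};

/* Copy name and pid while the opening process is known to be alive. */
void demo_file_open(struct demo_file *f, const char *comm, int pid)
{
	/* snprintf() truncates if needed and always NUL-terminates. */
	snprintf(f->process_name, sizeof(f->process_name), "%s", comm);
	f->pid = pid;
}

/* Debug output uses only the stored copy, never the process itself. */
int demo_report_hang(const struct demo_file *f, char *buf, size_t len)
{
	return snprintf(buf, len, "hang in %s [%d]", f->process_name, f->pid);
}
```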
v2:
- Access file->pid via rcu_access_pointer (Matthew Auld)
Fixes: b10d0c5e9d ("drm/xe: Add process name to devcoredump")
Fixes: f6ca930d97 ("drm/xe: Add process name and PID to job timedout message")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240723151045.1725417-1-matthew.brost@intel.com
eu_type_to_str() relies on -Wswitch (together with -Werror) to make sure
it handles all enum values. However, it's perfectly legal to pass an int
to that function, so in the end the function may return
nothing. There's too much implicit knowledge about the initialization
of eu_type for a compiler to notice eu_type is never assigned to
anything other than those values.
Trying to reproduce this issue, none of gcc-9, gcc-10 and gcc-13
triggered for me, but this was reported on a different system with
gcc-10:
drivers/gpu/drm/xe/xe.o: warning: objtool: xe_gt_topology_dump() falls through to next function xe_gt_topology_init()
These warnings were also reported when building with clang:
drivers/gpu/drm/xe/xe.o: warning: objtool: xe_gt_topology_dump+0x77: sibling call from callable instruction with modified stack frame
drivers/gpu/drm/xe/xe.o: warning: objtool: xe_gt_topology_dump() falls through to next function xe_dss_mask_group_ffs()
drivers/gpu/drm/xe/xe.o: warning: objtool: xe_gt_topology_dump+0x77: can't find jump dest instruction at .text.xe_gt_topology_dump+0xc0
Since that value is not really possible in the real world, just take the
simple approach and return NULL.
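The shape of the fix can be sketched as follows. The enum and strings here are hypothetical stand-ins; the point is only that a switch covering every enumerator still needs a return after it, since C allows out-of-range values to reach the function:

```c
#include <stddef.h>

/* Hypothetical enum standing in for the EU type. */
enum demo_eu_type {
	DEMO_EU_TYPE_SIMD8,
	DEMO_EU_TYPE_SIMD16,
};

const char *demo_eu_type_to_str(enum demo_eu_type eu_type)
{
	/* No default case, so -Wswitch still flags a missing enumerator. */
	switch (eu_type) {
	case DEMO_EU_TYPE_SIMD16:
		return "simd16";
	case DEMO_EU_TYPE_SIMD8:
		return "simd8";
	}

	/* Only reached for a value outside the enum. */
	return NULL;
}
```

Keeping the switch without a default case preserves the compile-time check, while the trailing return guarantees the function never runs off its end.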
Fixes: 7108b4a589 ("drm/xe/uapi: Expose SIMD16 EU mask in topology query")
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240719191534.3845469-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
There is no point in running those tests on VF devices as they can't
access any of the MOCS registers. Skip testing on the VF device.
[ ] =================== xe_mocs (1 subtest) ====================
[ ] ================ xe_live_mocs_kernel_kunit ================
[ ] [PASSED] 0000:4d:00.0
[ ] [SKIPPED] 0000:4d:00.1
[ ] ============ [PASSED] xe_live_mocs_kernel_kunit ============
[ ] ===================== [PASSED] xe_mocs =====================
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240720142528.530-8-michal.wajdeczko@intel.com
Instead of iterating over available Xe devices within a testcase,
without being able to distinguish potential failures from different
devices on a system with many Xe devices, introduce helpers that
allow each Xe device to be treated as a parameter for the testcase, like:

	static void bar(struct kunit *test)
	{
		struct xe_device *xe = test->priv;
		...
	}

	struct kunit_case foo_live_tests[] = {
		KUNIT_CASE_PARAM(bar, xe_pci_live_device_gen_param),
		{}
	};

	struct kunit_suite foo_suite = {
		.name = "foo_live",
		.test_cases = foo_live_tests,
		.init = xe_kunit_helper_xe_device_live_test_init,
	};

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240720142528.530-3-michal.wajdeczko@intel.com
Having two methods to wait on GT TLB invalidations is not ideal. Remove
xe_gt_tlb_invalidation_wait and only use GT TLB invalidation fences.
In addition to two methods being less than ideal, once GT TLB
invalidations are coalesced the seqno cannot be assigned during
xe_gt_tlb_invalidation_ggtt/range. Thus xe_gt_tlb_invalidation_wait
would not have a seqno to wait on. A fence, however, can be armed and
later signaled.
v3:
- Add explanation about coalescing to commit message
v4:
- Don't put dma fence if defined on stack (CI)
v5:
- Initialize ret to zero (CI)
v6:
- Use invalidation_fence_signal helper in tlb timeout (Matthew Auld)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240719172905.1527927-3-matthew.brost@intel.com
xe_file_close triggers an asynchronous queue cleanup and then frees up
the xef object. Since queue cleanup flushes all pending jobs and the KMD
stores client usage stats into the xef object after jobs are flushed, we
see a use-after-free for the xef object. Resolve this by taking a
reference to xef from xe_exec_queue.
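The lifetime rule can be modelled with a small standalone sketch. All names here are hypothetical, and the plain int counter stands in for whatever reference-counting primitive the driver actually uses:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical file object: marked freed when the last reference drops. */
struct demo_xef {
	int refcount;
	bool freed;
	unsigned long run_ticks;
};

/* Hypothetical queue that pins the file object it reports stats into. */
struct demo_queue {
	struct demo_xef *xef;
};

struct demo_xef *demo_xef_get(struct demo_xef *xef)
{
	xef->refcount++;
	return xef;
}

void demo_xef_put(struct demo_xef *xef)
{
	assert(xef->refcount > 0);
	if (--xef->refcount == 0)
		xef->freed = true;	/* stands in for the real free */
}

/* The queue takes its own reference at creation time. */
void demo_queue_init(struct demo_queue *q, struct demo_xef *xef)
{
	q->xef = demo_xef_get(xef);
}

/* Late cleanup stores usage stats, then drops the queue's reference. */
void demo_queue_fini(struct demo_queue *q, unsigned long ticks)
{
	q->xef->run_ticks += ticks;
	demo_xef_put(q->xef);
	q->xef = NULL;
}
```

In this model, file close only drops the file's own reference, so the object stays valid until the asynchronous queue cleanup has written its stats and released the last reference.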
While at it, revert an earlier change that contained a partial
workaround for this issue.
v2:
- Take a ref to xef even for the VM bind queue (Matt)
- Squash patches relevant to that fix and work around (Lucas)
v3: Fix typo (Lucas)
Fixes: ce62827bc2 ("drm/xe: Do not access xe file when updating exec queue run_ticks")
Fixes: 6109f24f87 ("drm/xe: Add helper to accumulate exec queue runtime")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1908
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240718210548.3580382-5-umesh.nerlige.ramappa@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
PVC, Xe2 and later platforms have 16-wide EUs. We were implicitly
reporting for PVC the number of 16-wide EUs without giving userspace any
hint that they were different than for other platforms. Xe2 and later
also have 16-wide, but in those cases the reported number would
correspond to the 8-wide count.
To avoid confusion and make sure the right number is used by userspace
depending on the platform, add a new item to the topology query and drop
the one that is not available. The new mask reported for both PVC and
Xe2 should now match the numbers reported via hwconfig.
v2: Use a different topo item with EU type in its name to report the
new mask instead of adding the type itself as the item (Matt Roper)
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Acked-by: José Roberto de Souza <jose.souza@intel.com>
Acked-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
Acked-by: Wenbin Lu <wenbin.lu@intel.com>
Acked-by: Effie Yu <effie.yu@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240710220446.2169797-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
As per recommendation in the workarounds:
WA_22019338487
There is an issue with accessing Stolen memory pages due to a
hardware limitation. Limit the usage of stolen memory for
fbdev for LNL+. Don't use BIOS FB from stolen on LNL+ and
assign the same from system memory.
v2: Corrected the WA Number, limited WA to LNL and
Adopted XE_WA framework as suggested by Lucas and Matt.
v3: Introduced the waxxx_display to implement display side
of WA changes on Lunarlake. Used xe_root_mmio_gt and
avoid the for loop (Suggested by Lucas)
v4: Fixed some nits (Luca)
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Uma Shankar <uma.shankar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240717082252.3875909-1-uma.shankar@intel.com
Xe2+ has unified compression (exactly one compression mode/format),
where compression is now controlled via PAT at PTE level.
This simplifies KMD operations, as it can now decompress freely
without concern for the buffer's original compression format. DG2, by
contrast, had multiple compression formats and thus required copying the
raw CCS state during VRAM eviction. In addition, mixed VRAM and system
memory buffers were not supported with compression enabled.
On Xe2 dGPUs, compression is still only supported with VRAM; however, we
can now support compression with VRAM and system memory buffers,
with GPU access being seamless underneath, so long as the KMD uses
compressed -> uncompressed when doing VRAM -> system memory, in order
to decompress the data. This also allows CPU access to such buffers,
assuming that userspace first decompresses the corresponding
pages being accessed.
If the pages are already in system memory then KMD would have already
decompressed them. When restoring such buffers with sysmem -> VRAM
the KMD can't easily know which pages were originally compressed,
so we always use uncompressed -> uncompressed here.
With this it also means we can drop all the raw CCS handling on such
platforms (including needing to allocate extra CCS storage).
In order to support this we now need to have two different identity
mappings for compressed and uncompressed VRAM.
In this patch, we set up the additional identity map for the VRAM with
compressed pat_index. We then select the appropriate mapping during
migration/clear. During eviction (vram->sysmem), we use the mapping
from compressed -> uncompressed. During restore (sysmem->vram), we need
the mapping from uncompressed -> uncompressed.
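The mapping selection described above can be sketched as follows. This is a simplified model with hypothetical names; the source only describes eviction and restore, so falling back to the uncompressed mapping for every other copy direction is an assumption of the sketch:

```c
#include <stdbool.h>

/* Hypothetical tags for the two VRAM identity maps. */
enum demo_identity_map {
	DEMO_MAP_UNCOMPRESSED,	/* VRAM mapped with an uncompressed pat_index */
	DEMO_MAP_COMPRESSED,	/* VRAM mapped with a compressed pat_index */
};

/* Pick which VRAM identity map a copy should go through. */
enum demo_identity_map demo_vram_map_for_copy(bool src_is_vram,
					      bool dst_is_vram)
{
	/*
	 * Eviction (vram->sysmem): read VRAM through the compressed
	 * mapping so the data lands in system memory decompressed.
	 */
	if (src_is_vram && !dst_is_vram)
		return DEMO_MAP_COMPRESSED;

	/*
	 * Restore (sysmem->vram): the original compression state is
	 * unknown, so write through the uncompressed mapping. Other
	 * directions default to uncompressed here (assumption).
	 */
	return DEMO_MAP_UNCOMPRESSED;
}
```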
v2: Formatting nits, Updated code to match recent changes in
xe_migrate_prepare_vm(). (Matt)
v3: Move identity map loop to a helper function. (Matt Brost)
v4: Split helper function in different patch, and
add asserts and nits. (Matt Brost)
v5: Convert the 2 bool arguments of pte_update_size to flags
argument (Matt Brost)
v6: Formatting nits (Matt Brost)
Signed-off-by: Akshata Jahagirdar <akshata.jahagirdar@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/b00db5c7267e54260cb6183ba24b15c1e6ae52a3.1721250309.git.akshata.jahagirdar@intel.com