Matthew Auld
11bfc4a2cf
drm/xe/ct: drop irq usage of xa_erase()
...
Unclear why disabling interrupts is needed here. Nothing seems to be
touching fence_lookup and its corresponding lock from an irq so there
should be no risk of deadlock.
Signed-off-by: Matthew Auld <matthew.auld@intel.com >
Cc: Matthew Brost <matthew.brost@intel.com >
Cc: Badal Nilawar <badal.nilawar@intel.com >
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20241001084346.98516-8-matthew.auld@intel.com
2024-10-03 08:34:21 +01:00
Matthew Auld
f040327238
drm/xe/guc_submit: fix xa_store() error checking
...
Looks like we are meant to use xa_err() to extract the error encoded in
the ptr.
Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matthew Auld <matthew.auld@intel.com >
Cc: Matthew Brost <matthew.brost@intel.com >
Cc: Badal Nilawar <badal.nilawar@intel.com >
Cc: <stable@vger.kernel.org > # v6.8+
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20241001084346.98516-7-matthew.auld@intel.com
2024-10-03 08:34:20 +01:00
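The xa_err() point made in this commit can be illustrated with a small userspace model. This is only a sketch: the `model_*` names are hypothetical, and the bit layout (errno shifted left by two, tagged with `2`) is an assumption based on a reading of the kernel's xarray header, not a copy of it. It shows why decoding an XArray error entry as a plain error pointer yields the wrong errno.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095

/* Hypothetical stand-in for an XArray error entry: (errno << 2) | 2. */
static void *model_xa_mk_err(long err)
{
	assert(err < 0 && err >= -MODEL_MAX_ERRNO);
	return (void *)(((uintptr_t)err << 2) | 2);
}

/* True if the entry carries the "internal" tag and lies in the errno range. */
static int model_xa_is_err(const void *entry)
{
	uintptr_t v = (uintptr_t)entry;
	uintptr_t lowest = ((uintptr_t)-MODEL_MAX_ERRNO << 2) | 2;

	return (v & 3) == 2 && v >= lowest;
}

/* Model of the xa_err() idea: undo the shift to recover the errno, else 0. */
static int model_xa_err(const void *entry)
{
	if (model_xa_is_err(entry))
		return (int)((intptr_t)entry >> 2);
	return 0;
}

/* Model of a plain error-pointer decode, which ignores the XArray encoding. */
static long model_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}
```

With this model, an entry built from -12 decodes back to -12 through `model_xa_err()`, while the plain pointer decode returns a value that is not a valid errno for the failure.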
Matthew Auld
1aa4b78647
drm/xe/ct: fix xa_store() error checking
...
Looks like we are meant to use xa_err() to extract the error encoded in
the ptr.
Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matthew Auld <matthew.auld@intel.com >
Cc: Matthew Brost <matthew.brost@intel.com >
Cc: Badal Nilawar <badal.nilawar@intel.com >
Cc: <stable@vger.kernel.org > # v6.8+
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20241001084346.98516-6-matthew.auld@intel.com
2024-10-03 08:34:19 +01:00
Matthew Auld
52789ce35c
drm/xe/ct: prevent UAF in send_recv()
...
Ensure we serialize with completion side to prevent UAF with fence going
out of scope on the stack, since we have no clue if it will fire after
the timeout before we can erase from the xa. Also we have some dependent
loads and stores for which we need the correct ordering, and we lack the
needed barriers. Fix this by grabbing the ct->lock after the wait, which
is also held by the completion side.
v2 (Badal):
- Also print done after acquiring the lock and seeing timeout.
Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matthew Auld <matthew.auld@intel.com >
Cc: Matthew Brost <matthew.brost@intel.com >
Cc: Badal Nilawar <badal.nilawar@intel.com >
Cc: <stable@vger.kernel.org > # v6.8+
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20241001084346.98516-5-matthew.auld@intel.com
2024-10-03 08:34:18 +01:00
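The serialization described in this commit can be sketched in userspace. This is a simplified model, not the driver code: `ct_model`, `g2h_fence_model` and both functions are hypothetical, a fixed array stands in for the xarray, and a pthread mutex stands in for ct->lock. The point is that the waiter unpublishes its stack fence while holding the same lock the completion side takes, so a late completion can no longer reach it.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_FENCES 8

struct g2h_fence_model {
	bool done;
	int response;
};

struct ct_model {
	pthread_mutex_t lock;                              /* stand-in for ct->lock */
	struct g2h_fence_model *lookup[MODEL_MAX_FENCES];  /* stand-in for fence_lookup */
};

/* Completion side: look up the fence and signal it, all under the lock. */
static bool ct_model_complete(struct ct_model *ct, unsigned int seqno, int response)
{
	bool found = false;

	assert(seqno < MODEL_MAX_FENCES);
	pthread_mutex_lock(&ct->lock);
	struct g2h_fence_model *fence = ct->lookup[seqno];
	if (fence) {
		fence->response = response;
		fence->done = true;
		found = true;
	}
	pthread_mutex_unlock(&ct->lock);
	return found;
}

/*
 * Waiter side, called after the wait returned. Taking the lock first means
 * any completion already in flight has finished before the fence is erased
 * and before its fields are read.
 */
static int ct_model_finish_wait(struct ct_model *ct, unsigned int seqno,
				struct g2h_fence_model *fence, bool timed_out)
{
	int ret = 0;

	assert(seqno < MODEL_MAX_FENCES);
	pthread_mutex_lock(&ct->lock);
	ct->lookup[seqno] = NULL;          /* the stack fence is now unreachable */
	if (timed_out)
		ret = -ETIMEDOUT;          /* stand-in error code for the timeout */
	else if (!fence->done)
		ret = -EINVAL;
	pthread_mutex_unlock(&ct->lock);
	return ret;
}
```

Once `ct_model_finish_wait()` returns, the fence can safely go out of scope: a completion arriving afterwards finds no entry and touches nothing.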
Matthew Brost
63e0695597
drm/xe: Fix memory leak when aborting binds
...
Make sure to call xe_pt_update_ops_fini in xe_pt_update_ops_abort to
free any memory the bind allocated.
Caught by kmemleak when running Vulkan CTS tests on LNL. The leak
seems to happen only when there's some kind of failure happening, like
the lack of memory. Example output:
unreferenced object 0xffff9120bdf62000 (size 8192):
comm "deqp-vk", pid 115008, jiffies 4310295728
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 1b 05 f9 28 01 00 00 40 ...........(...@
00 00 00 00 00 00 00 00 1b 15 f9 28 01 00 00 40 ...........(...@
backtrace (crc 7a56be79):
[<ffffffff86dd81f0>] __kmalloc_cache_noprof+0x310/0x3d0
[<ffffffffc08e8211>] xe_pt_new_shared.constprop.0+0x81/0xb0 [xe]
[<ffffffffc08e8309>] xe_pt_insert_entry+0xb9/0x140 [xe]
[<ffffffffc08eab6d>] xe_pt_stage_bind_entry+0x12d/0x5b0 [xe]
[<ffffffffc08ecbca>] xe_pt_walk_range+0xea/0x280 [xe]
[<ffffffffc08eccea>] xe_pt_walk_range+0x20a/0x280 [xe]
[<ffffffffc08eccea>] xe_pt_walk_range+0x20a/0x280 [xe]
[<ffffffffc08eccea>] xe_pt_walk_range+0x20a/0x280 [xe]
[<ffffffffc08eccea>] xe_pt_walk_range+0x20a/0x280 [xe]
[<ffffffffc08e9eff>] xe_pt_stage_bind.constprop.0+0x25f/0x580 [xe]
[<ffffffffc08eb21a>] bind_op_prepare+0xea/0x6e0 [xe]
[<ffffffffc08ebab8>] xe_pt_update_ops_prepare+0x1c8/0x440 [xe]
[<ffffffffc08ffbf3>] ops_execute+0x143/0x850 [xe]
[<ffffffffc0900b64>] vm_bind_ioctl_ops_execute+0x244/0x800 [xe]
[<ffffffffc0906467>] xe_vm_bind_ioctl+0x1877/0x2370 [xe]
[<ffffffffc05e92b3>] drm_ioctl_kernel+0xb3/0x110 [drm]
unreferenced object 0xffff9120bdf72000 (size 8192):
comm "deqp-vk", pid 115008, jiffies 4310295728
hex dump (first 32 bytes):
6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
backtrace (crc 23b2f0b5):
[<ffffffff86dd81f0>] __kmalloc_cache_noprof+0x310/0x3d0
[<ffffffffc08e8211>] xe_pt_new_shared.constprop.0+0x81/0xb0 [xe]
[<ffffffffc08e8453>] xe_pt_stage_unbind_post_descend+0xb3/0x150 [xe]
[<ffffffffc08ecd26>] xe_pt_walk_range+0x246/0x280 [xe]
[<ffffffffc08eccea>] xe_pt_walk_range+0x20a/0x280 [xe]
[<ffffffffc08eccea>] xe_pt_walk_range+0x20a/0x280 [xe]
[<ffffffffc08eccea>] xe_pt_walk_range+0x20a/0x280 [xe]
[<ffffffffc08ece31>] xe_pt_walk_shared+0xc1/0x110 [xe]
[<ffffffffc08e7b2a>] xe_pt_stage_unbind+0x9a/0xd0 [xe]
[<ffffffffc08e913d>] unbind_op_prepare+0xdd/0x270 [xe]
[<ffffffffc08eb9f6>] xe_pt_update_ops_prepare+0x106/0x440 [xe]
[<ffffffffc08ffbf3>] ops_execute+0x143/0x850 [xe]
[<ffffffffc0900b64>] vm_bind_ioctl_ops_execute+0x244/0x800 [xe]
[<ffffffffc0906467>] xe_vm_bind_ioctl+0x1877/0x2370 [xe]
[<ffffffffc05e92b3>] drm_ioctl_kernel+0xb3/0x110 [drm]
[<ffffffffc05e95a0>] drm_ioctl+0x280/0x4e0 [drm]
Reported-by: Paulo Zanoni <paulo.r.zanoni@intel.com >
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2877
Fixes: a708f6501c ("drm/xe: Update PT layer with better error handling")
Signed-off-by: Matthew Brost <matthew.brost@intel.com >
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20240927232228.3255246-1-matthew.brost@intel.com
2024-10-02 06:32:51 -07:00
Zhanjun Dong
59a1c9c7e1
drm/xe: Prevent null pointer access in xe_migrate_copy
...
xe_migrate_copy is designed to copy the content of TTM resources. When the
source resource is NULL, it triggers a NULL pointer dereference in
xe_migrate_copy. To avoid this, set the lacks-source flag to true for this
case; the flag then triggers xe_migrate_clear rather than
xe_migrate_copy.
Issue trace:
<7> [317.089847] xe 0000:00:02.0: [drm:xe_migrate_copy [xe]] Pass 14,
sizes: 4194304 & 4194304
<7> [317.089945] xe 0000:00:02.0: [drm:xe_migrate_copy [xe]] Pass 15,
sizes: 4194304 & 4194304
<1> [317.128055] BUG: kernel NULL pointer dereference, address:
0000000000000010
<1> [317.128064] #PF: supervisor read access in kernel mode
<1> [317.128066] #PF: error_code(0x0000) - not-present page
<6> [317.128069] PGD 0 P4D 0
<4> [317.128071] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
<4> [317.128074] CPU: 1 UID: 0 PID: 1440 Comm: kunit_try_catch Tainted:
G U N 6.11.0-rc7-xe #1
<4> [317.128078] Tainted: [U]=USER, [N]=TEST
<4> [317.128080] Hardware name: Intel Corporation Lunar Lake Client
Platform/LNL-M LP5 RVP1, BIOS LNLMFWI1.R00.3221.D80.2407291239 07/29/2024
<4> [317.128082] RIP: 0010:xe_migrate_copy+0x66/0x13e0 [xe]
<4> [317.128158] Code: 00 00 48 89 8d e0 fe ff ff 48 8b 40 10 4c 89 85 c8
fe ff ff 44 88 8d bd fe ff ff 65 48 8b 3c 25 28 00 00 00 48 89 7d d0 31
ff <8b> 79 10 48 89 85 a0 fe ff ff 48 8b 00 48 89 b5 d8 fe ff ff 83 ff
<4> [317.128162] RSP: 0018:ffffc9000167f9f0 EFLAGS: 00010246
<4> [317.128164] RAX: ffff8881120d8028 RBX: ffff88814d070428 RCX:
0000000000000000
<4> [317.128166] RDX: ffff88813cb99c00 RSI: 0000000004000000 RDI:
0000000000000000
<4> [317.128168] RBP: ffffc9000167fbb8 R08: ffff88814e7b1f08 R09:
0000000000000001
<4> [317.128170] R10: 0000000000000001 R11: 0000000000000001 R12:
ffff88814e7b1f08
<4> [317.128172] R13: ffff88814e7b1f08 R14: ffff88813cb99c00 R15:
0000000000000001
<4> [317.128174] FS: 0000000000000000(0000) GS:ffff88846f280000(0000)
knlGS:0000000000000000
<4> [317.128176] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [317.128178] CR2: 0000000000000010 CR3: 000000011f676004 CR4:
0000000000770ef0
<4> [317.128180] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
<4> [317.128182] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7:
0000000000000400
<4> [317.128184] PKRU: 55555554
<4> [317.128185] Call Trace:
<4> [317.128187] <TASK>
<4> [317.128189] ? show_regs+0x67/0x70
<4> [317.128194] ? __die_body+0x20/0x70
<4> [317.128196] ? __die+0x2b/0x40
<4> [317.128198] ? page_fault_oops+0x15f/0x4e0
<4> [317.128203] ? do_user_addr_fault+0x3fb/0x970
<4> [317.128205] ? lock_acquire+0xc7/0x2e0
<4> [317.128209] ? exc_page_fault+0x87/0x2b0
<4> [317.128212] ? asm_exc_page_fault+0x27/0x30
<4> [317.128216] ? xe_migrate_copy+0x66/0x13e0 [xe]
<4> [317.128263] ? __lock_acquire+0xb9d/0x26f0
<4> [317.128265] ? __lock_acquire+0xb9d/0x26f0
<4> [317.128267] ? sg_free_append_table+0x20/0x80
<4> [317.128271] ? lock_acquire+0xc7/0x2e0
<4> [317.128273] ? mark_held_locks+0x4d/0x80
<4> [317.128275] ? trace_hardirqs_on+0x1e/0xd0
<4> [317.128278] ? _raw_spin_unlock_irqrestore+0x31/0x60
<4> [317.128281] ? __pm_runtime_resume+0x60/0xa0
<4> [317.128284] xe_bo_move+0x682/0xc50 [xe]
<4> [317.128315] ? lock_is_held_type+0xaa/0x120
<4> [317.128318] ttm_bo_handle_move_mem+0xe5/0x1a0 [ttm]
<4> [317.128324] ttm_bo_validate+0xd1/0x1a0 [ttm]
<4> [317.128328] shrink_test_run_device+0x721/0xc10 [xe]
<4> [317.128360] ? find_held_lock+0x31/0x90
<4> [317.128363] ? lock_release+0xd1/0x2a0
<4> [317.128365] ? __pfx_kunit_generic_run_threadfn_adapter+0x10/0x10
[kunit]
<4> [317.128370] xe_bo_shrink_kunit+0x11/0x20 [xe]
<4> [317.128397] kunit_try_run_case+0x6e/0x150 [kunit]
<4> [317.128400] ? trace_hardirqs_on+0x1e/0xd0
<4> [317.128402] ? _raw_spin_unlock_irqrestore+0x31/0x60
<4> [317.128404] kunit_generic_run_threadfn_adapter+0x1e/0x40 [kunit]
<4> [317.128407] kthread+0xf5/0x130
<4> [317.128410] ? __pfx_kthread+0x10/0x10
<4> [317.128412] ret_from_fork+0x39/0x60
<4> [317.128415] ? __pfx_kthread+0x10/0x10
<4> [317.128416] ret_from_fork_asm+0x1a/0x30
<4> [317.128420] </TASK>
Fixes: 266c858852 ("drm/xe/xe2: Handle flat ccs move for igfx.")
Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com >
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com >
Signed-off-by: Matt Roper <matthew.d.roper@intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20240927161308.862323-2-zhanjun.dong@intel.com
2024-10-01 14:27:07 -07:00
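The decision described in this commit, clearing the destination when there is no source instead of copying from it, can be sketched as follows. This is a userspace illustration under stated assumptions: `resource_model`, `move_model()` and the `lacks_source` local are hypothetical names, and memset/memcpy stand in for the clear and copy paths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct resource_model {
	unsigned char *data;
	size_t size;
};

enum move_kind_model { MOVE_MODEL_CLEAR, MOVE_MODEL_COPY };

/* Pick the clear path whenever the source is missing, never dereference it. */
static enum move_kind_model move_model(struct resource_model *dst,
				       const struct resource_model *src)
{
	assert(dst && dst->data);

	/* Hypothetical flag mirroring the "lacks source" idea in the commit text. */
	bool lacks_source = (src == NULL) || (src->data == NULL);

	if (lacks_source) {
		memset(dst->data, 0, dst->size);
		return MOVE_MODEL_CLEAR;
	}

	size_t n = dst->size < src->size ? dst->size : src->size;
	memcpy(dst->data, src->data, n);
	return MOVE_MODEL_COPY;
}
```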
Jani Nikula
ff35237de5
drm/xe/compat: remove unused i915_gpu_error.h
...
The last user of the compat header was removed in commit d6b933912d
("drm/i915/dmc: convert intel_dmc_print_error_state() to drm_printer").
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20240930164052.3862911-1-jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com >
2024-10-01 12:30:03 +03:00
José Roberto de Souza
0c8650b09a
drm/xe/oa: Don't reset OAC_CONTEXT_ENABLE on OA stream close
...
Mesa testing on Xe2+ revealed that when OA metrics are collected for an
exec_queue, after the OA stream is closed, future batch buffers submitted
on that exec_queue do not complete. Not resetting OAC_CONTEXT_ENABLE on OA
stream close resolves these hangs and should not have any adverse effects.
v2: Make the change that we don't reset the bit clearer (Ashutosh)
Also make the same fix for OAC as OAR (Ashutosh)
Bspec: 60314
Fixes: 2f4a730fcd ("drm/xe/oa: Add OAR support")
Fixes: 14e077f800 ("drm/xe/oa: Add OAC support")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2821
Signed-off-by: José Roberto de Souza <jose.souza@intel.com >
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com >
Cc: stable@vger.kernel.org
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20240924213713.3497992-1-ashutosh.dixit@intel.com
2024-09-27 10:10:12 -07:00
Matthew Auld
16536582dd
drm/xe/queue: move xa_alloc to prevent UAF
...
Evil user can guess the next id of the queue before the ioctl completes
and then call queue destroy ioctl to trigger UAF since create ioctl is
still referencing the same queue. Move the xa_alloc all the way to the end
to prevent this.
v2:
- Rebase
Fixes: 2149ded630 ("drm/xe: Fix use after free when client stats are captured")
Signed-off-by: Matthew Auld <matthew.auld@intel.com >
Cc: Matthew Brost <matthew.brost@intel.com >
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com >
Reviewed-by: Matthew Brost <matthew.brost@intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20240925071426.144015-4-matthew.auld@intel.com
2024-09-27 09:28:59 +01:00
Matthew Auld
dcfd397132
drm/xe/vm: move xa_alloc to prevent UAF
...
Evil user can guess the next id of the vm before the ioctl completes and
then call vm destroy ioctl to trigger UAF since create ioctl is still
referencing the same vm. Move the xa_alloc all the way to the end to
prevent this.
v2:
- Rebase
Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matthew Auld <matthew.auld@intel.com >
Cc: Matthew Brost <matthew.brost@intel.com >
Cc: <stable@vger.kernel.org > # v6.8+
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com >
Reviewed-by: Matthew Brost <matthew.brost@intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20240925071426.144015-3-matthew.auld@intel.com
2024-09-27 09:28:58 +01:00
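The ordering rule behind this commit and the matching queue commit (publish the id only as the very last step of create) can be sketched in userspace. Everything here is hypothetical: `vm_model`, `file_model` and both functions are illustration-only names, and a small array stands in for the xarray that xa_alloc fills. The create path finishes all setup first and does not touch the object again after it becomes visible to a destroy call.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define MODEL_MAX_IDS 4

struct vm_model {
	int refcount;
	unsigned int flags;
};

struct file_model {
	struct vm_model *vms[MODEL_MAX_IDS]; /* id 0 is reserved as invalid */
};

static int vm_create_model(struct file_model *file, unsigned int flags,
			   unsigned int *id_out)
{
	assert(file && id_out);

	struct vm_model *vm = calloc(1, sizeof(*vm));
	if (!vm)
		return -ENOMEM;

	/* All initialisation happens before the object is reachable by id. */
	vm->refcount = 1;
	vm->flags = flags;

	/* Publish last: from here on another caller may destroy the object. */
	for (unsigned int id = 1; id < MODEL_MAX_IDS; id++) {
		if (!file->vms[id]) {
			file->vms[id] = vm;
			*id_out = id; /* only the out-parameter is written, not vm */
			return 0;
		}
	}

	free(vm);
	return -ENOSPC;
}

static int vm_destroy_model(struct file_model *file, unsigned int id)
{
	if (id == 0 || id >= MODEL_MAX_IDS || !file->vms[id])
		return -ENOENT;

	free(file->vms[id]);
	file->vms[id] = NULL;
	return 0;
}
```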
Matthew Brost
8ec5a4e5ce
drm/xe: Resume TDR after GT reset
...
Not restarting the TDR after a GT reset on exec queues which have been
restarted can let jobs run forever. Fix this by restarting the TDR.
Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matthew Brost <matthew.brost@intel.com >
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20240724235919.1917216-1-matthew.brost@intel.com
2024-09-27 00:07:51 -07:00
Matt Roper
ee615c2bac
drm/xe: Move IRQ-related registers to dedicated header
...
IRQ registers have a well-defined scope and make sense to collect in a
dedicated header file. This also reduces confusion about the GT IRQ
registers --- even though those registers relate to the GTs, they
actually live outside the GT (in the sgunit) and thus do not need to
worry about GT-specific register concepts like forcewake, steering, etc.
Signed-off-by: Matt Roper <matthew.d.roper@intel.com >
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20240923214514.2031410-2-matthew.d.roper@intel.com
2024-09-26 10:27:07 -07:00
Matthew Auld
861108666c
drm/xe: fix UAF around queue destruction
...
We currently do stuff like queuing the final destruction step on a
random system wq, which will outlive the driver instance. With bad
timing we can tear down the driver with one or more work items still
pending, leading to various UAF splats. Add a fini step to ensure
user queues are properly torn down. At this point GuC should already be
nuked so the queue itself should no longer be referenced from the hw pov.
v2 (Matt B)
- Looks much safer to use a waitqueue and then just wait for the
xa_array to become empty before triggering the drain.
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2317
Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matthew Auld <matthew.auld@intel.com >
Cc: Matthew Brost <matthew.brost@intel.com >
Cc: <stable@vger.kernel.org > # v6.8+
Reviewed-by: Matthew Brost <matthew.brost@intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20240923145647.77707-2-matthew.auld@intel.com
2024-09-26 14:30:30 +01:00
Matthew Auld
d28af0b6b9
drm/xe/guc_submit: add missing locking in wedged_fini
...
Any non-wedged queue can have a zero refcount here and can be running
concurrently with an async queue destroy, therefore dereferencing the
queue ptr to check wedge status after the lookup can trigger UAF if
queue is not wedged. Fix this by keeping the submission_state lock held
around the check to postpone the free and make the check safe, before
dropping again around the put() to avoid the deadlock.
Fixes: 8ed9aaae39 ("drm/xe: Force wedged state and block GT reset upon any GPU hang")
Signed-off-by: Matthew Auld <matthew.auld@intel.com >
Cc: Matthew Brost <matthew.brost@intel.com >
Reviewed-by: Matthew Brost <matthew.brost@intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20240924150947.118433-2-matthew.auld@intel.com
2024-09-26 14:28:06 +01:00
Matthew Brost
fe4f5d4b66
drm/xe: Clean up VM / exec queue file lock usage.
...
Both the VM and exec queue file locks protect the lookup and reference to
the object, nothing more. These locks are not intended to have anything else
done underneath them. XAs have their own locking too, so there is no need to
take the VM / exec queue file lock aside from when doing a lookup and
reference get.
Add some kernel doc to make this clear and cleanup a few typos too.
Signed-off-by: Matthew Brost <matthew.brost@intel.com >
Reviewed-by: Matthew Auld <matthew.auld@intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20240921011712.2681510-1-matthew.brost@intel.com
2024-09-24 09:03:42 -07:00
Gustavo Sousa
876253165f
drm/xe/xe2: Add performance tuning for L3 cache flushing
...
A recommended performance tuning for LNL related to L3 cache flushing
was recently introduced in Bspec. Implement it.
Unlike the other existing tuning settings, we limit this one for LNL
only, since there is no info about whether this would be applicable to
other platforms yet. In the future we can come back and use IP version
ranges if applicable.
v2:
- Fix reference to Bspec. (Sai Teja, Tejas)
- Use correct register name for "Tuning: L3 RW flush all Cache". (Sai
Teja)
- Use SCRATCH3_LBCF (with the underscore) for better readability.
v3:
- Limit setting to LNL only. (Matt)
Bspec: 72161
Cc: Sai Teja Pottumuttu <sai.teja.pottumuttu@intel.com >
Cc: Tejas Upadhyay <tejas.upadhyay@intel.com >
Cc: Matt Roper <matthew.d.roper@intel.com >
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com >
Reviewed-by: Matt Roper <matthew.d.roper@intel.com >
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com >
Signed-off-by: Matt Roper <matthew.d.roper@intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20240920211459.255181-5-gustavo.sousa@intel.com
2024-09-23 10:46:31 -07:00
Gustavo Sousa
f5b463fd7c
drm/xe/xe2: Assume tuning settings also apply for future media GT
...
We already make the assumption that recommended tuning settings for
primary GT on Xe2 will also apply for future releases. Let's make the
same assumption for the media GT. We can come back and define closed
ranges when that becomes necessary.
Bspec: 72161
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com >
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com >
Signed-off-by: Matt Roper <matthew.d.roper@intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20240920211459.255181-4-gustavo.sousa@intel.com
2024-09-23 10:46:31 -07:00
Gustavo Sousa
e1f813947c
drm/xe/xe2: Extend performance tuning to media GT
...
With the exception of "Tuning: L3 cache - media", we are currently applying
recommended performance tuning settings only for the primary GT. Let's
also implement them for the media GT when applicable.
According to our spec, media GT registers CCCHKNREG1 and L3SQCREG* exist
only in Xe2_LPM and their offsets do not match their primary GT
counterparts. Furthermore, the range where CCCHKNREG1 belongs is not
listed as a multicast range on the media GT. As such, we need to have
Xe2_LPM-specific definitions for those registers and apply the setting
only for that specific IP.
Both Xe2_HPM and Xe2_LPM contain STATELESS_COMPRESSION_CTRL and the
offset on the media GT matches the one on the primary one. So we can
simply have a copy of "Tuning: Stateless compression control" for the
media GT.
v2:
- Fix implementation with respect to multicast vs non-multicast
registers. (Matt)
- Add missing XE2LPM_CCCHKNREG1 on second action of "Tuning:
Compression Overfetch - media".
v3:
- STATELESS_COMPRESSION_CTRL on Xe2_HPM is also a multicast register,
do not define a XE2HPM_STATELESS_COMPRESSION_CTRL register. (Tejas)
Bspec: 72161
Cc: Matt Roper <matthew.d.roper@intel.com >
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com >
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com >
Signed-off-by: Matt Roper <matthew.d.roper@intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20240920211459.255181-3-gustavo.sousa@intel.com
2024-09-23 10:46:30 -07:00
Gustavo Sousa
21ae035ae5
drm/xe/mcr: Use Xe2_LPM steering tables for Xe2_HPM
...
According to Bspec, Xe2 steering tables must be used for Xe2_HPM, just
as it is with Xe2_LPM. Update our driver to reflect that.
Bspec: 71186
Reviewed-by: Matt Roper <matthew.d.roper@intel.com >
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com >
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com >
Signed-off-by: Matt Roper <matthew.d.roper@intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20240920211459.255181-2-gustavo.sousa@intel.com
2024-09-23 10:46:29 -07:00
Dnyaneshwar Bhadane
35667a0330
drm/xe/pciid: Add new PCI id for ARL
...
Add new PCI id for ARL platform.
v2: Fix typo in PCI id (SaiTeja)
Signed-off-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com >
Reviewed-by: Sai Teja Pottumuttu <sai.teja.pottumuttu@intel.com >
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com >
Signed-off-by: Matt Roper <matthew.d.roper@intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20240912115906.2730577-1-dnyaneshwar.bhadane@intel.com
2024-09-20 13:39:04 -07:00
Matthew Brost
dc0dce6d63
drm/xe: Use helper for ASID -> VM in GPU faults and access counters
...
Normalize both code paths with a helper. Fixes a possible leak in the
access counter path too.
Suggested-by: Matthew Auld <matthew.auld@intel.com >
Signed-off-by: Matthew Brost <matthew.brost@intel.com >
Reviewed-by: Matthew Auld <matthew.auld@intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20240918160503.2021315-1-matthew.brost@intel.com
2024-09-19 12:19:43 -07:00
Rodrigo Vivi
5b40191152
drm/xe/pciids: Add PVC's PCI device ID macros
...
Add PVC PCI IDs to the xe_pciids.h header. They're not yet used in the
driver.
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com >
Cc: Lucas De Marchi <lucas.demarchi@intel.com >
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com >
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com >
Acked-by: Simona Vetter <simona.vetter@ffwll.ch >
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com >
Signed-off-by: Jani Nikula <jani.nikula@intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/6ac1829493a53a3fec889c746648d627a0296892.1725624296.git.jani.nikula@intel.com
2024-09-19 14:39:58 +03:00
Ilia Levi
aa4e216827
drm/xe: memirq handler changes
...
Expose an interrupt processing handler for a single hw engine.
Refactor code to use this handler from the VF.
This handler also caters for the MSI-X mode, where the hardware engines
report interrupt source and status to the offset of engine instance zero
(this usage will be introduced in upcoming MSI-X enabling series).
Signed-off-by: Ilia Levi <ilia.levi@intel.com >
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com >
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20240918053942.1331811-6-illevi@habana.ai
2024-09-19 10:15:40 +02:00
Ilia Levi
ef6103d20f
drm/xe: memirq infra changes for MSI-X
...
When using MSI-X, hw engines report interrupt status and source to engine
instance 0. For this scenario, in order to differentiate between the
engines, we need to pass different status/source pointers in the LRC.
The requirements on those pointers are:
- Interrupt status should be 4KiB aligned
- Interrupt source should be 64 bytes aligned
To accommodate this, we duplicate the current memirq page layout -
allocating a page for each engine instance and pass this page in the LRC.
Note that the same page can be reused for different engine types.
For example, an LRC executing on CCS #x will have pointers to page #x,
and an LRC executing on BCS #x will have the same pointers. Thus, to
locate the proper page, the pointer accessors were modified to receive
the hw engine.
Signed-off-by: Ilia Levi <ilia.levi@intel.com >
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com >
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20240918053942.1331811-5-illevi@habana.ai
2024-09-19 10:14:20 +02:00
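The alignment requirements listed in this commit can be checked with a small address calculation. This is a sketch only: the 4 KiB page per engine instance follows the commit text, but the in-page offsets used here (status at the start of the page, source at 0x400) and all `MODEL_*` and function names are assumptions made for illustration, not values taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_MEMIRQ_PAGE_SIZE     4096u
#define MODEL_MEMIRQ_STATUS_OFFSET 0x000u /* assumed: keeps status 4 KiB aligned */
#define MODEL_MEMIRQ_SOURCE_OFFSET 0x400u /* assumed: a multiple of 64 bytes */

/* Address of the status area for a given engine instance. */
static uint64_t memirq_status_addr_model(uint64_t base, unsigned int instance)
{
	assert(base % MODEL_MEMIRQ_PAGE_SIZE == 0);
	return base + (uint64_t)instance * MODEL_MEMIRQ_PAGE_SIZE +
	       MODEL_MEMIRQ_STATUS_OFFSET;
}

/* Address of the source area for a given engine instance. */
static uint64_t memirq_source_addr_model(uint64_t base, unsigned int instance)
{
	assert(base % MODEL_MEMIRQ_PAGE_SIZE == 0);
	return base + (uint64_t)instance * MODEL_MEMIRQ_PAGE_SIZE +
	       MODEL_MEMIRQ_SOURCE_OFFSET;
}
```

Because the page is selected by instance number alone, two engines of different types with the same instance number resolve to the same pointers in this model, matching the CCS #x / BCS #x example in the commit text.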
Ilia Levi
4157849ca3
drm/xe: move memirq out of VF
...
Up until now only VF used Memory Based Interrupts (memirq).
Moving it out of VF to cater for other usages, specifically MSI-X.
Signed-off-by: Ilia Levi <ilia.levi@intel.com >
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com >
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20240918053942.1331811-4-illevi@habana.ai
2024-09-19 10:12:47 +02:00
Ilia Levi
6fa86e7ad4
drm/xe: Introduce xe_device_uses_memirq()
...
Simplify some memirq usage scenarios and asserts in memirq infrastructure.
Signed-off-by: Ilia Levi <ilia.levi@intel.com >
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com >
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20240918053942.1331811-3-illevi@habana.ai
2024-09-19 10:10:26 +02:00
Ilia Levi
b46afdac45
drm/xe: Introduce dedicated config for memirq debug
...
Separate config for debugging memory based interrupts (memirq)
infrastructure.
Signed-off-by: Ilia Levi <ilia.levi@intel.com >
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com >
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20240918053942.1331811-2-illevi@habana.ai
2024-09-19 10:03:38 +02:00
Matt Roper
58548b9110
drm/xe: Defer gt->mmio initialization until after multi-tile setup
...
With the recent xe_mmio redesign, tiles and GTs each have their own MMIO
accessor, with the GT inheriting some of the information (such as the
iomap pointer) from their containing tile. Given that non-root tiles
get initialized later than the root tile (and currently after the point
at which GT MMIO is initialized for _all_ GTs), we wind up incorrectly
inheriting uninitialized pointers for the initialization of GT MMIO for
GTs that reside on non-root tiles. This causes a driver crash on
multi-tile PVC platforms.
With the general xe_mmio redesign, it's now only necessary to do the
GT-level MMIO setup before the point we start reading/writing GT
registers. Move initialization of gt->mmio out of xe_info_init (which
runs before non-root tiles are initialized) and to the beginning of
where we start actually accessing the GTs themselves.
The high-level initialization flow now boils down to:
- General device init, software-only setup
- (no register access possible yet)
- Root tile initialization
- (access to device/tile0 registers possible via xe_root_tile_mmio())
- Initialization of non-root tiles
- (access to any tile's registers possible via tile->mmio)
- GT MMIO initialization, inheriting iomap from each GT's tile
- (access to any GT's registers possible via gt->mmio)
Fixes: fa599b8c95 ("drm/xe: Populate GT's mmio iomap from tile during init")
Reported-by: John Harrison <John.C.Harrison@Intel.com >
Signed-off-by: Matt Roper <matthew.d.roper@intel.com >
Reviewed-by: Sai Teja Pottumuttu <sai.teja.pottumuttu@intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20240917221615.875962-2-matthew.d.roper@intel.com
2024-09-18 12:52:53 -07:00
Matthew Brost
1378c633a3
drm/xe: Convert to USM lock to rwsem
...
Remove contention from GPU fault path for ASID->VM lookup.
Signed-off-by: Matthew Brost <matthew.brost@intel.com >
Reviewed-by: Matthew Auld <matthew.auld@intel.com >
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20240918054436.1971839-1-matthew.brost@intel.com
2024-09-18 08:46:24 -07:00
Animesh Manna
17d3243036
drm/xe: Revert "drm/i915: Disable DSB in Xe KMD"
...
This reverts commit c27f010aa1.
After the fix from [1], the DSB timeout issue is no longer reproducible in
local testing with the xe driver. Checking CI results to confirm; not for review.
[1] https://patchwork.freedesktop.org/series/130783/
Signed-off-by: Animesh Manna <animesh.manna@intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20240913114754.7956-3-maarten.lankhorst@linux.intel.com
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com >
Acked-by: Jani Nikula <jani.nikula@intel.com >
2024-09-18 16:10:10 +02:00
Maarten Lankhorst
71a3161e9d
drm/xe: Fix DSB buffer coherency
...
Add the scanout flag to force WC caching, and add the memory barrier
where needed.
Reviewed-by: Matthew Auld <matthew.auld@intel.com >
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20240913114754.7956-2-maarten.lankhorst@linux.intel.com
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com >
2024-09-18 16:10:06 +02:00
Rodrigo Vivi
ec2d1539e1
drm/xe: Restore pci state upon resume
...
The pci state was saved, but not restored. Restore
right after the power state transition request like
every other driver.
v2: Use the right Fixes tag, since this was there initially, but
accidentally removed.
Fixes: f6761c68c0 ("drm/xe/display: Improve s2idle handling.")
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com >
Cc: Lucas De Marchi <lucas.demarchi@intel.com >
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com >
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20240912214507.456897-1-rodrigo.vivi@intel.com
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com >
2024-09-18 16:01:58 +02:00
Rodrigo Vivi
8a677d5b0a
drm/xe/display: Remove i915_drv.h include
...
Change HAS_DISPLAY towards intel_display and remove one of the
last includes of i915_drv.h in Xe.
Reviewed-by: Jani Nikula <jani.nikula@intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20240917203243.659393-1-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com >
2024-09-17 19:49:53 -04:00
Lucas De Marchi
bc67631872
drm/xe/rtp: Remove unneeded semicolon
...
Fix coccicheck report with regard to unneeded semicolon. This is
currently the only case according to
make coccicheck \
MODE=report \
COCCI=scripts/coccinelle/misc/semicolon.cocci \
M=drivers/gpu/drm/xe
Reported-by: kernel test robot <lkp@intel.com >
Closes: https://lore.kernel.org/oe-kbuild-all/202409151152.pJ4ukp5k-lkp@intel.com/
Reviewed-by: Jagmeet Randhawa <jagmeet.randhawa@intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20240916192149.855996-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com >
2024-09-16 12:58:26 -07:00
Matthew Auld
3717339274
drm/xe/vram: fix ccs offset calculation
...
Spec says SW is expected to round up to the nearest 128K, if not already
aligned for the CC unit view of CCS. We are seeing the assert sometimes
pop on BMG to tell us that there is a hole between GSM and CCS, as well
as popping other asserts with having a vram size with strange alignment,
which is likely caused by misaligned offset here.
v2 (Shuicheng):
- Do the round_up() on final SW address.
BSpec: 68023
Fixes: b5c2ca0372 ("drm/xe/xe2hpg: Determine flat ccs offset for vram")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: Akshata Jahagirdar <akshata.jahagirdar@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Shuicheng Lin <shuicheng.lin@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: stable@vger.kernel.org # v6.10+
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Tested-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240916084911.13119-2-matthew.auld@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-09-16 12:22:36 -07:00
He Lugang
fdc81c43f0
drm/xe: use devm_add_action_or_reset() helper
...
Use devm_add_action_or_reset() to release resources in case of failure,
because it calls the cleanup function automatically if adding the action
fails.
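The "or reset" behaviour can be sketched with a small userspace mock. Nothing
here is the real devres API: struct action_list, add_action(),
add_action_or_reset() and release_counter() are hypothetical names, and the
fixed-size table only exists so that registration can be made to fail.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ACTIONS 4

typedef void (*action_fn)(void *data);

/* Hypothetical stand-in for a device's list of managed cleanup actions. */
struct action_list {
	action_fn fn[MAX_ACTIONS];
	void *data[MAX_ACTIONS];
	size_t count;
};

/* Plain registration: on failure the caller has to clean up by hand. */
static int add_action(struct action_list *l, action_fn fn, void *data)
{
	assert(fn != NULL);
	if (l->count == MAX_ACTIONS)
		return -1;	/* stand-in for an allocation failure */
	l->fn[l->count] = fn;
	l->data[l->count] = data;
	l->count++;
	return 0;
}

/* "or_reset" variant: if registration fails, run the action immediately. */
static int add_action_or_reset(struct action_list *l, action_fn fn, void *data)
{
	int ret = add_action(l, fn, data);

	if (ret)
		fn(data);
	return ret;
}

/* Example cleanup action: counts how often it has been invoked. */
static void release_counter(void *data)
{
	++*(int *)data;
}
```

A caller of the second variant only needs to propagate the error code; the
resource has already been released by the time the call returns non-zero.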
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: He Lugang <helugang@uniontech.com>
Link: https://patchwork.freedesktop.org/patch/msgid/9631BC17D1E028A2+20240911102215.84865-1-helugang@uniontech.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-09-16 12:35:15 -04:00
Michal Wajdeczko
20e3aa503f
drm/xe/pf: Allow to trigger VF GuC state restore from debugfs
...
For feature enabling and testing purposes, allow to restore saved
or replaced VF GuC state from debugfs, bypassing normal migration
flow. This is available only under strict debug config.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Tomasz Lis <tomasz.lis@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240912203817.1880-7-michal.wajdeczko@intel.com
2024-09-16 13:00:47 +02:00
Michal Wajdeczko
d620448fb5
drm/xe/pf: Allow to view and replace VF GuC state over debugfs
...
For feature enabling and testing purposes, allow to view saved VF
GuC state and to replace it, but only under strict debug config.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Tomasz Lis <tomasz.lis@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240912203817.1880-6-michal.wajdeczko@intel.com
2024-09-16 13:00:39 +02:00
Michal Wajdeczko
14423f08c3
drm/xe/pf: Save VF GuC state when pausing VF
...
Since usually pausing the VF is done as a first step to migrate
that VF, immediately save VF GuC state as a final step of the VF
pausing to have that data ready to export when needed.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Tomasz Lis <tomasz.lis@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240912203817.1880-5-michal.wajdeczko@intel.com
2024-09-16 13:00:31 +02:00
Michal Wajdeczko
d86e3737c7
drm/xe/pf: Add functions to save and restore VF GuC state
...
To successfully migrate a VM with attached GPU VF we also need to
migrate the VF's GuC state. Add the necessary functions that interact with
the GuC to save and restore the VF GuC state. We will start using them in
upcoming patches.
Since VF migration requires many more changes in the driver, enable
those functions only under debug config.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Tomasz Lis <tomasz.lis@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240913120013.1924-1-michal.wajdeczko@intel.com
2024-09-16 13:00:22 +02:00
Michal Wajdeczko
804ce41f66
drm/xe/guc: Add PF2GUC_SAVE_RESTORE_VF to ABI
...
In upcoming patches we will add support to the PF driver to save
and restore a VF state maintained by the GuC to allow VF migration.
Add necessary H2G definitions to our GuC firmware ABI header.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Tomasz Lis <tomasz.lis@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240912203817.1880-3-michal.wajdeczko@intel.com
2024-09-16 13:00:08 +02:00
Michal Wajdeczko
02fdf821ed
drm/xe/guc: Fix GUC_{SUBMIT,FIRMWARE}_VER helper macros
...
Those macros rely on the non-existent MAKE_VER_STRUCT macro, while the
correct one that should be used is named MAKE_GUC_VER_STRUCT.
Fixes: 4eb0aab6e4 ("drm/xe/guc: Bump minimum required GuC version to v70.29.2")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Julia Filipchuk <julia.filipchuk@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240912203817.1880-2-michal.wajdeczko@intel.com
2024-09-16 12:54:59 +02:00
Jiapeng Chong
cdb389a4c9
drm/xe/irq: Remove unneeded semicolon
...
Remove unnecessary semicolon in pick_engine_gt().
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=8757
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Shekhar Chauhan <shekhar.chauhan@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240913060254.26678-1-jiapeng.chong@linux.alibaba.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-09-13 15:53:05 -04:00
José Roberto de Souza
9ba0e0f30c
drm/xe/oa: Fix overflow in oa batch buffer
...
By default xe_bb_create_job() appends a MI_BATCH_BUFFER_END to the batch
buffer. This is not a problem if the batch buffer is only used once, but
OA reuses the batch buffer for the same metric, and each call appends
another MI_BATCH_BUFFER_END, printing the warning below and then
overflowing.
[ 381.072016] ------------[ cut here ]------------
[ 381.072019] xe 0000:00:02.0: [drm] Assertion `bb->len * 4 + bb_prefetch(q->gt) <= size` failed!
platform: LUNARLAKE subplatform: 1
graphics: Xe2_LPG / Xe2_HPG 20.04 step B0
media: Xe2_LPM / Xe2_HPM 20.00 step B0
tile: 0 VRAM 0 B
GT: 0 type 1
So check whether the batch buffer already has a MI_BATCH_BUFFER_END and
append one only if it does not.
v2:
- simplify the fix, as suggested by Ashutosh
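As a rough userspace sketch of the idea (not the actual patch): struct bb and
bb_terminate() below are simplified, hypothetical stand-ins for the driver's
batch buffer and job creation path, and the MI_BATCH_BUFFER_END encoding is
an assumption restated locally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed encoding of MI_BATCH_BUFFER_END: opcode 0x0A placed at bit 23. */
#define MI_BATCH_BUFFER_END (0x0Au << 23)

/* Hypothetical, simplified batch buffer. */
struct bb {
	uint32_t *cs;	/* command stream, in dwords */
	size_t len;	/* dwords currently used */
	size_t cap;	/* dwords available in cs */
};

/* Terminate the batch once; calling it again on the same buffer is a no-op. */
static void bb_terminate(struct bb *bb)
{
	if (bb->len > 0 && bb->cs[bb->len - 1] == MI_BATCH_BUFFER_END)
		return;

	assert(bb->len < bb->cap);
	bb->cs[bb->len++] = MI_BATCH_BUFFER_END;
}
```

Reusing the same buffer then leaves bb->len stable after the first call
instead of growing by one dword per submission.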
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240912153842.35813-1-jose.souza@intel.com
2024-09-13 09:04:13 -07:00
Yu Jiaoliang
bbb1ed0b44
drm/xe: Use ERR_CAST to return an error-valued pointer
...
Instead of directly casting and returning an error-valued pointer,
use ERR_CAST to make the error handling more explicit and improve
code clarity.
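For readers unfamiliar with the pattern, here is a userspace sketch. ERR_PTR(),
PTR_ERR(), IS_ERR() and ERR_CAST() are re-created locally to mimic the kernel
helpers of the same names; struct bo, struct job, bo_create() and job_create()
are hypothetical and exist only to show the call site.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Local re-creations of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error)
{
	assert(error < 0 && error >= -MAX_ERRNO);
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Re-type an error pointer without an open-coded cast at the call site. */
static inline void *ERR_CAST(const void *ptr)
{
	return (void *)(uintptr_t)ptr;
}

struct bo { int id; };
struct job { int id; };

/* Hypothetical allocator that can be told to fail. */
static struct bo *bo_create(int fail)
{
	static struct bo the_bo = { 1 };

	return fail ? ERR_PTR(-ENOMEM) : &the_bo;
}

/* Propagates the error from bo_create() as a struct job pointer. */
static struct job *job_create(int fail)
{
	static struct job the_job = { 2 };
	struct bo *bo = bo_create(fail);

	if (IS_ERR(bo))
		return ERR_CAST(bo);	/* instead of (struct job *)bo */
	return &the_job;
}
```

The error code survives the change of pointer type, so the caller of
job_create() can still recover -ENOMEM with PTR_ERR().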
Signed-off-by: Yu Jiaoliang <yujiaoliang@vivo.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240906070109.1852860-1-yujiaoliang@vivo.com
2024-09-12 12:18:21 -07:00
Matthew Brost
f96dbf7c32
drm/xe: Do not run GPU page fault handler on a closed VM
...
Closing a VM removes page table memory, thus we shouldn't touch page
tables when a VM is closed. Do not run the GPU page fault handler once
the VM is closed to avoid touching page tables.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240911011820.825127-1-matthew.brost@intel.com
2024-09-12 12:17:55 -07:00
Matthew Auld
3b04c2cfd7
drm/xe/bo: add some annotations in bo_put()
...
If the put() triggers bo destroy then there is at least one potential
sleeping lock. Also annotate bos_lock and ggtt lock.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: Tejas Upadhyay <tejas.upadhyay@intel.com>
Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240911155527.178910-8-matthew.auld@intel.com
2024-09-12 09:27:31 +01:00
Matthew Auld
fbd73b7d2a
drm/xe/client: use mem_type from the current resource
...
Rather extract the mem_type from the current resource. Checking the
first potential placement doesn't really tell us where the bo is
currently allocated, especially if there are multiple potential
placements.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: Tejas Upadhyay <tejas.upadhyay@intel.com>
Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240911155527.178910-7-matthew.auld@intel.com
2024-09-12 09:27:30 +01:00
Matthew Auld
4f63d712fa
drm/xe/client: add missing bo locking in show_meminfo()
...
bo_meminfo() wants to inspect bo state like tt and the ttm resource,
however this state can change at any point leading to stuff like NPD and
UAF, if the bo lock is not held. Grab the bo lock when calling
bo_meminfo(), ensuring we drop any spinlocks first. In the case of
object_idr we now also need to hold a ref.
v2 (MattB)
- Also add xe_bo_assert_held()
Fixes: 0845233388 ("drm/xe: Implement fdinfo memory stats printing")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: Tejas Upadhyay <tejas.upadhyay@intel.com>
Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240911155527.178910-6-matthew.auld@intel.com
2024-09-12 09:27:29 +01:00
Matthew Auld
0083b8e6f1
drm/xe/client: fix deadlock in show_meminfo()
...
There is a real deadlock as well as sleeping in atomic() bug in here, if
the bo put happens to be the last ref, since bo destruction wants to
grab the same spinlock and sleeping locks. Fix that by dropping the ref
using xe_bo_put_deferred(), and moving the final commit outside of the
lock. Dropping the lock around the put is tricky since the bo can go
out of scope and delete itself from the list, making it difficult to
navigate to the next list entry.
Fixes: 0845233388 ("drm/xe: Implement fdinfo memory stats printing")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2727
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: Tejas Upadhyay <tejas.upadhyay@intel.com>
Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240911155527.178910-5-matthew.auld@intel.com
2024-09-12 09:27:28 +01:00