linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-02 15:43:35 -04:00

Author	SHA1	Message	Date
Matthew Auld	11bfc4a2cf	drm/xe/ct: drop irq usage of xa_erase() Unclear why disabling interrupts is needed here. Nothing seems to be touching fence_lookup and its corresponding lock from an irq so there should be no risk of deadlock. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241001084346.98516-8-matthew.auld@intel.com	2024-10-03 08:34:21 +01:00
Matthew Auld	f040327238	drm/xe/guc_submit: fix xa_store() error checking Looks like we are meant to use xa_err() to extract the error encoded in the ptr. Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Badal Nilawar <badal.nilawar@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241001084346.98516-7-matthew.auld@intel.com	2024-10-03 08:34:20 +01:00
Matthew Auld	1aa4b78647	drm/xe/ct: fix xa_store() error checking Looks like we are meant to use xa_err() to extract the error encoded in the ptr. Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Badal Nilawar <badal.nilawar@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241001084346.98516-6-matthew.auld@intel.com	2024-10-03 08:34:19 +01:00
Matthew Auld	52789ce35c	drm/xe/ct: prevent UAF in send_recv() Ensure we serialize with completion side to prevent UAF with fence going out of scope on the stack, since we have no clue if it will fire after the timeout before we can erase from the xa. Also we have some dependent loads and stores for which we need the correct ordering, and we lack the needed barriers. Fix this by grabbing the ct->lock after the wait, which is also held by the completion side. v2 (Badal): - Also print done after acquiring the lock and seeing timeout. Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Badal Nilawar <badal.nilawar@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241001084346.98516-5-matthew.auld@intel.com	2024-10-03 08:34:18 +01:00
Matthew Brost	63e0695597	drm/xe: Fix memory leak when aborting binds Make sure to call xe_pt_update_ops_fini in xe_pt_update_ops_abort to free any memory the bind allocated. Caught by kmemleak when running Vulkan CTS tests on LNL. The leak seems to happen only when there's some kind of failure happening, like the lack of memory. Example output: unreferenced object 0xffff9120bdf62000 (size 8192): comm "deqp-vk", pid 115008, jiffies 4310295728 hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 1b 05 f9 28 01 00 00 40 ...........(...@ 00 00 00 00 00 00 00 00 1b 15 f9 28 01 00 00 40 ...........(...@ backtrace (crc 7a56be79): [<ffffffff86dd81f0>] __kmalloc_cache_noprof+0x310/0x3d0 [<ffffffffc08e8211>] xe_pt_new_shared.constprop.0+0x81/0xb0 [xe] [<ffffffffc08e8309>] xe_pt_insert_entry+0xb9/0x140 [xe] [<ffffffffc08eab6d>] xe_pt_stage_bind_entry+0x12d/0x5b0 [xe] [<ffffffffc08ecbca>] xe_pt_walk_range+0xea/0x280 [xe] [<ffffffffc08eccea>] xe_pt_walk_range+0x20a/0x280 [xe] [<ffffffffc08eccea>] xe_pt_walk_range+0x20a/0x280 [xe] [<ffffffffc08eccea>] xe_pt_walk_range+0x20a/0x280 [xe] [<ffffffffc08eccea>] xe_pt_walk_range+0x20a/0x280 [xe] [<ffffffffc08e9eff>] xe_pt_stage_bind.constprop.0+0x25f/0x580 [xe] [<ffffffffc08eb21a>] bind_op_prepare+0xea/0x6e0 [xe] [<ffffffffc08ebab8>] xe_pt_update_ops_prepare+0x1c8/0x440 [xe] [<ffffffffc08ffbf3>] ops_execute+0x143/0x850 [xe] [<ffffffffc0900b64>] vm_bind_ioctl_ops_execute+0x244/0x800 [xe] [<ffffffffc0906467>] xe_vm_bind_ioctl+0x1877/0x2370 [xe] [<ffffffffc05e92b3>] drm_ioctl_kernel+0xb3/0x110 [drm] unreferenced object 0xffff9120bdf72000 (size 8192): comm "deqp-vk", pid 115008, jiffies 4310295728 hex dump (first 32 bytes): 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk backtrace (crc 23b2f0b5): [<ffffffff86dd81f0>] __kmalloc_cache_noprof+0x310/0x3d0 [<ffffffffc08e8211>] xe_pt_new_shared.constprop.0+0x81/0xb0 [xe] [<ffffffffc08e8453>] xe_pt_stage_unbind_post_descend+0xb3/0x150 [xe] [<ffffffffc08ecd26>] xe_pt_walk_range+0x246/0x280 [xe] [<ffffffffc08eccea>] xe_pt_walk_range+0x20a/0x280 [xe] [<ffffffffc08eccea>] xe_pt_walk_range+0x20a/0x280 [xe] [<ffffffffc08eccea>] xe_pt_walk_range+0x20a/0x280 [xe] [<ffffffffc08ece31>] xe_pt_walk_shared+0xc1/0x110 [xe] [<ffffffffc08e7b2a>] xe_pt_stage_unbind+0x9a/0xd0 [xe] [<ffffffffc08e913d>] unbind_op_prepare+0xdd/0x270 [xe] [<ffffffffc08eb9f6>] xe_pt_update_ops_prepare+0x106/0x440 [xe] [<ffffffffc08ffbf3>] ops_execute+0x143/0x850 [xe] [<ffffffffc0900b64>] vm_bind_ioctl_ops_execute+0x244/0x800 [xe] [<ffffffffc0906467>] xe_vm_bind_ioctl+0x1877/0x2370 [xe] [<ffffffffc05e92b3>] drm_ioctl_kernel+0xb3/0x110 [drm] [<ffffffffc05e95a0>] drm_ioctl+0x280/0x4e0 [drm] Reported-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2877 Fixes: `a708f6501c` ("drm/xe: Update PT layer with better error handling") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240927232228.3255246-1-matthew.brost@intel.com	2024-10-02 06:32:51 -07:00
Zhanjun Dong	59a1c9c7e1	drm/xe: Prevent null pointer access in xe_migrate_copy xe_migrate_copy is designed to copy the content of TTM resources. When the source resource is null, it triggers a NULL pointer dereference in xe_migrate_copy. To avoid this situation, set the lacks source flag to true for this case; the flag will trigger xe_migrate_clear rather than xe_migrate_copy. Issue trace: <7> [317.089847] xe 0000:00:02.0: [drm:xe_migrate_copy [xe]] Pass 14, sizes: 4194304 & 4194304 <7> [317.089945] xe 0000:00:02.0: [drm:xe_migrate_copy [xe]] Pass 15, sizes: 4194304 & 4194304 <1> [317.128055] BUG: kernel NULL pointer dereference, address: 0000000000000010 <1> [317.128064] #PF: supervisor read access in kernel mode <1> [317.128066] #PF: error_code(0x0000) - not-present page <6> [317.128069] PGD 0 P4D 0 <4> [317.128071] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI <4> [317.128074] CPU: 1 UID: 0 PID: 1440 Comm: kunit_try_catch Tainted: G U N 6.11.0-rc7-xe #1 <4> [317.128078] Tainted: [U]=USER, [N]=TEST <4> [317.128080] Hardware name: Intel Corporation Lunar Lake Client Platform/LNL-M LP5 RVP1, BIOS LNLMFWI1.R00.3221.D80.2407291239 07/29/2024 <4> [317.128082] RIP: 0010:xe_migrate_copy+0x66/0x13e0 [xe] <4> [317.128158] Code: 00 00 48 89 8d e0 fe ff ff 48 8b 40 10 4c 89 85 c8 fe ff ff 44 88 8d bd fe ff ff 65 48 8b 3c 25 28 00 00 00 48 89 7d d0 31 ff <8b> 79 10 48 89 85 a0 fe ff ff 48 8b 00 48 89 b5 d8 fe ff ff 83 ff <4> [317.128162] RSP: 0018:ffffc9000167f9f0 EFLAGS: 00010246 <4> [317.128164] RAX: ffff8881120d8028 RBX: ffff88814d070428 RCX: 0000000000000000 <4> [317.128166] RDX: ffff88813cb99c00 RSI: 0000000004000000 RDI: 0000000000000000 <4> [317.128168] RBP: ffffc9000167fbb8 R08: ffff88814e7b1f08 R09: 0000000000000001 <4> [317.128170] R10: 0000000000000001 R11: 0000000000000001 R12: ffff88814e7b1f08 <4> [317.128172] R13: ffff88814e7b1f08 R14: ffff88813cb99c00 R15: 0000000000000001 <4> [317.128174] FS: 0000000000000000(0000) GS:ffff88846f280000(0000) knlGS:0000000000000000 <4> [317.128176] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4> [317.128178] CR2: 0000000000000010 CR3: 000000011f676004 CR4: 0000000000770ef0 <4> [317.128180] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 <4> [317.128182] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400 <4> [317.128184] PKRU: 55555554 <4> [317.128185] Call Trace: <4> [317.128187] <TASK> <4> [317.128189] ? show_regs+0x67/0x70 <4> [317.128194] ? __die_body+0x20/0x70 <4> [317.128196] ? __die+0x2b/0x40 <4> [317.128198] ? page_fault_oops+0x15f/0x4e0 <4> [317.128203] ? do_user_addr_fault+0x3fb/0x970 <4> [317.128205] ? lock_acquire+0xc7/0x2e0 <4> [317.128209] ? exc_page_fault+0x87/0x2b0 <4> [317.128212] ? asm_exc_page_fault+0x27/0x30 <4> [317.128216] ? xe_migrate_copy+0x66/0x13e0 [xe] <4> [317.128263] ? __lock_acquire+0xb9d/0x26f0 <4> [317.128265] ? __lock_acquire+0xb9d/0x26f0 <4> [317.128267] ? sg_free_append_table+0x20/0x80 <4> [317.128271] ? lock_acquire+0xc7/0x2e0 <4> [317.128273] ? mark_held_locks+0x4d/0x80 <4> [317.128275] ? trace_hardirqs_on+0x1e/0xd0 <4> [317.128278] ? _raw_spin_unlock_irqrestore+0x31/0x60 <4> [317.128281] ? __pm_runtime_resume+0x60/0xa0 <4> [317.128284] xe_bo_move+0x682/0xc50 [xe] <4> [317.128315] ? lock_is_held_type+0xaa/0x120 <4> [317.128318] ttm_bo_handle_move_mem+0xe5/0x1a0 [ttm] <4> [317.128324] ttm_bo_validate+0xd1/0x1a0 [ttm] <4> [317.128328] shrink_test_run_device+0x721/0xc10 [xe] <4> [317.128360] ? find_held_lock+0x31/0x90 <4> [317.128363] ? lock_release+0xd1/0x2a0 <4> [317.128365] ? __pfx_kunit_generic_run_threadfn_adapter+0x10/0x10 [kunit] <4> [317.128370] xe_bo_shrink_kunit+0x11/0x20 [xe] <4> [317.128397] kunit_try_run_case+0x6e/0x150 [kunit] <4> [317.128400] ? trace_hardirqs_on+0x1e/0xd0 <4> [317.128402] ? _raw_spin_unlock_irqrestore+0x31/0x60 <4> [317.128404] kunit_generic_run_threadfn_adapter+0x1e/0x40 [kunit] <4> [317.128407] kthread+0xf5/0x130 <4> [317.128410] ? __pfx_kthread+0x10/0x10 <4> [317.128412] ret_from_fork+0x39/0x60 <4> [317.128415] ? __pfx_kthread+0x10/0x10 <4> [317.128416] ret_from_fork_asm+0x1a/0x30 <4> [317.128420] </TASK> Fixes: `266c858852` ("drm/xe/xe2: Handle flat ccs move for igfx.") Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240927161308.862323-2-zhanjun.dong@intel.com	2024-10-01 14:27:07 -07:00
Jani Nikula	ff35237de5	drm/xe/compat: remove unused i915_gpu_error.h The last user of the compat header was removed in commit `d6b933912d` ("drm/i915/dmc: convert intel_dmc_print_error_state() to drm_printer"). Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240930164052.3862911-1-jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2024-10-01 12:30:03 +03:00
José Roberto de Souza	0c8650b09a	drm/xe/oa: Don't reset OAC_CONTEXT_ENABLE on OA stream close Mesa testing on Xe2+ revealed that when OA metrics are collected for an exec_queue, after the OA stream is closed, future batch buffers submitted on that exec_queue do not complete. Not resetting OAC_CONTEXT_ENABLE on OA stream close resolves these hangs and should not have any adverse effects. v2: Make the change that we don't reset the bit clearer (Ashutosh) Also make the same fix for OAC as OAR (Ashutosh) Bspec: 60314 Fixes: `2f4a730fcd` ("drm/xe/oa: Add OAR support") Fixes: `14e077f800` ("drm/xe/oa: Add OAC support") Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2821 Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Cc: stable@vger.kernel.org Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240924213713.3497992-1-ashutosh.dixit@intel.com	2024-09-27 10:10:12 -07:00
Matthew Auld	16536582dd	drm/xe/queue: move xa_alloc to prevent UAF Evil user can guess the next id of the queue before the ioctl completes and then call queue destroy ioctl to trigger UAF since create ioctl is still referencing the same queue. Move the xa_alloc all the way to the end to prevent this. v2: - Rebase Fixes: `2149ded630` ("drm/xe: Fix use after free when client stats are captured") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240925071426.144015-4-matthew.auld@intel.com	2024-09-27 09:28:59 +01:00
Matthew Auld	dcfd397132	drm/xe/vm: move xa_alloc to prevent UAF Evil user can guess the next id of the vm before the ioctl completes and then call vm destroy ioctl to trigger UAF since create ioctl is still referencing the same vm. Move the xa_alloc all the way to the end to prevent this. v2: - Rebase Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240925071426.144015-3-matthew.auld@intel.com	2024-09-27 09:28:58 +01:00
Matthew Brost	8ec5a4e5ce	drm/xe: Resume TDR after GT reset Not starting the TDR after GT reset on exec queues which have been restarted can lead to jobs being able to run forever. Fix this by restarting the TDR. Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240724235919.1917216-1-matthew.brost@intel.com	2024-09-27 00:07:51 -07:00
Matt Roper	ee615c2bac	drm/xe: Move IRQ-related registers to dedicated header IRQ registers have a well-defined scope and make sense to collect in a dedicated header file. This also reduces confusion about the GT IRQ registers --- even though those registers relate to the GTs, they actually live outside the GT (in the sgunit) and thus do not need to worry about GT-specific register concepts like forcewake, steering, etc. Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240923214514.2031410-2-matthew.d.roper@intel.com	2024-09-26 10:27:07 -07:00
Matthew Auld	861108666c	drm/xe: fix UAF around queue destruction We currently do stuff like queuing the final destruction step on a random system wq, which will outlive the driver instance. With bad timing we can tear down the driver with one or more workqueues still being alive, leading to various UAF splats. Add a fini step to ensure user queues are properly torn down. At this point GuC should already be nuked so queue itself should no longer be referenced from hw pov. v2 (Matt B) - Looks much safer to use a waitqueue and then just wait for the xa_array to become empty before triggering the drain. Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2317 Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240923145647.77707-2-matthew.auld@intel.com	2024-09-26 14:30:30 +01:00
Matthew Auld	d28af0b6b9	drm/xe/guc_submit: add missing locking in wedged_fini Any non-wedged queue can have a zero refcount here and can be running concurrently with an async queue destroy, therefore dereferencing the queue ptr to check wedge status after the lookup can trigger UAF if queue is not wedged. Fix this by keeping the submission_state lock held around the check to postpone the free and make the check safe, before dropping again around the put() to avoid the deadlock. Fixes: `8ed9aaae39` ("drm/xe: Force wedged state and block GT reset upon any GPU hang") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240924150947.118433-2-matthew.auld@intel.com	2024-09-26 14:28:06 +01:00
Matthew Brost	fe4f5d4b66	drm/xe: Clean up VM / exec queue file lock usage. Both the VM / exec queue file lock protect the lookup and reference to the object, nothing more. These locks are not intended to have anything else underneath them. XAs have their own locking too, so no need to take the VM / exec queue file lock aside from when doing a lookup and reference get. Add some kernel doc to make this clear and clean up a few typos too. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240921011712.2681510-1-matthew.brost@intel.com	2024-09-24 09:03:42 -07:00
Gustavo Sousa	876253165f	drm/xe/xe2: Add performance tuning for L3 cache flushing A recommended performance tuning for LNL related to L3 cache flushing was recently introduced in Bspec. Implement it. Unlike the other existing tuning settings, we limit this one for LNL only, since there is no info about whether this would be applicable to other platforms yet. In the future we can come back and use IP version ranges if applicable. v2: - Fix reference to Bspec. (Sai Teja, Tejas) - Use correct register name for "Tuning: L3 RW flush all Cache". (Sai Teja) - Use SCRATCH3_LBCF (with the underscore) for better readability. v3: - Limit setting to LNL only. (Matt) Bspec: 72161 Cc: Sai Teja Pottumuttu <sai.teja.pottumuttu@intel.com> Cc: Tejas Upadhyay <tejas.upadhyay@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240920211459.255181-5-gustavo.sousa@intel.com	2024-09-23 10:46:31 -07:00
Gustavo Sousa	f5b463fd7c	drm/xe/xe2: Assume tuning settings also apply for future media GT We already make the assumption that recommended tuning settings for primary GT on Xe2 will also apply for future releases. Let's make the same assumption for the media GT. We can come back and define closed ranges when that becomes necessary. Bspec: 72161 Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240920211459.255181-4-gustavo.sousa@intel.com	2024-09-23 10:46:31 -07:00
Gustavo Sousa	e1f813947c	drm/xe/xe2: Extend performance tuning to media GT With exception of "Tuning: L3 cache - media", we are currently applying recommended performance tuning settings only for the primary GT. Let's also implement them for the media GT when applicable. According to our spec, media GT registers CCCHKNREG1 and L3SQCREG* exist only in Xe2_LPM and their offsets do not match their primary GT counterparts. Furthermore, the range where CCCHKNREG1 belongs is not listed as a multicast range on the media GT. As such, we need to have Xe2_LPM-specific definitions for those registers and apply the setting only for that specific IP. Both Xe2_HPM and Xe2_LPM contain STATELESS_COMPRESSION_CTRL and the offset on the media GT matches the one on the primary one. So we can simply have a copy of "Tuning: Stateless compression control" for the media GT. v2: - Fix implementation with respect to multicast vs non-multicast registers. (Matt) - Add missing XE2LPM_CCCHKNREG1 on second action of "Tuning: Compression Overfetch - media". v3: - STATELESS_COMPRESSION_CTRL on Xe2_HPM is also a multicast register, do not define a XE2HPM_STATELESS_COMPRESSION_CTRL register. (Tejas) Bspec: 72161 Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240920211459.255181-3-gustavo.sousa@intel.com	2024-09-23 10:46:30 -07:00
Gustavo Sousa	21ae035ae5	drm/xe/mcr: Use Xe2_LPM steering tables for Xe2_HPM According to Bspec, Xe2 steering tables must be used for Xe2_HPM, just as it is with Xe2_LPM. Update our driver to reflect that. Bspec: 71186 Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240920211459.255181-2-gustavo.sousa@intel.com	2024-09-23 10:46:29 -07:00
Dnyaneshwar Bhadane	35667a0330	drm/xe/pciid: Add new PCI id for ARL Add new PCI id for ARL platform. v2: Fix typo in PCI id (SaiTeja) Signed-off-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com> Reviewed-by: Sai Teja Pottumuttu <sai.teja.pottumuttu@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240912115906.2730577-1-dnyaneshwar.bhadane@intel.com	2024-09-20 13:39:04 -07:00
Matthew Brost	dc0dce6d63	drm/xe: Use helper for ASID -> VM in GPU faults and access counters Normalize both code paths with a helper. Fixes a possible leak in the access counter path too. Suggested-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240918160503.2021315-1-matthew.brost@intel.com	2024-09-19 12:19:43 -07:00
Rodrigo Vivi	5b40191152	drm/xe/pciids: Add PVC's PCI device ID macros Add PVC PCI IDs to the xe_pciids.h header. They're not yet used in the driver. Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Acked-by: Simona Vetter <simona.vetter@ffwll.ch> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/6ac1829493a53a3fec889c746648d627a0296892.1725624296.git.jani.nikula@intel.com	2024-09-19 14:39:58 +03:00
Ilia Levi	aa4e216827	drm/xe: memirq handler changes Expose an interrupt processing handler for a single hw engine. Refactor code to use this handler from the VF. This handler also caters for the MSI-X mode, where the hardware engines report interrupt source and status to the offset of engine instance zero (this usage will be introduced in upcoming MSI-X enabling series). Signed-off-by: Ilia Levi <ilia.levi@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240918053942.1331811-6-illevi@habana.ai	2024-09-19 10:15:40 +02:00
Ilia Levi	ef6103d20f	drm/xe: memirq infra changes for MSI-X When using MSI-X, hw engines report interrupt status and source to engine instance 0. For this scenario, in order to differentiate between the engines, we need to pass different status/source pointers in the LRC. The requirements on those pointers are: - Interrupt status should be 4KiB aligned - Interrupt source should be 64 bytes aligned To accommodate this, we duplicate the current memirq page layout - allocating a page for each engine instance and pass this page in the LRC. Note that the same page can be reused for different engine types. For example, an LRC executing on CCS #x will have pointers to page #x, and an LRC executing on BCS #x will have the same pointers. Thus, to locate the proper page, the pointer accessors were modified to receive the hw engine. Signed-off-by: Ilia Levi <ilia.levi@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240918053942.1331811-5-illevi@habana.ai	2024-09-19 10:14:20 +02:00
Ilia Levi	4157849ca3	drm/xe: move memirq out of VF Up until now only VF used Memory Based Interrupts (memirq). Move it out of VF to cater for other usages, specifically MSI-X. Signed-off-by: Ilia Levi <ilia.levi@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240918053942.1331811-4-illevi@habana.ai	2024-09-19 10:12:47 +02:00
Ilia Levi	6fa86e7ad4	drm/xe: Introduce xe_device_uses_memirq() Simplify some memirq usage scenarios and asserts in memirq infrastructure. Signed-off-by: Ilia Levi <ilia.levi@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240918053942.1331811-3-illevi@habana.ai	2024-09-19 10:10:26 +02:00
Ilia Levi	b46afdac45	drm/xe: Introduce dedicated config for memirq debug Separate config for debugging memory based interrupts (memirq) infrastructure. Signed-off-by: Ilia Levi <ilia.levi@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240918053942.1331811-2-illevi@habana.ai	2024-09-19 10:03:38 +02:00
Matt Roper	58548b9110	drm/xe: Defer gt->mmio initialization until after multi-tile setup With the recent xe_mmio redesign, tiles and GTs each have their own MMIO accessor, with the GT inheriting some of the information (such as the iomap pointer) from their containing tile. Given that non-root tiles get initialized later than the root tile (and currently after the point at which GT MMIO is initialized for _all_ GTs), we wind up incorrectly inheriting uninitialized pointers for the initialization of GT MMIO for GTs that reside on non-root tiles. This causes a driver crash on multi-tile PVC platforms. With the general xe_mmio redesign, it's now only necessary to do the GT-level MMIO setup before the point we start reading/writing GT registers. Move initialization of gt->mmio out of xe_info_init (which runs before non-root tiles are initialized) and to the beginning of where we start actually accessing the GTs themselves. The high-level initialization flow now boils down to: - General device init, software-only setup - (no register access possible yet) - Root tile initialization - (access to device/tile0 registers possible via xe_root_tile_mmio()) - Initialization of non-root tiles - (access to any tile's registers possible via tile->mmio) - GT MMIO initialization, inheriting iomap from each GT's tile - (access to any GT's registers possible via gt->mmio) Fixes: `fa599b8c95` ("drm/xe: Populate GT's mmio iomap from tile during init") Reported-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Sai Teja Pottumuttu <sai.teja.pottumuttu@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240917221615.875962-2-matthew.d.roper@intel.com	2024-09-18 12:52:53 -07:00
Matthew Brost	1378c633a3	drm/xe: Convert to USM lock to rwsem Remove contention from GPU fault path for ASID->VM lookup. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240918054436.1971839-1-matthew.brost@intel.com	2024-09-18 08:46:24 -07:00
Animesh Manna	17d3243036	drm/xe: Revert "drm/i915: Disable DSB in Xe KMD" This reverts commit `c27f010aa1`. After fix from [1] dsb timeout issue is not reproducible on local testing with xe driver. Checking CI result to confirm and not for review. [1] https://patchwork.freedesktop.org/series/130783/ Signed-off-by: Animesh Manna <animesh.manna@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240913114754.7956-3-maarten.lankhorst@linux.intel.com Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Acked-by: Jani Nikula <jani.nikula@intel.com>	2024-09-18 16:10:10 +02:00
Maarten Lankhorst	71a3161e9d	drm/xe: Fix DSB buffer coherency Add the scanout flag to force WC caching, and add the memory barrier where needed. Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240913114754.7956-2-maarten.lankhorst@linux.intel.com Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>	2024-09-18 16:10:06 +02:00
Rodrigo Vivi	ec2d1539e1	drm/xe: Restore pci state upon resume The pci state was saved, but not restored. Restore right after the power state transition request like every other driver. v2: Use right fixes tag, since this was there initially, but was accidentally removed. Fixes: `f6761c68c0` ("drm/xe/display: Improve s2idle handling.") Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240912214507.456897-1-rodrigo.vivi@intel.com Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>	2024-09-18 16:01:58 +02:00
Rodrigo Vivi	8a677d5b0a	drm/xe/display: Remove i915_drv.h include Change HAS_DISPLAY towards intel_display and remove one of the last includes of i915_drv.h in Xe. Reviewed-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240917203243.659393-1-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-09-17 19:49:53 -04:00
Lucas De Marchi	bc67631872	drm/xe/rtp: Remove unneeded semicolon Fix coccicheck report with regard to unneeded semicolon. This is currently the only case according to make coccicheck \ MODE=report \ COCCI=scripts/coccinelle/misc/semicolon.cocci \ M=drivers/gpu/drm/xe Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202409151152.pJ4ukp5k-lkp@intel.com/ Reviewed-by: Jagmeet Randhawa <jagmeet.randhawa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240916192149.855996-1-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-09-16 12:58:26 -07:00
Matthew Auld	3717339274	drm/xe/vram: fix ccs offset calculation Spec says SW is expected to round up to the nearest 128K, if not already aligned for the CC unit view of CCS. We are seeing the assert sometimes pop on BMG to tell us that there is a hole between GSM and CCS, as well as popping other asserts with having a vram size with strange alignment, which is likely caused by misaligned offset here. v2 (Shuicheng): - Do the round_up() on final SW address. BSpec: 68023 Fixes: `b5c2ca0372` ("drm/xe/xe2hpg: Determine flat ccs offset for vram") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Cc: Akshata Jahagirdar <akshata.jahagirdar@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Shuicheng Lin <shuicheng.lin@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: stable@vger.kernel.org # v6.10+ Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Tested-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240916084911.13119-2-matthew.auld@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-09-16 12:22:36 -07:00
He Lugang	fdc81c43f0	drm/xe: use devm_add_action_or_reset() helper Use devm_add_action_or_reset() to release resources in case of failure, because the cleanup function will be automatically called. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: He Lugang <helugang@uniontech.com> Link: https://patchwork.freedesktop.org/patch/msgid/9631BC17D1E028A2+20240911102215.84865-1-helugang@uniontech.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-09-16 12:35:15 -04:00
Michal Wajdeczko	20e3aa503f	drm/xe/pf: Allow to trigger VF GuC state restore from debugfs For feature enabling and testing purposes, allow to restore saved or replaced VF GuC state from debugfs, bypassing normal migration flow. This is available only under strict debug config. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Tomasz Lis <tomasz.lis@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240912203817.1880-7-michal.wajdeczko@intel.com	2024-09-16 13:00:47 +02:00
Michal Wajdeczko	d620448fb5	drm/xe/pf: Allow to view and replace VF GuC state over debugfs For feature enabling and testing purposes, allow to view saved VF GuC state and to replace it, but only under strict debug config. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Tomasz Lis <tomasz.lis@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240912203817.1880-6-michal.wajdeczko@intel.com	2024-09-16 13:00:39 +02:00
Michal Wajdeczko	14423f08c3	drm/xe/pf: Save VF GuC state when pausing VF Since usually pausing the VF is done as a first step to migrate that VF, immediately save VF GuC state as a final step of the VF pausing to have that data ready to export when needed. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Tomasz Lis <tomasz.lis@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240912203817.1880-5-michal.wajdeczko@intel.com	2024-09-16 13:00:31 +02:00
Michal Wajdeczko	d86e3737c7	drm/xe/pf: Add functions to save and restore VF GuC state To successfully migrate a VM with attached GPU VF we also need to migrate VF's GuC state. Add necessary functions that interacts with GuC to save and restore a VF GuC state. We will start using them in upcoming patches. Since VF migration requires many more changes in the driver, enable those functions only under debug config. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Tomasz Lis <tomasz.lis@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240913120013.1924-1-michal.wajdeczko@intel.com	2024-09-16 13:00:22 +02:00
Michal Wajdeczko	804ce41f66	drm/xe/guc: Add PF2GUC_SAVE_RESTORE_VF to ABI In upcoming patches we will add support to the PF driver to save and restore a VF state maintained by the GuC to allow VF migration. Add necessary H2G definitions to our GuC firmware ABI header. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Tomasz Lis <tomasz.lis@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240912203817.1880-3-michal.wajdeczko@intel.com	2024-09-16 13:00:08 +02:00
Michal Wajdeczko	02fdf821ed	drm/xe/guc: Fix GUC_{SUBMIT,FIRMWARE}_VER helper macros Those macros rely on non-existing MAKE_VER_STRUCT macro, while the correct one that should be used is named MAKE_GUC_VER_STRUCT. Fixes: `4eb0aab6e4` ("drm/xe/guc: Bump minimum required GuC version to v70.29.2") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Julia Filipchuk <julia.filipchuk@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240912203817.1880-2-michal.wajdeczko@intel.com	2024-09-16 12:54:59 +02:00
Jiapeng Chong	cdb389a4c9	drm/xe/irq: Remove unneeded semicolon Remove unnecessary semicolon in pick_engine_gt(). Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=8757 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Shekhar Chauhan <shekhar.chauhan@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240913060254.26678-1-jiapeng.chong@linux.alibaba.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-09-13 15:53:05 -04:00
José Roberto de Souza	9ba0e0f30c	drm/xe/oa: Fix overflow in oa batch buffer By default xe_bb_create_job() appends a MI_BATCH_BUFFER_END to batch buffer, this is not a problem if batch buffer is only used once but oa reuses the batch buffer for the same metric and at each call it appends a MI_BATCH_BUFFER_END, printing the warning below and then overflowing. [ 381.072016] ------------[ cut here ]------------ [ 381.072019] xe 0000:00:02.0: [drm] Assertion `bb->len * 4 + bb_prefetch(q->gt) <= size` failed! platform: LUNARLAKE subplatform: 1 graphics: Xe2_LPG / Xe2_HPG 20.04 step B0 media: Xe2_LPM / Xe2_HPM 20.00 step B0 tile: 0 VRAM 0 B GT: 0 type 1 So here checking if batch buffer already have MI_BATCH_BUFFER_END if not append it. v2: - simply fix, suggestion from Ashutosh Cc: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240912153842.35813-1-jose.souza@intel.com	2024-09-13 09:04:13 -07:00
Yu Jiaoliang	bbb1ed0b44	drm/xe: Use ERR_CAST to return an error-valued pointer Instead of directly casting and returning an error-valued pointer, use ERR_CAST to make the error handling more explicit and improve code clarity. Signed-off-by: Yu Jiaoliang <yujiaoliang@vivo.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240906070109.1852860-1-yujiaoliang@vivo.com	2024-09-12 12:18:21 -07:00
Matthew Brost	f96dbf7c32	drm/xe: Do not run GPU page fault handler on a closed VM Closing a VM removes page table memory thus we shouldn't touch page tables when a VM is closed. Do not run the GPU page fault handler once the VM is closed to avoid touching page tables. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240911011820.825127-1-matthew.brost@intel.com	2024-09-12 12:17:55 -07:00
Matthew Auld	3b04c2cfd7	drm/xe/bo: add some annotations in bo_put() If the put() triggers bo destroy then there is at least one potential sleeping lock. Also annotate bos_lock and ggtt lock. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Cc: Tejas Upadhyay <tejas.upadhyay@intel.com> Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240911155527.178910-8-matthew.auld@intel.com	2024-09-12 09:27:31 +01:00
Matthew Auld	fbd73b7d2a	drm/xe/client: use mem_type from the current resource Rather extract the mem_type from the current resource. Checking the first potential placement doesn't really tell us where the bo is currently allocated, especially if there are multiple potential placements. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Cc: Tejas Upadhyay <tejas.upadhyay@intel.com> Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240911155527.178910-7-matthew.auld@intel.com	2024-09-12 09:27:30 +01:00
Matthew Auld	4f63d712fa	drm/xe/client: add missing bo locking in show_meminfo() bo_meminfo() wants to inspect bo state like tt and the ttm resource, however this state can change at any point leading to stuff like NPD and UAF, if the bo lock is not held. Grab the bo lock when calling bo_meminfo(), ensuring we drop any spinlocks first. In the case of object_idr we now also need to hold a ref. v2 (MattB) - Also add xe_bo_assert_held() Fixes: `0845233388` ("drm/xe: Implement fdinfo memory stats printing") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Cc: Tejas Upadhyay <tejas.upadhyay@intel.com> Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240911155527.178910-6-matthew.auld@intel.com	2024-09-12 09:27:29 +01:00
Matthew Auld	0083b8e6f1	drm/xe/client: fix deadlock in show_meminfo() There is a real deadlock as well as sleeping in atomic() bug in here, if the bo put happens to be the last ref, since bo destruction wants to grab the same spinlock and sleeping locks. Fix that by dropping the ref using xe_bo_put_deferred(), and moving the final commit outside of the lock. Dropping the lock around the put is tricky since the bo can go out of scope and delete itself from the list, making it difficult to navigate to the next list entry. Fixes: `0845233388` ("drm/xe: Implement fdinfo memory stats printing") Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2727 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Cc: Tejas Upadhyay <tejas.upadhyay@intel.com> Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240911155527.178910-5-matthew.auld@intel.com	2024-09-12 09:27:28 +01:00

1297921 Commits