linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-04-18 14:58:25 -04:00

Author	SHA1	Message	Date
Maarten Lankhorst	501d799a47	drm/xe: Wire up device shutdown handler The system is turning off, and we should probably put the device in a safe power state. We don't need to evict VRAM or suspend running jobs to a safe state, as the device is rebooted anyway. This does not imply the system is necessarily reset, as we can kexec into a new kernel. Without shutting down, things like USB Type-C may mysteriously start failing. References: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/3500 Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> [mlankhorst: Add !xe_driver_flr_disabled assert] Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240905150052.174895-4-maarten.lankhorst@linux.intel.com	2024-09-11 19:07:57 +02:00
Maarten Lankhorst	f90491d4b6	drm/xe: Remove runtime argument from display s/r functions The previous change ensures that pm_suspend is only called when suspending or resuming. This ensures no further bugs like those in the previous commit. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Vinod Govindapillai <vinod.govindapillai@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240905150052.174895-3-maarten.lankhorst@linux.intel.com	2024-09-11 18:31:13 +02:00
Maarten Lankhorst	474f64cb98	drm/xe: Fix missing conversion to xe_display_pm_runtime_resume This error path was missed when converting away from xe_display_pm_resume with second argument. Fixes: `66a0f6b9f5` ("drm/xe/display: handle HPD polling in display runtime suspend/resume") Cc: Arun R Murthy <arun.r.murthy@intel.com> Cc: Vinod Govindapillai <vinod.govindapillai@intel.com> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Vinod Govindapillai <vinod.govindapillai@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240905150052.174895-2-maarten.lankhorst@linux.intel.com	2024-09-11 18:31:09 +02:00
Tejas Upadhyay	9db969b36b	drm/xe/xe2hpg: Add Wa_15016589081 Wa_15016589081 applies to xe2_hpg renderCS V2(Gustavo) - rename bit macro Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240904101333.2049655-1-tejas.upadhyay@intel.com Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>	2024-09-11 13:09:00 +02:00
Thomas Hellström	5a90b60db5	drm/xe: Add a xe_bo subtest for shrinking / swapping Add a subtest that tries to allocate twice the amount of buffer object memory available, write data to it and then read all the data back verifying data integrity. In order to be able to do this on systems that have no or not enough swap-space available, allocate some memory as purgeable, and introduce a function to purge such memory from the TTM swap_notify path. this test is intended to add test coverage to the current bo swap path and upcoming shrinking path. The test has previously been part of the xe bo shrinker series. v2: - Skip test if the execution time is expected to be too long. - Minor code cleanups. v3: - Print random seed. (Matthew Auld) Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240909085654.5064-1-thomas.hellstrom@linux.intel.com	2024-09-11 09:11:33 +02:00
Arnd Bergmann	1c129ed07d	drm/xe: fix build warning with CONFIG_PM=n The 'runtime_status' field is an implementation detail of the power management code, so a device driver should not normally touch this: drivers/gpu/drm/xe/xe_pm.c: In function 'xe_pm_suspending_or_resuming': drivers/gpu/drm/xe/xe_pm.c:606:26: error: 'struct dev_pm_info' has no member named 'runtime_status' 606 \| return dev->power.runtime_status == RPM_SUSPENDING \|\| \| ^ drivers/gpu/drm/xe/xe_pm.c:607:27: error: 'struct dev_pm_info' has no member named 'runtime_status' 607 \| dev->power.runtime_status == RPM_RESUMING; \| ^ drivers/gpu/drm/xe/xe_pm.c:608:1: error: control reaches end of non-void function [-Werror=return-type] Add an #ifdef check to avoid the build regression. Fixes: `cb85e39dc5` ("drm/xe: Suppress missing outer rpm protection warning") Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://patchwork.freedesktop.org/patch/msgid/20240909202521.1018439-1-arnd@kernel.org Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-09-09 16:49:18 -04:00
Michal Wajdeczko	f2710d9572	drm/xe: Don't keep stale pointer to bo->ggtt_node When we fail to map a BO in the GGTT, we release our GGTT node placeholder, but leave stale bo->ggtt_node pointer to it, which triggers an assert immediately followed by a crash, due to UAF: [ ] xe 0000:00:02.0: [drm] Assertion `bo->ggtt_node->base.size == bo->size` failed! [ ] WARNING: CPU: 4 PID: 126 at drivers/gpu/drm/xe/xe_ggtt.c:689 xe_ggtt_remove_bo+0x1d9/0x250 [xe] [ ] RIP: 0010:xe_ggtt_remove_bo+0x1d9/0x250 [xe] [ ] Call Trace: [ ] <TASK> [ ] ? __warn+0x88/0x190 [ ] ? xe_ggtt_remove_bo+0x1d9/0x250 [xe] [ ] ? report_bug+0x1c3/0x1d0 [ ] ? handle_bug+0x42/0x70 [ ] ? exc_invalid_op+0x14/0x70 [ ] ? asm_exc_invalid_op+0x16/0x20 [ ] ? xe_ggtt_remove_bo+0x1d9/0x250 [xe] [ ] ? xe_ggtt_remove_bo+0x1d9/0x250 [xe] [ ] xe_ttm_bo_destroy+0x11f/0x260 [xe] [ ] ? ttm_bo_release+0x31c/0x350 [ttm] [ ] ? __mutex_unlock_slowpath+0x35/0x270 [ ] __xe_bo_create_locked+0x4a0/0x550 [xe] [ ] ? mark_held_locks+0x49/0x80 [ ] xe_bo_create_pin_map_at+0x37/0x200 [xe] [ ] xe_bo_create_pin_map+0x11/0x20 [xe] While around, for similar reason, also don't keep an error pointer if we fail to allocate ggtt_node placeholder. Fixes: `34e804220f` ("drm/xe: Make xe_ggtt_node struct independent") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240906220348.1836-1-michal.wajdeczko@intel.com	2024-09-09 20:08:55 +02:00
Lucas De Marchi	3fe62f7bfd	drm/xe: Mark reserved engines in snapshot When printing <debufs>/gt*/hw_engines, it's useful to mark what engines are reserved so it doesn't mislead developers while debugging. Cc: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240906205609.3131330-1-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-09-09 10:33:14 -07:00
Lucas De Marchi	ceb29504dd	drm/xe: Fix arg to pci_iomap() Commit `2d8865b277` ("drm/xe: Move BAR definitions to dedicated file") moved the BAR definition to the header, but replaced the wrong arg in the pci_iomap() function - the last arg is actuall the length, not the BAR. Luckily GTTMMADR_BAR == 0, so it still works. Fix the argument to avoid confusion. Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240906032507.2952859-1-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-09-09 10:32:32 -07:00
Lucas De Marchi	0c841e47d8	drm/xe: Update runtime detection of has_flat_ccs It's confusing to have a set function that actually probes the hardware rather than receiving a parameter. Rename it to probe along with prefix removal and comment in the relevant places that the has_flat_ccs flag may be overridden in runtime. While at it, fix the mixed declaration of struct xe_gt. Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240904162238.2831202-2-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-09-06 10:19:42 -07:00
Lucas De Marchi	b43723f864	drm/xe: Cleanup has_flat_ccs handling The flag is set in XE_HP_FEATURES, but then overridden in all but one xe_graphics_desc. Make it set only where needed. Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240904162238.2831202-1-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-09-06 10:19:36 -07:00
Dafna Hirschfeld	249df8cbec	drm/xe: fix missing 'xe_vm_put' Fix memleak caused by missing xe_vm_put Fixes: `852856e3b6` ("drm/xe: Use reserved copy engine for user binds on faulting devices") Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240901044227.1177211-1-dhirschfeld@habana.ai Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-09-06 12:33:29 -04:00
Rodrigo Vivi	cb85e39dc5	drm/xe: Suppress missing outer rpm protection warning Do not raise a WARN if we are likely within suspending or resuming path. This is likely this false positive: rpm_status: 0000:03:00.0 status=RPM_SUSPENDING console: xe_bo_evict_all (called from suspend) xe_sched_job_create: dev=0000:03:00.0, ... xe_sched_job_exec: dev=0000:03:00.0, ... xe_pm_runtime_put: dev=0000:03:00.0, ... xe_sched_job_run: dev=0000:03:00.0, ... rpm_usage: 0000:03:00.0 flags-0 cnt-2 ... rpm_usage: 0000:03:00.0 flags-0 cnt-2 ... rpm_usage: 0000:03:00.0 flags-0 cnt-2 ... console: xe 0000:03:00.0: [drm] Missing outer runtime PM protection console: xe_guc_ct_send+0x15/0x50 [xe] console: guc_exec_queue_run_job+0x1509/0x3950 [xe] [snip] console: drm_sched_run_job_work+0x649/0xc20 At this point, BOs are getting evicted from VRAM with rpm usage-counter = 2, but rpm status = SUSPENDING. The xe->pm_callback_task won't be equal 'current' because this call is coming from a work queue. So, pm_runtime_get_if_active() will be called and return 0 because rpm status != ACTIVE (but equal SUSPENDING or RESUMING). v2: Still get the reference even on non suspending/resuming path (Jonathan, Brost). Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240905140215.56404-1-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-09-06 12:15:01 -04:00
Riana Tauro	0914c1e45d	drm/xe/xe_gt_idle: add debugfs entry for powergating info Coarse Powergating is a power saving technique where Render and Media can be power-gated independently irrespective of the rest of the GT. For debug purposes, it is useful to expose the powergating information. v2: move to debugfs add details to commit message add per-slice status for media define reg bits in descending order (Matt Roper) v3: fix return statement fix kernel-doc use loop for media slices use helper function for status (Michal) v4: add pg prefix do not wake GT if in C6 (Badal) Signed-off-by: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240906071126.28078-3-riana.tauro@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-09-06 12:13:30 -04:00
Riana Tauro	c2bf07dd0b	drm/xe/xe_gt_idle: modify powergate enable condition Modify powergate enable condition based on the type of GT or presence of media engines. Also have a copy of the value written to powergate enable register. v2: add condition to enable render or media powergating (Badal) v3: fix commit message (Shekhar) fix kernel-doc Signed-off-by: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Shekhar Chauhan <shekhar.chauhan@intel.com> Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240906071126.28078-2-riana.tauro@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-09-06 12:13:30 -04:00
Jani Nikula	5ea28f921a	drm/xe: use IS_ENABLED() instead of defined() on config options Prefer IS_ENABLED() instead of defined() for checking whether a kconfig option is enabled. Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240904145231.3902289-1-jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2024-09-06 16:32:49 +03:00
Jani Nikula	cdb56a63f7	drm/xe/pciids: separate ARL and MTL PCI IDs Avoid including PCI IDs for one platform to the PCI IDs of another. It's more clear to deal with them completely separately at the PCI ID macro level. Reviewed-by: Shekhar Chauhan <shekhar.chauhan@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/a30cb0da7694a8eccceba66d676ac59aa0e96176.1725443121.git.jani.nikula@intel.com	2024-09-06 15:40:05 +03:00
Jani Nikula	d454902a69	drm/xe/pciids: separate RPL-U and RPL-P PCI IDs Avoid including PCI IDs for one platform to the PCI IDs of another. It's more clear to deal with them completely separately at the PCI ID macro level. Reviewed-by: Sai Teja Pottumuttu <sai.teja.pottumuttu@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/4868d36fbfa8c38ea2d490bca82cf6370b8d65dd.1725443121.git.jani.nikula@intel.com	2024-09-06 15:39:55 +03:00
Jani Nikula	ae304b0545	drm/xe/pciids: add some missing ADL-N PCI IDs Similar to commit `425b463859` ("drm/i915: Update ADL-N PCI IDs"). Reviewed-by: Sai Teja Pottumuttu <sai.teja.pottumuttu@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/47d543393e4026588401a03c4e3ce12ce29780e3.1725443121.git.jani.nikula@intel.com	2024-09-06 15:39:20 +03:00
Matthew Auld	1ff14648dc	drm/xe/pat: sanity check compression and coh_mode There is an implicit assumption in the driver that compression and coh_1way+ are mutually exclusive. If this is ever not true then userptr and imported dma-buf from external device will have uncleared ccs state. Add a build bug for this so we don't forget. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240828092257.169063-2-matthew.auld@intel.com	2024-09-06 09:35:28 +01:00
Matthew Auld	89076b5a8b	drm/xe: prevent potential UAF in pf_provision_vf_ggtt() The node ptr can point to an already freed ptr, if we hit the path with an already allocated node. We later dereference that pointer with: xe_gt_assert(gt, !xe_ggtt_node_allocated(node)); which is a potential UAF. Fix this by not stashing the ptr for node. Also since it is likely a bad idea to leave config->ggtt_region pointing to a stale ptr, also set that to NULL by calling pf_release_vf_config_ggtt() instead of pf_release_ggtt(). Fixes: `34e804220f` ("drm/xe: Make xe_ggtt_node struct independent") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240828104341.180111-2-matthew.auld@intel.com	2024-09-06 09:35:05 +01:00
Nitin Gote	cd89de14bb	drm/xe: Replace double space with single space after comma Avoid using double space, ", " in function or macro parameters where it's not required by any alignment purpose. Replace it with a single space, ", ". Signed-off-by: Nitin Gote <nitin.r.gote@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240823080643.2461992-1-nitin.r.gote@intel.com Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>	2024-09-05 18:20:00 +02:00
Michal Wajdeczko	13a48a0fa5	drm/xe/pf: Sanitize VF scratch registers on FLR Some VF accessible registers (like GuC scratch registers) must be explicitly reset during the FLR. While this is today done by the GuC firmware, according to the design, this should be responsibility of the PF driver, as future platforms may require more registers to be reset. Likewise GuC, the PF can access VFs registers by adding some platform specific offset to the original register address. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240902192953.1792-1-michal.wajdeczko@intel.com	2024-09-05 18:09:24 +02:00
Thomas Hellström	34bb7b813a	drm/xe: Use xe_pm_runtime_get in xe_bo_move() if reclaim-safe. xe_bo_move() might be called in the TTM swapout path from validation by another TTM device. If so, we are not likely to have a RPM reference. So iff xe_pm_runtime_get() is safe to call from reclaim, use it instead of xe_pm_runtime_get_noresume(). Strictly this is currently needed only if handle_system_ccs is true, but use xe_pm_runtime_get() if possible anyway to increase test coverage. At the same time warn if handle_system_ccs is true and we can't call xe_pm_runtime_get() from reclaim context. This will likely trip if someone tries to enable SRIOV on LNL, without fixing Xe SRIOV runtime resume / suspend. Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240903094232.166342-1-thomas.hellstrom@linux.intel.com	2024-09-04 09:28:09 +02:00
Rodrigo Vivi	8da19441d0	drm/xe/display: Avoid encoder_suspend at runtime suspend Fix circular locking dependency on runtime suspend. <4> [74.952215] ====================================================== <4> [74.952217] WARNING: possible circular locking dependency detected <4> [74.952219] 6.10.0-rc7-xe #1 Not tainted <4> [74.952221] ------------------------------------------------------ <4> [74.952223] kworker/7:1/82 is trying to acquire lock: <4> [74.952226] ffff888120548488 (&dev->mode_config.mutex){+.+.}-{3:3}, at: drm_modeset_lock_all+0x40/0x1e0 [drm] <4> [74.952260] but task is already holding lock: <4> [74.952262] ffffffffa0ae59c0 (xe_pm_runtime_lockdep_map){+.+.}-{0:0}, at: xe_pm_runtime_suspend+0x2f/0x340 [xe] <4> [74.952322] which lock already depends on the new lock. The commit 'b1d90a86 ("drm/xe: Use the encoder suspend helper also used by the i915 driver")' didn't do anything wrong. It actually fixed a critical bug, because the encoder_suspend was never getting actually called because it was returning if (has_display(xe)) instead of if (!has_display(xe)). However, this ended up introducing the encoder suspend calls in the runtime routines as well, causing the circular locking dependency. Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2304 Fixes: `b1d90a862c` ("drm/xe: Use the encoder suspend helper also used by the i915 driver") Cc: Imre Deak <imre.deak@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240830183507.298351-2-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-09-03 12:47:00 -04:00
Rodrigo Vivi	bc947d9a8c	drm/xe: Add missing runtime reference to wedged upon gt_reset Fixes this missed case: xe 0000:00:02.0: [drm] Missing outer runtime PM protection WARNING: CPU: 99 PID: 1455 at drivers/gpu/drm/xe/xe_pm.c:564 xe_pm_runtime_get_noresume+0x48/0x60 [xe] Call Trace: <TASK> ? show_regs+0x67/0x70 ? __warn+0x94/0x1b0 ? xe_pm_runtime_get_noresume+0x48/0x60 [xe] ? report_bug+0x1b7/0x1d0 ? handle_bug+0x46/0x80 ? exc_invalid_op+0x19/0x70 ? asm_exc_invalid_op+0x1b/0x20 ? xe_pm_runtime_get_noresume+0x48/0x60 [xe] xe_device_declare_wedged+0x91/0x280 [xe] gt_reset_worker+0xa2/0x250 [xe] v2: Also move get and get the right Fixes tag (Himal, Brost) Fixes: `fb74b205cd` ("drm/xe: Introduce a simple wedged state") Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240830183507.298351-1-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-09-03 12:46:39 -04:00
Michal Wajdeczko	9f6b47907e	drm/xe: Remove redundant [drm] tag from xe_assert() message Since commit `178c0a33c4` ("drm/print: Add generic drm dev printk function") the output from drm_WARN() includes previously missing the [drm] tag, so now xe_assert() is printing it twice: [ ] xe 0000:00:02.0: [drm] [drm] Assertion `false` failed! Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240902190726.1748-1-michal.wajdeczko@intel.com	2024-09-03 11:42:42 +02:00
Michal Wajdeczko	da6ec74339	drm/xe/pf: Reset thresholds when releasing a VF config As part of the VF config release, we should reset all parameters, including thresholds, to always start with the clean VF config. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240830132100.1704-3-michal.wajdeczko@intel.com	2024-09-02 20:51:03 +02:00
Michal Wajdeczko	a1498ab229	drm/xe/pf: Add thresholds to the VF KLV config We are pushing threshold KLV to the GuC immediately during the threshold provisioning, but those configs will be lost during a GT reset. Include threshold KLVs while encoding full VF config buffer to make sure the GuC receives all of the config KLVs. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240830132100.1704-2-michal.wajdeczko@intel.com	2024-09-02 20:51:02 +02:00
Matthew Brost	bf758226c7	drm/xe: Invalidate media_gt TLBs in PT code Testing on LNL has shown media GT's TLBs need to be invalidated via the GuC, update PT code appropriately. v2: - Do dma_fence_get before first call of invalidation_fence_init (Himal) - No need to check for valid chain fence (Himal) v3: - Use dma-fence-array Fixes: `3330361543` ("drm/xe/lnl: Add LNL platform definition") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240826170144.2492062-3-matthew.brost@intel.com	2024-08-30 11:41:26 -07:00
Matthew Brost	ddc94d0b17	dma-buf: Split out dma fence array create into alloc and arm functions Useful to preallocate dma fence array and then arm in path of reclaim or a dma fence. v2: - s/arm/init (Christian) - Drop !array warn (Christian) v3: - Fix kernel doc typos (dim) Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240826170144.2492062-2-matthew.brost@intel.com	2024-08-30 11:41:05 -07:00
Matt Roper	20f61c1ead	drm/xe/hwmon: Treat hwmon as a per-device concept There's only one instance of hwmon per device, and MMIO access to it is always done through the root tile. The code has been passing around a pointer to the root tile's primary GT, which is confusing since this isn't really a GT-level concept. Replace that pointer with an xe_device pointer and use xe_root_mmio_gt(xe) to get a pointer when we need to do register MMIO. This makes things easier to follow, and also cleans up the code in preparation for a much larger MMIO register access overhaul that's coming soon. Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240829220619.789159-6-matthew.d.roper@intel.com	2024-08-30 08:56:23 -07:00
Matt Roper	3034cc8107	drm/xe/pcode: Treat pcode as per-tile rather than per-GT There's only one instance of the pcode per tile, and for GT-related accesses both the primary and media GT share the same register interface. Since Xe was using per-GT locking, the pcode mutex wasn't actually protecting everything that it should since concurrent accesses related to a tile's primary GT and media GT were possible. Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240829220619.789159-5-matthew.d.roper@intel.com	2024-08-30 08:56:20 -07:00
Matt Roper	cad08fa776	drm/xe/display: Drop unnecessary xe_gt.h includes None of the Xe display files work directly with the GT or need anything from xe_gt.h. Drop the unnecessary include. Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240829230307.886233-2-matthew.d.roper@intel.com	2024-08-30 08:55:55 -07:00
Nirmoy Das	c5f728de69	drm/xe: Fix memory leak on xe_alloc_pf_queue failure Simplify memory unwinding on error also fixing current memory leak that can happen on error. v2: use devm_kcalloc(Matt A) Fixes: `3338e4f90c` ("drm/xe: Use topology to determine page fault queue size") Cc: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Stuart Summers <stuart.summers@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240826162035.20462-1-nirmoy.das@intel.com Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>	2024-08-30 11:38:53 +02:00
Michal Wajdeczko	2bd87f0fc2	drm/xe/pf: Improve VF control Our initial VF control implementation was focused on providing a very minimal support for the VF_STATE_NOTIFY events just to meet GuC requirements, without tracking a VF state or doing any expected actions (like cleanup in case of the FLR notification). Try to improve this by defining set of VF state machines, each responsible for processing one activity (PAUSE, RESUME, STOP or FLR). All required steps defined by the VF state machine are then executed by the PF worker from the dedicated workqueue. Any external requests or notifications simply try to transition between the states to trigger a work and then wait for that work to finish. Some predefined default timeouts are used to avoid changing existing API calls, but it should be easy to extend the control API to also accept specific timeout values. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Piotr Piórkowski <piotr.piorkowski@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240828210809.1528-5-michal.wajdeczko@intel.com	2024-08-30 10:51:09 +02:00
Michal Wajdeczko	d69300abc2	drm/xe/pf: Drop GuC notifications for non-existing VF It is unlikely that GuC will ever send a G2H notification with an invalid VFID and it is currently harmless if that actually happen. But in upcoming patches we will start using that VFID as an index and we must be sure it is a valid to avoid a crash due to a buggy firmware or a currupted G2H message. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240828210809.1528-4-michal.wajdeczko@intel.com	2024-08-30 10:51:08 +02:00
Michal Wajdeczko	65fe9617a1	drm/xe/pf: Fix documentation formatting Current formatting of "The VF FLR Flow with GuC" only looks fine, but it will not render properly when included in htmldocs due to: WARNING: Block quote ends without a blank line; unexpected unindent. CRITICAL: Missing matching underline for section title overline. Fix that by adding proper indent and using list markup. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240828210809.1528-3-michal.wajdeczko@intel.com	2024-08-30 10:51:07 +02:00
Michal Wajdeczko	16ba2b28df	drm/xe/pf: Add function to sanitize VF resources On current platforms it is a PF driver responsibility to clear some of the VF's resources during a VF FLR. Add simple function that will clear configured VF resources (GGTT, LMEM). We will start using this function soon. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240828210809.1528-2-michal.wajdeczko@intel.com	2024-08-30 10:51:06 +02:00
Daniele Ceraolo Spurio	02a416afbe	drm/xe/gsc: Wedge the device if the GSCCS reset fails Due to the special handling of the GSCCS in HW, we can't escalate to GT reset when we receive the reset failure interrupt; the specs indicate that we should trigger an FLR instead, but we do not have support for that at the moment, so the HW will stay permanently in a broken state. We should therefore mark the device as wedged, the same as if the GT reset had failed. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240828221457.2752868-1-daniele.ceraolospurio@intel.com	2024-08-29 14:18:52 -07:00
Daniele Ceraolo Spurio	5ee2d63ca1	drm/xe/gsc: Add debugfs to print GSC info This is useful for debug, in case something goes wrong with the GSC. The info includes the version information and the current value of the HECI1 status registers. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240828215158.2743994-5-daniele.ceraolospurio@intel.com	2024-08-29 10:32:20 -07:00
Daniele Ceraolo Spurio	f7c2ea682d	drm/xe/gsc: Track the platform in the compatibility version The GSC compatibility version number is reset for each new platform. To indicate this, the version includes a number that identifies the platform (102 = MTL, 104 = LNL); this matches what happens for the release version, where the major number also identifies a platform. To make it clearer in our logs that the compatibility version is specific to the platform, it is useful to include this platform number. However, given that our binary names already include the platform, it is not necessary to add this extra number there. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240828215158.2743994-4-daniele.ceraolospurio@intel.com	2024-08-29 10:32:19 -07:00
Daniele Ceraolo Spurio	7293859c51	drm/xe/gsc: Fix FW status if the firmware is already loaded We set the FW status to "TRANSFERRED" after the load completes and to "RUNNING"once we're done with proxy init, so do the same if we're trying to re-load the FW and it is already loaded. Note that there is no difference in driver behavior between the 2 states, but it's useful to be accurate when we dump the status for debug. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240828215158.2743994-3-daniele.ceraolospurio@intel.com	2024-08-29 10:32:18 -07:00
Daniele Ceraolo Spurio	2160f6f6e3	drm/xe/gsc: Do not attempt to load the GSC multiple times The GSC HW is only reset by driver FLR or D3cold entry. We don't support the former at runtime, while the latter is only supported on DGFX, for which we don't support GSC. Therefore, if GSC failed to load previously there is no need to try again because the HW is stuck in the error state. An assert has been added so that if we ever add DGFX support we'll know we need to handle the D3 case. v2: use "< 0" instead of "!= 0" in the FW state error check (Julia). Fixes: `dd0e89e5ed` ("drm/xe/gsc: GSC FW load") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240828215158.2743994-2-daniele.ceraolospurio@intel.com	2024-08-29 10:32:17 -07:00
Jani Nikula	87d8ecf015	drm/xe: replace #include <drm/xe_drm.h> with <uapi/drm/xe_drm.h> include/drm/xe_drm.h does not exist. Prefer the explicit uapi include. Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240827091539.4136838-1-jani.nikula@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-08-28 15:17:54 -04:00
Karthik Poosa	a7f657097e	drm/xe/hwmon: Fix WRITE_I1 param from u32 to u16 WRITE_I1 sub-command of the POWER_SETUP pcode command accepts a u16 parameter instead of u32. This change prevents potential illegal sub-command errors. v2: Mask uval instead of changing the prototype. (Badal) v3: Rephrase commit message. (Badal) Signed-off-by: Karthik Poosa <karthik.poosa@intel.com> Fixes: `92d44a422d` ("drm/xe/hwmon: Expose card reactive critical power") Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240827155301.183383-1-karthik.poosa@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-08-28 14:59:26 -04:00
Ilia Levi	aeb4ae66cb	drm/xe: move the kernel lrc from hwe to execlist port The kernel lrc is used solely by the execlist infra. Move it to the execlist port struct and initialize it only when execlists are used. v2: Rebase, improve error handling readability (Jonathan) Signed-off-by: Ilia Levi <ilia.levi@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240826100655.1719060-1-ilia.levi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-08-28 14:50:13 -04:00
Balasubramani Vivekanandan	3adcf970dc	drm/xe/bmg: Drop force_probe requirement Battlemage platform is sufficiently tested and found stable. CI is also pretty stable. Remove the force_probe requirement to enable the platform support by default. Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240828082152.3194814-1-balasubramani.vivekanandan@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-08-28 10:47:03 -07:00
Himal Prasad Ghimiray	c72084163c	drm/xe: Fix NPD in ggtt_node_remove() Make sure that ggtt_node_remove() is invoked only if both node and ggtt are not null. Move the null checks to the caller function xe_ggtt_node_remove(). v2: Move null check below declarations (Tejas) Fixes: `919bb54e98` ("drm/xe: Fix missing runtime outer protection for ggtt_remove_node") Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Tejas Upadhyay <tejas.upadhyay@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240828092229.3606503-1-himal.prasad.ghimiray@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-08-28 13:15:40 -04:00
Thomas Hellström	379cad69bd	drm/xe: Use separate rpm lockdep map for non-d3cold-capable devices For non-d3cold-capable devices we'd like to be able to wake up the device from reclaim. In particular, for Lunar Lake we'd like to be able to blit CCS metadata to system at shrink time; at least from kswapd where it's reasonable OK to wait for rpm resume and a preceding rpm suspend. Therefore use a separate lockdep map for such devices and prime it reclaim-tainted. v2: - Rename lockmap acquire- and release functions. (Rodrigo Vivi). - Reinstate the old xe_pm_runtime_lockdep_prime() function and rename it to xe_rpm_might_enter_cb(). (Matthew Auld). - Introduce a separate xe_pm_runtime_lockdep_prime function called from module init for known required locking orders. v3: - Actually hook up the prime function at module init. v4: - Rebase. v5: - Don't use reclaim-safe RPM with sriov. Cc: "Vivi, Rodrigo" <rodrigo.vivi@intel.com> Cc: "Auld, Matthew" <matthew.auld@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240826143450.92511-1-thomas.hellstrom@linux.intel.com	2024-08-28 16:29:15 +02:00

1 2 3 4 5 ...

1295319 Commits