linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-04-02 10:31:26 -04:00

Author	SHA1	Message	Date
Matthew Auld	225bc03d85	drm/xe/evict: drop bogus assert This assert can trigger here with non pin_map users that select LATE_RESTORE, since the vmap is allowed to be NULL given that save/restore can now use the blitter instead. The check here doesn't seem to have much value anymore given that we no longer move pinned memory, so any existing vmap is left well alone, and doesn't need to be recreated upon restore, so just drop the assert here. Fixes: `86f69c2611` ("drm/xe: use backup object for pinned save/restore") Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6213 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://lore.kernel.org/r/20251010152457.177884-2-matthew.auld@intel.com (cherry picked from commit `a10b4a69c7`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-15 22:48:54 -07:00
Matthew Auld	6a91af25cd	drm/xe/migrate: don't misalign current bytes If current bytes exceeds the max copy size, ensure the clamped size still accounts for the XE_CACHELINE_BYTES alignment, otherwise we trigger the assert in xe_migrate_vram with the size now being out of alignment. Fixes: `8c2d61e0e9` ("drm/xe/migrate: don't overflow max copy size") Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6212 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Stuart Summers <stuart.summers@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20251010162020.190962-2-matthew.auld@intel.com (cherry picked from commit `641bcf8731`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-15 22:48:48 -07:00
Matt Roper	6d36f65ba5	drm/xe/kunit: Fix kerneldoc for parameterized tests Kunit's generate_params() was recently updated to take an additional test context parameter. Xe's IP and platform parameter generators were updated accordingly at the same time, but the new parameter was not added to the functions' kerneldoc, resulting in the following warnings: Warning: drivers/gpu/drm/xe/tests/xe_pci.c:78 function parameter 'test' not described in 'xe_pci_fake_data_gen_params' Warning: drivers/gpu/drm/xe/tests/xe_pci.c:254 function parameter 'test' not described in 'xe_pci_graphics_ip_gen_param' Warning: drivers/gpu/drm/xe/tests/xe_pci.c:278 function parameter 'test' not described in 'xe_pci_media_ip_gen_param' Warning: drivers/gpu/drm/xe/tests/xe_pci.c:302 function parameter 'test' not described in 'xe_pci_id_gen_param' Warning: drivers/gpu/drm/xe/tests/xe_pci.c:390 function parameter 'test' not described in 'xe_pci_live_device_gen_param' 5 warnings as errors Document the new parameter to eliminate the warnings and make CI happy. Fixes: `b9a214b5f6` ("kunit: Pass parameterized test context to generate_params()") Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://lore.kernel.org/r/20251013153014.2362879-2-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit `89e347f8a7`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-15 22:48:43 -07:00
Thomas Hellström	7987b93e3a	drm/xe/svm: Ensure data will be migrated to system if indicated by madvise. If the location madvise() is set to DRM_XE_PREFERRED_LOC_DEFAULT_SYSTEM, the drm_pagemap in the SVM gpu fault handler will be set to NULL. However there is nothing that explicitly migrates the data to system if it is already present in device memory. In that case, set the device memory owner to NULL to ensure data gets properly migrated to system on page-fault. v2: - Remove redundant dpagemap assignment (Himal Prasad Ghimiray) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> #v1 Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://lore.kernel.org/r/20251010104149.72783-2-thomas.hellstrom@linux.intel.com Fixes: `10aa5c8060` ("drm/gpusvm, drm/xe: Fix userptr to not allow device private pages") (cherry picked from commit `2cfcea7a74`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-15 22:48:37 -07:00
Shuicheng Lin	9f64b3cd05	drm/xe/guc: Check GuC running state before deregistering exec queue In normal operation, a registered exec queue is disabled and deregistered through the GuC, and freed only after the GuC confirms completion. However, if the driver is forced to unbind while the exec queue is still running, the user may call exec_destroy() after the GuC has already been stopped and CT communication disabled. In this case, the driver cannot receive a response from the GuC, preventing proper cleanup of exec queue resources. Fix this by directly releasing the resources when GuC is not running. Here is the failure dmesg log: " [ 468.089581] ---[ end trace 0000000000000000 ]--- [ 468.089608] pci 0000:03:00.0: [drm] ERROR GT0: GUC ID manager unclean (1/65535) [ 468.090558] pci 0000:03:00.0: [drm] GT0: total 65535 [ 468.090562] pci 0000:03:00.0: [drm] GT0: used 1 [ 468.090564] pci 0000:03:00.0: [drm] GT0: range 1..1 (1) [ 468.092716] ------------[ cut here ]------------ [ 468.092719] WARNING: CPU: 14 PID: 4775 at drivers/gpu/drm/xe/xe_ttm_vram_mgr.c:298 ttm_vram_mgr_fini+0xf8/0x130 [xe] " v2: use xe_uc_fw_is_running() instead of xe_guc_ct_enabled(). As CT may go down and come back during VF migration. Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: stable@vger.kernel.org Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20251010172529.2967639-2-shuicheng.lin@intel.com (cherry picked from commit `9b42321a02`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-13 13:03:26 -07:00
Vinay Belgaumkar	1852d27aa9	drm/xe: Enable media sampler power gating Where applicable, enable media sampler power gating. Also, add it to the powergate_info debugfs. v2: Remove the sampler powergate status since it is cleared quickly anyway. v3: Use vcs mask (Rodrigo) and fix the version check for media v4: Remove extra spaces v5: Media samplers are independent of vcs mask, use Media version 1255 (Matt Roper) Fixes: `38e8c4184e` ("drm/xe: Enable Coarse Power Gating") Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://lore.kernel.org/r/20251010011047.2047584-1-vinay.belgaumkar@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit `4cbc08649a`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-13 13:03:20 -07:00
Matthew Brost	7413e9f2be	drm/xe: Handle mixed mappings and existing VRAM on atomic faults Moving to VRAM will fail if mixed mappings are present or if the page is already located in VRAM. Atomic faults that require a move to VRAM currently retry without attempting to evict mixed mappings or locate existing VRAM mappings. This patch fixes the issue by attempting to evict mixed mappings or find existing VRAM pages when a move to VRAM fails during atomic fault handling. Fixes: `a9ac0fa455` ("drm/xe: Strict migration policy for atomic SVM faults") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://lore.kernel.org/r/20251009130629.3531962-1-matthew.brost@intel.com (cherry picked from commit `75188605c5`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-13 13:03:15 -07:00
Thomas Hellström	1117e7d1e8	drm/xe/migrate: Fix an error path The exhaustive eviction accidently changed an error path goto to a return. Fix this. Fixes: `59eabff2a3` ("drm/xe: Convert xe_bo_create_pin_map() for exhaustive eviction") Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Francois Dugast <francois.dugast@intel.com> Link: https://lore.kernel.org/r/20250910160939.103473-1-thomas.hellstrom@linux.intel.com (cherry picked from commit `381f1ed151`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-13 13:03:08 -07:00
Lucas De Marchi	d30203739b	drm/xe: Move rebar to be done earlier There may be cases in which the BAR0 also needs to move to accommodate the bigger BAR2. However if it's not released, the BAR2 resize fails. During the vram probe it can't be released as it's already in use by xe_mmio for early register access. Add a new function in xe_vram and let xe_pci call it directly before even early device probe. This allows the BAR2 to resize in cases BAR0 also needs to move, assuming there aren't other reasons to hold that move: [] xe 0000:03:00.0: vgaarb: deactivate vga console [] xe 0000:03:00.0: [drm] Attempting to resize bar from 8192MiB -> 16384MiB [] xe 0000:03:00.0: BAR 0 [mem 0x83000000-0x83ffffff 64bit]: releasing [] xe 0000:03:00.0: BAR 2 [mem 0x4000000000-0x41ffffffff 64bit pref]: releasing [] pcieport 0000:02:01.0: bridge window [mem 0x4000000000-0x41ffffffff 64bit pref]: releasing [] pcieport 0000:01:00.0: bridge window [mem 0x4000000000-0x41ffffffff 64bit pref]: releasing [] pcieport 0000:01:00.0: bridge window [mem 0x4000000000-0x43ffffffff 64bit pref]: assigned [] pcieport 0000:02:01.0: bridge window [mem 0x4000000000-0x43ffffffff 64bit pref]: assigned [] xe 0000:03:00.0: BAR 2 [mem 0x4000000000-0x43ffffffff 64bit pref]: assigned [] xe 0000:03:00.0: BAR 0 [mem 0x83000000-0x83ffffff 64bit]: assigned [] pcieport 0000:00:01.0: PCI bridge to [bus 01-04] [] pcieport 0000:00:01.0: bridge window [mem 0x83000000-0x840fffff] [] pcieport 0000:00:01.0: bridge window [mem 0x4000000000-0x44007fffff 64bit pref] [] pcieport 0000:01:00.0: PCI bridge to [bus 02-04] [] pcieport 0000:01:00.0: bridge window [mem 0x83000000-0x840fffff] [] pcieport 0000:01:00.0: bridge window [mem 0x4000000000-0x43ffffffff 64bit pref] [] pcieport 0000:02:01.0: PCI bridge to [bus 03] [] pcieport 0000:02:01.0: bridge window [mem 0x83000000-0x83ffffff] [] pcieport 0000:02:01.0: bridge window [mem 0x4000000000-0x43ffffffff 64bit pref] [] xe 0000:03:00.0: [drm] BAR2 resized to 16384M [] xe 0000:03:00.0: [drm:xe_pci_probe [xe]] BATTLEMAGE e221:0000 dgfx:1 gfx:Xe2_HPG (20.02) ... For BMG there are additional fix needed in the PCI side, but this helps getting it to a working resize. All the rebar logic is more pci-specific than xe-specific and can be done very early in the probe sequence. In future it would be good to move it out of xe_vram.c, but this refactor is left for later. Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Cc: stable@vger.kernel.org # 6.12+ Link: https://lore.kernel.org/intel-xe/fafda2a3-fc63-ce97-d22b-803f771a4d19@linux.intel.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20250918-xe-pci-rebar-2-v1-2-6c094702a074@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit `45e33f220f`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-13 13:03:03 -07:00
Matthew Brost	7ac74613e5	drm/xe: Don't allow evicting of BOs in same VM in array of VM binds An array of VM binds can potentially evict other buffer objects (BOs) within the same VM under certain conditions, which may lead to NULL pointer dereferences later in the bind pipeline. To prevent this, clear the allow_res_evict flag in the xe_bo_validate call. v2: - Invert polarity of no_res_evict (Thomas) - Add comment in code explaining issue (Thomas) Cc: stable@vger.kernel.org Reported-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6268 Fixes: `774b5fa509` ("drm/xe: Avoid evicting object of the same vm in none fault mode") Fixes: `77f2ef3f16` ("drm/xe: Lock all gpuva ops during VM bind IOCTL") Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://lore.kernel.org/r/20251009110618.3481870-1-matthew.brost@intel.com (cherry picked from commit `8b9ba8d6d9`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-13 13:02:58 -07:00
Kenneth Graunke	e5ae8d1eb0	drm/xe: Increase global invalidation timeout to 1000us The previous timeout of 500us seems to be too small; panning the map in the Roll20 VTT in Firefox on a KDE/Wayland desktop reliably triggered timeouts within a few seconds of usage, causing the monitor to freeze and the following to be printed to dmesg: [Jul30 13:44] xe 0000:03:00.0: [drm] ERROR GT0: Global invalidation timeout [Jul30 13:48] xe 0000:03:00.0: [drm] ERROR [CRTC:82:pipe A] flip_done timed out I haven't hit a single timeout since increasing it to 1000us even after several multi-hour testing sessions. Fixes: `0dd2dd0182` ("drm/xe: Move DSB l2 flush to a more sensible place") Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5710 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Cc: stable@vger.kernel.org Cc: Maarten Lankhorst <dev@lankhorst.se> Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://lore.kernel.org/r/20250912223254.147940-1-kenneth@whitecape.org Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit `146046907b`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-13 13:02:50 -07:00
Linus Torvalds	284fc30e66	Merge tag 'drm-next-2025-10-11-1' of https://gitlab.freedesktop.org/drm/kernel Pull more drm fixes from Dave Airlie: "Just the follow up fixes for rc1 from the next branch, amdgpu and xe mostly with a single v3d fix in there. amdgpu: - DC DCE6 fixes - GPU reset fixes - Secure diplay messaging cleanup - MES fix - GPUVM locking fixes - PMFW messaging cleanup - PCI US/DS switch handling fix - VCN queue reset fix - DC FPU handling fix - DCN 3.5 fix - DC mirroring fix amdkfd: - Fix kfd process ref leak - mmap write lock handling fix - Fix comments in IOCTL xe: - Fix build with clang 16 - Fix handling of invalid configfs syntax usage and spell out the expected syntax in the documentation - Do not try late bind firmware when running as VF since it shouldn't handle firmware loading - Fix idle assertion for local BOs - Fix uninitialized variable for late binding - Do not require perfmon_capable to expose free memory at page granularity. Handle it like other drm drivers do - Fix lock handling on suspend error path - Fix I2C controller resume after S3 v3d: - fix fence locking" * tag 'drm-next-2025-10-11-1' of https://gitlab.freedesktop.org/drm/kernel: (34 commits) drm/amd/display: Incorrect Mirror Cositing drm/amd/display: Enable Dynamic DTBCLK Switch drm/amdgpu: Report individual reset error drm/amdgpu: partially revert "revert to old status lock handling v3" drm/amd/display: Fix unsafe uses of kernel mode FPU drm/amd/pm: Disable VCN queue reset on SMU v13.0.6 due to regression drm/amdgpu: Fix general protection fault in amdgpu_vm_bo_reset_state_machine drm/amdgpu: Check swus/ds for switch state save drm/amdkfd: Fix two comments in kfd_ioctl.h drm/amd/pm: Avoid interface mismatch messaging drm/amdgpu: Merge amdgpu_vm_set_pasid into amdgpu_vm_init drm/amd/amdgpu: Fix the mes version that support inv_tlbs drm/amd: Check whether secure display TA loaded successfully drm/amdkfd: Fix mmap write lock not release drm/amdkfd: Fix kfd process ref leaking when userptr unmapping drm/amdgpu: Fix for GPU reset being blocked by KIQ I/O. drm/amd/display: Disable scaling on DCE6 for now drm/amd/display: Properly disable scaling on DCE6 drm/amd/display: Properly clear SCL__FILTER_CONTROL on DCE6 drm/amd/display: Add missing DCE6 SCL_HORZ_FILTER_INIT SRIs ...	2025-10-10 14:02:14 -07:00
Raag Jadav	1af59cd5cc	drm/xe/i2c: Don't rely on d3cold.allowed flag in system PM path In S3 and above sleep states, the device can loose power regardless of d3cold.allowed flag. Bring up I2C controller explicitly in system PM path to ensure its normal operation after losing power. v2: Cover S3 and above states (Rodrigo) Fixes: `0ea07b6951` ("drm/xe/pm: Wire up suspend/resume for I2C controller") Signed-off-by: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250918103200.2952576-1-raag.jadav@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit `e4863f1159`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-03 14:17:08 -05:00
Shuicheng Lin	08fdfd260e	drm/xe/hw_engine_group: Fix double write lock release in error path In xe_hw_engine_group_get_mode(), a write lock is acquired before calling switch_mode(), which in turn invokes xe_hw_engine_group_suspend_faulting_lr_jobs(). On failure inside xe_hw_engine_group_suspend_faulting_lr_jobs(), the write lock is released there, and then again in xe_hw_engine_group_get_mode(), leading to a double release. Fix this by keeping both acquire and release operation in xe_hw_engine_group_get_mode(). Fixes: `770bd1d341` ("drm/xe/hw_engine_group: Ensure safe transition between execution modes") Cc: Francois Dugast <francois.dugast@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Francois Dugast <francois.dugast@intel.com> Link: https://lore.kernel.org/r/20250925023145.1203004-2-shuicheng.lin@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit `662d98b8b3`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-03 14:17:00 -05:00
Matthew Auld	2d1684a077	drm/xe/uapi: loosen used tracking restriction Currently this is hidden behind perfmon_capable() since this is technically an info leak, given that this is a system wide metric. However the granularity reported here is always PAGE_SIZE aligned, which matches what the core kernel is already willing to expose to userspace if querying how many free RAM pages there are on the system, and that doesn't need any special privileges. In addition other drm drivers seem happy to expose this. The motivation here if with oneAPI where they want to use the system wide 'used' reporting here, so not the per-client fdinfo stats. This has also come up with some perf overlay applications wanting this information. Fixes: `1105ac15d2` ("drm/xe/uapi: restrict system wide accounting") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Joshua Santosh <joshua.santosh.ranjan@intel.com> Cc: José Roberto de Souza <jose.souza@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250919122052.420979-2-matthew.auld@intel.com (cherry picked from commit `4d0b035fd6`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-03 14:16:55 -05:00
Mallesh Koujalagi	6982a462cb	drm/xe/xe_late_bind_fw: Initialize uval variable in xe_late_bind_fw_num_fans() Initialize the uval variable to 0 in xe_late_bind_fw_num_fans() to fix a potential use of uninitialized variable warning and ensure predictable behavior. The variable is passed by reference to xe_pcode_read() which should populate it on success, but initializing it to 0 provides a safe default value and follows kernel coding best practices. v2: - uval = 0 which serves as both a safe default and the fallback value when the pcode read operation fails. v3: - Handle MMIO failure (Rodrigo) - The function should probably return the error and make the uval as pointer-argument, like the pcode_read. - Change the caller of this function to propagate the error upwards if mmio failed. Fixes: `45832bf9c1` ("drm/xe/xe_late_bind_fw: Initialize late binding firmware") Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com> Link: https://lore.kernel.org/r/20251002005648.3185636-1-mallesh.koujalagi@intel.com Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit `07abc16c14`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-02 21:57:52 -07:00
Thomas Hellström	10aa5c8060	drm/gpusvm, drm/xe: Fix userptr to not allow device private pages When userptr is used on SVM-enabled VMs, a non-NULL hmm_range::dev_private_owner value might mean that hmm_range_fault() attempts to return device private pages. Either that will fail, or the userptr code will not know how to handle those. Use NULL for hmm_range::dev_private_owner to migrate such pages to system. In order to do that, move the struct drm_gpusvm::device_private_page_owner field to struct drm_gpusvm_ctx::device_private_page_owner so that it doesn't remain immutable over the drm_gpusvm lifetime. v2: - Don't conditionally compile xe_svm_devm_owner(). - Kerneldoc xe_svm_devm_owner(). Fixes: `9e97874148` ("drm/xe/userptr: replace xe_hmm with gpusvm") Cc: Matthew Auld <matthew.auld@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://lore.kernel.org/r/20250930122752.96034-1-thomas.hellstrom@linux.intel.com (cherry picked from commit `ad298d9ec9`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-02 21:57:52 -07:00
Colin Ian King	5b440a8bba	drm/xe/xe_late_bind_fw: Fix missing initialization of variable offset The variable offset is not being initialized, and it is only set inside a for-loop if entry->name is the same as manifest_entry. In the case where it is not initialized a non-zero check on offset is potentialy checking a bogus uninitalized value. Fix this by initializing offset to zero. Fixes: `efa29317a5` ("drm/xe/xe_late_bind_fw: Extract and print version info") Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250924102208.9216-1-colin.i.king@gmail.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit `20f3b28e2e`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-02 21:57:52 -07:00
Thomas Hellström	17f6f6f25a	drm/xe/bo: Fix an idle assertion for local bos Before calling ttm_bo_populate() in the CPU fault path of a bo, we assert that the bo is not being migrated. However, for local bos we share the reservation object with other local bos that might be in the process of being migrated. Also some VM operations may attach USAGE_KERNEL fences to the common reservation object and trigger false positives from the assert. So remove the assert and instead wait for bo idle. This may unnecessarily wait for idle in some cases but since we're doing this wait later in the fault path anyway we might as well do it here as well. This fixes warnings like: Sep 25 14:56:23 desky kernel: ------------[ cut here ]------------ Sep 25 14:56:23 desky kernel: xe 0000:03:00.0: [drm] Assertion `dma_resv_test_signaled(tbo->base.resv, DMA_RESV_USAGE_KERNEL) \|\| (tbo->ttm && ttm_tt_is_populated(tbo->ttm))` failed! platform: BATTLEMAGE subplatform: 1 graphics: Xe2_HPG 20.01 step A0 media: Xe2_HPM 13.01 step A1 Sep 25 14:56:23 desky kernel: WARNING: CPU: 6 PID: 24767 at drivers/gpu/drm/xe/xe_bo.c:1748 xe_bo_fault_migrate+0x1bb/0x300 [xe] Sep 25 14:56:23 desky kernel: Modules linked in: cpuid dm_crypt xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bridge stp llc xfrm_user xfr> Sep 25 14:56:23 desky kernel: snd_soc_sdca snd_seq_midi prime_numbers coretemp snd_seq_midi_event drm_ttm_helper snd_hda_codec drm_buddy drm_exec snd_rawmidi snd_soc_core snd_hda_cor> Sep 25 14:56:23 desky kernel: CPU: 6 UID: 1000 PID: 24767 Comm: steamwebhelper Tainted: G U W 6.17.0-rc7+ #32 PREEMPT(voluntary) Sep 25 14:56:23 desky kernel: Tainted: [U]=USER, [W]=WARN Sep 25 14:56:23 desky kernel: Hardware name: Micro-Star International Co., Ltd. MS-7D36/PRO Z690-P DDR4 (MS-7D36), BIOS A.A1 10/18/2022 Sep 25 14:56:23 desky kernel: RIP: 0010:xe_bo_fault_migrate+0x1bb/0x300 [xe] Sep 25 14:56:23 desky kernel: Code: fa 64 29 f9 48 c7 c7 40 e0 d3 c1 51 48 c7 c1 c0 e3 d3 c1 52 4c 8b 45 c0 41 50 44 8b 4d c8 4d 89 e0 48 8b 55 a8 e8 25 27 95 ef <0f> 0b 48 83 c4 40 4> Sep 25 14:56:23 desky kernel: RSP: 0000:ffffae1ca88c7b10 EFLAGS: 00010286 Sep 25 14:56:23 desky kernel: RAX: 0000000000000000 RBX: ffff8d7cfd7e6800 RCX: 0000000000000027 Sep 25 14:56:23 desky kernel: RDX: ffff8d845019cec8 RSI: 0000000000000001 RDI: ffff8d845019cec0 Sep 25 14:56:23 desky kernel: RBP: ffffae1ca88c7bc8 R08: 0000000000000000 R09: 0000000000000000 Sep 25 14:56:23 desky kernel: R10: 0000000000000000 R11: 0000000000000004 R12: ffffffffc1db1faa Sep 25 14:56:23 desky kernel: R13: ffffffffc1db2ab4 R14: 0000000000000001 R15: ffffae1ca88c7bd8 Sep 25 14:56:23 desky kernel: FS: 00007fb1baf31940(0000) GS:ffff8d849c870000(0000) knlGS:0000000000000000 Sep 25 14:56:23 desky kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Sep 25 14:56:23 desky kernel: CR2: 00007fb1b2860020 CR3: 00000001705a9004 CR4: 0000000000772ef0 Sep 25 14:56:23 desky kernel: PKRU: 55555558 Sep 25 14:56:23 desky kernel: Call Trace: Sep 25 14:56:23 desky kernel: <TASK> Sep 25 14:56:23 desky kernel: xe_bo_cpu_fault_fastpath+0x11e/0x220 [xe] Sep 25 14:56:23 desky kernel: xe_bo_cpu_fault+0x84/0x410 [xe] Sep 25 14:56:23 desky kernel: ? __x64_sys_mmap+0x33/0x50 Sep 25 14:56:23 desky kernel: ? x64_sys_call+0x1b2e/0x20d0 Sep 25 14:56:23 desky kernel: ? do_syscall_64+0x9d/0x1f0 Sep 25 14:56:23 desky kernel: ? __check_object_size+0x4a/0x2e0 Sep 25 14:56:23 desky kernel: __do_fault+0x36/0x190 Sep 25 14:56:23 desky kernel: do_fault+0xcf/0x570 Sep 25 14:56:23 desky kernel: __handle_mm_fault+0x92b/0xfe0 Sep 25 14:56:23 desky kernel: ? ktime_get_mono_fast_ns+0x39/0xd0 Sep 25 14:56:23 desky kernel: handle_mm_fault+0x164/0x2c0 Sep 25 14:56:23 desky kernel: do_user_addr_fault+0x2cb/0x840 Sep 25 14:56:23 desky kernel: exc_page_fault+0x75/0x180 Sep 25 14:56:23 desky kernel: asm_exc_page_fault+0x27/0x30 Sep 25 14:56:23 desky kernel: RIP: 0033:0x7fb1bc388bb7 Sep 25 14:56:23 desky kernel: Code: 48 ff c7 48 01 fe 48 8d 54 11 80 0f 1f 84 00 00 00 00 00 c5 fe 6f 0e c5 fe 6f 56 20 c5 fe 6f 5e 40 c5 fe 6f 66 60 48 83 ee 80 <c5> fd 7f 0f c5 fd 7> Sep 25 14:56:23 desky kernel: RSP: 002b:00007ffd7814fad8 EFLAGS: 00010207 Sep 25 14:56:23 desky kernel: RAX: 00007fb1b2860000 RBX: 0000000000000690 RCX: 00007fb1b2860000 Sep 25 14:56:23 desky kernel: RDX: 00007fb1b2860610 RSI: 0000556eda79f4c0 RDI: 00007fb1b2860020 Sep 25 14:56:23 desky kernel: RBP: 00007ffd7814fb60 R08: 0000000000000000 R09: 000000012be0e000 Sep 25 14:56:23 desky kernel: R10: 00007fb1b2860000 R11: 0000000000000246 R12: 0000556edd39a240 Sep 25 14:56:23 desky kernel: R13: 00007fb1b2dcb010 R14: 0000556eda79f420 R15: 0000000000000000 Sep 25 14:56:23 desky kernel: </TASK> Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5250 Fixes: `c2ae94cf8c` ("drm/xe: Convert the CPU fault handler for exhaustive eviction") Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250929112649.6131-1-thomas.hellstrom@linux.intel.com (cherry picked from commit `8f1756a7ea`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-02 21:57:52 -07:00
Michal Wajdeczko	3734c8184d	drm/xe/vf: Don't claim support for firmware late-bind if VF In general, the VFs can't load firmwares so attempt to initialize the firmware late-bind component leads to errors like: [] xe 0000:03:00.1: [drm] ERROR Late bind component not bound Fixes: `918bd789d6` ("drm/xe/xe_late_bind_fw: Introduce xe_late_bind_fw") Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6190 Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Link: https://lore.kernel.org/r/20250928174811.198933-3-michal.wajdeczko@intel.com (cherry picked from commit `e35e288090`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-02 21:57:52 -07:00
Michal Wajdeczko	de61d5944c	drm/xe/vf: Rename sriov_update_device_info This is a VF only function and its name should reflect that to avoid any confusion. Move the VF check to the caller side. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://lore.kernel.org/r/20250928174811.198933-2-michal.wajdeczko@intel.com (cherry picked from commit `b88bb1eefa`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-02 21:57:51 -07:00
Lucas De Marchi	59b7ed0ba2	drm/xe/configfs: Improve doc for ctx_restore* attributes Spell out the syntax instead of only using examples. Particularly important the <engine-class> part since that's different than engines_allowed and may confuse users. The same batch buffer is used for all engines of a certain class. Cc: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Fixes: `e2a9854d80` ("drm/xe/configfs: Allow to select by class only") Link: https://lore.kernel.org/r/20250924152709.659483-4-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit `47ca7acff4`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-02 21:57:51 -07:00
Lucas De Marchi	7646423c7f	drm/xe/configfs: Fix engine class parsing If mask is NULL, only the engine class should be accepted, so the pattern string should be completely parsed. This should fix passing e.g. rcs0 to ctx_restore_post_bb when it's only expecting the engine class. Reported-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Closes: https://lore.kernel.org/r/20250922155544.67712-1-jonathan.cavitt@intel.com Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/aNJKnrCQmL9xS9Gv@stanley.mountain Fixes: `e2a9854d80` ("drm/xe/configfs: Allow to select by class only") Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Raag Jadav <raag.jadav@intel.com> Link: https://lore.kernel.org/r/20250924152709.659483-3-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit `dd79796716`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-02 21:57:51 -07:00
Michal Wajdeczko	7bd03e3914	drm/xe/tests: Fix build break on clang 16.0.6 The following error was reported when building with clang 16.0.6: In file included from drivers/gpu/drm/xe/xe_pci.c:1104: >> drivers/gpu/drm/xe/tests/xe_pci.c:214:2: error: initializer \ element is not a compile-time constant graphics_ip_xelp, ^~~~~~~~~~~~~~~~ drivers/gpu/drm/xe/tests/xe_pci.c:221:2: error: initializer \ element is not a compile-time constant media_ip_xem, ^~~~~~~~~~~~ 2 errors generated. Fix that by explicit re-definition of pre-GMDID IPs, as there are not so many of them. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202509192041.tQwdE4DS-lkp@intel.com/ Fixes: `5bb5258e35` ("drm/xe/tests: Add pre-GMDID IP descriptors to param generators") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250922101207.192028-1-michal.wajdeczko@intel.com (cherry picked from commit `2de80e2da7`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-02 21:57:51 -07:00
Linus Torvalds	58809f614e	Merge tag 'drm-next-2025-10-01' of https://gitlab.freedesktop.org/drm/kernel Pull drm updates from Dave Airlie: "cross-subsystem: - i2c-hid: Make elan touch controllers power on after panel is enabled - dt bindings for STM32MP25 SoC - pci vgaarb: use screen_info helpers - rust pin-init updates - add MEI driver for late binding firmware update/load uapi: - add ioctl for reassigning GEM handles - provide boot_display attribute on boot-up devices core: - document DRM_MODE_PAGE_FLIP_EVENT - add vendor specific recovery method to drm device wedged uevent gem: - Simplify gpuvm locking ttm: - add interface to populate buffers sched: - Fix race condition in trace code atomic: - Reallow no-op async page flips display: - dp: Fix command length video: - Improve pixel-format handling for struct screen_info rust: - drop Opaque<> from ioctl args - Alloc: - BorrowedPage type and AsPageIter traits - Implement Vmalloc::to_page() and VmallocPageIter - DMA/Scatterlist: - Add dma::DataDirection and type alias for dma_addr_t - Abstraction for struct scatterlist and sg_table - DRM: - simplify use of generics - add DriverFile type alias - drop Object::SIZE - Rust: - pin-init tree merge - Various methods for AsBytes and FromBytes traits gpuvm: - Support madvice in Xe driver gpusvm: - fix hmm_pfn_to_map_order usage in gpusvm bridge: - Improve and fix ref counting on bridge management - cdns-dsi: Various improvements to mode setting - Support Solomon SSD2825 plus DT bindings - Support Waveshare DSI2DPI plus DT bindings - Support Content Protection property - display-connector: Improve DP display detection - Add support for Radxa Ra620 plus DT bindings - adv7511: Provide SPD and HDMI infoframes - it6505: Replace crypto_shash with sha() - synopsys: Add support for DW DPTX Controller plus DT bindings - adv7511: Write full Audio infoframe - ite6263: Support vendor-specific infoframes - simple: Add support for Realtek RTD2171 DP-to-HDMI plus DT bindings panel: - panel-edp: Support mt8189 Chromebooks; Support BOE NV140WUM-N64; Support SHP LQ134Z1; Fixes - panel-simple: Support Olimex LCD-OLinuXino-5CTS plus DT bindings - Support Samsung AMS561RA01 - Support Hydis HV101HD1 plus DT bindings - ilitek-ili9881c: Refactor mode setting; Add support for Bestar BSD1218-A101KL68 LCD plus DT bindings - lvds: Add support for Ampire AMP19201200B5TZQW-T03 to DT bindings - edp: Add support for additonal mt8189 Chromebook panels - lvds: Add DT bindings for EDT ETML0700Z8DHA amdgpu: - add CRIU support for gem objects - RAS updates - VCN SRAM load fixes - EDID read fixes - eDP ALPM support - Documentation updates - Rework PTE flag generation - DCE6 fixes - VCN devcoredump cleanup - MMHUB client id fixes - VCN 5.0.1 RAS support - SMU 13.0.x updates - Expanded PCIe DPC support - Expanded VCN reset support - VPE per queue reset support - give kernel jobs unique id for tracing - pre-populate exported buffers - cyan skillfish updates - make vbios build number available in sysfs - userq updates - HDCP updates - support MMIO remap page as ttm pool - JPEG parser updates - DCE6 DC updates - use devm for i2c buses - GPUVM locking updates - Drop non-DC DCE11 code - improve fallback handling for pixel encoding amdkfd: - SVM/page migration fixes - debugfs fixes - add CRIO support for gem objects - SVM updates radeon: - use dev_warn_once in CS parsers xe: - add madvise interface - add DRM_IOCTL_XE_VM_QUERY_MEMORY_RANGE_ATTRS to query VMA count and memory attributes - drop L# bank mask reporting from media GT3 on Xe3+. - add SLPC power_profile sysfs interface - add configs attribs to add post/mid context-switch commands - handle firmware reported hardware errors notifying userspace with device wedged uevent - use same dir structure across sysfs/debugfs - cleanup and future proof vram region init - add G-states and PCI link states to debugfs - Add SRIOV support for CCS surfaces on Xe2+ - Enable SRIOV PF mode by default on supported platforms - move flush to common code - extended core workarounds for Xe2/3 - use DRM scheduler for delayed GT TLB invalidations - configs improvements and allow VF device enablement - prep work to expose mmio regions to userspace - VF migration support added - prepare GPU SVM for THP migration - start fixing XE_PAGE_SIZE vs PAGE_SIZE - add PSMI support for hw validation - resize VF bars to max possible size according to number of VFs - Ensure GT is in C0 during resume - pre-populate exported buffers - replace xe_hmm with gpusvm - add more SVM GT stats to debugfs - improve fake pci and WA kunnit handle for new platform testing - Test GuC to GuC comms to add debugging - use attribute groups to simplify sysfs registration - add Late Binding firmware code to interact with MEI i915: - apply multiple JSL/EHL/Gen7/Gen6 workarounds properly - protect against overflow in active_engine() - Use try_cmpxchg64() in __active_lookup() - include GuC registers in error state - get rid of dev->struct_mutex - iopoll: generalize read_poll_timout - lots more display refactoring - Reject HBR3 in any eDP Panel - Prune modes for YUV420 - Display Wa fix, additions, and updates - DP: Fix 2.7 Gbps link training on g4x - DP: Adjust the idle pattern handling - DP: Shuffle the link training code a bit - Don't set/read the DSI C clock divider on GLK - Enable_psr kernel parameter changes - Type-C enabled/disconnected dp-alt sink - Wildcat Lake enabling - DP HDR updates - DRAM detection - wait PSR idle on dsb commit - Remove FBC modulo 4 restriction for ADL-P+ - panic: refactor framebuffer allocation habanalabs: - debug/visibility improvements - vmalloc-backed coherent mmap support - HLDIO infrastructure nova-core: - various register!() macro improvements - minor vbios/firmware fixes/refactoring - advance firmware boot stages; process Booter and patch signatures - process GSP and GSP bootloader - Add r570.144 firmware bindings and update to it - Move GSP boot code to own module - Use new pin-init features to store driver's private data in a single allocation - Update ARef import from sync::aref nova-drm: - Update ARef import from sync::aref tyr: - initial driver skeleton for a rust driver for ARM Mali GPUs - capable of powering up, query metadata and provide it to userspace. msm: - GPU and Core: - in DT bindings describe clocks per GPU type - GMU bandwidth voting for x1-85 - a623/a663 speedbins - cleanup some remaining no-iommu leftovers after VM_BIND conversion - fix GEM obj 32b size truncation - add missing VM_BIND param validation - IFPC for x1-85 and a750 - register xml and gen_header.py sync from mesa - Display: - add missing bindings for display on SC8180X - added DisplayPort MST bindings - conversion from round_rate() to determine_rate() amdxdna: - add IOCTL_AMDXDNA_GET_ARRAY - support user space allocated buffers - streamline PM interfaces - Refactoring wrt. hardware contexts - improve error reporting nouveau: - use GSP firmware by default - improve error reporting - Pre-populate exported buffers ast: - Clean up detection of DRAM config exynos: - add DSIM bridge driver support for Exynos7870 - Document Exynos7870 DSIM compatible in dt-binding panthor: - Print task/pid on errors - Add support for Mali G710, G510, G310, Gx15, Gx20, Gx25 - Improve cache flushing - Fail VM bind if BO has offset renesas: - convert to RUNTIME_PM_OPS rcar-du: - Make number of lanes configurable - Use RUNTIME_PM_OPS - Add support for DSI commands rocket: - Add driver for Rockchip NPU plus DT bindings - Use kfree() and sizeof() correctly - Test DMA status rockchip: - dsi2: Add support for RK3576 plus DT bindings - Add support for RK3588 DPTX output tidss: - Use crtc_ fields for programming display mode - Remove other drivers from aperture pixpaper: - Add support for Mayqueen Pixpaper plus DT bindings v3d: - Support querying nubmer of GPU resets for KHR_robustness stm: - Clean up logging - ltdc: Add support support for STM32MP257F-EV1 plus DT bindings sitronix: - st7571-i2c: Add support for inverted displays and 2-bit grayscale tidss: - Convert to kernel's FIELD_ macros vesadrm: - Support 8-bit palette mode imagination: - Improve power management - Add support for TH1520 GPU - Support Risc-V architectures v3d: - Improve job management and locking vkms: - Support variants of ARGB8888, ARGB16161616, RGB565, RGB888 and P01x - Spport YUV with 16-bit components" * tag 'drm-next-2025-10-01' of https://gitlab.freedesktop.org/drm/kernel: (1455 commits) drm/amd: Add name to modes from amdgpu_connector_add_common_modes() drm/amd: Drop some common modes from amdgpu_connector_add_common_modes() drm/amdgpu: update MODULE_PARM_DESC for freesync_video drm/amd: Use dynamic array size declaration for amdgpu_connector_add_common_modes() drm/amd/display: Share dce100_validate_global with DCE6-8 drm/amd/display: Share dce100_validate_bandwidth with DCE6-8 drm/amdgpu: Fix fence signaling race condition in userqueue amd/amdkfd: enhance kfd process check in switch partition amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw drm/amd/display: Reject modes with too high pixel clock on DCE6-10 drm/amd: Drop unnecessary check in amdgpu_connector_add_common_modes() drm/amd/display: Only enable common modes for eDP and LVDS drm/amdgpu: remove the redeclaration of variable i drm/amdgpu/userq: assign an error code for invalid userq va drm/amdgpu: revert "rework reserved VMID handling" v2 drm/amdgpu: remove leftover from enforcing isolation by VMID drm/amdgpu: Add fallback to pipe reset if KCQ ring reset fails accel/habanalabs: add Infineon version check accel/habanalabs/gaudi2: read preboot status after recovering from dirty state accel/habanalabs: add HL_GET_P_STATE passthrough type ...	2025-10-02 12:47:25 -07:00
Linus Torvalds	30bbcb4470	Merge tag 'linux_kselftest-kunit-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kunit updates from Shuah Khan: - New parameterized test features KUnit parameterized tests supported two primary methods for getting parameters: - Defining custom logic within a generate_params() function. - Using the KUNIT_ARRAY_PARAM() and KUNIT_ARRAY_PARAM_DESC() macros with a pre-defined static array and passing the created _gen_params() to KUNIT_CASE_PARAM(). These methods present limitations when dealing with dynamically generated parameter arrays, or in scenarios where populating parameters sequentially via generate_params() is inefficient or overly complex. These limitations are fixed with a parameterized test method - Fix issues in kunit build artifacts cleanup - Fix parsing skipped test problem in kselftest framework - Enable PCI on UML without triggering WARN() - a few other fixes and adds support for new configs such as MIPS tag 'linux_kselftest-kunit-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: kunit: Extend kconfig help text for KUNIT_UML_PCI rust: kunit: allow `cfg` on `test`s kunit: qemu_configs: Add MIPS configurations kunit: Enable PCI on UML without triggering WARN() Documentation: kunit: Document new parameterized test features kunit: Add example parameterized test with direct dynamic parameter array setup kunit: Add example parameterized test with shared resource management using the Resource API kunit: Enable direct registration of parameter arrays to a KUnit test kunit: Pass parameterized test context to generate_params() kunit: Introduce param_init/exit for parameterized test context management kunit: Add parent kunit for parameterized test context kunit: tool: Accept --raw_output=full as an alias of 'all' kunit: tool: Parse skipped tests from kselftest.h kunit: Always descend into kunit directory during build	2025-10-01 19:15:11 -07:00
Thomas Hellström	77c8ede611	drm/xe: Don't copy pinned kernel bos twice on suspend We were copying the bo content the bos on the list "xe->pinned.late.kernel_bo_present" twice on suspend. Presumingly the intent is to copy the pinned external bos on the first pass. This is harmless since we (currently) should have no pinned external bos needing copy since a) exernal system bos don't have compressed content, b) We do not (yet) allow pinning of VRAM bos. Still, fix this up so that we copy pinned external bos on the first pass. We're about to allow bos pinned in VRAM. Fixes: `c6a4d46ec1` ("drm/xe: evict user memory in PM notifier") Cc: Matthew Auld <matthew.auld@intel.com> Cc: <stable@vger.kernel.org> # v6.16+ Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://lore.kernel.org/r/20250918092207.54472-2-thomas.hellstrom@linux.intel.com (cherry picked from commit `9e69bafece`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-09-22 12:13:18 -04:00
Lucas De Marchi	b67e7422d2	drm/xe: Fix build with CONFIG_MODULES=n When building with CONFIG_MODULES=n, the __exit functions are dropped. However our init functions may call them for error handling, so they are not good candidates for the exit sections. Fix this error reported by 0day: ld.lld: error: relocation refers to a symbol in a discarded section: xe_configfs_exit >>> defined in vmlinux.a(drivers/gpu/drm/xe/xe_configfs.o) >>> referenced by xe_module.c >>> drivers/gpu/drm/xe/xe_module.o:(init_funcs) in archive vmlinux.a This is the only exit function using __exit. Drop it to fix the build. Cc: Riana Tauro <riana.tauro@intel.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202506092221.1FmUQmI8-lkp@intel.com/ Fixes: `16280ded45` ("drm/xe: Add configfs to enable survivability mode") Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Link: https://lore.kernel.org/r/20250912-fix-nomodule-build-v1-1-d11b70a92516@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit `d9b2623319`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-09-22 12:13:14 -04:00
Michal Wajdeczko	500dad428e	drm/xe/vf: Don't expose sysfs attributes not applicable for VFs VFs can't read BMG_PCIE_CAP(0x138340) register nor access PCODE (already guarded by the info.skip_pcode flag) so we shouldn't expose attributes that require any of them to avoid errors like: [] xe 0000:03:00.1: [drm] Tile0: GT0: VF is trying to read an \ inaccessible register 0x138340+0x0 [] RIP: 0010:xe_gt_sriov_vf_read32+0x6c2/0x9a0 [xe] [] Call Trace: [] xe_mmio_read32+0x110/0x280 [xe] [] auto_link_downgrade_capable_show+0x2e/0x70 [xe] [] dev_attr_show+0x1a/0x70 [] sysfs_kf_seq_show+0xaa/0x120 [] kernfs_seq_show+0x41/0x60 Fixes: `0e414bf7ad` ("drm/xe: Expose PCIe link downgrade attributes") Fixes: `cdc36b66cd` ("drm/xe: Expose fan control and voltage regulator version") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250916170029.3313-2-michal.wajdeczko@intel.com (cherry picked from commit `a2d6223d22`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-09-22 12:13:08 -04:00
Dave Airlie	0faeb8cf99	Merge tag 'drm-xe-next-2025-09-19' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next UAPI Changes: - Drop L3 bank mask reporting from the media GT on Xe3 and later. Only do that for the primary GT. No userspace needs or uses it for media and some platforms may report bogus values. - Add SLPC power_profile sysfs interface with support for base and power_saving modes (Vinay Belgaumkar, Rodrigo Vivi) - Add configfs attributes to add post/mid context-switch commands (Lucas De Marchi) Cross-subsystem Changes: - Fix hmm_pfn_to_map_order() usage in gpusvm and refactor APIs to align with pieces previous handled by xe_hmm (Matthew Auld) Core Changes: - Add MEI driver for Late Binding Firmware Update/Upload (Alexander Usyskin) Driver Changes: - Fix GuC CT teardown wrt TLB invalidation (Satyanarayana) - Fix CCS save/restore on VF (Satyanarayana) - Increase default GuC crash buffer size (Zhanjun) - Allow to clear GT stats in debugfs to aid debugging (Matthew Brost) - Add more SVM GT stats to debugfs (Matthew Brost) - Fix error handling in VMA attr query (Himal) - Move sa_info in debugfs to be per tile (Michal Wajdeczko) - Limit number of retries upon receiving NO_RESPONSE_RETRY from GuC to avoid endless loop (Michal Wajdeczko) - Fix configfs handling for survivability_mode undoing user choice when unbinding the module (Michal Wajdeczko) - Refactor configfs attribute visibility to future-proof it and stop exposing survivability_mode if not applicable (Michal Wajdeczko) - Constify some functions (Harish Chegondi, Michal Wajdeczko) - Add/extend more HW workarounds for Xe2 and Xe3 (Harish Chegondi, Tangudu Tilak Tirumalesh) - Replace xe_hmm with gpusvm (Matthew Auld) - Improve fake pci and WA kunit handling for testing new platforms (Michal Wajdeczko) - Reduce unnecessary PTE writes when migrating (Sanjay Yadav) - Cleanup GuC interface definitions and log message (John Harrison) - Small improvements around VF CCS (Michal Wajdeczko) - Enable bus mastering for the I2C controller (Raag Jadav) - Prefer devm_mutex of hand rolling it (Christophe JAILLET) - Drop sysfs and debugfs attributes not available for VF (Michal Wajdeczko) - GuC CT devm actions improvements (Michal Wajdeczko) - Recommend new GuC versions for PTL and BMG (Julia Filipchuk) - Improveme driver handling for exhaustive eviction using new xe_validation wrapper around drm_exec (Thomas Hellström) - Add and use printk wrappers for tile and device (Michal Wajdeczko) - Better document workaround handling in Xe (Lucas De Marchi) - Improvements on ARRAY_SIZE and ERR_CAST usage (Lucas De Marchi, Fushuai Wang) - Align CSS firmware headers with the GuC APIs (John Harrison) - Test GuC to GuC (G2G) communication to aid debug in pre-production firmware (John Harrison) - Bail out driver probing if GuC fails to load (John Harrison) - Allow error injection in xe_pxp_exec_queue_add() (Daniele Ceraolo Spurio) - Minor refactors in xe_svm (Shuicheng Lin) - Fix madvise ioctl error handling (Shuicheng Lin) - Use attribute groups to simplify sysfs registration (Michal Wajdeczko) - Add Late Binding Firmware implementation in Xe to work together with the MEI component (Badal Nilawar, Daniele Ceraolo Spurio, Rodrigo Vivi) - Fix build with CONFIG_MODULES=n (Lucas De Marchi) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/c2et6dnkst2apsgt46dklej4nprqdukjosb55grpaknf3pvcxy@t7gtn3hqtp6n	2025-09-22 08:21:42 +10:00
Lucas De Marchi	d9b2623319	drm/xe: Fix build with CONFIG_MODULES=n When building with CONFIG_MODULES=n, the __exit functions are dropped. However our init functions may call them for error handling, so they are not good candidates for the exit sections. Fix this error reported by 0day: ld.lld: error: relocation refers to a symbol in a discarded section: xe_configfs_exit >>> defined in vmlinux.a(drivers/gpu/drm/xe/xe_configfs.o) >>> referenced by xe_module.c >>> drivers/gpu/drm/xe/xe_module.o:(init_funcs) in archive vmlinux.a This is the only exit function using __exit. Drop it to fix the build. Cc: Riana Tauro <riana.tauro@intel.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202506092221.1FmUQmI8-lkp@intel.com/ Fixes: `16280ded45` ("drm/xe: Add configfs to enable survivability mode") Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Link: https://lore.kernel.org/r/20250912-fix-nomodule-build-v1-1-d11b70a92516@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-19 05:41:01 -07:00
Dave Airlie	748f41f353	Merge tag 'drm-intel-next-2025-09-12' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-next Cross-subsystem Changes: - Overflow: add range_overflows and range_end_overflows (Jani) Core Changes: - Get rid of dev->struct_mutex (Luiz) Non-display related: - GVT: Remove redundant ternary operators (Liao) - Various i915_utils clean-ups (Jani) Display related: - Wait PSR idle before on dsb commit (Jouni) - Fix size for for_each_set_bit() in abox iteration (Jani) - Abstract figuring out encoder name (Jani) - Remove FBC modulo 4 restriction for ADL-P+ (Uma) - Panic: refactor framebuffer allocation (Jani) - Backlight luminance control improvements (Suraj, Aaron) - Add intel_display_device_present (Jani) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/aMxX_lBxm7wd5wmi@intel.com	2025-09-19 13:02:46 +10:00
Lucas De Marchi	b30d5de3d4	drm/xe/configfs: Add mid context restore bb Like done for post context restore, allow the user to add commands to the middle of context restore, at the beginning of engine restore commands. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250916-wa-bb-cmds-v5-7-306bddbc15da@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-18 14:20:39 -07:00
Lucas De Marchi	7a4756b2fd	drm/xe/lrc: Allow to add user commands mid context switch Like done for post-context-restore commands, allow to add commands from configfs in the middle of context restore. Since currently the indirect ctx hardcodes the offset to CTX_INDIRECT_CTX_OFFSET_DEFAULT, this is executed in the very beginning of engine context restore. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250916-wa-bb-cmds-v5-6-306bddbc15da@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-18 14:20:39 -07:00
Lucas De Marchi	c9dfd66cb9	drm/xe/lrc: Allow INDIRECT_CTX for more engine classes Currently it's only allowed for render and compute. Going forward we want to enable it for more engine classes. Let the XE_LRC_FLAG_INDIRECT_CTX flag (and thus gt_engine_needs_indirect_ctx()) be the deciding factor for its availability. While at it, add the missing const to rcs_funcs array. Since CTX_INDIRECT_CTX_OFFSET_DEFAULT already matches the HW default and gt_engine_needs_indirect_ctx() only ever enables it for rcs/ccs, there is no change in behavior, it's only preparation for future use case. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250916-wa-bb-cmds-v5-5-306bddbc15da@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-18 14:20:39 -07:00
Lucas De Marchi	39ac06f700	drm/xe/configfs: Add post context restore bb Allow the user to specify commands to execute during a context restore. Currently it's possible to parse 2 types of actions: - cmd: the instructions are added as is to the bb - reg: just use the address and value, without worrying about encoding the right LRI instruction. This is possibly the most useful use case, so added a dedicated action for that. This also prepares for future BBs: mid context restore and rc6 context restore that can re-use the same parsing functions. Reviewed-by: Raag Jadav <raag.jadav@intel.com> Link: https://lore.kernel.org/r/20250916-wa-bb-cmds-v5-4-306bddbc15da@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-18 14:20:39 -07:00
Lucas De Marchi	6c6988c5e0	drm/xe/lrc: Allow to add user commands on context switch During validation it's useful to allows additional commands to be executed on context switch. Fetch the commands from configfs (to be added) and add them to the WA BB. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250916-wa-bb-cmds-v5-3-306bddbc15da@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-18 14:20:39 -07:00
Lucas De Marchi	e2a9854d80	drm/xe/configfs: Allow to select by class only For a future configfs attribute, it's desirable to select by engine mask only as the instance doesn't make sense. Rename the function lookup_engine_mask() to lookup_engine_info() and make it return the entry. This allows parse_engine() to still return an item if the caller wants to allow parsing a class-only string like "rcs", "bcs", "ccs", etc. Reviewed-by: Raag Jadav <raag.jadav@intel.com> Link: https://lore.kernel.org/r/20250916-wa-bb-cmds-v5-2-306bddbc15da@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-18 14:20:39 -07:00
Lucas De Marchi	7166cc3a6a	drm/xe/configfs: Extract function to parse engine Move the part that copies the engine to a local buffer so it can be shared in future for other configfs attributes parsing an engine. Reviewed-by: Raag Jadav <raag.jadav@intel.com> Link: https://lore.kernel.org/r/20250916-wa-bb-cmds-v5-1-306bddbc15da@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-18 14:20:38 -07:00
Badal Nilawar	efa29317a5	drm/xe/xe_late_bind_fw: Extract and print version info Extract and print version info of the late binding binary. v2: Some refinements (Daniele) Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250905154953.3974335-10-badal.nilawar@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-18 09:32:01 -07:00
Badal Nilawar	67de7982d5	drm/xe/xe_late_bind_fw: Introduce debug fs node to disable late binding Introduce a debug filesystem node to disable late binding fw reload during the system or runtime resume. This is intended for situations where the late binding fw needs to be loaded from user mode, perticularly for validation purpose. Note that xe kmd doesn't participate in late binding flow from user space. Binary loaded from the userspace will be lost upon entering to D3 cold hence user space app need to handle this situation. v2: - s/(uval == 1) ? true : false/!!uval/ (Daniele) v3: - Refine the commit message (Daniele) Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250905154953.3974335-9-badal.nilawar@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-18 09:32:01 -07:00
Badal Nilawar	02f52f6d92	drm/xe/xe_late_bind_fw: Reload late binding fw during system resume Reload late binding fw during resume from system suspend v2: - Unconditionally reload late binding fw (Rodrigo) - Flush worker during system suspend Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250905154953.3974335-8-badal.nilawar@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-18 09:32:01 -07:00
Badal Nilawar	69ac1bb8fc	drm/xe/xe_late_bind_fw: Reload late binding fw in rpm resume Reload late binding fw during runtime resume. Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250905154953.3974335-7-badal.nilawar@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-18 09:32:01 -07:00
Badal Nilawar	691a54ad94	drm/xe/xe_late_bind_fw: Load late binding firmware Load late binding firmware v2: - s/EAGAIN/EBUSY/ - Flush worker in suspend and driver unload (Daniele) v3: - Use retry interval of 6s, in steps of 200ms, to allow other OS components release MEI CL handle (Sasha) v4: - return -ENODEV if component not added (Daniele) - parse and print status returned by csc v5: - Use payload to check firmware valid (Daniele) - Obtain the RPM reference before scheduling the worker to ensure the device remains awake until the worker completes firmware loading (Rodrigo) v6: - In case of error donot re-attempt fw download (Daniele) v7 (Rodrigo): - Rename of mei structs and callback. Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250905154953.3974335-6-badal.nilawar@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-18 09:32:01 -07:00
Badal Nilawar	45832bf9c1	drm/xe/xe_late_bind_fw: Initialize late binding firmware Search for late binding firmware binaries and populate the meta data of firmware structures. v2 (Daniele): - drm_err if firmware size is more than max pay load size - s/request_firmware/firmware_request_nowarn/ as firmware will not be available for all possible cards v3 (Daniele): - init firmware from within xe_late_bind_init, propagate error - switch late_bind_fw to array to handle multiple firmware types v4 (Daniele): - Alloc payload dynamically, fix nits v6 (Daniele) - %s/MAX_PAYLOAD_SIZE/XE_LB_MAX_PAYLOAD_SIZE/ Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250905154953.3974335-5-badal.nilawar@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-18 09:32:01 -07:00
Badal Nilawar	918bd789d6	drm/xe/xe_late_bind_fw: Introduce xe_late_bind_fw Introduce xe_late_bind_fw to enable firmware loading for the devices, such as the fan controller, during the driver probe. Typically, firmware for such devices are part of IFWI flash image but can be replaced at probe after OEM tuning. This patch binds mei late binding component to enable firmware loading. v2: - Add devm_add_action_or_reset to remove the component (Daniele) - Add INTEL_MEI_GSC check in xe_late_bind_init() (Daniele) v3: - Fail driver probe if late bind initialization fails, add has_late_bind flag (Daniele) v4: - %s/I915_COMPONENT_LATE_BIND/INTEL_COMPONENT_LATE_BIND/ v6: - rebased v7: - rebased - In xe_late_bind_init, use drm_err when returning an error to stop the probe (Lucas) - Use imperative mode in commit message (Lucas) Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250905154953.3974335-4-badal.nilawar@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-18 09:32:00 -07:00
Thomas Hellström	187e16f69d	drm/xe: Work around clang multiple goto-label error When using drm_exec_retry_on_contention(), clang may consider all labels for which we take addresses in a function as potential retry goto targets, although strictly only one is possible. It will then in some situations generate false positive errors. In this case, the compiler, for some architectures, consider the might_lock(&m->job_mutex); as a potential goto target from drm_exec_retry_on_contention(), and errors. Work around that by moving the xe_validate / drm_exec transaction to a separate function. v2: - New commit message based on analysis of Nathan Chancellor Fixes: `59eabff2a3` ("drm/xe: Convert xe_bo_create_pin_map() for exhaustive eviction") Cc: Matthew Brost <matthew.brost@intel.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202509101853.nDmyxTEM-lkp@intel.com/ Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> # build Link: https://lore.kernel.org/r/20250911080324.180307-1-thomas.hellstrom@linux.intel.com	2025-09-18 08:27:00 +02:00
Daniele Ceraolo Spurio	26caeae9fb	drm/xe/guc: Set RCS/CCS yield policy All recent platforms (including all the ones officially supported by the Xe driver) do not allow concurrent execution of RCS and CCS workloads from different address spaces, with the HW blocking the context switch when it detects such a scenario. The DUAL_QUEUE flag helps with this, by causing the GuC to not submit a context it knows will not be able to execute. This, however, causes a new problem: if RCS and CCS queues have pending workloads from different address spaces, the GuC needs to choose from which of the 2 queues to pick the next workload to execute. By default, the GuC prioritizes RCS submissions over CCS ones, which can lead to CCS workloads being significantly (or completely) starved of execution time. The driver can tune this by setting a dedicated scheduling policy KLV; this KLV allows the driver to specify a quantum (in ms) and a ratio (percentage value between 0 and 100), and the GuC will prioritize the CCS for that percentage of each quantum. Given that we want to guarantee enough RCS throughput to avoid missing frames, we set the yield policy to 20% of each 80ms interval. v2: updated quantum and ratio, improved comment, use xe_guc_submit_disable in gt_sanitize Fixes: `d9a1ae0d17` ("drm/xe/guc: Enable WA_DUAL_QUEUE for newer platforms") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Tested-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://lore.kernel.org/r/20250905235632.3333247-2-daniele.ceraolospurio@intel.com (cherry picked from commit `8843444843`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> [Rodrigo added #include "xe_guc_submit.h" while backporting]	2025-09-17 20:23:47 -04:00
Michal Wajdeczko	fb3c27a69c	drm/xe/sysfs: Simplify sysfs registration Instead of manually maintaining each sysfs file define and use attribute groups and register them using device managed function. Then use is_visible() to filter-out unsupported attributes. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250916170029.3313-3-michal.wajdeczko@intel.com	2025-09-17 21:59:31 +02:00
Michal Wajdeczko	a2d6223d22	drm/xe/vf: Don't expose sysfs attributes not applicable for VFs VFs can't read BMG_PCIE_CAP(0x138340) register nor access PCODE (already guarded by the info.skip_pcode flag) so we shouldn't expose attributes that require any of them to avoid errors like: [] xe 0000:03:00.1: [drm] Tile0: GT0: VF is trying to read an \ inaccessible register 0x138340+0x0 [] RIP: 0010:xe_gt_sriov_vf_read32+0x6c2/0x9a0 [xe] [] Call Trace: [] xe_mmio_read32+0x110/0x280 [xe] [] auto_link_downgrade_capable_show+0x2e/0x70 [xe] [] dev_attr_show+0x1a/0x70 [] sysfs_kf_seq_show+0xaa/0x120 [] kernfs_seq_show+0x41/0x60 Fixes: `0e414bf7ad` ("drm/xe: Expose PCIe link downgrade attributes") Fixes: `cdc36b66cd` ("drm/xe: Expose fan control and voltage regulator version") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250916170029.3313-2-michal.wajdeczko@intel.com	2025-09-17 21:59:29 +02:00

1 2 3 4 5 ...

4112 Commits