linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-06-02 05:13:11 -04:00

Author	SHA1	Message	Date
Matt Roper	4f3ecdb6ea	drm/xe: Bypass Wa_14018094691 when primary GT is disabled Don't try to lookup Wa_14018094691 on a NULL GT when the primary GT is disabled. Since this whole workaround centers around mid-thread preemption behavior, the workaround isn't relevant if the primary GT (where the engines that can do MTP live) is disabled. Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://lore.kernel.org/r/20251013200944.2499947-42-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2025-10-14 07:45:17 -07:00
Matt Roper	794e735cb6	drm/xe/rtp: Pass xe_device parameter to FUNC matches FUNC matches in RTP only pass the GT and hwe, preventing them from being used effectively in device workarounds. Add an additional xe_device parameter so that we can use them in device workarounds where a GT may not be available. Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://lore.kernel.org/r/20251013200944.2499947-41-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2025-10-14 07:45:17 -07:00
Matt Roper	78de8f8766	drm/xe: Handle Wa_22010954014 and Wa_14022085890 as device workarounds When Wa_22010954014 and Wa_14022085890 were first implemented, we didn't have a device workaround infrastructure so we hacked them into the GT workaround list. Now that we have proper device workaround support, move them to the proper place. Note that Wa_14022085890 specifically applies to BMG-G21 platforms, so this requires defining a BMG subplatform to capture the correct subset of device IDs. Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://lore.kernel.org/r/20251013200944.2499947-40-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2025-10-14 07:45:17 -07:00
Matt Roper	4d29240682	drm/xe/irq: Don't try to lookup engine masks for non-existent primary GT If the primary GT is disabled via configfs, we shouldn't try to access it to lookup BCS/CCS engine masks. For the purposes of IRQ reset (which masks & disables interrupts in an sgunit register), assume all possible instances are present. Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Link: https://lore.kernel.org/r/20251013200944.2499947-39-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2025-10-14 07:45:17 -07:00
Matt Roper	886e5b6e5c	drm/xe: Make display part of Wa_22019338487 a device workaround The display part of Wa_22019338487 (i.e., avoiding use of stolen memory) is using a platform test rather than an graphics/media IP test. Since this workaround is focused on non-GT uses of stolen memory, it makes sense that we'd want to still apply the workaround on affected platforms even if the GTs themselves are disabled via configfs. Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://lore.kernel.org/r/20251013200944.2499947-38-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2025-10-14 07:45:16 -07:00
Matt Roper	d0ff153cca	drm/xe: Check for primary GT before looking up Wa_22019338487 If the primary GT is disabled via configfs, we need to make sure that we don't search for this workaround on a NULL xe_gt pointer. Since we can disable the primary GT only on igpu platforms, the media GT is the one we'd want to check anyway for this workaround. The ternary operators in ggtt_update_access_counter() were getting a bit long/complicated, so rewrite them with regular if/else statements. While we're at it, throw in a couple extra assertions to make sure that we're truly picking the expected GT according to igpu/dgpu type. v2: - Adjust indentation/wrapping; it's easier to read this with longer, unwrapped lines. (Lucas) - Tweak wording of commit message to remove ambiguity. (Gustavo) Cc: Gustavo Sousa <gustavo.sousa@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://lore.kernel.org/r/20251013200944.2499947-37-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2025-10-14 07:44:58 -07:00
Matt Roper	999ef874c1	drm/xe/pmu: Initialize PMU event types based on first available GT GT ID#0 (primary GT on tile 0) may not always be available if the primary GT has been disabled via configfs. Instead use the first available GT when determining which PMU events are supported. If there are no GTs, then don't advertise any GT-related events. Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20251013200944.2499947-36-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2025-10-14 07:44:58 -07:00
Matt Roper	090e7fc422	drm/xe/query: Report hwconfig size as 0 if primary GT is disabled The hwconfig table is part of the primary GT's GuC firmware. If the primary GT is disabled, the hwconfig is unavailable and should be reported to userspace as having size 0. Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Link: https://lore.kernel.org/r/20251013200944.2499947-35-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2025-10-14 07:44:58 -07:00
Matt Roper	082547d8b4	drm/xe: Skip L2 / TDF cache flushes if primary GT is disabled If the primary GT is disabled via configfs, GT-side L2 and TD cache flushes are unnecessary since nothing is using/filling these caches. Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Link: https://lore.kernel.org/r/20251013200944.2499947-34-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2025-10-14 07:44:57 -07:00
Matt Roper	9c52402f6b	drm/xe: Move primary GT allocation from xe_tile_init_early to xe_tile_init During the early days of the Xe driver, there were cases where we accessed some fields in the primary GT's xe_gt structure before the GT itself was formally initialized; this required that the structure itself be allocated during xe_tile_init_early(). A lot of refactoring of the device probe has happened since that time and there's no longer a need to allocate the primary GT early. Move the allocation into xe_info_init() where GT initialization happens and where we're doing the allocation of the media GT. v2: - Only make this change after a separate patch to perform VF GMD_ID lookup with a dummy GT instead of xe_root_mmio_gt(). Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20251013200944.2499947-33-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2025-10-14 07:44:57 -07:00
Matt Roper	ff1d2b5e3d	drm/xe: Read VF GMD_ID with a specifically-allocated dummy GT SRIOV VF initialization has a bit of a chicken and egg design problem. Determining the IP version of the graphics and media IPs can't be done via direct register reads as it is on PF or native and instead requires querying the GuC. However initialization of the GT, including its GuC, needs to wait until after we know the IP versions so that the proper initialization steps for the platform/IP are followed. Currently the (somewhat hacky) solution is to manually fill out just enough fields in tile 0's primary GT structure to make it look as if the GT has been initialized so that the GuC can be partially initialized and queried to obtain the GMD_ID values. When the GT gets properly initialized during the regular flows, the hacked-up values will get overwritten as part of the general initialization flows. Rather than using tile 0's primary GT structure to hold the hacked up values for querying every GT on every tile, instead allocate a dedicated dummy structure. This will allow us to move the tile->primary_gt's allocation to a more consistent place later in the initialization flow in future patches (i.e., we shouldn't even allocate this GT structure if the GT is disabled/unavailable). It also helps ensure there can't be any accidental leakage of initialization or state between the dummy initialization for GMD_ID and the real driver initialization of the GT. v2: - Initialize gt->tile for temporary GT. (CI, Michal) - Use scope-based cleanup handler to free temp GT. (Michal) - Propagate actual error code from xe_gt_sriov_vf_bootstrap() rather than just setting IP version to 0.0 now that read_gmdid() can return an error. (Michal) v3: - Explicitly initialize gt to NULL, just in case something else gets inserted before the kzalloc() in the future. (Lucas) Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20251013200944.2499947-32-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2025-10-14 07:44:57 -07:00
Matt Roper	1a28651c06	drm/xe: Move 'has_flatccs' flag back to platform descriptor FlatCCS presence/absence is a flag that should be tracked at the platform level rather than the IP level. FlatCCS affects the device-wide memory initialization and reservations so its effects are not confined to a single IP block or GT. This is also a trait that should be tied to the platform even if the graphics IP itself is not present (e.g., if we disable the primary GT via configfs). Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20251013200944.2499947-31-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2025-10-14 07:44:57 -07:00
Matt Roper	76b7aedd66	drm/xe: Move 'vram_flags' flag back to platform descriptor Restrictions and requirements on VRAM alignment are something that should be tracked at the platform level rather than the IP level. Even when mixing and matching various graphics, media, and display IP blocks, the platform as a whole has to have consistent memory allocation handling. This is also a trait that should be tied to the platform even if the graphics IP itself is not present (e.g., if we disable the primary GT via configfs). Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20251013200944.2499947-30-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2025-10-14 07:44:57 -07:00
Matt Roper	50292f9af8	drm/xe: Move 'vm_max_level' flag back to platform descriptor The number of page table levels for PPGTT virtual addresses is something that should be tracked at the platform level rather than the IP level. Even when mixing and matching various graphics, media, and display IP blocks, the platform as a whole has to have consistent page table handling. This is also a trait that should be tied to the platform even if the graphics IP itself is not present (e.g., if we disable the primary GT via configfs). v2: - Drop default value of 4 and explicitly set the value in each platform descriptor. (Lucas) v3: - Drop outdated code comment and commit message paragraph about default value. (Gustavo) v4: - Add missing setting for tgl_desc. (Gustavo) Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Gustavo Sousa <gustavo.sousa@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://lore.kernel.org/r/20251013200944.2499947-29-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2025-10-14 07:44:57 -07:00
Matt Roper	a3bcaf11f4	drm/xe: Move 'va_bits' flag back to platform descriptor The number of virtual address bits is something that should be tracked at the platform level rather than the IP level. Even when mixing and matching various graphics, media, and display IP blocks, the platform as a whole has to have consistent page table handling. This is also a trait that should be tied to the platform even if the graphics IP itself is not present (e.g., if we disable the primary GT via configfs). v2: - Drop the default value of 48 and explicitly set it in each relevant descriptor. (Lucas, Michal) v3: - Drop an outdated comment about default value. (Michal) Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20251013200944.2499947-28-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2025-10-14 07:44:57 -07:00
Matt Roper	d41f306cc2	drm/xe: Drop GT parameter to xe_display_irq_postinstall() Display interrupt handling has no relation to GT(s) on the platforms supported by the Xe driver. We only call xe_display_irq_postinstall with the first tile's primary GT, so the single condition that uses the GT pointer within the function always evaluates to true. Drop the unnecessary parameter and the condition. Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20251013200944.2499947-27-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2025-10-14 07:44:57 -07:00
Matt Roper	2816905e14	drm/xe/huc: Adjust HuC check on primary GT The HuC initialization code determines whether a platform can have a HuC on the primary GT by checking whether tile->media_gt is NULL; old Xe1 platforms that combined render+media into a single GT will always have a NULL media_gt pointer. However once we allow media to be disabled via configfs, there will also be cases where tile->media_gt is NULL on more modern platforms, causing this condition to behave incorrectly. To handle cases where media gets disabled via configfs (or theoretical cases where media is truly fused off in hardware), change the condition to consider the graphics version of the primary GT; only the old Xe1 platforms with graphics versions 12.55 or earlier should try to initialize a HuC on the primary GT. Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Link: https://lore.kernel.org/r/20251013200944.2499947-26-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2025-10-14 07:44:57 -07:00
Matt Roper	89e347f8a7	drm/xe/kunit: Fix kerneldoc for parameterized tests Kunit's generate_params() was recently updated to take an additional test context parameter. Xe's IP and platform parameter generators were updated accordingly at the same time, but the new parameter was not added to the functions' kerneldoc, resulting in the following warnings: Warning: drivers/gpu/drm/xe/tests/xe_pci.c:78 function parameter 'test' not described in 'xe_pci_fake_data_gen_params' Warning: drivers/gpu/drm/xe/tests/xe_pci.c:254 function parameter 'test' not described in 'xe_pci_graphics_ip_gen_param' Warning: drivers/gpu/drm/xe/tests/xe_pci.c:278 function parameter 'test' not described in 'xe_pci_media_ip_gen_param' Warning: drivers/gpu/drm/xe/tests/xe_pci.c:302 function parameter 'test' not described in 'xe_pci_id_gen_param' Warning: drivers/gpu/drm/xe/tests/xe_pci.c:390 function parameter 'test' not described in 'xe_pci_live_device_gen_param' 5 warnings as errors Document the new parameter to eliminate the warnings and make CI happy. Fixes: `b9a214b5f6` ("kunit: Pass parameterized test context to generate_params()") Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://lore.kernel.org/r/20251013153014.2362879-2-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2025-10-14 07:37:26 -07:00
Thomas Hellström	2cfcea7a74	drm/xe/svm: Ensure data will be migrated to system if indicated by madvise. If the location madvise() is set to DRM_XE_PREFERRED_LOC_DEFAULT_SYSTEM, the drm_pagemap in the SVM gpu fault handler will be set to NULL. However there is nothing that explicitly migrates the data to system if it is already present in device memory. In that case, set the device memory owner to NULL to ensure data gets properly migrated to system on page-fault. v2: - Remove redundant dpagemap assignment (Himal Prasad Ghimiray) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> #v1 Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://lore.kernel.org/r/20251010104149.72783-2-thomas.hellstrom@linux.intel.com Fixes: `10aa5c8060` ("drm/gpusvm, drm/xe: Fix userptr to not allow device private pages")	2025-10-14 13:11:16 +02:00
Tomasz Lis	bb3d208250	drm/xe/ct: Separate waiting for retry from ct send function The function `guc_ct_send_locked()` is really quite simple, but still looks complex due to exposed internals. It is sending a message, and in case of lack of space, waiting for a proper moment to send a retry. Clear separation of send function and wait function will help with readability. This is a cosmetic change only, no functional difference is expected. This patch introduces `guc_ct_send_wait_for_retry()`, and uses it to greatly simplify `guc_ct_send_locked()`. Signed-off-by: Tomasz Lis <tomasz.lis@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20251009170844.178199-1-tomasz.lis@intel.com	2025-10-14 12:37:02 +02:00
Thomas Hellström	82ee50252d	Merge drm/drm-next into drm-xe-next Backmerging to bring in 6.18-rc1. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2025-10-14 11:31:49 +02:00
Matthew Brost	dd83b101a4	drm/xe: Enable 2M pages in xe_migrate_vram Using 2M pages in xe_migrate_vram has two benefits: we issue fewer instructions per 2M copy (1 vs. 512), and the cache hit rate should be higher. This results in increased copy engine bandwidth, as shown by benchmark IGTs. Enable 2M pages by reserving PDEs in the migrate VM and using 2M pages in xe_migrate_vram if the DMA address order matches 2M. v2: - Reuse build_pt_update_batch_sram (Stuart) - Fix build_pt_update_batch_sram for PAGE_SIZE > 4K v3: - More fixes for PAGE_SIZE > 4K, align chunk, decrement chunk as needed - Use stack incr var in xe_migrate_vram_use_pde (Stuart) v4: - Split PAGE_SIZE > 4K fix out in different patch (Stuart) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20251013034555.4121168-3-matthew.brost@intel.com	2025-10-13 12:31:22 -07:00
Matthew Brost	55991d854f	drm/xe: Fix build_pt_update_batch_sram for non-4K PAGE_SIZE The build_pt_update_batch_sram function in the Xe migrate layer assumes PAGE_SIZE == XE_PAGE_SIZE (4K), which is not a valid assumption on non-x86 platforms. This patch updates build_pt_update_batch_sram to correctly handle PAGE_SIZE > 4K by programming multiple 4K GPU pages per CPU page. v5: - Mask off non-address bits during compare Signed-off-by: Matthew Brost <matthew.brost@intel.com> Tested-by: Simon Richter <Simon.Richter@hogyros.de> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20251013034555.4121168-2-matthew.brost@intel.com	2025-10-13 12:31:21 -07:00
Shuicheng Lin	604be9dad8	drm/xe: Fix comments in xe_gt struct Correct several spelling and grammar issues in xe_gt struct documentation to improve readability: - Fix "to not" -> "do not". - Fix "mmigrations" -> "migrations". - Fix "Multiple queues exists" -> "Multiple queues exist". - Fix "are be processed" -> "to be processed". - Fix "have being processed" -> "have been processed". These changes are purely cosmetic and do not affect functionality. v2: drop kernel-doc formatting change. (Jani) Cc: Jani Nikula <jani.nikula@linux.intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20251006151317.2553182-2-shuicheng.lin@intel.com	2025-10-13 12:28:27 -07:00
Shuicheng Lin	9b42321a02	drm/xe/guc: Check GuC running state before deregistering exec queue In normal operation, a registered exec queue is disabled and deregistered through the GuC, and freed only after the GuC confirms completion. However, if the driver is forced to unbind while the exec queue is still running, the user may call exec_destroy() after the GuC has already been stopped and CT communication disabled. In this case, the driver cannot receive a response from the GuC, preventing proper cleanup of exec queue resources. Fix this by directly releasing the resources when GuC is not running. Here is the failure dmesg log: " [ 468.089581] ---[ end trace 0000000000000000 ]--- [ 468.089608] pci 0000:03:00.0: [drm] ERROR GT0: GUC ID manager unclean (1/65535) [ 468.090558] pci 0000:03:00.0: [drm] GT0: total 65535 [ 468.090562] pci 0000:03:00.0: [drm] GT0: used 1 [ 468.090564] pci 0000:03:00.0: [drm] GT0: range 1..1 (1) [ 468.092716] ------------[ cut here ]------------ [ 468.092719] WARNING: CPU: 14 PID: 4775 at drivers/gpu/drm/xe/xe_ttm_vram_mgr.c:298 ttm_vram_mgr_fini+0xf8/0x130 [xe] " v2: use xe_uc_fw_is_running() instead of xe_guc_ct_enabled(). As CT may go down and come back during VF migration. Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: stable@vger.kernel.org Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20251010172529.2967639-2-shuicheng.lin@intel.com	2025-10-13 10:38:47 -07:00
Raag Jadav	0bb78ce099	drm/xe/i2c: Wire up reset/postinstall for I2C IRQ I2C IRQ needs to be routed to SGUnit or PUnit for the devices that support it. Wire up reset/postinstall handles for I2C IRQ to take care of this. Signed-off-by: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Link: https://lore.kernel.org/r/20251011123509.3233213-3-raag.jadav@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-13 07:24:25 -07:00
Raag Jadav	3df5aacb9d	drm/xe/i2c: Introduce xe_i2c_irq_present() In preparation of wider usecases which require checking for I2C IRQ presence, introduce xe_i2c_irq_present() helper. Signed-off-by: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Link: https://lore.kernel.org/r/20251011123509.3233213-2-raag.jadav@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-13 07:24:25 -07:00
Arun Abhishek Chowhan	64d00d41f5	drm/xe: Sort include files alphabetically. Sort the include lines alphabetically, no impact on code behavior. Signed-off-by: Arun Abhishek Chowhan <arun.abhishek.chowhan@intel.com> Reviewed-by: Nitin Gote <nitin.r.gote@intel.com> Link: https://lore.kernel.org/r/20251013095914.3742505-1-arun.abhishek.chowhan@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-13 07:18:20 -07:00
Linus Torvalds	284fc30e66	Merge tag 'drm-next-2025-10-11-1' of https://gitlab.freedesktop.org/drm/kernel Pull more drm fixes from Dave Airlie: "Just the follow up fixes for rc1 from the next branch, amdgpu and xe mostly with a single v3d fix in there. amdgpu: - DC DCE6 fixes - GPU reset fixes - Secure diplay messaging cleanup - MES fix - GPUVM locking fixes - PMFW messaging cleanup - PCI US/DS switch handling fix - VCN queue reset fix - DC FPU handling fix - DCN 3.5 fix - DC mirroring fix amdkfd: - Fix kfd process ref leak - mmap write lock handling fix - Fix comments in IOCTL xe: - Fix build with clang 16 - Fix handling of invalid configfs syntax usage and spell out the expected syntax in the documentation - Do not try late bind firmware when running as VF since it shouldn't handle firmware loading - Fix idle assertion for local BOs - Fix uninitialized variable for late binding - Do not require perfmon_capable to expose free memory at page granularity. Handle it like other drm drivers do - Fix lock handling on suspend error path - Fix I2C controller resume after S3 v3d: - fix fence locking" * tag 'drm-next-2025-10-11-1' of https://gitlab.freedesktop.org/drm/kernel: (34 commits) drm/amd/display: Incorrect Mirror Cositing drm/amd/display: Enable Dynamic DTBCLK Switch drm/amdgpu: Report individual reset error drm/amdgpu: partially revert "revert to old status lock handling v3" drm/amd/display: Fix unsafe uses of kernel mode FPU drm/amd/pm: Disable VCN queue reset on SMU v13.0.6 due to regression drm/amdgpu: Fix general protection fault in amdgpu_vm_bo_reset_state_machine drm/amdgpu: Check swus/ds for switch state save drm/amdkfd: Fix two comments in kfd_ioctl.h drm/amd/pm: Avoid interface mismatch messaging drm/amdgpu: Merge amdgpu_vm_set_pasid into amdgpu_vm_init drm/amd/amdgpu: Fix the mes version that support inv_tlbs drm/amd: Check whether secure display TA loaded successfully drm/amdkfd: Fix mmap write lock not release drm/amdkfd: Fix kfd process ref leaking when userptr unmapping drm/amdgpu: Fix for GPU reset being blocked by KIQ I/O. drm/amd/display: Disable scaling on DCE6 for now drm/amd/display: Properly disable scaling on DCE6 drm/amd/display: Properly clear SCL__FILTER_CONTROL on DCE6 drm/amd/display: Add missing DCE6 SCL_HORZ_FILTER_INIT SRIs ...	2025-10-10 14:02:14 -07:00
Linus Torvalds	1e5d41b981	Merge tag 'drm-fixes-2025-10-11' of https://gitlab.freedesktop.org/drm/kernel Pull drm fixes from Dave Airlie: "Some fixes leftover from our fixes branch, just nouveau and vmwgfx: nouveau: - Return errno code from TTM move helper vmwgfx: - Fix null-ptr access in cursor code - Fix UAF in validation - Use correct iterator in validation" * tag 'drm-fixes-2025-10-11' of https://gitlab.freedesktop.org/drm/kernel: drm/nouveau: fix bad ret code in nouveau_bo_move_prep drm/vmwgfx: Fix copy-paste typo in validation drm/vmwgfx: Fix Use-after-free in validation drm/vmwgfx: Fix a null-ptr access in the cursor snooper	2025-10-10 13:59:38 -07:00
Dave Airlie	5ca5f00a16	Merge tag 'drm-misc-fixes-2025-10-09' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes Short summary of fixes pull: nouveau: - Return errno code from TTM move helper vmwgfx: - Fix null-ptr access in cursor code - Fix UAF in validation - Use correct iterator in validation Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://lore.kernel.org/r/20251009120004.GA17570@linux.fritz.box	2025-10-11 06:17:13 +10:00
Shuicheng Lin	65369b8e29	drm/xe: Change return type of detect_bar2_dgfx() from s64 to u64 The function never returns a negative value, and the return value is assigned to a u64 variable. Use u64 for better type correctness and clarity. v2: add assert to catch theoretical negative or zero stolen size. (Rodrigo) Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://lore.kernel.org/r/20251009230239.2830207-8-shuicheng.lin@intel.com Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-10-10 15:10:17 -04:00
Shuicheng Lin	0145a99eac	drm/xe: Fix copyright in xe_ttm_stolen_mgr - Correct Red Hat copyright year from "2002" to "2022". - Sort the include lines alphabetically. No functional changes intended. Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://lore.kernel.org/r/20251009230239.2830207-7-shuicheng.lin@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-10-10 15:10:13 -04:00
Shuicheng Lin	f85d4062bc	drm/xe: Fix copyright and function naming in xe_ttm_sys_mgr - Correct Red Hat copyright year from "2002" to "2022". - Rename ttm_sys_mgr_fini() to xe_ttm_sys_mgr_fini() to avoid confusion with generic TTM helpers. No functional changes intended. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://lore.kernel.org/r/20251009230239.2830207-6-shuicheng.lin@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-10-10 15:10:13 -04:00
Vinay Belgaumkar	4cbc08649a	drm/xe: Enable media sampler power gating Where applicable, enable media sampler power gating. Also, add it to the powergate_info debugfs. v2: Remove the sampler powergate status since it is cleared quickly anyway. v3: Use vcs mask (Rodrigo) and fix the version check for media v4: Remove extra spaces v5: Media samplers are independent of vcs mask, use Media version 1255 (Matt Roper) Fixes: `38e8c4184e` ("drm/xe: Enable Coarse Power Gating") Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://lore.kernel.org/r/20251010011047.2047584-1-vinay.belgaumkar@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-10-10 15:07:54 -04:00
Matthew Brost	75188605c5	drm/xe: Handle mixed mappings and existing VRAM on atomic faults Moving to VRAM will fail if mixed mappings are present or if the page is already located in VRAM. Atomic faults that require a move to VRAM currently retry without attempting to evict mixed mappings or locate existing VRAM mappings. This patch fixes the issue by attempting to evict mixed mappings or find existing VRAM pages when a move to VRAM fails during atomic fault handling. Fixes: `a9ac0fa455` ("drm/xe: Strict migration policy for atomic SVM faults") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://lore.kernel.org/r/20251009130629.3531962-1-matthew.brost@intel.com	2025-10-10 08:18:11 -07:00
Thomas Hellström	381f1ed151	drm/xe/migrate: Fix an error path The exhaustive eviction accidently changed an error path goto to a return. Fix this. Fixes: `59eabff2a3` ("drm/xe: Convert xe_bo_create_pin_map() for exhaustive eviction") Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Francois Dugast <francois.dugast@intel.com> Link: https://lore.kernel.org/r/20250910160939.103473-1-thomas.hellstrom@linux.intel.com	2025-10-10 11:35:11 +02:00
Lucas De Marchi	45e33f220f	drm/xe: Move rebar to be done earlier There may be cases in which the BAR0 also needs to move to accommodate the bigger BAR2. However if it's not released, the BAR2 resize fails. During the vram probe it can't be released as it's already in use by xe_mmio for early register access. Add a new function in xe_vram and let xe_pci call it directly before even early device probe. This allows the BAR2 to resize in cases BAR0 also needs to move, assuming there aren't other reasons to hold that move: [] xe 0000:03:00.0: vgaarb: deactivate vga console [] xe 0000:03:00.0: [drm] Attempting to resize bar from 8192MiB -> 16384MiB [] xe 0000:03:00.0: BAR 0 [mem 0x83000000-0x83ffffff 64bit]: releasing [] xe 0000:03:00.0: BAR 2 [mem 0x4000000000-0x41ffffffff 64bit pref]: releasing [] pcieport 0000:02:01.0: bridge window [mem 0x4000000000-0x41ffffffff 64bit pref]: releasing [] pcieport 0000:01:00.0: bridge window [mem 0x4000000000-0x41ffffffff 64bit pref]: releasing [] pcieport 0000:01:00.0: bridge window [mem 0x4000000000-0x43ffffffff 64bit pref]: assigned [] pcieport 0000:02:01.0: bridge window [mem 0x4000000000-0x43ffffffff 64bit pref]: assigned [] xe 0000:03:00.0: BAR 2 [mem 0x4000000000-0x43ffffffff 64bit pref]: assigned [] xe 0000:03:00.0: BAR 0 [mem 0x83000000-0x83ffffff 64bit]: assigned [] pcieport 0000:00:01.0: PCI bridge to [bus 01-04] [] pcieport 0000:00:01.0: bridge window [mem 0x83000000-0x840fffff] [] pcieport 0000:00:01.0: bridge window [mem 0x4000000000-0x44007fffff 64bit pref] [] pcieport 0000:01:00.0: PCI bridge to [bus 02-04] [] pcieport 0000:01:00.0: bridge window [mem 0x83000000-0x840fffff] [] pcieport 0000:01:00.0: bridge window [mem 0x4000000000-0x43ffffffff 64bit pref] [] pcieport 0000:02:01.0: PCI bridge to [bus 03] [] pcieport 0000:02:01.0: bridge window [mem 0x83000000-0x83ffffff] [] pcieport 0000:02:01.0: bridge window [mem 0x4000000000-0x43ffffffff 64bit pref] [] xe 0000:03:00.0: [drm] BAR2 resized to 16384M [] xe 0000:03:00.0: [drm:xe_pci_probe [xe]] BATTLEMAGE e221:0000 dgfx:1 gfx:Xe2_HPG (20.02) ... For BMG there are additional fix needed in the PCI side, but this helps getting it to a working resize. All the rebar logic is more pci-specific than xe-specific and can be done very early in the probe sequence. In future it would be good to move it out of xe_vram.c, but this refactor is left for later. Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Cc: stable@vger.kernel.org # 6.12+ Link: https://lore.kernel.org/intel-xe/fafda2a3-fc63-ce97-d22b-803f771a4d19@linux.intel.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20250918-xe-pci-rebar-2-v1-2-6c094702a074@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-09 22:09:31 -07:00
Dave Airlie	c4b6ddcf01	Merge tag 'amd-drm-next-6.18-2025-10-09' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.18-2025-10-09: amdgpu: - DC DCE6 fixes - GPU reset fixes - Secure diplay messaging cleanup - MES fix - GPUVM locking fixes - PMFW messaging cleanup - PCI US/DS switch handling fix - VCN queue reset fix - DC FPU handling fix - DCN 3.5 fix - DC mirroring fix amdkfd: - Fix kfd process ref leak - mmap write lock handling fix - Fix comments in IOCTL Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20251009162915.981503-1-alexander.deucher@amd.com	2025-10-10 06:57:56 +10:00
Matthew Brost	8b9ba8d6d9	drm/xe: Don't allow evicting of BOs in same VM in array of VM binds An array of VM binds can potentially evict other buffer objects (BOs) within the same VM under certain conditions, which may lead to NULL pointer dereferences later in the bind pipeline. To prevent this, clear the allow_res_evict flag in the xe_bo_validate call. v2: - Invert polarity of no_res_evict (Thomas) - Add comment in code explaining issue (Thomas) Cc: stable@vger.kernel.org Reported-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6268 Fixes: `774b5fa509` ("drm/xe: Avoid evicting object of the same vm in none fault mode") Fixes: `77f2ef3f16` ("drm/xe: Lock all gpuva ops during VM bind IOCTL") Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://lore.kernel.org/r/20251009110618.3481870-1-matthew.brost@intel.com	2025-10-09 11:40:19 -07:00
Kenneth Graunke	146046907b	drm/xe: Increase global invalidation timeout to 1000us The previous timeout of 500us seems to be too small; panning the map in the Roll20 VTT in Firefox on a KDE/Wayland desktop reliably triggered timeouts within a few seconds of usage, causing the monitor to freeze and the following to be printed to dmesg: [Jul30 13:44] xe 0000:03:00.0: [drm] ERROR GT0: Global invalidation timeout [Jul30 13:48] xe 0000:03:00.0: [drm] ERROR [CRTC:82:pipe A] flip_done timed out I haven't hit a single timeout since increasing it to 1000us even after several multi-hour testing sessions. Fixes: `0dd2dd0182` ("drm/xe: Move DSB l2 flush to a more sensible place") Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5710 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Cc: stable@vger.kernel.org Cc: Maarten Lankhorst <dev@lankhorst.se> Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://lore.kernel.org/r/20250912223254.147940-1-kenneth@whitecape.org Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-09 07:53:40 -07:00
Satyanarayana K V P	60e2667557	drm/xe/guc: Increase wait timeout to 2sec after BUSY reply from GuC Some VF2GUC actions may take longer to process. Increase default timeout after received BUSY indication to 2sec to cover all worst case scenarios. Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-35-matthew.brost@intel.com	2025-10-09 03:24:28 -07:00
Matthew Brost	8f1e1e524c	drm/xe/vf: Rebase CCS save/restore BB GGTT addresses Rebase the CCS save/restore BB's GGTT addresses during VF post-migration recovery by setting the software ring tail to zero, the LRC ring head to zero, and rewriting the jump-to-BB instructions. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-34-matthew.brost@intel.com	2025-10-09 03:24:28 -07:00
Matthew Brost	a093570ecd	drm/xe/vf: Ensure media GT VF recovery runs after primary GT on PTL It is possible that the media GT's VF post-migration recovery work item gets scheduled before the primary GT's work item. Since the media GT depends on the primary GT's work item to complete CCS restore, if the media GT's work item is scheduled first, detect this condition and re-queue the media GT's work item for a later time. v5: - Adjust debug message (Tomasz) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tomasz Lis <tomasz.lis@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-33-matthew.brost@intel.com	2025-10-09 03:24:28 -07:00
Matthew Brost	efd38d619a	drm/xe/vf: Use primary GT ordered work queue on media GT on PTL VF VF CCS restore is a primary GT operation on which the media GT depends. Therefore, it doesn't make much sense to run these operations in parallel. To address this, point the media GT's ordered work queue to the primary GT's ordered work queue on platforms that require (PTL VFs) CCS restore as part of VF post-migration recovery. v7: - Remove bool from xe_gt_alloc (Lucas) v9: - Fix typo (Lucas) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-32-matthew.brost@intel.com	2025-10-09 03:24:28 -07:00
Satyanarayana K V P	673167d9f0	drm/xe: Use PPGTT addresses for TLB invalidation to avoid GGTT fixups The migrate VM builds the CCS metadata save/restore batch buffer (BB) in advance and retains it so the GuC can submit it directly when saving a VM’s state. When a VM migrates between VFs, the GGTT base can change. Any GGTT-based addresses embedded in the BB would then have to be parsed and patched. Use PPGTT addresses in the BB (including for TLB invalidation) so the BB remains GGTT-agnostic and requires no address fixups during migration. Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-31-matthew.brost@intel.com	2025-10-09 03:24:07 -07:00
Matthew Brost	3b56911960	drm/xe/vf: Workaround for race condition in GuC firmware during VF pause A race condition exists where a paused VF's H2G request can be processed and subsequently rejected. This rejection results in a FAST_REQ failure being delivered to the KMD, which then terminates the CT via a dead worker and triggers a GT reset—an undesirable outcome. This workaround mitigates the issue by checking if a VF post-migration recovery is in progress and aborting these adverse actions accordingly. The GuC firmware will address this bug in an upcoming release. Once that version is available and VF migration depends on it, this workaround can be safely removed. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tomasz Lis <tomasz.lis@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-30-matthew.brost@intel.com	2025-10-09 03:22:57 -07:00
Matthew Brost	1521fad9ad	drm/xe/vf: Add debug prints for GuC replaying state during VF recovery Helpful to manually verify the GuC state machine can correctly replay the state during a VF post-migration recovery. All replay paths have been manually verified as triggered and working during testing. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tomasz Lis <tomasz.lis@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-29-matthew.brost@intel.com	2025-10-09 03:22:56 -07:00
Matthew Brost	3c1fa4aa60	drm/xe: Move queue init before LRC creation A queue must be in the submission backend's tracking state before the LRC is created to avoid a race condition where the LRC's GGTT addresses are not properly fixed up during VF post-migration recovery. Move the queue initialization—which adds the queue to the submission backend's tracking state—before LRC creation. Also wait on pending GGTT fixups before allocating LRCs to avoid racing with fixups. v2: - Wait on VF GGTT fixes before creating LRC (testing) v5: - Adjust comment in code (Tomasz) - Reduce race window v7: - Only wakeup waiters in recovery path (CI) - Wakeup waiters on abort - Use GT warn on (Michal) - Fix kernel doc for LRC ring size function (Tomasz) v8: - Guard against migration not supported or no memirq (CI) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tomasz Lis <tomasz.lis@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-28-matthew.brost@intel.com	2025-10-09 03:22:53 -07:00
Matthew Brost	c25c1010df	drm/xe/vf: Replay GuC submission state on pause / unpause Fixup GuC submission pause / unpause functions to properly replay any possible state lost during VF post migration recovery. v3: - Add helpers for revert / replay (Tomasz) - Add comment around WQ NOPs (Tomasz) v7: - Only fixup / replay parallel queues once (Testing) - Skip unpause step on queues created after resfix done (Testing) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tomasz Lis <tomasz.lis@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-27-matthew.brost@intel.com	2025-10-09 03:22:50 -07:00

1 2 3 4 5 ...

118544 Commits