linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-04-04 19:22:08 -04:00

Author	SHA1	Message	Date
Linus Torvalds	ffe8ac927d	Merge tag 'drm-next-2025-08-08' of https://gitlab.freedesktop.org/drm/kernel Pull drm fixes from Dave Airlie: "This is the fixes that built up in the merge window, mostly amdgpu and xe with one i915 display fix, seems like things are pretty good for rc1. i915: - DP LPFS fixes xe: - SRIOV: PF fixes and removal of need of module param - Fix driver unbind around Devcoredump - Mark xe driver as BROKEN if kernel page size is not 4kB amdgpu: - GC 9.5.0 fixes - SMU fix - DCE 6 DC fixes - mmhub client ID fixes - VRR fix - Backlight fix - UserQ fix - Legacy reset fix - Misc fixes amdkfd: - CRIU fix - Debugfs fix" * tag 'drm-next-2025-08-08' of https://gitlab.freedesktop.org/drm/kernel: (28 commits) drm/amdgpu: add missing vram lost check for LEGACY RESET drm/amdgpu/discovery: fix fw based ip discovery drm/amdkfd: Destroy KFD debugfs after destroy KFD wq amdgpu/amdgpu_discovery: increase timeout limit for IFWI init drm/amdgpu: Update SDMA firmware version check for user queue support drm/amdgpu: Add NULL check for asic_funcs drm/amd/display: Revert "drm/amd/display: Fix AMDGPU_MAX_BL_LEVEL value" drm/amd/display: fix a Null pointer dereference vulnerability drm/amd/display: Add primary plane to commits for correct VRR handling drm/amdgpu: update mmhub 3.3 client id mappings drm/amdgpu: update mmhub 3.0.1 client id mappings drm/amdgpu: Retain job->vm in amdgpu_job_prepare_job drm/amd/display: Fix DCE 6.0 and 6.4 PLL programming. drm/amd/display: Don't overwrite dce60_clk_mgr drm/amdkfd: Fix checkpoint-restore on multi-xcc drm/amd: Restore cached manual clock settings during resume drm/amd: Restore cached power limit during resume drm/amdgpu: Update external revid for GC v9.5.0 drm/amdgpu: Update supported modes for GC v9.5.0 Mark xe driver as BROKEN if kernel page size is not 4kB ...	2025-08-08 06:48:14 +03:00
Simon Richter	022906afdf	Mark xe driver as BROKEN if kernel page size is not 4kB This driver, for the time being, assumes that the kernel page size is 4kB, so it fails on loong64 and aarch64 with 16kB pages, and ppc64el with 64kB pages. Signed-off-by: Simon Richter <Simon.Richter@hogyros.de> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: stable@vger.kernel.org # v6.8+ Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://lore.kernel.org/r/20250802024152.3021-1-Simon.Richter@hogyros.de (cherry picked from commit `0521a86822`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-08-04 11:59:11 -04:00
Michal Wajdeczko	cb7a3f949a	drm/xe/pf: Make sure PF is ready to configure VFs The PF driver might be resumed just to configure VFs, but since it is doing some asynchronous GuC reconfigurations after fresh reset, we should wait until all pending works are completed. This is especially important in case of LMEM provisioning, since we also need to update the LMTT and send invalidation requests to all GuCs, which are expected to be already in the VGT mode. Fixes: `68ae022278` ("drm/xe/pf: Force GuC virtualization mode") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Cc: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250801142822.180530-3-michal.wajdeczko@intel.com (cherry picked from commit `c6c86441c4`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-08-04 11:59:06 -04:00
Michal Wajdeczko	c286ce6b01	drm/xe/pf: Disable PF restart worker on device removal We can't let restart worker run once device is removed, since other data that it might want to access could be already released. Explicitly disable worker as part of device cleanup action. Fixes: `a4d1c5d0b9` ("drm/xe/pf: Move VFs reprovisioning to worker") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Cc: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250801142822.180530-2-michal.wajdeczko@intel.com (cherry picked from commit `a424353937`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-08-04 11:59:01 -04:00
Balasubramani Vivekanandan	465f1dba74	drm/xe/devcoredump: Defer devcoredump initialization during probe Doing devcoredump initializing before GT though look harmless, it leads to problem during driver unbind. Because of this order, GT/Engine release functions will be called before xe devcoredump release function (xe_driver_devcoredump_fini) leading to the following kernel crash[1] because the devcoredump functions might still use GT/Engine datastructures after those are freed. The following crash is observed while running the IGT xe_wedged@wedged-at-any-timeout. The test forces a wedged state by submitting a workload which hangs. Then does a unbind/rebind of the driver to recover from the wedged state. The hanged workload leads to a devcoredump. The following crash is noticed when the devcoredump capture races with the driver unbind. During driver unbind, the release function hw_engine_fini() will be called which assigns NULL to hwe->gt. But the same data structure is accessed during the coredump capture in the function xe_engine_snapshot_print by reading snapshot->hwe->gt. With this patch, we make sure the devcoredump is stopped before deinitializing the core driver functions. [1]: BUG: kernel NULL pointer dereference, address: 0000000000000000 Workqueue: events_unbound xe_devcoredump_deferred_snap_work [xe] RIP: 0010:xe_engine_snapshot_print+0x47/0x420 [xe] Call Trace: <TASK> ? drm_printf+0x64/0x90 __xe_devcoredump_read+0x23f/0x2d0 [xe] ? __pfx___drm_printfn_coredump+0x10/0x10 ? __pfx___drm_puts_coredump+0x10/0x10 xe_devcoredump_deferred_snap_work+0x17a/0x190 [xe] process_one_work+0x22e/0x6f0 worker_thread+0x1e8/0x3d0 ? __pfx_worker_thread+0x10/0x10 kthread+0x11f/0x250 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x47/0x70 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 v2: Detailed commit description (Rodrigo) v3: FIXME added (Rodrigo, Stuart) Fixes: `4209d635a8` ("drm/xe: Remove devcoredump during driver release") Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20250731061300.14320-1-balasubramani.vivekanandan@intel.com Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Link: https://lore.kernel.org/r/20250801052356.21885-1-balasubramani.vivekanandan@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit `1fdc4c381f`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-08-04 11:58:56 -04:00
Michal Wajdeczko	df9bdd4381	drm/xe/pf: Enable SR-IOV PF mode by default We already claim official support for SR-IOV PF/VF modes on PTL and BMG platforms, but by default we start the Xe driver on those platforms in non-virtualized mode (native) since we still have max_vfs modparam set to disable creation of the VFs. It's time to let the Xe driver support SR-IOV PF mode by default. We were already testing this on our CI, which was relying on the patch that was enabling it for CONFIG_DRM_XE_DEBUG used by our CI. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250722182618.30811-3-michal.wajdeczko@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit `a2b461bd6f`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-08-04 11:56:18 -04:00
Linus Torvalds	e991acf1bc	Merge tag 'mm-nonmm-stable-2025-08-03-12-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: "Significant patch series in this pull request: - "squashfs: Remove page->mapping references" (Matthew Wilcox) gets us closer to being able to remove page->mapping - "relayfs: misc changes" (Jason Xing) does some maintenance and minor feature addition work in relayfs - "kdump: crashkernel reservation from CMA" (Jiri Bohac) switches us from static preallocation of the kdump crashkernel's working memory over to dynamic allocation. So the difficulty of a-priori estimation of the second kernel's needs is removed and the first kernel obtains extra memory - "generalize panic_print's dump function to be used by other kernel parts" (Feng Tang) implements some consolidation and rationalization of the various ways in which a failing kernel splats information at the operator * tag 'mm-nonmm-stable-2025-08-03-12-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (80 commits) tools/getdelays: add backward compatibility for taskstats version kho: add test for kexec handover delaytop: enhance error logging and add PSI feature description samples: Kconfig: fix spelling mistake "instancess" -> "instances" fat: fix too many log in fat_chain_add() scripts/spelling.txt: add notifer\|\|notifier to spelling.txt xen/xenbus: fix typo "notifer" net: mvneta: fix typo "notifer" drm/xe: fix typo "notifer" cxl: mce: fix typo "notifer" KVM: x86: fix typo "notifer" MAINTAINERS: add maintainers for delaytop ucount: use atomic_long_try_cmpxchg() in atomic_long_inc_below() ucount: fix atomic_long_inc_below() argument type kexec: enable CMA based contiguous allocation stackdepot: make max number of pools boot-time configurable lib/xxhash: remove unused functions init/Kconfig: restore CONFIG_BROKEN help text lib/raid6: update recov_rvv.c zero page usage docs: update docs after introducing delaytop ...	2025-08-03 16:23:09 -07:00
WangYuli	26197b0fd2	drm/xe: fix typo "notifer" There is a spelling mistake of 'notifer' in the comment which should be 'notifier'. Link: https://lkml.kernel.org/r/94190C5F54A19F3E+20250722073431.21983-3-wangyuli@uniontech.com Signed-off-by: WangYuli <wangyuli@uniontech.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-08-02 12:01:39 -07:00
Linus Torvalds	89748acdf2	Merge tag 'drm-next-2025-08-01' of https://gitlab.freedesktop.org/drm/kernel Pull drm fixes from Dave Airlie: "Just a bunch of amdgpu and xe fixes. amdgpu: - DSC divide by 0 fix - clang fix - DC debugfs fix - Userq fixes - Avoid extra evict-restore with KFD - Backlight fix - Documentation fix - RAS fix - Add new kicker handling - DSC fix for DCN 3.1.4 - PSR fix - Atomic fix - DC reset fixes - DCN 3.0.1 fix - MMHUB client mapping fix xe: - Fix BMG probe on unsupported mailbox command - Fix OA static checker warning about null gt - Fix a NULL vs IS_ERR() bug in xe_i2c_register_adapter - Fix missing unwind goto in GuC/HuC - Don't register I2C devices if VF - Clear whole GuC g2h_fence during initialization - Avoid call kfree for drmm_kzalloc - Fix pci_dev reference leak on configfs - SRIOV: Disable CSC support on VF * tag 'drm-next-2025-08-01' of https://gitlab.freedesktop.org/drm/kernel: (24 commits) drm/xe/vf: Disable CSC support on VF drm/amdgpu: update mmhub 4.1.0 client id mappings drm/amd/display: Allow DCN301 to clear update flags drm/amd/display: Pass up errors for reset GPU that fails to init HW drm/amd/display: Only finalize atomic_obj if it was initialized drm/amd/display: Avoid configuring PSR granularity if PSR-SU not supported drm/amd/display: Disable dsc_power_gate for dcn314 by default drm/amdgpu: add kicker fws loading for gfx12/smu14/psp14 drm/amd/amdgpu: fix missing lock for cper.ring->rptr/wptr access drm/amd/display: Fix misuse of /** to /* in 'dce_i2c_hw.c' drm/amd/display: fix initial backlight brightness calculation drm/amdgpu: Avoid extra evict-restore process. drm/amdgpu: track whether a queue is a kernel queue in amdgpu_mqd_prop drm/amdgpu: check if hubbub is NULL in debugfs/amdgpu_dm_capabilities drm/amdgpu: Initialize data to NULL in imu_v12_0_program_rlc_ram() drm/amd/display: Fix divide by zero when calculating min ODM factor drm/xe/configfs: Fix pci_dev reference leak drm/xe/hw_engine_group: Avoid call kfree() for drmm_kzalloc() drm/xe/guc: Clear whole g2h_fence during initialization drm/xe/vf: Don't register I2C devices if VF ...	2025-07-31 21:47:36 -07:00
Linus Torvalds	260f6f4fda	Merge tag 'drm-next-2025-07-30' of https://gitlab.freedesktop.org/drm/kernel Pull drm updates from Dave Airlie: "Highlights: - Intel xe enable Panthor Lake, started adding WildCat Lake - amdgpu has a bunch of reset improvments along with the usual IP updates - msm got VM_BIND support which is important for vulkan sparse memory - more drm_panic users - gpusvm common code to handle a bunch of core SVM work outside drivers. Detail summary: Changes outside drm subdirectory: - 'shrink_shmem_memory()' for better shmem/hibernate interaction - Rust support infrastructure: - make ETIMEDOUT available - add size constants up to SZ_2G - add DMA coherent allocation bindings - mtd driver for Intel GPU non-volatile storage - i2c designware quirk for Intel xe core: - atomic helpers: tune enable/disable sequences - add task info to wedge API - refactor EDID quirks - connector: move HDR sink to drm_display_info - fourcc: half-float and 32-bit float formats - mode_config: pass format info to simplify dma-buf: - heaps: Give CMA heap a stable name ci: - add device tree validation and kunit displayport: - change AUX DPCD access probe address - add quirk for DPCD probe - add panel replay definitions - backlight control helpers fbdev: - make CONFIG_FIRMWARE_EDID available on all arches fence: - fix UAF issues format-helper: - improve tests gpusvm: - introduce devmem only flag for allocation - add timeslicing support to GPU SVM ttm: - improve eviction sched: - tracing improvements - kunit improvements - memory leak fixes - reset handling improvements color mgmt: - add hardware gamma LUT handling helpers bridge: - add destroy hook - switch to reference counted drm_bridge allocations - tc358767: convert to devm_drm_bridge_alloc - improve CEC handling panel: - switch to reference counter drm_panel allocations - fwnode panel lookup - Huiling hl055fhv028c support - Raspberry Pi 7" 720x1280 support - edp: KDC KD116N3730A05, N160JCE-ELL CMN, N116BCJ-EAK - simple: AUO P238HAN01 - st7701: Winstar wf40eswaa6mnn0 - visionox: rm69299-shift - Renesas R61307, Renesas R69328 support - DJN HX83112B hdmi: - add CEC handling - YUV420 output support xe: - WildCat Lake support - Enable PanthorLake by default - mark BMG as SRIOV capable - update firmware recommendations - Expose media OA units - aux-bux support for non-volatile memory - MTD intel-dg driver for non-volatile memory - Expose fan control and voltage regulator in sysfs - restructure migration for multi-device - Restore GuC submit UAF fix - make GEM shrinker drm managed - SRIOV VF Post-migration recovery of GGTT nodes - W/A additions/reworks - Prefetch support for svm ranges - Don't allocate managed BO for each policy change - HWMON fixes for BMG - Create LRC BO without VM - PCI ID updates - make SLPC debugfs files optional - rework eviction rejection of bound external BOs - consolidate PAT programming logic for pre/post Xe2 - init changes for flicker-free boot - Enable GuC Dynamic Inhibit Context switch i915: - drm_panic support for i915/xe - initial flip queue off by default for LNL/PNL - Wildcat Lake Display support - Support for DSC fractional link bpp - Support for simultaneous Panel Replay and Adaptive sync - Support for PTL+ double buffer LUT - initial PIPEDMC event handling - drm_panel_follower support - DPLL interface renames - allocate struct intel_display dynamically - flip queue preperation - abstract DRAM detection better - avoid GuC scheduling stalls - remove DG1 force probe requirement - fix MEI interrupt handler on RT kernels - use backlight control helpers for eDP - more shared display code refactoring amdgpu: - add userq slot to INFO ioctl - SR-IOV hibernation support - Suspend improvements - Backlight improvements - Use scaling for non-native eDP modes - cleaner shader updates for GC 9.x - Remove fence slab - SDMA fw checks for userq support - RAS updates - DMCUB updates - DP tunneling fixes - Display idle D3 support - Per queue reset improvements - initial smartmux support amdkfd: - enable KFD on loongarch - mtype fix for ext coherent system memory radeon: - CS validation additional GL extensions - drop console lock during suspend/resume - bump driver version msm: - VM BIND support - CI: infrastructure updates - UBWC single source of truth - decouple GPU and KMS support - DP: rework I/O accessors - DPU: SM8750 support - DSI: SM8750 support - GPU: X1-45 support and speedbin support for X1-85 - MDSS: SM8750 support nova: - register! macro improvements - DMA object abstraction - VBIOS parser + fwsec lookup - sysmem flush page support - falcon: generic falcon boot code and HAL - FWSEC-FRTS: fb setup and load/execute ivpu: - Add Wildcat Lake support - Add turbo flag ast: - improve hardware generations implementation imx: - IMX8qxq Display Controller support lima: - Rockchip RK3528 GPU support nouveau: - fence handling cleanup panfrost: - MT8370 support - bo labeling - 64-bit register access qaic: - add RAS support rockchip: - convert inno_hdmi to a bridge rz-du: - add RZ/V2H(P) support - MIPI-DSI DCS support sitronix: - ST7567 support sun4i: - add H616 support tidss: - add TI AM62L support - AM65x OLDI bridge support bochs: - drm panic support vkms: - YUV and R* format support - use faux device vmwgfx: - fence improvements hyperv: - move out of simple - add drm_panic support" * tag 'drm-next-2025-07-30' of https://gitlab.freedesktop.org/drm/kernel: (1479 commits) drm/tidss: oldi: convert to devm_drm_bridge_alloc() API drm/tidss: encoder: convert to devm_drm_bridge_alloc() drm/amdgpu: move reset support type checks into the caller drm/amdgpu/sdma7: re-emit unprocessed state on ring reset drm/amdgpu/sdma6: re-emit unprocessed state on ring reset drm/amdgpu/sdma5.2: re-emit unprocessed state on ring reset drm/amdgpu/sdma5: re-emit unprocessed state on ring reset drm/amdgpu/gfx12: re-emit unprocessed state on ring reset drm/amdgpu/gfx11: re-emit unprocessed state on ring reset drm/amdgpu/gfx10: re-emit unprocessed state on ring reset drm/amdgpu/gfx9.4.3: re-emit unprocessed state on kcq reset drm/amdgpu/gfx9: re-emit unprocessed state on kcq reset drm/amdgpu: Add WARN_ON to the resource clear function drm/amd/pm: Use cached metrics data on SMUv13.0.6 drm/amd/pm: Use cached data for min/max clocks gpu: nova-core: fix bounds check in PmuLookupTableEntry::new drm/amdgpu: Replace HQD terminology with slots naming drm/amdgpu: Add user queue instance count in HW IP info drm/amd/amdgpu: Add helper functions for isp buffers drm/amd/amdgpu: Initialize swnode for ISP MFD device ...	2025-07-30 19:26:49 -07:00
Lukasz Laguna	f62408efc8	drm/xe/vf: Disable CSC support on VF CSC is not accessible by VF drivers, so disable its support flag on VF to prevent further initialization attempts. Fixes: `e02cea83d3` ("drm/xe/gsc: add Battlemage support") Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com> Cc: Alexander Usyskin <alexander.usyskin@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20250729123437.5933-1-lukasz.laguna@intel.com (cherry picked from commit `552dbba1ca`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-07-30 15:09:27 -04:00
Linus Torvalds	9669b2499e	Merge tag 'platform-drivers-x86-v6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform drivers from Ilpo Järvinen: - alienware: Add more precise labels to fans - amd/hsmp: Improve misleading probe errors (make the legacy driver aware when HSMP is supported through the ACPI driver) - amd/pmc: Add Lenovo Yoga 6 13ALCL6 to pmc quirk list - drm/xe: Correct (D)VSEC information to support PMT crashlog feature - fujitsu: Clamp charge threshold instead of returning an error - ideapad: Expore change types - intel/pmt: - Add PMT Discovery driver - Add API to retrieve telemetry regions by feature - Fix crashlog NULL access - Support Battlemage GPU (BMG) crashlog - intel/vsec: - Add Discovery feature - Add feature dependency support using device links - lenovo: - Move lenovo drivers under drivers/platform/x86/lenovo/ - Add WMI drivers for Lenovo Gaming series - Improve DMI handling - oxpec: - Add support for OneXPlayer X1 Mini Pro (Strix Point variant) - Fix EC registers for G1 AMD - samsung-laptop: Expose change types - wmi: Fix WMI device naming issue (same GUID corner cases) - x86-android-tables: Add ovc-capacity-table to generic battery nodes - Miscellaneous cleanups / refactoring / improvements * tag 'platform-drivers-x86-v6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (63 commits) platform/x86: oxpec: Add support for OneXPlayer X1 Mini Pro (Strix Point) platform/x86: oxpec: Fix turbo register for G1 AMD platform/x86/intel/pmt: support BMG crashlog platform/x86/intel/pmt: use a version struct platform/x86/intel/pmt: refactor base parameter platform/x86/intel/pmt: add register access helpers platform/x86/intel/pmt: decouple sysfs and namespace platform/x86/intel/pmt: correct types platform/x86/intel/pmt: re-order trigger logic platform/x86/intel/pmt: use guard(mutex) platform/x86/intel/pmt: mutex clean up platform/x86/intel/pmt: white space cleanup drm/xe: Correct BMG VSEC header sizing drm/xe: Correct the rev value for the DVSEC entries platform/x86/intel/pmt: fix a crashlog NULL pointer access platform/x86: samsung-laptop: Expose charge_types platform/x86/amd: pmc: Add Lenovo Yoga 6 13ALC6 to pmc quirk list platform/x86: dell-uart-backlight: Use blacklight power constant platform/x86/intel/pmt: fix build dependency for kunit test platform/x86: lenovo: gamezone needs "other mode" ...	2025-07-28 23:21:28 -07:00
Michal Wajdeczko	942ac8da63	drm/xe/configfs: Fix pci_dev reference leak We are using pci_get_domain_bus_and_slot() function to verify if the given config directory name matches any existing PCI device, but we missed to call matching pci_dev_put() to release reference. While around, also change error code in case of no device match, to make it more specific than generic formatting error. Fixes: `16280ded45` ("drm/xe: Add configfs to enable survivability mode") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250722141059.30707-2-michal.wajdeczko@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit `0bdd05c2a8`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-07-28 10:22:48 -04:00
Shuicheng Lin	4846856c3a	drm/xe/hw_engine_group: Avoid call kfree() for drmm_kzalloc() Memory allocated with drmm_kzalloc() should not be freed using kfree(), as it is managed by the DRM subsystem. The memory will be automatically freed when the associated drm_device is released. These 3 group pointers are allocated using drmm_kzalloc() in hw_engine_group_alloc(), so they don't require manual deallocation. Fixes: `6797906074` ("drm/xe/hw_engine_group: Fix potential leak") Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20250724193854.1124510-2-shuicheng.lin@intel.com (cherry picked from commit `f98de826b4`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-07-28 10:22:43 -04:00
Michal Wajdeczko	a2e1407eb8	drm/xe/guc: Clear whole g2h_fence during initialization The struct g2h_fence must be explicitly initializated using the g2h_fence_init() function to avoid trash values in its members, but we missed to update this helper function with the new member. To fix that and avoid any future mistakes, memset the whole struct first, then update remaining non-zero members. Fixes: `94de94d24e` ("drm/xe/guc: Cancel ongoing H2G requests when stopping CT") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250723175639.206875-1-michal.wajdeczko@intel.com (cherry picked from commit `159afd92ba`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-07-28 10:22:38 -04:00
Lukasz Laguna	cccb918e02	drm/xe/vf: Don't register I2C devices if VF VF drivers can't access I2C devices, so skip their registration when running as VF. Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com> Fixes: `f0e53aadd7` ("drm/xe: Support for I2C attached MCUs") Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250717155420.25298-1-lukasz.laguna@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit `9a220e0659`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-07-28 10:22:33 -04:00
Zhanjun Dong	dc94168eaa	drm/xe/uc: Fix missing unwind goto Fix missing unwind goto on error handling. Fixes: `b2c4ac219f` ("drm/xe/uc: Disable GuC communication on hardware initialization error") Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://lore.kernel.org/r/20250721214520.954014-1-zhanjun.dong@intel.com (cherry picked from commit `176f44a5ec`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-07-28 10:22:28 -04:00
Dan Carpenter	2bd986021c	drm/xe: Fix a NULL vs IS_ERR() bug in xe_i2c_register_adapter() The fwnode_create_software_node() function returns error pointers. It never returns NULL. Update the checks to match. Fixes: `f0e53aadd7` ("drm/xe: Support for I2C attached MCUs") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/65825d00-81ab-4665-af51-4fff6786a250@sabinyo.mountain Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit `2f264d58cc`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-07-28 10:22:22 -04:00
Ashutosh Dixit	6aaceed7fe	drm/xe/oa: Fix static checker warning about null gt There is a static checker warning that gt returned by xe_device_get_gt can be NULL and that is being dereferenced. Use xe_root_mmio_gt instead, which is equivalent and cannot return a NULL gt 0. Fixes: `10d42ef34b` ("drm/xe/oa: Assign hwe for OAM_SAG") Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://lore.kernel.org/r/20250715181422.2807624-1-ashutosh.dixit@intel.com (cherry picked from commit `308dc9b278`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-07-28 10:22:17 -04:00
Raag Jadav	d9e9aa3e97	drm/xe: Don't fail probe on unsupported mailbox command If the device is running older pcode firmware, it is possible that newer mailbox commands are not supported by it. The sysfs attributes aren't useful in that case, but we shouldn't fail driver probe because of it. As of now, it is unknown if we can distinguish unsupported commands before attempting them. But until we figure out a way to do that, fix the regressions. v2: Add debug message (Lucas) Fixes: `cdc36b66cd` ("drm/xe: Expose fan control and voltage regulator version") Signed-off-by: Raag Jadav <raag.jadav@intel.com> Tested-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250714215503.2897748-1-raag.jadav@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit `ed5461daa1`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-07-28 10:22:12 -04:00
Lucas De Marchi	99e91521ce	drm/xe: Fix build without debugfs When CONFIG_DEBUG_FS is off, drivers/gpu/drm/xe/xe_gt_debugfs.o is not built and build fails on some setups with: ld: drivers/gpu/drm/xe/xe_gt.o: in function `xe_fault_inject_gt_reset': drivers/gpu/drm/xe/xe_gt.h:27:(.text+0x1659): undefined reference to `gt_reset_failure' ld: drivers/gpu/drm/xe/xe_gt.h:27:(.text+0x1c16): undefined reference to `gt_reset_failure' collect2: error: ld returned 1 exit status Do not use the gt_reset_failure attribute if debugfs is not enabled. Fixes: `8f3013e0b2` ("drm/xe: Introduce fault injection for gt reset") Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://lore.kernel.org/r/20250722-xe-fix-build-fault-v1-1-157384d50987@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit `4d3bbe9dd2`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2025-07-24 21:06:52 +02:00
Michael J. Ruhl	5b27388171	drm/xe: Correct BMG VSEC header sizing The intel_vsec_header information for the crashlog feature is incorrect. Update the VSEC header with correct sizing and count. Since the crashlog entries are "merged" (num_entries = 2), the separate capabilities entries must be merged as well. Fixes: `0c45e76fcc` ("drm/xe/vsec: Support BMG devices") Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Reviewed-by: David E. Box <david.e.box@linux.intel.com> Link: https://lore.kernel.org/r/20250713172943.7335-4-michael.j.ruhl@intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>	2025-07-22 17:38:49 +03:00
Michael J. Ruhl	0ba9e9cf76	drm/xe: Correct the rev value for the DVSEC entries By definition, the Designated Vendor Specific Extended Capability (DVSEC) revision should be 1. Add the rev value to be correct. Fixes: `0c45e76fcc` ("drm/xe/vsec: Support BMG devices") Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Reviewed-by: David E. Box <david.e.box@linux.intel.com> Link: https://lore.kernel.org/r/20250713172943.7335-3-michael.j.ruhl@intel.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>	2025-07-22 17:38:47 +03:00
Dave Airlie	be3cd668ff	Merge tag 'drm-misc-next-2025-07-17' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for 6.17: UAPI Changes: Cross-subsystem Changes: Core Changes: - mode_config: Change fb_create prototype to pass the drm_format_info and avoid redundant lookups in drivers - sched: kunit improvements, memory leak fixes, reset handling improvements - tests: kunit EDID update Driver Changes: - amdgpu: Hibernation fixes, structure lifetime fixes - nouveau: sched improvements - sitronix: Add Sitronix ST7567 Support - bridge: - Make connector available to bridge detect hook - panel: - More refcounting changes - New panels: BOE NE14QDM Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <mripard@redhat.com> Link: https://lore.kernel.org/r/20250717-efficient-kudu-of-fantasy-ff95e0@houat	2025-07-21 09:16:51 +10:00
Dave Airlie	af42cf30ea	Merge tag 'drm-xe-next-2025-07-15' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next Driver Changes: - Create and use XE_DEVICE_WA infrastructure (Atwood) - SRIOV: Mark BMG as SR-IOV capable (Michal) - Dont skip TLB invalidations on VF (Tejas) - Fix migration copy direction in access_memory (Auld) - General code clean-up (Lucas, Brost, Dr. David, Xin) - More missing XeLP workarounds (Tvrtko) - SRIOV: Relax VF/PF version negotiation (Michal) - SRIOV: LMTT invalidation (Michal) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/aHacDvF9IaVHI61C@intel.com	2025-07-18 19:48:20 +10:00
Michal Wajdeczko	5c244eeca5	drm/xe/pf: Resend PF provisioning after GT reset If we reload the GuC due to suspend/resume or GT reset then we have to resend not only any VFs provisioning data, but also PF configuration, like scheduling parameters (EQ, PT), as otherwise GuC will continue to use default values. Fixes: `411220808c` ("drm/xe/pf: Restart VFs provisioning after GT reset") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://lore.kernel.org/r/20250711193316.1920-3-michal.wajdeczko@intel.com (cherry picked from commit `1c38dd6afa`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-17 09:51:51 -03:00
Michal Wajdeczko	81dccec448	drm/xe/pf: Prepare to stop SR-IOV support prior GT reset As part of the resume or GT reset, the PF driver schedules work which is then used to complete restarting of the SR-IOV support, including resending to the GuC configurations of provisioned VFs. However, in case of short delay between those two actions, which could be seen by triggering a GT reset on the suspened device: $ echo 1 > /sys/kernel/debug/dri/0000:00:02.0/gt0/force_reset this PF worker might be still busy, which lead to errors due to just stopped or disabled GuC CTB communication: [ ] xe 0000:00:02.0: [drm:xe_gt_resume [xe]] GT0: resumed [ ] xe 0000:00:02.0: [drm] GT0: trying reset from force_reset_show [xe] [ ] xe 0000:00:02.0: [drm] GT0: reset queued [ ] xe 0000:00:02.0: [drm] GT0: reset started [ ] xe 0000:00:02.0: [drm:guc_ct_change_state [xe]] GT0: GuC CT communication channel stopped [ ] xe 0000:00:02.0: [drm:guc_ct_send_recv [xe]] GT0: H2G request 0x5503 canceled! [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF1 12 config KLVs (-ECANCELED) [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF1 configuration (-ECANCELED) [ ] xe 0000:00:02.0: [drm:guc_ct_change_state [xe]] GT0: GuC CT communication channel disabled [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF2 12 config KLVs (-ENODEV) [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF2 configuration (-ENODEV) [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push 2 of 2 VFs configurations [ ] xe 0000:00:02.0: [drm:pf_worker_restart_func [xe]] GT0: PF: restart completed While this VFs reprovisioning will be successful during next spin of the worker, to avoid those errors, make sure to cancel restart worker if we are about to trigger next reset. Fixes: `411220808c` ("drm/xe/pf: Restart VFs provisioning after GT reset") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://lore.kernel.org/r/20250711193316.1920-2-michal.wajdeczko@intel.com (cherry picked from commit `9f50b729dd`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-17 09:51:51 -03:00
Lucas De Marchi	057a7d66f9	drm/xe/migrate: Fix alignment check The check would fail if the address is unaligned, but not when accounting the offset. Instead of `buf \| offset` it should have been `buf + offset`. To make it more readable and also drop the uintptr_t, just use the IS_ALIGNED() macro. Fixes: `270172f64b` ("drm/xe: Update xe_ttm_access_memory to use GPU for non-visible access") Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://lore.kernel.org/r/20250710-migrate-aligned-v1-1-44003ef3c078@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit `81e139db69`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-17 09:51:51 -03:00
Matthew Brost	3155ac8925	drm/xe: Move page fault init after topology init We need the topology to determine GT page fault queue size, move page fault init after topology init. Cc: stable@vger.kernel.org Fixes: `3338e4f90c` ("drm/xe: Use topology to determine page fault queue size") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20250710191208.1040215-1-matthew.brost@intel.com (cherry picked from commit `beb72acb5b`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-17 09:51:27 -03:00
Balasubramani Vivekanandan	2a58b21ade	drm/xe/mocs: Initialize MOCS index early MOCS uc_index is used even before it is initialized in the following callstack guc_prepare_xfer() __xe_guc_upload() xe_guc_min_load_for_hwconfig() xe_uc_init_hwconfig() xe_gt_init_hwconfig() Do MOCS index initialization earlier in the device probe. Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Reviewed-by: Ravi Kumar Vodapalli <ravi.kumar.vodapalli@intel.com> Link: https://lore.kernel.org/r/20250520142445.2792824-1-balasubramani.vivekanandan@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit `241cc827c0`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-17 09:51:12 -03:00
Matthew Auld	1381c231c1	drm/xe/migrate: fix copy direction in access_memory After we do the modification on the host side, ensure we write the result back to VRAM and not the other way around, otherwise the modification will be lost if treated like a read. Fixes: `270172f64b` ("drm/xe: Update xe_ttm_access_memory to use GPU for non-visible access") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250710134128.800756-2-matthew.auld@intel.com (cherry picked from commit `c12fe703ca`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-17 09:45:19 -03:00
Tejas Upadhyay	fd25fa90ed	drm/xe: Dont skip TLB invalidations on VF Skipping TLB invalidations on VF causing unrecoverable faults. Probable reason for skipping TLB invalidations on SRIOV could be lack of support for instruction MI_FLUSH_DW_STORE_INDEX. Add back TLB flush with some additional handling. Helps in resolving, [ 704.913454] xe 0000:00:02.1: [drm:pf_queue_work_func [xe]] ASID: 0 VFID: 0 PDATA: 0x0d92 Faulted Address: 0x0000000002fa0000 FaultType: 0 AccessType: 1 FaultLevel: 0 EngineClass: 3 bcs EngineInstance: 8 [ 704.913551] xe 0000:00:02.1: [drm:pf_queue_work_func [xe]] Fault response: Unsuccessful -22 V2: - Use Xmas tree (MichalW) Suggested-by: Matthew Brost <matthew.brost@intel.com> Fixes: `97515d0b3e` ("drm/xe/vf: Don't emit access to Global HWSP if VF") Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250710045945.1023840-1-tejas.upadhyay@intel.com Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com> (cherry picked from commit `b528e896fa`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-17 09:45:19 -03:00
Ville Syrjälä	800df9e50c	drm/i915: Pass along the format info from .fb_create() to drm_helper_mode_fill_fb_struct() Plumb the format info from .fb_create() all the way to drm_helper_mode_fill_fb_struct() to avoid the redundant lookup. For the fbdev case a manual drm_get_format_info() lookup is needed. Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250701090722.13645-14-ville.syrjala@linux.intel.com	2025-07-16 20:09:08 +03:00
Maíra Canal	53dcd0eaa2	drm/xe: Use DRM_GPU_SCHED_STAT_NO_HANG to skip the reset Xe can skip the reset if TDR has fired before the free job worker and can also re-arm the timeout timer in some scenarios. Instead of manipulating scheduler's internals, inform the scheduler that the job did not actually timeout and no reset was performed through the new status code DRM_GPU_SCHED_STAT_NO_HANG. Note that, in the first case, there is no need to restart submission if it hasn't been stopped. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250714-sched-skip-reset-v6-7-5c5ba4f55039@igalia.com Signed-off-by: Maíra Canal <mcanal@igalia.com>	2025-07-15 08:27:08 -03:00
Maíra Canal	0a5dc1b67e	drm/sched: Rename DRM_GPU_SCHED_STAT_NOMINAL to DRM_GPU_SCHED_STAT_RESET Among the scheduler's statuses, the only one that indicates an error is DRM_GPU_SCHED_STAT_ENODEV. Any status other than DRM_GPU_SCHED_STAT_ENODEV signifies that the operation succeeded and the GPU is in a nominal state. However, to provide more information about the GPU's status, it is needed to convey more information than just "OK". Therefore, rename DRM_GPU_SCHED_STAT_NOMINAL to DRM_GPU_SCHED_STAT_RESET, which better communicates the meaning of this status. The status DRM_GPU_SCHED_STAT_RESET indicates that the GPU has hung, but it has been successfully reset and is now in a nominal state again. Reviewed-by: Philipp Stanner <phasta@kernel.org> Link: https://lore.kernel.org/r/20250714-sched-skip-reset-v6-1-5c5ba4f55039@igalia.com Signed-off-by: Maíra Canal <mcanal@igalia.com>	2025-07-15 08:27:00 -03:00
Michal Wajdeczko	a816487681	drm/xe/pf: Invalidate LMTT after completing changes Once we finish populating all leaf pages in the VF's LMTT we should make sure that hardware will not access any stale data. Explicitly force LMTT invalidation (as it was already planned in the past). Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Piotr Piórkowski <piotr.piorkowski@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://lore.kernel.org/r/20250711193316.1920-7-michal.wajdeczko@intel.com	2025-07-15 13:05:22 +02:00
Michal Wajdeczko	e497957fee	drm/xe/pf: Invalidate LMTT during LMEM unprovisioning Invalidate LMTT immediately after removing VF's LMTT page tables and clearing root PTE in the LMTT PD to avoid any invalid access by the hardware (and VF) due to stale data. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Piotr Piórkowski <piotr.piorkowski@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250711193316.1920-6-michal.wajdeczko@intel.com	2025-07-15 13:05:20 +02:00
Michal Wajdeczko	68ae022278	drm/xe/pf: Force GuC virtualization mode By default the GuC starts in the 'native' mode and enables the VGT mode (aka 'virtualization' mode) only after it receives at least one set of VF configuration data. While this happens naturally while PF begins VFs provisioning, we might need this sooner as some actions, like TLB_INVALIDATION_ALL(0x7002), is supported by the GuC only in the VGT mode. And this becomes a real problem if we would want to use above action to invalidate the LMTT early during VFs auto-provisioning, before VFs are enabled, as such H2G would be rejected: [ ] xe 0000:4d:00.0: [drm] ERROR GT0: FAST_REQ H2G fence 0x804e failed! e=0x30, h=0 [ ] xe 0000:4d:00.0: [drm] ERROR GT0: Fence 0x804e was used by action 0x7002 sent at: h2g_write+0x33e/0x870 [xe] __guc_ct_send_locked+0x1e1/0x1110 [xe] guc_ct_send_locked+0x9f/0x740 [xe] xe_guc_ct_send_locked+0x19/0x60 [xe] send_tlb_invalidation+0xc2/0x470 [xe] xe_gt_tlb_invalidation_all_async+0x45/0xa0 [xe] xe_gt_tlb_invalidation_all+0x4b/0xa0 [xe] lmtt_invalidate_hw+0x64/0x1a0 [xe] xe_lmtt_invalidate_hw+0x5c/0x340 [xe] pf_update_vf_lmtt+0x398/0xae0 [xe] pf_provision_vf_lmem+0x350/0xa60 [xe] xe_gt_sriov_pf_config_bulk_set_lmem+0xe2/0x410 [xe] xe_gt_sriov_pf_config_set_fair_lmem+0x1c6/0x620 [xe] xe_gt_sriov_pf_config_set_fair+0xd5/0x3f0 [xe] xe_pci_sriov_configure+0x360/0x1200 [xe] sriov_numvfs_store+0xbc/0x1d0 dev_attr_store+0x17/0x40 sysfs_kf_write+0x4a/0x80 kernfs_fop_write_iter+0x166/0x220 vfs_write+0x2ba/0x580 ksys_write+0x77/0x100 __x64_sys_write+0x19/0x30 x64_sys_call+0x2bf/0x2660 do_syscall_64+0x93/0x7a0 entry_SYSCALL_64_after_hwframe+0x76/0x7e [ ] xe 0000:4d:00.0: [drm] ERROR GT0: CT dequeue failed: -71 [ ] xe 0000:4d:00.0: [drm] GT0: trying reset from receive_g2h [xe] This could be mitigated by pushing earlier a PF self-configuration with some hard-coded values that cover unlimited access to the GGTT, use of all GuC contexts and doorbells. This step is sufficient for the GuC to switch into the VGT mode. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://lore.kernel.org/r/20250711193316.1920-5-michal.wajdeczko@intel.com	2025-07-15 13:05:19 +02:00
Michal Wajdeczko	92ba2032a1	drm/xe/pf: Move GGTT config KLVs encoding to helper In upcoming patch we will want to encode GGTT config KLVs based on raw numbers, without relying on the allocated GGTT node. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://lore.kernel.org/r/20250711193316.1920-4-michal.wajdeczko@intel.com	2025-07-15 13:05:18 +02:00
Michal Wajdeczko	1c38dd6afa	drm/xe/pf: Resend PF provisioning after GT reset If we reload the GuC due to suspend/resume or GT reset then we have to resend not only any VFs provisioning data, but also PF configuration, like scheduling parameters (EQ, PT), as otherwise GuC will continue to use default values. Fixes: `411220808c` ("drm/xe/pf: Restart VFs provisioning after GT reset") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://lore.kernel.org/r/20250711193316.1920-3-michal.wajdeczko@intel.com	2025-07-15 13:05:17 +02:00
Michal Wajdeczko	9f50b729dd	drm/xe/pf: Prepare to stop SR-IOV support prior GT reset As part of the resume or GT reset, the PF driver schedules work which is then used to complete restarting of the SR-IOV support, including resending to the GuC configurations of provisioned VFs. However, in case of short delay between those two actions, which could be seen by triggering a GT reset on the suspened device: $ echo 1 > /sys/kernel/debug/dri/0000:00:02.0/gt0/force_reset this PF worker might be still busy, which lead to errors due to just stopped or disabled GuC CTB communication: [ ] xe 0000:00:02.0: [drm:xe_gt_resume [xe]] GT0: resumed [ ] xe 0000:00:02.0: [drm] GT0: trying reset from force_reset_show [xe] [ ] xe 0000:00:02.0: [drm] GT0: reset queued [ ] xe 0000:00:02.0: [drm] GT0: reset started [ ] xe 0000:00:02.0: [drm:guc_ct_change_state [xe]] GT0: GuC CT communication channel stopped [ ] xe 0000:00:02.0: [drm:guc_ct_send_recv [xe]] GT0: H2G request 0x5503 canceled! [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF1 12 config KLVs (-ECANCELED) [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF1 configuration (-ECANCELED) [ ] xe 0000:00:02.0: [drm:guc_ct_change_state [xe]] GT0: GuC CT communication channel disabled [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF2 12 config KLVs (-ENODEV) [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF2 configuration (-ENODEV) [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push 2 of 2 VFs configurations [ ] xe 0000:00:02.0: [drm:pf_worker_restart_func [xe]] GT0: PF: restart completed While this VFs reprovisioning will be successful during next spin of the worker, to avoid those errors, make sure to cancel restart worker if we are about to trigger next reset. Fixes: `411220808c` ("drm/xe/pf: Restart VFs provisioning after GT reset") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://lore.kernel.org/r/20250711193316.1920-2-michal.wajdeczko@intel.com	2025-07-15 13:05:15 +02:00
Lucas De Marchi	f4d51b6ce5	drm/xe/lrc: Add table with LRC layout Add a table to document the LRC's BO layout to make it easier to visualize how each region stacks on top of each other. Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Link: https://lore.kernel.org/r/20250710-lrc-refactors-v2-4-a5e2ca03f6bd@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-14 13:40:17 -07:00
Tvrtko Ursulin	aded26ccaa	drm/xe: Waste fewer instructions in emit_wa_job() I was debugging some unrelated issue and noticed the current code was very verbose. We can improve it easily by using the more common batch buffer building pattern. Before: bb->cs[bb->len++] = MI_LOAD_REGISTER_REG \| MI_LRR_DST_CS_MMIO; c4d: 41 8b 56 10 mov 0x10(%r14),%edx c51: 49 8b 4e 08 mov 0x8(%r14),%rcx c55: 8d 72 01 lea 0x1(%rdx),%esi c58: 41 89 76 10 mov %esi,0x10(%r14) c5c: c7 04 91 01 00 08 15 movl $0x15080001,(%rcx,%rdx,4) bb->cs[bb->len++] = entry->reg.addr; c63: 8b 08 mov (%rax),%ecx c65: 41 8b 56 10 mov 0x10(%r14),%edx c69: 49 8b 76 08 mov 0x8(%r14),%rsi c6d: 81 e1 ff ff 3f 00 and $0x3fffff,%ecx c73: 8d 7a 01 lea 0x1(%rdx),%edi c76: 41 89 7e 10 mov %edi,0x10(%r14) c7a: 89 0c 96 mov %ecx,(%rsi,%rdx,4) ..etc.. After: cs++ = MI_LOAD_REGISTER_REG \| MI_LRR_DST_CS_MMIO; c52: 41 c7 04 24 01 00 08 movl $0x15080001,(%r12) c59: 15 cs++ = entry->reg.addr; c5a: 8b 10 mov (%rax),%edx ..etc.. Resulting in the following binary change: add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-348 (-348) Function old new delta xe_gt_record_default_lrcs.cold 304 296 -8 xe_gt_record_default_lrcs 2200 1860 -340 Total: Before=13554, After=13206, chg -2.57% Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250710-lrc-refactors-v2-7-a5e2ca03f6bd@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-14 13:40:17 -07:00
Lucas De Marchi	f4b538245f	drm/xe/gt: Drop third submission for default context There's no need to submit the nop job again on the first queue. Any state needed is already saved when the first LRC is switched out. The comment is a little misleading regarding indirect W/A: first of all there's still no indirect W/A enabled and secondly, even after they are, there's no need to submit this job again for having their state propagated: the indirect W/A will actually run on every LRC switch. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250710-lrc-refactors-v2-6-a5e2ca03f6bd@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-14 13:40:17 -07:00
Lucas De Marchi	6d891d22c6	drm/xe/lrc: Remove leftover TODO/FIXME There isn't anything to set for CTX_TIMESTAMP handling in the empty LRC: that is set on every LRC init since it should always start from 0 rather than the value saved in the image after first submission. The FIXME about perma-pinning also doesn't make much sense as we will always going to pin the lrc and the GGTT mapping has nothing to do with VM bind. Nuke these leftover comments. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250710-lrc-refactors-v2-5-a5e2ca03f6bd@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-14 13:40:17 -07:00
Lucas De Marchi	fab2cc0c09	drm/xe/gt: Extract emit_job_sync() Both the nop and wa jobs are going through the same boiler plate calls to emit the job with a timeout and handling error for both bb and job. Extract emit_job_sync() so those functions create the bb, handling possible errors and delegate the part about really emitting the job and waiting for its completion. Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Link: https://lore.kernel.org/r/20250710-lrc-refactors-v2-3-a5e2ca03f6bd@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-14 13:20:03 -07:00
Lucas De Marchi	e4cb5823ba	drm/xe: Count dwords before allocating The bb allocation in emit_wa_job() is wrong in 2 ways: first it's allocating enough space for the 3DSTATE or hardcoding 4k depending on the engine. In the first case it doesn't account for the WAs and in the former it may not be sufficient. Secondly it's using the size instead of number of dwords, causing the buffer to be 4x bigger than needed: xe_bb_new() receives number of dwords as parameter and its declaration was also not following its implementation. Lastly, reword the debug message since it's not only about the LRC WAs anymore as it also include the 3DSTATE for render. While it's unlikely this is causing any real issue, let's calculate the needed space and allocate just enough. Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Link: https://lore.kernel.org/r/20250710-lrc-refactors-v2-2-a5e2ca03f6bd@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-14 13:20:02 -07:00
Lucas De Marchi	76650bcf2a	drm/xe/lrc: Reduce scope of empty lrc data The only case in which new lrc data is created from scratch is when it's called prior to recording the default lrc. There's no need to check for NULL init_data since in that case the function already failed: just move the allocation where it's needed. Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250710-lrc-refactors-v2-1-a5e2ca03f6bd@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-14 13:20:02 -07:00
Michal Wajdeczko	b533b8e5a1	drm/xe/vf: Store negotiated VF/PF ABI version at device level There is no need to maintain PF ABI version on per-GT level. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://lore.kernel.org/r/20250713103625.1964-8-michal.wajdeczko@intel.com	2025-07-14 18:19:39 +02:00
Michal Wajdeczko	a6c384b24f	drm/xe/pf: Stop requiring VF/PF version negotiation on every GT While some VF/PF relay actions must be handled on the GT level, like query for runtime registers, it was clarified by the arch team that initial version negotiation can be done by the VF just once, by using any available GuC/GT. Move handling of the VF/PF ABI version negotiation on the PF side from the GT level functions to the device level functions. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250713103625.1964-7-michal.wajdeczko@intel.com	2025-07-14 18:19:31 +02:00

1 2 3 4 5 ...

3753 Commits