linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-06-08 06:20:14 -04:00

Author	SHA1	Message	Date
Ilpo Järvinen	bf0a90fc90	PCI: Convert BAR sizes bitmasks to u64 PCIe r7.0, sec 7.8.6, defines resizable BAR sizes beyond the currently supported maximum of 128TB, which will require more than u32 to store the entire bitmask. Convert Resizable BAR related functions to use u64 bitmask for BAR sizes to make the typing more future-proof. The support for the larger BAR sizes themselves is not added at this point. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patch.msgid.link/20251113180053.27944-12-ilpo.jarvinen@linux.intel.com	2025-11-14 12:34:22 -06:00
Ilpo Järvinen	c7df7059e3	drm/amdgpu: Use pci_rebar_get_max_size() Use pci_rebar_get_max_size() to simplify amdgpu_device_resize_fb_bar(). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patch.msgid.link/20251113180053.27944-11-ilpo.jarvinen@linux.intel.com	2025-11-14 12:34:21 -06:00
Ilpo Järvinen	46ba95bed9	drm/xe/vram: Use pci_rebar_get_max_size() Use pci_rebar_get_max_size() from PCI core in resize_vram_bar() to simplify code. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patch.msgid.link/20251113180053.27944-10-ilpo.jarvinen@linux.intel.com	2025-11-14 12:34:21 -06:00
Ilpo Järvinen	2987a64de3	drm/xe/vram: Use PCI rebar helpers in resize_vram_bar() PCI core provides pci_rebar_size_supported() and pci_rebar_size_to_bytes(); use them in resize_vram_bar() to simplify code. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patch.msgid.link/20251113180053.27944-8-ilpo.jarvinen@linux.intel.com	2025-11-14 12:34:21 -06:00
Ilpo Järvinen	c59038d3c0	drm/i915/gt: Use pci_rebar_size_supported() PCI core provides pci_rebar_size_supported() that helps in checking if an encoded BAR Size is supported for the BAR or not. Use it in i915_resize_lmem_bar() to simplify code. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Jani Nikula <jani.nikula@intel.com> Link: https://patch.msgid.link/20251113180053.27944-7-ilpo.jarvinen@linux.intel.com	2025-11-14 12:34:21 -06:00
Ilpo Järvinen	db92e3fef5	drm/amdgpu: Remove driver side BAR release before resize PCI core handles releasing device's resources and their rollback in case of failure of a BAR resizing operation. Releasing resource prior to calling pci_resize_resource() prevents PCI core from restoring the BARs as they were. Remove driver-side release of BARs from the amdgpu driver. Also remove the driver initiated assignment as pci_resize_resource() should try to assign as much as possible. If the driver side call manages to get more required resources assigned in some scenario, such a problem should be fixed inside pci_resize_resource() instead. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Alex Bennée <alex.bennee@linaro.org> # AVA, AMD GPU Link: https://patch.msgid.link/20251113162628.5946-11-ilpo.jarvinen@linux.intel.com	2025-11-14 12:34:20 -06:00
Ilpo Järvinen	4efaa80b3d	drm/i915: Remove driver side BAR release before resize PCI core handles releasing device's resources and their rollback in case of failure of a BAR resizing operation. Releasing resource prior to calling pci_resize_resource() prevents PCI core from restoring the BARs as they were. Remove driver-side release of BARs from the i915 driver. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patch.msgid.link/20251113162628.5946-10-ilpo.jarvinen@linux.intel.com	2025-11-14 12:34:17 -06:00
Ilpo Järvinen	1a3c05b32b	drm/xe: Remove driver side BAR release before resize PCI core handles releasing device's resources and their rollback in case of failure of a BAR resizing operation. Releasing resource prior to calling pci_resize_resource() prevents PCI core from restoring the BARs as they were. Remove driver-side release of BARs from the xe driver. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patch.msgid.link/20251113162628.5946-9-ilpo.jarvinen@linux.intel.com	2025-11-14 12:34:11 -06:00
Ilpo Järvinen	337b1b566d	PCI: Fix restoring BARs on BAR resize rollback path BAR resize operation is implemented in the pci_resize_resource() and pbus_reassign_bridge_resources() functions. pci_resize_resource() can be called either from __resource_resize_store() from sysfs or directly by the driver for the Endpoint Device. The pci_resize_resource() requires that caller has released the device resources that share the bridge window with the BAR to be resized as otherwise the bridge window is pinned in place and cannot be changed. pbus_reassign_bridge_resources() rolls back resources if the resize operation fails, but rollback is performed only for the bridge windows. Because releasing the device resources are done by the caller of the BAR resize interface, these functions performing the BAR resize do not have access to the device resources as they were before the resize. pbus_reassign_bridge_resources() could try __pci_bridge_assign_resources() after rolling back the bridge windows as they were, however, it will not guarantee the resource are assigned due to differences in how FW and the kernel assign the resources (alignment of the start address and tail). To perform rollback robustly, the BAR resize interface has to be altered to also release the device resources that share the bridge window with the BAR to be resized. Also, remove restoring from the entries failed list as saved list should now contain both the bridge windows and device resources so the extra restore is duplicated work. Some drivers (currently only amdgpu) want to prevent releasing some resources. Add exclude_bars param to pci_resize_resource() and make amdgpu pass its register BAR (BAR 2 or 5), which should never be released during resize operation. Normally 64-bit prefetchable resources do not share a bridge window with the 32-bit only register BAR, but there are various fallbacks in the resource assignment logic which may make the resources share the bridge window in rare cases. This change (together with the driver side changes) is to counter the resource releases that had to be done to prevent resource tree corruption in the ("PCI: Release assigned resource before restoring them") change. As such, it likely restores functionality in cases where device resources were released to avoid resource tree conflicts which appeared to be "working" when such conflicts were not correctly detected by the kernel. Reported-by: Simon Richter <Simon.Richter@hogyros.de> Link: https://lore.kernel.org/linux-pci/f9a8c975-f5d3-4dd2-988e-4371a1433a60@hogyros.de/ Reported-by: Alex Bennée <alex.bennee@linaro.org> Link: https://lore.kernel.org/linux-pci/874irqop6b.fsf@draig.linaro.org/ Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: squash amdgpu BAR selection from https://lore.kernel.org/r/20251114103053.13778-1-ilpo.jarvinen@linux.intel.com] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Alex Bennée <alex.bennee@linaro.org> # AVA, AMD GPU Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patch.msgid.link/20251113162628.5946-7-ilpo.jarvinen@linux.intel.com	2025-11-14 12:33:14 -06:00
Linus Torvalds	284fc30e66	Merge tag 'drm-next-2025-10-11-1' of https://gitlab.freedesktop.org/drm/kernel Pull more drm fixes from Dave Airlie: "Just the follow up fixes for rc1 from the next branch, amdgpu and xe mostly with a single v3d fix in there. amdgpu: - DC DCE6 fixes - GPU reset fixes - Secure diplay messaging cleanup - MES fix - GPUVM locking fixes - PMFW messaging cleanup - PCI US/DS switch handling fix - VCN queue reset fix - DC FPU handling fix - DCN 3.5 fix - DC mirroring fix amdkfd: - Fix kfd process ref leak - mmap write lock handling fix - Fix comments in IOCTL xe: - Fix build with clang 16 - Fix handling of invalid configfs syntax usage and spell out the expected syntax in the documentation - Do not try late bind firmware when running as VF since it shouldn't handle firmware loading - Fix idle assertion for local BOs - Fix uninitialized variable for late binding - Do not require perfmon_capable to expose free memory at page granularity. Handle it like other drm drivers do - Fix lock handling on suspend error path - Fix I2C controller resume after S3 v3d: - fix fence locking" * tag 'drm-next-2025-10-11-1' of https://gitlab.freedesktop.org/drm/kernel: (34 commits) drm/amd/display: Incorrect Mirror Cositing drm/amd/display: Enable Dynamic DTBCLK Switch drm/amdgpu: Report individual reset error drm/amdgpu: partially revert "revert to old status lock handling v3" drm/amd/display: Fix unsafe uses of kernel mode FPU drm/amd/pm: Disable VCN queue reset on SMU v13.0.6 due to regression drm/amdgpu: Fix general protection fault in amdgpu_vm_bo_reset_state_machine drm/amdgpu: Check swus/ds for switch state save drm/amdkfd: Fix two comments in kfd_ioctl.h drm/amd/pm: Avoid interface mismatch messaging drm/amdgpu: Merge amdgpu_vm_set_pasid into amdgpu_vm_init drm/amd/amdgpu: Fix the mes version that support inv_tlbs drm/amd: Check whether secure display TA loaded successfully drm/amdkfd: Fix mmap write lock not release drm/amdkfd: Fix kfd process ref leaking when userptr unmapping drm/amdgpu: Fix for GPU reset being blocked by KIQ I/O. drm/amd/display: Disable scaling on DCE6 for now drm/amd/display: Properly disable scaling on DCE6 drm/amd/display: Properly clear SCL__FILTER_CONTROL on DCE6 drm/amd/display: Add missing DCE6 SCL_HORZ_FILTER_INIT SRIs ...	2025-10-10 14:02:14 -07:00
Linus Torvalds	1e5d41b981	Merge tag 'drm-fixes-2025-10-11' of https://gitlab.freedesktop.org/drm/kernel Pull drm fixes from Dave Airlie: "Some fixes leftover from our fixes branch, just nouveau and vmwgfx: nouveau: - Return errno code from TTM move helper vmwgfx: - Fix null-ptr access in cursor code - Fix UAF in validation - Use correct iterator in validation" * tag 'drm-fixes-2025-10-11' of https://gitlab.freedesktop.org/drm/kernel: drm/nouveau: fix bad ret code in nouveau_bo_move_prep drm/vmwgfx: Fix copy-paste typo in validation drm/vmwgfx: Fix Use-after-free in validation drm/vmwgfx: Fix a null-ptr access in the cursor snooper	2025-10-10 13:59:38 -07:00
Dave Airlie	5ca5f00a16	Merge tag 'drm-misc-fixes-2025-10-09' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes Short summary of fixes pull: nouveau: - Return errno code from TTM move helper vmwgfx: - Fix null-ptr access in cursor code - Fix UAF in validation - Use correct iterator in validation Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://lore.kernel.org/r/20251009120004.GA17570@linux.fritz.box	2025-10-11 06:17:13 +10:00
Dave Airlie	c4b6ddcf01	Merge tag 'amd-drm-next-6.18-2025-10-09' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.18-2025-10-09: amdgpu: - DC DCE6 fixes - GPU reset fixes - Secure diplay messaging cleanup - MES fix - GPUVM locking fixes - PMFW messaging cleanup - PCI US/DS switch handling fix - VCN queue reset fix - DC FPU handling fix - DCN 3.5 fix - DC mirroring fix amdkfd: - Fix kfd process ref leak - mmap write lock handling fix - Fix comments in IOCTL Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20251009162915.981503-1-alexander.deucher@amd.com	2025-10-10 06:57:56 +10:00
Shuhao Fu	e4bea91958	drm/nouveau: fix bad ret code in nouveau_bo_move_prep In `nouveau_bo_move_prep`, if `nouveau_mem_map` fails, an error code should be returned. Currently, it returns zero even if vmm addr is not correctly mapped. Cc: stable@vger.kernel.org Reviewed-by: Petr Vorel <pvorel@suse.cz> Signed-off-by: Shuhao Fu <sfual@cse.ust.hk> Fixes: `9ce523cc3b` ("drm/nouveau: separate buffer object backing memory from nvkm structures") Signed-off-by: Danilo Krummrich <dakr@kernel.org>	2025-10-07 22:15:25 +02:00
Jesse Agate	d07e142641	drm/amd/display: Incorrect Mirror Cositing [WHY] hinit/vinit are incorrect in the case of mirroring. [HOW] Cositing sign must be flipped when image is mirrored in the vertical or horizontal direction. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Reviewed-by: Samson Tam <samson.tam@amd.com> Signed-off-by: Jesse Agate <jesse.agate@amd.com> Signed-off-by: Brendan Leder <breleder@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-07 14:09:20 -04:00
Fangzhi Zuo	5949e7c489	drm/amd/display: Enable Dynamic DTBCLK Switch [WHAT] Since dcn35, DTBCLK can be disabled when no DP2 sink connected for power saving purpose. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-07 14:09:19 -04:00
Lijo Lazar	2e97663760	drm/amdgpu: Report individual reset error If reinitialization of one of the GPUs fails after reset, it logs failure on all subsequent GPUs eventhough they have resumed successfully. A sample log where only device at 0000:95:00.0 had a failure - amdgpu 0000:15:00.0: amdgpu: GPU reset(19) succeeded! amdgpu 0000:65:00.0: amdgpu: GPU reset(19) succeeded! amdgpu 0000:75:00.0: amdgpu: GPU reset(19) succeeded! amdgpu 0000:85:00.0: amdgpu: GPU reset(19) succeeded! amdgpu 0000:95:00.0: amdgpu: GPU reset(19) failed amdgpu 0000:e5:00.0: amdgpu: GPU reset(19) failed amdgpu 0000:f5:00.0: amdgpu: GPU reset(19) failed amdgpu 0000:05:00.0: amdgpu: GPU reset(19) failed amdgpu 0000:15:00.0: amdgpu: GPU reset end with ret = -5 To avoid confusion, report the error for each device separately and return the first error as the overall result. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-07 14:09:19 -04:00
Christian König	a107aeb6a2	drm/amdgpu: partially revert "revert to old status lock handling v3" The CI systems are pointing out list corruptions, so we still need to fix something here. Keep the asserts, but revert the lock changes for now. Fixes: `59e4405e9e` ("drm/amdgpu: revert to old status lock handling v3") Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-07 14:09:19 -04:00
Ard Biesheuvel	ddbfac1528	drm/amd/display: Fix unsafe uses of kernel mode FPU The point of isolating code that uses kernel mode FPU in separate compilation units is to ensure that even implicit uses of, e.g., SIMD registers for spilling occur only in a context where this is permitted, i.e., from inside a kernel_fpu_begin/end block. This is important on arm64, which uses -mgeneral-regs-only to build all kernel code, with the exception of such compilation units where FP or SIMD registers are expected to be used. Given that the compiler may invent uses of FP/SIMD anywhere in such a unit, none of its code may be accessible from outside a kernel_fpu_begin/end block. This means that all callers into such compilation units must use the DC_FP start/end macros, which must not occur there themselves. For robustness, all functions with external linkage that reside there should call dc_assert_fp_enabled() to assert that the FPU context was set up correctly. Fix this for the DCN35, DCN351 and DCN36 implementations. Cc: Austin Zheng <austin.zheng@amd.com> Cc: Jun Lei <jun.lei@amd.com> Cc: Harry Wentland <harry.wentland@amd.com> Cc: Leo Li <sunpeng.li@amd.com> Cc: Rodrigo Siqueira <siqueira@igalia.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-07 14:09:19 -04:00
Jesse.Zhang	bd8acfcfce	drm/amd/pm: Disable VCN queue reset on SMU v13.0.6 due to regression Disable VCN reset capability for the program 4 as it's causing regressions. Fixes: `9d20f37a10` ("drm/amd/pm: Add VCN reset support for SMU v13.0.6") Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-07 14:09:19 -04:00
Jesse.Zhang	8d557eab3a	drm/amdgpu: Fix general protection fault in amdgpu_vm_bo_reset_state_machine After GPU reset with VRAM loss, a general protection fault occurs during user queue restoration when accessing vm_bo->vm after spinlock release in amdgpu_vm_bo_reset_state_machine. The root cause is that vm_bo points to the last entry from the list_for_each_entry loop, but this becomes invalid after the spinlock is released. Accessing vm_bo->vm at this point leads to memory corruption. Crash log shows: [ 326.981811] Oops: general protection fault, probably for non-canonical address 0x4156415741e58ac8: 0000 [#1] SMP NOPTI [ 326.981820] CPU: 13 UID: 0 PID: 1035 Comm: kworker/13:3 Tainted: G E 6.16.0+ #25 PREEMPT(voluntary) [ 326.981826] Tainted: [E]=UNSIGNED_MODULE [ 326.981827] Hardware name: Gigabyte Technology Co., Ltd. X870E AORUS PRO ICE/X870E AORUS PRO ICE, BIOS F3i 12/19/2024 [ 326.981831] Workqueue: events amdgpu_userq_restore_worker [amdgpu] [ 326.981999] RIP: 0010:amdgpu_vm_assert_locked+0x16/0x70 [amdgpu] [ 326.982094] Code: 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 48 85 ff 74 45 48 8b 87 80 03 00 00 48 85 c0 74 40 <48> 8b b8 80 01 00 00 48 85 ff 74 3b 8b 05 0c b7 0e f0 85 c0 75 05 [ 326.982098] RSP: 0018:ffffaa91c2a6bc20 EFLAGS: 00010206 [ 326.982100] RAX: 4156415741e58948 RBX: ffff9e8f013e8330 RCX: 0000000000000000 [ 326.982102] RDX: 0000000000000005 RSI: 000000001d254e88 RDI: ffffffffc144814a [ 326.982104] RBP: ffffaa91c2a6bc68 R08: 0000004c21a25674 R09: 0000000000000001 [ 326.982106] R10: 0000000000000001 R11: dccaf3f2f82863fc R12: ffff9e8f013e8000 [ 326.982108] R13: ffff9e8f013e8000 R14: 0000000000000000 R15: ffff9e8f09980000 [ 326.982110] FS: 0000000000000000(0000) GS:ffff9e9e79995000(0000) knlGS:0000000000000000 [ 326.982112] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 326.982114] CR2: 000055ed6c9caa80 CR3: 0000000797060000 CR4: 0000000000750ef0 [ 326.982116] PKRU: 55555554 Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-07 14:09:19 -04:00
Lijo Lazar	9b608fe948	drm/amdgpu: Check swus/ds for switch state save For saving switch state, check if the GPU is having SWUS/DS architecture. Otherwise, skip saving. Reported-by: Roman Elshin <roman.elshin@gmail.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4602 Fixes: `1dd2fa0e00` ("drm/amdgpu: Save and restore switch state") Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-07 14:09:19 -04:00
Lijo Lazar	4538a93bbb	drm/amd/pm: Avoid interface mismatch messaging PMFW interface version is not used by some IP implementations like SMU v13.0.6/12, instead rely on PMFW version checks. Avoid the log if interface version is not used. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-07 14:09:19 -04:00
Jesse.Zhang	b809ca91a5	drm/amdgpu: Merge amdgpu_vm_set_pasid into amdgpu_vm_init As KFD no longer uses a separate PASID, the global amdgpu_vm_set_pasid()function is no longer necessary. Merge its functionality directly intoamdgpu_vm_init() to simplify code flow and eliminate redundant locking. v2: remove superflous check adjust amdgpu_vm_fin and remove amdgpu_vm_set_pasid (Chritian) v3: drop amdgpu_vm_assert_locked (Chritian) Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4614 Fixes: `59e4405e9e` ("drm/amdgpu: revert to old status lock handling v3") Reviewed-by: Christian König <christian.koenig@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-07 14:09:07 -04:00
Shaoyun Liu	8dbac5cf8b	drm/amd/amdgpu: Fix the mes version that support inv_tlbs MES version 0x83 is not stable to use the inv_tlbs API. Defer it to 0x84 vertsion. Fixes: `85442bac84` ("drm/amd/amdgpu: Fix the mes version that support inv_tlbs") Signed-off-by: Shaoyun Liu <shaoyun.liu@amd.com> Reviewed-by: Michael Chen <michael.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-07 14:09:06 -04:00
Mario Limonciello	c760bcda83	drm/amd: Check whether secure display TA loaded successfully [Why] Not all renoir hardware supports secure display. If the TA is present but the feature isn't supported it will fail to load or send commands. This shows ERR messages to the user that make it seems like there is a problem. [How] Check the resp_status of the context to see if there was an error before trying to send any secure display commands. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1415 Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-07 14:09:06 -04:00
Philip Yang	7574f30337	drm/amdkfd: Fix mmap write lock not release If mmap write lock is taken while draining retry fault, mmap write lock is not released because svm_range_restore_pages calls mmap_read_unlock then returns. This causes deadlock and system hangs later because mmap read or write lock cannot be taken. Downgrade mmap write lock to read lock if draining retry fault fix this bug. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-07 14:09:06 -04:00
Philip Yang	58e6fc2fb9	drm/amdkfd: Fix kfd process ref leaking when userptr unmapping kfd_lookup_process_by_pid hold the kfd process reference to ensure it doesn't get destroyed while sending the segfault event to user space. Calling kfd_lookup_process_by_pid as function parameter leaks the kfd process refcount and miss the NULL pointer check if app process is already destroyed. Fixes: `2d274bf709` ("amd/amdkfd: Trigger segfault for early userptr unmmapping") Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-07 14:09:06 -04:00
Heng Zhou	0c67342885	drm/amdgpu: Fix for GPU reset being blocked by KIQ I/O. There is some probability that reset workqueue is blocked by KIQ I/O for 10+ seconds after gpu hangs. So we need to add a in_reset check during each KIQ register poll. Signed-off-by: Heng Zhou <Heng.Zhou@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-07 14:09:06 -04:00
Timur Kristóf	0e190a0446	drm/amd/display: Disable scaling on DCE6 for now Scaling doesn't work on DCE6 at the moment, the current register programming produces incorrect output when using fractional scaling (between 100-200%) on resolutions higher than 1080p. Disable it until we figure out how to program it properly. Fixes: `7c15fd86aa` ("drm/amd/display: dc/dce: add initial DCE6 support (v10)") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-07 14:09:06 -04:00
Timur Kristóf	a7dc87f344	drm/amd/display: Properly disable scaling on DCE6 SCL_SCALER_ENABLE can be used to enable/disable the scaler on DCE6. Program it to 0 when scaling isn't used, 1 when used. Additionally, clear some other registers when scaling is disabled and program the SCL_UPDATE register as recommended. This fixes visible glitches for users whose BIOS sets up a mode with scaling at boot, which DC was unable to clean up. Fixes: `b70aaf5586` ("drm/amd/display: dce_transform: add DCE6 specific macros,functions") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-07 14:09:06 -04:00
Timur Kristóf	c0aa7cf49d	drm/amd/display: Properly clear SCL_*_FILTER_CONTROL on DCE6 Previously, the code would set a bit field which didn't exist on DCE6 so it would be effectively a no-op. Fixes: `b70aaf5586` ("drm/amd/display: dce_transform: add DCE6 specific macros,functions") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-07 14:09:06 -04:00
Timur Kristóf	d60f9c45d1	drm/amd/display: Add missing DCE6 SCL_HORZ_FILTER_INIT* SRIs Without these, it's impossible to program these registers. Fixes: `102b2f587a` ("drm/amd/display: dce_transform: DCE6 Scaling Horizontal Filter Init (v2)") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-07 14:09:06 -04:00
Alex Deucher	507296328b	drm/amdgpu: Add additional DCE6 SCL registers Fixes: `102b2f587a` ("drm/amd/display: dce_transform: DCE6 Scaling Horizontal Filter Init (v2)") Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-07 14:09:06 -04:00
Linus Torvalds	2215336295	Merge tag 'hyperv-next-signed-20251006' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - Unify guest entry code for KVM and MSHV (Sean Christopherson) - Switch Hyper-V MSI domain to use msi_create_parent_irq_domain() (Nam Cao) - Add CONFIG_HYPERV_VMBUS and limit the semantics of CONFIG_HYPERV (Mukesh Rathor) - Add kexec/kdump support on Azure CVMs (Vitaly Kuznetsov) - Deprecate hyperv_fb in favor of Hyper-V DRM driver (Prasanna Kumar T S M) - Miscellaneous enhancements, fixes and cleanups (Abhishek Tiwari, Alok Tiwari, Nuno Das Neves, Wei Liu, Roman Kisel, Michael Kelley) * tag 'hyperv-next-signed-20251006' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: hyperv: Remove the spurious null directive line MAINTAINERS: Mark hyperv_fb driver Obsolete fbdev/hyperv_fb: deprecate this in favor of Hyper-V DRM driver Drivers: hv: Make CONFIG_HYPERV bool Drivers: hv: Add CONFIG_HYPERV_VMBUS option Drivers: hv: vmbus: Fix typos in vmbus_drv.c Drivers: hv: vmbus: Fix sysfs output format for ring buffer index Drivers: hv: vmbus: Clean up sscanf format specifier in target_cpu_store() x86/hyperv: Switch to msi_create_parent_irq_domain() mshv: Use common "entry virt" APIs to do work in root before running guest entry: Rename "kvm" entry code assets to "virt" to genericize APIs entry/kvm: KVM: Move KVM details related to signal/-EINTR into KVM proper mshv: Handle NEED_RESCHED_LAZY before transferring to guest x86/hyperv: Add kexec/kdump support on Azure CVMs Drivers: hv: Simplify data structures for VMBus channel close message Drivers: hv: util: Cosmetic changes for hv_utils_transport.c mshv: Add support for a new parent partition configuration clocksource: hyper-v: Skip unnecessary checks for the root partition hyperv: Add missing field to hv_output_map_device_interrupt	2025-10-07 08:40:15 -07:00
Dave Airlie	73bc073d42	Merge tag 'drm-xe-next-fixes-2025-10-03' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next Cross-subsystem Changes: - Fix userptr to not allow device private pages with SVM (Thomas Hellström) Driver Changes: - Fix build with clang 16 (Michal Wajdeczko) - Fix handling of invalid configfs syntax usage and spell out the expected syntax in the documentation (Lucas De Marchi) - Do not try late bind firmware when running as VF since it shouldn't handle firmware loading (Michal Wajdeczko) - Fix idle assertion for local BOs (Thomas Hellström) - Fix uninitialized variable for late binding (Colin Ian King, Mallesh Koujalagi) - Do not require perfmon_capable to expose free memory at page granularity. Handle it like other drm drivers do (Matthew Auld) - Fix lock handling on suspend error path (Shuicheng Lin) - Fix I2C controller resume after S3 (Raag Jadav) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/q6yeyb7n2eqo5megxjqayooajirx5hhsntfo65m3y4myscz7oz@25qbabbbr4hj	2025-10-07 11:50:11 +10:00
Dave Airlie	bae04c9658	Merge tag 'drm-misc-next-fixes-2025-10-02' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next Short summary of fixes pull: v3d: - Fix fence locking Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://lore.kernel.org/r/20251002122303.GA21323@linux.fritz.box	2025-10-07 11:35:57 +10:00
Ian Forbes	228c5d44df	drm/vmwgfx: Fix copy-paste typo in validation 'entry' should be 'val' which is the loop iterator. Fixes: `9e931f2e09` ("drm/vmwgfx: Refactor resource validation hashtable to use linux/hashtable implementation.") Signed-off-by: Ian Forbes <ian.forbes@broadcom.com> Reviewed-by: Zack Rusin <zack.rusin@broadcom.com> Signed-off-by: Zack Rusin <zack.rusin@broadcom.com> Link: https://lore.kernel.org/r/20250926195427.1405237-2-ian.forbes@broadcom.com	2025-10-06 11:57:59 -04:00
Ian Forbes	dfe1323ab3	drm/vmwgfx: Fix Use-after-free in validation Nodes stored in the validation duplicates hashtable come from an arena allocator that is cleared at the end of vmw_execbuf_process. All nodes are expected to be cleared in vmw_validation_drop_ht but this node escaped because its resource was destroyed prematurely. Fixes: `64ad2abfe9` ("drm/vmwgfx: Adapt validation code for reference-free lookups") Reported-by: Kuzey Arda Bulut <kuzeyardabulut@gmail.com> Signed-off-by: Ian Forbes <ian.forbes@broadcom.com> Reviewed-by: Zack Rusin <zack.rusin@broadcom.com> Signed-off-by: Zack Rusin <zack.rusin@broadcom.com> Link: https://lore.kernel.org/r/20250926195427.1405237-1-ian.forbes@broadcom.com	2025-10-06 11:57:38 -04:00
Zack Rusin	5ac2c02790	drm/vmwgfx: Fix a null-ptr access in the cursor snooper Check that the resource which is converted to a surface exists before trying to use the cursor snooper on it. vmw_cmd_res_check allows explicit invalid (SVGA3D_INVALID_ID) identifiers because some svga commands accept SVGA3D_INVALID_ID to mean "no surface", unfortunately functions that accept the actual surfaces as objects might (and in case of the cursor snooper, do not) be able to handle null objects. Make sure that we validate not only the identifier (via the vmw_cmd_res_check) but also check that the actual resource exists before trying to do something with it. Fixes unchecked null-ptr reference in the snooping code. Signed-off-by: Zack Rusin <zack.rusin@broadcom.com> Fixes: `c0951b797e` ("drm/vmwgfx: Refactor resource management") Reported-by: Kuzey Arda Bulut <kuzeyardabulut@gmail.com> Cc: Broadcom internal kernel review list <bcm-kernel-feedback-list@broadcom.com> Cc: dri-devel@lists.freedesktop.org Reviewed-by: Ian Forbes <ian.forbes@broadcom.com> Link: https://lore.kernel.org/r/20250917153655.1968583-1-zack.rusin@broadcom.com	2025-10-06 11:57:17 -04:00
Raag Jadav	1af59cd5cc	drm/xe/i2c: Don't rely on d3cold.allowed flag in system PM path In S3 and above sleep states, the device can loose power regardless of d3cold.allowed flag. Bring up I2C controller explicitly in system PM path to ensure its normal operation after losing power. v2: Cover S3 and above states (Rodrigo) Fixes: `0ea07b6951` ("drm/xe/pm: Wire up suspend/resume for I2C controller") Signed-off-by: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250918103200.2952576-1-raag.jadav@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit `e4863f1159`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-03 14:17:08 -05:00
Shuicheng Lin	08fdfd260e	drm/xe/hw_engine_group: Fix double write lock release in error path In xe_hw_engine_group_get_mode(), a write lock is acquired before calling switch_mode(), which in turn invokes xe_hw_engine_group_suspend_faulting_lr_jobs(). On failure inside xe_hw_engine_group_suspend_faulting_lr_jobs(), the write lock is released there, and then again in xe_hw_engine_group_get_mode(), leading to a double release. Fix this by keeping both acquire and release operation in xe_hw_engine_group_get_mode(). Fixes: `770bd1d341` ("drm/xe/hw_engine_group: Ensure safe transition between execution modes") Cc: Francois Dugast <francois.dugast@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Francois Dugast <francois.dugast@intel.com> Link: https://lore.kernel.org/r/20250925023145.1203004-2-shuicheng.lin@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit `662d98b8b3`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-03 14:17:00 -05:00
Matthew Auld	2d1684a077	drm/xe/uapi: loosen used tracking restriction Currently this is hidden behind perfmon_capable() since this is technically an info leak, given that this is a system wide metric. However the granularity reported here is always PAGE_SIZE aligned, which matches what the core kernel is already willing to expose to userspace if querying how many free RAM pages there are on the system, and that doesn't need any special privileges. In addition other drm drivers seem happy to expose this. The motivation here if with oneAPI where they want to use the system wide 'used' reporting here, so not the per-client fdinfo stats. This has also come up with some perf overlay applications wanting this information. Fixes: `1105ac15d2` ("drm/xe/uapi: restrict system wide accounting") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Joshua Santosh <joshua.santosh.ranjan@intel.com> Cc: José Roberto de Souza <jose.souza@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250919122052.420979-2-matthew.auld@intel.com (cherry picked from commit `4d0b035fd6`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-03 14:16:55 -05:00
Linus Torvalds	51e9889ab1	Merge tag 'pull-fs_context' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull fs_context updates from Al Viro: "Change vfs_parse_fs_string() calling conventions Get rid of the length argument (almost all callers pass strlen() of the string argument there), add vfs_parse_fs_qstr() for the cases that do want separate length" * tag 'pull-fs_context' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: do_nfs4_mount(): switch to vfs_parse_fs_string() change the calling conventions for vfs_parse_fs_string()	2025-10-03 10:51:44 -07:00
Mallesh Koujalagi	6982a462cb	drm/xe/xe_late_bind_fw: Initialize uval variable in xe_late_bind_fw_num_fans() Initialize the uval variable to 0 in xe_late_bind_fw_num_fans() to fix a potential use of uninitialized variable warning and ensure predictable behavior. The variable is passed by reference to xe_pcode_read() which should populate it on success, but initializing it to 0 provides a safe default value and follows kernel coding best practices. v2: - uval = 0 which serves as both a safe default and the fallback value when the pcode read operation fails. v3: - Handle MMIO failure (Rodrigo) - The function should probably return the error and make the uval as pointer-argument, like the pcode_read. - Change the caller of this function to propagate the error upwards if mmio failed. Fixes: `45832bf9c1` ("drm/xe/xe_late_bind_fw: Initialize late binding firmware") Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com> Link: https://lore.kernel.org/r/20251002005648.3185636-1-mallesh.koujalagi@intel.com Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit `07abc16c14`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-02 21:57:52 -07:00
Thomas Hellström	10aa5c8060	drm/gpusvm, drm/xe: Fix userptr to not allow device private pages When userptr is used on SVM-enabled VMs, a non-NULL hmm_range::dev_private_owner value might mean that hmm_range_fault() attempts to return device private pages. Either that will fail, or the userptr code will not know how to handle those. Use NULL for hmm_range::dev_private_owner to migrate such pages to system. In order to do that, move the struct drm_gpusvm::device_private_page_owner field to struct drm_gpusvm_ctx::device_private_page_owner so that it doesn't remain immutable over the drm_gpusvm lifetime. v2: - Don't conditionally compile xe_svm_devm_owner(). - Kerneldoc xe_svm_devm_owner(). Fixes: `9e97874148` ("drm/xe/userptr: replace xe_hmm with gpusvm") Cc: Matthew Auld <matthew.auld@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://lore.kernel.org/r/20250930122752.96034-1-thomas.hellstrom@linux.intel.com (cherry picked from commit `ad298d9ec9`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-02 21:57:52 -07:00
Colin Ian King	5b440a8bba	drm/xe/xe_late_bind_fw: Fix missing initialization of variable offset The variable offset is not being initialized, and it is only set inside a for-loop if entry->name is the same as manifest_entry. In the case where it is not initialized a non-zero check on offset is potentialy checking a bogus uninitalized value. Fix this by initializing offset to zero. Fixes: `efa29317a5` ("drm/xe/xe_late_bind_fw: Extract and print version info") Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250924102208.9216-1-colin.i.king@gmail.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit `20f3b28e2e`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-02 21:57:52 -07:00
Thomas Hellström	17f6f6f25a	drm/xe/bo: Fix an idle assertion for local bos Before calling ttm_bo_populate() in the CPU fault path of a bo, we assert that the bo is not being migrated. However, for local bos we share the reservation object with other local bos that might be in the process of being migrated. Also some VM operations may attach USAGE_KERNEL fences to the common reservation object and trigger false positives from the assert. So remove the assert and instead wait for bo idle. This may unnecessarily wait for idle in some cases but since we're doing this wait later in the fault path anyway we might as well do it here as well. This fixes warnings like: Sep 25 14:56:23 desky kernel: ------------[ cut here ]------------ Sep 25 14:56:23 desky kernel: xe 0000:03:00.0: [drm] Assertion `dma_resv_test_signaled(tbo->base.resv, DMA_RESV_USAGE_KERNEL) \|\| (tbo->ttm && ttm_tt_is_populated(tbo->ttm))` failed! platform: BATTLEMAGE subplatform: 1 graphics: Xe2_HPG 20.01 step A0 media: Xe2_HPM 13.01 step A1 Sep 25 14:56:23 desky kernel: WARNING: CPU: 6 PID: 24767 at drivers/gpu/drm/xe/xe_bo.c:1748 xe_bo_fault_migrate+0x1bb/0x300 [xe] Sep 25 14:56:23 desky kernel: Modules linked in: cpuid dm_crypt xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bridge stp llc xfrm_user xfr> Sep 25 14:56:23 desky kernel: snd_soc_sdca snd_seq_midi prime_numbers coretemp snd_seq_midi_event drm_ttm_helper snd_hda_codec drm_buddy drm_exec snd_rawmidi snd_soc_core snd_hda_cor> Sep 25 14:56:23 desky kernel: CPU: 6 UID: 1000 PID: 24767 Comm: steamwebhelper Tainted: G U W 6.17.0-rc7+ #32 PREEMPT(voluntary) Sep 25 14:56:23 desky kernel: Tainted: [U]=USER, [W]=WARN Sep 25 14:56:23 desky kernel: Hardware name: Micro-Star International Co., Ltd. MS-7D36/PRO Z690-P DDR4 (MS-7D36), BIOS A.A1 10/18/2022 Sep 25 14:56:23 desky kernel: RIP: 0010:xe_bo_fault_migrate+0x1bb/0x300 [xe] Sep 25 14:56:23 desky kernel: Code: fa 64 29 f9 48 c7 c7 40 e0 d3 c1 51 48 c7 c1 c0 e3 d3 c1 52 4c 8b 45 c0 41 50 44 8b 4d c8 4d 89 e0 48 8b 55 a8 e8 25 27 95 ef <0f> 0b 48 83 c4 40 4> Sep 25 14:56:23 desky kernel: RSP: 0000:ffffae1ca88c7b10 EFLAGS: 00010286 Sep 25 14:56:23 desky kernel: RAX: 0000000000000000 RBX: ffff8d7cfd7e6800 RCX: 0000000000000027 Sep 25 14:56:23 desky kernel: RDX: ffff8d845019cec8 RSI: 0000000000000001 RDI: ffff8d845019cec0 Sep 25 14:56:23 desky kernel: RBP: ffffae1ca88c7bc8 R08: 0000000000000000 R09: 0000000000000000 Sep 25 14:56:23 desky kernel: R10: 0000000000000000 R11: 0000000000000004 R12: ffffffffc1db1faa Sep 25 14:56:23 desky kernel: R13: ffffffffc1db2ab4 R14: 0000000000000001 R15: ffffae1ca88c7bd8 Sep 25 14:56:23 desky kernel: FS: 00007fb1baf31940(0000) GS:ffff8d849c870000(0000) knlGS:0000000000000000 Sep 25 14:56:23 desky kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Sep 25 14:56:23 desky kernel: CR2: 00007fb1b2860020 CR3: 00000001705a9004 CR4: 0000000000772ef0 Sep 25 14:56:23 desky kernel: PKRU: 55555558 Sep 25 14:56:23 desky kernel: Call Trace: Sep 25 14:56:23 desky kernel: <TASK> Sep 25 14:56:23 desky kernel: xe_bo_cpu_fault_fastpath+0x11e/0x220 [xe] Sep 25 14:56:23 desky kernel: xe_bo_cpu_fault+0x84/0x410 [xe] Sep 25 14:56:23 desky kernel: ? __x64_sys_mmap+0x33/0x50 Sep 25 14:56:23 desky kernel: ? x64_sys_call+0x1b2e/0x20d0 Sep 25 14:56:23 desky kernel: ? do_syscall_64+0x9d/0x1f0 Sep 25 14:56:23 desky kernel: ? __check_object_size+0x4a/0x2e0 Sep 25 14:56:23 desky kernel: __do_fault+0x36/0x190 Sep 25 14:56:23 desky kernel: do_fault+0xcf/0x570 Sep 25 14:56:23 desky kernel: __handle_mm_fault+0x92b/0xfe0 Sep 25 14:56:23 desky kernel: ? ktime_get_mono_fast_ns+0x39/0xd0 Sep 25 14:56:23 desky kernel: handle_mm_fault+0x164/0x2c0 Sep 25 14:56:23 desky kernel: do_user_addr_fault+0x2cb/0x840 Sep 25 14:56:23 desky kernel: exc_page_fault+0x75/0x180 Sep 25 14:56:23 desky kernel: asm_exc_page_fault+0x27/0x30 Sep 25 14:56:23 desky kernel: RIP: 0033:0x7fb1bc388bb7 Sep 25 14:56:23 desky kernel: Code: 48 ff c7 48 01 fe 48 8d 54 11 80 0f 1f 84 00 00 00 00 00 c5 fe 6f 0e c5 fe 6f 56 20 c5 fe 6f 5e 40 c5 fe 6f 66 60 48 83 ee 80 <c5> fd 7f 0f c5 fd 7> Sep 25 14:56:23 desky kernel: RSP: 002b:00007ffd7814fad8 EFLAGS: 00010207 Sep 25 14:56:23 desky kernel: RAX: 00007fb1b2860000 RBX: 0000000000000690 RCX: 00007fb1b2860000 Sep 25 14:56:23 desky kernel: RDX: 00007fb1b2860610 RSI: 0000556eda79f4c0 RDI: 00007fb1b2860020 Sep 25 14:56:23 desky kernel: RBP: 00007ffd7814fb60 R08: 0000000000000000 R09: 000000012be0e000 Sep 25 14:56:23 desky kernel: R10: 00007fb1b2860000 R11: 0000000000000246 R12: 0000556edd39a240 Sep 25 14:56:23 desky kernel: R13: 00007fb1b2dcb010 R14: 0000556eda79f420 R15: 0000000000000000 Sep 25 14:56:23 desky kernel: </TASK> Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5250 Fixes: `c2ae94cf8c` ("drm/xe: Convert the CPU fault handler for exhaustive eviction") Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250929112649.6131-1-thomas.hellstrom@linux.intel.com (cherry picked from commit `8f1756a7ea`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-02 21:57:52 -07:00
Michal Wajdeczko	3734c8184d	drm/xe/vf: Don't claim support for firmware late-bind if VF In general, the VFs can't load firmwares so attempt to initialize the firmware late-bind component leads to errors like: [] xe 0000:03:00.1: [drm] ERROR Late bind component not bound Fixes: `918bd789d6` ("drm/xe/xe_late_bind_fw: Introduce xe_late_bind_fw") Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6190 Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Link: https://lore.kernel.org/r/20250928174811.198933-3-michal.wajdeczko@intel.com (cherry picked from commit `e35e288090`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-02 21:57:52 -07:00
Michal Wajdeczko	de61d5944c	drm/xe/vf: Rename sriov_update_device_info This is a VF only function and its name should reflect that to avoid any confusion. Move the VF check to the caller side. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://lore.kernel.org/r/20250928174811.198933-2-michal.wajdeczko@intel.com (cherry picked from commit `b88bb1eefa`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-02 21:57:51 -07:00

1 2 3 4 5 ...

118416 Commits