linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-04 02:34:10 -04:00

Author	SHA1	Message	Date
Daniele Ceraolo Spurio	0387d46ea7	drm/xe/pxp: Add GSC session initialization support A session is initialized (i.e. started) by sending a message to the GSC. The initialization will be triggered when a user opts-in to using PXP; the interface for that is coming in a follow-up patch in the series. v2: clean up error messages, use new ARB define (John) Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-7-daniele.ceraolospurio@intel.com	2025-02-03 11:51:15 -08:00
Daniele Ceraolo Spurio	3b506d73ec	drm/xe/pxp: Handle the PXP termination interrupt When something happens to the session, the HW generates a termination interrupt. In reply to this, the driver is required to submit an inline session termination via the VCS, trigger the global termination and notify the GSC FW that the session is now invalid. v2: rename ARB define to make it cleaner to move it to uapi (John) v3: fix parameter name in documentation Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-6-daniele.ceraolospurio@intel.com	2025-02-03 11:51:13 -08:00
Daniele Ceraolo Spurio	96e84a2f5a	drm/xe/pxp: Add GSC session invalidation support After a session is terminated, we need to inform the GSC so that it can clean up its side of the allocation. This is done by sending an invalidation command with the session ID. The invalidation will be triggered in response to a termination interrupt, whose handling is coming in the next patch in the series. v2: Better comment and error messages (John) Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-5-daniele.ceraolospurio@intel.com	2025-02-03 11:51:11 -08:00
Daniele Ceraolo Spurio	f0c06677d1	drm/xe/pxp: Add VCS inline termination support The key termination is done with a specific submission to the VCS engine. This flow will be triggered in response to a termination interrupt, whose handling is coming in a follow-up patch in the series. v2: clean up defines and command emission code. (John) Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-4-daniele.ceraolospurio@intel.com	2025-02-03 11:51:09 -08:00
Daniele Ceraolo Spurio	dcdd6b84d9	drm/xe/pxp: Allocate PXP execution resources PXP requires submissions to the HW for the following operations: 1) Key invalidation, done via the VCS engine 2) Communication with the GSC FW for session management, done via the GSCCS. Key invalidation submissions are serialized (only 1 termination can be serviced at a given time) and done via GGTT, so we can allocate a simple BO and a kernel queue for it. Submissions for session management are tied to a PXP client (identified by a unique host_session_id); from the GSC POV this is a user-accessible construct, so all related submissions must be done via PPGTT. The driver does not currently support PPGTT submission from within the kernel, so to add this support, the following changes have been included: - a new type of kernel-owned VM (marked as GSC), required to ensure we don't use fault mode on the engine and to mark the different lock usage with lockdep. - a new function to map a BO into a VM from within the kernel. v2: improve comments and function name, remove unneeded include (John) v3: fix variable/function names in documentation Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-3-daniele.ceraolospurio@intel.com	2025-02-03 11:51:05 -08:00
Daniele Ceraolo Spurio	ff48e05d8d	drm/xe/pxp: Initialize PXP structure and KCR reg As the first step towards adding PXP support, hook in the PXP init function, allocate the PXP structure and initialize the KCR register to allow PXP HWDRM sessions. v2: remove unneeded includes, free PXP memory on error (John) Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-2-daniele.ceraolospurio@intel.com	2025-02-03 11:51:02 -08:00
Lucas De Marchi	ae5d9cde9b	drm/xe: Remove xe_dummy_exit() Since commit `014125c64d` ("drm/xe: Support 'nomodeset' kernel command-line option") the dummy exit is not needed anymore since the caller checks for a NULL pointer. Drop it. Reviewed-by: Raag Jadav <raag.jadav@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250131223908.4147195-1-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-02-03 07:51:45 -08:00
Riana Tauro	d9bc304437	drm/xe: Skip survivability mode for VF Follow the probe flow in case of VF and do not enter survivability mode in case of pcode init failure. Fixes: `5e940312a2` ("drm/xe: Add functions and sysfs for boot survivability") Suggested-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Signed-off-by: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250131080527.2256475-1-riana.tauro@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-01-31 05:40:10 -05:00
Maarten Lankhorst	65e366ace5	drm/xe/display: Use a single early init call for display Now that interrupts are disabled for xe_display_init_noaccel, both xe_display_init_noirq and xe_display_init_noaccel run in the same context. This means that we can get rid of the 3 different init calls. Without interrupts, nothing is touching display up to this point. Unify those 3 early display calls into a single xe_display_init_early(), this makes the init sequence cleaner, and display less tangled during init. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250121142850.4960-3-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>	2025-01-31 09:42:18 +01:00
Maarten Lankhorst	f595fe5f6a	drm/xe: Defer irq init until after xe_display_init_noaccel As stated in previous commit, we have to move interrupt handling until after xe_display_init_noaccel, as using memirqs would require an allocation. A full solution will of course require memirq allocation to be moved, but the first part only focuses on the required changes to display. Reviewed-by: Ilia Levi <ilia.levi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250121142850.4960-2-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>	2025-01-31 09:42:18 +01:00
Maarten Lankhorst	cf29a866a1	drm/xe/display: Add intel_plane_initial_vblank_wait We're changing the driver to have no interrupts during early init for Xe, so we poll the PIPE_FRMSTMSMP counter instead. Interrupts cannot be enabled during FB readout because memirqs require an allocation. This would overwrite the FB we want to read out. While it might be possible to do the same in i915 and run it without interrupts, the platforms i915 supports had a less clear distinction between display and graphics. For this reason I chose only to touch Xe for now. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250121142850.4960-1-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>	2025-01-31 09:42:17 +01:00
Lucas De Marchi	220ed69043	Merge drm/drm-next into drm-xe-next Backmerge drm-next to get the common APIs and refactors as well as getting the display changes from i915 in xe so the probe order can be improved. Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-01-30 14:35:52 -08:00
Jakub Kolakowski	b73aebc7a1	drm/xe/pf: Add runtime registers for graphics gen >= 30 Add missing runtime registers for graphics versions of 3000 or higher. This is required for Xe3 where additionally we have MIRROR_L3BANK_ENABLE register. Signed-off-by: Jakub Kolakowski <jakub1.kolakowski@intel.com> Suggested-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Cc: Adam Miszczak <adam.miszczak@linux.intel.com> Cc: Jakub Kolakowski <jakub1.kolakowski@intel.com> Cc: Lukasz Laguna <lukasz.laguna@intel.com> Cc: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Piotr Piorkowski <piotr.piorkowski@intel.com> Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Tested-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250128110300.2840596-2-jakub1.kolakowski@intel.com	2025-01-30 20:12:57 +01:00
Gustavo Sousa	c13a42f210	drm/xe: Fix sort order of .o lists in Makefile The Makefile for xe asks us to keep the lists of object files sorted: # Please keep these build lists sorted! Reshuffle the lists into the correct sort order. That was done by filtering each unsorted list through 'LC_ALL=C sort'. Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250115140812.20799-1-gustavo.sousa@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2025-01-30 09:41:59 -08:00
Michal Wajdeczko	33f17e2cbd	drm/xe/pf: Reset GuC VF config when unprovisioning critical resource GuC firmware counts received VF configuration KLVs and may start validation of the complete VF config even if some resources were unprovisioned in the meantime, leading to unexpected errors like: $ echo 1 | sudo tee /sys/kernel/debug/dri/0000:00:02.0/gt0/vf1/contexts_quota $ echo 0 | sudo tee /sys/kernel/debug/dri/0000:00:02.0/gt0/vf1/contexts_quota $ echo 1 | sudo tee /sys/kernel/debug/dri/0000:00:02.0/gt0/vf1/doorbells_quota $ echo 0 | sudo tee /sys/kernel/debug/dri/0000:00:02.0/gt0/vf1/doorbells_quota $ echo 1 | sudo tee /sys/kernel/debug/dri/0000:00:02.0/gt0/vf1/ggtt_quota tee: '/sys/kernel/debug/dri/0000:00:02.0/gt0/vf1/ggtt_quota': Input/output error To mitigate this problem trigger explicit VF config reset after unprovisioning any of the critical resources (GGTT, context or doorbell IDs) that GuC is monitoring. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250129195947.764-3-michal.wajdeczko@intel.com	2025-01-30 17:10:41 +01:00
Michal Wajdeczko	21ccac0e22	drm/xe/pf: Don't send BEGIN_ID if VF has no context/doorbells It turned out that GuC validates VF configuration immediately after receiving "some" set of configuration KLVs and complains if one of the critical, from GuC understanding, resource is left unprovisioned, even if PF should be still allowed to make late VF config adjustments, since VF was not yet started. This issue was discovered after we decided to asynchronously re-send configuration KLVs after GT reset/resume, as then fair VF auto-provisioning could already allocate some of the resources, which was a prerequisite for sending those config KLVs: # fair GGTT provisioning [] xe 0000:00:02.0: [drm] GT0: PF: pushed VF1 config with 2 KLVs: [] xe 0000:00:02.0: [drm] GT0: { key 0x0001 : 64b value 0x176a000 } # ggtt_start [] xe 0000:00:02.0: [drm] GT0: { key 0x0002 : 64b value 0xfd696000 } # ggtt_size [] xe 0000:00:02.0: [drm] GT0: PF: VF1 provisioned with 4251541504 (3.96 GiB) GGTT # re-provisioning worker [] xe 0000:00:02.0: [drm] ERROR GT0: H2G request 0x5503 failed: error 0x60 hint 0x0 [] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF1 14 config KLVs (-EIO) [] xe 0000:00:02.0: [drm] GT0: { key 0x0001 : 64b value 0x176a000 } # ggtt_start [] xe 0000:00:02.0: [drm] GT0: { key 0x0002 : 64b value 0xfd696000 } # ggtt_size [] xe 0000:00:02.0: [drm] GT0: { key 0x8a0b : 32b value 0 } # begin_ctx_id [] xe 0000:00:02.0: [drm] GT0: { key 0x0004 : 32b value 0 } # num_contexts [] xe 0000:00:02.0: [drm] GT0: { key 0x8a0a : 32b value 0 } # begin_db_id [] xe 0000:00:02.0: [drm] GT0: { key 0x0006 : 32b value 0 } # num_doorbells [] xe 0000:00:02.0: [drm] GT0: { key 0x8a01 : 32b value 0 } # exec_quantum [] xe 0000:00:02.0: [drm] GT0: { key 0x8a02 : 32b value 0 } # preempt_timeout [] xe 0000:00:02.0: [drm] GT0: { key 0x8a03 : 32b value 0 } # cat_error_count [] xe 0000:00:02.0: [drm] GT0: { key 0x8a04 : 32b value 0 } # engine_reset_count [] xe 0000:00:02.0: [drm] GT0: { key 0x8a05 : 32b value 0 } # page_fault_count [] xe 0000:00:02.0: [drm] GT0: { key 0x8a06 : 32b value 0 } # guc_time_us [] xe 0000:00:02.0: [drm] GT0: { key 0x8a07 : 32b value 0 } # irq_time_us [] xe 0000:00:02.0: [drm] GT0: { key 0x8a08 : 32b value 0 } # doorbell_time_us [] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF1 configuration (-EIO) To avoid such errors stop sending BEGIN_CONTEXT/DOORBELL_ID KLVs if no GuC context/doorbell IDs were provisioned to VF. Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4176 Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250129195947.764-2-michal.wajdeczko@intel.com	2025-01-30 17:10:39 +01:00
Francois Dugast	8f6ddb4ab5	drm/xe/gt_pagefault: Print engine class string The engine class index which is printed here is an internal representation for debugging. It is _not_ an index based on DRM_XE_ENGINE_CLASS_* values provided in the uAPI. Add the string representation of the engine class to the output in order to limit possible confusion by users when analyzing the logs. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250129175241.338043-1-francois.dugast@intel.com Signed-off-by: Francois Dugast <francois.dugast@intel.com>	2025-01-30 09:41:06 +01:00
Dave Airlie	1c470f4f61	Merge tag 'amd-drm-fixes-6.14-2025-01-29' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-fixes-6.14-2025-01-29: amdgpu: - GC 12 fix - Aldebaran fix - DCN 3.5 fix - Freesync fix amdkfd: - Per queue reset fix - MES fix Signed-off-by: Dave Airlie <airlied@redhat.com> # -----BEGIN PGP SIGNATURE----- # # iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZ5qcgwAKCRC93/aFa7yZ # 2GHEAP4qGRwRRm/XzGsT7t4IC6l1ALia3IycCpm8BusDpLIVlAD9HSSpKswHtNou # Zjz7N/t791BIeS/cz36ICNqYCmgQ2wY= # =1Q5i # -----END PGP SIGNATURE----- # gpg: Signature made Thu 30 Jan 2025 07:24:19 AEST # gpg: using EDDSA key 203B921D836B5735349902BDBDDFF6856BBC99D8 # gpg: Can't check signature: No public key From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250129213037.3966625-1-alexander.deucher@amd.com	2025-01-30 14:31:38 +10:00
Lucas De Marchi	7748289df5	drm/xe/guc: Fix size_t print format Use %zx format to print size_t to remove the following warning when building for i386: >> drivers/gpu/drm/xe/xe_guc_ct.c:1727:43: warning: format specifies type 'unsigned long' but the argument has type 'size_t' (aka 'unsigned int') [-Wformat] 1727 | drm_printf(p, "[CTB].length: 0x%lx\n", snapshot->ctb_size); | ~~~ ^~~~~~~~~~~~~~~~~~ | %zx Cc: José Roberto de Souza <jose.souza@intel.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202501281627.H6nj184e-lkp@intel.com/ Fixes: `cb1f868ca1` ("drm/xe: Make GUC binaries dump consistent with other binaries in devcoredump") Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250128154242.3371687-1-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-01-29 10:53:21 -08:00
Rodrigo Vivi	55d4b69861	Revert "drm/xe/lnl: Enable GuC SLPC DCC task" This reverts commit `50554bf3e5`. DCC in LNL should be disabled. It was a mistake to decide to go against GuC platform defaults in this case and this could lead to regressions in some TDP limited scenarios instead of helping. Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Cc: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250128223248.660748-1-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-01-29 09:10:59 -05:00
Matt Atwood	16016ade13	drm/xe/ptl: Update the PTL pci id table Update to current bspec table. Bspec: 72574 Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com> Reviewed-by: Clint Taylor <Clinton.A.Taylor@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250128175102.45797-1-matthew.s.atwood@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-01-28 18:27:46 -05:00
Shekhar Chauhan	fa8ffaae1b	drm/xe/bmg: Add new PCI IDs Add 3 new PCI IDs for BMG. v2: Fix typo -> Replace '.' with ',' Signed-off-by: Shekhar Chauhan <shekhar.chauhan@intel.com> Reviewed-by: Clint Taylor <Clinton.A.Taylor@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250128162015.3288675-1-shekhar.chauhan@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-01-28 17:47:44 -05:00
Melissa Wen	7f2b5237e3	drm/amd/display: restore invalid MSA timing check for freesync This restores the original behavior that gets min/max freq from EDID and only set DP/eDP connector as freesync capable if "sink device is capable of rendering incoming video stream without MSA timing parameters", i.e., `allow_invalid_MSA_timing_params` is true. The condition was mistakenly removed by `0159f88a99` ("drm/amd/display: remove redundant freesync parser for DP"). CC: Mario Limonciello <mario.limonciello@amd.com> CC: Alex Hung <alex.hung@amd.com> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3915 Fixes: `0159f88a99` ("drm/amd/display: remove redundant freesync parser for DP") Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Melissa Wen <mwen@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2025-01-28 16:26:13 -05:00
Prike Liang	9078a5bfa2	drm/amdkfd: only flush the validate MES contex The following page fault was observed during the KFD process release. In this particular error case, the HIP test (./MemcpyPerformance -h) does not require the queue. As a result, the process_context_addr was not assigned when the KFD process was released, ultimately leading to this page fault during the execution of the function kfd_process_dequeue_from_all_devices(). [345962.294891] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:153 vmid:0 pasid:0) [345962.295333] amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x0000000000000000 from client 10 [345962.295775] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000B33 [345962.296097] amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: CPC (0x5) [345962.296394] amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x1 [345962.296633] amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x1 [345962.296876] amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x3 [345962.297135] amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x1 [345962.297377] amdgpu 0000:03:00.0: amdgpu: RW: 0x0 [345962.297682] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:169 vmid:0 pasid:0) Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2025-01-28 16:24:39 -05:00
loanchen	f88192d233	drm/amd/display: Correct register address in dcn35 [Why] the offset address of mmCLK5_spll_field_8 was incorrect for dcn35 which causes SSC not to be enabled. Reviewed-by: Charlene Liu <charlene.liu@amd.com> Signed-off-by: Lo-An Chen <lo-an.chen@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2025-01-28 16:23:30 -05:00
Lijo Lazar	819bf6662b	drm/amd/pm: Mark MM activity as unsupported Aldebaran doesn't support querying MM activity percentage. Keep the field as 0xFFs to mark it as unsupported. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2025-01-28 16:23:06 -05:00
Kenneth Feng	5cda56bd86	drm/amd/amdgpu: change the config of cgcg on gfx12 change the config of cgcg on gfx12 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.12.x	2025-01-28 16:22:39 -05:00
Jay Cornwall	f214b7beb0	drm/amdkfd: Block per-queue reset when halt_if_hws_hang=1 The purpose of halt_if_hws_hang is to preserve GPU state for driver debugging when queue preemption fails. Issuing per-queue reset may kill wavefronts which caused the preemption failure. Signed-off-by: Jay Cornwall <jay.cornwall@amd.com> Reviewed-by: Jonathan Kim <Jonathan.Kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.12.x	2025-01-28 16:22:02 -05:00
Riana Tauro	8b47c9cdb6	drm/xe: Initialize mei-gsc and vsec in survivability mode Initialize mei-gsc in survivability mode and disable HECI interrupts. Also initialize vsec in survivability mode Signed-off-by: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Alexander Usyskin <alexander.usyskin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250128095632.1294722-4-riana.tauro@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-01-28 08:58:46 -05:00
Riana Tauro	256daa32c9	drm/xe: Enable Boot Survivability mode Enable boot survivability mode if pcode initialization fails and if boot status indicates a failure. In this mode, drm card is not exposed and driver probe returns success after loading the bare minimum to allow firmware to be flashed via mei. v2: abstract survivability mode variable add BMG check inside function (Jani, Rodrigo) v3: return -EBUSY during system suspend (Anshuman) check survivability mode in pci probe only on error Signed-off-by: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250128095632.1294722-3-riana.tauro@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-01-28 08:58:46 -05:00
Riana Tauro	5e940312a2	drm/xe: Add functions and sysfs for boot survivability Boot Survivability is a software based workflow for recovering a system in a failed boot state. Here system recoverability is concerned with recovering the firmware responsible for boot. This is implemented by loading the driver with bare minimum (no drm card) to allow the firmware to be flashed through mei-gsc and collect telemetry. The driver's probe flow is modified such that it enters survivability mode when pcode initialization is incomplete and boot status denotes a failure. In this mode, drm card is not exposed and presence of survivability_mode entry in PCI sysfs is used to indicate survivability mode and provide additional information required for debug This patch adds initialization functions and exposes admin readable sysfs entries The new sysfs will have the below layout /sys/bus/.../bdf ├── survivability_mode v2: reorder headers fix doc remove survivability info and use mode to display information use separate function for logging survivability information for critical error (Rodrigo) v3: use for loop use dev logs instead of drm use helper function for aux history(Rodrigo) remove unnecessary error check of greater than max_scratch as we are reading only 3 bit v4: fix checkpatch warnings fix space (Rodrigo) rename register Signed-off-by: Riana Tauro <riana.tauro@intel.com> Acked-by: Ashwin Kumar Kulkarni <ashwin.kumar.kulkarni@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250128095632.1294722-2-riana.tauro@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-01-28 08:58:45 -05:00
José Roberto de Souza	cb1f868ca1	drm/xe: Make GUC binaries dump consistent with other binaries in devcoredump All other (hwsp, hwctx and vmas) binaries follow this format: [name].length: 0x1000 [name].data: xxxxxxx [name].error: errno The error one is just in case for some reason it was not able to capture the binary. So these GuC binaries should follow the same pattern. v2: - renamed GUC binary to LOG Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250123202307.95103-3-jose.souza@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-01-27 19:41:07 -08:00
Lucas De Marchi	2c95bbf500	drm/xe: Fix and re-enable xe_print_blob_ascii85() Commit `70fb86a85d` ("drm/xe: Revert some changes that break a mesa debug tool") partially reverted some changes to workaround breakage caused to mesa tools. However, in doing so it also broke fetching the GuC log via debugfs since xe_print_blob_ascii85() simply bails out. The fix is to avoid the extra newlines: the devcoredump interface is line-oriented and adding random newlines in the middle breaks it. If a tool is able to parse it by looking at the data and checking for chars that are out of the ascii85 space, it can still do so. A format change that breaks the line-oriented output on devcoredump however needs better coordination with existing tools. v2: Add suffix description comment v3: Reword explanation of xe_print_blob_ascii85() calling drm_puts() in a loop Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Julia Filipchuk <julia.filipchuk@intel.com> Cc: José Roberto de Souza <jose.souza@intel.com> Cc: stable@vger.kernel.org Fixes: `70fb86a85d` ("drm/xe: Revert some changes that break a mesa debug tool") Fixes: `ec1455ce7e` ("drm/xe/devcoredump: Add ASCII85 dump helper function") Link: https://patchwork.freedesktop.org/patch/msgid/20250123202307.95103-2-jose.souza@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-01-27 19:40:00 -08:00
Lucas De Marchi	a37934ea75	drm/xe/devcoredump: Move exec queue snapshot to Contexts section Having the exec queue snapshot inside a "GuC CT" section was always wrong. Commit `c28fd6c358` ("drm/xe/devcoredump: Improve section headings and add tile info") tried to fix that bug, but with that also broke the mesa tool that parses the devcoredump, hence it was reverted in commit `70fb86a85d` ("drm/xe: Revert some changes that break a mesa debug tool"). With the mesa tool also fixed, this can propagate as a fix on both kernel and userspace side to avoid unnecessary headache for a debug feature. Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Julia Filipchuk <julia.filipchuk@intel.com> Cc: José Roberto de Souza <jose.souza@intel.com> Cc: stable@vger.kernel.org Fixes: `70fb86a85d` ("drm/xe: Revert some changes that break a mesa debug tool") Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250123051112.1938193-2-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-01-27 15:10:02 -08:00
John Harrison	ef34861098	drm/xe: Upgrade complaint about missing slice info The steering code needs to know slice/subslice counts and this information should be retrieved from the hwconfig table. However, earlier platforms don't have it, hence the KMD has a fallback path. Newer platforms really should have the entries and if they are missing that is a bug that needs to be fixed in the table. So update the complaint to be an error on newer platforms and remove it completely for older ones that we know are bad (but are not POR for the Xe driver anyway). Also, re-word the message a little to make it clearer what the issue is. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250118005403.2960807-1-John.C.Harrison@Intel.com	2025-01-27 12:25:19 -08:00
Michal Wajdeczko	a4d1c5d0b9	drm/xe/pf: Move VFs reprovisioning to worker Since the GuC is reset during GT reset, we need to re-send the entire SR-IOV provisioning configuration to the GuC. But since this whole configuration is protected by the PF master mutex and we can't avoid making allocations under this mutex (like during LMEM provisioning), we can't do this reprovisioning from gt-reset path if we want to be reclaim-safe. Move VFs reprovisioning to an async worker that we will start from the gt-reset path. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250125215505.720-1-michal.wajdeczko@intel.com	2025-01-27 20:34:18 +01:00
Michal Wajdeczko	14b6674608	drm/xe/pf: Use GuC Buffer Cache during policy provisioning Start using GuC buffer cache for the SRIOV policy configuration actions. This is a required step before we could declare SRIOV PF as being reclaim safe. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250124185247.676-1-michal.wajdeczko@intel.com	2025-01-27 19:53:59 +01:00
Vinay Belgaumkar	897286f294	drm/xe/pmu: Add GT C6 events Provide a PMU interface for GT C6 residency counters. The interface is similar to the one available for i915, but gt is passed in the config when creating the event. Sample usage and output: $ perf list | grep gt-c6 xe_0000_00_02.0/gt-c6-residency/ [Kernel PMU event] $ tail /sys/bus/event_source/devices/xe_0000_00_02.0/events/gt-c6-residency* ==> /sys/bus/event_source/devices/xe_0000_00_02.0/events/gt-c6-residency <== event=0x01 ==> /sys/bus/event_source/devices/xe_0000_00_02.0/events/gt-c6-residency.unit <== ms $ perf stat -e xe_0000_00_02.0/gt-c6-residency,gt=0/ -I1000 # time counts unit events 1.001196056 1,001 ms xe_0000_00_02.0/gt-c6-residency,gt=0/ 2.005216219 1,003 ms xe_0000_00_02.0/gt-c6-residency,gt=0/ Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Riana Tauro <riana.tauro@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250124050411.2189060-6-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-01-27 08:56:27 -08:00
Lucas De Marchi	6ea5bf169a	drm/xe/pmu: Add attribute skeleton Add the generic support for defining new attributes. This only adds the macros and common infra for the event counters, but no counters yet. This is going to be added as follow up changes. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250124050411.2189060-5-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-01-27 08:55:04 -08:00
Lucas De Marchi	4ee64041bc	drm/xe/pmu: Get/put runtime pm on event init When the event is created, make sure runtime pm is taken and later put: in order to read an event counter the GPU needs to remain accessible and doing a get/put during perf's read is not possible it's holding a raw_spinlock. Suggested-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250124050411.2189060-4-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-01-27 08:55:03 -08:00
Lucas De Marchi	ef7ce39386	drm/xe/pmu: Extract xe_pmu_event_update() Like other pmu drivers, keep the update separate from the read so it can be called from other methods (like stop()) without side effects. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250124050411.2189060-3-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-01-27 08:55:03 -08:00
Lucas De Marchi	257a10c18e	drm/xe/pmu: Assert max gt XE_PMU_MAX_GT needs to be used due to a circular dependency, but we should make sure it doesn't go out of sync with XE_PMU_MAX_GT. Add a compile check for that. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250124050411.2189060-2-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-01-27 08:55:03 -08:00
Vinay Belgaumkar	011c1e246a	drm/xe/pmu: Enable PMU interface Basic PMU enabling patch. Setup the basic framework for adding events. Based on previous versions by Bommu Krishnaiah, Aravind Iddamsetty and Riana Tauro, using i915 and rapl as reference implementations. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250124050411.2189060-1-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-01-27 08:54:06 -08:00
Simona Vetter	64179a1416	Merge tag 'drm-misc-next-fixes-2025-01-24' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next-fixes for v6.14-rc1: - Fix a serious regression from commit `e4b5ccd392` ("drm/v3d: Ensure job pointer is set to NULL after job completion") - dmem cgroup Kconfig fix (acked by Tejun) - virtio: uaf in dma_buf free path - xlnx: kerneldoc Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/0d4a18f4-222c-4767-9169-e6350ce8fea5@linux.intel.com	2025-01-24 17:06:06 +01:00
Simona Vetter	7f751be540	Merge tag 'amd-drm-next-6.14-2025-01-24' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.14-2025-01-24: amdgpu: - Documentation fixes - SMU 13.x fixes - SR-IOV fix - Display fix - PCIe calculation fix - MES 12 fix - HUBP fix - Cursor fix - Enforce isolation fixes - GFX 12 fix - Use drm scheduler API helper rather than open coding it - Mark some debugging parameters as unsafe - PSP 14.x fix - Add cleaner shader support for gfx12 - Add subvp debugging flag - SDMA 4.4.x fix - Clarify some kernel log messages - clang fix - PCIe lane reporting fix - Documentation fix amdkfd: - Mark some debugging parameters as unsafe - Fix partial migration handling - Trap handler updates Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250124152153.3861868-1-alexander.deucher@amd.com	2025-01-24 17:01:41 +01:00
Aric Cyr	024771f3fb	drm/amd/display: Optimize cursor position updates [why] Updating the cursor enablement register can be a slow operation and accumulates when high polling rate cursors cause frequent updates asynchronously to the cursor position. [how] Since the cursor enable bit is cached there is no need to update the enablement register if there is no change to it. This removes the read-modify-write from the cursor position programming path in HUBP and DPP, leaving only the register writes. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Reviewed-by: Sung Lee <sung.lee@amd.com> Signed-off-by: Aric Cyr <Aric.Cyr@amd.com> Signed-off-by: Wayne Lin <wayne.lin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-24 09:56:28 -05:00
Aric Cyr	01130f5260	drm/amd/display: Add hubp cache reset when powergating [Why] When HUBP is power gated, the SW state can get out of sync with the hardware state causing cursor to not be programmed correctly. [How] Similar to DPP, add a HUBP reset function which is called wherever HUBP is initialized or powergated. This function will clear the cursor position and attribute cache allowing for proper programming when the HUBP is brought back up. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Reviewed-by: Sung Lee <sung.lee@amd.com> Signed-off-by: Aric Cyr <Aric.Cyr@amd.com> Signed-off-by: Wayne Lin <wayne.lin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-24 09:56:22 -05:00
Shaoyun Liu	335acfb64e	drm/amd/amdgpu: Enable scratch data dump for mes 12 MES internal will check CP_MES_MSCRATCH_LO/HI register to set scratch data location during ucode start, driver side need to start the MES one by one with different setting for each pipe Signed-off-by: Shaoyun Liu <shaoyun.liu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-24 09:56:13 -05:00
Mario Limonciello	7e4cb7dea2	drm/amd: Clarify kdoc for amdgpu.gttsize Effectively amdgpu.gttsize gets set to ~1/2 of RAM, but that's controlled by what the TTM page limit is set to. Clarify the kdoc. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-24 09:56:08 -05:00
Srinivasan Shanmugam	dc915275ea	drm/amd/amdgpu: Prevent null pointer dereference in GPU bandwidth calculation If the parent is NULL, adev->pdev is used to retrieve the PCIe speed and width, ensuring that the function can still determine these capabilities from the device itself. Fixes the below: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:6193 amdgpu_device_gpu_bandwidth() error: we previously assumed 'parent' could be null (see line 6180) drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 6170 static void amdgpu_device_gpu_bandwidth(struct amdgpu_device *adev, 6171 enum pci_bus_speed *speed, 6172 enum pcie_link_width *width) 6173 { 6174 struct pci_dev *parent = adev->pdev; 6175 6176 if (!speed || !width) 6177 return; 6178 6179 parent = pci_upstream_bridge(parent); 6180 if (parent && parent->vendor == PCI_VENDOR_ID_ATI) { ^^^^^^ If parent is NULL 6181 /* use the upstream/downstream switches internal to dGPU */ 6182 *speed = pcie_get_speed_cap(parent); 6183 *width = pcie_get_width_cap(parent); 6184 while ((parent = pci_upstream_bridge(parent))) { 6185 if (parent->vendor == PCI_VENDOR_ID_ATI) { 6186 /* use the upstream/downstream switches internal to dGPU */ 6187 *speed = pcie_get_speed_cap(parent); 6188 *width = pcie_get_width_cap(parent); 6189 } 6190 } 6191 } else { 6192 /* use the device itself */ --> 6193 *speed = pcie_get_speed_cap(parent); ^^^^^^ Then we are toasted here. 6194 *width = pcie_get_width_cap(parent); 6195 } 6196 } Fixes: `757e8b951c` ("drm/amdgpu: cache gpu pcie link width") Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-24 09:55:26 -05:00

1326651 Commits