During driver probe we might be briefly using CT safe mode, which
is based on a delayed work, but usually we are able to stop this
once we have IRQ fully operational. However, if we abort the probe
quite early then during unwind we might try to destroy the workqueue
while there is still a pending delayed work that attempts to restart
itself which triggers a WARN.
This was recently observed during unsuccessful VF initialization:
[ ] xe 0000:00:02.1: probe with driver xe failed with error -62
[ ] ------------[ cut here ]------------
[ ] workqueue: cannot queue safe_mode_worker_func [xe] on wq xe-g2h-wq
[ ] WARNING: CPU: 9 PID: 0 at kernel/workqueue.c:2257 __queue_work+0x287/0x710
[ ] RIP: 0010:__queue_work+0x287/0x710
[ ] Call Trace:
[ ] delayed_work_timer_fn+0x19/0x30
[ ] call_timer_fn+0xa1/0x2a0
Exit the CT safe mode on unwind to avoid that warning.
Fixes: 09b286950f ("drm/xe/guc: Allow CTB G2H processing without G2H IRQ")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250612220937.857-3-michal.wajdeczko@intel.com
(cherry picked from commit 2ddbb73ec2)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
By default, HPD was disabled on SN65DSI86 bridge. When the driver was
added (commit "a095f15c00e27"), the HPD_DISABLE bit was set in pre-enable
call which was moved to other function calls subsequently.
Later on, commit "c312b0df3b13" added detect utility for DP mode. But with
HPD_DISABLE bit set, all the HPD events are disabled[0] and the debounced
state always return 1 (always connected state).
Set HPD_DISABLE bit conditionally based on display sink's connector type.
Since the HPD_STATE is reflected correctly only after waiting for debounce
time (~100-400ms) and adding this delay in detect() is not feasible
owing to the performace impact (glitches and frame drop), remove runtime
calls in detect() and add hpd_enable()/disable() bridge hooks with runtime
calls, to detect hpd properly without any delay.
[0]: <https://www.ti.com/lit/gpn/SN65DSI86> (Pg. 32)
Fixes: c312b0df3b ("drm/bridge: ti-sn65dsi86: Implement bridge connector operations for DP")
Cc: Max Krummenacher <max.krummenacher@toradex.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Ernest Van Hoecke <ernest.vanhoecke@toradex.com>
Signed-off-by: Jayesh Choudhary <j-choudhary@ti.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20250624044835.165708-1-j-choudhary@ti.com
An earlier patch fixed a build failure with clang, but I still see the
same problem with some configurations using gcc:
drivers/gpu/drm/i915/i915_pmu.c: In function 'config_mask':
include/linux/compiler_types.h:568:38: error: call to '__compiletime_assert_462' declared with attribute error: BUILD_BUG_ON failed: bit > BITS_PER_TYPE(typeof_member(struct i915_pmu, enable)) - 1
drivers/gpu/drm/i915/i915_pmu.c:116:3: note: in expansion of macro 'BUILD_BUG_ON'
116 | BUILD_BUG_ON(bit >
As I understand it, the problem is that the function is not always fully
inlined, but the __builtin_constant_p() can still evaluate the argument
as being constant.
Marking it as __always_inline so far works for me in all configurations.
Fixes: a7137b1825 ("drm/i915/pmu: Fix build error with GCOV and AutoFDO enabled")
Fixes: a644fde77f ("drm/i915/pmu: Change bitmask of enabled events to u32")
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20250620111824.3395007-1-arnd@kernel.org
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit ef69f9dd1c)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Prevent other bits of mailbox power limit from being overwritten with 0.
This issue was due to a missing read and modify of current power limit,
before setting a requested mailbox power limit, which is added in this
patch.
v2:
- Improve commit message. (Anshuman)
v3:
- Rebase.
- Rephrase commit message. (Riana)
- Add read-modify-write variant of xe_hwmon_pcode_write_power_limit()
i.e. xe_hwmon_pcode_rmw_power_limit(). (Badal)
- Use xe_hwmon_pcode_rmw_power_limit() to set mailbox power limits.
- Remove xe_hwmon_pcode_write_power_limit() as all mailbox power limits
writes use xe_hwmon_pcode_rmw_power_limit() only.
v4:
- Use PWR_LIM in place of (PWR_LIM_EN | PWR_LIM_VAL) wherever
applicable. (Riana)
Fixes: 25a2aa779f ("drm/xe/hwmon: Add support to manage power limits though mailbox")
Reviewed-by: Riana Tauro <riana.tauro@intel.com>
Signed-off-by: Karthik Poosa <karthik.poosa@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://lore.kernel.org/r/20250617120030.612819-1-karthik.poosa@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 8aa7306631)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
When EDID is retrieved via drm_edid_raw(), it doesn't guarantee to
return proper EDID bytes the caller wants: it may be either NULL (that
leads to an Oops) or with too long bytes over the fixed size raw_edid
array (that may lead to memory corruption). The latter was reported
actually when connected with a bad adapter.
Add sanity checks for drm_edid_raw() to address the above corner
cases, and return EDID_BAD_INPUT accordingly.
Fixes: 48edb2a425 ("drm/amd/display: switch amdgpu_dm_connector to use struct drm_edid")
Link: https://bugzilla.suse.com/show_bug.cgi?id=1236415
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 648d3f4d20)
Cc: stable@vger.kernel.org
Enable the cleaner shader for other GFX9.x series of GPUs to provide
data isolation between GPU workloads. The cleaner shader is responsible
for clearing the Local Data Store (LDS), Vector General Purpose
Registers (VGPRs), and Scalar General Purpose Registers (SGPRs), which
helps prevent data leakage and ensures accurate computation results.
This update extends cleaner shader support to GFX9.x GPUs, previously
available for GFX9.4.2. It enhances security by clearing GPU memory
between processes and maintains a consistent GPU state across KGD and
KFD workloads.
Cc: Manu Rastogi <manu.rastogi@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 99808926d0)
Reading DPCD registers has side-effects in general. In particular
accessing registers outside of the link training register range
(0x102-0x106, 0x202-0x207, 0x200c-0x200f, 0x2216) is explicitly
forbidden by the DP v2.1 Standard, see
3.6.5.1 DPTX AUX Transaction Handling Mandates
3.6.7.4 128b/132b DP Link Layer LTTPR Link Training Mandates
Based on my tests, accessing the DPCD_REV register during the link
training of an UHBR TBT DP tunnel sink leads to link training failures.
Solve the above by using the DP_LANE0_1_STATUS (0x202) register for the
DPCD register access quirk.
Cc: <stable@vger.kernel.org>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://lore.kernel.org/r/20250605082850.65136-2-imre.deak@intel.com
(cherry picked from commit a40c5d727b)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
The drm_writeback_connector_cleanup have the signature:
static void drm_writeback_connector_cleanup(
struct drm_device *dev,
struct drm_writeback_connector *wb_connector)
But it is stored and used as a drmres_release_t
typedef void (*drmres_release_t)(struct drm_device *dev, void *res);
While the current code is valid and does not produce any warning, the
CFI runtime check (CONFIG_CFI_CLANG) can fail because the function
signature is not the same as drmres_release_t.
In order to fix this, change the function signature to match what is
expected by drmres_release_t.
Fixes: 1914ba2b91 ("drm: writeback: Create drmm variants for drm_writeback_connector initialization")
Suggested-by: Mark Yacoub <markyacoub@google.com>
Reviewed-by: Maíra Canal <mcanal@igalia.com>
Link: https://lore.kernel.org/r/20250429-drm-fix-writeback-cleanup-v2-1-548ff3a4e284@bootlin.com
Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
q->gws is not updated atomically with qpd->mapped_gws_queue. If a
runlist is created between pqm_set_gws and update_queue it will
contain a queue which uses GWS in a process with no GWS allocated.
This will result in a scheduler hang.
Use q->properties.is_gws which is changed while holding the DQM lock.
Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit b98370220e)
Cc: stable@vger.kernel.org
Use the amdgpu fence container so we can store additional
data in the fence. This also fixes the start_time handling
for MCBP since we were casting the fence to an amdgpu_fence
and it wasn't.
Fixes: 3f4c175d62 ("drm/amdgpu: MCBP based on DRM scheduler (v9)")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit bf1cd14f9e)
Cc: stable@vger.kernel.org
This commit makes two key fixes to SDMA v4.4.2 handling:
1. disable UTC_L1 in sdma_cntl register when stopping SDMA engines
by reading the current value before modifying UTC_L1_ENABLE bit.
2. Ensure UTC_L1_ENABLE is consistently managed by:
- Adding the missing register write when enabling UTC_L1 during start
- Keeping UTC_L1 enabled by default as per hardware requirements
v2: Correct SDMA_CNTL setting (Philip)
Suggested-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 375bf56465)
Cc: stable@vger.kernel.org
Simplify SDMA v4_4_2 queue reset and stop operations by:
1. Removing GET_INST(SDMA0) conversion for ring->me
2. Using the logical instance ID (ring->me) directly
3. Maintaining consistent behavior with other SDMA queue operations
This change aligns with the existing queue handling logic where
ring->me already represents the correct instance identifier.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 3bab282dfe)
Cc: stable@vger.kernel.org
This commit makes the following improvements to SDMA engine reset handling:
1. Clarifies in the function documentation that instance_id refers to a logical ID
2. Adds conversion from logical to physical instance ID before performing reset
using GET_INST(SDMA0, instance_id) macro
3. Improves error messaging to indicate when a logical instance reset fails
4. Adds better code organization with blank lines for readability
The change ensures proper SDMA engine reset by using the correct physical
instance ID while maintaining the logical ID interface for callers.
V2: Remove harvest_config check and convert directly to physical instance (Lijo)
Suggested-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 5efa6217c2)
Cc: stable@vger.kernel.org
[WHY]
Userspace currently is offered a range from 0-0xFF but the PWM is
programmed from 0-0xFFFF. This can be limiting to some software
that wants to apply greater granularity.
[HOW]
Convert internally to firmware values only when mapping custom
brightness curves because these are in 0-0xFF range. Advertise full
PWM range to userspace.
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Roman Li <roman.li@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 8dbd72cb79)
Cc: stable@vger.kernel.org
[WHY]
These fields are read for the explicit purpose of detecting embedded LTTPRs
(i.e. between host ASIC and the user-facing port), and thus need to
calculate the correct DPCD address offset based on LTTPR count to target
the appropriate LTTPR's DPCD register space with these queries.
[HOW]
Cascaded LTTPRs in a link each snoop and increment LTTPR count when queried
via DPCD read, so an LTTPR embedded in a source device (e.g. USB4 port on a
laptop) will always be addressible using the max LTTPR count seen by the
host. Therefore we simply need to use a recently added helper function to
calculate the correct DPCD address to target potentially embedded LTTPRs
based on the received LTTPR count.
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Michael Strauss <michael.strauss@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 791897f5c7)
Cc: stable@vger.kernel.org
Relocate the per-SDMA queue reset capability check from
kfd_topology_set_capabilities() to node_show() to ensure we read the
latest value of sdma.supported_reset after all IP blocks are initialized.
Fixes: ceb7114c96 ("drm/amdkfd: flag per-sdma queue reset supported to user space")
Reviewed-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit e17df7b086)
Fixes for v6.16-rc3
Display:
- Fixed DP output on SDM845
- Fixed 10nm DSI PLL init
GPU:
- SUBMIT ioctl error path leak fixes
- drm half of stall-on-fault fixes. Note there is a soft dependency,
to get correct mmu fault devcoredumps, on arm-smmu changes which
are not in this branch, but have already been merged by Linus. So
by the time Linus merges this, everything should be peachy.
- a7xx: Missing CP_RESET_CONTEXT_STATE
- Skip GPU component bind if GPU is not in the device table.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rob Clark <rob.clark@oss.qualcomm.com>
Link: https://lore.kernel.org/r/CACSVV03=OH74ip8O1xqb8RJWGyM4HFuUnWuR=p3zJR+-ko_AJA@mail.gmail.com