[Why&How]
When using a 4k monitor when cursor caching is not supported due to
framebuffer being on an uncacheable address, enabling display refresh
from MALL would trigger corruption if SS is enabled.
Prevent entering SS if we are on the edge case and cursor caching is not
possible. Do this only if cursor size larger than a 64x64@4bpp. Pull the
cursor size calculation out of if condition since cursor address may not
be set on all platforms
Reviewed-by: Alvin Lee <Alvin.Lee2@amd.com>
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Acked-by: Brian Chang <Brian.Chang@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
DIG_FIFO_ERROR = 1 caused mst daisy chain 2nd monitor black.
[How]
We need to set dig fifo read start level = 7 before dig fifo reset during dig
fifo enable according to hardware designer's suggestion. If it is zero, it will
cause underflow or overflow and DIG_FIFO_ERROR = 1.
Reviewed-by: Alvin Lee <Alvin.Lee2@amd.com>
Reviewed-by: Aric Cyr <Aric.Cyr@amd.com>
Acked-by: Brian Chang <Brian.Chang@amd.com>
Signed-off-by: Wang Fudong <Fudong.Wang@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
In amdgpu_cs_ioctl, amdgpu_job_free could be performed ealier if there
is -ERESTARTSYS error. In this case, job->hw_fence could be not
initialized yet. Putting hw_fence during amdgpu_job_free could lead to a
use-after-free warning.
[How]
Check if drm_sched_job_init is performed before job_free by checking
s_fence.
v2: Check hw_fence.ops instead since it could be NULL if fence is not
initialized. Reverse the condition since !=NULL check is discouraged in
kernel.
Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
When ODM is enabled, H timing control register reset
to 0. Div mode manual field get overwritten causing
no display on certain modes for dcn314.
[How]
Use REG_UPDATE instead of REG_SET to set div_mode
field.
Reviewed-by: Charlene Liu <Charlene.Liu@amd.com>
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Acked-by: Brian Chang <Brian.Chang@amd.com>
Signed-off-by: Duncan Ma <duncan.ma@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Each index in the DPSTREAMCLK_CNTL register
phyiscally maps 1-to-1 with HPO stream encoder
instance. On the other hand, each index in
DTBCLK_P_CNTL physically maps 1-to-1 with OTG
instance.
Current DCN32 DPSTREAMCLK_CLK programing assumes
that OTG instance always maps 1-to-1 with
HPO stream encoder instance. This is not always
guaranteed and can result in blackscreen.
[How]
Program the correct dpstreamclk instance with
the correct dtbclk_p source.
Reviewed-by: Ariel Bernstein <Eric.Bernstein@amd.com>
Acked-by: Brian Chang <Brian.Chang@amd.com>
Signed-off-by: George Shen <george.shen@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Below driver load error will be printed, not friendly to end user.
amdgpu: ATOM BIOS: 113-D603GLXE-077
[drm] FRU: Failed to get size field
[drm:amdgpu_fru_get_product_info [amdgpu]] *ERROR* Failed to read FRU Manufacturer, ret:-5
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Commit 20f85ef89d ("drm/i915/backlight: use unique backlight device
names") added support for multiple backlight devices on dual panel
systems, but did so with error handling on -EEXIST from
backlight_device_register(). Unfortunately, that triggered a warning in
dmesg all the way down from sysfs_add_file_mode_ns() and
sysfs_warn_dup().
Instead of optimistically always attempting to register with the default
name ("intel_backlight", which we have to retain for backward
compatibility), check if a backlight device with the name exists first,
and, if so, use the card and connector based name.
v2: reworked on top of the patch commit 20f85ef89d
("drm/i915/backlight: use unique backlight device names")
v3: fixed the ref count leak(Jani N)
Fixes: 20f85ef89d ("drm/i915/backlight: use unique backlight device names")
Signed-off-by: Arun R Murthy <arun.r.murthy@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220808035750.3111046-1-arun.r.murthy@intel.com
(cherry picked from commit 4234ea3005)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
If the GuC CTs are full and we need to stall the request submission
while waiting for space, we save the stalled request and where the stall
occurred; when the CTs have space again we pick up the request submission
from where we left off.
If a full GT reset occurs, the state of all contexts is cleared and all
non-guilty requests are unsubmitted, therefore we need to restart the
stalled request submission from scratch. To make sure that we do so,
clear the saved request after a reset.
Fixes note: the patch that introduced the bug is in 5.15, but no
officially supported platform had GuC submission enabled by default
in that kernel, so the backport to that particular version (and only
that one) can potentially be skipped.
Fixes: 925dc1cf58 ("drm/i915/guc: Implement GuC submission tasklet")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: John Harrison <john.c.harrison@intel.com>
Cc: <stable@vger.kernel.org> # v5.15+
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220811210812.3239621-1-daniele.ceraolospurio@intel.com
(cherry picked from commit f922fbb0f2)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Crucible + recent Mesa seems to sometimes hit:
GEM_BUG_ON(num_ccs_blks > NUM_CCS_BLKS_PER_XFER)
And it looks like we can also trigger this with gem_lmem_swapping, if we
modify the test to use slightly larger object sizes.
Looking closer it looks like we have the following issues in
migrate_copy():
- We are using plain integer in various places, which we can easily
overflow with a large object.
- We pass the entire object size (when the src is lmem) into
emit_pte() and then try to copy it, which doesn't work, since we
only have a few fixed sized windows in which to map the pages and
perform the copy. With an object > 8M we therefore aren't properly
copying the pages. And then with an object > 64M we trigger the
GEM_BUG_ON(num_ccs_blks > NUM_CCS_BLKS_PER_XFER).
So it looks like our copy handling for any object > 8M (which is our
CHUNK_SZ) is currently broken on DG2.
Fixes: da0595ae91 ("drm/i915/migrate: Evict and restore the flatccs capable lmem obj")
Testcase: igt@gem_lmem_swapping
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
Reviewed-by: Ramalingam C<ramalingam.c@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220805132240.442747-2-matthew.auld@intel.com
(cherry picked from commit 8676145eb2)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
The mmVM_L2_CNTL3 register is not assigned an initial value
Signed-off-by: Qu Huang <jinsdb@126.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When translate_further is enabled, page table depth needs to
be updated. This was missing on Arcturus MMHUB init. This was
causing address translations to fail for SDMA user-mode queues.
Fixes: 352e683b72 ("drm/amdgpu: Enable translate_further to extend UTCL2 reach")
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
For some ASICs, like GFX IP v11.0.1, only have one SDMA instance,
so not need to configure SDMA1_RLC_CGCG_CTRL for this case.
Signed-off-by: Tim Huang <tim.huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add the BIF0_PCIE_TX_POWER_CTRL_1 register offset and mask macro
definitions for AMD_CG_SUPPORT_BIF_LS.
Signed-off-by: Tim Huang <tim.huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Commit 20f85ef89d ("drm/i915/backlight: use unique backlight device
names") added support for multiple backlight devices on dual panel
systems, but did so with error handling on -EEXIST from
backlight_device_register(). Unfortunately, that triggered a warning in
dmesg all the way down from sysfs_add_file_mode_ns() and
sysfs_warn_dup().
Instead of optimistically always attempting to register with the default
name ("intel_backlight", which we have to retain for backward
compatibility), check if a backlight device with the name exists first,
and, if so, use the card and connector based name.
v2: reworked on top of the patch commit 20f85ef89d
("drm/i915/backlight: use unique backlight device names")
v3: fixed the ref count leak(Jani N)
Fixes: 20f85ef89d ("drm/i915/backlight: use unique backlight device names")
Signed-off-by: Arun R Murthy <arun.r.murthy@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220808035750.3111046-1-arun.r.murthy@intel.com
(cherry picked from commit 4234ea3005)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
If the GuC CTs are full and we need to stall the request submission
while waiting for space, we save the stalled request and where the stall
occurred; when the CTs have space again we pick up the request submission
from where we left off.
If a full GT reset occurs, the state of all contexts is cleared and all
non-guilty requests are unsubmitted, therefore we need to restart the
stalled request submission from scratch. To make sure that we do so,
clear the saved request after a reset.
Fixes note: the patch that introduced the bug is in 5.15, but no
officially supported platform had GuC submission enabled by default
in that kernel, so the backport to that particular version (and only
that one) can potentially be skipped.
Fixes: 925dc1cf58 ("drm/i915/guc: Implement GuC submission tasklet")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: John Harrison <john.c.harrison@intel.com>
Cc: <stable@vger.kernel.org> # v5.15+
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220811210812.3239621-1-daniele.ceraolospurio@intel.com
(cherry picked from commit f922fbb0f2)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Crucible + recent Mesa seems to sometimes hit:
GEM_BUG_ON(num_ccs_blks > NUM_CCS_BLKS_PER_XFER)
And it looks like we can also trigger this with gem_lmem_swapping, if we
modify the test to use slightly larger object sizes.
Looking closer it looks like we have the following issues in
migrate_copy():
- We are using plain integer in various places, which we can easily
overflow with a large object.
- We pass the entire object size (when the src is lmem) into
emit_pte() and then try to copy it, which doesn't work, since we
only have a few fixed sized windows in which to map the pages and
perform the copy. With an object > 8M we therefore aren't properly
copying the pages. And then with an object > 64M we trigger the
GEM_BUG_ON(num_ccs_blks > NUM_CCS_BLKS_PER_XFER).
So it looks like our copy handling for any object > 8M (which is our
CHUNK_SZ) is currently broken on DG2.
Fixes: da0595ae91 ("drm/i915/migrate: Evict and restore the flatccs capable lmem obj")
Testcase: igt@gem_lmem_swapping
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
Reviewed-by: Ramalingam C<ramalingam.c@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220805132240.442747-2-matthew.auld@intel.com
(cherry picked from commit 8676145eb2)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Although radeon card fence and wait for gpu to finish processing current batch rings,
there is still a corner case that radeon lockup work queue may not be fully flushed,
and meanwhile the radeon_suspend_kms() function has called pci_set_power_state() to
put device in D3hot state.
Per PCI spec rev 4.0 on 5.3.1.4.1 D3hot State.
> Configuration and Message requests are the only TLPs accepted by a Function in
> the D3hot state. All other received Requests must be handled as Unsupported Requests,
> and all received Completions may optionally be handled as Unexpected Completions.
This issue will happen in following logs:
Unable to handle kernel paging request at virtual address 00008800e0008010
CPU 0 kworker/0:3(131): Oops 0
pc = [<ffffffff811bea5c>] ra = [<ffffffff81240844>] ps = 0000 Tainted: G W
pc is at si_gpu_check_soft_reset+0x3c/0x240
ra is at si_dma_is_lockup+0x34/0xd0
v0 = 0000000000000000 t0 = fff08800e0008010 t1 = 0000000000010000
t2 = 0000000000008010 t3 = fff00007e3c00000 t4 = fff00007e3c00258
t5 = 000000000000ffff t6 = 0000000000000001 t7 = fff00007ef078000
s0 = fff00007e3c016e8 s1 = fff00007e3c00000 s2 = fff00007e3c00018
s3 = fff00007e3c00000 s4 = fff00007fff59d80 s5 = 0000000000000000
s6 = fff00007ef07bd98
a0 = fff00007e3c00000 a1 = fff00007e3c016e8 a2 = 0000000000000008
a3 = 0000000000000001 a4 = 8f5c28f5c28f5c29 a5 = ffffffff810f4338
t8 = 0000000000000275 t9 = ffffffff809b66f8 t10 = ff6769c5d964b800
t11= 000000000000b886 pv = ffffffff811bea20 at = 0000000000000000
gp = ffffffff81d89690 sp = 00000000aa814126
Disabling lock debugging due to kernel taint
Trace:
[<ffffffff81240844>] si_dma_is_lockup+0x34/0xd0
[<ffffffff81119610>] radeon_fence_check_lockup+0xd0/0x290
[<ffffffff80977010>] process_one_work+0x280/0x550
[<ffffffff80977350>] worker_thread+0x70/0x7c0
[<ffffffff80977410>] worker_thread+0x130/0x7c0
[<ffffffff80982040>] kthread+0x200/0x210
[<ffffffff809772e0>] worker_thread+0x0/0x7c0
[<ffffffff80981f8c>] kthread+0x14c/0x210
[<ffffffff80911658>] ret_from_kernel_thread+0x18/0x20
[<ffffffff80981e40>] kthread+0x0/0x210
Code: ad3e0008 43f0074a ad7e0018 ad9e0020 8c3001e8 40230101
<88210000> 4821ed21
So force lockup work queue flush to fix this problem.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Zhenneng Li <lizhenneng@kylinos.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The file amdgpu_dm_plane.c missed the header amdgpu_dm_plane.h, which
resulted on the following warning:
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_plane.c:1046:5:
warning: no previous prototype for 'fill_dc_scaling_info'
[-Wmissing-prototypes]
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_plane.c:1222:6:
warning: no previous prototype for 'handle_cursor_update'
[-Wmissing-prototypes]
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_plane.c:152:6:
warning: no previous prototype for 'modifier_has_dcc'
[-Wmissing-prototypes]
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_plane.c:1576:5:
warning: no previous prototype for 'amdgpu_dm_plane_init'
[-Wmissing-prototypes]
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_plane.c:157:10:
warning: no previous prototype for 'modifier_gfx9_swizzle_mode'
[-Wmissing-prototypes]
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_plane.c:752:5:
warning: no previous prototype for 'fill_plane_buffer_attributes'
[-Wmissing-prototypes]
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_plane.c:83:31:
warning: no previous prototype for 'amd_get_format_info'
[-Wmissing-prototypes]
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_plane.c:88:6:
warning: no previous prototype for 'fill_blending_from_plane_state'
[-Wmissing-prototypes]
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_plane.c:992:5:
warning: no previous prototype for 'dm_plane_helper_check_state'
[-Wmissing-prototypes]
Therefore, include the missing header on the file and turn global functions
that are not used outside of the file into static functions.
Fixes: 5d945cbcd4 ("drm/amd/display: Create a file dedicated to planes")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Maíra Canal <mairacanal@riseup.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
V1:
The amdgpu_xgmi_remove_device function will send unload command
to psp through psp ring to terminate xgmi, but psp ring has been
destroyed in psp_hw_fini.
V2:
1. Change the commit title.
2. Restore amdgpu_xgmi_remove_device to its original calling location.
Move psp_xgmi_terminate call from amdgpu_xgmi_remove_device to
psp_hw_fini.
Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>