dEQP-VK.sparse_resources.image_rebind.2d_array.r64i.128_128_8
was causing a remap operation like the below.
op_remap: prev: 0000003fffed0000 00000000000f0000 00000000a5abd18a 0000000000000000
op_remap: next:
op_remap: unmap: 0000003fffed0000 0000000000100000 0
op_map: map: 0000003ffffc0000 0000000000010000 000000005b1ba33c 00000000000e0000
This was resulting in an unmap operation from 0x3fffed0000+0xf0000, 0x100000
which was corrupting the pagetables and oopsing the kernel.
Fixes the prev + unmap range calcs to use start/end and map back to addr/range.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fixes: b88baab828 ("drm/nouveau: implement new VM_BIND uAPI")
Cc: Danilo Krummrich <dakr@redhat.com>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240328024317.2041851-1-airlied@gmail.com
Increase the timeout value to prevent system logs on Amlogic boards flooding
with power transition warnings:
[ 13.047638] panfrost ffe40000.gpu: shader power transition timeout
[ 13.048674] panfrost ffe40000.gpu: l2 power transition timeout
[ 13.937324] panfrost ffe40000.gpu: shader power transition timeout
[ 13.938351] panfrost ffe40000.gpu: l2 power transition timeout
...
[39829.506904] panfrost ffe40000.gpu: shader power transition timeout
[39829.507938] panfrost ffe40000.gpu: l2 power transition timeout
[39949.508369] panfrost ffe40000.gpu: shader power transition timeout
[39949.509405] panfrost ffe40000.gpu: l2 power transition timeout
The 2000 value has been found through trial and error testing with devices
using G52 and G31 GPUs.
Fixes: 22aa1a2090 ("drm/panfrost: Really power off GPU cores in panfrost_gpu_power_off()")
Signed-off-by: Christian Hewitt <christianshewitt@gmail.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240322164525.2617508-1-christianshewitt@gmail.com
Pull drm fixes from Dave Airlie:
"Fixes from the last week (or 3 weeks in amdgpu case), after amdgpu,
it's xe and nouveau then a few scattered core fixes.
core:
- fix rounding in drm_fixp2int_round()
bridge:
- fix documentation for DRM_BRIDGE_OP_EDID
sun4i:
- fix 64-bit division on 32-bit architectures
tests:
- fix dependency on DRM_KMS_HELPER
probe-helper:
- never return negative values from .get_modes() plus driver fixes
xe:
- invalidate userptr vma on page pin fault
- fail early on sysfs file creation error
- skip VMA pinning on xe_exec if no batches
nouveau:
- clear bo resource bus after eviction
- documentation fixes
- don't check devinit disable on GSP
amdgpu:
- Freesync fixes
- UAF IOCTL fixes
- Fix mmhub client ID mapping
- IH 7.0 fix
- DML2 fixes
- VCN 4.0.6 fix
- GART bind fix
- GPU reset fix
- SR-IOV fix
- OD table handling fixes
- Fix TA handling on boards without display hardware
- DML1 fix
- ABM fix
- eDP panel fix
- DPPCLK fix
- HDCP fix
- Revert incorrect error case handling in ioremap
- VPE fix
- HDMI fixes
- SDMA 4.4.2 fix
- Other misc fixes
amdkfd:
- Fix duplicate BO handling in process restore"
* tag 'drm-next-2024-03-22' of https://gitlab.freedesktop.org/drm/kernel: (50 commits)
drm/amdgpu/pm: Don't use OD table on Arcturus
drm/amdgpu: drop setting buffer funcs in sdma442
drm/amd/display: Fix noise issue on HDMI AV mute
drm/amd/display: Revert Remove pixle rate limit for subvp
Revert "drm/amdgpu/vpe: don't emit cond exec command under collaborate mode"
Revert "drm/amd/amdgpu: Fix potential ioremap() memory leaks in amdgpu_device_init()"
drm/amd/display: Add a dc_state NULL check in dc_state_release
drm/amd/display: Return the correct HDCP error code
drm/amd/display: Implement wait_for_odm_update_pending_complete
drm/amd/display: Lock all enabled otg pipes even with no planes
drm/amd/display: Amend coasting vtotal for replay low hz
drm/amd/display: Fix idle check for shared firmware state
drm/amd/display: Update odm when ODM combine is changed on an otg master pipe with no plane
drm/amd/display: Init DPPCLK from SMU on dcn32
drm/amd/display: Add monitor patch for specific eDP
drm/amd/display: Allow dirty rects to be sent to dmub when abm is active
drm/amd/display: Override min required DCFCLK in dml1_validate
drm/amdgpu: Bypass display ta if display hw is not available
drm/amdgpu: correct the KGQ fallback message
drm/amdgpu/pm: Check the validity of overdiver power limit
...
To fix the entity rq NULL issue. This setting has been moved
to upper level.
Fixes: b70438004a ("drm/amdgpu: move buffer funcs setting up a level")
Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch causes the following iounmap erorr and calltrace
iounmap: bad address 00000000d0b3631f
The original patch was unjustified because amdgpu_device_fini_sw() will
always cleanup the rmmio mapping.
This reverts commit eb4f139888.
Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[WHY]
Odm update is doubled buffered. We need to wait for ODM update to be
completed before optimizing bandwidth or programming new udpates.
[HOW]
implement wait_for_odm_update_pending_complete function to wait for:
1. odm configuration update is no longer pending in timing generator.
2. no pending dpg pattern update for each active OPP.
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Acked-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Wenjing Liu <wenjing.liu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[WHY]
On DCN32 we support dynamic ODM even when OTG is blanked. When ODM
configuration is dynamically changed and the OTG is on blank pattern,
we will need to reprogram OPP's test pattern based on new ODM
configuration. Therefore we need to lock the OTG pipe to avoid temporary
corruption when we are reprogramming OPP blank patterns.
[HOW]
Add a new interdependent update lock implementation to lock all enabled
OTG pipes even when there is no plane on the OTG for DCN32.
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Acked-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Wenjing Liu <wenjing.liu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[WHY]
When committing an update with ODM combine change when the plane is
removing or already removed, we fail to detect odm change in pipe
update flags. This has caused mismatch between new dc state and the
actual hardware state, because we missed odm programming.
[HOW]
- Detect odm change even for otg master pipe without a plane.
- Update odm config before calling program pipes for pipe with planes.
The commit also updates blank pattern programming when odm is changed
without plane. This is because number of OPP is changed when ODM
combine is changed. Blank pattern is per OPP so we will need to
reprogram OPP based on the new pipe topology.
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Acked-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Wenjing Liu <wenjing.liu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In passthrough environment, when amdgpu is reloaded after unload, mode-1
is triggered after initializing the necessary IPs, That init does not
include KFD, and KFD init waits until the reset is completed. KFD init
is called in the reset handler, but in this case, the zone device and
drm client is not initialized, causing app to create kernel panic.
v2: Removing the init KFD condition from amdgpu_amdkfd_drm_client_create.
As the previous version has the potential of creating DRM client twice.
v3: v2 patch results in SDMA engine hung as DRM open causes VM clear to SDMA
before SDMA init. Adding the condition to in drm client creation, on top of v1,
to guard against drm client creation call multiple times.
Signed-off-by: Ahmad Rehman <Ahmad.Rehman@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Otherwise after the GTT bo is released, the GTT and gart space is freed
but amdgpu_ttm_backend_unbind will not clear the gart page table entry
and leave valid mapping entry pointing to the stale system page. Then
if GPU access the gart address mistakely, it will read undefined value
instead page fault, harder to debug and reproduce the real issue.
Cc: stable@vger.kernel.org
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In certain situations, some apps can import a BO multiple times
(through IPC for example). To restore such processes successfully,
we need to tell drm to ignore duplicate BOs.
While at it, also add additional logging to prevent silent failures
when process restore fails.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pull tracing updates from Steven Rostedt:
"Main user visible change:
- User events can now have "multi formats"
The current user events have a single format. If another event is
created with a different format, it will fail to be created. That
is, once an event name is used, it cannot be used again with a
different format. This can cause issues if a library is using an
event and updates its format. An application using the older format
will prevent an application using the new library from registering
its event.
A task could also DOS another application if it knows the event
names, and it creates events with different formats.
The multi-format event is in a different name space from the single
format. Both the event name and its format are the unique
identifier. This will allow two different applications to use the
same user event name but with different payloads.
- Added support to have ftrace_dump_on_oops dump out instances and
not just the main top level tracing buffer.
Other changes:
- Add eventfs_root_inode
Only the root inode has a dentry that is static (never goes away)
and stores it upon creation. There's no reason that the thousands
of other eventfs inodes should have a pointer that never gets set
in its descriptor. Create a eventfs_root_inode desciptor that has a
eventfs_inode descriptor and a dentry pointer, and only the root
inode will use this.
- Added WARN_ON()s in eventfs
There's some conditionals remaining in eventfs that should never be
hit, but instead of removing them, add WARN_ON() around them to
make sure that they are never hit.
- Have saved_cmdlines allocation also include the map_cmdline_to_pid
array
The saved_cmdlines structure allocates a large amount of data to
hold its mappings. Within it, it has three arrays. Two are already
apart of it: map_pid_to_cmdline[] and saved_cmdlines[]. More memory
can be saved by also including the map_cmdline_to_pid[] array as
well.
- Restructure __string() and __assign_str() macros used in
TRACE_EVENT()
Dynamic strings in TRACE_EVENT() are declared with:
__string(name, source)
And assigned with:
__assign_str(name, source)
In the tracepoint callback of the event, the __string() is used to
get the size needed to allocate on the ring buffer and
__assign_str() is used to copy the string into the ring buffer.
There's a helper structure that is created in the TRACE_EVENT()
macro logic that will hold the string length and its position in
the ring buffer which is created by __string().
There are several trace events that have a function to create the
string to save. This function is executed twice. Once for
__string() and again for __assign_str(). There's no reason for
this. The helper structure could also save the string it used in
__string() and simply copy that into __assign_str() (it also
already has its length).
By using the structure to store the source string for the
assignment, it means that the second argument to __assign_str() is
no longer needed.
It will be removed in the next merge window, but for now add a
warning if the source string given to __string() is different than
the source string given to __assign_str(), as the source to
__assign_str() isn't even used and will be going away.
- Added checks to make sure that the source of __string() is also the
source of __assign_str() so that it can be safely removed in the
next merge window.
Included fixes that the above check found.
- Other minor clean ups and fixes"
* tag 'trace-v6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (34 commits)
tracing: Add __string_src() helper to help compilers not to get confused
tracing: Use strcmp() in __assign_str() WARN_ON() check
tracepoints: Use WARN() and not WARN_ON() for warnings
tracing: Use div64_u64() instead of do_div()
tracing: Support to dump instance traces by ftrace_dump_on_oops
tracing: Remove second parameter to __assign_rel_str()
tracing: Add warning if string in __assign_str() does not match __string()
tracing: Add __string_len() example
tracing: Remove __assign_str_len()
ftrace: Fix most kernel-doc warnings
tracing: Decrement the snapshot if the snapshot trigger fails to register
tracing: Fix snapshot counter going between two tracers that use it
tracing: Use EVENT_NULL_STR macro instead of open coding "(null)"
tracing: Use ? : shortcut in trace macros
tracing: Do not calculate strlen() twice for __string() fields
tracing: Rework __assign_str() and __string() to not duplicate getting the string
cxl/trace: Properly initialize cxl_poison region name
net: hns3: tracing: fix hclgevf trace event strings
drm/i915: Add missing ; to __assign_str() macros in tracepoint code
NFSD: Fix nfsd_clid_class use of __string_len() macro
...