mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
synced 2026-03-11 22:50:55 -04:00
79867462634836ee5c39a2cdf624719feeb189bd
Highly concurrent Piglit runs can trigger a race condition where a pending
SDMA job on a buffer object is never executed because the corresponding
process is killed (perhaps due to a crash). Since the job's fences were
never signaled, the buffer object was effectively leaked. Worse, the
buffer was stuck wherever it happened to be at the time, possibly in VRAM.
The symptom was user space processes stuck in interruptible waits with
kernel stacks like:
[<ffffffffbc5e6722>] dma_fence_default_wait+0x112/0x250
[<ffffffffbc5e6399>] dma_fence_wait_timeout+0x39/0xf0
[<ffffffffbc5e82d2>] reservation_object_wait_timeout_rcu+0x1c2/0x300
[<ffffffffc03ce56f>] ttm_bo_cleanup_refs_and_unlock+0xff/0x1a0 [ttm]
[<ffffffffc03cf1ea>] ttm_mem_evict_first+0xba/0x1a0 [ttm]
[<ffffffffc03cf611>] ttm_bo_mem_space+0x341/0x4c0 [ttm]
[<ffffffffc03cfc54>] ttm_bo_validate+0xd4/0x150 [ttm]
[<ffffffffc03cffbd>] ttm_bo_init_reserved+0x2ed/0x420 [ttm]
[<ffffffffc042f523>] amdgpu_bo_create_restricted+0x1f3/0x470 [amdgpu]
[<ffffffffc042f9fa>] amdgpu_bo_create+0xda/0x220 [amdgpu]
[<ffffffffc04349ea>] amdgpu_gem_object_create+0xaa/0x140 [amdgpu]
[<ffffffffc0434f97>] amdgpu_gem_create_ioctl+0x97/0x120 [amdgpu]
[<ffffffffc037ddba>] drm_ioctl+0x1fa/0x480 [drm]
[<ffffffffc041904f>] amdgpu_drm_ioctl+0x4f/0x90 [amdgpu]
[<ffffffffbc23db33>] do_vfs_ioctl+0xa3/0x5f0
[<ffffffffbc23e0f9>] SyS_ioctl+0x79/0x90
[<ffffffffbc864ffb>] entry_SYSCALL_64_fastpath+0x1e/0xad
[<ffffffffffffffff>] 0xffffffffffffffff
Note: The correctness of this change depends on the earlier commit
"drm/amd/sched: move adding finish callback to amd_sched_job_begin"
v2: set an error on the finished fence
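
To illustrate the idea behind "set an error on the finished fence", here is a minimal userspace sketch. It is not the scheduler code from this commit: every name in it (toy_fence, toy_job, toy_entity, toy_entity_kill) is hypothetical, and the choice of -ESRCH as the error value is an assumption made for illustration. The sketch only shows the general pattern: when a queue is torn down with jobs still pending, each job's finished fence is signaled with an error instead of being left unsignaled, so that waiters can make progress.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a fence; NOT the kernel's struct dma_fence. */
struct toy_fence {
    bool signaled;
    int error;              /* 0 on success, negative errno on failure */
};

/* Hypothetical job whose "finished" fence is what waiters block on. */
struct toy_job {
    struct toy_fence finished;
    struct toy_job *next;
};

/* Hypothetical per-process queue of jobs submitted but not yet executed. */
struct toy_entity {
    struct toy_job *head;
};

/* Signal a fence once, recording the error before marking it signaled. */
static void toy_fence_signal(struct toy_fence *f, int error)
{
    assert(f != NULL);
    if (f->signaled)
        return;
    f->error = error;
    f->signaled = true;
}

/* Append a job to the tail of the entity's pending queue. */
static void toy_entity_push(struct toy_entity *e, struct toy_job *job)
{
    struct toy_job **p = &e->head;

    while (*p)
        p = &(*p)->next;
    job->next = NULL;
    *p = job;
}

/*
 * Tear down an entity whose owner was killed. Instead of dropping the
 * pending jobs silently (which would leave their fences unsignaled
 * forever), signal each finished fence with an error.
 * Returns the number of jobs that were cancelled.
 */
static int toy_entity_kill(struct toy_entity *e)
{
    int cancelled = 0;

    while (e->head) {
        struct toy_job *job = e->head;

        e->head = job->next;
        job->next = NULL;
        /* -ESRCH is an assumed error code for this sketch. */
        toy_fence_signal(&job->finished, -ESRCH);
        cancelled++;
    }
    return cancelled;
}
```

In this toy model, a waiter that polls finished.signaled always terminates after toy_entity_kill() and can inspect finished.error to tell a cancelled job from a completed one.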
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Andres Rodriguez <andresx7@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
…
…
Linux kernel
============

This file was moved to Documentation/admin-guide/README.rst

Please notice that there are several guides for kernel developers and users.
These guides can be rendered in a number of formats, like HTML and PDF.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.
See Documentation/00-INDEX for a list of what is contained in each file.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.
Languages: C 97.1%, Assembly 1%, Shell 0.6%, Rust 0.4%, Python 0.4%, Other 0.3%