On some platforms the hw has dropped support for 4K GTT pages when
dealing with LMEM, and due to the design of 64K GTT pages in the hw, we
can only mark the *entire* page-table as operating in 64K GTT mode,
since the enable bit is still on the pde, and not the pte. And since we
we still need to allow 4K GTT pages for SMEM objects, we can't have a
"normal" 4K page-table with scratch pointing to LMEM, since that's
undefined from the hw pov. The simplest solution is to just move the 64K
scratch page to SMEM on such platforms and call it a day, since that
should work for all configurations.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Reviewed-by: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211208141613.7251-4-ramalingam.c@intel.com
Certain functions within i915 uses macros that are defined for
specific architectures by the mmu, such as _PAGE_RW and _PAGE_PRESENT
(Some architectures don't even have these macros defined, like ARM64).
Instead of re-using bits defined for the CPU, we should use bits
defined for i915. This patch introduces two new 64 bit macros,
GEN8_PAGE_PRESENT and GEN8_PAGE_RW, to check for bits 0 and 1 and, to
replace all occurrences of _PAGE_RW and _PAGE_PRESENT within i915.
v2(Michael Cheng): Use GEN8_ instead of I915_
Signed-off-by: Michael Cheng <michael.cheng@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
[ Move defines together with other GEN8 defines ]
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211206215245.513677-2-michael.cheng@intel.com
With asynchronous migrations, the vma state may be several migrations
ahead of the state that matches the request we're capturing.
Address that by introducing an i915_vma_snapshot structure that
can be used to snapshot relevant state at request submission.
In order to make sure we access the correct memory, the snapshots take
references on relevant sg-tables and memory regions.
Also move the capture list allocation out of the fence signaling
critical path and use the CONFIG_DRM_I915_CAPTURE_ERROR define to
avoid compiling in members and functions used for error capture
when they're not used.
Finally, Introduce lockdep annotation.
v4:
- Break out the capture allocation mode change to a separate patch.
v5:
- Fix compilation error in the !CONFIG_DRM_I915_CAPTURE_ERROR case
(kernel test robot)
v6:
- Use #if IS_ENABLED() instead of #ifdef to match driver style.
- Move yet another change of allocation mode to the separate patch.
- Commit message rework due to patch reordering.
v7:
- Adjust for removal of region refcounting.
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Ramalingam C <ramalingam.c@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211129202245.472043-1-thomas.hellstrom@linux.intel.com
In igt_request_rewind(), mock_context(i915, "A") is assigned to ctx[0]
and used in i915_gem_context_get_engine(). There is a dereference
of ctx[0] in i915_gem_context_get_engine(), which could lead to a NULL
pointer dereference on failure of mock_context(i915, "A") .
So as mock_context(i915, "B").
Although this bug is not serious for it belongs to testing code, it is
better to be fixed to avoid unexpected failure in testing.
Fix this bugs by adding checks about ctx[0] and ctx[1].
This bug was found by a static analyzer. The analysis employs
differential checking to identify inconsistent security operations
(e.g., checks or kfrees) between two code paths and confirms that the
inconsistent operations are not recovered in the current function or
the callers, so they constitute bugs.
Note that, as a bug found by static analysis, it can be a false
positive or hard to trigger. Multiple researchers have cross-reviewed
the bug.
Builds with CONFIG_DRM_I915_SELFTEST=y show no new warnings,
and our static analyzer no longer warns about this code.
References: 591c0fb85d ("drm/i915: Exercise request cancellation using a mock selftest")
[tursulin: Replaced fixes with references to avoid.]
Signed-off-by: Zhou Qingyang <zhou1615@umn.edu>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211130141545.153899-1-zhou1615@umn.edu
With both integrated and discrete Intel GPUs in a system, the current
global check of intel_iommu_gfx_mapped, as done from intel_vtd_active()
may not be completely accurate.
In this patch we add i915 parameter to intel_vtd_active() in order to
prepare it for multiple GPUs and we also change the check away from Intel
specific intel_iommu_gfx_mapped (global exported by the Intel IOMMU
driver) to probing the presence of IOMMU on a specific device using
device_iommu_mapped().
This will return true both for IOMMU pass-through and address translation
modes which matches the current behaviour. If in the future we wanted to
distinguish between these two modes we could either use
iommu_get_domain_for_dev() and check for __IOMMU_DOMAIN_PAGING bit
indicating address translation, or ask for a new API to be exported from
the IOMMU core code.
v2:
* Check for dmar translation specifically, not just iommu domain. (Baolu)
v3:
* Go back to plain "any domain" check for now, rewrite commit message.
v4:
* Use device_iommu_mapped. (Robin, Baolu)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211126141424.493753-1-tvrtko.ursulin@linux.intel.com
Rather than stealing bits from i915_sw_fence function pointer use
separate fields for function pointer and flags. If using two different
fields, the 4 byte alignment for the i915_sw_fence function pointer can
also be dropped.
v2:
(CI)
- Set new function field rather than flags in __i915_sw_fence_init
v3:
(Tvrtko)
- Remove BUG_ON(!fence->flags) in reinit as that will now blow up
- Only define fence->flags if CONFIG_DRM_I915_SW_FENCE_CHECK_DAG is
defined
v4:
- Rebase, resend for CI
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211116194929.10211-1-matthew.brost@intel.com
Since the PMU callback runs in irq context, it synchronizes with gt
reset using the reset count. We could run into a case where the PMU
callback could read the reset count before it is updated. This has a
potential of corrupting the busyness stats.
In addition to the reset count, check if the reset bit is set before
capturing busyness.
In addition save the previous stats only if you intend to update them.
v2:
- The 2 reset counts captured in the PMU callback can end up being the
same if they were captured right after the count is incremented in the
reset flow. This can lead to a bad busyness state. Ensure that reset
is not in progress when the initial reset count is captured.
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211108211057.68783-1-umesh.nerlige.ramappa@intel.com
The gpu coredump typically takes place in a dma_fence signalling
critical path, and hence can't use GFP_KERNEL allocations, as that
means we might hit deadlocks under memory pressure. However
changing to __GFP_KSWAPD_RECLAIM which will be done in an upcoming
patch will instead mean a lower chance of the allocation succeeding.
In particular large contigous allocations like the coredump page
vector.
Remove the page vector in favor of a linked list of single pages.
Use the page lru list head as the list link, as the page owner is
allowed to do that.
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Ramalingam C <ramalingam.c@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211108174547.979714-2-thomas.hellstrom@linux.intel.com
The signaled bit is already used for quick testing if a fence is signaled.
On top of that, it's a terrible abuse of dma-fence api, and in the common
case where the object is already locked by the caller, the trylock will fail.
If it were useful, the core dma-api would have exposed the same functionality.
The fact that i915 has a dma_resv_utils.c file should be a warning that the
functionality either belongs in core, or is not very useful at all.
In this case the latter.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
[mlankhorst: Improve commit message]
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211021103605.735002-3-maarten.lankhorst@linux.intel.com
Reviewed-by: Matthew Auld <matthew.auld@intel.com> #irc
Maarten requested a backmerge due his work depending on subtle semantic
changes introduced by:
7e2e69ed46 ("drm/i915: Fix i915_request fence wait semantics")
2cbb8d4d67 ("drm/i915: use new iterator in i915_gem_object_wait_reservation")
Both should probably have been merged to drm-intel-gt-next anyway.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Don't wait sync while migrating, but rather make the GPU blit await the
dependencies and add a moving fence to the object.
This also enables asynchronous VRAM management in that on eviction,
rather than waiting for the moving fence to expire before freeing VRAM,
it is freed immediately and the fence is stored with the VRAM manager and
handed out to newly allocated objects to await before clears and swapins,
or for kernel objects before setting up gpu vmas or mapping.
To collect dependencies before migrating, add a set of utilities that
coalesce these to a single dma_fence.
What is still missing for fully asynchronous operation is asynchronous vma
unbinding, which is still to be implemented.
This commit substantially reduces execution time in the gem_lmem_swapping
test.
v2:
- Make a couple of functions static.
v4:
- Fix some style issues (Matthew Auld)
- Audit and add more checks for ghost objects (Matthew Auld)
- Add more documentation for the i915_deps utility (Mattew Auld)
- Simplify the i915_deps_sync() function
v6:
- Re-check for fence signaled before returning -EBUSY (Matthew Auld)
- Use dma_resv_iter_is_exclusive() (Matthew Auld)
- Await all dma-resv fences before a migration blit (Matthew Auld)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211122214554.371864-6-thomas.hellstrom@linux.intel.com
With async migration, the shrinker may end up wanting to release the
pages of an object while the migration blit is still running, since
the GT migration code doesn't set up VMAs and the shrinker is thus
oblivious to the fact that the GPU is still using the pages.
Add waiting for gpu in the shrinker_release_pages() op and an
argument to that function indicating whether the shrinker expects it
to not wait for gpu. In the latter case the shrinker_release_pages()
op will return -EBUSY if the object is not idle.
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211122214554.371864-5-thomas.hellstrom@linux.intel.com
There is an interesting refcounting loop:
struct intel_memory_region has a struct ttm_resource_manager,
ttm_resource_manager->move may hold a reference to i915_request,
i915_request may hold a reference to intel_context,
intel_context may hold a reference to drm_i915_gem_object,
drm_i915_gem_object may hold a reference to intel_memory_region.
Break this loop by dropping region reference counting.
In addition, Have regions with a manager moving fence make sure
that all region objects are released before freeing the region.
v6:
- Fix a code comment.
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211122214554.371864-4-thomas.hellstrom@linux.intel.com
For now, we will only allow async migration when TTM is used,
so the paths we care about are related to TTM.
The mmap path is handled by having the fence in ttm_bo->moving,
when pinning, the binding only becomes available after the moving
fence is signaled, and pinning a cpu map will only work after
the moving fence signals.
This should close all holes where userspace can read a buffer
before it's fully migrated.
v2:
- Fix a couple of SPARSE warnings
v3:
- Fix a NULL pointer dereference
v4:
- Ditch the moving fence waiting for i915_vma_pin_iomap() and
replace with a verification that the vma is already bound.
(Matthew Auld)
- Squash with a previous patch introducing moving fence waiting and
accessing interfaces (Matthew Auld)
- Rename to indicated that we also add support for sync waiting.
v5:
- Fix check for NULL and unreferencing i915_vma_verify_bind_complete()
(Matthew Auld)
- Fix compilation failure if !CONFIG_DRM_I915_DEBUG_GEM
- Fix include ordering. (Matthew Auld)
v7:
- Fix yet another compilation failure with clang if
!CONFIG_DRM_I915_DEBUG_GEM
Co-developed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211122214554.371864-2-thomas.hellstrom@linux.intel.com
Irrespective of the backend for request submissions, busyness for an
engine with an active context is calculated using:
busyness = total + (current_time - context_switch_in_time)
In execlists mode of operation, the context switch events are handled
by the CPU. Context switch in/out time and current_time are captured
in CPU time domain using ktime_get().
In GuC mode of submission, context switch events are handled by GuC and
the times in the above formula are captured in GT clock domain. This
information is shared with the CPU through shared memory. This results
in 2 caveats:
1) The time taken between start of a batch and the time that CPU is able
to see the context_switch_in_time in shared memory is dependent on GuC
and memory bandwidth constraints.
2) Determining current_time requires an MMIO read that can take anywhere
between a few us to a couple ms. A reference CPU time is captured soon
after reading the MMIO so that the caller can compare the cpu delta
between 2 busyness samples. The issue here is that the CPU delta and the
busyness delta can be skewed because of the time taken to read the
register.
These 2 factors affect the accuracy of the selftest -
live_engine_busy_stats. For (1) the selftest waits until busyness stats
are visible to the CPU. The effects of (2) are more prominent for the
current busyness sample period of 100 us. Increase the busyness sample
period from 100 us to 10 ms to overccome (2).
v2: Fix checkpatch issues
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211115221640.30793-1-umesh.nerlige.ramappa@intel.com
drm-misc-next for 5.17:
UAPI Changes:
* Remove restrictions on DMA_BUF_SET_NAME ioctl
* connector: State of privacy screen
* sysfs: Send hotplug uevent
Cross-subsystem Changes:
* clk/bmc-2835: Fixes
* dma-buf: Add dma_resv selftest; Error-handling fixes; Add debugfs
helpers; Remove dma_resv_get_excl_unlocked(); Documentation fixes
* pwm: Introduce of_pwm_single_xlate()
Core Changes:
* Support for privacy screens
* Make drm_irq.c legacy
* Fix __stack_depot_* name conflict
* Documentation fixes
* Fixes and cleanups
* dp-helper: Reuse 8b/10b link-training delay helpers
* format-helper: Update interfaces
* fb-helper: Allocate shadow buffer of correct size
* gem: Link GEM SHMEM and CMA helpers into separate modules; Use
dma_resv iterator; Import DMA_BUF namespace into GEM-helper modules
* gem/shmem-helper: Interface cleanups
* scheduler: Grab fence in drm_sched_job_add_implicit_dependencies();
Lockdep fixes
* kms-helpers: Link several files from core into the KMS-helper module
Driver Changes:
* Use dma_resv_iter in several places
* Fixes and cleanups
* amdgpu: Use drm_kms_helper_connector_hotplug_event(); Get all fences
at once
* bridge: Switch to managed MIPI DSI helpers in several places; Register
and attach during probe in several places; Convert to YAML in several
places
* bridge/anx7625: Support MIPI DPI input; Support HDMI audio; Fixes
* bridge/dw-hdmi: Allow interlace on bridge
* bridge/ps8640: Enable PM; Support aux-bus
* bridge/tc358768: Enabled reference clock; Support pulse mode;
Modesetting fixes
* bridge/ti-sn65dsi86: Use regmap_bulk_write(); Implement PWM
* etnaviv: Get all fences at once
* gma500: GEM object cleanups; Remove generic drivers in probe function
* i915: Support VESA panel backlights
* ingenic: Fixes and cleanups
* kirin: Adjust probe order
* kmb: Enable framebuffer console
* lima: Kconfig fixes
* meson: Refactoring to supperot DRM_BRIDGE_ATTACH_NO_ENCODER
* msm: Fixes and cleanups
* msm/dsi: Adjust probe order
* omap: Fixes and cleanups
* nouveau: CRC fixes; Validate LUTs in atomic check; Set HDMI AVI RGB
quantization to FULL; Fixes and cleanups
* panel: Support Innolux G070Y2-T02, Vivax TPC-9150, JDI R63452,
Newhaven 1.8-128160EF, Wanchanglong W552964ABA, Novatek NT35950,
BOE BF060Y8M, Sony Tulip Truly NT35521; Use dev_err_probe() throughout
drivers; Fixes and cleanups
* panel/ili9881c: Orientation fixes
* radeon: Use dma_resv_wait_timeout()
* rockchip: Add timeout for DSP hold; Suspend/resume fixes; PLL clock
fixes; Implement mmap in GEM object functions
* simpledrm: Support FB_DAMAGE_CLIPS and virtual screen sizes
* sun4i: Use CMA helpers without vmap support
* tidss: Fixes and cleanups
* v3d: Cleanups
* vc4: Fix HDMI-CEC hang when display is off; Power on HDMI controller
while disabling; Support 4k@60 Hz modes; Fixes and cleanups
* video: Convert to sysfs_emit() in several places
* video/omapfb: Fix fall-through
* virtio: Overflow fixes
* xen: Implement mmap as GEM object functions
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/YZYZSypIrr+qcih3@linux-uq9g.fritz.box
Pull x86 fixes from Thomas Gleixner:
- Move the command line preparation and the early command line parsing
earlier so that the command line parameters which affect
early_reserve_memory(), e.g. efi=nosftreserve, are taken into
account. This was broken when the invocation of
early_reserve_memory() was moved recently.
- Use an atomic type for the SGX page accounting, which is read and
written locklessly, to plug various race conditions related to it.
* tag 'x86-urgent-2021-11-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sgx: Fix free page accounting
x86/boot: Pull up cmdline preparation and early param parsing
Pull x86 perf fixes from Thomas Gleixner:
- Remove unneded PEBS disabling when taking LBR snapshots to prevent an
unchecked MSR access error.
- Fix IIO event constraints for Snowridge and Skylake server chips.
* tag 'perf-urgent-2021-11-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/perf: Fix snapshot_branch_stack warning in VM
perf/x86/intel/uncore: Fix IIO event constraints for Snowridge
perf/x86/intel/uncore: Fix IIO event constraints for Skylake Server
perf/x86/intel/uncore: Fix filter_tid mask for CHA events on Skylake Server