When under severe stress for GTT mappable space, the LRU eviction model
falls off a cliff. We spend all our time scanning the much larger
non-mappable area searching for something within the mappable zone we can
evict. Turn this on its head by only using the full vma for the object if
it is already pinned in the mappable zone or there is sufficient *free*
space to accommodate it (prioritizing speedy reuse). If there is not,
immediately fall back to using small chunks (tilerow for GTT mmap, single
pages for pwrite/relocation) and using random eviction before doing a full
search.
Testcase: igt/gem_concurrent_blt
References: https://bugs.freedesktop.org/show_bug.cgi?id=110848
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190821123234.19194-1-chris@chris-wilson.co.uk
TTM assumes that drivers initialize the embedded GEM object before
calling the ttm_bo_init() function. This is not currently the case
in the Nouveau driver. Fix this by splitting up nouveau_bo_new()
into nouveau_bo_alloc() and nouveau_bo_init() so that the GEM can
be initialized before TTM BO initialization when necessary.
Fixes: b96f3e7c80 ("drm/ttm: use gem vma_node")
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190814093524.GA31345@ulmo
drm-misc-next for 5.4:
UAPI Changes:
Cross-subsystem Changes:
Core Changes:
- dma-buf: add reservation_object_fences helper, relax
reservation_object_add_shared_fence, remove
reservation_object seq number (and then
restored)
- dma-fence: Shrinkage of the dma_fence structure,
Merge dma_fence_signal and dma_fence_signal_locked,
Store the timestamp in struct dma_fence in a union with
cb_list
Driver Changes:
- More dt-bindings YAML conversions
- More removal of drmP.h includes
- dw-hdmi: Support get_eld and various i2s improvements
- gm12u320: Few fixes
- meson: Global cleanup
- panfrost: Few refactors, Support for GPU heap allocations
- sun4i: Support for DDC enable GPIO
- New panels: TI nspire, NEC NL8048HL11, LG Philips LB035Q02,
Sharp LS037V7DW01, Sony ACX565AKM, Toppoly TD028TTEC1
Toppoly TD043MTEA1
Signed-off-by: Dave Airlie <airlied@redhat.com>
[airlied: fixup dma_resv rename fallout]
From: Maxime Ripard <maxime.ripard@bootlin.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190819141923.7l2adietcr2pioct@flea
Mark switch cases where we are expecting to fall through.
Fix the following warning (Building: sunxi_defconfig arm):
drivers/gpu/drm/sun4i/sun4i_tcon.c: In function ‘sun4i_tcon0_mode_set_dithering’:
drivers/gpu/drm/sun4i/sun4i_tcon.c:318:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
val |= SUN4I_TCON0_FRM_CTL_MODE_B;
drivers/gpu/drm/sun4i/sun4i_tcon.c:319:2: note: here
case MEDIA_BUS_FMT_RGB666_1X18:
^~~~
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Mark switch cases where we are expecting to fall through.
Fix the following warning (Building: multi_v7_defconfig arm):
drivers/gpu/drm/sun4i/sun6i_mipi_dsi.c: In function ‘sun6i_dsi_transfer’:
drivers/gpu/drm/sun4i/sun6i_mipi_dsi.c:993:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
if (msg->rx_len == 1) {
^
drivers/gpu/drm/sun4i/sun6i_mipi_dsi.c:998:2: note: here
default:
^~~~~~~
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
The current SKUs added for Tiger Lake don't have DDIC hooked up, even
though it is supported by the SoC. The current state for these SKUs is
problematic since while enabling the combo phy, PORT_COMP_DW* return
0xFFFFFFFF, which is invalid per register definition.
During initialization we check what phys are not yet enabled by reading
PHY_MISC_C and try to enable it by toggling the "DE to IO Comp Pwr Down"
bit. But after that any read to the PORT_COMP_DW* returns invalid
results. This removes the following warning
[56997.634353] Missing case (val == 4294967295)
[56997.639241] WARNING: CPU: 5 PID: 768 at drivers/gpu/drm/i915/display/intel_combo_phy.c:54 cnl_get_procmon_ref_values+0xc9/0xf0 [i915]
[56997.639808] Modules linked in: i915(+) prime_numbers x86_pkg_temp_thermal coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel e1000e [last unloaded: prime_numbers]
[56997.639808] CPU: 5 PID: 768 Comm: insmod Tainted: G U W 5.2.0-demarchi+ #65
[56997.639808] Hardware name: Intel Corporation Tiger Lake Client Platform/TigerLake U DDR4 SODIMM RVP, BIOS TGLSFWI1.R00.2252.A03.1906270154 06/27/2019
[56997.639808] RIP: 0010:cnl_get_procmon_ref_values+0xc9/0xf0 [i915]
[56997.639808] Code: 2c a0 85 c9 74 e0 81 f9 00 00 00 01 75 09 48 c7 c0 0c a4 2c a0 eb cf 48 c7 c6 3c 3a 31 a0 48 c7 c7 40 3a 31 a0 e8 6b 4d ea e0 <0f> 0b 48 c7 c0 00 a4 2c a0 eb b1 48 c7 c0 24 a4 2
c a0 eb a8 e8 be
[56997.639808] RSP: 0018:ffffc9000068f8a8 EFLAGS: 00010286
[56997.639808] RAX: 0000000000000000 RBX: ffff88848fa90000 RCX: 0000000000000000
[56997.639808] RDX: ffff8884a08b5ef8 RSI: ffff8884a08a6658 RDI: 00000000ffffffff
[56997.639808] RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000
[56997.639808] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88848fa90000
[56997.639808] R13: 0000000000000000 R14: 0000000000000002 R15: 0006c00000162000
[56997.639808] FS: 00007f61ca3d12c0(0000) GS:ffff8884a0880000(0000) knlGS:0000000000000000
[56997.639808] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[56997.639808] CR2: 00007f71be6a92c0 CR3: 0000000494750006 CR4: 0000000000760ee0
[56997.639808] PKRU: 55555554
[56997.639808] Call Trace:
[56997.639808] cnl_verify_procmon_ref_values+0x36/0xf0 [i915]
[56997.639808] ? rcu_read_lock_sched_held+0x6f/0x80
[56997.639808] ? gen11_fwtable_read32+0x257/0x290 [i915]
[56997.639808] icl_combo_phy_verify_state.part.0+0x22/0xa0 [i915]
[56997.639808] intel_combo_phy_init+0x17e/0x3e0 [i915]
[56997.639808] ? icl_display_core_init+0x2c/0x1a0 [i915]
[56997.639808] ? _raw_spin_unlock_irqrestore+0x4c/0x60
[56997.639808] icl_display_core_init+0x34/0x1a0 [i915]
[56997.639808] intel_power_domains_init_hw+0x200/0x570 [i915]
[56997.639808] i915_driver_probe+0x103b/0x17e0 [i915]
[56997.639808] ? printk+0x53/0x6a
[56997.639808] i915_pci_probe+0x3b/0x190 [i915]
We may or may not need to change the implementation to account for DDIC
being available on other SKUs. For now I think the best thing to do is
to just disable the port.
Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190814235517.10032-1-lucas.demarchi@intel.com
The current assertion tries to make sure that we do not over count the
number of used PDE inside a page directory -- that is with an array of
512 pde, we do not expect more than 512 elements used! However, our
assertion has to take into account that as we pin an element into the
page directory, the caller first pins the page directory so the usage
count is one higher. However, this should be one extra pin per thread,
and the upper bound is that we may have one thread for each entry.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190820141218.14714-1-chris@chris-wilson.co.uk
To work around a DMC/Punit issue on ICL where the driver's
ICL_PORT_COMP_DW8/IREFGEN PHY setting is lost when entering/exiting DC6
state, make sure to reinit the PHY whenever disabling DC states.
Similarly the driver's PHY/DBUF/CDCLK settings should have been preserved
across DC5/6 transitions, so check this on all platforms.
This gets rid of the following WARN during suspend:
Combo PHY A HW state changed unexpectedly
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190816095523.15800-1-imre.deak@intel.com
Factor the main copy page to vram routine out into a helper that acts
on a single page and which doesn't require the nouveau_dmem_migrate
structure for argument passing. As an added benefit the new version
only allocates the dma address array once and reuses it for each
subsequent chunk of work.
Link: https://lore.kernel.org/r/20190814075928.23766-8-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
Tested-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Factor the main copy page to ram routine out into a helper that acts on
a single page and which doesn't require the nouveau_dmem_fault
structure for argument passing. Also remove the loop over multiple
pages as we only handle one at the moment, although the structure of
the main worker function makes it relatively easy to add multi page
support back if needed in the future. But at least for now this avoid
the needed to dynamically allocate memory for the dma addresses in
what is essentially the page fault path.
Link: https://lore.kernel.org/r/20190814075928.23766-7-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
Tested-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
There isn't any good reason to pass callbacks to migrate_vma. Instead
we can just export the three steps done by this function to drivers and
let them sequence the operation without callbacks. This removes a lot
of boilerplate code as-is, and will allow the drivers to drastically
improve code flow and error handling further on.
Link: https://lore.kernel.org/r/20190814075928.23766-2-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
Tested-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
When using mmu_notifer_unregister_no_release() the caller must ensure
there is a SRCU synchronize before the mn memory is freed, otherwise use
after free races are possible, for instance:
CPU0 CPU1
invalidate_range_start
hlist_for_each_entry_rcu(..)
mmu_notifier_unregister_no_release(&p->mn)
kfree(mn)
if (mn->ops->invalidate_range_end)
The error unwind in amdkfd misses the SRCU synchronization.
amdkfd keeps the kfd_process around until the mm is released, so split the
flow to fully initialize the kfd_process and register it for find_process,
and with the notifier. Past this point the kfd_process does not need to be
cleaned up as it is fully ready.
The final failable step does a vm_mmap() and does not seem to impact the
kfd_process global state. Since it also cannot be undone (and already has
problems with undo if it internally fails), it has to be last.
This way we don't have to try to unwind the mmu_notifier_register() and
avoid the problem with the SRCU.
Along the way this also fixes various other error unwind bugs in the flow.
Fixes: 45102048f7 ("amdkfd: Add process queue manager module")
Link: https://lore.kernel.org/r/20190806231548.25242-10-jgg@ziepe.ca
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
radeon is using a device global hash table to track what mmu_notifiers
have been registered on struct mm. This is better served with the new
get/put scheme instead.
radeon has a bug where it was not blocking notifier release() until all
the BO's had been invalidated. This could result in a use after free of
pages the BOs. This is tied into a second bug where radeon left the
notifiers running endlessly even once the interval tree became
empty. This could result in a use after free with module unload.
Both are fixed by changing the lifetime model, the BOs exist in the
interval tree with their natural lifetimes independent of the mm_struct
lifetime using the get/put scheme. The release runs synchronously and just
does invalidate_start across the entire interval tree to create the
required DMA fence.
Additions to the interval tree after release are already impossible as
only current->mm is used during the add.
Link: https://lore.kernel.org/r/20190806231548.25242-9-jgg@ziepe.ca
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This is a significant simplification, it eliminates all the remaining
'hmm' stuff in mm_struct, eliminates krefing along the critical notifier
paths, and takes away all the ugly locking and abuse of page_table_lock.
mmu_notifier_get() provides the single struct hmm per struct mm which
eliminates mm->hmm.
It also directly guarantees that no mmu_notifier op callback is callable
while concurrent free is possible, this eliminates all the krefs inside
the mmu_notifier callbacks.
The remaining krefs in the range code were overly cautious, drivers are
already not permitted to free the mirror while a range exists.
Link: https://lore.kernel.org/r/20190806231548.25242-6-jgg@ziepe.ca
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
Tested-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The 'memory-region' property of the komeda display driver DT binding
allows the use of a 'reserved-memory' node for buffer allocations. Add
the requisite of_reserved_mem_device_{init,release} calls to actually
make use of the memory if present.
Changes since v1:
- Move handling inside komeda_parse_dt
Signed-off-by: Mihail Atanassov <mihail.atanassov@arm.com>
Signed-off-by: Ayan Kumar Halder <ayan.halder@arm.com>
Reviewed-by: James Qian Wang (Arm Technology China) <james.qian.wang@arm.com>
Link:- https://patchwork.kernel.org/patch/11076413/
Use the new cec_notifier_conn_(un)register() functions to
(un)register the notifier for the HDMI connector, and fill in
the cec_connector_info.
Changes since v6:
- move cec_notifier_conn_unregister to a bridge detach
function,
- add a mutex protecting a CEC notifier.
Changes since v4:
- typo fix
Changes since v2:
- removed unnecessary NULL check before a call to
cec_notifier_conn_unregister,
- use cec_notifier_phys_addr_invalidate to invalidate physical
address.
Changes since v1:
Add memory barrier to make sure that the notifier
becomes visible to the irq thread once it is fully
constructed.
Signed-off-by: Dariusz Marcinkiewicz <darekm@google.com>
Acked-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Tested-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Reviewed-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190814104520.6001-9-darekm@google.com