compute units that are in use.
[Why]
Allow user to know how many compute units (CU) are in use at any given
moment.
[How]
Surface files in Sysfs that allow user to determine the number of compute
units that are in use for a given process. One Sysfs file is used per
device.
Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
Reviewed-By: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
waves that are in flight.
[Why]
Allow user to know how many compute units (CU) are in use at any given
moment.
[How]
Read registers of SQ that give number of waves that are in flight
of various queues. Use this information to determine number of CU's
in use.
Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
Reviewed-By: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We only allow persistent requests to remain on the GPU past the closure
of their containing context (and process) so long as they are continuously
checked for hangs or allow other requests to preempt them, as we need to
ensure forward progress of the system. If we allow persistent contexts
to remain on the system after the the hangcheck mechanism is disabled,
the system may grind to a halt. On disabling the mechanism, we sent a
pulse along the engine to remove all executing contexts from the engine
which would check for hung contexts -- but we did not prevent those
contexts from being resubmitted if they survived the final hangcheck.
Fixes: 9a40bddd47 ("drm/i915/gt: Expose heartbeat interval via sysfs")
Testcase: igt/gem_ctx_persistence/heartbeat-stop
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.7+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200928221510.26044-1-chris@chris-wilson.co.uk
(cherry picked from commit 7a991cd3e3)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
If we create a new node, it is possible for the slab allocator to return
us a recently freed node. If that node was just retired, it will retain
the current jiffy as its node->age. There is then a miniscule window,
where as that node is retired, it will appear on the free list with an
incorrect age and be eligible for reuse by one thread, and then by a
second thread as the correct node->age is written.
Fixes: 06b73c2d0b ("drm/i915/gt: Delay taking the spinlock for grabbing from the buffer pool")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200915091417.4086-3-chris@chris-wilson.co.uk
(cherry picked from commit 9bb34ff25c)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
On 32b, highmem using a finite set of indirect PTE (i.e. vmap) to provide
virtual mappings of the high pages. As these are finite, map_new_virtual()
must wait for some other kmap() to finish when it runs out. If we map a
large number of objects, there is no method for it to tell us to release
the mappings, and we deadlock.
However, if we make an explicit vmap of the page, that uses a larger
vmalloc arena, and also has the ability to tell us to release unwanted
mappings. Most importantly, it will fail and propagate an error instead
of waiting forever.
Fixes: fb8621d3be ("drm/i915: Avoid allocating a vmap arena for a single page") #x86-32
References: e87666b52f ("drm/i915/shrinker: Hook up vmap allocation failure notifier")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Cc: <stable@vger.kernel.org> # v4.7+
Link: https://patchwork.freedesktop.org/patch/msgid/20200915091417.4086-1-chris@chris-wilson.co.uk
(cherry picked from commit 060bb115c2)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
gfxoff is temporarily disabled for navy_flounder, since
at present the feature caused some tdr when performing
display operations.
Signed-off-by: Jiansong Chen <Jiansong.Chen@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The same ECC check has been executed in amdgpu_ras_init for vega10,
prior to gmc_v9_0_late_init.
v2: drop all atombios helper callings
v3: use bit operation
v4: correct inline comment, remove parity check statement
v5: squash in build fix
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add code to gracefuly handle any pipe reassignment
occuring on dcn3 hardware. This should only happen when new
surfaces are used for an update rather than old ones updated.
Fixes: 69fc1f4b97 ("amd/drm/display: avoid dcn3 on flip opp change for slave pipes")
Signed-off-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com>
Reviewed-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Allow user to know number of compute units (CU) that are in use at any
given moment.
[How]
Read registers of SQ that give number of waves that are in flight
of various queues. Use this information to determine number of CU's
in use.
Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If we manage to hit the intel_gt_set_wedged_on_fini() while active, i.e.
module unload during a stress test, we may cancel the requests but not
clean up. This leads to a very slow module unload as we wait for
something or other to trigger the retirement flushing, or timeout and
unload with a bunch of warnings. Instead if we explicitly cancel then
cleanup on an active unload, it should be instant and quiet.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200930163253.2789-3-chris@chris-wilson.co.uk
gfxoff is temporarily disabled for navy_flounder, since
at present the feature caused some tdr when performing
display operations.
Signed-off-by: Jiansong Chen <Jiansong.Chen@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
As the dpm clock table is needed during DC HW initialization.
And that (DC HW initialization) comes before smu_late_init()
where current APU dpm clock table setup is performed. So, NULL
pointer dereference will be triggered. By moving APU dpm clock
table setup to smu_hw_init(), this can be avoided.
Fixes: 02cf91c113 ("drm/amd/powerplay: postpone operations not required for hw setup to late_init")
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reported-by: Dirk Gouders <dirk@gouders.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
ttm_mem_type_manager_func.get_node was changed to return -ENOSPC
instead of setting the node pointer to NULL. Unfortunately
vmwgfx still had two places where it was explicitly converting
-ENOSPC to 0 causing regressions. This fixes those spots by
allowing -ENOSPC to be returned. That seems to fix recent
regressions with vmwgfx.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Martin Krastev <krastevm@vmware.com>
Sigend-off-by: Roland Scheidegger <sroland@vmware.com>
A recent attempt to fix a ref count leak in
amdgpu_display_crtc_set_config() turned out to be doing too much and
"fixed" an intended decrease as if it were a leak. Undo that part to
restore the proper balance. This is the very nature of this function
to increase or decrease the power reference count depending on the
situation.
Consequences of this bug is that the power reference would
eventually get down to 0 while the display was still in use,
resulting in that display switching off unexpectedly.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Fixes: e008fa6fb4 ("drm/amdgpu: fix ref count leak in amdgpu_display_crtc_set_config")
Cc: stable@vger.kernel.org
Cc: Navid Emamdoost <navid.emamdoost@gmail.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Properly handle clang and older versions of gcc.
Fixes: e77165bf7b ("drm/amd/display: Add DCN3 blocks to Makefile")
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Remove gpu_info fw support for sienna_cichlid etc., since the
information can be retrieved from discovery binary.
Signed-off-by: Jiansong Chen <Jiansong.Chen@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The VG20 DIDs 66a0, 66a1 and 66a4 are used for various SKUs that may or may
not have the FRU EEPROM on it. Parse the VBIOS to check for server SKU
variants (D131 or D134) until a more general solution can be determined.
v2: Remove string-based logic, correct the VBIOS string comment
Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch is to add one sysfs file -- "pp_od_clk_voltage" for
Raven/Raven2/Picasso APU, which is only used by dGPU like VEGA10.
This sysfs file supports the feature to modify gfx engine clock(Mhz units), it can
be used to configure the min value and the max value for gfx clock limited in the
safe range.
Command guide:
echo "s level clock" > pp_od_clk_voltage
s - adjust teh sclk level
level - 0 or 1, "0" represents the min value, "1" represents the max value
clock - the clock value(Mhz units), like 400, 800 or 1200, the value must be within the
OD_RANGE limits.
Example:
$ cat pp_od_clk_voltage
OD_SCLK:
0: 200Mhz
1: 1400Mhz
OD_RANGE:
SCLK: 200MHz 1400MHz
$ echo "s 0 600" > pp_od_clk_voltage
$ echo "s 1 1000" > pp_od_clk_voltage
$ cat pp_od_clk_voltage
OD_SCLK:
0: 600Mhz
1: 1000Mhz
OD_RANGE:
SCLK: 200MHz 1400MHz
Signed-off-by: Xiaojian Du <Xiaojian.Du@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>