linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-07 22:08:33 -04:00

Author	SHA1	Message	Date
Oded Gabbay	60d7bbb5b4	accel/habanalabs: fix field names in hl_info_hw_ip_info Don't use padX for actual reservedX fields. Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Ofir Bitton <obitton@habana.ai>	2023-03-20 17:35:30 +02:00
Dafna Hirschfeld	5d8a5f2965	accel/habanalabs: in {e/p}dma_core events read the err cause reg Since the err_cause register is unprivileged, we should read it from the driver instead of using the param that came from the FW. Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-03-20 17:35:30 +02:00
Dafna Hirschfeld	f8d139a71b	accel/habanalabs: fix use of var reset_sleep_ms - remove reset_sleep_ms arg from functions that don't use it. - move the call msleep(reset_sleep_ms) from btm poll to gaudi2_hw_fini as it is called from there already for other flow. Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-03-20 17:35:30 +02:00
Dafna Hirschfeld	077a39fabe	accel/habanalabs: in hw_fini return error code if polling timed-out In hw_fini callback, we use either the cpucp packet method or polling a register. Currently we return error only in the case of cpucp packet failure. In this patch we also return error if polling timed out. Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-03-20 17:35:29 +02:00
Ofir Bitton	8c695455ee	accel/habanalabs: increase reset poll timeout Due to a firmware bug we need to increase reset poll timeout or else we will timeout in secured environments. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-03-20 17:35:29 +02:00
Koby Elbaz	7c766e58cc	accel/habanalabs: do not verify engine modes after being changed Engines idle state can't always be verified between changes of engine modes (e.g., stall/halt). For example, if a CS is inflight when altering engine's mode, idle state will return NOT idle, always. Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-03-20 17:35:29 +02:00
Oded Gabbay	336b78c655	accel/habanalabs: align to latest firmware specs Copy the most up-to-date interface files to the firmware. Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Ofir Bitton <obitton@habana.ai>	2023-03-20 17:35:25 +02:00
Colin Ian King	443355d2ff	accel/habanalabs: Fix spelling mistake "maped" -> "mapped" There is a spelling mistake in a dev_err message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-03-20 17:35:25 +02:00
Oded Gabbay	79c164372e	accel/habanalabs: make gaudi2_is_device_idle() static This function is only called inside gaudi2.c file. Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/oe-kbuild-all/202303071320.X5ouBlNY-lkp@intel.com/ Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Ofir Bitton <obitton@habana.ai>	2023-03-20 17:31:02 +02:00
Dave Airlie	90031bc33f	Merge tag 'amd-drm-next-6.4-2023-03-17' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.4-2023-03-17: amdgpu: - Misc code cleanups - Documentation fixes - Make kobj structures const - Add thermal throttling adjustments for supported APUs - UMC RAS fixes - Display reset fixes - DCN 3.2 fixes - Freesync fixes - DC code reorg - Generalize dmabuf import to work with KFD - DC DML fixes - SRIOV fixes - UVD code cleanups - IH 4.4.2 updates - HDP 4.4.2 updates - SDMA 4.4.2 updates - PSP 13.0.6 updates - Add capped/uncapped workload handling for supported APUs - DCN 3.1.4 updates - Re-org DC Kconfig - USB4 fixes - Reorg DC plane and stream handling - Register vga_switcheroo for apple-gmux - SMU 13.0.6 updates - Fix error checking in read_mm_registers functions for affected families - VCN 4.0.4 fix - Drop redundant pci_enable_pcie_error_reporting() call - RDNA2 SMU OD suspend/resume fix - Expose additional memory stats via fdinfo - RAS fixes - Misc display fixes - DP MST fixes - IOMMU regression fix for KFD amdkfd: - Make kobj structures const - Support for exporting buffers via dmabuf - Multi-VMA page migration fixes - NBIO fixes - Misc code cleanups - Fix possible double free - Fix possible UAF radeon: - iMac fix UAPI: - KFD dmabuf export support. Required for importing KFD buffers into GEM contexts and for RDMA P2P support. Proposed user mode changes: https://github.com/fxkamd/ROCT-Thunk-Interface/commits/fxkamd/dmabuf From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230317164416.138340-1-alexander.deucher@amd.com	2023-03-20 16:44:36 +10:00
Tom Rix	b24343eace	drm/nouveau/nvfw/acr: set wpr_generic_header_dump storage-class-specifier to static gcc with W=1 reports drivers/gpu/drm/nouveau/nvkm/nvfw/acr.c:49:1: error: no previous prototype for ‘wpr_generic_header_dump’ [-Werror=missing-prototypes] 49 \| wpr_generic_header_dump(struct nvkm_subdev *subdev, \| ^~~~~~~~~~~~~~~~~~~~~~~ wpr_generic_header_dump is only used in acr.c, so it should be static Signed-off-by: Tom Rix <trix@redhat.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Signed-off-by: Karol Herbst <kherbst@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230302124819.686469-1-trix@redhat.com	2023-03-16 14:53:15 +01:00
Tom Rix	c14bff92ab	drm/nouveau/fifo: set nvkm_engn_cgrp_get storage-class-specifier to static smatch reports drivers/gpu/drm/nouveau/nvkm/engine/fifo/runl.c:33:18: warning: symbol 'nvkm_engn_cgrp_get' was not declared. Should it be static? nvkm_engn_cgrp_get is only used in runl.c, so it should be static Signed-off-by: Tom Rix <trix@redhat.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Signed-off-by: Karol Herbst <kherbst@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230228221533.3240520-1-trix@redhat.com	2023-03-16 14:53:15 +01:00
Tom Rix	abe3c66f34	drm/nouveau/fifo: set gf100_fifo_nonstall_block_dump storage-class-specifier to static gcc with W=1 reports drivers/gpu/drm/nouveau/nvkm/engine/fifo/gf100.c:451:1: error: no previous prototype for ‘gf100_fifo_nonstall_block’ [-Werror=missing-prototypes] 451 \| gf100_fifo_nonstall_block(struct nvkm_event *event, int type, int index) \| ^~~~~~~~~~~~~~~~~~~~~~~~~ gf100_fifo_nonstall_block is only used in gf100.c, so it should be static Signed-off-by: Tom Rix <trix@redhat.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Signed-off-by: Karol Herbst <kherbst@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230303132731.1919329-1-trix@redhat.com	2023-03-16 14:53:15 +01:00
Felix Kuehling	7ee938ac00	drm/amdgpu: Don't resume IOMMU after incomplete init Check kfd->init_complete in kgd2kfd_iommu_resume, consistent with other kgd2kfd calls. This should fix IOMMU errors on resume from suspend when KFD IOMMU initialization failed. Reported-by: Matt Fagnani <matt.fagnani@bell.net> Link: https://lore.kernel.org/r/4a3b225c-2ffd-e758-4de1-447375e34cad@bell.net/ Link: https://bugzilla.kernel.org/show_bug.cgi?id=217170 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2454 Cc: Vasant Hegde <vasant.hegde@amd.com> Cc: Linux regression tracking (Thorsten Leemhuis) <regressions@leemhuis.info> Cc: stable@vger.kernel.org Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Tested-by: Matt Fagnani <matt.fagnani@bell.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-03-15 18:45:27 -04:00
bobzhou	0bad3200df	drm/amd: fix compilation issue with legacy gcc This patch is used to fix following compilation issue with legacy gcc error: ‘for’ loop initial declarations are only allowed in C99 mode Signed-off-by: bobzhou <bob.zhou@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-03-15 18:45:27 -04:00
Hawking Zhang	4489f0fd9e	drm/amdgpu: Retire pcie_gen3_enable function Not needed since from vi. drop the function so we don't duplicate code when introduce new asics. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-03-15 18:45:27 -04:00
Hawking Zhang	dabc114e4b	drm/amdgpu: Move to common helper to query soc rev_id Replace soc15, nv, soc21 get_rev_id callback with common helper so we don't need to duplicate code when introduce new asics. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-03-15 18:45:27 -04:00
Hawking Zhang	65ba96e91b	drm/amdgpu: Move to common indirect reg access helper Replace soc15, nv, soc21 specific callbacks with common one. so we don't need to duplicate code when introduce new asics. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-03-15 18:45:27 -04:00
Hawking Zhang	370808876b	drm/amdgpu: drop ras check at asic level for new blocks amdgpu_ras_register_ras_block should always be invoked by ras_sw_init, where driver needs to check ras caps at ip level, instead of asic level. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Stanley Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-03-15 18:45:27 -04:00
Hawking Zhang	fdc94d3a8c	drm/amdgpu: Rework pcie_bif ras sw_init pcie_bif ras blocks needs to be initialized as early as possible to handle fatal error detected in hw_init phase. also align the pcie_bif ras sw_init with other ras blocks Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Stanley Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-03-15 18:45:27 -04:00
Hawking Zhang	da9d669eab	drm/amdgpu: Rework xgmi_wafl_pcs ras sw_init To align with other IP blocks. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Stanley Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-03-15 18:45:27 -04:00
Hawking Zhang	7f544c5488	drm/amdgpu: Rework mca ras sw_init To align with other IP blocks Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Stanley Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-03-15 18:45:26 -04:00
David Belanger	22e3d9343b	drm/amdkfd: Fixed kfd_process cleanup on module exit. Handle case when module is unloaded (kfd_exit) before a process space (mm_struct) is released. v2: Fixed potential race conditions by removing all kfd_process from the process table first, then working on releasing the resources. v3: Fixed loop element access / synchronization. Fixed extra empty lines. Signed-off-by: David Belanger <david.belanger@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-03-15 18:45:26 -04:00
Aric Cyr	3726b6e7c0	drm/amd/display: 3.2.227 This version brings along the following: - FW Release 0.0.158.0 - Fixes to HDCP, DP MST and more - Improvements on USB4 links and more - Code re-architecture on link.h Reviewed-by: Aric Cyr <aric.cyr@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Aric Cyr <aric.cyr@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-03-15 18:45:26 -04:00
Anthony Koo	c79503dc2e	drm/amd/display: [FW Promotion] Release 0.0.158.0 [Why & How] Add boot control bit to control dispclk and dppclk deep sleep Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Anthony Koo <Anthony.Koo@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-03-15 18:45:26 -04:00
Samson Tam	05cff51055	drm/amd/display: fix assert condition [Why & How] Reversed assert condition when checking that phy_pix_clk[] is not 0 Reviewed-by: Alvin Lee <Alvin.Lee2@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Samson Tam <Samson.Tam@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-03-15 18:45:26 -04:00
Stylon Wang	c416a9e4e3	drm/amd/display: Clearly states if long or short HPD event in dmesg logs [Why] The log "DMUB HPD callback" is crucial to identify when DP tunneling is been established and driver is notified of this event from DMUB. Same log is shared for long and short hotplug event and we need to check trailing DC debug log to distinguish between them two, making debugging on DPIA related issues a bit more troublesome. [How] Clearly states in dmesg logs whether this is a long or short hotplug event. Reviewed-by: Hamza Mahfooz <Hamza.Mahfooz@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Stylon Wang <stylon.wang@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-03-15 18:45:26 -04:00
Wesley Chalmers	7108a1c127	drm/amd/display: Make DCN32 functions available to future DCNs [Why & How] Make DCN32 functions available for more DCNs. Reviewed-by: Chris Park <Chris.Park@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Wesley Chalmers <Wesley.Chalmers@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-03-15 18:45:26 -04:00
Yifan Zha	057e335c71	drm/amdgpu: Init MMVM_CONTEXTS_DISABLE in gmc11 golden setting under SRIOV [Why] If disable the mmhub vm contexts(set MMVM_CONTEXTS_DISABLE to 0xffff), driver loading failed on vf due to fence fallback timer expired on all rings. FLR cannot reset MMVM_CONTEXTS_DISABLE. So this vf can not be recovered anymore unless trigger a whole gpu reset. [How] Under SRIOV, init MMVM_CONTEXTS_DISABLE in gmc11 golden register setting. Signed-off-by: Yifan Zha <Yifan.Zha@amd.com> Reviewed-by: Horace Chen <Horace.Chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-03-15 18:45:26 -04:00
Samson Tam	5f3401eeb0	drm/amd/display: reallocate DET for dual displays with high pixel rate ratio [Why] For dual displays where pixel rate is much higher on one display, we may get underflow when DET is evenly allocated. [How] Allocate less DET segments for the lower pixel rate display and more DET segments for the higher pixel rate display Reviewed-by: Alvin Lee <Alvin.Lee2@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Samson Tam <Samson.Tam@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-03-15 18:45:26 -04:00
Ayush Gupta	562e08223a	drm/amd/display: disconnect MPCC only on OTG change [Why] Framedrops are observed while playing Vp9 and Av1 10 bit video on 8k resolution using VSR while playback controls are disappeared/appeared [How] Now ODM 2 to 1 is disabled for 5k or greater resolutions on VSR. Cc: stable@vger.kernel.org Cc: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alvin Lee <Alvin.Lee2@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Ayush Gupta <ayugupta@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-03-15 18:45:26 -04:00
Cruise Hung	deaccddaf4	drm/amd/display: Fix DP MST sinks removal issue [Why] In USB4 DP tunneling, it's possible to have this scenario that the path becomes unavailable and CM tears down the path a little bit late. So, in this case, the HPD is high but fails to read any DPCD register. That causes the link connection type to be set to sst. And not all sinks are removed behind the MST branch. [How] Restore the link connection type if it fails to read DPCD register. Cc: stable@vger.kernel.org Cc: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Wenjing Liu <Wenjing.Liu@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Cruise Hung <Cruise.Hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-03-15 18:45:26 -04:00
Robin Chen	eeefe7c482	drm/amd/display: hpd rx irq not working with eDP interface [Why] This is the fix for the defect of commit `ab144f0b4a` ("drm/amd/display: Allow individual control of eDP hotplug support"). [How] To revise the default eDP hotplug setting and use the enum to git rid of the magic number for different options. Fixes: `ab144f0b4a` ("drm/amd/display: Allow individual control of eDP hotplug support") Cc: stable@vger.kernel.org Cc: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Wenjing Liu <Wenjing.Liu@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Robin Chen <robin.chen@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-03-15 18:45:13 -04:00
Zack Rusin	328839ff93	drm/vmwgfx: Fix src/dst_pitch confusion The src/dst_pitch got mixed up during the rework of the function, make sure the offset's refer to the correct one. Spotted by clang: Clang warns (or errors with CONFIG_WERROR): drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c:509:29: error: variable 'dst_pitch' is uninitialized when used here [-Werror,-Wuninitialized] src_offset = ddirty->top * dst_pitch + ddirty->left * stdu->cpp; ^~~~~~~~~ drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c:492:26: note: initialize the variable 'dst_pitch' to silence this warning s32 src_pitch, dst_pitch; ^ = 0 1 error generated. Signed-off-by: Zack Rusin <zackr@vmware.com> Reported-by: Nathan Chancellor <nathan@kernel.org> Reported-by: Dave Airlie <airlied@gmail.com> Link: https://github.com/ClangBuiltLinux/linux/issues/1811 Fixes: `39985eea5a` ("drm/vmwgfx: Abstract placement selection") Reviewed-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Martin Krastev <krastevm@vmware.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230314211445.1363828-1-zack@kde.org	2023-03-15 16:15:25 -04:00
Tvrtko Ursulin	4230cea89c	drm: Track clients by tgid and not tid Thread group id (aka pid from userspace point of view) is a more interesting thing to show as an owner of a DRM fd, so track and show that instead of the thread id. In the next patch we will make the owner updated post file descriptor handover, which will also be tgid based to avoid ping-pong when multiple threads access the fd. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Zack Rusin <zackr@vmware.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230314141904.1210824-2-tvrtko.ursulin@linux.intel.com	2023-03-15 14:03:00 +01:00
Bjorn Helgaas	4516fede7c	accel/habanalabs: Drop redundant pci_enable_pcie_error_reporting() pci_enable_pcie_error_reporting() enables the device to send ERR_* Messages. Since commit `f26e58bf6f` ("PCI/AER: Enable error reporting when AER is native"), the PCI core does this for all devices during enumeration, so the driver doesn't need to do it itself. Remove the redundant pci_enable_pcie_error_reporting() call from the driver. Also remove the corresponding pci_disable_pcie_error_reporting() from the driver .remove() path. Note that this only controls ERR_* Messages from the device. An ERR_* Message may cause the Root Port to generate an interrupt, depending on the AER Root Error Command register managed by the AER service driver. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-03-15 13:29:16 +02:00
Tomer Tayar	2e8e9a895c	accel/habanalabs: postpone mem_mgr IDR destruction to hpriv_release() The memory manager IDR is currently destroyed when user releases the file descriptor. However, at this point the user context might be still held, and memory buffers might be still in use. Later on, calls to release those buffers will fail due to not finding their handles in the IDR, leading to a memory leak. To avoid this leak, split the IDR destruction from the memory manager fini, and postpone it to hpriv_release() when there is no user context and no buffers are used. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-03-15 13:29:15 +02:00
Dafna Hirschfeld	801507d3b3	accel/habanalabs: move soft-reset wait to soft-reset execute We plan to do soft-reset either by mmio or by using cpucp packet depending on the FW version. We don't want to check FW version in two different places for that (execute soft-reset and wait to soft-reset) so move the waiting to gaudi2_execute_soft_reset. This also makes sense because the cpucp also does the waiting. Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-03-15 13:29:15 +02:00
Koby Elbaz	f7f0085eec	accel/habanalabs: add uapi to stall/resume engine The user might want to stall/resume engines to perform power testing for various scenarios. Because our current HL_CS_FLAGS_ENGINE_CORE_COMMAND command only handles the engines' cores, we need to add another opcode for handling entire engine and not just its core. The user supplies an array, where each entry holds the engine's ID and the command to send to the engine. The size of the array is limited by the number of engines in the ASIC (only Gaudi2 is currently supported). Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-03-15 13:29:15 +02:00
Tomer Tayar	28fbc058f2	accel/habanalabs: use scnprintf() in print_device_in_use_info() compose_device_in_use_info() was added to handle the snprintf() return value in a single place. However, the buffer size in print_device_in_use_info() is set such that it would be enough for the max possible print, so compose_device_in_use_info() is not really needed. Moreover, scnprintf() can be used instead of snprintf(), to save the check if the return value larger than the given size. Cc: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>	2023-03-15 13:29:15 +02:00
Dafna Hirschfeld	087fe7c9c2	accel/habanalabs: unify err log of hw-fini failure in dirty state print more informative message when failing in dirty state Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>	2023-03-15 13:29:15 +02:00
Koby Elbaz	7ffb5ced2b	accel/habanalabs: use a mutex rather than a spinlock There are two reasons why mutex is better here: 1. There's a critical section relatively long, where in certain scenarios (e.g., multiple VM allocations) taking a spinlock might cause noticeable performance degradation. 2. It will remove the incorrect usage of mutex under spin_lock (where preemption is disabled). Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-03-15 13:29:15 +02:00
Dafna Hirschfeld	c7ac65c881	accel/habanalabs: allow getting HL_INFO_DRAM_USAGE during soft-reset We can allow userspace to query the dram usage during soft-reset. Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-03-15 13:29:15 +02:00
Koby Elbaz	9732d5d0d4	accel/habanalabs: fix register address on PDMA/EDMA idle check The PDMA/EDMA is_idle routines didn't check the correct CORE register in order to get the accurate idle state. Moreover, it's better to make the is_idle routine more robust by adding additional checks (IS_HALTED) before announcing that the core is idle. Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-03-15 13:29:15 +02:00
Koby Elbaz	f831aade13	accel/habanalabs: remove a useless is_idle TPC flag Is appears that the flag - DCORE0_TPC0_CFG_STATUS_VECTOR_PIPE_EMPTY_MASK, has no actual use when it comes to querying TPC idleness, since this flag's corresponding bit turns-off after stalling the engine, and turns back on after resuming it. Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-03-15 13:29:15 +02:00
farah kassabri	25ebbc57ca	accel/habanalabs: fix few misspelled words in the code Run spell checker on the code and fix accordingly. Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-03-15 13:29:15 +02:00
Koby Elbaz	75276e23d1	accel/habanalabs: verify return code after scrubbing ARCs DCCMs In case the KDMA fails scrubbing the DCCMs (following a soft-reset upon device release), the driver will only print failure until reset flow ends, rather than escalating it into a hard-reset. Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-03-15 13:29:15 +02:00
Tomer Tayar	81e609bebb	accel/habanalabs: use notifications and graceful reset for decoder Add notifications to user in case of decoder abnormal interrupts, and use the graceful reset mechanism if reset is required. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-03-15 13:29:14 +02:00
Dafna Hirschfeld	86b74d8438	accel/habanalabs: assert return value of hw_fini Since hw_fini return error code for failure indication, we should check its return value. Currently it might only fail upon soft-reset from hl_device_reset. Later patch will add hw_fini failure in case of polling timeout in hard-reset. Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-03-15 13:29:14 +02:00
Koby Elbaz	d85f0531b9	accel/habanalabs: break is_idle function into per-engine sub-routines is_idle() was too long, so break it up for readability. In addition, we can now use the new sub-routines from other places. Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-03-15 13:29:14 +02:00

1 2 3 4 5 ...

1170202 Commits