Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git (synced 2026-04-29 12:55:24 -04:00)
habanalabs/gaudi: recover from CPU WD event
There are rare cases where the device CPU's watchdog expires. The resulting watchdog reset makes the CPU fall back to running its preboot f/w. When that happens, the driver only knows that a heartbeat failure occurred. It therefore sends a message to the CPU's main f/w asking it to reset the device. Because the CPU is now running preboot, it never responds, and the re-initialization process later fails when it tries to load the f/w.

The solution is to send the request to the preboot as well, but only if the reset was caused by a heartbeat (HB) failure.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
@@ -1,7 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 
 /*
- * Copyright 2016-2020 HabanaLabs, Ltd.
+ * Copyright 2016-2021 HabanaLabs, Ltd.
  * All Rights Reserved.
  */
 
@@ -4296,6 +4296,24 @@ static void gaudi_hw_fini(struct hl_device *hdev, bool hard_reset, bool fw_reset
 
 		WREG32(irq_handler_offset,
 			gaudi_irq_map_table[GAUDI_EVENT_HALT_MACHINE].cpu_id);
+
+		/* This is a hail-mary attempt to revive the card in the small chance that the
+		 * f/w has experienced a watchdog event, which caused it to return back to preboot.
+		 * In that case, triggering reset through GIC won't help. We need to trigger the
+		 * reset as if Linux wasn't loaded.
+		 *
+		 * We do it only if the reset cause was HB, because that would be the indication
+		 * of such an event.
+		 *
+		 * In case watchdog hasn't expired but we still got HB, then this won't do any
+		 * damage.
+		 */
+		if (hdev->curr_reset_cause == HL_RESET_CAUSE_HEARTBEAT) {
+			if (hdev->asic_prop.hard_reset_done_by_fw)
+				hl_fw_ask_hard_reset_without_linux(hdev);
+			else
+				hl_fw_ask_halt_machine_without_linux(hdev);
+		}
 	} else {
 		if (hdev->asic_prop.hard_reset_done_by_fw)
 			hl_fw_ask_hard_reset_without_linux(hdev);
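The `if/else` added by the second hunk reduces to a small decision table. If the reset cause is not a heartbeat failure, no extra request is sent. Otherwise, a hard-reset or halt-machine request goes to preboot, depending on `hard_reset_done_by_fw`. The sketch below restates that logic as a pure function so it can be checked in isolation. The `sketch_*` enums and functions are hypothetical stand-ins for `HL_RESET_CAUSE_HEARTBEAT`, `hl_fw_ask_hard_reset_without_linux()` and `hl_fw_ask_halt_machine_without_linux()`. The sketch covers only the branch that raises the halt-machine interrupt with `WREG32`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's reset causes. */
enum sketch_reset_cause {
	SKETCH_RESET_CAUSE_UNKNOWN,
	SKETCH_RESET_CAUSE_HEARTBEAT, /* cf. HL_RESET_CAUSE_HEARTBEAT */
	SKETCH_RESET_CAUSE_OTHER,
};

/* Extra request sent to preboot on top of the GIC halt-machine IRQ. */
enum sketch_preboot_req {
	SKETCH_PREBOOT_REQ_NONE,
	SKETCH_PREBOOT_REQ_HARD_RESET,   /* cf. hl_fw_ask_hard_reset_without_linux() */
	SKETCH_PREBOOT_REQ_HALT_MACHINE, /* cf. hl_fw_ask_halt_machine_without_linux() */
};

static enum sketch_preboot_req
sketch_choose_preboot_req(enum sketch_reset_cause cause,
			  bool hard_reset_done_by_fw)
{
	/* Only a heartbeat failure hints that the CPU may be back in preboot. */
	if (cause != SKETCH_RESET_CAUSE_HEARTBEAT)
		return SKETCH_PREBOOT_REQ_NONE;

	/* The f/w owns the hard reset, so ask preboot to perform it. */
	if (hard_reset_done_by_fw)
		return SKETCH_PREBOOT_REQ_HARD_RESET;

	/* Assumption: the driver performs the reset itself; preboot only halts. */
	return SKETCH_PREBOOT_REQ_HALT_MACHINE;
}

/* Checks representative rows of the decision table. */
static void sketch_self_check(void)
{
	assert(sketch_choose_preboot_req(SKETCH_RESET_CAUSE_HEARTBEAT, true) ==
	       SKETCH_PREBOOT_REQ_HARD_RESET);
	assert(sketch_choose_preboot_req(SKETCH_RESET_CAUSE_HEARTBEAT, false) ==
	       SKETCH_PREBOOT_REQ_HALT_MACHINE);
	assert(sketch_choose_preboot_req(SKETCH_RESET_CAUSE_OTHER, true) ==
	       SKETCH_PREBOOT_REQ_NONE);
	assert(sketch_choose_preboot_req(SKETCH_RESET_CAUSE_UNKNOWN, false) ==
	       SKETCH_PREBOOT_REQ_NONE);
}
```

Returning a value instead of calling the firmware helpers directly keeps the policy testable without hardware. In the driver itself, the choice and the request are made inline, as the hunk shows.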