habanalabs/gaudi: collect undefined opcode error info

When an undefined opcode error occurs, the driver collects
the relevant information from the Qman and stores it inside
the hdev data structure. An event fd indication is sent to
user space.

Note: a follow-up commit will add support for reading the error
info through an ioctl.

Signed-off-by: Tal Cohen <talcohen@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Tal Cohen
2022-05-11 18:02:39 +03:00
committed by Oded Gabbay
parent 41021f728a
commit a7d6c35bcd
5 changed files with 138 additions and 32 deletions
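
As an illustration of how user space might interpret the notifier event bits this commit defines, here is a minimal sketch. The three macro values are copied from the uapi hunk of this commit; the helper describe_notifier_events() is hypothetical and is not part of the driver or its uapi. How the mask is obtained (the header comment mentions the HL_INFO_GET_EVENTS command) is outside the scope of the sketch.

```c
#include <stdint.h>
#include <stdio.h>

/* Values copied from the uapi hunk in this commit. */
#define HL_NOTIFIER_EVENT_TPC_ASSERT		(1ULL << 0)
#define HL_NOTIFIER_EVENT_UNDEFINED_OPCODE	(1ULL << 1)
#define HL_NOTIFIER_EVENT_DEVICE_RESET		(1ULL << 2)

/*
 * Hypothetical helper, not part of the driver: print one line for each
 * known event bit set in event_mask and return how many known bits were
 * set. Unknown bits are ignored.
 */
static int describe_notifier_events(uint64_t event_mask)
{
	int count = 0;

	if (event_mask & HL_NOTIFIER_EVENT_TPC_ASSERT) {
		puts("TPC assert event");
		count++;
	}
	if (event_mask & HL_NOTIFIER_EVENT_UNDEFINED_OPCODE) {
		puts("undefined operation code");
		count++;
	}
	if (event_mask & HL_NOTIFIER_EVENT_DEVICE_RESET) {
		puts("device requires a reset");
		count++;
	}

	return count;
}
```

The ULL suffix makes each constant a 64-bit value, so the mask can be tested against a 64-bit variable without relying on the width of int.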


@@ -1402,9 +1402,13 @@ struct hl_debug_args {
 /*
  * Notifier event values - for the notification mechanism and the HL_INFO_GET_EVENTS command
  *
- * HL_NOTIFIER_EVENT_TPC_ASSERT - Indicates TPC assert event
+ * HL_NOTIFIER_EVENT_TPC_ASSERT - Indicates TPC assert event
+ * HL_NOTIFIER_EVENT_UNDEFINED_OPCODE - Indicates undefined operation code
+ * HL_NOTIFIER_EVENT_DEVICE_RESET - Indicates device requires a reset
  */
-#define HL_NOTIFIER_EVENT_TPC_ASSERT (1 << 0)
+#define HL_NOTIFIER_EVENT_TPC_ASSERT (1ULL << 0)
+#define HL_NOTIFIER_EVENT_UNDEFINED_OPCODE (1ULL << 1)
+#define HL_NOTIFIER_EVENT_DEVICE_RESET (1ULL << 2)
 /*
  * Various information operations such as: