drm/xe/guc: Don't treat GuC generic CAT error as protocol error

GuC uses GUC_ID_UNKNOWN if it can not map the CAT fault to any
context. We shouldn't treat that as G2H protocol error that would
justify a GT reset, as it may happen due to some VF activity.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241105204557.1991-1-michal.wajdeczko@intel.com
This commit is contained in:
Michal Wajdeczko
2024-11-05 21:45:57 +01:00
parent 6b5f15445c
commit 44e21ea6dc
2 changed files with 10 additions and 0 deletions

View File

@@ -17,6 +17,7 @@
#define G2H_LEN_DW_TLB_INVALIDATE 3
#define GUC_ID_MAX 65535
#define GUC_ID_UNKNOWN 0xffffffff
#define GUC_CONTEXT_DISABLE 0
#define GUC_CONTEXT_ENABLE 1

View File

@@ -2021,6 +2021,15 @@ int xe_guc_exec_queue_memory_cat_error_handler(struct xe_guc *guc, u32 *msg,
guc_id = msg[0];
if (guc_id == GUC_ID_UNKNOWN) {
/*
* GuC uses GUC_ID_UNKNOWN if it can not map the CAT fault to any PF/VF
* context. In such case only PF will be notified about that fault.
*/
xe_gt_err_ratelimited(gt, "Memory CAT error reported by GuC!\n");
return 0;
}
q = g2h_exec_queue_lookup(guc, guc_id);
if (unlikely(!q))
return -EPROTO;