linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-04 10:56:06 -04:00

Author	SHA1	Message	Date
Riana Tauro	256daa32c9	drm/xe: Enable Boot Survivability mode Enable boot survivability mode if pcode initialization fails and if boot status indicates a failure. In this mode, drm card is not exposed and driver probe returns success after loading the bare minimum to allow firmware to be flashed via mei. v2: abstract survivability mode variable add BMG check inside function (Jani, Rodrigo) v3: return -EBUSY during system suspend (Anshuman) check survivability mode in pci probe only on error Signed-off-by: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250128095632.1294722-3-riana.tauro@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-01-28 08:58:46 -05:00
Riana Tauro	5e940312a2	drm/xe: Add functions and sysfs for boot survivability Boot Survivability is a software based workflow for recovering a system in a failed boot state. Here system recoverability is concerned with recovering the firmware responsible for boot. This is implemented by loading the driver with bare minimum (no drm card) to allow the firmware to be flashed through mei-gsc and collect telemetry. The driver's probe flow is modified such that it enters survivability mode when pcode initialization is incomplete and boot status denotes a failure. In this mode, drm card is not exposed and presence of survivability_mode entry in PCI sysfs is used to indicate survivability mode and provide additional information required for debug. This patch adds initialization functions and exposes admin readable sysfs entries. The new sysfs will have the below layout: /sys/bus/.../bdf ├── survivability_mode v2: reorder headers fix doc remove survivability info and use mode to display information use separate function for logging survivability information for critical error (Rodrigo) v3: use for loop use dev logs instead of drm use helper function for aux history(Rodrigo) remove unnecessary error check of greater than max_scratch as we are reading only 3 bit v4: fix checkpatch warnings fix space (Rodrigo) rename register Signed-off-by: Riana Tauro <riana.tauro@intel.com> Acked-by: Ashwin Kumar Kulkarni <ashwin.kumar.kulkarni@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250128095632.1294722-2-riana.tauro@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-01-28 08:58:45 -05:00
José Roberto de Souza	cb1f868ca1	drm/xe: Make GUC binaries dump consistent with other binaries in devcoredump All other (hwsp, hwctx and vmas) binaries follow this format: [name].length: 0x1000 [name].data: xxxxxxx [name].error: errno The error one is just in case for some reason it was not able to capture the binary. So the GuC binaries should follow the same pattern. v2: - renamed GUC binary to LOG Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250123202307.95103-3-jose.souza@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-01-27 19:41:07 -08:00
Lucas De Marchi	2c95bbf500	drm/xe: Fix and re-enable xe_print_blob_ascii85() Commit `70fb86a85d` ("drm/xe: Revert some changes that break a mesa debug tool") partially reverted some changes to workaround breakage caused to mesa tools. However, in doing so it also broke fetching the GuC log via debugfs since xe_print_blob_ascii85() simply bails out. The fix is to avoid the extra newlines: the devcoredump interface is line-oriented and adding random newlines in the middle breaks it. If a tool is able to parse it by looking at the data and checking for chars that are out of the ascii85 space, it can still do so. A format change that breaks the line-oriented output on devcoredump however needs better coordination with existing tools. v2: Add suffix description comment v3: Reword explanation of xe_print_blob_ascii85() calling drm_puts() in a loop Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Julia Filipchuk <julia.filipchuk@intel.com> Cc: José Roberto de Souza <jose.souza@intel.com> Cc: stable@vger.kernel.org Fixes: `70fb86a85d` ("drm/xe: Revert some changes that break a mesa debug tool") Fixes: `ec1455ce7e` ("drm/xe/devcoredump: Add ASCII85 dump helper function") Link: https://patchwork.freedesktop.org/patch/msgid/20250123202307.95103-2-jose.souza@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-01-27 19:40:00 -08:00
Lucas De Marchi	a37934ea75	drm/xe/devcoredump: Move exec queue snapshot to Contexts section Having the exec queue snapshot inside a "GuC CT" section was always wrong. Commit `c28fd6c358` ("drm/xe/devcoredump: Improve section headings and add tile info") tried to fix that bug, but with that also broke the mesa tool that parses the devcoredump, hence it was reverted in commit `70fb86a85d` ("drm/xe: Revert some changes that break a mesa debug tool"). With the mesa tool also fixed, this can propagate as a fix on both kernel and userspace side to avoid unnecessary headache for a debug feature. Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Julia Filipchuk <julia.filipchuk@intel.com> Cc: José Roberto de Souza <jose.souza@intel.com> Cc: stable@vger.kernel.org Fixes: `70fb86a85d` ("drm/xe: Revert some changes that break a mesa debug tool") Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250123051112.1938193-2-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-01-27 15:10:02 -08:00
John Harrison	ef34861098	drm/xe: Upgrade complaint about missing slice info The steering code needs to know slice/subslice counts and this information should be retrieved from the hwconfig table. However, earlier platforms don't have it, hence the KMD has a fallback path. Newer platforms really should have the entries and if they are missing that is a bug that needs to be fixed in the table. So update the complaint to be an error on newer platforms and remove it completely for older ones that we know are bad (but are not POR for the Xe driver anyway). Also, re-word the message a little to make it clearer what the issue is. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250118005403.2960807-1-John.C.Harrison@Intel.com	2025-01-27 12:25:19 -08:00
Michal Wajdeczko	a4d1c5d0b9	drm/xe/pf: Move VFs reprovisioning to worker Since the GuC is reset during GT reset, we need to re-send the entire SR-IOV provisioning configuration to the GuC. But since this whole configuration is protected by the PF master mutex and we can't avoid making allocations under this mutex (like during LMEM provisioning), we can't do this reprovisioning from gt-reset path if we want to be reclaim-safe. Move VFs reprovisioning to a async worker that we will start from the gt-reset path. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250125215505.720-1-michal.wajdeczko@intel.com	2025-01-27 20:34:18 +01:00
Michal Wajdeczko	14b6674608	drm/xe/pf: Use GuC Buffer Cache during policy provisioning Start using GuC buffer cache for the SRIOV policy configuration actions. This is a required step before we could declare SRIOV PF as being a reclaim safe. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250124185247.676-1-michal.wajdeczko@intel.com	2025-01-27 19:53:59 +01:00
Vinay Belgaumkar	897286f294	drm/xe/pmu: Add GT C6 events Provide a PMU interface for GT C6 residency counters. The interface is similar to the one available for i915, but gt is passed in the config when creating the event. Sample usage and output: $ perf list | grep gt-c6 xe_0000_00_02.0/gt-c6-residency/ [Kernel PMU event] $ tail /sys/bus/event_source/devices/xe_0000_00_02.0/events/gt-c6-residency* ==> /sys/bus/event_source/devices/xe_0000_00_02.0/events/gt-c6-residency <== event=0x01 ==> /sys/bus/event_source/devices/xe_0000_00_02.0/events/gt-c6-residency.unit <== ms $ perf stat -e xe_0000_00_02.0/gt-c6-residency,gt=0/ -I1000 # time counts unit events 1.001196056 1,001 ms xe_0000_00_02.0/gt-c6-residency,gt=0/ 2.005216219 1,003 ms xe_0000_00_02.0/gt-c6-residency,gt=0/ Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Riana Tauro <riana.tauro@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250124050411.2189060-6-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-01-27 08:56:27 -08:00
Lucas De Marchi	6ea5bf169a	drm/xe/pmu: Add attribute skeleton Add the generic support for defining new attributes. This only adds the macros and common infra for the event counters, but no counters yet. This is going to be added as follow up changes. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250124050411.2189060-5-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-01-27 08:55:04 -08:00
Lucas De Marchi	4ee64041bc	drm/xe/pmu: Get/put runtime pm on event init When the event is created, make sure runtime pm is taken and later put: in order to read an event counter the GPU needs to remain accessible, and doing a get/put during perf's read is not possible since it's holding a raw_spinlock. Suggested-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250124050411.2189060-4-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-01-27 08:55:03 -08:00
Lucas De Marchi	ef7ce39386	drm/xe/pmu: Extract xe_pmu_event_update() Like other pmu drivers, keep the update separate from the read so it can be called from other methods (like stop()) without side effects. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250124050411.2189060-3-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-01-27 08:55:03 -08:00
Lucas De Marchi	257a10c18e	drm/xe/pmu: Assert max gt XE_PMU_MAX_GT needs to be used due to a circular dependency, but we should make sure it doesn't go out of sync with XE_PMU_MAX_GT. Add a compile check for that. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250124050411.2189060-2-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-01-27 08:55:03 -08:00
Vinay Belgaumkar	011c1e246a	drm/xe/pmu: Enable PMU interface Basic PMU enabling patch. Setup the basic framework for adding events. Based on previous versions by Bommu Krishnaiah, Aravind Iddamsetty and Riana Tauro, using i915 and rapl as reference implementations. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250124050411.2189060-1-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-01-27 08:54:06 -08:00
Ashutosh Dixit	d3fedff828	drm/xe/oa: Set stream->pollin in xe_oa_buffer_check_unlocked We rely on stream->pollin to decide whether or not to block during poll/read calls. However, currently there are blocking read code paths which don't even set stream->pollin. The best place to consistently set stream->pollin for all code paths is therefore to set it in xe_oa_buffer_check_unlocked. Fixes: `e936f885f1` ("drm/xe/oa/uapi: Expose OA stream fd") Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250115222029.3002103-1-ashutosh.dixit@intel.com	2025-01-23 18:07:03 -08:00
Vinay Belgaumkar	dddc53806d	drm/xe/ptl: Apply Wa_13011645652 Extend Wa_13011645652 to PTL. Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250116184659.384874-1-vinay.belgaumkar@intel.com	2025-01-22 14:20:27 -08:00
Lucas De Marchi	e0a4cd6ace	MAINTAINERS: Also exclude xe for drm-misc When the xe driver was added, it didn't extend the exclude entries for drm-misc, as done in commit `5a44d50f00` ("MAINTAINERS: Update drm-misc entry to match all drivers"). Exclude it like is done for i915 and other drivers with dedicated maintainers. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250117164529.393503-1-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-01-21 13:06:51 -08:00
Michal Wajdeczko	5994018ecf	drm/xe/guc: Fix sizeof(32) typo A small typo leads to the following static code checker warning: drivers/gpu/drm/xe/xe_guc_buf.c:81 xe_guc_buf_reserve() warn: sizeof(NUMBER)? Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/intel-xe/0d5bcbf1-79f9-4a10-a221-ddbaec9f6122@stanley.mountain/ Fixes: `696bfdf273` ("drm/xe/guc: Introduce the GuC Buffer Cache") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250121094832.588-1-michal.wajdeczko@intel.com	2025-01-21 22:01:28 +01:00
Michal Wajdeczko	9ebb5846e1	drm/xe/pf: Fix migration initialization The migration support only needs to be initialized once, but it was incorrectly called from the xe_gt_sriov_pf_init_hw(), which is part of the reset flow and may be called multiple times. Fixes: `d86e3737c7` ("drm/xe/pf: Add functions to save and restore VF GuC state") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250120232443.544-1-michal.wajdeczko@intel.com	2025-01-21 21:52:47 +01:00
Ashutosh Dixit	cfa9d40db8	drm/xe/oa: Preserve oa_ctrl unused bits UMD's have interest in setting unused bits of the oa_ctrl register "out of band" for certain experiments. To facilitate this, don't clobber previous oa_ctrl unused bits, i.e. rmw the values rather than simply write them. Fixes: `e936f885f1` ("drm/xe/oa/uapi: Expose OA stream fd") Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250117032155.3048063-1-ashutosh.dixit@intel.com	2025-01-21 09:29:47 -08:00
Maarten Lankhorst	380b0cdaa7	drm/xe: Move suballocator init to after display init No allocations should be done before we have had a chance to preserve the display fb. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241210083111.230484-4-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>	2025-01-21 14:59:38 +01:00
Rodrigo Vivi	a46ea12eca	drm/xe/uapi: Fix documentation indentation Fix these issues: Documentation/gpu/driver-uapi:29: include/uapi/drm/xe_drm.h:817: WARNING: +Bullet list ends without a blank line; unexpected unindent. Documentation/gpu/driver-uapi:29: include/uapi/drm/xe_drm.h:835: WARNING: +Definition list ends without a blank line; unexpected unindent. Fixes: `75d37750a7` ("drm/xe/mmap: Add mmap support for PCI memory barrier") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/intel-xe/20250117164023.3fdc00b9@canb.auug.org.au/ Cc: Tejas Upadhyay <tejas.upadhyay@intel.com> Tested-by: Bagas Sanjaya <bagasdotme@gmail.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250117193827.91779-1-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-01-21 08:45:28 -05:00
Maarten Lankhorst	f3b5945780	drm/xe: Do not attempt to bootstrap VF in execlists mode It was mentioned in a review that there is a possibility of choosing to load the module with VF in execlists mode. Of course this doesn't work, just bomb out as hard as possible. Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241210083111.230484-12-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>	2025-01-21 14:21:25 +01:00
Satyanarayana K V P	173baa1b2d	drm/xe: Suppress printing of mode when running in non-sriov mode The xe_sriov_probe_early() function prints the sriov pf/vf mode on driver probe. When running in non-sriov mode, the below debug message is seen. "Running in none mode". This print does not convey any information. This commit suppresses this debug message and shows only when running in PF/VF mode. Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Cc: Michał Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250116055617.20611-1-satyanarayana.k.v.p@intel.com	2025-01-19 00:39:45 +01:00
Michal Wajdeczko	238f96315a	drm/xe/kunit: Add KUnit tests for GuC Buffer Cache Add tests to make sure that recently added GuC Buffer Cache component is working as expected. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250114192140.1039-1-michal.wajdeczko@intel.com	2025-01-19 00:12:07 +01:00
Michal Wajdeczko	f90b552dcb	drm/xe/kunit: Allow to replace xe_managed_bo_create_pin_map() We want to use replacement functions in upcoming kunit tests. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241220194205.995-11-michal.wajdeczko@intel.com	2025-01-19 00:12:06 +01:00
Michal Wajdeczko	d8b2149ba8	drm/xe/pf: Use GuC Buffer Cache during VFs provisioning Start using GuC buffer cache for the VF's configuration actions. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241220194205.995-10-michal.wajdeczko@intel.com	2025-01-19 00:12:05 +01:00
Michal Wajdeczko	696bfdf273	drm/xe/guc: Introduce the GuC Buffer Cache The purpose of the GuC Buffer Cache is to maintain a set of reusable buffers that could be used while sending some of the CTB H2G actions that require a separate buffer with indirect data. Currently only a few PF actions need this, so initialize it only when running as a PF. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241220194205.995-9-michal.wajdeczko@intel.com	2025-01-19 00:12:03 +01:00
Michal Wajdeczko	c49ca67181	drm/xe/sa: Minor header cleanups Drop unused struct xe_bo forward declaration and, while around, fix unnecessary line split in xe_sa_bo_free() declaration. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241220194205.995-8-michal.wajdeczko@intel.com	2025-01-19 00:12:02 +01:00
Michal Wajdeczko	ae8b507fb8	drm/xe/sa: Allow creating suballocator with custom guard size Actual xe_sa_manager implementation uses hardcoded 4K to exclude it from making suballocations but in upcoming patch we want to reuse the xe_sa_manager where such 4K guard is not needed. Add another variant of the xe_sa_bo_manager_init() function that accepts arbitrary guard size. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241220194205.995-7-michal.wajdeczko@intel.com	2025-01-19 00:12:00 +01:00
Michal Wajdeczko	0e1871f61e	drm/xe/sa: Allow making suballocations using custom gfp flags Actual xe_sa_manager implementation uses hardcoded GFP_KERNEL flag during creation of suballocations but in upcoming patch we want to reuse the xe_sa_manager in places where GFP_KERNEL is not allowed. Add another variant of the xe_sa_bo_new() function that accepts arbitrary gfp flags. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241220194205.995-6-michal.wajdeczko@intel.com	2025-01-19 00:11:59 +01:00
Michal Wajdeczko	7e937cdf18	drm/xe/sa: Tidy up coding style in init() There is no need to use tile_to_xe() since we already got the xe. And we should keep all variable declarations together, no need for separate sa_manager declaration. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241220194205.995-5-michal.wajdeczko@intel.com	2025-01-19 00:11:58 +01:00
Michal Wajdeczko	97ee0e351f	drm/xe/sa: Improve error message on init failure Instead of raw errno value we can print friendly error code and also print size of the buffer object that we fail to prepare. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241220194205.995-4-michal.wajdeczko@intel.com	2025-01-19 00:11:57 +01:00
Michal Wajdeczko	d29cddd49b	drm/xe/sa: Drop redundant NULL assignments The sa_manager is drmm_kzalloc'ed so all members are already zero. And in case of kvzalloc() failure we are not returning pointer to the sa_manager at all, so no point in resetting .bo member. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241220194205.995-3-michal.wajdeczko@intel.com	2025-01-19 00:11:57 +01:00
Michal Wajdeczko	9cd3f4efc8	drm/xe/sa: Always call drm_suballoc_manager_fini() After successful call to drm_suballoc_manager_init() we should make sure to call drm_suballoc_manager_fini() as it may include some cleanup code even if we didn't start using it for real. As we can abort init() early due to kvzalloc() failure, we should either explicitly call drm_suballoc_manager_fini() or, even better, postpone drm_suballoc_manager_init() once we finish all other preparation steps, so we can rely on fini() that will do cleanup. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241220194205.995-2-michal.wajdeczko@intel.com	2025-01-19 00:11:56 +01:00
Michal Wajdeczko	13265fe742	drm/xe/vf: Perform early GT MMIO initialization to read GMDID VFs need to communicate with the GuC to obtain the GMDID value and existing GuC functions used for that assume that the GT has its MMIO members already setup. However, due to recent refactoring the gt->mmio is initialized later, and any attempt by the VF to use xe_mmio_read|write() from GuC functions will lead to NPD crash due to unset MMIO register address: [] xe 0000:00:02.1: [drm] Running in SR-IOV VF mode [] xe 0000:00:02.1: [drm] GT0: sending H2G MMIO 0x5507 [] BUG: unable to handle page fault for address: 0000000000190240 Since we are already tweaking the id and type of the primary GT to mimic it's a Media GT before initializing the GuC communication, we can also call xe_gt_mmio_init() to perform early setup of the gt->mmio which will make those GuC functions work again. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Piotr Piórkowski <piotr.piorkowski@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250114211347.1083-1-michal.wajdeczko@intel.com	2025-01-18 22:04:59 +01:00
Michal Wajdeczko	bbd8429264	drm/xe: Always setup GT MMIO adjustment data While we believed that xe_gt_mmio_init() will be called just once per GT, this might not be the case due to some tweaks that need to be performed by the VF driver during early probe. To avoid leaving any stale data in case of the re-run, reset the GT MMIO adjustment data for the non-media GT case. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241114175955.2299-2-michal.wajdeczko@intel.com	2025-01-18 22:03:42 +01:00
Francois Dugast	474c4dd29f	drm/xe: Add missing SPDX license identifiers Ensure all Xe driver files have a proper SPDX license identifier, add it in files where it was missing. Link: https://patchwork.freedesktop.org/patch/msgid/20250116124532.1480351-1-francois.dugast@intel.com Signed-off-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-01-17 15:25:52 +01:00
Oak Zeng	b824709ee1	drm/xe: Fix a typo in xe_vm_doc.h s/vm->ttm.base.resv->lock/vm->gpuvm.r_obj->resv->lock Signed-off-by: Oak Zeng <oak.zeng@intel.com> Reviewed-by: Maciej Patelczyk <maciej.patelczyk@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250113212324.3264218-1-oak.zeng@intel.com Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>	2025-01-17 00:01:58 +05:30
Oak Zeng	22b1a53f28	drm/xe: Print vm parameter in xe_vma trace Print the vm that the vma belongs to in the vma trace. This is useful to correlate VMA operations to the VM. Signed-off-by: Oak Zeng <oak.zeng@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241218164833.2364049-4-oak.zeng@intel.com Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>	2025-01-17 00:00:59 +05:30
Oak Zeng	861b27584d	drm/xe: Print vm flags in xe_vm trace print Print vm flags in xe_vm trace print. This is helpful to diagnose the VM mode of operation. Signed-off-by: Oak Zeng <oak.zeng@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241218164833.2364049-3-oak.zeng@intel.com Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>	2025-01-17 00:00:59 +05:30
Oak Zeng	63060df6f7	drm/xe: trace bo create Add a tracepoint to trace bo create. Signed-off-by: Oak Zeng <oak.zeng@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241218164833.2364049-2-oak.zeng@intel.com Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>	2025-01-17 00:00:59 +05:30
Matthew Brost	758debf35b	drm/xe: Mark ComputeCS read mode as UC on iGPU RING_CMD_CCTL read index should be UC on iGPU parts due to L3 caching structure. Having this as WB blocks ULLS from being enabled. Change to UC to unblock ULLS on iGPU. v2: - Drop internal communications comment, bspec is updated Cc: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Cc: Michal Mrozek <michal.mrozek@intel.com> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: José Roberto de Souza <jose.souza@intel.com> Cc: stable@vger.kernel.org Fixes: `328e089bfb` ("drm/xe: Leverage ComputeCS read L3 caching") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Acked-by: Michal Mrozek <michal.mrozek@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250114002507.114087-1-matthew.brost@intel.com	2025-01-16 08:26:20 -08:00
Tejas Upadhyay	75d37750a7	drm/xe/mmap: Add mmap support for PCI memory barrier In order to avoid having userspace to use MI_MEM_FENCE, we are adding a mechanism for userspace to generate a PCI memory barrier with low overhead (avoiding an IOCTL call as well as writing to VRAM, both of which add some overhead). This is implemented by memory-mapping a page as uncached that is backed by MMIO on the dGPU and thus allowing userspace to do memory write to the page without invoking an IOCTL. We are selecting the MMIO so that it is not accessible from the PCI bus so that the MMIO writes themselves are ignored, but the PCI memory barrier will still take action as the MMIO filtering will happen after the memory barrier effect. When we detect the special defined offset in mmap(), we map a 4K page which contains the last page of the doorbell MMIO range to userspace for the same purpose. For the user to query the special offset we are adding a special flag in the mmap_offset ioctl which needs to be passed as follows, struct drm_xe_gem_mmap_offset mmo = { .handle = 0, /* this must be 0 */ .flags = DRM_XE_MMAP_OFFSET_FLAG_PCI_BARRIER, }; igt_ioctl(fd, DRM_IOCTL_XE_GEM_MMAP_OFFSET, &mmo); map = mmap(NULL, size, PROT_WRITE, MAP_SHARED, fd, mmo); IGT : `b2dbc6f228` UMD : https://github.com/intel/compute-runtime/pull/772 V7: - Dgpu filter added V6(MAuld) - Move physical mmap to fault handler - Modify kernel-doc and attach UMD PR when ready V5(MAuld) - Return invalid early in case of non 4K PAGE_SIZE - Format kernel-doc and add note for 4K PAGE_SIZE HW limit V4(MAuld) - Add kernel-doc for uapi change - Restrict page size to 4K V3(MAuld) - Remove offset definition from UAPI to be able to change later - Edit commit message for special flag addition V2(MAuld) - Add fault handler with dummy page to handle unplug device - Add Build check for special offset to be below normal start page - Test d3hot, mapping seems to be valid in d3hot as well - Add more info to commit message Cc: Matthew Auld <matthew.auld@intel.com> Acked-by: Michal Mrozek <michal.mrozek@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250113114201.3178806-1-tejas.upadhyay@intel.com	2025-01-16 11:50:00 +00:00
John Harrison	174e9ce0da	drm/xe/guc: Drop error messages about missing GuC logs The GuC log snapshot code would complain loudly if there was no GuC log to take a snapshot of or if the snapshot alloc failed. Originally, this code was only called on demand when a user (or developer) explicitly requested a dump of the log. Hence an error message was useful. However, it is now part of the general devcoredump file and is called for any GPU hang. Most people don't care about GuC logs and GPU hangs do not generally mean a kernel/GuC bug. More importantly, there are valid situations where there is no GuC log, e.g. SRIOV VFs. So drop the error message. Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3958 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250113194405.2033085-1-John.C.Harrison@Intel.com	2025-01-15 16:19:53 -08:00
Francois Dugast	11a64adcdb	drm/xe/xe3: Generate and store the L3 bank mask On Xe3, the register used to indicate which L3 banks are enabled on the system is a new one called MIRROR_L3BANK_ENABLE. Each bit represents one bank enabled in each node. Extend the existing topology code for Xe3 to read this register and generate the correct L3 bank mask, which can be read by user space through the topology query. Bspec: 72573, 73439 Signed-off-by: Francois Dugast <francois.dugast@intel.com> Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250114203853.35055-1-matthew.s.atwood@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2025-01-15 12:39:22 -08:00
Rodrigo Vivi	50554bf3e5	drm/xe/lnl: Enable GuC SLPC DCC task Enable DCC (Duty Cycle Control) in Lunar Lake. DCC is the SLPC task that tries to keep the GT from operating inefficiently when thermally constrained. Although the recommendation is to enable it, the GuC leaves it disabled by default on LNL. Enabling it should minimize GT frequency oscillation in throttled scenarios, which could potentially reduce latencies. v2: Move set_policies call after wait for running state, so we ensure it is not overwritten. (Vinay) v3: Fix English in the commit message (Jonathan) v4: Also set disable to 0 so DCC can really get into effect. v5: Avoid lnl_ prefix (Vinay) v6: Finish renaming... Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> #v3 Link: https://patchwork.freedesktop.org/patch/msgid/20250115145053.1142023-2-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-01-15 14:12:52 -05:00
Rodrigo Vivi	aaab5404b1	drm/xe: Introduce GuC PC debugfs Allows the visualization of the current GuC power conservation status and policies. v2: Fix DCC msg (Vinay) v3: Simplify pc_get_state_string (Jonathan) Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250115145053.1142023-1-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-01-15 14:12:52 -05:00
Oak Zeng	0af944f0e3	drm/xe: Reject BO eviction if BO is bound to current VM This is a follow-up fix for https://patchwork.freedesktop.org/patch/msgid/20241203021929.1919730-1-oak.zeng@intel.com The overall goal is to fail vm_bind when there is memory pressure. See more details in the commit message of the above patch. The above patch fixes the issue when the user passes in a vm_id parameter during gem_create. If the user doesn't pass in a vm_id during gem_create, the above patch doesn't help. This patch further rejects BO eviction (which could be triggered by bo validation) if the BO is bound to the current VM. vm_bind could fail due to the eviction failure. The BO to VM reverse mapping structure is used to determine whether the BO is bound to the VM. v2: Move vm_bo definition from function scope to if(evict) clause (Thomas) Further constrain the condition by adding ctx->resv (Thomas) Add a short comment describing the change. Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Oak Zeng <oak.zeng@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250110210137.3181576-1-oak.zeng@intel.com	2025-01-15 11:58:58 +01:00
Matt Roper	3318ef9888	drm/xe: Remove unused "mmio_ext" code The "mmio_ext" and "REG_EXT" code is currently unused on any existing platform. Going forward, this isn't the design we want to use for any future platforms/features either, so we should just go ahead and remove the dead code to avoid confusion. mmio_ext was originally added in an attempt to hack around the early (mis)design of the Xe driver, which used xe_gt as the target for all register MMIO access, even those completely unrelated to the GT subunit of the hardware. With the introduction of commit `34953ee349` ("drm/xe: Create dedicated xe_mmio structure") and its follow-up patches, that misdesign has been corrected and access to register MMIO regions specific to hardware units is now done through xe_mmio structures which encapsulate an iomap, region size, and some other metadata. Although all of the registers used by the driver today happen to fall within one specific PCI BAR region, and thus re-use a single device-wide iomap, there's no requirement that this stay true for future platforms or features. I.e., if a future platform adds a new 'foo' hardware unit that exists at a different area in the BAR, or even in a completely different BAR, then that would be handled by doing a separate iomap of that unit's register region and wrapping it in its own 'struct xe_mmio foo_regs' structure. The pointer to the new 'foo_regs' could be placed within the xe_device, xe_tile, xe_gt, etc., according to where the new hardware unit falls within the current hardware hierarchy.
This effectively reverts the following commits, although parts of these commits had already vanished or changed with the earlier xe_mmio refactor work: - commit `399a13323f` ("drm/xe: add 28-bit address support in struct xe_reg") - commit `fdef72e02e` ("drm/xe: add a flag to bypass multi-tile config from MTCFG reg") - commit `866b2b1764` ("drm/xe: add MMIO extension support flags") - commit `ef29b390c7` ("drm/xe: map MMIO BAR according to the num of tiles in device desc") - commit `a4e2f3a299` ("drm/xe: refactor xe_mmio_probe_tiles to support MMIO extension") Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Koby Elbaz <kelbaz@habana.ai> Acked-by: Maciej Patelczyk <maciej.patelczyk@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250106234312.2986065-2-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2025-01-14 12:09:10 -08:00

1324024 Commits