When all the firmware files are loaded from `Firmware::new`, it makes
sense to have the firmware request code as a closure. However, since we
eventually want each individual firmware constructor to request its own
file (and get rid of `Firmware` altogether), move this code into a
dedicated function that can be called by individual firmware types.
Acked-by: Danilo Krummrich <dakr@kernel.org>
Link: https://lore.kernel.org/r/20250913-nova_firmware-v6-4-9007079548b0@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
There are a few cases where we need the lowercase name of a given
chipset, notably to resolve firmware file paths for dynamic loading or
to build the module information.
So far, we relied on a static `NAMES` array for the latter, and some
CString hackery for the former.
Replace both with a new `name` const method that returns the lowercase
name of a chipset instance. We can generate it using the `paste!` macro.
Using this method removes the need to create a `CString` when loading
firmware, and lets us remove a couple of utility functions that now have
no users.
Acked-by: Danilo Krummrich <dakr@kernel.org>
Link: https://lore.kernel.org/r/20250913-nova_firmware-v6-3-9007079548b0@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Right now the GSP boot code is very incomplete and limited to running
FRTS, so having it in `Gpu::new` is not a big constraint.
However, this will change as we add more steps of the GSP boot process,
and not all GPU families follow the same procedure, so a dedicated
method is the logical place for these steps.
There is also the fact that the GSP will require its own runtime data,
and while it won't immediately need to be pinned, we want to be ready
for the time when it will - most likely when it starts using mutexes.
Thus, add an empty `Gsp` type that is pinned inside `Gpu` and
initialized using a pin initializer. This sets the constraint we need to
observe from the start, and could spare us some costly refactoring down
the road.
Then, move the code related to GSP boot to the `gsp::boot` module, as
part of the `Gsp` implementation.
Doing so allows us to make `Gpu::new` return a fallible `impl PinInit`
instead of a `Result`. This is more idiomatic when working with pinned
objects, and sets up the pinned initialization pattern we want to
preserve as the code grows more complex.
Acked-by: Danilo Krummrich <dakr@kernel.org>
Link: https://lore.kernel.org/r/20250913-nova_firmware-v6-2-9007079548b0@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
We want to store the GSP and SEC2 falcon instances inside the `Gpu`
structure, but doing so requires these types to implement `Send` for
`pci::Driver` to remain implementable on `NovaCore`, which embeds `Gpu`.
All implementors of `FalconEngine` and `FalconHal` satisfy the
requirements of `Send`, and these traits also already required `Sync`,
so this is a minor tweak.
Acked-by: Danilo Krummrich <dakr@kernel.org>
Link: https://lore.kernel.org/r/20250913-nova_firmware-v6-1-9007079548b0@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Commit 6cc44e9618 ("drm: Add directive to format code in comment")
fixes the original Sphinx indentation warning introduced in
471920ce25 ("drm/gpuvm: Add locking helpers") by using the
code-block:: directive. It semantically conflicts with the earlier
bb324f85f7 ("drm/gpuvm: Wrap drm_gpuvm_sm_map_exec_lock() expected
usage in literal code block"), which did the same using double-colon
syntax instead. These duplicated literal code block directives cause
the original warnings to remain unfixed.
Revert 6cc44e9618 to keep things rolling without these warnings.
Fixes: 6cc44e9618 ("drm: Add directive to format code in comment")
Fixes: 471920ce25 ("drm/gpuvm: Add locking helpers")
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Fix this warning while building the documentation:
Documentation/gpu/xe/xe_configfs:9: drivers/gpu/drm/xe/xe_configfs.c:138:
WARNING: Definition list ends without a blank line; unexpected unindent.
That also improves the formatting in the rendered output.
While at it, also fix the underline length in "Overview".
Fixes: e2b33fce5e ("drm/xe/configfs: Improve documentation steps")
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://lore.kernel.org/r/20250911-wa-bb-cmds-v4-2-c8f7e48f7eae@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Refactor: eliminate type casts by using proper u32 declarations (a
small sketch of the resulting pattern follows the version notes below).
v2:
- Address review comments. (Karthik)
v3:
- Use the proper u32 type and drop cast. (Lucas De Marchi)
- Modify variable when actually using u64 value.
- Change r value to reg_value with u32 type.
v4:
- Remove newline between trailer and Signed-off-by. (Lucas De Marchi)
- Change reg_val to val for more user-friendly logging.
- Use mul_u32_u32 function since both values are u32.
v5:
- mul_u32_u32 function with shift. (Lucas De Marchi)
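As noted above, a minimal sketch of the resulting pattern, with hypothetical
variable and function names; mul_u32_u32() is the real kernel helper for
computing a full 64-bit product of two u32 operands:

  #include <linux/math64.h>
  #include <linux/types.h>

  /* Keep register values in u32 (no casts) and only widen to u64 when the
   * 64-bit product is actually needed; the shift is applied to the result. */
  static u64 scale_reg_value(u32 val, u32 factor, unsigned int shift)
  {
          return mul_u32_u32(val, factor) >> shift;
  }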
Fixes: 7596d839f6 ("drm/xe/hwmon: Add support to manage power limits though mailbox")
Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250912113458.2815172-1-mallesh.koujalagi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
We already have dedicated helper macros for printing GT-oriented
messages, but we don't have any for messages that are tile-oriented,
so we wrongly try to use plain drm or GT-oriented ones.
Add tile-oriented printk macros to provide coverage similar to what we
already have with the xe_assert() macros. Also add a set of simple
macros for the top-level xe_device, which we could easily tweak to
include extra device-specific info if needed. A simplified sketch of
the tile-level helpers follows the sample output below.
Typical output of our printk macros will look like:
[drm] this is xe_WARN()
[drm] *ERROR* this is xe_err()
[drm] *ERROR* this is xe_err_printer()
[drm] this is xe_info()
[drm] this is xe_info_printer()
[drm:printk_demo.cold] this is xe_dbg()
[drm:printk_demo.cold] this is xe_dbg_printer()
[drm] Tile0: this is xe_tile_WARN()
[drm] *ERROR* Tile0: this is xe_tile_err()
[drm] *ERROR* Tile0: this is xe_tile_err_printer()
[drm] Tile0: this is xe_tile_info()
[drm] Tile0: this is xe_tile_info_printer()
[drm:printk_demo.cold] Tile0: this is xe_tile_dbg()
[drm:printk_demo.cold] Tile0: this is xe_tile_dbg_printer()
[drm] Tile0: GT0: this is xe_gt_WARN()
[drm] *ERROR* Tile0: GT0: this is xe_gt_err()
[drm] *ERROR* Tile0: GT0: this is xe_gt_err_printer()
[drm] Tile0: GT0: this is xe_gt_info()
[drm] Tile0: GT0: this is xe_gt_info_printer()
[drm:printk_demo.cold] Tile0: GT0: this is xe_gt_dbg()
[drm:printk_demo.cold] Tile0: GT0: this is xe_gt_dbg_printer()
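As referenced above, a simplified sketch of how the tile-level helpers can
wrap the existing drm_*() printers. This is conceptual only, not the actual
implementation; it assumes the xe_tile/xe_device layout from the driver
headers (tile->xe and tile->id):

  #include <drm/drm_print.h>
  /* struct xe_tile / struct xe_device come from the xe driver headers */

  #define xe_tile_err(tile, fmt, ...) \
          drm_err(&(tile)->xe->drm, "Tile%u: " fmt, (tile)->id, ##__VA_ARGS__)

  #define xe_tile_info(tile, fmt, ...) \
          drm_info(&(tile)->xe->drm, "Tile%u: " fmt, (tile)->id, ##__VA_ARGS__)

  #define xe_tile_dbg(tile, fmt, ...) \
          drm_dbg(&(tile)->xe->drm, "Tile%u: " fmt, (tile)->id, ##__VA_ARGS__)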
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250909165941.31730-5-michal.wajdeczko@intel.com
If GuC hangs, the GuC logs might not contain enough information to
understand exactly why the hang occurred. In this case, we need to
look at the GuC HW state to try to understand where the GuC is stuck. It
is therefore useful to include the GuC HW state in the error capture.
The list of registers that are part of the GuC HW state can change based
on platform, but it is the same for all platforms from TGL to MTL so we
only need to support one version for i915.
v2: revised list
v3: remove confusing comment, use sizeof(u32) instead of 4 (John)
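The capture itself boils down to reading a fixed list of registers into the
error-state buffer. A rough sketch of the idea follows; the register list is
a placeholder (only GUC_STATUS is named as an example), and the helper name
is hypothetical - the full per-platform list lives in the actual patch:

  #include <linux/types.h>
  #include "i915_reg_defs.h"
  #include "intel_uncore.h"
  #include "gt/uc/intel_guc_reg.h"

  /* Placeholder list: the real patch captures the full TGL..MTL GuC state. */
  static const i915_reg_t guc_state_regs[] = {
          GUC_STATUS,
          /* ... more GuC scratch/state registers ... */
  };

  static void capture_guc_state(struct intel_uncore *uncore, u32 *out)
  {
          int i;

          for (i = 0; i < ARRAY_SIZE(guc_state_regs); i++)
                  out[i] = intel_uncore_read(uncore, guc_state_regs[i]);
  }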
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://lore.kernel.org/r/20250909223621.3782625-2-daniele.ceraolospurio@intel.com
All recent platforms (including all the ones officially supported by the
Xe driver) do not allow concurrent execution of RCS and CCS workloads
from different address spaces, with the HW blocking the context switch
when it detects such a scenario.
The DUAL_QUEUE flag helps with this, by causing the GuC to not submit a
context it knows will not be able to execute. This, however, causes a new
problem: if RCS and CCS queues have pending workloads from different
address spaces, the GuC needs to choose from which of the 2 queues to
pick the next workload to execute. By default, the GuC prioritizes RCS
submissions over CCS ones, which can lead to CCS workloads being
significantly (or completely) starved of execution time.
The driver can tune this by setting a dedicated scheduling policy KLV;
this KLV allows the driver to specify a quantum (in ms) and a ratio
(percentage value between 0 and 100), and the GuC will prioritize the CCS
for that percentage of each quantum.
Given that we want to guarantee enough RCS throughput to avoid missing
frames, we set the yield policy to 20% of each 80ms interval.
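Conceptually, the policy is communicated as a small KLV blob carrying the
quantum and ratio. The sketch below is illustrative only - the KLV key value
and the send helper are hypothetical, not the actual GuC interface
definitions; only the 16-bit key / 16-bit length header layout and the
quantum/ratio values above are taken as given:

  #include <linux/types.h>

  #define RC_YIELD_KLV_KEY        0x1234  /* hypothetical key value */
  #define RC_YIELD_QUANTUM_MS     80      /* scheduling quantum */
  #define RC_YIELD_RATIO_PCT      20      /* share of each quantum given to CCS */

  struct xe_guc;  /* from the xe driver */

  /* hypothetical helper standing in for the real host-to-GuC transport */
  void send_scheduling_policy_klvs(struct xe_guc *guc, const u32 *klvs, u32 num_dwords);

  static void set_rcs_ccs_yield_policy(struct xe_guc *guc)
  {
          u32 klv[] = {
                  (RC_YIELD_KLV_KEY << 16) | 2,   /* header: key + 2 value dwords */
                  RC_YIELD_QUANTUM_MS,
                  RC_YIELD_RATIO_PCT,
          };

          send_scheduling_policy_klvs(guc, klv, ARRAY_SIZE(klv));
  }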
v2: updated quantum and ratio, improved comment, use xe_guc_submit_disable
in gt_sanitize
Fixes: d9a1ae0d17 ("drm/xe/guc: Enable WA_DUAL_QUEUE for newer platforms")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Tested-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://lore.kernel.org/r/20250905235632.3333247-2-daniele.ceraolospurio@intel.com
This effectively reverts commit 4c3fe5eae4 ("drm/xe/pf: Limit
fair VF LMEM provisioning") since we don't need it any more after
non-contig VRAM allocations were fixed. This allows larger LMEM
auto-provisioning for VFs, so instead:
[ ] GT0: PF: LMEM available(14096M) fair(1 x 8192M)
[ ] GT0: PF: VF1 provisioned with 8589934592 (8.00 GiB) LMEM
or
[ ] GT0: PF: LMEM available(14096M) fair(2 x 4096M)
[ ] GT0: PF: VF1..VF2 provisioned with 4294967296 (4.00 GiB) LMEM
we may get:
[ ] GT0: PF: LMEM available(14096M) fair(1 x 14096M)
[ ] GT0: PF: VF1 provisioned with 14780727296 (13.8 GiB) LMEM
and
[ ] GT0: PF: LMEM available(14096M) fair(2 x 7048M)
[ ] GT0: PF: VF1..VF2 provisioned with 7390363648 (6.88 GiB) LMEM
Fixes: 1e32ffbc9d ("drm/xe/sriov: support non-contig VRAM provisioning")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20250910222439.32869-1-michal.wajdeczko@intel.com
GuC has an interface to set a power profile for the SLPC algorithm.
Base mode is the default and ensures balanced performance, while
power_saving mode has conservative up/down thresholds and is suitable
for use with apps that typically need to be power efficient. This will
result in lower GT frequencies, thus consuming less power.
The selected power profile will be displayed in this format:
$ cat power_profile
[base] power_saving
$ echo power_saving > power_profile
$ cat power_profile
base [power_saving]
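A rough sketch of the read side of such a selector attribute, assuming it is
exposed through a standard sysfs show() callback; the enum and state lookup
here are illustrative, not xe's actual code, and the real attribute is also
writable:

  #include <linux/device.h>
  #include <linux/sysfs.h>

  enum power_profile { POWER_PROFILE_BASE, POWER_PROFILE_POWER_SAVING };

  static ssize_t power_profile_show(struct device *dev,
                                    struct device_attribute *attr, char *buf)
  {
          /* in the real driver this would come from the SLPC state */
          enum power_profile cur = POWER_PROFILE_BASE;

          return sysfs_emit(buf, "%s %s\n",
                            cur == POWER_PROFILE_BASE ? "[base]" : "base",
                            cur == POWER_PROFILE_POWER_SAVING ?
                                   "[power_saving]" : "power_saving");
  }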
v2: Address review comments (Rodrigo)
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250903232120.390190-1-vinay.belgaumkar@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Add a Rust driver for ARM Mali CSF-based GPUs. It is a port of Panthor
and therefore exposes Panthor's uAPI and name to userspace. It is the
product of a joint effort between Collabora, Arm and Google engineers.
The aim is to incrementally develop Tyr with the abstractions that are
currently available until it is considered to be at feature parity
with Panthor.
The development of Tyr itself started in January, after a few failed
attempts at converting Panthor piecewise through a mix of Rust and C
code. There is a downstream branch that's much further ahead in terms of
capabilities than this initial patch.
The downstream code is capable of booting the MCU, doing sync VM_BINDS
through the work-in-progress GPUVM abstraction and also doing (trivial)
submits through Asahi's drm_scheduler and dma_fence abstractions. So
basically, most of what one would expect a modern GPU driver to do,
except for power management and some other very important adjacent
pieces. It is not at the point where submits can correctly deal with
dependencies, or at the point where it can rotate access to the GPU
hardware fairly through a software scheduler, but that is simply a
matter of writing more code.
This first patch, however, only implements a subset of the current
features available downstream, as the rest is not implementable without
pulling in even more abstractions. In particular, a lot of things depend
on properly mapping memory on a given VA range, which itself depends on
the GPUVM abstraction that is currently work-in-progress. For this
reason, we still cannot boot the MCU and thus cannot do much for the
moment.
This constitutes a change in the overall strategy that we have been
using to develop Tyr so far. By submitting small parts of the driver
upstream iteratively, we aim to:
a) evolve together with Nova and rvkms, hopefully reducing regressions
due to upstream changes (that may break us because we were not in the
tree in the first place)
b) prove any work-in-progress abstractions by having them run on a real
driver and hardware and,
c) provide a reason to work on and review said abstractions by providing
a user, which would be Tyr itself.
Despite its limited feature set, we offer IGT tests. It has only been
tested on the rk3588, so any other SoC is probably not going to work at
all for now.
The skeleton is basically taken from Nova and also
rust_platform_driver.rs.
Lastly, the name "Tyr" is inspired by Norse mythology, reflecting ARM's
tradition of naming their GPUs after Nordic mythological figures and
places.
Co-developed-by: Beata Michalska <beata.michalska@arm.com>
Signed-off-by: Beata Michalska <beata.michalska@arm.com>
Co-developed-by: Carsten Haitzler <carsten.haitzler@foss.arm.com>
Signed-off-by: Carsten Haitzler <carsten.haitzler@foss.arm.com>
Co-developed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://www.collabora.com/news-and-blog/news-and-events/introducing-tyr-a-new-rust-drm-driver.html
Signed-off-by: Daniel Almeida <daniel.almeida@collabora.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
[aliceryhl: minor Kconfig update on apply]
[aliceryhl: s/drm::device::/drm::/]
Link: https://lore.kernel.org/r/20250910-tyr-v3-1-dba3bc2ae623@collabora.com
Co-developed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Before exporting a buffer, make sure it has been populated with
pages at least once.
While discussing cgroups we noticed a problem where you could export
a BO to a dma-buf without it ever having been backed or accounted for.
This meant that in low-memory situations, or eventually with cgroups, a
lower-privileged process might cause the compositor to try to allocate
a lot of memory on its behalf, and this could fail. At least make
sure the exporter has managed to allocate the RAM at least once
before exporting the object.
This only applies currently to TTM_PL_SYSTEM objects, because
GTT objects get populated on first validate, and VRAM doesn't
use TT.
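A rough sketch of the check being added; the helper name here is illustrative,
while ttm_tt_populate() and TTM_PL_SYSTEM are the real TTM interfaces
involved:

  #include <drm/ttm/ttm_bo.h>
  #include <drm/ttm/ttm_tt.h>
  #include <drm/ttm/ttm_placement.h>

  /* Ensure a system-placed BO has backing pages before it is exported. */
  static int ensure_populated_for_export(struct ttm_buffer_object *bo)
  {
          struct ttm_operation_ctx ctx = { .interruptible = true };

          if (bo->resource && bo->resource->mem_type == TTM_PL_SYSTEM && bo->ttm)
                  return ttm_tt_populate(bo->bdev, bo->ttm, &ctx);

          return 0;
  }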
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://lore.kernel.org/r/20250904021643.2050497-4-airlied@gmail.com
Before exporting a buffer, make sure it has been populated with
pages at least once.
While discussing cgroups we noticed a problem where you could export
a BO to a dma-buf without it ever having been backed or accounted for.
This meant that in low-memory situations, or eventually with cgroups, a
lower-privileged process might cause the compositor to try to allocate
a lot of memory on its behalf, and this could fail. At least make
sure the exporter has managed to allocate the RAM at least once
before exporting the object.
This only applies currently to TTM_PL_SYSTEM objects, because
GTT objects get populated on first validate, and VRAM doesn't
use TT.
Acked-by: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://lore.kernel.org/r/20250904021643.2050497-3-airlied@gmail.com
Before exporting a buffer, make sure it has been populated with
pages at least once.
While discussing cgroups we noticed a problem where you could export
a BO to a dma-buf without it ever having been backed or accounted for.
This meant that in low-memory situations, or eventually with cgroups, a
lower-privileged process might cause the compositor to try to allocate
a lot of memory on its behalf, and this could fail. At least make
sure the exporter has managed to allocate the RAM at least once
before exporting the object.
This only applies currently to TTM_PL_SYSTEM objects, because
GTT objects get populated on first validate, and VRAM doesn't
use TT.
Reviewed-by: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://lore.kernel.org/r/20250904021643.2050497-2-airlied@gmail.com
While discussing cgroups we noticed a problem where you could export
a BO to a dma-buf without it ever having been backed or accounted for.
This meant that in low-memory situations, or eventually with cgroups, a
lower-privileged process might cause the compositor to try to allocate
a lot of memory on its behalf, and this could fail. At least make
sure the exporter has managed to allocate the RAM at least once
before exporting the object.
This only applies currently to TTM_PL_SYSTEM objects, because
GTT objects get populated on first validate, and VRAM doesn't
use TT.
Reviewed-by: Christian Koenig <christian.koenig@amd.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Simona Vetter <simona.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://lore.kernel.org/r/20250904021643.2050497-1-airlied@gmail.com
The upstream mesa copy of the GPU regs has shifted more things to reg64
instead of separate 32b HI/LO reg32s. This works better with the "new-
style" C++ builders that mesa has been migrating to for a6xx+ (to better
handle register shuffling between gens), but it leaves the C builders
with missing _HI/LO builders.
So handle the special case of reg64, automatically generating the
missing _HI/LO builders.
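For illustration, roughly what the generated C helpers could look like for a
made-up 64-bit register field once the script emits the missing halves; the
register name below is hypothetical:

  #include <stdint.h>

  /* The generator would emit both halves from the single reg64 definition. */
  static inline uint32_t REG_EXAMPLE_IOVA_LO(uint64_t val)
  {
          return (uint32_t)(val & 0xffffffff);
  }

  static inline uint32_t REG_EXAMPLE_IOVA_HI(uint64_t val)
  {
          return (uint32_t)(val >> 32);
  }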
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/673559/
Since these generated files are no longer checked in, either in mesa or
in the Linux kernel, simplify things by dropping the verbose generated
comment.
These were semi-nerf'd on the kernel side, in the name of build
reproducibility, by commit ba64c6737f ("drivers: gpu: drm: msm:
registers: improve reproducibility"), but in a way that was somewhat
kernel-specific. We can reduce the divergence between the kernel and
mesa by simply dropping all of this.
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/673551/
On systems with multiple GPUs there can be uncertainty which GPU is the
primary one used to drive the display at bootup. In some desktop
environments this can lead to increased power consumption because
secondary GPUs may be used for rendering and never go to a low power
state. In order to disambiguate this, add a new sysfs attribute
'boot_display' that uses the output of video_is_primary_device() to
report whether the PCI device was used for driving the display at boot.
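A minimal sketch of what the attribute's read path could look like; the
output format shown is an assumption, while video_is_primary_device() is the
helper named above:

  #include <linux/device.h>
  #include <linux/sysfs.h>
  #include <asm/video.h>  /* video_is_primary_device() */

  static ssize_t boot_display_show(struct device *dev,
                                   struct device_attribute *attr, char *buf)
  {
          return sysfs_emit(buf, "%u\n", video_is_primary_device(dev) ? 1 : 0);
  }
  static DEVICE_ATTR_RO(boot_display);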
Suggested-by: Manivannan Sadhasivam <mani@kernel.org>
Acked-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://gitlab.freedesktop.org/xorg/lib/libpciaccess/-/issues/23
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://lore.kernel.org/r/20250811162606.587759-5-superm1@kernel.org
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>