Clippy fires two clippy::precedence warnings on the manual
byte-shifting expression:
warning: operator precedence can trip the unwary
--> drivers/gpu/nova-core/vbios.rs:511:17
|
511 | / u32::from(data[29]) << 24
512 | | | u32::from(data[28]) << 16
513 | | | u32::from(data[27]) << 8
| |______________________________________________^
Clear the warnings by replacing manual byte-shifting with
u32::from_le_bytes(). Using from_le_bytes() is also a tiny code
improvement, because it uses less code and is clearer about the intent.
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Link: https://patch.msgid.link/20260404212831.78971-2-jhubbard@nvidia.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Commit a88831502c ("gpu: nova-core: falcon: use dma::Coherent")
dropped the nova-local `DmaObject` device memory type for the
kernel-global `Coherent` one.
This switch had a side-effect: `DmaObject` always aligned the requested
size to `PAGE_SIZE`, and also reported that adjusted size when queried.
`Coherent`, on the other hand, does page-align allocation sizes but only
allows CPU access on the exact size provided by the caller.
This change runs into a limitation of falcon DMA copies, namely that DMA
accesses are done on blocks of exactly 256 bytes. If the provided data
does not have a length that is a multiple of 256, `dma_wr` returns
an error.
It was expected that all firmwares would present the proper adjusted
size, but this is not the case at least on my GA107:
NovaCore 0000:08:00.0: DMA transfer goes beyond range of DMA object
NovaCore 0000:08:00.0: Failed to load FWSEC firmware: EINVAL
NovaCore 0000:08:00.0: probe with driver NovaCore failed with error -22
Fix this by padding the `Coherent`'s size to `MEM_BLOCK_ALIGNMENT` (i.e.
256) when allocating it and filling it with zeroes, before copying the
firmware on top of it.
Fixes: a88831502c ("gpu: nova-core: falcon: use dma::Coherent")
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Link: https://patch.msgid.link/20260405-falcon-dma-roundup-v2-1-4af5b2ff9c16@nvidia.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
`driver_read_area` and `driver_write_area` are internal methods that
return slices containing the area of the command queue buffer that the
driver has exclusive read or write access, respectively.
While their returned value is correct and safe to use, internally they
temporarily create a reference to the whole command-buffer slice,
including GSP-owned regions. These regions can change without notice,
and thus creating a slice to them, even if never accessed, is undefined
behavior.
Fix this by making these methods create slices to valid regions only.
Fixes: 75f6b1de81 ("gpu: nova-core: gsp: Add GSP command queue bindings and handling")
Reported-by: Danilo Krummrich <dakr@kernel.org>
Closes: https://lore.kernel.org/all/DH47AVPEKN06.3BERUSJIB4M1R@kernel.org/
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Link: https://patch.msgid.link/20260404-cmdq-ub-fix-v5-1-53d21f4752f5@nvidia.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Up until now, only the GSP required parsing of its firmware headers.
However, upcoming support for Hopper/Blackwell+ adds another firmware
image (FMC), along with another format (ELF32).
Therefore, the current ELF64 section parsing support needs to be moved
up a level, so that both of the above can use it.
There are no functional changes. This is pure code movement.
Reviewed-by: Gary Guo <gary@garyguo.net>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patch.msgid.link/20260326013902.588242-8-jhubbard@nvidia.com
[acourbot: use fuller prefix in commit message.]
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Replace the nova-core local `DmaObject` with a `CoherentBox` that can
fulfill the same role.
Since `CoherentBox` is more flexible than `DmaObject`, we can use the
native `u64` type for page table entries instead of messing with bytes.
The `dma` module becomes unused with that change, so remove it as well.
Reviewed-by: Gary Guo <gary@garyguo.net>
Reviewed-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patch.msgid.link/20260327-b4-nova-dma-removal-v2-7-616e1d0b5cb3@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
A very common pattern is to create a block of coherent memory with the
content of an already-existing slice of bytes (e.g. a loaded firmware
blob).
`CoherentBox` makes this easier, but still implies a potentially
panicking operation with `copy_from_slice` that requires a `PANIC`
comment.
Add `from_slice_with_attrs` and `from_slice` methods to both `Coherent`
and `CoherentBox` to turn this into a trivial one-step operation.
Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org>
Reviewed-by: Gary Guo <gary@garyguo.net>
Reviewed-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patch.msgid.link/20260327-b4-nova-dma-removal-v2-1-616e1d0b5cb3@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Add CoherentHandle, an opaque DMA allocation type for buffers that are
only ever accessed by hardware. Unlike Coherent<T>, it does not provide
CPU access to the allocated memory.
CoherentHandle implicitly sets DMA_ATTR_NO_KERNEL_MAPPING and stores the
value returned by dma_alloc_attrs() as an opaque handle
(NonNull<c_void>) rather than a typed pointer, since with this flag the
C API returns an opaque cookie (e.g. struct page *), not a CPU pointer
to the allocated memory.
Only the DMA bus address is exposed to drivers; the opaque handle is
used solely to free the allocation on drop.
This commit is for reference only; there is currently no in-tree user.
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Link: https://patch.msgid.link/20260321172749.592387-2-dakr@kernel.org
[acourbot: fix conflict in dma.rs.]
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
When DMA_ATTR_NO_KERNEL_MAPPING is passed to dma_alloc_attrs(), the
returned CPU address is not a pointer to the allocated memory but an
opaque handle (e.g. struct page *).
Coherent<T> (or CoherentAllocation<T> respectively) stores this value as
NonNull<T> and exposes methods that dereference it and even modify its
contents.
Remove the flag from the public attrs module such that drivers cannot
pass it to Coherent<T> (or CoherentAllocation<T> respectively) in the
first place.
Instead DMA_ATTR_NO_KERNEL_MAPPING can be supported with an additional
opaque type (e.g. CoherentHandle) which does not provide access to the
allocated memory.
Cc: stable@vger.kernel.org
Fixes: ad2907b4e3 ("rust: add dma coherent allocator abstraction")
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Link: https://patch.msgid.link/20260321172749.592387-1-dakr@kernel.org
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Switch LogBuffer from Coherent<[u8]> (unsized) to
Coherent<[u8; LOG_BUFFER_SIZE]> (sized). The buffer size is a
compile-time constant (RM_LOG_BUFFER_NUM_PAGES * GSP_PAGE_SIZE), so a
fixed-size array is more precise and avoids the need for the runtime
length parameter of zeroed_slice().
Acked-by: Gary Guo <gary@garyguo.net>
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Link: https://patch.msgid.link/20260325003921.3420-3-dakr@kernel.org
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Generalize the BinaryWriter implementation from Coherent<[u8]> to
Coherent<T> where T: KnownSize + AsBytes + ?Sized. The implementation
only uses size() and write_dma(), neither of which depends on the
inner type being a byte slice.
This allows any Coherent allocation with an AsBytes inner type to be
exposed as a debugfs binary file.
Acked-by: Gary Guo <gary@garyguo.net>
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Link: https://patch.msgid.link/20260325003921.3420-2-dakr@kernel.org
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
The DRM shmem helper includes common code useful for drivers which
allocate GEM objects as anonymous shmem. Add a Rust abstraction for
this. Drivers can choose the raw GEM implementation or the shmem layer,
depending on their needs.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Signed-off-by: Daniel Almeida <daniel.almeida@collabora.com>
Reviewed-by: Daniel Almeida <daniel.almeida@collabora.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Janne Grunau <j@jananu.net>
Tested-by: Deborah Brouwer <deborah.brouwer@collabora.com>
Link: https://patch.msgid.link/20260316211646.650074-6-lyude@redhat.com
[ * DRM_GEM_SHMEM_HELPER is a tristate; when a module driver selects it,
it becomes =m. The Rust kernel crate and its C helpers are always
built into vmlinux and can't reference symbols from a module,
causing link errors.
Thus, add RUST_DRM_GEM_SHMEM_HELPER bool Kconfig that selects
DRM_GEM_SHMEM_HELPER, forcing it built-in when Rust drivers need it;
use cfg(CONFIG_RUST_DRM_GEM_SHMEM_HELPER) for the shmem module.
* Add cfg_attr(not(CONFIG_RUST_DRM_GEM_SHMEM_HELPER), expect(unused))
on pub(crate) use impl_aref_for_gem_obj and BaseObjectPrivate, so
that unused warnings are suppressed when shmem is not enabled.
* Enable const_refs_to_static (stabilized in 1.83) to prevent build
errors with older compilers.
* Use &raw const for bindings::drm_gem_shmem_vm_ops and add
#[allow(unused_unsafe, reason = "Safe since Rust 1.82.0")].
* Fix incorrect C Header path and minor spelling and formatting
issues.
* Drop shmem::Object::sg_table() as the current implementation is
unsound.
- Danilo ]
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
This implementation dispatches any work enqueued on ARef<drm::Device<T>> to
its driver-provided handler. It does so by building upon the newly-added
ARef<T> support in workqueue.rs in order to call into the driver
implementations for work_container_of and raw_get_work.
This is notably important for work items that need access to the drm
device, as it was not possible to enqueue work on a ARef<drm::Device<T>>
previously without failing the orphan rule.
The current implementation needs T::Data to live inline with drm::Device in
order for work_container_of to function. This restriction is already
captured by the trait bounds. Drivers that need to share their ownership of
T::Data may trivially get around this:
// Lives inline in drm::Device
struct DataWrapper {
work: ...,
// Heap-allocated, shared ownership.
data: Arc<DriverData>,
}
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Acked-by: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Daniel Almeida <daniel.almeida@collabora.com>
Link: https://lore.kernel.org/r/20260323-aref-workitem-v3-2-f59729b812aa@collabora.com
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Add support for the ARef<T> smart pointer. This allows an instance of
ARef<T> to handle deferred work directly, which can be convenient or even
necessary at times, depending on the specifics of the driver or subsystem.
The implementation is similar to that of Arc<T>, and a subsequent patch
will implement support for drm::Device as the first user. This is notably
important for work items that need access to the drm device, as it was not
possible to enqueue work on a ARef<drm::Device<T>> previously without
failing the orphan rule.
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Acked-by: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Daniel Almeida <daniel.almeida@collabora.com>
Link: https://lore.kernel.org/r/20260323-aref-workitem-v3-1-f59729b812aa@collabora.com
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Convert all PFALCON, PFALCON2 and PRISCV registers to use the kernel's
register macro and update the code accordingly.
Because they rely on the same types to implement relative registers,
they need to be updated in lockstep.
nova-core's local register macro is now unused, so remove it.
Reviewed-by: Gary Guo <gary@garyguo.net>
Acked-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patch.msgid.link/20260325-b4-nova-register-v4-8-bdf172f0f6ca@nvidia.com
[acourbot@nvidia.com: remove unused import.]
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Introduce a powered-up version of our ad-hoc `impl_from_enum_to_u8`
macro that allows the definition of an enum type associated to a
`Bounded` of a given width, and provides the `From` and `TryFrom`
implementations required to use that enum as a register field member.
This allows us to generate the required conversion implementations for
using the kernel register macro and skip some tedious boilerplate.
Acked-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patch.msgid.link/20260325-b4-nova-register-v4-1-bdf172f0f6ca@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Create read-only debugfs entries for LOGINIT, LOGRM, and LOGINTR, which
are the three primary printf logging buffers from GSP-RM. LOGPMU will
be added at a later date, as it requires support for its RPC message
first.
This patch uses the `pin_init_scope` feature to create the entries.
`pin_init_scope` solves the lifetime issue over the `DEBUGFS_ROOT`
reference by delaying its acquisition until the time the entry is
actually initialized.
Co-developed-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Timur Tabi <ttabi@nvidia.com>
Tested-by: John Hubbard <jhubbard@nvidia.com>
Tested-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260319212658.2541610-7-ttabi@nvidia.com
[ Rebase onto Coherent<T> changes. - Danilo ]
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Create the 'nova_core' root debugfs entry when the driver loads.
Normally, non-const global variables need to be protected by a
mutex. Instead, we use unsafe code, as we know the entry is never
modified after the driver is loaded. This solves the lifetime
issue of the mutex guard, which would otherwise have required the
use of `pin_init_scope`.
Signed-off-by: Timur Tabi <ttabi@nvidia.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Tested-by: John Hubbard <jhubbard@nvidia.com>
Tested-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260319212658.2541610-6-ttabi@nvidia.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Replace the module_pci_driver! macro with an explicit module
initialization using the standard module! macro and InPlaceModule
trait implementation. No functional change intended, with the
exception that the driver now prints a message when loaded.
This change is necessary so that we can create a top-level "nova_core"
debugfs entry when the driver is loaded.
Signed-off-by: Timur Tabi <ttabi@nvidia.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Tested-by: John Hubbard <jhubbard@nvidia.com>
Tested-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260319212658.2541610-5-ttabi@nvidia.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Add UserSliceWriter::write_dma() to copy data from a Coherent<[u8]> to
userspace. This provides a safe interface for copying DMA buffer
contents to userspace without requiring callers to work with raw
pointers.
Because write_dma() and write_slice() have common code, factor that code
out into a helper function, write_raw().
The method handles bounds checking and offset calculation internally,
wrapping the unsafe copy_to_user() call.
Signed-off-by: Timur Tabi <ttabi@nvidia.com>
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Acked-by: Miguel Ojeda <ojeda@kernel.org>
Tested-by: John Hubbard <jhubbard@nvidia.com>
Tested-by: Eliot Courtney <ecourtney@nvidia.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://patch.msgid.link/20260319212658.2541610-3-ttabi@nvidia.com
[ Rebase onto Coherent<T> changes; remove unnecessary turbofish from
cast(). - Danilo ]
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
The command-queue structure has a `dma_handle` method that returns the
DMA handle to the memory segment shared with the GSP. This works, but is
not ideal for the following reasons:
- That method is effectively only ever called once, and is technically
an accessor method since the handle doesn't change over time,
- It feels a bit out-of-place with the other methods of `Cmdq` which
only deal with the sending or receiving of messages,
- The method has `pub(crate)` visibility, allowing other driver code to
access this highly-sensitive handle.
Address all these issues by turning `dma_handle` into a struct member
with `pub(super)` visibility. This keeps the method space focused, and
also ensures the member is not visible outside of the modules that need
it.
Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Reviewed-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patch.msgid.link/20260319-b4-cmdq-dma-handle-v1-1-57840b4a4f90@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>