Add the missing fields of the nvfw_hs_load_header_v2 struct, so that the
struct matches the actual contents of the firmware images.
nvfw_hs_load_header_v2 is a struct that defines a header for some firmware
images used by Nouveau. The current structure definition is incomplete;
it omits the last two fields because they are unused.
To maintain consistency between Nouveau, OpenRM, and Nova, and to
make it easier to support possible future images, we should fully define
the struct. Also add a __counted_by tag for the flex array.
Signed-off-by: Timur Tabi <ttabi@nvidia.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://patch.msgid.link/20251010223957.1078525-1-ttabi@nvidia.com
From GP100 onwards it's not possible to initialise comptag RAM without
PMU firmware, which nouveau has no support for.
As such, this code is essentially a no-op and will always revert to the
equivalent non-compressed kind due to comptag allocation failure. It's
also broken for the needs of VM_BIND/Vulkan.
Remove the code entirely to make way for supporting compression on GPUs
that support GSM-RM.
Signed-off-by: Ben Skeggs <bskeggs@nvidia.com>
Signed-off-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: James Jones <jajones@nvidia.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://patch.msgid.link/20251110-nouveau-compv6-v6-3-83b05475f57c@mary.zone
Pull drm fixes from Dave Airlie:
"Some fixes leftover from our fixes branch, just nouveau and vmwgfx:
nouveau:
- Return errno code from TTM move helper
vmwgfx:
- Fix null-ptr access in cursor code
- Fix UAF in validation
- Use correct iterator in validation"
* tag 'drm-fixes-2025-10-11' of https://gitlab.freedesktop.org/drm/kernel:
drm/nouveau: fix bad ret code in nouveau_bo_move_prep
drm/vmwgfx: Fix copy-paste typo in validation
drm/vmwgfx: Fix Use-after-free in validation
drm/vmwgfx: Fix a null-ptr access in the cursor snooper
In `nouveau_bo_move_prep`, if `nouveau_mem_map` fails, an error code
should be returned. Currently, it returns zero even if vmm addr is not
correctly mapped.
Cc: stable@vger.kernel.org
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Shuhao Fu <sfual@cse.ust.hk>
Fixes: 9ce523cc3b ("drm/nouveau: separate buffer object backing memory from nvkm structures")
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Using pmu counters for usage stats. This enables dynamic frequency
scaling on all of the currently supported Tegra gpus.
The register offsets are valid for gk20a, gm20b, gp10b, and gv11b. If
support is added for ga10b, this will need rearchitected.
Signed-off-by: Aaron Kling <webgeek1234@gmail.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
[fixed tab alignment in gk20a_devfreq_target()]
Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://lore.kernel.org/r/20250906-gk20a-devfreq-v2-1-0217f53ee355@gmail.com
Starting with Tegra186, gpu clock handling is done by the bpmp and there
is little to be done by the kernel. The only thing necessary for
reclocking is to set the gpcclk to the desired rate and the bpmp handles
the rest. The pstate list is based on the downstream driver generates.
Signed-off-by: Aaron Kling <webgeek1234@gmail.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
[added newline before gp10b_clk macro declaration for checkpatch error]
Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://lore.kernel.org/r/20250823-gp10b-reclock-v2-1-90a1974a54e3@gmail.com
Before exporting a buffer, make sure it has been populated with
pages at least once.
While discussing cgroups we noticed a problem where you could export
a BO to a dma-buf without having it ever being backed or accounted for.
This meant in low memory situations or eventually with cgroups, a
lower privledged process might cause the compositor to try and allocate
a lot of memory on it's behalf and this could fail. At least make
sure the exporter has managed to allocate the RAM at least once
before exporting the object.
This only applies currently to TTM_PL_SYSTEM objects, because
GTT objects get populated on first validate, and VRAM doesn't
use TT.
Acked-by: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://lore.kernel.org/r/20250904021643.2050497-3-airlied@gmail.com
This reverts:
commit bead880022 ("drm/nouveau: Remove waitque for sched teardown")
commit 5f46f5c7af ("drm/nouveau: Add new callback for scheduler teardown")
from the drm/sched teardown leak fix series:
https://lore.kernel.org/dri-devel/20250710125412.128476-2-phasta@kernel.org/
The aforementioned series removed a blocking waitqueue from
nouveau_sched_fini(). It was mistakenly assumed that this waitqueue only
prevents jobs from leaking, which the series fixed.
The waitqueue, however, also guarantees that all VM_BIND related jobs
are finished in order, cleaning up mappings in the GPU's MMU. These jobs
must be executed sequentially. Without the waitqueue, this is no longer
guaranteed, because entity and scheduler teardown can race with each
other.
Revert all patches related to the waitqueue removal.
Fixes: bead880022 ("drm/nouveau: Remove waitque for sched teardown")
Suggested-by: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Philipp Stanner <phasta@kernel.org>
Link: https://lore.kernel.org/r/20250901083107.10206-2-phasta@kernel.org
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Nouveau has code that when it gets an IRQ with no allowed handler
it disables it to avoid storms.
However with nonstall interrupts, we often disable them from
the drm driver, but still request their emission via the push submission.
Just don't disable nonstall irqs ever in normal operation, the
event handling code will filter them out, and the driver will
just enable/disable them at load time.
This fixes timeouts we've been seeing on/off for a long time,
but they became a lot more noticeable on Blackwell.
This doesn't fix all of them, there is a subsequent fence emission
fix to fix the last few.
Fixes: 3ebd64aa3c ("drm/nouveau/intr: support multiple trees, and explicit interfaces")
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://lore.kernel.org/r/20250829021633.1674524-1-airlied@gmail.com
[ Fix a typo and a minor checkpatch.pl warning; remove "v2" from commit
subject. - Danilo ]
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
The 'tag' parameter is passed by value and is not actually used after
being incremented, so remove the increment. It's the function that calls
gm200_flcn_pio_imem_wr that is supposed to (and does) increment 'tag'.
Fixes: 0e44c21708 ("drm/nouveau/flcn: new code to load+boot simple HS FWs (VPR scrubber)")
Reviewed-by: Philipp Stanner <phasta@kernel.org>
Signed-off-by: Timur Tabi <ttabi@nvidia.com>
Link: https://lore.kernel.org/r/20250813001004.2986092-2-ttabi@nvidia.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
drm-misc-next for v6.18:
UAPI Changes:
- Add DRM_IOCTL_GEM_CHANGE_HANDLE for reassigning GEM handles
- Document DRM_MODE_PAGE_FLIP_EVENT
Cross-subsystem Changes:
fbcon:
- Add missing declarations in fbcon.h
Core Changes:
bridge:
- Fix ref counting
panel:
- Replace and remove mipi_dsi_generic_write_{seq/_chatty}()
sched:
- Fixes
Rust:
- Drop Opaque<> from ioctl arguments
Driver Changes:
amdxdma:
- Support buffers allocated by user space
- Streamline PM interfaces
- Fixes
bridge:
- cdns-dsi: Various improvements to mode setting
- Support Solomon SSD2825 plus DT bindings
- Support Waveshare DSI2DPI plus DT bindings
gud:
- Fixes
ivpu:
- Fixes
nouveau:
- Use GSP firmware by default
- Fixes
panel:
- panel-edp: Support mt8189 Chromebooks; Support BOE NV140WUM-N64;
Support SHP LQ134Z1; Fixes
- panel-simple: Support Olimex LCD-OLinuXino-5CTS plus DT bindings
- Support Samsung AMS561RA01
- Support Hydis HV101HD1 plus DT bindings
panthor:
- Print task/pid on errors
- Fixes
renesas:
- convert to RUNTIME_PM_OPS
repaper:
- Use shadow-plane helpers
rocket:
- Add driver for Rockchip NPU plus DT bindings
sharp-memory:
- Use shadow-plane helpers
simpledrm:
- Use of_reserved_mem_region_to_resource() helper
tidss:
- Use crtc_ fields for programming display mode
- Remove other drivers from aperture
v3d:
- Support querying nubmer of GPU resets for KHR_robustness
vmwgfx:
- Fixes
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://lore.kernel.org/r/20250814072454.GA18104@linux.fritz.box
This is inteded to address concerns that users might get cryptic error
messages or a failure to boot if they set nouveau.config=NvGspRm=0 on
the kernel command line and their gpu requires gsp (Ada or newer).
With this patch, that configuration results in error messages like this:
nouveau 0000:01:00.0: gsp: Failed to load required firmware for device.
nouveau 0000:01:00.0: gsp ctor failed: -22
nouveau 0000:01:00.0: probe with driver nouveau failed with error -22
When nouveau fails to load like this, we still fall back to the generic
framebuffer device, so users will still have limited graphical output.
Signed-off-by: Mel Henning <mhenning@darkrefraction.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://lore.kernel.org/r/20250811213843.4294-4-mhenning@darkrefraction.com
This option was originally intoduced because the GSP code path was
not well tested and we wanted to leave it up to distros which code path
they shipped by default. By now though, the GSP path is probably better
tested than the old firmware eg. Fedora ships GSP by default and we
generally run CTS on GSP. We've always been GSP-only on Ada and later.
So, this path removes the option and effectively sets the option to
always on. We still fall back to the old firmware if GSP is not found.
This change only affects Turing and Ampere.
Users can still set nouveau.config=NvGspRm=0 on the kernel command line
to force using the old firmware on Turing/Ampere.
Signed-off-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Ben Skeggs <bskeggs@nvidia.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://lore.kernel.org/r/20250811213843.4294-2-mhenning@darkrefraction.com
Always set the RMDevidCheckIgnore registry key for GSP-RM so that it
will continue support newer variants of already supported GPUs.
GSP-RM maintains an internal list of PCI IDs of GPUs that it supports,
and checks if the current GPU is on this list. While the actual GPU
architecture (as specified in the BOOT_0/BOOT_42 registers) determines
how to enable the GPU, the PCI ID is used for the product name, e.g.
"NVIDIA GeForce RTX 5090".
Unfortunately, if there is no match, GSP-RM will refuse to initialize,
even if the device is fully supported. Nouveau will get an error
return code, but by then it's too late. This behavior may be corrected
in a future version of GSP-RM, but that does not help Nouveau today.
Fortunately, GSP-RM supports an undocumented registry key that tells it
to ignore the mismatch. In such cases, the product name returned will
be a blank string, but otherwise GSP-RM will continue.
Unlike Nvidia's proprietary driver, Nouveau cannot update to newer
firmware versions to keep up with every new hardware release. Instead,
we can permanently set this registry key, and GSP-RM will continue
to function the same with known hardware.
Signed-off-by: Timur Tabi <ttabi@nvidia.com>
Link: https://lore.kernel.org/r/20250808191340.1701983-1-ttabi@nvidia.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
struct nouveau_channel contains the member 'accel_done' and a forgotten
TODO which hints at that mechanism being removed in the "near future".
Since that variable is read nowhere anymore, this "near future" is now.
Remove the variable and the TODO.
Signed-off-by: Philipp Stanner <phasta@kernel.org>
Link: https://lore.kernel.org/r/20250801074531.79237-2-phasta@kernel.org
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
snprintf() returns the number of characters that *would* have been
written, which can overestimate how much you actually wrote to the
buffer in case of truncation. That leads to 'data += this' advancing
the pointer past the end of the buffer and size going negative.
Switching to scnprintf() prevents potential buffer overflows and ensures
consistent behavior when building the output string.
Signed-off-by: Seyediman Seyedarab <ImanDevel@gmail.com>
Link: https://lore.kernel.org/r/20250724195913.60742-1-ImanDevel@gmail.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Pull drm updates from Dave Airlie:
"Highlights:
- Intel xe enable Panthor Lake, started adding WildCat Lake
- amdgpu has a bunch of reset improvments along with the usual IP
updates
- msm got VM_BIND support which is important for vulkan sparse memory
- more drm_panic users
- gpusvm common code to handle a bunch of core SVM work outside
drivers.
Detail summary:
Changes outside drm subdirectory:
- 'shrink_shmem_memory()' for better shmem/hibernate interaction
- Rust support infrastructure:
- make ETIMEDOUT available
- add size constants up to SZ_2G
- add DMA coherent allocation bindings
- mtd driver for Intel GPU non-volatile storage
- i2c designware quirk for Intel xe
core:
- atomic helpers: tune enable/disable sequences
- add task info to wedge API
- refactor EDID quirks
- connector: move HDR sink to drm_display_info
- fourcc: half-float and 32-bit float formats
- mode_config: pass format info to simplify
dma-buf:
- heaps: Give CMA heap a stable name
ci:
- add device tree validation and kunit
displayport:
- change AUX DPCD access probe address
- add quirk for DPCD probe
- add panel replay definitions
- backlight control helpers
fbdev:
- make CONFIG_FIRMWARE_EDID available on all arches
fence:
- fix UAF issues
format-helper:
- improve tests
gpusvm:
- introduce devmem only flag for allocation
- add timeslicing support to GPU SVM
ttm:
- improve eviction
sched:
- tracing improvements
- kunit improvements
- memory leak fixes
- reset handling improvements
color mgmt:
- add hardware gamma LUT handling helpers
bridge:
- add destroy hook
- switch to reference counted drm_bridge allocations
- tc358767: convert to devm_drm_bridge_alloc
- improve CEC handling
panel:
- switch to reference counter drm_panel allocations
- fwnode panel lookup
- Huiling hl055fhv028c support
- Raspberry Pi 7" 720x1280 support
- edp: KDC KD116N3730A05, N160JCE-ELL CMN, N116BCJ-EAK
- simple: AUO P238HAN01
- st7701: Winstar wf40eswaa6mnn0
- visionox: rm69299-shift
- Renesas R61307, Renesas R69328 support
- DJN HX83112B
hdmi:
- add CEC handling
- YUV420 output support
xe:
- WildCat Lake support
- Enable PanthorLake by default
- mark BMG as SRIOV capable
- update firmware recommendations
- Expose media OA units
- aux-bux support for non-volatile memory
- MTD intel-dg driver for non-volatile memory
- Expose fan control and voltage regulator in sysfs
- restructure migration for multi-device
- Restore GuC submit UAF fix
- make GEM shrinker drm managed
- SRIOV VF Post-migration recovery of GGTT nodes
- W/A additions/reworks
- Prefetch support for svm ranges
- Don't allocate managed BO for each policy change
- HWMON fixes for BMG
- Create LRC BO without VM
- PCI ID updates
- make SLPC debugfs files optional
- rework eviction rejection of bound external BOs
- consolidate PAT programming logic for pre/post Xe2
- init changes for flicker-free boot
- Enable GuC Dynamic Inhibit Context switch
i915:
- drm_panic support for i915/xe
- initial flip queue off by default for LNL/PNL
- Wildcat Lake Display support
- Support for DSC fractional link bpp
- Support for simultaneous Panel Replay and Adaptive sync
- Support for PTL+ double buffer LUT
- initial PIPEDMC event handling
- drm_panel_follower support
- DPLL interface renames
- allocate struct intel_display dynamically
- flip queue preperation
- abstract DRAM detection better
- avoid GuC scheduling stalls
- remove DG1 force probe requirement
- fix MEI interrupt handler on RT kernels
- use backlight control helpers for eDP
- more shared display code refactoring
amdgpu:
- add userq slot to INFO ioctl
- SR-IOV hibernation support
- Suspend improvements
- Backlight improvements
- Use scaling for non-native eDP modes
- cleaner shader updates for GC 9.x
- Remove fence slab
- SDMA fw checks for userq support
- RAS updates
- DMCUB updates
- DP tunneling fixes
- Display idle D3 support
- Per queue reset improvements
- initial smartmux support
amdkfd:
- enable KFD on loongarch
- mtype fix for ext coherent system memory
radeon:
- CS validation additional GL extensions
- drop console lock during suspend/resume
- bump driver version
msm:
- VM BIND support
- CI: infrastructure updates
- UBWC single source of truth
- decouple GPU and KMS support
- DP: rework I/O accessors
- DPU: SM8750 support
- DSI: SM8750 support
- GPU: X1-45 support and speedbin support for X1-85
- MDSS: SM8750 support
nova:
- register! macro improvements
- DMA object abstraction
- VBIOS parser + fwsec lookup
- sysmem flush page support
- falcon: generic falcon boot code and HAL
- FWSEC-FRTS: fb setup and load/execute
ivpu:
- Add Wildcat Lake support
- Add turbo flag
ast:
- improve hardware generations implementation
imx:
- IMX8qxq Display Controller support
lima:
- Rockchip RK3528 GPU support
nouveau:
- fence handling cleanup
panfrost:
- MT8370 support
- bo labeling
- 64-bit register access
qaic:
- add RAS support
rockchip:
- convert inno_hdmi to a bridge
rz-du:
- add RZ/V2H(P) support
- MIPI-DSI DCS support
sitronix:
- ST7567 support
sun4i:
- add H616 support
tidss:
- add TI AM62L support
- AM65x OLDI bridge support
bochs:
- drm panic support
vkms:
- YUV and R* format support
- use faux device
vmwgfx:
- fence improvements
hyperv:
- move out of simple
- add drm_panic support"
* tag 'drm-next-2025-07-30' of https://gitlab.freedesktop.org/drm/kernel: (1479 commits)
drm/tidss: oldi: convert to devm_drm_bridge_alloc() API
drm/tidss: encoder: convert to devm_drm_bridge_alloc()
drm/amdgpu: move reset support type checks into the caller
drm/amdgpu/sdma7: re-emit unprocessed state on ring reset
drm/amdgpu/sdma6: re-emit unprocessed state on ring reset
drm/amdgpu/sdma5.2: re-emit unprocessed state on ring reset
drm/amdgpu/sdma5: re-emit unprocessed state on ring reset
drm/amdgpu/gfx12: re-emit unprocessed state on ring reset
drm/amdgpu/gfx11: re-emit unprocessed state on ring reset
drm/amdgpu/gfx10: re-emit unprocessed state on ring reset
drm/amdgpu/gfx9.4.3: re-emit unprocessed state on kcq reset
drm/amdgpu/gfx9: re-emit unprocessed state on kcq reset
drm/amdgpu: Add WARN_ON to the resource clear function
drm/amd/pm: Use cached metrics data on SMUv13.0.6
drm/amd/pm: Use cached data for min/max clocks
gpu: nova-core: fix bounds check in PmuLookupTableEntry::new
drm/amdgpu: Replace HQD terminology with slots naming
drm/amdgpu: Add user queue instance count in HW IP info
drm/amd/amdgpu: Add helper functions for isp buffers
drm/amd/amdgpu: Initialize swnode for ISP MFD device
...
My previous patch ended up causing a regression for the
DRM_IOCTL_NOUVEAU_NVIF ioctl. The intention of my patch was to only
pass ioctl commands that have the correct dir/type/nr bits into the
nouveau_abi16_ioctl() function.
This turned out to be too strict, as userspace does use at least
write-only and write-read direction settings. Checking for both of these
still did not fix the issue, so the best we can do for the 6.16 release
is to revert back to what we've had since linux-3.16.
This version is still fragile, but at least it is known to work with
existing userspace. Fixing this properly requires a better understanding
of what commands are being passed from userspace in practice, and how
that relies on the undocumented (miss)behavior in nouveau_drm_ioctl().
Fixes: e5478166df ("drm/nouveau: check ioctl command codes better")
Reported-by: Satadru Pramanik <satadru@gmail.com>
Closes: https://lore.kernel.org/lkml/CAFrh3J85tsZRpOHQtKgNHUVnn=EG=QKBnZTRtWS8eWSc1K1xkA@mail.gmail.com/
Reported-by: Chris Bainbridge <chris.bainbridge@gmail.com>
Closes: https://lore.kernel.org/lkml/aH9n_QGMFx2ZbKlw@debian.local/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20250722115830.2587297-1-arnd@kernel.org
[ Add Closes: tags, fix minor typo in commit message. - Danilo ]
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Among the scheduler's statuses, the only one that indicates an error is
DRM_GPU_SCHED_STAT_ENODEV. Any status other than DRM_GPU_SCHED_STAT_ENODEV
signifies that the operation succeeded and the GPU is in a nominal state.
However, to provide more information about the GPU's status, it is needed
to convey more information than just "OK".
Therefore, rename DRM_GPU_SCHED_STAT_NOMINAL to
DRM_GPU_SCHED_STAT_RESET, which better communicates the meaning of this
status. The status DRM_GPU_SCHED_STAT_RESET indicates that the GPU has
hung, but it has been successfully reset and is now in a nominal state
again.
Reviewed-by: Philipp Stanner <phasta@kernel.org>
Link: https://lore.kernel.org/r/20250714-sched-skip-reset-v6-1-5c5ba4f55039@igalia.com
Signed-off-by: Maíra Canal <mcanal@igalia.com>