Refactor user_mem_abort() to improve code clarity and simplify
assumptions within the function.
Key changes include:
* Immediately set force_pte to true at the beginning of the function if
logging_active is true. This simplifies the flow and makes the
condition for forcing a PTE more explicit.
* Remove the misleading comment stating that logging_active is
guaranteed to never be true for VM_PFNMAP memslots, as this assertion
is not entirely correct.
* Extract reusable code blocks into new helper functions:
* prepare_mmu_memcache(): Encapsulates the logic for preparing and
topping up the MMU page cache.
* adjust_nested_fault_perms(): Isolates the adjustments to shadow S2
permissions and the encoding of nested translation levels.
* Update min(a, (long)b) to min_t(long, a, b) for better type safety and
consistency.
* Perform other minor tidying up of the code.
These changes primarily aim to simplify user_mem_abort() and make its
logic easier to understand and maintain, setting the stage for future
modifications.
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Tao Chan <chentao@kylinos.cn>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250729225455.670324-18-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Update the KVM MMU fault handler to service guest page faults
for memory slots backed by guest_memfd with mmap support. For such
slots, the MMU must always fault in pages directly from guest_memfd,
bypassing the host's userspace_addr.
This ensures that guest_memfd-backed memory is always handled through
the guest_memfd specific faulting path, regardless of whether it's for
private or non-private (shared) use cases.
Additionally, rename kvm_mmu_faultin_pfn_private() to
kvm_mmu_faultin_pfn_gmem(), as this function is now used to fault in
pages from guest_memfd for both private and non-private memory,
accommodating the new use cases.
Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Co-developed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
[sean: drop the helper]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20250729225455.670324-17-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Rework kvm_mmu_max_mapping_level() to consult guest_memfd for all mappings,
not just private mappings, so that hugepage support plays nice with the
upcoming support for backing non-private memory with guest_memfd.
In addition to getting the max order from guest_memfd for gmem-only
memslots, update TDX's hook to effectively ignore shared mappings, as TDX's
restrictions on page size only apply to Secure EPT mappings. Do nothing
for SNP, as RMP restrictions apply to both private and shared memory.
Suggested-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20250729225455.670324-16-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Rework kvm_mmu_max_mapping_level() to provide the plumbing to consult
guest_memfd (and relevant vendor code) when recovering hugepages, e.g.
after disabling live migration. The flaw has existed since guest_memfd was
originally added, but has gone unnoticed due to lack of guest_memfd support
for hugepages or dirty logging.
Don't actually call into guest_memfd at this time, as it's unclear as to
what the API should be. Ideally, KVM would simply use kvm_gmem_get_pfn(),
but invoking kvm_gmem_get_pfn() would lead to sleeping in atomic context
if guest_memfd needed to allocate memory (mmu_lock is held). Luckily,
the path isn't actually reachable, so just add a TODO and WARN to ensure
the functionality is added alongisde guest_memfd hugepage support, and
punt the guest_memfd API design question to the future.
Note, calling kvm_mem_is_private() in the non-fault path is safe, so long
as mmu_lock is held, as hugepage recovery operates on shadow-present SPTEs,
i.e. calling kvm_mmu_max_mapping_level() with @fault=NULL is mutually
exclusive with kvm_vm_set_mem_attributes() changing the PRIVATE attribute
of the gfn.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Message-ID: <20250729225455.670324-15-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a new internal flag, KVM_MEMSLOT_GMEM_ONLY, to the top half of
memslot->flags (which makes it strictly for KVM's internal use). This
flag tracks when a guest_memfd-backed memory slot supports host
userspace mmap operations, which implies that all memory, not just
private memory for CoCo VMs, is consumed through guest_memfd: "gmem
only".
This optimization avoids repeatedly checking the underlying guest_memfd
file for mmap support, which would otherwise require taking and
releasing a reference on the file for each check. By caching this
information directly in the memslot, we reduce overhead and simplify the
logic involved in handling guest_memfd-backed pages for host mappings.
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: David Hildenbrand <david@redhat.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250729225455.670324-12-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Introduce the core infrastructure to enable host userspace to mmap()
guest_memfd-backed memory. This is needed for several evolving KVM use
cases:
* Non-CoCo VM backing: Allows VMMs like Firecracker to run guests
entirely backed by guest_memfd, even for non-CoCo VMs [1]. This
provides a unified memory management model and simplifies guest memory
handling.
* Direct map removal for enhanced security: This is an important step
for direct map removal of guest memory [2]. By allowing host userspace
to fault in guest_memfd pages directly, we can avoid maintaining host
kernel direct maps of guest memory. This provides additional hardening
against Spectre-like transient execution attacks by removing a
potential attack surface within the kernel.
* Future guest_memfd features: This also lays the groundwork for future
enhancements to guest_memfd, such as supporting huge pages and
enabling in-place sharing of guest memory with the host for CoCo
platforms that permit it [3].
Enable the basic mmap and fault handling logic within guest_memfd, but
hold off on allow userspace to actually do mmap() until the architecture
support is also in place.
[1] https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding
[2] https://lore.kernel.org/linux-mm/cc1bb8e9bc3e1ab637700a4d3defeec95b55060a.camel@amazon.com
[3] https://lore.kernel.org/all/c1c9591d-218a-495c-957b-ba356c8f8e09@redhat.com/T/#u
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Acked-by: David Hildenbrand <david@redhat.com>
Co-developed-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250729225455.670324-11-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Enable KVM_GUEST_MEMFD for all KVM x86 64-bit builds, i.e. for "default"
VM types when running on 64-bit KVM. This will allow using guest_memfd
to back non-private memory for all VM shapes, by supporting mmap() on
guest_memfd.
Opportunistically clean up various conditionals that become tautologies
once x86 selects KVM_GUEST_MEMFD more broadly. Specifically, because
SW protected VMs, SEV, and TDX are all 64-bit only, private memory no
longer needs to take explicit dependencies on KVM_GUEST_MEMFD, because
it is effectively a prerequisite.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250729225455.670324-10-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Rename kvm_slot_can_be_private() to kvm_slot_has_gmem() to improve
clarity and accurately reflect its purpose.
The function kvm_slot_can_be_private() was previously used to check if a
given kvm_memory_slot is backed by guest_memfd. However, its name
implied that the memory in such a slot was exclusively "private".
As guest_memfd support expands to include non-private memory (e.g.,
shared host mappings), it's important to remove this association. The
new name, kvm_slot_has_gmem(), states that the slot is backed by
guest_memfd without making assumptions about the memory's privacy
attributes.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250729225455.670324-7-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The original name was vague regarding its functionality. This Kconfig
option specifically enables and gates the kvm_gmem_populate() function,
which is responsible for populating a GPA range with guest data.
The new name, HAVE_KVM_ARCH_GMEM_POPULATE, describes the purpose of the
option: to enable arch-specific guest_memfd population mechanisms. It
also follows the same pattern as the other HAVE_KVM_ARCH_* configuration
options.
This improves clarity for developers and ensures the name accurately
reflects the functionality it controls, especially as guest_memfd
support expands beyond purely "private" memory scenarios.
Temporarily keep KVM_GENERIC_PRIVATE_MEM as an x86-only config so as to
minimize churn, and to hopefully make it easier to see what features
require HAVE_KVM_ARCH_GMEM_POPULATE. On that note, omit GMEM_POPULATE
for KVM_X86_SW_PROTECTED_VM, as regular ol' memset() suffices for
software-protected VMs.
As for KVM_GENERIC_PRIVATE_MEM, a future change will select KVM_GUEST_MEMFD
for all 64-bit KVM builds, at which point the intermediate config will
become obsolete and can/will be dropped.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250729225455.670324-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Select KVM_GENERIC_PRIVATE_MEM and KVM_GENERIC_MEMORY_ATTRIBUTES directly
from KVM_INTEL_TDX, i.e. if and only if TDX support is fully enabled in
KVM. There is no need to enable KVM's private memory support just because
the core kernel's INTEL_TDX_HOST is enabled.
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Message-ID: <20250729225455.670324-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Now that KVM_SW_PROTECTED_VM doesn't have a hidden dependency on KVM_X86,
select KVM_GENERIC_PRIVATE_MEM from within KVM_SW_PROTECTED_VM instead of
conditionally selecting it from KVM_X86.
No functional change intended.
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Message-ID: <20250729225455.670324-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Make all vendor neutral KVM x86 configs depend on KVM_X86, not just KVM,
i.e. gate them on at least one vendor module being enabled and thus on
kvm.ko actually being built. Depending on just KVM allows the user to
select the configs even though they won't actually take effect, and more
importantly, makes it all too easy to create unmet dependencies. E.g.
KVM_GENERIC_PRIVATE_MEM can't be selected by KVM_SW_PROTECTED_VM, because
the KVM_GENERIC_MMU_NOTIFIER dependency is select by KVM_X86.
Hiding all sub-configs when neither KVM_AMD nor KVM_INTEL is selected also
helps communicate to the user that nothing "interesting" is going on, e.g.
--- Virtualization
<M> Kernel-based Virtual Machine (KVM) support
< > KVM for Intel (and compatible) processors support
< > KVM for AMD processors support
Fixes: ea4290d77b ("KVM: x86: leave kvm.ko out of the build if no vendor module is requested")
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Message-ID: <20250729225455.670324-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Rename the Kconfig option CONFIG_KVM_PRIVATE_MEM to
CONFIG_KVM_GUEST_MEMFD. The original name implied that the feature only
supported "private" memory. However, CONFIG_KVM_PRIVATE_MEM enables
guest_memfd in general, which is not exclusively for private memory.
Subsequent patches in this series will add guest_memfd support for
non-CoCo VMs, whose memory is not private.
Renaming the Kconfig option to CONFIG_KVM_GUEST_MEMFD more accurately
reflects its broader scope as the main Kconfig option for all
guest_memfd-backed memory. This provides clearer semantics for the
option and avoids confusion as new features are introduced.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250729225455.670324-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* Use array_index_nospec() to sanitize the target vCPU ID when handling PV
IPIs and yields as the ID is guest-controlled.
* Drop a superfluous cpumask_empty() check when reclaiming SEV memory, as
the common case, by far, is that at least one CPU will have entered the
VM, and wbnoinvd_on_cpus_mask() will naturally handle the rare case where
the set of have_run_cpus is empty.
* Rename the is_signed_type() macro in kselftest_harness.h to is_signed_var()
to fix a collision with linux/overflow.h. The collision generates compiler
warnings due to the two macros having different implementations.
Pull perf fix from Borislav Petkov:
- Fix a case where the events throttling logic operates on inactive
events
* tag 'perf_urgent_for_v6.17_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Avoid undefined behavior from stopping/starting inactive events
Pull x86 fixes from Borislav Petkov:
- Fix the GDS mitigation detection on some machines after the recent
attack vectors conversion
- Filter out the invalid machine reset reason value -1 when running as
a guest as in such cases the reason why the machine was rebooted does
not make a whole lot of sense
- Init the resource control machinery on Hygon hw in order to avoid a
division by zero and to actually enable the feature on hw which
supports it
* tag 'x86_urgent_for_v6.17_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/bugs: Fix GDS mitigation selecting when mitigation is off
x86/CPU/AMD: Ignore invalid reset reason value
x86/cpu/hygon: Add missing resctrl_cpu_detect() in bsp_init helper
Pull modules fix from Daniel Gomez:
"This includes a fix part of the KSPP (Kernel Self Protection Project)
to replace the deprecated and unsafe strcpy() calls in the kernel
parameter string handler and sysfs parameters for built-in modules.
Single commit, no functional changes"
* tag 'modules-6.17-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux:
params: Replace deprecated strcpy() with strscpy() and memcpy()
Pull char/misc/iio fixes from Greg KH:
"Here are a small number of char/misc/iio and other driver fixes for
6.17-rc3. Included in here are:
- IIO driver bugfixes for reported issues
- bunch of comedi driver fixes
- most core bugfix
- fpga driver bugfix
- cdx driver bugfix
All of these have been in linux-next this week with no reported
issues"
* tag 'char-misc-6.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
most: core: Drop device reference after usage in get_channel()
comedi: Make insn_rw_emulate_bits() do insn->n samples
comedi: Fix use of uninitialized memory in do_insn_ioctl() and do_insnlist_ioctl()
comedi: pcl726: Prevent invalid irq number
cdx: Fix off-by-one error in cdx_rpmsg_probe()
fpga: zynq_fpga: Fix the wrong usage of dma_map_sgtable()
iio: pressure: bmp280: Use IS_ERR() in bmp280_common_probe()
iio: light: as73211: Ensure buffer holes are zeroed
iio: adc: rzg2l_adc: Set driver data before enabling runtime PM
iio: adc: rzg2l: Cleanup suspend/resume path
iio: adc: ad7380: fix missing max_conversion_rate_hz on adaq4381-4
iio: adc: bd79124: Add GPIOLIB dependency
iio: imu: inv_icm42600: change invalid data error to -EBUSY
iio: adc: ad7124: fix channel lookup in syscalib functions
iio: temperature: maxim_thermocouple: use DMA-safe buffer for spi_read()
iio: adc: ad7173: prevent scan if too many setups requested
iio: proximity: isl29501: fix buffered read on big-endian systems
iio: accel: sca3300: fix uninitialized iio scan data
Pull USB fixes from Greg KH:
"Here are some small USB driver fixes for 6.17-rc3 to resolve a bunch
of reported issues. Included in here are:
- typec driver fixes
- dwc3 new device id
- dwc3 driver fixes
- new usb-storage driver quirks
- xhci driver fixes
- other tiny USB driver fixes to resolve bugs
All of these have been in linux-next this week with no reported issues"
* tag 'usb-6.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: xhci: fix host not responding after suspend and resume
usb: xhci: Fix slot_id resource race conflict
usb: typec: fusb302: Revert incorrect threaded irq fix
USB: core: Update kerneldoc for usb_hcd_giveback_urb()
usb: typec: maxim_contaminant: re-enable cc toggle if cc is open and port is clean
usb: typec: maxim_contaminant: disable low power mode when reading comparator values
usb: dwc3: Remove WARN_ON for device endpoint command timeouts
USB: storage: Ignore driver CD mode for Realtek multi-mode Wi-Fi dongles
usb: storage: realtek_cr: Use correct byte order for bcs->Residue
usb: chipidea: imx: improve usbmisc_imx7d_pullup()
kcov, usb: Don't disable interrupts in kcov_remote_start_usb_softirq()
usb: dwc3: pci: add support for the Intel Wildcat Lake
usb: dwc3: Ignore late xferNotReady event to prevent halt timeout
USB: storage: Add unusual-devs entry for Novatek NTK96550-based camera
usb: core: hcd: fix accessing unmapped memory in SINGLE_STEP_SET_FEATURE test
usb: renesas-xhci: Fix External ROM access timeouts
usb: gadget: tegra-xudc: fix PM use count underflow
usb: quirks: Add DELAY_INIT quick for another SanDisk 3.2Gen1 Flash Drive
Pull tracing fixes from Steven Rostedt:
- Fix rtla and latency tooling pkg-config errors
If libtraceevent and libtracefs is installed, but their corresponding
'.pc' files are not installed, it reports that the libraries are
missing and confuses the developer. Instead, report that the
pkg-config files are missing and should be installed.
- Fix overflow bug of the parser in trace_get_user()
trace_get_user() uses the parsing functions to parse the user space
strings. If the parser fails due to incorrect processing, it doesn't
terminate the buffer with a nul byte. Add a "failed" flag to the
parser that gets set when parsing fails and is used to know if the
buffer is fine to use or not.
- Remove a semicolon that was at an end of a comment line
- Fix register_ftrace_graph() to unregister the pm notifier on error
The register_ftrace_graph() registers a pm notifier but there's an
error path that can exit the function without unregistering it. Since
the function returns an error, it will never be unregistered.
- Allocate and copy ftrace hash for reader of ftrace filter files
When the set_ftrace_filter or set_ftrace_notrace files are open for
read, an iterator is created and sets its hash pointer to the
associated hash that represents filtering or notrace filtering to it.
The issue is that the hash it points to can change while the
iteration is happening. All the locking used to access the tracer's
hashes are released which means those hashes can change or even be
freed. Using the hash pointed to by the iterator can cause UAF bugs
or similar.
Have the read of these files allocate and copy the corresponding
hashes and use that as that will keep them the same while the
iterator is open. This also simplifies the code as opening it for
write already does an allocate and copy, and now that the read is
doing the same, there's no need to check which way it was opened on
the release of the file, and the iterator hash can always be freed.
- Fix function graph to copy args into temp storage
The output of the function graph tracer shows both the entry and the
exit of a function. When the exit is right after the entry, it
combines the two events into one with the output of "function();",
instead of showing:
function() {
}
In order to do this, the iterator descriptor that reads the events
includes storage that saves the entry event while it peaks at the
next event in the ring buffer. The peek can free the entry event so
the iterator must store the information to use it after the peek.
With the addition of function graph tracer recording the args, where
the args are a dynamic array in the entry event, the temp storage
does not save them. This causes the args to be corrupted or even
cause a read of unsafe memory.
Add space to save the args in the temp storage of the iterator.
- Fix race between ftrace_dump and reading trace_pipe
ftrace_dump() is used when a crash occurs where the ftrace buffer
will be printed to the console. But it can also be triggered by
sysrq-z. If a sysrq-z is triggered while a task is reading trace_pipe
it can cause a race in the ftrace_dump() where it checks if the
buffer has content, then it checks if the next event is available,
and then prints the output (regardless if the next event was
available or not). Reading trace_pipe at the same time can cause it
to not be available, and this triggers a WARN_ON in the print. Move
the printing into the check if the next event exists or not
* tag 'trace-v6.17-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
ftrace: Also allocate and copy hash for reading of filter files
ftrace: Fix potential warning in trace_printk_seq during ftrace_dump
fgraph: Copy args in intermediate storage with entry
trace/fgraph: Fix the warning caused by missing unregister notifier
ring-buffer: Remove redundant semicolons
tracing: Limit access to parser->buffer when trace_get_user failed
rtla: Check pkg-config install
tools/latency-collector: Check pkg-config install
Pull driver core fixes from Danilo Krummrich:
- Fix swapped handling of lru_gen and lru_gen_full debugfs files in
vmscan
- Fix debugfs mount options (uid, gid, mode) being silently ignored
- Fix leak of devres action in the unwind path of Devres::new()
- Documentation:
- Expand and fix documentation of (outdated) Device, DeviceContext
and generic driver infrastructure
- Fix C header link of faux device abstractions
- Clarify expected interaction with the security team
- Smooth text flow in the security bug reporting process
documentation
* tag 'driver-core-6.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core:
Documentation: smooth the text flow in the security bug reporting process
Documentation: clarify the expected collaboration with security bugs reporters
debugfs: fix mount options not being applied
rust: devres: fix leaking call to devm_add_action()
rust: faux: fix C header link
driver: rust: expand documentation for driver infrastructure
device: rust: expand documentation for Device
device: rust: expand documentation for DeviceContext
mm/vmscan: fix inverted polarity in lru_gen_seq_show()
Pull drm fixes from Dave Airlie:
"Weekly drm fixes. Looks like things did indeed get busier after rc2,
nothing seems too major, but stuff scattered all over the place,
amdgpu, xe, i915, hibmc, rust support code, and other small fixes.
rust:
- drm device memory layout and safety fixes
tests:
- Endianness fixes
gpuvm:
- docs warning fix
panic:
- fix division on 32-bit arm
i915:
- TypeC DP display Fixes
- Silence rpm wakeref asserts on GEN11_GU_MISC_IIR access
- Relocate compression repacking WA for JSL/EHL
xe:
- xe_vm_create fixes
- fix vm bind ioctl double free
amdgpu:
- Replay fixes
- SMU14 fix
- Null check DC fixes
- DCE6 DC fixes
- Misc DC fixes
bridge:
- analogix_dp: devm_drm_bridge_alloc() error handling fix
habanalabs:
- Memory deallocation fix
hibmc:
- modesetting black screen fixes
- fix UAF on irq
- fix leak on i2c failure path
nouveau:
- memory leak fixes
- typos
rockchip:
- Kconfig fix
- register caching fix"
* tag 'drm-fixes-2025-08-23-1' of https://gitlab.freedesktop.org/drm/kernel: (49 commits)
drm/xe: Fix vm_bind_ioctl double free bug
drm/xe: Move ASID allocation and user PT BO tracking into xe_vm_create
drm/xe: Assign ioctl xe file handler to vm in xe_vm_create
drm/i915/gt: Relocate compression repacking WA for JSL/EHL
drm/i915: silence rpm wakeref asserts on GEN11_GU_MISC_IIR access
drm/amd/display: Fix DP audio DTO1 clock source on DCE 6.
drm/amd/display: Fix fractional fb divider in set_pixel_clock_v3
drm/amd/display: Don't print errors for nonexistent connectors
drm/amd/display: Don't warn when missing DCE encoder caps
drm/amd/display: Fill display clock and vblank time in dce110_fill_display_configs
drm/amd/display: Find first CRTC and its line time in dce110_fill_display_configs
drm/amd/display: Adjust DCE 8-10 clock, don't overclock by 15%
drm/amd/display: Don't overclock DCE 6 by 15%
drm/amd/display: Add null pointer check in mod_hdcp_hdcp1_create_session()
drm/amd/display: Fix Xorg desktop unresponsive on Replay panel
drm/amd/display: Avoid a NULL pointer dereference
drm/amdgpu/swm14: Update power limit logic
drm/amd/display: Revert Add HPO encoder support to Replay
drm/i915/icl+/tc: Convert AUX powered WARN to a debug message
drm/i915/lnl+/tc: Use the cached max lane count value
...
When calling ftrace_dump_one() concurrently with reading trace_pipe,
a WARN_ON_ONCE() in trace_printk_seq() can be triggered due to a race
condition.
The issue occurs because:
CPU0 (ftrace_dump) CPU1 (reader)
echo z > /proc/sysrq-trigger
!trace_empty(&iter)
trace_iterator_reset(&iter) <- len = size = 0
cat /sys/kernel/tracing/trace_pipe
trace_find_next_entry_inc(&iter)
__find_next_entry
ring_buffer_empty_cpu <- all empty
return NULL
trace_printk_seq(&iter.seq)
WARN_ON_ONCE(s->seq.len >= s->seq.size)
In the context between trace_empty() and trace_find_next_entry_inc()
during ftrace_dump, the ring buffer data was consumed by other readers.
This caused trace_find_next_entry_inc to return NULL, failing to populate
`iter.seq`. At this point, due to the prior trace_iterator_reset, both
`iter.seq.len` and `iter.seq.size` were set to 0. Since they are equal,
the WARN_ON_ONCE condition is triggered.
Move the trace_printk_seq() into the if block that checks to make sure the
return value of trace_find_next_entry_inc() is non-NULL in
ftrace_dump_one(), ensuring the 'iter.seq' is properly populated before
subsequent operations.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Ingo Molnar <mingo@elte.hu>
Link: https://lore.kernel.org/20250822033343.3000289-1-wutengda@huaweicloud.com
Fixes: d769041f86 ("ring_buffer: implement new locking")
Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The output of the function graph tracer has two ways to display its
entries. One way for leaf functions with no events recorded within them,
and the other is for functions with events recorded inside it. As function
graph has an entry and exit event, to simplify the output of leaf
functions it combines the two, where as non leaf functions are separate:
2) | invoke_rcu_core() {
2) | raise_softirq() {
2) 0.391 us | __raise_softirq_irqoff();
2) 1.191 us | }
2) 2.086 us | }
The __raise_softirq_irqoff() function above is really two events that were
merged into one. Otherwise it would have looked like:
2) | invoke_rcu_core() {
2) | raise_softirq() {
2) | __raise_softirq_irqoff() {
2) 0.391 us | }
2) 1.191 us | }
2) 2.086 us | }
In order to do this merge, the reading of the trace output file needs to
look at the next event before printing. But since the pointer to the event
is on the ring buffer, it needs to save the entry event before it looks at
the next event as the next event goes out of focus as soon as a new event
is read from the ring buffer. After it reads the next event, it will print
the entry event with either the '{' (non leaf) or ';' and timestamps (leaf).
The iterator used to read the trace file has storage for this event. The
problem happens when the function graph tracer has arguments attached to
the entry event as the entry now has a variable length "args" field. This
field only gets set when funcargs option is used. But the args are not
recorded in this temp data and garbage could be printed. The entry field
is copied via:
data->ent = *curr;
Where "curr" is the entry field. But this method only saves the non
variable length fields from the structure.
Add a helper structure to the iterator data that adds the max args size to
the data storage in the iterator. Then simply copy the entire entry into
this storage (with size protection).
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/20250820195522.51d4a268@gandalf.local.home
Reported-by: Sasha Levin <sashal@kernel.org>
Tested-by: Sasha Levin <sashal@kernel.org>
Closes: https://lore.kernel.org/all/aJaxRVKverIjF4a6@lappy/
Fixes: ff5c9c576e ("ftrace: Add support for function argument to graph tracer")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Pull iommufd fixes from Jason Gunthorpe:
"Two very minor fixes:
- Fix mismatched kvalloc()/kfree()
- Spelling fixes in documentation"
* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
iommufd: Fix spelling errors in iommufd.rst
iommufd: viommu: free memory allocated by kvcalloc() using kvfree()
Bindig requires a node name matching ‘^ethernet@[0-9a-f]+$’. This patch
changes the clock name from “etop” to “ethernet”.
This fixes the following warning:
arch/mips/boot/dts/lantiq/danube_easy50712.dtb: etop@e180000 (lantiq,etop-xway): $nodename:0: 'etop@e180000' does not match '^ethernet@[0-9a-f]+$'
from schema $id: http://devicetree.org/schemas/net/lantiq,etop-xway.yaml#
Fixes: dac0bad937 ("dt-bindings: net: lantiq,etop-xway: Document Lantiq Xway ETOP bindings")
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Acked-by: Jakub Kicinski <kuba@kernel.org>
The upstream dts lacks the lantiq,{rx/tx}-burst-length property. Other
issues were also fixed:
arch/mips/boot/dts/lantiq/danube_easy50712.dtb: etop@e180000 (lantiq,etop-xway): 'interrupt-names' is a required property
from schema $id: http://devicetree.org/schemas/net/lantiq,etop-xway.yaml#
arch/mips/boot/dts/lantiq/danube_easy50712.dtb: etop@e180000 (lantiq,etop-xway): 'lantiq,tx-burst-length' is a required property
from schema $id: http://devicetree.org/schemas/net/lantiq,etop-xway.yaml#
arch/mips/boot/dts/lantiq/danube_easy50712.dtb: etop@e180000 (lantiq,etop-xway): 'lantiq,rx-burst-length' is a required property
from schema $id: http://devicetree.org/schemas/net/lantiq,etop-xway.yaml#
Fixes: 14d4e308e0 ("net: lantiq: configure the burst length in ethernet drivers")
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Pull s390 fixes from Alexander Gordeev:
- When kernel lockdown is active userspace tools that rely on read
operations only are unnecessarily blocked. Fix that by avoiding ioctl
registration during lockdown
- Invalid NULL pointer accesses succeed due to the lowcore is always
mapped the identity mapping pinned to zero. To fix that never map the
first two pages of physical memory with identity mapping
- Fix invalid SCCB present check in the SCLP interrupt handler
- Update defconfigs
* tag 's390-6.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/hypfs: Enable limited access during lockdown
s390/hypfs: Avoid unnecessary ioctl registration in debugfs
s390/mm: Do not map lowcore with identity mapping
s390/sclp: Fix SCCB present check
s390/configs: Set HZ=1000
s390/configs: Update defconfigs
Pull xen fixes from Juergen Gross:
"Two small cleanups which are both relevant only when running as a Xen
guest"
* tag 'for-linus-6.17-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
drivers/xen/xenbus: remove quirk for Xen 3.x
compiler: remove __ADDRESSABLE_ASM{_STR,}() again
Pull x86 platform driver fixes from Ilpo Järvinen:
- amd/hsmp:
- Ensure sock->metric_tbl_addr is non-NULL
- Register driver even if hwmon registration fails
- amd/pmc: Drop SMU F/W match for Cezanne
- dell-smbios-wmi: Separate "priority" from WMI device ID
- hp-wmi: mark Victus 16-r1xxx for Victus s fan and thermal profile
support
- intel-uncore-freq: Check write blocked for efficiency latency control
* tag 'platform-drivers-x86-v6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: hp-wmi: mark Victus 16-r1xxx for victus_s fan and thermal profile support
platform/x86/amd/hsmp: Ensure success even if hwmon registration fails
platform/x86/amd/hsmp: Ensure sock->metric_tbl_addr is non-NULL
platform/x86/intel-uncore-freq: Check write blocked for ELC
platform/x86/amd: pmc: Drop SMU F/W match for Cezanne
platform/x86: dell-smbios-wmi: Stop touching WMI device ID
Pull block fixes from Jens Axboe:
"A set of fixes for block that should go into this tree. A bit larger
than what I usually have at this point in time, a lot of that is the
continued fixing of the lockdep annotation for queue freezing that we
recently added, which has highlighted a number of little issues here
and there. This contains:
- MD pull request via Yu:
- Add a legacy_async_del_gendisk mode, to prevent a user tools
regression. New user tools releases will not use such a mode,
the old release with a new kernel now will have warning about
deprecated behavior, and we prepare to remove this legacy mode
after about a year later
- The rename in kernel causing user tools build failure, revert
the rename in mdp_superblock_s
- Fix a regression that interrupted resync can be shown as
recover from mdstat or sysfs
- Improve file size detection for loop, particularly for networked
file systems, by using getattr to get the size rather than the
cached inode size.
- Hotplug CPU lock vs queue freeze fix
- Lockdep fix while updating the number of hardware queues
- Fix stacking for PI devices
- Silence bio_check_eod() for the known case of device removal where
the size is truncated to 0 sectors"
* tag 'block-6.17-20250822' of git://git.kernel.dk/linux:
block: avoid cpu_hotplug_lock depedency on freeze_lock
block: decrement block_rq_qos static key in rq_qos_del()
block: skip q->rq_qos check in rq_qos_done_bio()
blk-mq: fix lockdep warning in __blk_mq_update_nr_hw_queues
block: tone down bio_check_eod
loop: use vfs_getattr_nosec for accurate file size
loop: Consolidate size calculation logic into lo_calculate_size()
block: remove newlines from the warnings in blk_validate_integrity_limits
block: handle pi_tuple_size in queue_limits_stack_integrity
selftests: ublk: Use ARRAY_SIZE() macro to improve code
md: fix sync_action incorrect display during resync
md: add helper rdev_needs_recovery()
md: keep recovery_cp in mdp_superblock_s
md: add legacy_async_del_gendisk mode
Pull io_uring fixes from Jens Axboe:
"Just two small fixes - one that fixes inconsistent ->async_data vs
REQ_F_ASYNC_DATA handling in futex, and a followup that just ensures
that if other opcode handlers mess this up, it won't cause any issues"
* tag 'io_uring-6.17-20250822' of git://git.kernel.dk/linux:
io_uring: clear ->async_data as part of normal init
io_uring/futex: ensure io_futex_wait() cleans up properly on failure
Pull SCSI fixes from James Bottomley:
"All fixes in drivers. The largest diffstat in ufs is caused by the doc
update with the next being the qcom null pointer deref fix"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: ufs-qcom: Fix ESI null pointer dereference
scsi: ufs: core: Rename ufshcd_wait_for_doorbell_clr()
scsi: ufs: core: Fix the return value documentation
scsi: ufs: core: Remove WARN_ON_ONCE() call from ufshcd_uic_cmd_compl()
scsi: ufs: core: Fix IRQ lock inversion for the SCSI host lock
scsi: qla4xxx: Prevent a potential error pointer dereference
scsi: ufs: ufs-pci: Add support for Intel Wildcat Lake
scsi: fnic: Remove a useless struct mempool forward declaration
Pull MMC fixes from Ulf Hansson:
"MMC host:
- sdhci_am654: Disable HS400 for AM62P SR1.0 and SR1.1
- sdhci-of-arasan: Ensure CD logic stabilization before power-up
- sdhci-pci-gli: Mask the replay timer timeout of AER for GL9763e
MEMSTICK:
- Fix deadlock by moving removing flag earlier"
* tag 'mmc-v6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci_am654: Disable HS400 for AM62P SR1.0 and SR1.1
memstick: Fix deadlock by moving removing flag earlier
mmc: sdhci-of-arasan: Ensure CD logic stabilization before power-up
mmc: sdhci-pci-gli: GL9763e: Mask the replay timer timeout of AER
mmc: sdhci-pci-gli: GL9763e: Rename the gli_set_gl9763e() for consistency
mmc: sdhci-pci-gli: Add a new function to simplify the code
Pull rdma fixes from Jason Gunthorpe:
- syzkaller found a WARN_ON in rxe due to poor lifecycle management of
resources linked to skbs
- Missing error path handling in erdma qp creation
- Initialize the qp number for the GSI QP in erdma
- Mismatching of DIP, SCC and QP numbers in hns
- SRQ bug fixes in bnxt_re
- Memory leak and possibly uninited memory in bnxt_re
- Remove retired irdma maintainer
- Fix kfree() for kvalloc() in ODP
- Fix memory leak in hns
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/hns: Fix dip entries leak on devices newer than hip09
RDMA/core: Free pfn_list with appropriate kvfree call
MAINTAINERS: Remove bouncing irdma maintainer
RDMA/bnxt_re: Fix to initialize the PBL array
RDMA/bnxt_re: Fix a possible memory leak in the driver
RDMA/bnxt_re: Fix to remove workload check in SRQ limit path
RDMA/bnxt_re: Fix to do SRQ armena by default
RDMA/hns: Fix querying wrong SCC context for DIP algorithm
RDMA/erdma: Fix unset QPN of GSI QP
RDMA/erdma: Fix ignored return value of init_kernel_qp
RDMA/rxe: Flush delayed SKBs while releasing RXE resources
Pull sound fixes from Takashi Iwai:
"Only small fixes.
- ASoC Cirrus codec fixes
- A regression fix for the recent TAS2781 codec refactoring
- A fix for user-timer error handling
- Fixes for USB-audio descriptor validators
- Usual HD-audio and ASoC device-specific quirks"
* tag 'sound-6.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: usb-audio: Use correct sub-type for UAC3 feature unit validation
ALSA: timer: fix ida_free call while not allocated
ASoC: cs35l56: Remove SoundWire Clock Divider workaround for CS35L63
ASoC: cs35l56: Handle new algorithms IDs for CS35L63
ASoC: cs35l56: Update Firmware Addresses for CS35L63 for production silicon
ALSA: hda: tas2781: Fix wrong reference of tasdevice_priv
ALSA: hda/realtek: Audio disappears on HP 15-fc000 after warm boot again
ALSA: hda/realtek: Fix headset mic on ASUS Zenbook 14
ASoC: codecs: ES9389: Modify the standby configuration
ALSA: usb-audio: Fix size validation in convert_chmap_v3()
ALSA: hda/tas2781: Add name prefix tas2781 for tas2781's dvc_tlv and amp_vol_tlv
ALSA: hda/realtek: Add support for HP EliteBook x360 830 G6 and EliteBook 830 G6
Pull smb client fix from Steve French:
"Fix for netfs smb3 oops"
* tag '6.17-rc2-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Fix oops due to uninitialised variable
Pull NFS client fix from Trond Myklebust:
- NFS: Fix a data corrupting race when updating an existing write
* tag 'nfs-for-6.17-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: Fix a race when updating an existing write