Pull more VFIO updates from Alex Williamson:
- Fix ordering of dma-buf cleanup versus device disabling in vfio-pci
(Matt Evans)
- Resolve an inconsistent and incorrect use of spinlock-irq in the
virtio vfio-pci variant by conversion to mutex and proceed to
modernize and simplify driver with use of guards (Alex Williamson)
- Resurrect the removal of the remaining class_create() call in vfio,
replacing with const struct class and class_register() (Jori
Koolstra, Alex Williamson)
- Fix NULL pointer dereference, properly serialize interrupt setup, and
cleanup interrupt state tracking in the cdx vfio bus driver (Prasanna
Kumar T S M, Alex Williamson)
* tag 'vfio-v7.1-rc1-pt2' of https://github.com/awilliam/linux-vfio:
vfio/cdx: Consolidate MSI configured state onto cdx_irqs
vfio/cdx: Serialize VFIO_DEVICE_SET_IRQS with a per-device mutex
vfio/cdx: Fix NULL pointer dereference in interrupt trigger path
vfio: replace vfio->device_class with a const struct class
vfio/virtio: Use guard() for bar_mutex in legacy I/O
vfio/virtio: Use guard() for migf->lock where applicable
vfio/virtio: Use guard() for list_lock where applicable
vfio/virtio: Convert list_lock from spinlock to mutex
vfio/pci: Clean up DMABUFs before disabling function
Pull input updates from Dmitry Torokhov:
- a new charlieplex GPIO keypad driver
- an update to aw86927 driver to support 86938 chip
- an update for Chrome OS EC keyboard driver to support Fn-<key> keymap
extension
- an UAF fix in debugfs teardown in EDT touchscreen driver
- a number of conversions for input drivers to use guard() and __free()
cleanup primitives
- several drivers for bus mice (inport, logibm) and other very old
devices have been removed
- OLPC HGPK PS/2 protocol has been removed as it's been broken and
inactive for 10 something years
- dedicated kpsmoused has been removed from psmouse driver
- other assorted cleanups and fixups
* tag 'input-for-v7.1-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (101 commits)
Input: charlieplex_keypad - add GPIO charlieplex keypad
dt-bindings: input: add GPIO charlieplex keypad
dt-bindings: input: add settling-time-us common property
dt-bindings: input: add debounce-delay-ms common property
Input: imx_keypad - fix spelling mistake "Colums" -> "Columns"
Input: edt-ft5x06 - fix use-after-free in debugfs teardown
Input: ims-pcu - fix heap-buffer-overflow in ims_pcu_process_data()
Input: ct82c710 - remove driver
Input: mk712 - remove driver
Input: logibm - remove driver
Input: inport - remove driver
Input: qt1070 - inline i2c_check_functionality check
Input: qt1050 - inline i2c_check_functionality check
Input: aiptek - validate raw macro indices before updating state
Input: gf2k - skip invalid hat lookup values
Input: xpad - add RedOctane Games vendor id
Input: xpad - remove stale TODO and changelog header
Input: usbtouchscreen - refactor endpoint lookup
Input: aw86927 - add support for Awinic AW86938
dt-bindings: input: awinic,aw86927: Add Awinic AW86938
...
Pull tracefs fixes from Steven Rostedt:
- Use list_add_tail_rcu() for walking eventfs children
The linked list of children is protected by SRCU and list walkers can
walk the list with only using SRCU. Using just list_add_tail() on
weakly ordered architectures can cause issues. Instead use
list_add_tail_rcu().
- Hold eventfs_mutex and SRCU for remount walk events
The trace_apply_options() walks the tracefs_inodes where some are
eventfs inodes and eventfs_remount() is called which in turn calls
eventfs_set_attr(). This walk only holds normal RCU read locks, but
the eventfs_mutex and SRCU should be held.
Add a eventfs_remount_(un)lock() helpers to take the necessary locks
before iterating the list.
* tag 'tracefs-v7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
eventfs: Hold eventfs_mutex and SRCU when remount walks events
eventfs: Use list_add_tail_rcu() for SRCU-protected children list
Pull ktest updates from Steven Rostedt:
- Fix month in date timestamp used to create failure directories
On failure, a directory is created to store the logs and config file
to analyze the failure. The Perl function localtime is used to create
the data timestamp of the directory. The month passed back from that
function starts at 0 and not 1, but the timestamp used does not
account for that. Thus for April 20, 2026, the timestamp of 20260320
is used, instead of 20260420.
- Save the logfile to the failure directory
Just the test log was saved to the directory on failure, but there's
useful information in the full logfile that can be helpful to
analyzing the failure. Save the logfile as well.
* tag 'ktest-v7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest:
ktest: Add logfile to failure directory
ktest: Fix the month in the name of the failure directory
Pull ring-buffer fix from Steven Rostedt:
- Make undefsyms_base.c into a real file
The file undefsyms_base.c is used to catch any symbols used by a
remote ring buffer that is made for use of a pKVM hypervisor. As it
doesn't share the same text as the rest of the kernel, referencing
any symbols within the kernel will make it fail to be built for the
standalone hypervisor.
A file was created by the Makefile that checked for any symbols that
could cause issues. There's no reason to have this file created by
the Makefile, just create it as a normal file instead.
* tag 'trace-ring-buffer-v7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Make undefsyms_base.c a first-class citizen
Pull kgdb update from Daniel Thompson:
"Only a very small update for kgdb this cycle: a single patch from
Kexin Sun that fixes some outdated comments"
* tag 'kgdb-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux:
kgdb: update outdated references to kgdb_wait()
Pull tomoyo update from Tetsuo Handa:
"Handle 64-bit inode numbers"
* tag 'tomoyo-pr-20260422' of git://git.code.sf.net/p/tomoyo/tomoyo:
tomoyo: use u64 for holding inode->i_ino value
Pull s390 updates from Vasily Gorbik:
- Add support for CONFIG_PAGE_TABLE_CHECK and enable it in
debug_defconfig. s390 can only tell user from kernel PTEs via the mm,
so mm_struct is now passed into pxx_user_accessible_page() callbacks
- Expose the PCI function UID as an arch-specific slot attribute in
sysfs so a function can be identified by its user-defined id while
still in standby. Introduces a generic ARCH_PCI_SLOT_GROUPS hook in
drivers/pci/slot.c
- Refresh s390 PCI documentation to reflect current behavior and cover
previously undocumented sysfs attributes
- zcrypt device driver cleanup series: consistent field types, clearer
variable naming, a kernel-doc warning fix, and a comment explaining
the intentional synchronize_rcu() in pkey_handler_register()
- Provide an s390 arch_raw_cpu_ptr() that avoids the detour via
get_lowcore() using alternatives, shrinking defconfig by ~27 kB
- Guard identity-base randomization with kaslr_enabled() so nokaslr
keeps the identity mapping at 0 even with RANDOMIZE_IDENTITY_BASE=y
- Build S390_MODULES_SANITY_TEST as a module only by requiring KUNIT &&
m, since built-in would not exercise module loading
- Remove the permanently commented-out HMCDRV_DEV_CLASS create_class()
code in the hmcdrv driver
- Drop stale ident_map_size extern conflicting with asm/page.h
* tag 's390-7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/zcrypt: Fix warning about wrong kernel doc comment
PCI: s390: Expose the UID as an arch specific PCI slot attribute
docs: s390/pci: Improve and update PCI documentation
s390/pkey: Add comment about synchronize_rcu() to pkey base
s390/hmcdrv: Remove commented out code
s390/zcrypt: Slight rework on the agent_id field
s390/zcrypt: Explicitly use a card variable in _zcrypt_send_cprb
s390/zcrypt: Rework MKVP fields and handling
s390/zcrypt: Make apfs a real unsigned int field
s390/zcrypt: Rework domain processing within zcrypt device driver
s390/zcrypt: Move inline function rng_type6cprb_msgx from header to code
s390/percpu: Provide arch_raw_cpu_ptr()
s390: Enable page table check for debug_defconfig
s390/pgtable: Add s390 support for page table check
s390/pgtable: Use set_pmd_bit() to invalidate PMD entry
mm/page_table_check: Pass mm_struct to pxx_user_accessible_page()
s390/boot: Respect kaslr_enabled() for identity randomization
s390/Kconfig: Make modules sanity test a module-only option
s390/setup: Drop stale ident_map_size declaration
Pull Hyper-V updates from Wei Liu:
- Fix cross-compilation for hv tools (Aditya Garg)
- Fix vmemmap_shift exceeding MAX_FOLIO_ORDER in mshv_vtl (Naman Jain)
- Limit channel interrupt scan to relid high water mark (Michael
Kelley)
- Export hv_vmbus_exists() and use it in pci-hyperv (Dexuan Cui)
- Fix cleanup and shutdown issues for MSHV (Jork Loeser)
- Introduce more tracing support for MSHV (Stanislav Kinsburskii)
* tag 'hyperv-next-signed-20260421' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
x86/hyperv: Skip LP/VP creation on kexec
x86/hyperv: move stimer cleanup to hv_machine_shutdown()
Drivers: hv: vmbus: fix hyperv_cpuhp_online variable shadowing
mshv: Add tracepoint for GPA intercept handling
mshv_vtl: Fix vmemmap_shift exceeding MAX_FOLIO_ORDER
tools: hv: Fix cross-compilation
Drivers: hv: vmbus: Export hv_vmbus_exists() and use it in pci-hyperv
mshv: Introduce tracing support
Drivers: hv: vmbus: Limit channel interrupt scan to relid high water mark
After a kexec the logical processors and virtual processors already
exist in the hypervisor because they were created by the previous
kernel. Attempting to add them again causes either a BUG_ON or
corrupted VP state leading to MCEs in the new kernel.
Add hv_lp_exists() to probe whether an LP is already present by
calling HVCALL_GET_LOGICAL_PROCESSOR_RUN_TIME. When it succeeds the
LP exists and we skip the add-LP and create-VP loops entirely.
Also add hv_call_notify_all_processors_started() which informs the
hypervisor that all processors are online. This is required after
adding LPs (fresh boot) and is a no-op on kexec since we skip that
path.
Co-developed-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com>
Signed-off-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com>
Co-developed-by: Stanislav Kinsburskii <stanislav.kinsburskii@gmail.com>
Signed-off-by: Stanislav Kinsburskii <stanislav.kinsburskii@gmail.com>
Co-developed-by: Mukesh Rathor <mrathor@linux.microsoft.com>
Signed-off-by: Mukesh Rathor <mrathor@linux.microsoft.com>
Signed-off-by: Jork Loeser <jloeser@linux.microsoft.com>
Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Move hv_stimer_global_cleanup() from vmbus's hv_kexec_handler() to
hv_machine_shutdown() in the platform code. This ensures stimer cleanup
happens before the vmbus unload, which is required for root partition
kexec to work correctly.
Co-developed-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com>
Signed-off-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com>
Signed-off-by: Jork Loeser <jloeser@linux.microsoft.com>
Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
vmbus_alloc_synic_and_connect() declares a local 'int
hyperv_cpuhp_online' that shadows the file-scope global of the same
name. The cpuhp state returned by cpuhp_setup_state() is stored in
the local, leaving the global at 0 (CPUHP_OFFLINE). When
hv_kexec_handler() or hv_machine_shutdown() later call
cpuhp_remove_state(hyperv_cpuhp_online) they pass 0, which hits the
BUG_ON in __cpuhp_remove_state_cpuslocked().
Remove the local declaration so the cpuhp state is stored in the
file-scope global where hv_kexec_handler() and hv_machine_shutdown()
expect it.
Fixes: 2647c96649 ("Drivers: hv: Support establishing the confidential VMBus connection")
Signed-off-by: Jork Loeser <jloeser@linux.microsoft.com>
Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Provide visibility into GPA intercept operations for debugging and
performance analysis of Microsoft Hypervisor guest memory management.
Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Pull probes fixes from Masami Hiramatsu:
"fprobe bug fixes:
- Prevent re-registration
Add an earlier check to reject re-registering an already active
fprobe before its state is modified during the initialization phase
- Robustness in failure paths:
- Ensure fprobes are correctly removed from all internal tables
and properly RCU-freed during registration failure
- Make unregister_fprobe() proceed with unregistration even if
temporary memory allocation fails
- RCU safety in module unloading
Avoid a potential "sleep in RCU" warning by removing a kcalloc()
call in the module notifier path. This also tries to remove
fprobe_hash_node even if memory allocation fails.
- Type-aware unregistration
Fix a bug where unregistering an fprobe did not account for
different types (entry-only vs entry-exit) at the same address,
which previously left "junk" entries in the underlying
ftrace/fgraph ops
- Unregistration of empty ftrace_ops
Avoid unneeded performance overhead due to making registered
ftrace_ops empty - which means 'trace all functions'. This counts
remaining entries and unregister ftrace_ops when it becomes empty.
Two new selftests to check above fixes:
- Module Unloading Test:
Specifically verifies that fprobe events on a module are correctly
cleaned up and do not trigger 'trace-all' behavior when the module
is removed.
- Multiple Fprobe Events Test:
Ensure that having multiple fprobes on the same function correctly
manages the ftrace hash map during removal"
* tag 'probes-v7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
selftests/ftrace: Add a testcase for multiple fprobe events
selftests/ftrace: Add a testcase for fprobe events on module
tracing/fprobe: Fix to unregister ftrace_ops if it is empty on module unloading
tracing/fprobe: Check the same type fprobe on table as the unregistered one
tracing/fprobe: Avoid kcalloc() in rcu_read_lock section
tracing/fprobe: Remove fprobe from hash in failure path
tracing/fprobe: Unregister fprobe even if memory allocation fails
tracing/fprobe: Reject registration of a registered fprobe before init
Pull more drm updates from Dave Airlie:
"This is a followup which is mostly next material with some fixes.
Alex pointed out I missed one of his AMD MRs from last week, so I
added that, then Jani sent the pipe reordering stuff, otherwise it's
just some minor i915 fixes and a dma-buf fix.
drm:
- Add support for AMD VSDB parsing to drm_edid
dma-buf:
- fix documentation formatting
i915:
- add support for reordered pipes to support joined pipes better
- Fix VESA backlight possible check condition
- Verify the correct plane DDB entry
amdgpu:
- Audio regression fix
- Use drm edid parser for AMD VSDB
- Misc cleanups
- VCE cs parse fixes
- VCN cs parse fixes
- RAS fixes
- Clean up and unify vram reservation handling
- GPU Partition updates
- system_wq cleanups
- Add CONFIG_GCOV_PROFILE_AMDGPU kconfig option
- SMU vram copy updates
- SMU 13/14/15 fixes
- UserQ fixes
- Replace pasid idr with an xarray
- Dither handling fix
- Enable amdgpu by default for CIK APUs
- Add IBs to devcoredump
amdkfd:
- system_wq cleanups
radeon:
- system_wq cleanups"
* tag 'drm-next-2026-04-22' of https://gitlab.freedesktop.org/drm/kernel: (62 commits)
drm/i915/display: change pipe allocation order for discrete platforms
drm/i915/wm: Verify the correct plane DDB entry
drm/i915/backlight: Fix VESA backlight possible check condition
drm/i915: Walk crtcs in pipe order
drm/i915/joiner: Make joiner "nomodeset" state copy independent of pipe order
dma-buf: fix htmldocs error for dma_buf_attach_revocable
drm/amdgpu: dump job ibs in the devcoredump
drm/amdgpu: store ib info for devcoredump
drm/amdgpu: extract amdgpu_vm_lock_by_pasid from amdgpu_vm_handle_fault
drm/amdgpu: Use amdgpu by default for CIK APUs too
drm/amd/display: Remove unused NUM_ELEMENTS macros
drm/amd/display: Replace inline NUM_ELEMENTS macro with ARRAY_SIZE
drm/amdgpu: save ring content before resetting the device
drm/amdgpu: make userq fence_drv drop explicit in queue destroy
drm/amdgpu: rework userq fence driver alloc/destroy
drm/amdgpu/userq: use dma_fence_wait_timeout without test for signalled
drm/amdgpu/userq: call dma_resv_wait_timeout without test for signalled
drm/amdgpu/userq: add the return code too in error condition
drm/amdgpu/userq: fence wait for max time in amdgpu_userq_wait_for_signal
drm/amd/display: Change dither policy for 10 bpc output back to dithering
...
Fix fprobe to unregister ftrace_ops if corresponding type of fprobe
does not exist on the fprobe_ip_table and it is expected to be empty
when unloading modules.
Since ftrace thinks that the empty hash means everything to be traced,
if we set fprobes only on the unloaded module, all functions are traced
unexpectedly after unloading module.
e.g.
# modprobe xt_LOG.ko
# echo 'f:test log_tg*' > dynamic_events
# echo 1 > events/fprobes/test/enable
# cat enabled_functions
log_tg [xt_LOG] (1) tramp: 0xffffffffa0004000 (fprobe_ftrace_entry+0x0/0x490) ->fprobe_ftrace_entry+0x0/0x490
log_tg_check [xt_LOG] (1) tramp: 0xffffffffa0004000 (fprobe_ftrace_entry+0x0/0x490) ->fprobe_ftrace_entry+0x0/0x490
log_tg_destroy [xt_LOG] (1) tramp: 0xffffffffa0004000 (fprobe_ftrace_entry+0x0/0x490) ->fprobe_ftrace_entry+0x0/0x490
# rmmod xt_LOG
# wc -l enabled_functions
34085 enabled_functions
Link: https://lore.kernel.org/all/177669368776.132053.10042301916765771279.stgit@mhiramat.tok.corp.google.com/
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Pull f2fs updates from Jaegeuk Kim:
"In this round, the changes primarily focus on resolving race
conditions, memory safety issues (UAF), and improving the robustness
of garbage collection (GC), and folio management.
Enhancements:
- add page-order information for large folio reads in iostat
- add defrag_blocks sysfs node
Bug fixes:
- fix uninitialized kobject put in f2fs_init_sysfs()
- disallow setting an extension to both cold and hot
- fix node_cnt race between extent node destroy and writeback
- preserve previous reserve_{blocks,node} value when remount
- freeze GC and discard threads quickly
- fix false alarm of lockdep on cp_global_sem lock
- fix data loss caused by incorrect use of nat_entry flag
- skip empty sections in f2fs_get_victim
- fix inline data not being written to disk in writeback path
- fix fsck inconsistency caused by FGGC of node block
- fix fsck inconsistency caused by incorrect nat_entry flag usage
- call f2fs_handle_critical_error() to set cp_error flag
- fix fiemap boundary handling when read extent cache is incomplete
- fix use-after-free of sbi in f2fs_compress_write_end_io()
- fix UAF caused by decrementing sbi->nr_pages[] in f2fs_write_end_io()
- fix incorrect file address mapping when inline inode is unwritten
- fix incomplete search range in f2fs_get_victim when f2fs_need_rand_seg is enabled
- avoid memory leak in f2fs_rename()"
* tag 'f2fs-for-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (35 commits)
f2fs: add page-order information for large folio reads in iostat
f2fs: do not support mmap write for large folio
f2fs: fix uninitialized kobject put in f2fs_init_sysfs()
f2fs: protect extension_list reading with sb_lock in f2fs_sbi_show()
f2fs: disallow setting an extension to both cold and hot
f2fs: fix node_cnt race between extent node destroy and writeback
f2fs: allow empty mount string for Opt_usr|grp|projjquota
f2fs: fix to preserve previous reserve_{blocks,node} value when remount
f2fs: invalidate block device page cache on umount
f2fs: fix to freeze GC and discard threads quickly
f2fs: fix to avoid uninit-value access in f2fs_sanity_check_node_footer
f2fs: fix false alarm of lockdep on cp_global_sem lock
f2fs: fix data loss caused by incorrect use of nat_entry flag
f2fs: fix to skip empty sections in f2fs_get_victim
f2fs: fix inline data not being written to disk in writeback path
f2fs: fix fsck inconsistency caused by FGGC of node block
f2fs: fix fsck inconsistency caused by incorrect nat_entry flag usage
f2fs: fix to do sanity check on dcc->discard_cmd_cnt conditionally
f2fs: refactor node footer flag setting related code
f2fs: refactor f2fs_move_node_folio function
...
Pull dax updates from Ira Weiny:
"The series adds DAX support required for the upcoming fuse/famfs file
system.[1] The support here is required because famfs is backed by
devdax rather than pmem. This all lays the groundwork for using shared
memory as a file system"
Link: https://lore.kernel.org/all/0100019d43e5f632-f5862a3e-361c-4b54-a9a6-96c242a8f17a-000000@email.amazonses.com/ [1]
* tag 'libnvdimm-for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
dax/fsdev: fix uninitialized kaddr in fsdev_dax_zero_page_range()
dax: export dax_dev_get()
dax: Add fs_dax_get() func to prepare dax for fs-dax usage
dax: Add dax_set_ops() for setting dax_operations at bind time
dax: Add dax_operations for use by fs-dax on fsdev dax
dax: Save the kva from memremap
dax: add fsdev.c driver for fs-dax on character dax
dax: Factor out dax_folio_reset_order() helper
dax: move dax_pgoff_to_phys from [drivers/dax/] device.c to bus.c
Pull coda dcache updates from Al Viro:
"Coda dcache-related cleanups and fixes"
* tag 'pull-coda' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
coda_flag_children(): fix a UAF
sanitize coda_dentry_delete()
coda: is_bad_inode() is always false there
Pull more crypto library updates from Eric Biggers:
"Crypto library fix and documentation update:
- Fix an integer underflow in the mpi library
- Improve the crypto library documentation"
* tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
lib/crypto: docs: Add rst documentation to Documentation/crypto/
docs: kdoc: Expand 'at_least' when creating parameter list
lib/crypto: mpi: Fix integer underflow in mpi_read_raw_from_sgl()
Pull erofs fixes from Gao Xiang:
- Fix dirent nameoff handling to avoid out-of-bound reads
out of crafted images
- Fix two type truncation issues on 32-bit platforms
* tag 'erofs-for-7.1-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: unify lcn as u64 for 32-bit platforms
erofs: fix offset truncation when shifting pgoff on 32-bit platforms
erofs: fix the out-of-bounds nameoff handling for trailing dirents
struct vfio_cdx_device carries three fields that track whether MSI has
been configured: vdev->cdx_irqs (the allocated vector array), vdev->
msi_count (the array length), and vdev->config_msi (a boolean flag).
The three are set together when vfio_cdx_msi_enable() succeeds and
cleared together by vfio_cdx_msi_disable(). However, the error paths
in vfio_cdx_msi_enable() free the cdx_irqs allocation on failure
without resetting the pointer, leaving it stale and skewed from the
other two fields until the next enable call overwrites it.
Clear vdev->cdx_irqs to NULL alongside the kfree() in both error paths
so the pointer consistently reflects the configured state. With that
invariant restored and access to the MSI state serialized by
cdx_irqs_lock, vdev->config_msi is fully redundant with
(vdev->cdx_irqs != NULL). Drop the config_msi field and switch all
readers to test cdx_irqs directly.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Alex Williamson <alex.williamson@nvidia.com>
Acked-by: Nikhil Agarwal <nikhil.agarwal@amd.com>
Link: https://lore.kernel.org/r/20260417202800.88287-4-alex.williamson@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
vfio_cdx_set_msi_trigger() reads vdev->config_msi and operates on the
vdev->cdx_irqs array based on its value, but provides no serialization
against concurrent VFIO_DEVICE_SET_IRQS ioctls. Two callers can race
such that one observes config_msi as set while another clears it and
frees cdx_irqs via vfio_cdx_msi_disable(), resulting in a use-after-free
of the cdx_irqs array.
Add a cdx_irqs_lock mutex to struct vfio_cdx_device and acquire it in
vfio_cdx_set_msi_trigger(), which is the single chokepoint through
which all updates to config_msi, cdx_irqs, and msi_count flow, covering
both the ioctl path and the close-device cleanup path. This keeps the
test of config_msi atomic with the subsequent enable, disable, or
trigger operations.
Drop the pre-call !cdx_irqs test from vfio_cdx_irqs_cleanup() as part
of this change: the optimization it provided is redundant with the
!config_msi early-return inside vfio_cdx_msi_disable(), and leaving the
test in place would be an unsynchronized read of state the new lock is
meant to protect.
Fixes: 848e447e00 ("vfio/cdx: add interrupt support")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Alex Williamson <alex.williamson@nvidia.com>
Acked-by: Nikhil Agarwal <nikhil.agarwal@amd.com>
Link: https://lore.kernel.org/r/20260417202800.88287-3-alex.williamson@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
Add validation to ensure MSI is configured before accessing cdx_irqs
array in vfio_cdx_set_msi_trigger(). Without this check, userspace
can trigger a NULL pointer dereference by calling VFIO_DEVICE_SET_IRQS
with VFIO_IRQ_SET_DATA_BOOL or VFIO_IRQ_SET_DATA_NONE flags before
ever setting up interrupts via VFIO_IRQ_SET_DATA_EVENTFD.
The vfio_cdx_msi_enable() function allocates the cdx_irqs array and
sets config_msi to 1 only when called through the EVENTFD path. The
trigger loop (for DATA_BOOL/DATA_NONE) assumed this had already been
done, but there was no enforcement of this call ordering.
This matches the protection used in the PCI VFIO driver where
vfio_pci_set_msi_trigger() checks irq_is() before the trigger loop.
Fixes: 848e447e00 ("vfio/cdx: add interrupt support")
Cc: stable@vger.kernel.org
Signed-off-by: Prasanna Kumar T S M <ptsm@linux.microsoft.com>
Acked-by: Nipun Gupta <nipun.gupta@amd.com>
Signed-off-by: Alex Williamson <alex.williamson@nvidia.com>
Acked-by: Nikhil Agarwal <nikhil.agarwal@amd.com>
Link: https://lore.kernel.org/r/20260417202800.88287-2-alex.williamson@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
Convert migf->lock acquisitions in virtiovf_disable_fd() and
virtiovf_save_read() to use guard(). In virtiovf_save_read() this
eliminates the out_unlock label and multiple goto paths by allowing
direct returns, and removes the need for the done variable to double
as an error carrier.
Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Alex Williamson <alex.williamson@nvidia.com>
Reviewed-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://lore.kernel.org/r/20260414200625.3601509-4-alex.williamson@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
The list_lock spinlock with IRQ disabling was copied from the mlx5
vfio-pci variant driver, where it is justified by a hardirq async
command completion callback that accesses the protected lists. The
virtio driver has no such interrupt context usage; all list_lock
acquisitions occur in process context via file read/write operations
or state transitions under state_mutex.
Convert list_lock to a mutex to be consistent with peer vfio-pci
variant drivers (hisilicon, pds, qat, xe) which all use mutexes for
equivalent migration data protection. This also fixes a mismatched
spin_lock()/spin_unlock_irq() pair in virtiovf_read_device_context_chunk()
that could incorrectly enable interrupts.
Reported-by: Jinhui Guo <guojinhui.liam@bytedance.com>
Closes: https://lore.kernel.org/all/20260413073603.30538-1-guojinhui.liam@bytedance.com
Fixes: 0bbc82e4ec ("vfio/virtio: Add support for the basic live migration functionality")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Alex Williamson <alex.williamson@nvidia.com>
Reviewed-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://lore.kernel.org/r/20260414200625.3601509-2-alex.williamson@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
On device shutdown, make vfio_pci_core_close_device() call
vfio_pci_dma_buf_cleanup() before the function is disabled via
vfio_pci_core_disable(). This ensures that all access via DMABUFs is
revoked before the function's BARs become inaccessible.
This fixes an issue where, if the function is disabled first, a tiny
window exists in which the function's MSE is cleared and yet BARs
could still be accessed via the DMABUF. The resources would also be
freed and up for grabs by a different driver.
Fixes: 5d74781ebc ("vfio/pci: Add dma-buf export support for MMIO regions")
Signed-off-by: Matt Evans <mattev@meta.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20260415181752.1027604-1-mattev@meta.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
The function kgdb_wait() was folded into the static function
kgdb_cpu_enter() by commit 62fae31219 ("kgdb: eliminate
kgdb_wait(), all cpus enter the same way"). Update the four stale
references accordingly:
- include/linux/kgdb.h and arch/x86/kernel/kgdb.c: the
kgdb_roundup_cpus() kdoc describes what other CPUs are rounded up
to call. Because kgdb_cpu_enter() is static, the correct public
entry point is kgdb_handle_exception(); also fix a pre-existing
grammar error ("get them be" -> "get them into") and reflow the
text.
- kernel/debug/debug_core.c: replace with the generic description
"the debug trap handler", since the actual entry path is
architecture-specific.
- kernel/debug/gdbstub.c: kgdb_cpu_enter() is correct here (it
describes internal state, not a call target); add the missing
parentheses.
Suggested-by: Daniel Thompson <daniel@riscstar.com>
Assisted-by: unnamed:deepseek-v3.2 coccinelle
Signed-off-by: Kexin Sun <kexinsun@smail.nju.edu.cn>
Pull clk updates from Stephen Boyd:
"We've finally gotten rid of the struct clk_ops::round_rate() code
after months of effort from Brian Masney. Now the only option is to
use determine_rate(), which is good because that takes a struct
argument instead of just a couple unsigned longs, allowing us to
easily modify the way we determine and set rates in the clk tree.
Beyond that core framework change we've got the typical pile of new
SoC clk driver additions, fixes for clk data and/or adding missing
clks because the consumer driver using those clks wasn't ready, etc.
The usual suspects are all here: Qualcomm, Samsung, Mediatek, and
Rockchip along with some newcomers making RISC-V SoCs like ESWIN's
eic700 and Tenstorrent's Atlantis. The clk driver side of this looks
pretty normal.
Core:
- Remove the round_rate() clk op (yay!)
New Drivers:
- ESWIN eic700 SoC clk support
- Econet EN751221 SoC clock/reset support
- Global TCSR, RPMh, and display clock controller support for the
Qualcomm Eliza platform
- TCSR, the multiple global, and the RPMh clock controller support
for the Qualcomm Nord platform
- GPU clock controller support for Qualcomm SM8750
- Video and GPU clock controller support for Qualcomm Glymur
- Global clock controller support for Qualcomm IPQ5210
- Axis ARTPEC-9: Add new PLL clocks and new drivers for eight clock
controllers on the SoC
- ExynosAutov920: Add G3D (GPU) clock controller
- Clock driver for the Rockchip RV1103B SoC
- Initial support for the Renesas RZ/G3L (R9A08G046) SoC
- Clock and reset controllers (e.g. PRCM) in the Tenstorrent Atlantis SoC"
* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (132 commits)
clk: visconti: pll: initialize clk_init_data to zero
clk: fsl-sai: Add MCLK generation support
clk: fsl-sai: Extract clock setup into fsl_sai_clk_register()
dt-bindings: clock: fsl-sai: Document clock-cells = <1> support
clk: fsl-sai: Add i.MX8M support with 8 byte register offset
clk: fsl-sai: Sort the headers
dt-bindings: clock: fsl-sai: Document i.MX8M support
clk: qcom: gcc: Add multiple global clock controller driver for Nord SoC
clk: qcom: rpmh: Add support for Nord rpmh clocks
clk: qcom: Add TCSR clock driver for Nord SoC
dt-bindings: clock: qcom: Add Nord Global Clock Controller
dt-bindings: clock: qcom-rpmhcc: Add support for Nord SoCs
dt-bindings: clock: qcom: Document the Nord SoC TCSR Clock Controller
clk: qcom: gcc-x1e80100: Keep GCC USB QTB clock always ON
clk: qcom: Constify list of critical CBCR registers
clk: qcom: Constify qcom_cc_driver_data
clk: qcom: videocc-glymur: Constify qcom_cc_desc
clk: qcom: Add a driver for SM8750 GPU clocks
dt-bindings: clock: qcom: Add SM8750 GPU clocks
clk: qcom: ipq-cmn-pll: Add IPQ8074 SoC support
...
Pull SCSI updates from James Bottomley:
"Usual driver updates (ufs, lpfc, fnic, target, mpi3mr).
The substantive core changes are adding a 'serial' sysfs attribute and
getting sd to support > PAGE_SIZE sectors"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (98 commits)
scsi: target: Don't validate ignored fields in PROUT PREEMPT
scsi: qla2xxx: Use nr_cpu_ids instead of NR_CPUS for qp_cpu_map allocation
scsi: ufs: core: Disable timestamp for Kioxia THGJFJT0E25BAIP
scsi: mpi3mr: Fix typo
scsi: sd: fix missing put_disk() when device_add(&disk_dev) fails
scsi: libsas: Delete unused to_dom_device() and to_dev_attr()
scsi: storvsc: Handle PERSISTENT_RESERVE_IN truncation for Hyper-V vFC
scsi: iscsi_tcp: Remove unneeded selections of CRYPTO and CRYPTO_MD5
scsi: lpfc: Update lpfc version to 15.0.0.0
scsi: lpfc: Add PCI ID support for LPe42100 series adapters
scsi: lpfc: Introduce 128G link speed selection and support
scsi: lpfc: Check ASIC_ID register to aid diagnostics during failed fw updates
scsi: lpfc: Update construction of SGL when XPSGL is enabled
scsi: lpfc: Remove deprecated PBDE feature
scsi: lpfc: Add REG_VFI mailbox cmd error handling
scsi: lpfc: Log MCQE contents for mbox commands with no context
scsi: lpfc: Select mailbox rq_create cmd version based on SLI4 if_type
scsi: lpfc: Break out of IRQ affinity assignment when mask reaches nr_cpu_ids
scsi: ufs: core: Make the header files self-contained
scsi: ufs: core: Remove an include directive from ufshcd-crypto.h
...
Commit 2c67dc457b ("tracing: fprobe: optimization for entry only case")
introduced a different ftrace_ops for entry-only fprobes.
However, when unregistering an fprobe, the kernel only checks if another
fprobe exists at the same address, without checking which type of fprobe
it is.
If different fprobes are registered at the same address, the same address
will be registered in both fgraph_ops and ftrace_ops, but only one of
them will be deleted when unregistering. (the one removed first will not
be deleted from the ops).
This results in junk entries remaining in either fgraph_ops or ftrace_ops.
For example:
=======
cd /sys/kernel/tracing
# 'Add entry and exit events on the same place'
echo 'f:event1 vfs_read' >> dynamic_events
echo 'f:event2 vfs_read%return' >> dynamic_events
# 'Enable both of them'
echo 1 > events/fprobes/enable
cat enabled_functions
vfs_read (2) ->arch_ftrace_ops_list_func+0x0/0x210
# 'Disable and remove exit event'
echo 0 > events/fprobes/event2/enable
echo -:event2 >> dynamic_events
# 'Disable and remove all events'
echo 0 > events/fprobes/enable
echo > dynamic_events
# 'Add another event'
echo 'f:event3 vfs_open%return' > dynamic_events
cat dynamic_events
f:fprobes/event3 vfs_open%return
echo 1 > events/fprobes/enable
cat enabled_functions
vfs_open (1) tramp: 0xffffffffa0001000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60 subops: {ent:fprobe_fgraph_entry+0x0/0x620 ret:fprobe_return+0x0/0x150}
vfs_read (1) tramp: 0xffffffffa0001000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60 subops: {ent:fprobe_fgraph_entry+0x0/0x620 ret:fprobe_return+0x0/0x150}
=======
As you can see, an entry for the vfs_read remains.
To fix this issue, when unregistering, the kernel should also check if
there is the same type of fprobes still exist at the same address, and
if not, delete its entry from either fgraph_ops or ftrace_ops.
Link: https://lore.kernel.org/all/177669367993.132053.10553046138528674802.stgit@mhiramat.tok.corp.google.com/
Fixes: 2c67dc457b ("tracing: fprobe: optimization for entry only case")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
When register_fprobe_ips() fails, it tries to remove a list of
fprobe_hash_node from fprobe_ip_table, but it missed to remove
fprobe itself from fprobe_table. Moreover, when removing
the fprobe_hash_node which is added to rhltable once, it must
use kfree_rcu() after removing from rhltable.
To fix these issues, this reuses unregister_fprobe() internal
code to rollback the half-way registered fprobe.
Link: https://lore.kernel.org/all/177669366417.132053.17874946321744910456.stgit@mhiramat.tok.corp.google.com/
Fixes: 4346ba1604 ("fprobe: Rewrite fprobe on function-graph tracer")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
unregister_fprobe() can fail under memory pressure because of memory
allocation failure, but this maybe called from module unloading, and
usually there is no way to retry it. Moreover. trace_fprobe does not
check the return value.
To fix this problem, unregister fprobe and fprobe_hash_node even if
working memory allocation fails.
Anyway, if the last fprobe is removed, the filter will be freed.
Link: https://lore.kernel.org/all/177669365629.132053.8433032896213721288.stgit@mhiramat.tok.corp.google.com/
Fixes: 4346ba1604 ("fprobe: Rewrite fprobe on function-graph tracer")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Pull dcache busy loop updates from Al Viro:
"Fix livelocks in shrink_dcache_tree()
If shrink_dcache_tree() finds a dentry in the middle of being killed
by another thread, it has to wait until the victim finishes dying,
gets detached from the tree and ceases to pin its parent.
The way we used to deal with that amounted to busy-wait;
unfortunately, it's not just inefficient but can lead to reliably
reproducible hard livelocks.
Solved by having shrink_dentry_tree() attach a completion to such
dentry, with dentry_unlist() calling complete() on all objects
attached to it. With a bit of care it can be done without growing
struct dentry or adding overhead in normal case"
* tag 'pull-dcache-busy-wait' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
get rid of busy-waiting in shrink_dcache_tree()
dcache.c: more idiomatic "positives are not allowed" sanity checks
struct dentry: make ->d_u anonymous
for_each_alias(): helper macro for iterating through dentries of given inode
As sashiko reported [1], `lcn` was typed as `unsigned long` (or
`unsigned int` sometimes), which is only 32 bits wide on 32-bit
platforms, which causes `(lcn << lclusterbits)` to be truncated
at 4 GiB.
In order to consolidate the logic, just use `u64` consistently
around the codebase.
[1] https://sashiko.dev/r/20260420034612.1899973-1-hsiangkao%40linux.alibaba.com
Fixes: 152a333a58 ("staging: erofs: add compacted compression indexes support")
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
On 32-bit platforms, pgoff_t is 32 bits wide, so left-shifting
large arbitrary pgoff_t values by PAGE_SHIFT performs 32-bit arithmetic
and silently truncates the result for pages beyond the 4 GiB boundary.
Cast the page index to loff_t before shifting to produce a correct
64-bit byte offset.
Fixes: 386292919c ("erofs: introduce readmore decompression strategy")
Fixes: 307210c262 ("erofs: verify metadata accesses for file-backed mounts")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
This odd file was added to automatically figure out tool-generated
symbols.
Honestly, it *should* have been just a real honest-to-goodness regular
file in git, instead of having strange code to generate it in the
Makefile, but that is not how that silly thing works. So now we need to
ignore it explicitly.
Fixes: 1211907ac0 ("tracing: Generate undef symbols allowlist for simple_ring_buffer")
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Marc Zyngier <maz@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull kselftest fixes from Shuah Khan:
"Fix regressions in non-bash shells and busybox support, and revert a
commit that regressed in build and installation when one or more tests
fail to build.
Fix duplicated test number reporting introduced in ktap support patch"
* tag 'linux_kselftest-next-7.1-next-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: Fix duplicated test number reporting
selftests: Fix runner.sh for non-bash shells
selftests: Fix runner.sh busybox support
selftests: Deescalate error reporting
Pull more arm64 updates from Catalin Marinas:
"The main 'feature' is a workaround for C1-Pro erratum 4193714
requiring IPIs during TLB maintenance if a process is running in user
space with SME enabled.
The hardware acknowledges the DVMSync messages before completing
in-flight SME accesses, with security implications. The workaround
makes use of the mm_cpumask() to track the cores that need
interrupting (arm64 hasn't used this mask before).
The rest are fixes for MPAM, CCA and generated header that turned up
during the merging window or shortly before.
Summary:
Core features:
- Add workaround for C1-Pro erratum 4193714 - early CME (SME unit)
DVMSync acknowledgement. The fix consists of sending IPIs on TLB
maintenance to those CPUs running in user space with SME enabled
- Include kernel-hwcap.h in list of generated files (missed in a
recent commit generating the KERNEL_HWCAP_* macros)
CCA:
- Fix RSI_INCOMPLETE error check in arm-cca-guest
MPAM:
- Fix an unmount->remount problem with the CDP emulation,
uninitialised variable and checker warnings"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm_mpam: resctrl: Make resctrl_mon_ctx_waiters static
arm_mpam: resctrl: Fix the check for no monitor components found
arm_mpam: resctrl: Fix MBA CDP alloc_capable handling on unmount
virt: arm-cca-guest: fix error check for RSI_INCOMPLETE
arm64/hwcap: Include kernel-hwcap.h in list of generated files
arm64: errata: Work around early CME DVMSync acknowledgement
arm64: cputype: Add C1-Pro definitions
arm64: tlb: Pass the corresponding mm to __tlbi_sync_s1ish()
arm64: tlb: Introduce __tlbi_sync_s1ish_{kernel,batch}() for TLB maintenance
Pull sh updates from John Paul Adrian Glaubitz:
"Two patches from Thomas Zimmermann, one by Tim Bird and one by Thomas
Weißschuh.
The first patch by Thomas Zimmermann adds a missing include in dac.h
for SH-3 which became necessary after 243ce64b2b ("backlight: Do not
include <linux/fb.h> in header file") which made __raw_readb() and
__raw_writeb() inaccessible in dac.h.
Thomas' second patch drops CONFIG_FIRMWARE_EDID for SH as it depends
on X86 or EFI_GENERIC_STUB which are not defined on SH for obvious
reasons.
The patch by Tim Bird fixes just a small typo in two SPDX ID lines
which he stumbled over by accident.
And, least but not last, the patch by Thomas Weißschuh removes the
CONFIG_VSYSCALL reference from UAPI. This was necessary as the
definition of AT_SYSINFO_EHDR was gated between CONFIG_VSYSCALL to
avoid a default gate VMA to be created. However that default gate VMA
was removed entirely in commit a6c19dfe39 (arm64,ia64,ppc,s390,
sh,tile,um,x86,mm: remove default gate area)"
* tag 'sh-for-v7.1-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux:
sh: Drop CONFIG_FIRMWARE_EDID from defconfig files
sh: Remove CONFIG_VSYSCALL reference from UAPI
sh: Fix typo in SPDX license ID lines
sh: Include <linux/io.h> in dac.h