Pull driver core fixes from Danilo Krummrich:
- Fix a race condition in Devres::drop(). This depends on two other
patches:
- (Minimal) Rust abstractions for struct completion
- Let Revocable indicate whether its data is already being revoked
- Fix Devres to avoid exposing the internal Revocable
- Add .mailmap entry for Danilo Krummrich
- Add Madhavan Srinivasan to embargoed-hardware-issues.rst
* tag 'driver-core-6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core:
Documentation: embargoed-hardware-issues.rst: Add myself for Power
mailmap: add entry for Danilo Krummrich
rust: devres: do not dereference to the internal Revocable
rust: devres: fix race in Devres::drop()
rust: revocable: indicate whether `data` has been revoked already
rust: completion: implement initial abstraction
Pull cgroup fix from Tejun Heo:
- In cgroup1 freezer, a task migrating into a frozen cgroup might not
get frozen immediately due to the wrong operation order. Fix it.
* tag 'cgroup-for-6.16-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup,freezer: fix incomplete freezing when attaching tasks
Pull workqueue fix from Tejun Heo:
- Fix missed early init of wq_isolated_cpumask
* tag 'wq-for-6.16-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Initialize wq_isolated_cpumask in workqueue_init_early()
Pull sched_ext fixes from Tejun Heo:
- Fix a couple bugs in cgroup cpu.weight support
- Add the new sched-ext@lists.linux.dev to MAINTAINERS
* tag 'sched_ext-for-6.16-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
sched_ext, sched/core: Don't call scx_group_set_weight() prematurely from sched_create_group()
sched_ext: Make scx_group_set_weight() always update tg->scx.weight
sched_ext: Update mailing list entry in MAINTAINERS
Pull crypto library fixes from Eric Biggers:
- Fix a regression in the arm64 Poly1305 code
- Fix a couple compiler warnings
* tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
lib/crypto/poly1305: Fix arm64's poly1305_blocks_arch()
lib/crypto/curve25519-hacl64: Disable KASAN with clang-17 and older
lib/crypto: Annotate crypto strings with nonstring
An issue was found:
# cd /sys/fs/cgroup/freezer/
# mkdir test
# echo FROZEN > test/freezer.state
# cat test/freezer.state
FROZEN
# sleep 1000 &
[1] 863
# echo 863 > test/cgroup.procs
# cat test/freezer.state
FREEZING
When tasks are migrated to a frozen cgroup, the freezer fails to
immediately freeze the tasks, causing the cgroup to remain in the
"FREEZING".
The freeze_task() function is called before clearing the CGROUP_FROZEN
flag. This causes the freezing() check to incorrectly return false,
preventing __freeze_task() from being invoked for the migrated task.
To fix this issue, clear the CGROUP_FROZEN state before calling
freeze_task().
Fixes: f5d39b0208 ("freezer,sched: Rewrite core freezer logic")
Cc: stable@vger.kernel.org # v6.1+
Reported-by: Zhong Jiawei <zhongjiawei1@huawei.com>
Signed-off-by: Chen Ridong <chenridong@huawei.com>
Acked-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Pull selinux fix from Paul Moore:
"A small SELinux patch to resolve a UBSAN warning in the
xfrm/labeled-IPsec code"
* tag 'selinux-pr-20250618' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: fix selinux_xfrm_alloc_user() to set correct ctx_len
Pull ftrace fix from Steven Rostedt:
- Do not blindly enable function_graph tracer when updating
funcgraph-args
When the option to trace function arguments in the function graph
trace is updated, it requires the function graph tracer to switch its
callback routine. It disables function graph tracing, updates the
callback and then re-enables function graph tracing.
The issue is that it doesn't check if function graph tracing is
currently enabled or not. If it is not enabled, it will try to
disable it and re-enable it (which will actually enable it even
though it is not the current tracer). This causes an issue in the
accounting and will trigger a WARN_ON() if the function tracer is
enabled after that.
* tag 'ftrace-v6.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
fgraph: Do not enable function_graph tracer when setting funcgraph-args
Pull ata fixes from Niklas Cassel:
- Force PIO for ATAPI devices on VT6415/VT6330 as the controller locks
up on ATAPI DMA (Tasos)
- Fix ACPI PATA cable type detection such that the controller is not
forced down to a slow transfer mode (Tasos)
- Fix build error on 32-bit UML (Johannes)
- Fix a PCI region leak in the pata_macio driver so that the driver no
longer fails to load after rmmod (Philipp)
- Use correct DMI BIOS build date for ThinkPad W541 quirk (me)
- Disallow LPM for ASUSPRO-D840SA motherboard as this board
interestingly enough gets graphical corruptions on the iGPU when LPM
is enabled (me)
- Disallow LPM for Asus B550-F motherboard as this board will get
command timeouts on ports 5 and 6, yet LPM with the same drive works
fine on all other ports (Mikko)
* tag 'ata-6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
ata: ahci: Disallow LPM for Asus B550-F motherboard
ata: ahci: Disallow LPM for ASUSPRO-D840SA motherboard
ata: ahci: Use correct BIOS build date for ThinkPad W541 quirk
ata: pata_macio: Fix PCI region leak
ata: pata_cs5536: fix build on 32-bit UML
ata: libata-acpi: Do not assume 40 wire cable if no devices are enabled
ata: pata_via: Force PIO for ATAPI devices on VT6415/VT6330
Pull libnvdimm fix from Ira Weiny:
"This converts the pmem-region device tree bindings to YAML to fix
errors and bring it up to date"
* tag 'libnvdimm-fixes-6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
dt-bindings: pmem: Convert binding to YAML
Now when isolcpus is enabled via the cmdline, wq_isolated_cpumask does
not include these isolated CPUs, even wq_unbound_cpumask has already
excluded them. It is only when we successfully configure an isolate cpuset
partition that wq_isolated_cpumask gets overwritten by
workqueue_unbound_exclude_cpumask(), including both the cmdline-specified
isolated CPUs and the isolated CPUs within the cpuset partitions.
Fix this issue by initializing wq_isolated_cpumask properly in
workqueue_init_early().
Fixes: fe28f631fa ("workqueue: Add workqueue_unbound_exclude_cpumask() to exclude CPUs from wq_unbound_cpumask")
Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Pull x86 platform driver fixes from Ilpo Järvinen:
- amd/hsmp: Timeout handling fixes
- amd/pmc:
- Clear metrics table at start of cycle
- Add PCSpecialist Lafite Pro V 14M to 8042 quirks list
- amd/pmf: Fix error handling corner cases (nth attempt)
- alienware-wmi-wmax: Revert G-Mode support as it lowers performance
- dell_rbu:
- Fix sparse lock context warning
- Fix list head usage
- Don't overwrite data buffer past the size of the last packet
- ideapad-laptop: Ensure EC is not polled too frequently
- intel-uncore-freq:
- Fail module load when plat_info is NULL
- Avoid a non-literal format string as it triggers a compiler warning
- intel/pmc: Add Lunar Lake and Panther Lake support to SSRAM Telemetry
- intel/power-domains: Fix error code in tpmi_init()
- samsung-galaxybook: Add support for Notebook 9 Pro and others
(SAM0426)
* tag 'platform-drivers-x86-v6.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
Revert "platform/x86: alienware-wmi-wmax: Add G-Mode support to Alienware m16 R1"
platform/x86/amd/pmc: Add PCSpecialist Lafite Pro V 14M to 8042 quirks list
platform/x86/intel-uncore-freq: avoid non-literal format string
platform/x86/intel/pmc: Add Panther Lake support to Intel PMC SSRAM Telemetry
platform/x86/intel/pmc: Add Lunar Lake support to Intel PMC SSRAM Telemetry
MAINTAINERS: .mailmap: Update Hans de Goede's email address
platform/x86: dell_rbu: Bump version
platform/x86: dell_rbu: Stop overwriting data buffer
platform/x86: dell_rbu: Fix list usage
platform/x86: dell_rbu: Fix lock context warning
platform/x86/amd: pmf: Simplify error flow in amd_pmf_init_smart_pc()
platform/x86/amd: pmf: Prevent amd_pmf_tee_deinit() from running twice
platform/x86/amd: pmf: Use device managed allocations
x86/platform/amd: replace down_timeout() with down_interruptible()
x86/platform/amd: move final timeout check to after final sleep
platform/x86/amd: pmc: Clear metrics table at start of cycle
platform/x86/intel: power-domains: Fix error code in tpmi_init()
platform/x86: samsung-galaxybook: Add SAM0426
platform/x86/intel-uncore-freq: Fail module load when plat_info is NULL
platform/x86: ideapad-laptop: use usleep_range() for EC polling
During task_group creation, sched_create_group() calls
scx_group_set_weight() with CGROUP_WEIGHT_DFL to initialize the sched_ext
portion. This is premature and ends up calling ops.cgroup_set_weight() with
an incorrect @cgrp before ops.cgroup_init() is called.
sched_create_group() should just initialize SCX related fields in the new
task_group. Fix it by factoring out scx_tg_init() from sched_init() and
making sched_create_group() call that function instead of
scx_group_set_weight().
v2: Retain CONFIG_EXT_GROUP_SCHED ifdef in sched_init() as removing it leads
to build failures on !CONFIG_GROUP_SCHED configs.
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 8195136669 ("sched_ext: Add cgroup support")
Cc: stable@vger.kernel.org # v6.12+
Otherwise, tg->scx.weight can go out of sync while scx_cgroup is not enabled
and ops.cgroup_init() may be called with a stale weight value.
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 8195136669 ("sched_ext: Add cgroup support")
Cc: stable@vger.kernel.org # v6.12+
Pull x86 fixes from Dave Hansen:
"This is a pretty scattered set of fixes. The majority of them are
further fixups around the recent ITS mitigations.
The rest don't really have a coherent story:
- Some flavors of Xen PV guests don't support large pages, but the
set_memory.c code assumes all CPUs support them.
Avoid problems with a quick CPU feature check.
- The TDX code has some wrappers to help retry calls to the TDX
module. They use function pointers to assembly functions and the
compiler usually generates direct CALLs. But some new compilers,
plus -Os turned them in to indirect CALLs and the assembly code was
not annotated for indirect calls.
Force inlining of the helper to fix it up.
- Last, a FRED issue showed up when single-stepping. It's fine when
using an external debugger, but was getting stuck returning from a
SIGTRAP handler otherwise.
Clear the FRED 'swevent' bit to ensure that forward progress is
made"
* tag 'x86_urgent_for_6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "mm/execmem: Unify early execmem_cache behaviour"
x86/its: explicitly manage permissions for ITS pages
x86/its: move its_pages array to struct mod_arch_specific
x86/Kconfig: only enable ROX cache in execmem when STRICT_MODULE_RWX is set
x86/mm/pat: don't collapse pages without PSE set
x86/virt/tdx: Avoid indirect calls to TDX assembly functions
selftests/x86: Add a test to detect infinite SIGTRAP handler loop
x86/fred/signal: Prevent immediate repeat of single step trap on return from SIGTRAP handler
Pull powerpc fixes from Madhavan Srinivasan:
- Fix to handle VDSO32 with pcrel
- Couple of dts fixes in microwatt and mpc8315erdb
- Fix to handle PE bridge reconfiguration in VFIO EEH recovery path
- Fix ioctl macros related to struct termio
Thanks to Christophe Leroy, Ganesh Goudar, J. Neuschäfer, Justin M.
Forbes, Michael Ellerman, Narayana Murty N, Tulio Magno, and Vaibhav
Jain
* tag 'powerpc-6.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc: Fix struct termio related ioctl macros
powerpc: dts: mpc8315erdb: Add GPIO controller node
powerpc/microwatt: Fix model property in device tree
powerpc/eeh: Fix missing PE bridge reconfiguration during VFIO EEH recovery
powerpc/vdso: Fix build of VDSO32 with pcrel
Pull vfs fixes from Christian Brauner:
- Fix a regression in overlayfs caused by reworking the lookup_one*()
set of helpers
- Make sure that the name of the dentry is printed in overlayfs'
mkdir() helper
- Add missing iocb values to TRACE_IOCB_STRINGS define
- Unlock the superblock during iterate_supers_type(). This was an
accidental internal api change
- Drop a misleading assert in file_seek_cur_needs_f_lock() helper
- Never refuse to return PIDFD_GET_INGO when parent pid is zero
That can trivially happen in container scenarios where the parent
process might be located in an ancestor pid namespace
- Don't revalidate in try_lookup_noperm() as that causes regression for
filesystems such as cifs
- Fix simple_xattr_list() and reset the err variable after
security_inode_listsecurity() got called so as not to confuse
userspace about the length of the xattr
* tag 'vfs-6.16-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: drop assert in file_seek_cur_needs_f_lock
fs: unlock the superblock during iterate_supers_type
ovl: fix debug print in case of mkdir error
VFS: change try_lookup_noperm() to skip revalidation
fs: add missing values to TRACE_IOCB_STRINGS
fs/xattr.c: fix simple_xattr_list()
ovl: fix regression caused by lookup helpers API changes
pidfs: never refuse ppid == 0 in PIDFD_GET_INFO
The assert in function file_seek_cur_needs_f_lock() can be triggered very
easily because there are many users of vfs_llseek() (such as overlayfs)
that do their custom locking around llseek instead of relying on
fdget_pos(). Just drop the overzealous assertion.
Fixes: da06e3c517 ("fs: don't needlessly acquire f_lock")
Suggested-by: Jan Kara <jack@suse.cz>
Suggested-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: Luis Henriques <luis@igalia.com>
Link: https://lore.kernel.org/20250613101111.17716-1-luis@igalia.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
After commit 6f110a5e4f ("Disable SLUB_TINY for build testing"), which
causes CONFIG_KASAN to be enabled in allmodconfig again, arm64
allmodconfig builds with clang-17 and older show an instance of
-Wframe-larger-than (which breaks the build with CONFIG_WERROR=y):
lib/crypto/curve25519-hacl64.c:757:6: error: stack frame size (2336) exceeds limit (2048) in 'curve25519_generic' [-Werror,-Wframe-larger-than]
757 | void curve25519_generic(u8 mypublic[CURVE25519_KEY_SIZE],
| ^
When KASAN is disabled, the stack usage is roughly quartered:
lib/crypto/curve25519-hacl64.c:757:6: error: stack frame size (608) exceeds limit (128) in 'curve25519_generic' [-Werror,-Wframe-larger-than]
757 | void curve25519_generic(u8 mypublic[CURVE25519_KEY_SIZE],
| ^
Using '-Rpass-analysis=stack-frame-layout' shows the following variables
and many, many 8-byte spills when KASAN is enabled:
Offset: [SP-144], Type: Variable, Align: 8, Size: 40
Offset: [SP-464], Type: Variable, Align: 8, Size: 320
Offset: [SP-784], Type: Variable, Align: 8, Size: 320
Offset: [SP-864], Type: Variable, Align: 32, Size: 80
Offset: [SP-896], Type: Variable, Align: 32, Size: 32
Offset: [SP-1016], Type: Variable, Align: 8, Size: 120
When KASAN is disabled, there are still spills but not at many and the
variables list is smaller:
Offset: [SP-192], Type: Variable, Align: 32, Size: 80
Offset: [SP-224], Type: Variable, Align: 32, Size: 32
Offset: [SP-344], Type: Variable, Align: 8, Size: 120
Disable KASAN for this file when using clang-17 or older to avoid
blowing out the stack, clearing up the warning.
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: "Jason A. Donenfeld" <Jason@zx2c4.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250609-curve25519-hacl64-disable-kasan-clang-v1-1-08ea0ac5ccff@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Annotate various keys, ivs, and other byte arrays with __nonstring so
that static initializers will not complain about truncating the trailing
NUL byte under GCC 15 with -Wunterminated-string-initialization enabled.
Silences many warnings like:
../lib/crypto/aesgcm.c:642:27: warning: initializer-string for array of 'unsigned char' truncates NUL terminator but destination lacks 'nonstring' attribute (13 chars into 12 available) [-Wunterminated-string-initialization]
642 | .iv = "\xca\xfe\xba\xbe\xfa\xce\xdb\xad"
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Kees Cook <kees@kernel.org>
Link: https://lore.kernel.org/r/20250529173113.work.760-kees@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Pull Kbuild fixes from Masahiro Yamada:
- Move warnings about linux/export.h from W=1 to W=2
- Fix structure type overrides in gendwarfksyms
* tag 'kbuild-fixes-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
gendwarfksyms: Fix structure type overrides
kbuild: move warnings about linux/export.h from W=1 to W=2
As we always iterate through the entire die_map when expanding
type strings, recursively processing referenced types in
type_expand_child() is not actually necessary. Furthermore,
the type_string kABI rule added in commit c9083467f7
("gendwarfksyms: Add a kABI rule to override type strings") can
fail to override type strings for structures due to a missing
kabi_get_type_string() check in this function.
Fix the issue by dropping the unnecessary recursion and moving
the override check to type_expand(). Note that symbol versions
are otherwise unchanged with this patch.
Fixes: c9083467f7 ("gendwarfksyms: Add a kABI rule to override type strings")
Reported-by: Giuliano Procida <gprocida@google.com>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Pull block fixes from Jens Axboe:
- Fix for a deadlock on queue freeze with zoned writes
- Fix for zoned append emulation
- Two bio folio fixes, for sparsemem and for very large folios
- Fix for a performance regression introduced in 6.13 when plug
insertion was changed
- Fix for NVMe passthrough handling for polled IO
- Document the ublk auto registration feature
- loop lockdep warning fix
* tag 'block-6.16-20250614' of git://git.kernel.dk/linux:
nvme: always punt polled uring_cmd end_io work to task_work
Documentation: ublk: Separate UBLK_F_AUTO_BUF_REG fallback behavior sublists
block: Fix bvec_set_folio() for very large folios
bio: Fix bio_first_folio() for SPARSEMEM without VMEMMAP
block: use plug request list tail for one-shot backmerge attempt
block: don't use submit_bio_noacct_nocheck in blk_zone_wplug_bio_work
block: Clear BIO_EMULATES_ZONE_APPEND flag on BIO completion
ublk: document auto buffer registration(UBLK_F_AUTO_BUF_REG)
loop: move lo_set_size() out of queue freeze
Pull io_uring fixes from Jens Axboe:
- Fix for a race between SQPOLL exit and fdinfo reading.
It's slim and I was only able to reproduce this with an artificial
delay in the kernel. Followup sparse fix as well to unify the access
to ->thread.
- Fix for multiple buffer peeking, avoiding truncation if possible.
- Run local task_work for IOPOLL reaping when the ring is exiting.
This currently isn't done due to an assumption that polled IO will
never need task_work, but a fix on the block side is going to change
that.
* tag 'io_uring-6.16-20250614' of git://git.kernel.dk/linux:
io_uring: run local task_work from ring exit IOPOLL reaping
io_uring/kbuf: don't truncate end buffer for multiple buffer peeks
io_uring: consistently use rcu semantics with sqpoll thread
io_uring: fix use-after-free of sq->thread in __io_uring_show_fdinfo()
Pull Rust fix from Miguel Ojeda:
- 'hrtimer': fix future compile error when the 'impl_has_hr_timer!'
macro starts to get called
* tag 'rust-fixes-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux:
rust: time: Fix compile error in impl_has_hr_timer macro
Pull misc fixes from Andrew Morton:
"9 hotfixes. 3 are cc:stable and the remainder address post-6.15 issues
or aren't considered necessary for -stable kernels. Only 4 are for MM"
* tag 'mm-hotfixes-stable-2025-06-13-21-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm: add mmap_prepare() compatibility layer for nested file systems
init: fix build warnings about export.h
MAINTAINERS: add Barry as a THP reviewer
drivers/rapidio/rio_cm.c: prevent possible heap overwrite
mm: close theoretical race where stale TLB entries could linger
mm/vma: reset VMA iterator on commit_merge() OOM failure
docs: proc: update VmFlags documentation in smaps
scatterlist: fix extraneous '@'-sign kernel-doc notation
selftests/mm: skip failed memfd setups in gup_longterm
Pull SCSI fixes from James Bottomley:
"All fixes for drivers.
The core change in the error handler is simply to translate an ALUA
specific sense code into a retry the ALUA components can handle and
won't impact any other devices"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: error: alua: I/O errors for ALUA state transitions
scsi: storvsc: Increase the timeouts to storvsc_timeout
scsi: s390: zfcp: Ensure synchronous unit_add
scsi: iscsi: Fix incorrect error path labels for flashnode operations
scsi: mvsas: Fix typos in per-phy comments and SAS cmd port registers
scsi: core: ufs: Fix a hang in the error handler
Pull drm fixes from Dave Airlie:
"Quiet week, only two pull requests came my way, xe has a couple of
fixes and then a bunch of fixes across the board, vc4 probably fixes
the biggest problem:
vc4:
- Fix infinite EPROBE_DEFER loop in vc4 probing
amdxdna:
- Fix amdxdna firmware size
meson:
- modesetting fixes
sitronix:
- Kconfig fix for st7171-i2c
dma-buf:
- Fix -EBUSY WARN_ON_ONCE in dma-buf
udmabuf:
- Use dma_sync_sgtable_for_cpu in udmabuf
xe:
- Fix regression disallowing 64K SVM migration
- Use a bounce buffer for WA BB"
* tag 'drm-fixes-2025-06-14' of https://gitlab.freedesktop.org/drm/kernel:
drm/xe/lrc: Use a temporary buffer for WA BB
udmabuf: use sgtable-based scatterlist wrappers
dma-buf: fix compare in WARN_ON_ONCE
drm/sitronix: st7571-i2c: Select VIDEOMODE_HELPERS
drm/meson: fix more rounding issues with 59.94Hz modes
drm/meson: use vclk_freq instead of pixel_freq in debug print
drm/meson: fix debug log statement when setting the HDMI clocks
drm/vc4: fix infinite EPROBE_DEFER loop
drm/xe/svm: Fix regression disallowing 64K SVM migration
accel/amdxdna: Fix incorrect PSP firmware size
We can't expose direct access to the internal Revocable, since this
allows users to directly revoke the internal Revocable without Devres
having the chance to synchronize with the devres callback -- we have to
guarantee that the internal Revocable has been fully revoked before
the device is fully unbound.
Hence, remove the corresponding Deref implementation and, instead,
provide indirect accessors for the internal Revocable.
Note that we can still support Devres::revoke() by implementing the
required synchronization (which would be almost identical to the
synchronization in Devres::drop()).
Fixes: 76c01ded72 ("rust: add devres abstraction")
Reviewed-by: Benno Lossin <lossin@kernel.org>
Link: https://lore.kernel.org/r/20250611174827.380555-1-dakr@kernel.org
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
In Devres::drop() we first remove the devres action and then drop the
wrapped device resource.
The design goal is to give the owner of a Devres object control over when
the device resource is dropped, but limit the overall scope to the
corresponding device being bound to a driver.
However, there's a race that was introduced with commit 8ff656643d
("rust: devres: remove action in `Devres::drop`"), but also has been
(partially) present from the initial version on.
In Devres::drop(), the devres action is removed successfully and
subsequently the destructor of the wrapped device resource runs.
However, there is no guarantee that the destructor of the wrapped device
resource completes before the driver core is done unbinding the
corresponding device.
If in Devres::drop(), the devres action can't be removed, it means that
the devres callback has been executed already, or is still running
concurrently. In case of the latter, either Devres::drop() wins revoking
the Revocable or the devres callback wins revoking the Revocable. If
Devres::drop() wins, we (again) have no guarantee that the destructor of
the wrapped device resource completes before the driver core is done
unbinding the corresponding device.
CPU0 CPU1
------------------------------------------------------------------------
Devres::drop() { Devres::devres_callback() {
self.data.revoke() { this.data.revoke() {
is_available.swap() == true
is_available.swap == false
}
}
// [...]
// device fully unbound
drop_in_place() {
// release device resource
}
}
}
Depending on the specific device resource, this can potentially lead to
user-after-free bugs.
In order to fix this, implement the following logic.
In the devres callback, we're always good when we get to revoke the
device resource ourselves, i.e. Revocable::revoke() returns true.
If Revocable::revoke() returns false, it means that Devres::drop(),
concurrently, already drops the device resource and we have to wait for
Devres::drop() to signal that it finished dropping the device resource.
Note that if we hit the case where we need to wait for the completion of
Devres::drop() in the devres callback, it means that we're actually
racing with a concurrent Devres::drop() call, which already started
revoking the device resource for us. This is rather unlikely and means
that the concurrent Devres::drop() already started doing our work and we
just need to wait for it to complete it for us. Hence, there should not
be any additional overhead from that.
(Actually, for now it's even better if Devres::drop() does the work for
us, since it can bypass the synchronize_rcu() call implied by
Revocable::revoke(), but this goes away anyways once I get to implement
the split devres callback approach, which allows us to first flip the
atomics of all registered Devres objects of a certain device, execute a
single synchronize_rcu() and then drop all revocable objects.)
In Devres::drop() we try to revoke the device resource. If that is *not*
successful, it means that the devres callback already did and we're good.
Otherwise, we try to remove the devres action, which, if successful,
means that we're good, since the device resource has just been revoked
by us *before* we removed the devres action successfully.
If the devres action could not be removed, it means that the devres
callback must be running concurrently, hence we signal that the device
resource has been revoked by us, using the completion.
This makes it safe to drop a Devres object from any task and at any point
of time, which is one of the design goals.
Fixes: 76c01ded72 ("rust: add devres abstraction")
Reported-by: Alice Ryhl <aliceryhl@google.com>
Closes: https://lore.kernel.org/lkml/aD64YNuqbPPZHAa5@google.com/
Reviewed-by: Benno Lossin <lossin@kernel.org>
Link: https://lore.kernel.org/r/20250612121817.1621-4-dakr@kernel.org
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Return a boolean from Revocable::revoke() and Revocable::revoke_nosync()
to indicate whether the data has been revoked already.
Return true if the data hasn't been revoked yet (i.e. this call revoked
the data), false otherwise.
This is required by Devres in order to synchronize the completion of the
revoke process.
Reviewed-by: Benno Lossin <lossin@kernel.org>
Acked-by: Miguel Ojeda <ojeda@kernel.org>
Link: https://lore.kernel.org/r/20250612121817.1621-3-dakr@kernel.org
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
In preparation for needing to shift NVMe passthrough to always use
task_work for polled IO completions, ensure that those are suitably
run at exit time. See commit:
9ce6c9875f ("nvme: always punt polled uring_cmd end_io work to task_work")
for details on why that is necessary.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Currently NVMe uring_cmd completions will complete locally, if they are
polled. This is done because those completions are always invoked from
task context. And while that is true, there's no guarantee that it's
invoked under the right ring context, or even task. If someone does
NVMe passthrough via multiple threads and with a limited number of
poll queues, then ringA may find completions from ringB. For that case,
completing the request may not be sound.
Always just punt the passthrough completions via task_work, which will
redirect the completion, if needed.
Cc: stable@vger.kernel.org
Fixes: 585079b6e4 ("nvme: wire up async polling for io passthrough commands")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull ACPI fixes from Rafael Wysocki:
"These fix an ACPI APEI error injection driver failure that started to
occur after switching it over to using a faux device, address an EC
driver issue related to invalid ECDT tables, clean up the usage of
mwait_idle_with_hints() in the ACPI PAD driver, add a new IRQ override
quirk, and fix a NULL pointer dereference related to nosmp:
- Update the faux device handling code in the driver core and address
an ACPI APEI error injection driver failure that started to occur
after switching it over to using a faux device on top of that (Dan
Williams)
- Update data types of variables passed as arguments to
mwait_idle_with_hints() in the ACPI PAD (processor aggregator
device) driver to match the function definition after recent
changes (Uros Bizjak)
- Fix a NULL pointer dereference in the ACPI CPPC library that occurs
when nosmp is passed to the kernel in the command line (Yunhui Cui)
- Ignore ECDT tables with an invalid ID string to prevent using an
incorrect GPE for signaling events on some systems (Armin Wolf)
- Add a new IRQ override quirk for MACHENIKE 16P (Wentao Guan)"
* tag 'acpi-6.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: resource: Use IRQ override on MACHENIKE 16P
ACPI: EC: Ignore ECDT tables with an invalid ID string
ACPI: CPPC: Fix NULL pointer dereference when nosmp is used
ACPI: PAD: Update arguments of mwait_idle_with_hints()
ACPI: APEI: EINJ: Do not fail einj_init() on faux_device_create() failure
driver core: faux: Quiet probe failures
driver core: faux: Suppress bind attributes
Pull power management fixes from Rafael Wysocki:
"These fix the cpupower utility installation, fix up the recently added
Rust abstractions for cpufreq and OPP, restore the x86 update
eliminating mwait_play_dead_cpuid_hint() that has been reverted during
the 6.16 merge window along with preventing the failure caused by it
from happening, and clean up mwait_idle_with_hints() usage in
intel_idle:
- Implement CpuId Rust abstraction and use it to fix doctest failure
related to the recently introduced cpumask abstraction (Viresh
Kumar)
- Do minor cleanups in the `# Safety` sections for cpufreq
abstractions added recently (Viresh Kumar)
- Unbreak cpupower systemd service units installation on some systems
by adding a unitdir variable for specifying the location to install
them (Francesco Poli)
- Eliminate mwait_play_dead_cpuid_hint() again after reverting its
elimination during the 6.16 merge window due to a problem with
handling "dead" SMT siblings, but this time prevent leaving them in
C1 after initialization by taking them online and back offline when
a proper cpuidle driver for the platform has been registered
(Rafael Wysocki)
- Update data types of variables passed as arguments to
mwait_idle_with_hints() to match the function definition after
recent changes (Uros Bizjak)"
* tag 'pm-6.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
rust: cpu: Add CpuId::current() to retrieve current CPU ID
rust: Use CpuId in place of raw CPU numbers
rust: cpu: Introduce CpuId abstraction
intel_idle: Update arguments of mwait_idle_with_hints()
cpufreq: Convert `/// SAFETY` lines to `# Safety` sections
cpupower: split unitdir from libdir in Makefile
Reapply "x86/smp: Eliminate mwait_play_dead_cpuid_hint()"
ACPI: processor: Rescan "dead" SMT siblings during initialization
intel_idle: Rescan "dead" SMT siblings during initialization
x86/smp: PM/hibernate: Split arch_resume_nosmt()
intel_idle: Use subsys_initcall_sync() for initialization
Merge assorted ACPI updates for 6.16-rc2:
- Update data types of variables passed as arguments to
mwait_idle_with_hints() in the ACPI PAD (processor aggregator device)
driver to match the function definition after recent changes (Uros
Bizjak).
- Fix a NULL pointer dereference in the ACPI CPPC library that occurs
when nosmp is passed to the kernel in the command line (Yunhui Cui).
- Ignore ECDT tables with an invalid ID string to prevent using an
incorrect GPE for signaling events on some systems (Armin Wolf).
- Add a new IRQ override quirk for MACHENIKE 16P (Wentao Guan).
* acpi-pad:
ACPI: PAD: Update arguments of mwait_idle_with_hints()
* acpi-cppc:
ACPI: CPPC: Fix NULL pointer dereference when nosmp is used
* acpi-ec:
ACPI: EC: Ignore ECDT tables with an invalid ID string
* acpi-resource:
ACPI: resource: Use IRQ override on MACHENIKE 16P
Merge cpuidle updates for 6.16-rc2:
- Update data types of variables passed as arguments to
mwait_idle_with_hints() to match the function definition
after recent changes (Uros Bizjak).
- Eliminate mwait_play_dead_cpuid_hint() again after reverting its
elimination during the merge window due to a problem with handling
"dead" SMT siblings, but this time prevent leaving them in C1 after
initialization by taking them online and back offline when a proper
cpuidle driver for the platform has been registered (Rafael Wysocki).
* pm-cpuidle:
intel_idle: Update arguments of mwait_idle_with_hints()
Reapply "x86/smp: Eliminate mwait_play_dead_cpuid_hint()"
ACPI: processor: Rescan "dead" SMT siblings during initialization
intel_idle: Rescan "dead" SMT siblings during initialization
x86/smp: PM/hibernate: Split arch_resume_nosmt()
intel_idle: Use subsys_initcall_sync() for initialization
Merge a cpupower utility fix for 6.16-rc2 that unbreaks systemd service
units installation on some sysems (Francesco Poli).
* pm-tools:
cpupower: split unitdir from libdir in Makefile
Pull spi fixes from Mark Brown:
"A collection of driver specific fixes, most minor apart from the OMAP
ones which disable some recent performance optimisations in some
non-standard cases where we could start driving the bus incorrectly.
The change to the stm32-ospi driver to use the newer reset APIs is a
fix for interactions with other IP sharing the same reset line in some
SoCs"
* tag 'spi-fix-v6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: spi-pci1xxxx: Drop MSI-X usage as unsupported by DMA engine
spi: stm32-ospi: clean up on error in probe()
spi: stm32-ospi: Make usage of reset_control_acquire/release() API
spi: offload: check offload ops existence before disabling the trigger
spi: spi-pci1xxxx: Fix error code in probe
spi: loongson: Fix build warnings about export.h
spi: omap2-mcspi: Disable multi-mode when the previous message kept CS asserted
spi: omap2-mcspi: Disable multi mode when CS should be kept asserted after message