Pull RISC-V updates from Paul Walmsley:
"There is one significant change outside arch/riscv in this pull
request: the addition of a set of KUnit tests for strlen(), strnlen(),
and strrchr().
Otherwise, the most notable changes are to add some RISC-V-specific
string function implementations, to remove XIP kernel support, to add
hardware error exception handling, and to optimize our runtime
unaligned access speed testing.
A few comments on the motivation for removing XIP support. It's been
broken in the RISC-V kernel for months. The code is not easy to
maintain. Furthermore, for XIP support to truly be useful for RISC-V,
we think that compile-time feature switches would need to be added for
many of the RISC-V ISA features and microarchitectural properties that
are currently implemented with runtime patching. No one has stepped
forward to take responsibility for that work, so many of us think it's
best to remove it until clear use cases and champions emerge.
Summary:
- Add Kunit correctness testing and microbenchmarks for strlen(),
strnlen(), and strrchr()
- Add RISC-V-specific strnlen(), strchr(), strrchr() implementations
- Add hardware error exception handling
- Clean up and optimize our unaligned access probe code
- Enable HAVE_IOREMAP_PROT to be able to use generic_access_phys()
- Remove XIP kernel support
- Warn when addresses outside the vmemmap range are passed to
vmemmap_populate()
- Update the ACPI FADT revision check to warn if it's not at least
ACPI v6.6, which is when key RISC-V-specific tables were added to
the specification
- Increase COMMAND_LINE_SIZE to 2048 to match ARM64, x86, PowerPC,
etc.
- Make kaslr_offset() a static inline function, since there's no need
for it to show up in the symbol table
- Add KASLR offset and SATP to the VMCOREINFO ELF notes to improve
kdump support
- Add Makefile cleanup rule for vdso_cfi copied source files, and add
a .gitignore for the build artifacts in that directory
- Remove some redundant ifdefs that check Kconfig macros
- Add missing SPDX license tag to the CFI selftest
- Simplify UTS_MACHINE assignment in the RISC-V Makefile
- Clarify some unclear comments and remove some superfluous comments
- Fix various English typos across the RISC-V codebase"
* tag 'riscv-for-linus-7.1-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (31 commits)
riscv: Remove support for XIP kernel
riscv: Reuse compare_unaligned_access() in check_vector_unaligned_access()
riscv: Split out compare_unaligned_access()
riscv: Reuse measure_cycles() in check_vector_unaligned_access()
riscv: Split out measure_cycles() for reuse
riscv: Clean up & optimize unaligned scalar access probe
riscv: lib: add strrchr() implementation
riscv: lib: add strchr() implementation
riscv: lib: add strnlen() implementation
lib/string_kunit: extend benchmarks to strnlen() and chr searches
lib/string_kunit: add performance benchmark for strlen()
lib/string_kunit: add correctness test for strrchr()
lib/string_kunit: add correctness test for strnlen()
lib/string_kunit: add correctness test for strlen()
riscv: vdso_cfi: Add .gitignore for build artifacts
riscv: vdso_cfi: Add clean rule for copied sources
riscv: enable HAVE_IOREMAP_PROT
riscv: mm: WARN_ON() for bad addresses in vmemmap_populate()
riscv: acpi: update FADT revision check to 6.6
riscv: add hardware error trap handler support
...
Pull LoongArch updates from Huacai Chen:
- Adjust build infrastructure for 32BIT/64BIT
- Add HIGHMEM (PKMAP and FIX_KMAP) support
- Show and handle CPU vulnerabilites correctly
- Batch the icache maintenance for jump_label
- Add more atomic instructions support for BPF JIT
- Add more features (e.g. fsession) support for BPF trampoline
- Some bug fixes and other small changes
* tag 'loongarch-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: (21 commits)
selftests/bpf: Enable CAN_USE_LOAD_ACQ_STORE_REL for LoongArch
LoongArch: BPF: Add fsession support for trampolines
LoongArch: BPF: Introduce emit_store_stack_imm64() helper
LoongArch: BPF: Support up to 12 function arguments for trampoline
LoongArch: BPF: Support small struct arguments for trampoline
LoongArch: BPF: Open code and remove invoke_bpf_mod_ret()
LoongArch: BPF: Support load-acquire and store-release instructions
LoongArch: BPF: Support 8 and 16 bit read-modify-write instructions
LoongArch: BPF: Add the default case in emit_atomic() and rename it
LoongArch: Define instruction formats for AM{SWAP/ADD}.{B/H} and DBAR
LoongArch: Batch the icache maintenance for jump_label
LoongArch: Add flush_icache_all()/local_flush_icache_all()
LoongArch: Add spectre boundry for syscall dispatch table
LoongArch: Show CPU vulnerabilites correctly
LoongArch: Make arch_irq_work_has_interrupt() true only if IPI HW exist
LoongArch: Use get_random_canary() for stack canary init
LoongArch: Improve the logging of disabling KASLR
LoongArch: Align FPU register state to 32 bytes
LoongArch: Handle CONFIG_32BIT in syscall_get_arch()
LoongArch: Add HIGHMEM (PKMAP and FIX_KMAP) support
...
Pull networking deletions from Jakub Kicinski:
"Delete some obsolete networking code
Old code like amateur radio and NFC have long been a burden to core
networking developers. syzbot loves to find bugs in BKL-era code, and
noobs try to fix them.
If we want to have a fighting chance of surviving the LLM-pocalypse
this code needs to find a dedicated owner or get deleted. We've talked
about these deletions multiple times in the past and every time
someone wanted the code to stay. It is never very clear to me how many
of those people actually use the code vs are just nostalgic to see it
go. Amateur radio did have occasional users (or so I think) but most
users switched to user space implementations since its all super slow
stuff. Nobody stepped up to maintain the kernel code.
We were lucky enough to find someone who wants to help with NFC so
we're giving that a chance. Let's try to put the rest of this code
behind us"
* tag 'net-deletions' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next:
drivers: net: 8390: wd80x3: Remove this driver
drivers: net: 8390: ultra: Remove this driver
drivers: net: 8390: AX88190: Remove this driver
drivers: net: fujitsu: fmvj18x: Remove this driver
drivers: net: smsc: smc91c92: Remove this driver
drivers: net: smsc: smc9194: Remove this driver
drivers: net: amd: nmclan: Remove this driver
drivers: net: amd: lance: Remove this driver
drivers: net: 3com: 3c589: Remove this driver
drivers: net: 3com: 3c574: Remove this driver
drivers: net: 3com: 3c515: Remove this driver
drivers: net: 3com: 3c509: Remove this driver
net: packetengines: remove obsolete yellowfin driver and vendor dir
net: packetengines: remove obsolete hamachi driver
net: remove unused ATM protocols and legacy ATM device drivers
net: remove ax25 and amateur radio (hamradio) subsystem
net: remove ISDN subsystem and Bluetooth CMTP
caif: remove CAIF NETWORK LAYER
Pull slab fix from Vlastimil Babka:
- A stable fix for k(v)ealloc() where reallocating on a different node
or shrinking the object can result in either losing the original data
or a buffer overflow (Marco Elver)
* tag 'slab-for-7.1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
slub: fix data loss and overflow in krealloc()
Pull Clang build fix from Nathan Chancellor:
- Wrap declaration and assignment of key_pass in certs/extract-cert.c
with '#ifdef' that matches its only usage to clear up an instance of
a new clang subwarning, -Wunused-but-set-global.
* tag 'clang-fixes-7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/nathan/linux:
extract-cert: Wrap key_pass with '#ifdef USE_PKCS11_ENGINE'
Pull apparmor updates from John Johansen:
"Cleanups
- Use sysfs_emit in param_get_{audit,mode}
- Remove redundant if check in sk_peer_get_label
- Replace memcpy + NUL termination with kmemdup_nul in do_setattr
Bug Fixes:
- Fix aa_dfa_unpack's error handling in aa_setup_dfa_engine
- Fix string overrun due to missing termination
- Fix wrong dentry in RENAME_EXCHANGE uid check
- fix unpack_tags to properly return error in failure cases
- fix dfa size check
- return error on namespace mismatch in verify_header
- use target task's context in apparmor_getprocattr()"
* tag 'apparmor-pr-2026-04-23' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor:
apparmor/lsm: Fix aa_dfa_unpack's error handling in aa_setup_dfa_engine
apparmor: Fix string overrun due to missing termination
apparmor: Fix wrong dentry in RENAME_EXCHANGE uid check
apparmor: fix unpack_tags to properly return error in failure cases
apparmor: fix dfa size check
apparmor: Use sysfs_emit in param_get_{audit,mode}
apparmor: Remove redundant if check in sk_peer_get_label
apparmor: Replace memcpy + NUL termination with kmemdup_nul in do_setattr
apparmor: return error on namespace mismatch in verify_header
apparmor: use target task's context in apparmor_getprocattr()
Pull vfs fixes from Christian Brauner:
- eventpoll: fix ep_remove() UAF and follow-up cleanup
- fs: aio: set VMA_DONTCOPY_BIT in mmap to fix NULL-pointer-dereference
error
- writeback: Fix use after free in inode_switch_wbs_work_fn()
- fuse: reject oversized dirents in page cache
- fs: aio: reject partial mremap to avoid Null-pointer-dereference
error
- nstree: fix func. parameter kernel-doc warnings
- fs: Handle multiply claimed blocks more gracefully with mmb
* tag 'vfs-7.1-rc1.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
eventpoll: drop vestigial epi->dying flag
eventpoll: drop dead bool return from ep_remove_epi()
eventpoll: refresh eventpoll_release() fast-path comment
eventpoll: move f_lock acquisition into ep_remove_file()
eventpoll: fix ep_remove struct eventpoll / struct file UAF
eventpoll: move epi_fget() up
eventpoll: rename ep_remove_safe() back to ep_remove()
eventpoll: drop vestigial __ prefix from ep_remove_{file,epi}()
eventpoll: kill __ep_remove()
eventpoll: split __ep_remove()
eventpoll: use hlist_is_singular_node() in __ep_remove()
fs: Handle multiply claimed blocks more gracefully with mmb
nstree: fix func. parameter kernel-doc warnings
fs: aio: reject partial mremap to avoid Null-pointer-dereference error
fuse: reject oversized dirents in page cache
writeback: Fix use after free in inode_switch_wbs_work_fn()
fs: aio: set VMA_DONTCOPY_BIT in mmap to fix NULL-pointer-dereference error
Pull more smb server updates from Steve French:
- move fs/smb/common/smbdirect to fs/smb/smbdirect
- change signature calc to use AES-CMAC library, simpler and faster
- invalid signature fix
- multichannel fix
- open create options fix
- fix durable handle leak
- cap maximum lock count to avoid potential denial of service
- four connection fixes: connection free and session destroy IDA fixes,
refcount fix, connection leak fix, max_connections off by one fix
- IPC validation fix
- fix out of bounds write in getting xattrs
- fix use after free in durable handle reconnect
- three ACL fixes: fix potential ACL overflow, harden num_aces check,
and fix minimum ACE size check
* tag 'v7.1-rc-part2-ksmbd-fixes' of git://git.samba.org/ksmbd:
smb: smbdirect: move fs/smb/common/smbdirect/ to fs/smb/smbdirect/
smb: server: stop sending fake security descriptors
ksmbd: scope conn->binding slowpath to bound sessions only
ksmbd: fix CreateOptions sanitization clobbering the whole field
ksmbd: fix durable fd leak on ClientGUID mismatch in durable v2 open
ksmbd: fix O(N^2) DoS in smb2_lock via unbounded LockCount
ksmbd: destroy async_ida in ksmbd_conn_free()
ksmbd: destroy tree_conn_ida in ksmbd_session_destroy()
ksmbd: Use AES-CMAC library for SMB3 signature calculation
ksmbd: reset rcount per connection in ksmbd_conn_wait_idle_sess_id()
ksmbd: fix out-of-bounds write in smb2_get_ea() EA alignment
ksmbd: use check_add_overflow() to prevent u16 DACL size overflow
ksmbd: fix use-after-free in smb2_open during durable reconnect
ksmbd: validate num_aces and harden ACE walk in smb_inherit_dacl()
smb: server: fix max_connections off-by-one in tcp accept path
ksmbd: require minimum ACE size in smb_check_perm_dacl()
ksmbd: validate response sizes in ipc_validate_msg()
smb: server: fix active_num_conn leak on transport allocation failure
Pull smb client fixes from Steve French:
- Four bug fixes: OOB read in ioctl query info, 3 ACL fixes
- SMB1 Unix extensions mount fix
- Four crypto improvements: move to AES-CMAC library, simpler and faster
- Remove drop_dir_cache to avoid potential crash, and move to /procfs
- Seven SMB3.1.1 compression fixes
* tag 'v7.1-rc1-part3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: Drop 'allocate_crypto' arg from smb*_calc_signature()
smb: client: Make generate_key() return void
smb: client: Remove obsolete cmac(aes) allocation
smb: client: Use AES-CMAC library for SMB3 signature calculation
smb: common: add SMB3_COMPRESS_MAX_ALGS
smb: client: compress: add code docs to lz77.c
smb: client: compress: LZ77 optimizations
smb: client: compress: increase LZ77_MATCH_MAX_DIST
smb: client: compress: fix counting in LZ77 match finding
smb: client: compress: fix buffer overrun in lz77_compress()
smb: client: scope end_of_dacl to CIFS_DEBUG2 use in parse_dacl
smb: client: fix (remove) drop_dir_cache module parameter
smb: client: require a full NFS mode SID before reading mode bits
smb: client: validate the whole DACL before rewriting it in cifsacl
smb: client: fix OOB read in smb2_ioctl_query_info QUERY_INFO path
cifs: update internal module version number
smb: client: compress: fix bad encoding on last LZ77 flag
smb: client: fix dir separator in SMB1 UNIX mounts
Pull networking fixes from Jakub Kicinski:
"Including fixes from Netfilter.
Steady stream of fixes. Last two weeks feel comparable to the two
weeks before the merge window. Lots of AI-aided bug discovery. A newer
big source is Sashiko/Gemini (Roman Gushchin's system), which points
out issues in existing code during patch review (maybe 25% of fixes
here likely originating from Sashiko). Nice thing is these are often
fixed by the respective maintainers, not drive-bys.
Current release - new code bugs:
- kconfig: MDIO_PIC64HPSC should depend on ARCH_MICROCHIP
Previous releases - regressions:
- add async ndo_set_rx_mode and switch drivers which we promised to
be called under the per-netdev mutex to it
- dsa: remove duplicate netdev_lock_ops() for conduit ethtool ops
- hv_sock: report EOF instead of -EIO for FIN
- vsock/virtio: fix MSG_PEEK calculation on bytes to copy
Previous releases - always broken:
- ipv6: fix possible UAF in icmpv6_rcv()
- icmp: validate reply type before using icmp_pointers
- af_unix: drop all SCM attributes for SOCKMAP
- netfilter: fix a number of bugs in the osf (OS fingerprinting)
- eth: intel: fix timestamp interrupt configuration for E825C
Misc:
- bunch of data-race annotations"
* tag 'net-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (148 commits)
rxrpc: Fix error handling in rxgk_extract_token()
rxrpc: Fix re-decryption of RESPONSE packets
rxrpc: Fix rxrpc_input_call_event() to only unshare DATA packets
rxrpc: Fix missing validation of ticket length in non-XDR key preparsing
rxgk: Fix potential integer overflow in length check
rxrpc: Fix conn-level packet handling to unshare RESPONSE packets
rxrpc: Fix potential UAF after skb_unshare() failure
rxrpc: Fix rxkad crypto unalignment handling
rxrpc: Fix memory leaks in rxkad_verify_response()
net: rds: fix MR cleanup on copy error
m68k: mvme147: Make me the maintainer
net: txgbe: fix firmware version check
selftests/bpf: check epoll readiness during reuseport migration
tcp: call sk_data_ready() after listener migration
vhost_net: fix sleeping with preempt-disabled in vhost_net_busy_poll()
ipv6: Cap TLV scan in ip6_tnl_parse_tlv_enc_lim
tipc: fix double-free in tipc_buf_append()
llc: Return -EINPROGRESS from llc_ui_connect()
ipv4: icmp: validate reply type before using icmp_pointers
selftests/net: packetdrill: cover RFC 5961 5.2 challenge ACK on both edges
...
Pull more i2c updates from Wolfram Sang:
- cx92755: convert I2C bindings to DT schema
- mediatek: add optional bus power management during transfers
- pxa: handle early bus busy condition
- MAINTAINERS: update I2C RUST entry
* tag 'i2c-for-7.1-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
MAINTAINERS: add Rust I2C tree and update Igor Korotin's email
i2c: mediatek: add bus regulator control for power saving
dt-bindings: i2c: cnxt,cx92755-i2c: Convert to DT schema
i2c: pxa: handle 'Early Bus Busy' condition on Armada 3700
Pull Xtensa updates from Max Filippov:
- use register_sys_off_handler(SYS_OFF_MODE_RESTART) instead of
the deprecated register_restart_handler()
- drop custom ucontext.h and reuse asm-generic ucontext.h
* tag 'xtensa-20260422' of https://github.com/jcmvbkbc/linux-xtensa:
xtensa: uapi: Reuse asm-generic ucontext.h
xtensa: xtfpga: Use register_sys_off_handler(SYS_OFF_MODE_RESTART)
xtensa: xt2000: Use register_sys_off_handler(SYS_OFF_MODE_RESTART)
xtensa: ISS: Use register_sys_off_handler(SYS_OFF_MODE_RESTART)
Andrew Lunn says:
====================
Remove a number of ISA and PCMCIA Ethernet drivers
These old drivers have not been much of a Maintenance burden until
recently. Now there are more newbies using AI and fuzzers finding
issues, resulting in more work for Maintainers. Fixing these old
drivers make little sense, if it is not clear they have users.
These mostly ISA and PCMCIA Ethernet devices, mostly from the last
century, a couple from 2001 or 2002. It seems unlikely they are still
used. However, remove them one patch at a time so they can be brought
back if somebody still has the hardware, runs modern kernels and wants
to take up the roll of driver Maintainer.
====================
Link: https://patch.msgid.link/20260422-v7-0-0-net-next-driver-removal-v1-v2-0-08a5b59784d5@lunn.ch
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Christian Brauner <brauner@kernel.org> says:
ep_remove() (via __ep_remove_file()) cleared file->f_ep under
file->f_lock but then kept using @file in the same critical section:
is_file_epoll(), hlist_del_rcu() through the head, spin_unlock. A
concurrent __fput() on the watched eventpoll caught the transient
NULL in eventpoll_release()'s lockless fast path, skipped
eventpoll_release_file() entirely, and ran to ep_eventpoll_release()
-> ep_clear_and_put() -> ep_free(). That kfree()s the struct
eventpoll whose embedded ->refs hlist_head is exactly where
epi->fllink.pprev points and the subsequent hlist_del_rcu()'s
"*pprev = next" scribbles into freed kmalloc-192 memory, which is
the slab-use-after-free KASAN caught.
struct file is SLAB_TYPESAFE_BY_RCU on top of that so the same window
also lets the slot recycle while ep_remove() is still nominally
inside file->f_lock. The upshot is an attacker-influencable
kmem_cache_free() against the wrong slab cache.
The comment on eventpoll_release()'s fast path - "False positives
simply cannot happen because the file in on the way to be removed
and nobody ( but eventpoll ) has still a reference to this file" -
was itself the wrong invariant this race exploits.
The fix pins @file via epi_fget() at the top of ep_remove() and
gates the f_ep clear / hlist_del_rcu() on the pin succeeding. With
the pin held __fput() cannot start which transitively keeps the
watched struct eventpoll alive across the critical section and also
prevents the struct file slot from recycling. Both UAFs are closed.
If the pin fails __fput() is already in flight on @file. Because we
bail before clearing f_ep that path takes eventpoll_release()'s slow
path into eventpoll_release_file() which blocks on ep->mtx until
ep_clear_and_put() drops it and then cleans up the orphaned epi. The
bailed epi's share of ep->refcount stays intact so
ep_clear_and_put()'s trailing ep_refcount_dec_and_test() cannot free
the eventpoll out from under eventpoll_release_file().
With epi_fget() now gating every ep_remove() call the epi->dying
flag becomes vestigial. epi->dying == true always coincides with
file_ref_get() == false because __fput() is reachable only once the
refcount hits zero and the refcount is monotone there. The last
patch drops the flag and leaves a single coordination mechanism
instead of two.
* patches from https://patch.msgid.link/20260423-work-epoll-uaf-v1-0-2470f9eec0f5@kernel.org:
eventpoll: drop vestigial epi->dying flag
eventpoll: drop dead bool return from __ep_remove_epi()
eventpoll: refresh eventpoll_release() fast-path comment
eventpoll: move f_lock acquisition into __ep_remove_file()
eventpoll: fix ep_remove struct eventpoll / struct file UAF
eventpoll: move epi_fget() up
eventpoll: rename ep_remove_safe() back to ep_remove()
eventpoll: kill __ep_remove()
eventpoll: split __ep_remove()
eventpoll: use hlist_is_singular_node() in __ep_remove()
Link: https://patch.msgid.link/20260423-work-epoll-uaf-v1-0-2470f9eec0f5@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
With ep_remove() now pinning @file via epi_fget() across the
f_ep clear and hlist_del_rcu(), the dying flag no longer
orchestrates anything: it was set in eventpoll_release_file()
(which only runs from __fput(), i.e. after @file's refcount has
reached zero) and read in __ep_remove() / ep_remove() as a cheap
bail before attempting the same synchronization epi_fget() now
provides unconditionally.
The implication is simple: epi->dying == true always coincides
with file_ref_get(&file->f_ref) == false, because __fput() is
reachable only once the refcount hits zero and the refcount is
monotone in that state. The READ_ONCE(epi->dying) in ep_remove()
therefore selects exactly the same callers that epi_fget() would
reject, just one atomic cheaper. That's not worth a struct
field, a second coordination mechanism, and the comments on
both.
Refresh the eventpoll_release_file() comment to describe what
actually makes the path race-free now (the pin in ep_remove()).
No functional change: the correctness argument is unchanged,
only the mechanism is now a single one instead of two.
Link: https://patch.msgid.link/20260423-work-epoll-uaf-v1-10-2470f9eec0f5@kernel.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
ep_remove_epi() always returns true -- the "can be disposed"
answer was meaningful back when the dying-check lived inside the
pre-split __ep_remove(), but after that check moved to ep_remove()
the return value is just noise. Both callers gate on it
unconditionally:
if (ep_remove_epi(ep, epi))
WARN_ON_ONCE(ep_refcount_dec_and_test(ep));
dispose = ep_remove_epi(ep, epi);
...
if (dispose && ep_refcount_dec_and_test(ep))
ep_free(ep);
Make ep_remove_epi() return void, drop the dispose local in
eventpoll_release_file(), and the useless conditionals at both
callers. No functional change.
Link: https://patch.msgid.link/20260423-work-epoll-uaf-v1-9-2470f9eec0f5@kernel.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
The old comment justified the lockless READ_ONCE(file->f_ep) check
with "False positives simply cannot happen because the file is on
the way to be removed and nobody ( but eventpoll ) has still a
reference to this file." That reasoning was the root of the UAF
fixed in "eventpoll: fix ep_remove struct eventpoll / struct file
UAF": __ep_remove() could clear f_ep while another close raced
past the fast path and freed the watched eventpoll / recycled the
struct file slot.
With ep_remove() now pinning @file via epi_fget() across the f_ep
clear and hlist_del_rcu(), the invariant is re-established for the
right reason: anyone who might clear f_ep holds @file alive for
the duration, so a NULL observation really does mean no
concurrent eventpoll path has work left on this file. Refresh the
comment accordingly so the next reader doesn't inherit the broken
model.
Link: https://patch.msgid.link/20260423-work-epoll-uaf-v1-8-2470f9eec0f5@kernel.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
Let the helper own its critical section end-to-end: take &file->f_lock
at the top, read file->f_ep inside the lock, release on exit. Callers
(ep_remove() and eventpoll_release_file()) no longer need to wrap the
call, and the function-comment lock-handoff contract is gone.
Link: https://patch.msgid.link/20260423-work-epoll-uaf-v1-7-2470f9eec0f5@kernel.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
ep_remove() (via ep_remove_file()) cleared file->f_ep under
file->f_lock but then kept using @file inside the critical section
(is_file_epoll(), hlist_del_rcu() through the head, spin_unlock).
A concurrent __fput() taking the eventpoll_release() fastpath in
that window observed the transient NULL, skipped
eventpoll_release_file() and ran to f_op->release / file_free().
For the epoll-watches-epoll case, f_op->release is
ep_eventpoll_release() -> ep_clear_and_put() -> ep_free(), which
kfree()s the watched struct eventpoll. Its embedded ->refs
hlist_head is exactly where epi->fllink.pprev points, so the
subsequent hlist_del_rcu()'s "*pprev = next" scribbles into freed
kmalloc-192 memory.
In addition, struct file is SLAB_TYPESAFE_BY_RCU, so the slot
backing @file could be recycled by alloc_empty_file() --
reinitializing f_lock and f_ep -- while ep_remove() is still
nominally inside that lock. The upshot is an attacker-controllable
kmem_cache_free() against the wrong slab cache.
Pin @file via epi_fget() at the top of ep_remove() and gate the
critical section on the pin succeeding. With the pin held @file
cannot reach refcount zero, which holds __fput() off and
transitively keeps the watched struct eventpoll alive across the
hlist_del_rcu() and the f_lock use, closing both UAFs.
If the pin fails @file has already reached refcount zero and its
__fput() is in flight. Because we bailed before clearing f_ep,
that path takes the eventpoll_release() slow path into
eventpoll_release_file() and blocks on ep->mtx until the waiter
side's ep_clear_and_put() drops it. The bailed epi's share of
ep->refcount stays intact, so the trailing ep_refcount_dec_and_test()
in ep_clear_and_put() cannot free the eventpoll out from under
eventpoll_release_file(); the orphaned epi is then cleaned up
there.
A successful pin also proves we are not racing
eventpoll_release_file() on this epi, so drop the now-redundant
re-check of epi->dying under f_lock. The cheap lockless
READ_ONCE(epi->dying) fast-path bailout stays.
Fixes: 58c9b016e1 ("epoll: use refcount to reduce ep_mutex contention")
Reported-by: Jaeyoung Chung <jjy600901@snu.ac.kr>
Link: https://patch.msgid.link/20260423-work-epoll-uaf-v1-6-2470f9eec0f5@kernel.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
With __ep_remove() gone, the double-underscore on __ep_remove_file()
and __ep_remove_epi() no longer contrasts with a __-less parent and
just reads as noise. Rename both to ep_remove_file() and
ep_remove_epi(). No functional change.
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
When a metadata block is referenced by multiple inodes and tracked by
metadata bh infrastructure (which is forbidden and generally indicates
filesystem corruption), it can happen that mmb_mark_buffer_dirty() is
called for two different mmb structures in parallel. This can lead to a
corruption of mmb linked list. Handle that situation gracefully (at
least from mmb POV) by serializing on setting bh->b_mmb.
Reported-by: Ruikai Peng <ruikai@pwno.io>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20260423090311.10955-2-jack@suse.cz
Signed-off-by: Christian Brauner <brauner@kernel.org>
Use the correct parameter name ("__ns") for function parameter kernel-doc
to avoid 3 warnings:
Warning: include/linux/nstree.h:68 function parameter '__ns' not described in 'ns_tree_add_raw'
Warning: include/linux/nstree.h:77 function parameter '__ns' not described in 'ns_tree_add'
Warning: include/linux/nstree.h:88 function parameter '__ns' not described in 'ns_tree_remove'
Fixes: 885fc8ac0a ("nstree: make iterator generic")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://patch.msgid.link/20260416215429.948898-1-rdunlap@infradead.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
[BUG]
Recently, our internal syzkaller testing uncovered a null pointer
dereference issue:
BUG: kernel NULL pointer dereference, address: 0000000000000000
...
[ 51.111664] filemap_read_folio+0x25/0xe0
[ 51.112410] filemap_fault+0xad7/0x1250
[ 51.113112] __do_fault+0x4b/0x460
[ 51.113699] do_pte_missing+0x5bc/0x1db0
[ 51.114250] ? __pte_offset_map+0x23/0x170
[ 51.114822] __handle_mm_fault+0x9f8/0x1680
...
Crash analysis showed the file involved was an AIO ring file. The
phenomenon triggered is the same as the issue described in [1].
[CAUSE]
Consider the following scenario: userspace sets up an AIO context via
io_setup(), which creates a VMA covering the entire ring buffer. Then
userspace calls mremap() with the AIO ring address as the source, a smaller
old_len (less than the full ring size), MREMAP_MAYMOVE set, and without
MREMAP_DONTUNMAP. The kernel will relocate the requested portion to a new
destination address.
During this move, __split_vma() splits the original AIO ring VMA. The
requested portion is unmapped from the source and re-established at the
destination, while the remainder stays at the original source address as
an orphan VMA. The aio_ring_mremap() callback fires on the new destination
VMA, updating ctx->mmap_base to the destination address. But the callback
is unaware that only a partial region was moved and that an orphan VMA
still exists at the source:
source(AIO):
+-------------------+---------------------+
| moved to dest | orphan VMA (AIO) |
+-------------------+---------------------+
A A+partial_len A+ctx->mmap_size
dest:
+-------------------+
| moved VMA (AIO) |
+-------------------+
B B+partial_len
Later, io_destroy() calls vm_munmap(ctx->mmap_base, ctx->mmap_size), which
unmaps the destination. This not only fails to unmap the orphan VMA at the
source, but also overshoots the destination VMA and may unmap unrelated
mappings adjacent to it! After put_aio_ring_file() calls truncate_setsize()
to remove all pages from the pagecache, any subsequent access to the orphan
VMA triggers filemap_fault(), which calls a_ops->read_folio(). Since aio
does not implement read_folio, this results in a NULL pointer dereference.
[FIX]
Note that expanding mremap (new_len > old_len) is already rejected because
AIO ring VMAs are created with VM_DONTEXPAND. The only problematic case is
a partial move where "old_len == new_len" but both are smaller than the
full ring size.
Fix this by checking in aio_ring_mremap() that the new VMA covers the
entire ring. This ensures the AIO ring is always moved as a whole,
preventing orphan VMAs and the subsequent crash.
[1]: https://lore.kernel.org/all/20260413010814.548568-1-wozizhi@huawei.com/
Signed-off-by: Zizhi Wo <wozizhi@huaweicloud.com>
Link: https://patch.msgid.link/20260418060634.3713620-1-wozizhi@huaweicloud.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
fuse_add_dirent_to_cache() computes a serialized dirent size from the
server-controlled namelen field and copies the dirent into a single
page-cache page. The existing logic only checks whether the dirent fits
in the remaining space of the current page and advances to a fresh page
if not. It never checks whether the dirent itself exceeds PAGE_SIZE.
As a result, a malicious FUSE server can return a dirent with
namelen=4095, producing a serialized record size of 4120 bytes. On 4 KiB
page systems this causes memcpy() to overflow the cache page by 24 bytes
into the following kernel page.
Reject dirents that cannot fit in a single page before copying them into
the readdir cache.
Fixes: 69e3455115 ("fuse: allow caching readdir")
Cc: stable@vger.kernel.org # v6.16+
Assisted-by: Bynario AI
Signed-off-by: Samuel Page <sam@bynar.io>
Reported-by: Qi Tang <tpluszz77@gmail.com>
Reported-by: Zijun Hu <nightu@northwestern.edu>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Link: https://patch.msgid.link/20260420090139.662772-1-mszeredi@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
inode_switch_wbs_work_fn() has a loop like:
wb_get(new_wb);
while (1) {
list = llist_del_all(&new_wb->switch_wbs_ctxs);
/* Nothing to do? */
if (!list)
break;
... process the items ...
}
Now adding of items to the list looks like:
wb_queue_isw()
if (llist_add(&isw->list, &wb->switch_wbs_ctxs))
queue_work(isw_wq, &wb->switch_work);
Because inode_switch_wbs_work_fn() loops when processing isw items, it
can happen that wb->switch_work is pending while wb->switch_wbs_ctxs is
empty. This is a problem because in that case wb can get freed (no isw
items -> no wb reference) while the work is still pending causing
use-after-free issues.
We cannot just fix this by cancelling work when freeing wb because that
could still trigger problematic 0 -> 1 transitions on wb refcount due to
wb_get() in inode_switch_wbs_work_fn(). It could be all handled with
more careful code but that seems unnecessarily complex so let's avoid
that until it is proven that the looping actually brings practical
benefit. Just remove the loop from inode_switch_wbs_work_fn() instead.
That way when wb_queue_isw() queues work, we are guaranteed we have
added the first item to wb->switch_wbs_ctxs and nobody is going to
remove it (and drop the wb reference it holds) until the queued work
runs.
Fixes: e1b849cfa6 ("writeback: Avoid contention on wb->list_lock when switching inodes")
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20260413093618.17244-2-jack@suse.cz
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
[BUG]
Recently, our internal syzkaller testing uncovered a null pointer
dereference issue:
BUG: kernel NULL pointer dereference, address: 0000000000000000
...
[ 51.111664] filemap_read_folio+0x25/0xe0
[ 51.112410] filemap_fault+0xad7/0x1250
[ 51.113112] __do_fault+0x4b/0x460
[ 51.113699] do_pte_missing+0x5bc/0x1db0
[ 51.114250] ? __pte_offset_map+0x23/0x170
[ 51.114822] __handle_mm_fault+0x9f8/0x1680
[ 51.115408] handle_mm_fault+0x24c/0x570
[ 51.115958] do_user_addr_fault+0x226/0xa50
...
Crash analysis showed the file involved was an AIO ring file.
[CAUSE]
PARENT process CHILD process
t=0 io_setup(1, &ctx)
[access ctx addr]
fork()
io_destroy
vm_munmap // not affect child vma
percpu_ref_put
...
put_aio_ring_file
t=1 [access ctx addr] // pagefault
...
__do_fault
filemap_fault
max_idx = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE)
t=2 truncate_setsize
truncate_pagecache
t=3 filemap_get_folio // no folio, create folio
__filemap_get_folio(..., FGP_CREAT, ...) // page_not_uptodate
filemap_read_folio(file, mapping->a_ops->read_folio, folio) // oops!
At t=0, the parent process calls io_setup and then fork. The child process
gets its own VMA but without any PTEs. The parent then calls io_destroy.
Before i_size is truncated to 0, at t=1 the child process accesses this AIO
ctx address and triggers a pagefault. After the max_idx check passes, at
t=2 the parent calls truncate_setsize and truncate_pagecache. At t=3 the
child fails to obtain the folio, falls into the "page_not_uptodate" path,
and hits this problem because AIO does not implement "read_folio".
[Fix]
Fix this by marking the AIO ring buffer VMA with VM_DONTCOPY so
that fork()'s dup_mmap() skips it entirely. This is the correct
semantic because:
1) The child's ioctx_table is already reset to NULL by mm_init_aio() during
fork(), so the child has no AIO context and no way to perform any AIO
operations on this mapping.
2) The AIO ring VMA is only meaningful in conjunction with its associated
kioctx, which is never inherited across fork(). So child process with no
AIO context has no legitimate reason to access the ring buffer. Delivering
SIGSEGV on such an erroneous access is preferable to a kernel crash.
Signed-off-by: Zizhi Wo <wozizhi@huaweicloud.com>
Link: https://patch.msgid.link/20260413010814.548568-1-wozizhi@huawei.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Mingyu Wang says:
====================
net: packetengines: remove obsolete PCI drivers
As discussed with Andrew Lunn, this patch series removes the obsolete
hamachi and yellowfin PCI drivers. Both drivers support hardware that
is over two decades old and no longer in active use.
Removing them eliminates dead code and reduces the overall maintenance
burden on the netdev subsystem.
====================
Jakub: trim defconfigs appropriately
Link: https://patch.msgid.link/20260422044820.485660-1-25181214217@stu.xidian.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Similar to the hamachi driver, the yellowfin driver supports hardware
that is over two decades old and no longer in active use.
Since yellowfin was the last remaining driver in the packetengines
vendor directory, we can now safely remove the entire directory and
drop its associated references from the parent Kconfig and Makefile.
This eliminates dead code and reduces the overall maintenance burden
on the netdev subsystem.
Signed-off-by: Mingyu Wang <25181214217@stu.xidian.edu.cn>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260422044820.485660-3-25181214217@stu.xidian.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>