linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-17 16:36:45 -04:00

Author	SHA1	Message	Date
Linus Torvalds	f4d0ec0aa2	Merge tag 'erofs-for-7.0-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fixes from Gao Xiang: - Do not share the page cache if the real @aops differs - Fix the incomplete condition for interlaced plain extents - Get rid of more unnecessary #ifdefs * tag 'erofs-for-7.0-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: fix interlaced plain identification for encoded extents erofs: remove more unnecessary #ifdefs erofs: allow sharing page cache with the same aops only	2026-02-25 16:39:25 -08:00
Linus Torvalds	0e335a7745	Merge tag 'vfs-7.0-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Fix an uninitialized variable in file_getattr(). The flags_valid field wasn't initialized before calling vfs_fileattr_get(), triggering KMSAN uninit-value reports in fuse - Fix writeback wakeup and logging timeouts when DETECT_HUNG_TASK is not enabled. sysctl_hung_task_timeout_secs is 0 in that case causing spurious "waiting for writeback completion for more than 1 seconds" warnings - Fix a null-ptr-deref in do_statmount() when the mount is internal - Add missing kernel-doc description for the @private parameter in iomap_readahead() - Fix mount namespace creation to hold namespace_sem across the mount copy in create_new_namespace(). The previous drop-and-reacquire pattern was fragile and failed to clean up mount propagation links if the real rootfs was a shared or dependent mount - Fix /proc mount iteration where m->index wasn't updated when m->show() overflows, causing a restart to repeatedly show the same mount entry in a rapidly expanding mount table - Return EFSCORRUPTED instead of ENOSPC in minix_new_inode() when the inode number is out of range - Fix unshare(2) when CLONE_NEWNS is set and current->fs isn't shared. copy_mnt_ns() received the live fs_struct so if a subsequent namespace creation failed the rollback would leave pwd and root pointing to detached mounts. Always allocate a new fs_struct when CLONE_NEWNS is requested - fserror bug fixes: - Remove the unused fsnotify_sb_error() helper now that all callers have been converted to fserror_report_metadata - Fix a lockdep splat in fserror_report() where igrab() takes inode::i_lock which can be held in IRQ context. Replace igrab() with a direct i_count bump since filesystems should not report inodes that are about to be freed or not yet exposed - Handle error pointer in procfs for try_lookup_noperm() - Fix an integer overflow in ep_loop_check_proc() where recursive calls returning INT_MAX would overflow when +1 is added, breaking the recursion depth check - Fix a misleading break in pidfs * tag 'vfs-7.0-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: pidfs: avoid misleading break eventpoll: Fix integer overflow in ep_loop_check_proc() proc: Fix pointer error dereference fserror: fix lockdep complaint when igrabbing inode fsnotify: drop unused helper unshare: fix unshare_fs() handling minix: Correct errno in minix_new_inode namespace: fix proc mount iteration mount: hold namespace_sem across copy in create_new_namespace() iomap: Describe @private in iomap_readahead() statmount: Fix the null-ptr-deref in do_statmount() writeback: Fix wakeup and logging timeouts for !DETECT_HUNG_TASK fs: init flags_valid before calling vfs_fileattr_get	2026-02-25 10:34:23 -08:00
Gao Xiang	4a2d046e4b	erofs: fix interlaced plain identification for encoded extents Only plain data whose start position and on-disk physical length are both aligned to the block size should be classified as interlaced plain extents. Otherwise, it must be treated as shifted plain extents. This issue was found by syzbot using a crafted compressed image containing plain extents with unaligned physical lengths, which can cause OOB read in z_erofs_transform_plain(). Reported-and-tested-by: syzbot+d988dc155e740d76a331@syzkaller.appspotmail.com Closes: https://lore.kernel.org/r/699d5714.050a0220.cdd3c.03e7.GAE@google.com Fixes: `1d191b4ca5` ("erofs: implement encoded extent metadata") Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2026-02-25 17:40:58 +08:00
Christian Brauner	4a1ddb0f1c	pidfs: avoid misleading break The break would only break out of the scoped_guard() loop, not the switch statement. It still works correct as is ofc but let's avoid the confusion. Reported-by: David Lechner <dlechner@baylibre.com> Link:: https://lore.kernel.org/cd2153f1-098b-463c-bbc1-5c6ca9ef1f12@baylibre.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-02-24 12:09:00 +01:00
Ferry Meng	bf4fde7db4	erofs: remove more unnecessary #ifdefs Many #ifdefs can be replaced with IS_ENABLED() to improve code readability. No functional changes. Signed-off-by: Ferry Meng <mengferry@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2026-02-24 18:36:52 +08:00
Jann Horn	fdcfce9307	eventpoll: Fix integer overflow in ep_loop_check_proc() If a recursive call to ep_loop_check_proc() hits the `result = INT_MAX`, an integer overflow will occur in the calling ep_loop_check_proc() at `result = max(result, ep_loop_check_proc(ep_tovisit, depth + 1) + 1)`, breaking the recursion depth check. Fix it by using a different placeholder value that can't lead to an overflow. Reported-by: Guenter Roeck <linux@roeck-us.net> Fixes: `f2e467a482` ("eventpoll: Fix semi-unbounded recursion") Cc: stable@vger.kernel.org Signed-off-by: Jann Horn <jannh@google.com> Link: https://patch.msgid.link/20260223-epoll-int-overflow-v1-1-452f35132224@google.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-02-24 10:21:30 +01:00
Hongbo Li	03c0d030f5	erofs: allow sharing page cache with the same aops only Inode with identical data but different @aops cannot be mixed because the page cache is managed by different subsystems (e.g., @aops for compressed on-disk inodes cannot handle plain on-disk inodes). In this patch, we never allow inodes to share the page cache among plain, compressed, and fileio cases. When a shared inode is created, we initialize @aops that is the same as the initial real inode, and subsequent inodes cannot share the page cache if the inferred @aops differ from the corresponding shared inode. This is reasonable as a first step because, in typical use cases, if an inode is compressible, it will fall into compressed inodes across different filesystem images unless users use plain filesystems. However, in that cases, users will use plain filesystems all the time. Fixes: `5ef3208e3b` ("erofs: introduce the page cache share feature") Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2026-02-23 18:04:18 +08:00
Linus Torvalds	fbf3380361	Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux Pull fsverity fixes from Eric Biggers: - Fix a build error on parisc - Remove the non-large-folio-aware function fsverity_verify_page() * tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux: fsverity: fix build error by adding fsverity_readahead() stub fsverity: remove fsverity_verify_page() f2fs: make f2fs_verify_cluster() partially large-folio-aware f2fs: remove unnecessary ClearPageUptodate in f2fs_verify_cluster()	2026-02-22 13:12:04 -08:00
Kees Cook	189f164e57	Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL uses Conversion performed via this Coccinelle script: // SPDX-License-Identifier: GPL-2.0-only // Options: --include-headers-for-types --all-includes --include-headers --keep-comments virtual patch @gfp depends on patch && !(file in "tools") && !(file in "samples")@ identifier ALLOC = {kmalloc_obj,kmalloc_objs,kmalloc_flex, kzalloc_obj,kzalloc_objs,kzalloc_flex, kvmalloc_obj,kvmalloc_objs,kvmalloc_flex, kvzalloc_obj,kvzalloc_objs,kvzalloc_flex}; @@ ALLOC(... - , GFP_KERNEL ) $ make coccicheck MODE=patch COCCI=gfp.cocci Build and boot tested x86_64 with Fedora 42's GCC and Clang: Linux version 6.19.0+ (user@host) (gcc (GCC) 15.2.1 20260123 (Red Hat 15.2.1-7), GNU ld version 2.44-12.fc42) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Linux version 6.19.0+ (user@host) (clang version 20.1.8 (Fedora 20.1.8-4.fc42), LLD 20.1.8) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-22 08:26:33 -08:00
Linus Torvalds	32a92f8c89	Convert more 'alloc_obj' cases to default GFP_KERNEL arguments This converts some of the visually simpler cases that have been split over multiple lines. I only did the ones that are easy to verify the resulting diff by having just that final GFP_KERNEL argument on the next line. Somebody should probably do a proper coccinelle script for this, but for me the trivial script actually resulted in an assertion failure in the middle of the script. I probably had made it a bit _too_ trivial. So after fighting that far a while I decided to just do some of the syntactically simpler cases with variations of the previous 'sed' scripts. The more syntactically complex multi-line cases would mostly really want whitespace cleanup anyway. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 20:03:00 -08:00
Linus Torvalds	323bbfcf1e	Convert 'alloc_flex' family to use the new default GFP_KERNEL argument This is the exact same thing as the 'alloc_obj()' version, only much smaller because there are a lot fewer users of the alloc_flex() interface. As with alloc_obj() version, this was done entirely with mindless brute force, using the same script, except using 'flex' in the pattern rather than 'objs'. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 17:09:51 -08:00
Linus Torvalds	bf4afc53b7	Convert 'alloc_obj' family to use the new default GFP_KERNEL argument This was done entirely with mindless brute force, using git grep -l '\<k[vmz]alloc_objs(., GFP_KERNEL)' \| xargs sed -i 's/$alloc_objs(.*$, GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 17:09:51 -08:00
Linus Torvalds	8934827db5	Merge tag 'kmalloc_obj-treewide-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull kmalloc_obj conversion from Kees Cook: "This does the tree-wide conversion to kmalloc_obj() and friends using coccinelle, with a subsequent small manual cleanup of whitespace alignment that coccinelle does not handle. This uncovered a clang bug in __builtin_counted_by_ref(), so the conversion is preceded by disabling that for current versions of clang. The imminent clang 22.1 release has the fix. I've done allmodconfig build tests for x86_64, arm64, i386, and arm. I did defconfig builds for alpha, m68k, mips, parisc, powerpc, riscv, s390, sparc, sh, arc, csky, xtensa, hexagon, and openrisc" * tag 'kmalloc_obj-treewide-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: kmalloc_obj: Clean up after treewide replacements treewide: Replace kmalloc with kmalloc_obj for non-scalar types compiler_types: Disable __builtin_counted_by_ref for Clang	2026-02-21 11:02:58 -08:00
Linus Torvalds	f9d66e64a2	Merge tag 'io_uring-20260221' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull io_uring fixes from Jens Axboe: - A fix for a missing URING_CMD128 opcode check, fixing an issue with the SQE mixed mode support introduced in 6.19. Merged late due to having multiple dependencies - Add sqe->cmd size checking for big SQEs, similar to what we have for normal sized SQEs - Fix a race condition in zcrx, that leads to a double free * tag 'io_uring-20260221' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: io_uring: Add size check for sqe->cmd io_uring: add IORING_OP_URING_CMD128 to opcode checks io_uring/zcrx: fix user_ref race between scrub and refill paths	2026-02-21 10:05:49 -08:00
Linus Torvalds	8eb604d4ee	Merge tag 'v7.0-rc-part2-ksmbd-server-fixes' of git://git.samba.org/ksmbd Pull smb server fixes from Steve French: "Two small fixes: - fix potential deadlock - minor cleanup" * tag 'v7.0-rc-part2-ksmbd-server-fixes' of git://git.samba.org/ksmbd: ksmbd: call ksmbd_vfs_kern_path_end_removing() on some error paths smb: server: Remove duplicate include of misc.h	2026-02-21 09:11:32 -08:00
Kees Cook	69050f8d6d	treewide: Replace kmalloc with kmalloc_obj for non-scalar types This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(PTR, FAM, COUNT, ...) (where TYPE may also be VAR) The resulting allocations no longer return "void ", instead returning "TYPE ". Signed-off-by: Kees Cook <kees@kernel.org>	2026-02-21 01:02:28 -08:00
Linus Torvalds	b3f1da2a4d	Merge tag 'for-7.0-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - multiple error handling fixes of unexpected conditions - reset block group size class once it becomes empty so that its class can be changed - error message level adjustments - fixes of returned error values - use correct block reserve for delayed refs * tag 'for-7.0-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix invalid leaf access in btrfs_quota_enable() if ref key not found btrfs: fix lost error return in btrfs_find_orphan_roots() btrfs: fix lost return value on error in finish_verity() btrfs: change unaligned root messages to error level in btrfs_validate_super() btrfs: use the correct type to initialize block reserve for delayed refs btrfs: do not ASSERT() when the fs flips RO inside btrfs_repair_io_failure() btrfs: reset block group size class when it becomes empty btrfs: replace BUG() with error handling in __btrfs_balance() btrfs: handle unexpected exact match in btrfs_set_inode_index_count()	2026-02-20 14:57:09 -08:00
Linus Torvalds	233a0c0f44	Merge tag 'ecryptfs-7.0-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs Pull ecryptfs updates from Tyler Hicks: "This consists of some really minor typo fixes that fell through the cracks and some more recent code cleanups: - Comment typo fixes - Removal of an unused function declaration - Use strscpy() instead of the deprecated strcpy() - Use string copying helpers instead of memcpy() and manually terminating strings" * tag 'ecryptfs-7.0-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs: ecryptfs: Replace memcpy + NUL termination in ecryptfs_copy_filename ecryptfs: Drop redundant NUL terminations after calling ecryptfs_to_hex ecryptfs: Replace memcpy + NUL termination in ecryptfs_new_file_context ecryptfs: Replace strcpy with strscpy in ecryptfs_validate_options ecryptfs: Replace strcpy with strscpy in ecryptfs_cipher_code_to_string ecryptfs: Replace strcpy with strscpy in ecryptfs_set_default_crypt_stat_vals ecryptfs: simplify list initialization in ecryptfs_parse_packet_set() ecryptfs: Remove unused declartion ecryptfs_fill_zeros() ecryptfs: Fix packet format comment in parse_tag_67_packet() ecryptfs: comment typo fix ecryptfs: keystore: Fix typo 'the the' in comment	2026-02-20 14:46:31 -08:00
Ethan Tidmore	f6a495484a	proc: Fix pointer error dereference The function try_lookup_noperm() can return an error pointer. Add check for error pointer. Detected by Smatch: fs/proc/base.c:2148 proc_fill_cache() error: 'child' dereferencing possible ERR_PTR() Fixes: `1df98b8bbc` ("proc_fill_cache(): clean up, get rid of pointless find_inode_number() use") Signed-off-by: Ethan Tidmore <ethantidmore06@gmail.com> Link: https://patch.msgid.link/20260219221001.1117135-1-ethantidmore06@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-02-20 11:21:16 +01:00
Govindarajulu Varadarajan	ea129e55c9	io_uring: Add size check for sqe->cmd For SQE128, sqe->cmd provides 80 bytes for uring_cmd. Add macro to check if size of user struct does not exceed 80 bytes at compile time. User doesn't have to track this manually during development. Replace io_uring_sqe_cmd() inline func with macro and add io_uring_sqe128_cmd() which checks struct size for 16 bytes cmd and 80 bytes cmd respectively. Signed-off-by: Govindarajulu Varadarajan <govind.varadar@gmail.com> Reviewed-by: Caleb Sander Mateos <csander@purestorage.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-19 07:26:26 -07:00
Darrick J. Wong	294f54f849	fserror: fix lockdep complaint when igrabbing inode Christoph Hellwig reported a lockdep splat in generic/108: ================================ WARNING: inconsistent lock state 6.19.0+ #4827 Tainted: G N -------------------------------- inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage. swapper/1/0 [HC1[1]:SC0[0]:HE0:SE1] takes: ffff88811ed1b140 (&sb->s_type->i_lock_key#33){?.+.}-{3:3}, at: igrab+0x1a/0xb0 {HARDIRQ-ON-W} state was registered at: lock_acquire+0xca/0x2c0 _raw_spin_lock+0x2e/0x40 unlock_new_inode+0x2c/0xc0 xfs_iget+0xcf4/0x1080 xfs_trans_metafile_iget+0x3d/0x100 xfs_metafile_iget+0x2b/0x50 xfs_mount_setup_metadir+0x20/0x60 xfs_mountfs+0x457/0xa60 xfs_fs_fill_super+0x6b3/0xa90 get_tree_bdev_flags+0x13c/0x1e0 vfs_get_tree+0x27/0xe0 vfs_cmd_create+0x54/0xe0 __do_sys_fsconfig+0x309/0x620 do_syscall_64+0x8b/0xf80 entry_SYSCALL_64_after_hwframe+0x76/0x7e irq event stamp: 139080 hardirqs last enabled at (139079): [<ffffffff813a923c>] do_idle+0x1ec/0x270 hardirqs last disabled at (139080): [<ffffffff828a8d09>] common_interrupt+0x19/0xe0 softirqs last enabled at (139032): [<ffffffff8134a853>] __irq_exit_rcu+0xc3/0x120 softirqs last disabled at (139025): [<ffffffff8134a853>] __irq_exit_rcu+0xc3/0x120 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&sb->s_type->i_lock_key#33); <Interrupt> lock(&sb->s_type->i_lock_key#33); * DEADLOCK * 1 lock held by swapper/1/0: #0: ffff8881052c81a0 (&vblk->vqs[i].lock){-.-.}-{3:3}, at: virtblk_done+0x4b/0x110 stack backtrace: CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Tainted: G N 6.19.0+ #4827 PREEMPT(full) Tainted: [N]=TEST Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014 Call Trace: <IRQ> dump_stack_lvl+0x5b/0x80 print_usage_bug.part.0+0x22c/0x2c0 mark_lock+0xa6f/0xe90 __lock_acquire+0x10b6/0x25e0 lock_acquire+0xca/0x2c0 _raw_spin_lock+0x2e/0x40 igrab+0x1a/0xb0 fserror_report+0x135/0x260 iomap_finish_ioend_buffered+0x170/0x210 clone_endio+0x8f/0x1c0 blk_update_request+0x1e4/0x4d0 blk_mq_end_request+0x1b/0x100 virtblk_done+0x6f/0x110 vring_interrupt+0x59/0x80 __handle_irq_event_percpu+0x8a/0x2e0 handle_irq_event+0x33/0x70 handle_edge_irq+0xdd/0x1e0 __common_interrupt+0x6f/0x180 common_interrupt+0xb7/0xe0 </IRQ> It looks like the concern here is that inode::i_lock is sometimes taken in IRQ context, and sometimes it is held when going to IRQ context, though it's a little difficult to tell since I think this is a kernel from after the actual 6.19 release but before 7.0-rc1. Either way, we don't need to take i_lock, because filesystems should not report files to fserror if they're about to be freed or have not yet been exposed to other threads, because the resulting fsnotify report will be meaningless. Therefore, bump inode::i_count directly and clarify the preconditions on the inode being passed in. Link: https://lore.kernel.org/linux-fsdevel/aY7BndIgQg3ci_6s@infradead.org/ Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Link: https://patch.msgid.link/177148129564.716249.3069780698231701540.stgit@frogsfrogsfrogs Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-02-19 09:12:08 +01:00
Linus Torvalds	eeccf287a2	Merge tag 'mm-stable-2026-02-18-19-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull more MM updates from Andrew Morton: - "mm/vmscan: fix demotion targets checks in reclaim/demotion" fixes a couple of issues in the demotion code - pages were failed demotion and were finding themselves demoted into disallowed nodes (Bing Jiao) - "Remove XA_ZERO from error recovery of dup_mmap()" fixes a rare mapledtree race and performs a number of cleanups (Liam Howlett) - "mm: add bitmap VMA flag helpers and convert all mmap_prepare to use them" implements a lot of cleanups following on from the conversion of the VMA flags into a bitmap (Lorenzo Stoakes) - "support batch checking of references and unmapping for large folios" implements batching to greatly improve the performance of reclaiming clean file-backed large folios (Baolin Wang) - "selftests/mm: add memory failure selftests" does as claimed (Miaohe Lin) * tag 'mm-stable-2026-02-18-19-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (36 commits) mm/page_alloc: clear page->private in free_pages_prepare() selftests/mm: add memory failure dirty pagecache test selftests/mm: add memory failure clean pagecache test selftests/mm: add memory failure anonymous page test mm: rmap: support batched unmapping for file large folios arm64: mm: implement the architecture-specific clear_flush_young_ptes() arm64: mm: support batch clearing of the young flag for large folios arm64: mm: factor out the address and ptep alignment into a new helper mm: rmap: support batched checks of the references for large folios tools/testing/vma: add VMA userland tests for VMA flag functions tools/testing/vma: separate out vma_internal.h into logical headers tools/testing/vma: separate VMA userland tests into separate files mm: make vm_area_desc utilise vma_flags_t only mm: update all remaining mmap_prepare users to use vma_flags_t mm: update shmem_[kernel]_file_*() functions to use vma_flags_t mm: update secretmem to use VMA flags on mmap_prepare mm: update hugetlbfs to use VMA flags on mmap_prepare mm: add basic VMA flag operation helper functions tools: bitmap: add missing bitmap_[subset(), andnot()] mm: add mk_vma_flags() bitmap flag macro helper ...	2026-02-18 20:50:32 -08:00
Linus Torvalds	23b0f90ba8	Merge tag 'sysctl-7.00-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl Pull sysctl updates from Joel Granados: - Remove macros from proc handler converters Replace the proc converter macros with "regular" functions. Though it is more verbose than the macro version, it helps when debugging and better aligns with coding-style.rst. - General cleanup Remove superfluous ctl_table forward declarations. Const qualify the memory_allocation_profiling_sysctl and loadpin_sysctl_table arrays. Add missing kernel doc to proc_dointvec_conv. - Testing This series was run through sysctl selftests/kunit test suite in x86_64. And went into linux-next after rc4, giving it a good 3 weeks of testing * tag 'sysctl-7.00-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl: sysctl: replace SYSCTL_INT_CONV_CUSTOM macro with functions sysctl: Replace unidirectional INT converter macros with functions sysctl: Add kernel doc to proc_douintvec_conv sysctl: Replace UINT converter macros with functions sysctl: Add CONFIG_PROC_SYSCTL guards for converter macros sysctl: clarify proc_douintvec_minmax doc sysctl: Return -ENOSYS from proc_douintvec_conv when CONFIG_PROC_SYSCTL=n sysctl: Remove unused ctl_table forward declarations loadpin: Implement custom proc_handler for enforce alloc_tag: move memory_allocation_profiling_sysctls into .rodata sysctl: Add missing kernel-doc for proc_dointvec_conv	2026-02-18 10:45:36 -08:00
Filipe Manana	ecb7c2484c	btrfs: fix invalid leaf access in btrfs_quota_enable() if ref key not found If btrfs_search_slot_for_read() returns 1, it means we did not find any key greater than or equals to the key we asked for, meaning we have reached the end of the tree and therefore the path is not valid. If this happens we need to break out of the loop and stop, instead of continuing and accessing an invalid path. Fixes: `5223cc60b4` ("btrfs: drop the path before adding qgroup items when enabling qgroups") Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2026-02-18 15:25:54 +01:00
Filipe Manana	7b54e08f2e	btrfs: fix lost error return in btrfs_find_orphan_roots() If the call to btrfs_get_fs_root() returns an error different from -ENOENT we break out of the loop and then return 0, losing the error. Fix this by returning the error instead of breaking from the loop. Reported-by: Chris Mason <clm@meta.com> Link: https://lore.kernel.org/linux-btrfs/20260208185321.1128472-1-clm@meta.com/ Fixes: `8670a25ecb` ("btrfs: use single return variable in btrfs_find_orphan_roots()") Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2026-02-18 15:25:54 +01:00
Filipe Manana	29e525665a	btrfs: fix lost return value on error in finish_verity() If btrfs_update_inode() or del_orphan() fail, we jump to the 'end_trans' label and then return 0 instead of the error returned by one of those calls. Fix this and return the error. Fixes: `61fb7f04ee` ("btrfs: remove out label in finish_verity()") Reported-by: Chris Mason <clm@meta.com> Link: https://lore.kernel.org/linux-btrfs/20260208161129.3888234-1-clm@meta.com/ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2026-02-18 15:25:54 +01:00
Filipe Manana	f46a283bbc	btrfs: change unaligned root messages to error level in btrfs_validate_super() If the root nodes for the chunk root, tree root or log root are not sector size aligned, we are logging a warning message but these are in fact errors that makes the super block validation fail. So change the level of the messages from warning to error. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2026-02-18 15:25:53 +01:00
Filipe Manana	2155d0c0a7	btrfs: use the correct type to initialize block reserve for delayed refs When initializing the delayed refs block reserve for a transaction handle we are passing a type of BTRFS_BLOCK_RSV_DELOPS, which is meant for delayed items and not for delayed refs. The correct type for delayed refs is BTRFS_BLOCK_RSV_DELREFS. On release of any excess space reserved in a local delayed refs reserve, we also should transfer that excess space to the global block reserve (it it's full, we return to the space info for general availability). By initializing a transaction's local delayed refs block reserve with a type of BTRFS_BLOCK_RSV_DELOPS, we were also causing any excess space released from the delayed block reserve (fs_info->delayed_block_rsv, used for delayed inodes and items) to be transferred to the global block reserve instead of the global delayed refs block reserve. This was an unintentional change in commit `28270e25c6` ("btrfs: always reserve space for delayed refs when starting transaction"), but it's not particularly serious as things tend to cancel out each other most of the time and it's relatively rare to be anywhere near exhaustion of the global reserve. Fix this by initializing a transaction's local delayed refs reserve with a type of BTRFS_BLOCK_RSV_DELREFS and making btrfs_block_rsv_release() attempt to transfer unused space from such a reserve into the global block reserve, just as we did before that commit for when the block reserve is a delayed refs rsv. Reported-by: Alex Lyakas <alex.lyakas@zadara.com> Link: https://lore.kernel.org/linux-btrfs/CAOcd+r0FHG5LWzTSu=LknwSoqxfw+C00gFAW7fuX71+Z5AfEew@mail.gmail.com/ Fixes: `28270e25c6` ("btrfs: always reserve space for delayed refs when starting transaction") Reviewed-by: Alex Lyakas <alex.lyakas@zadara.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2026-02-18 15:25:53 +01:00
Qu Wenruo	8ceaad6cd6	btrfs: do not ASSERT() when the fs flips RO inside btrfs_repair_io_failure() [BUG] There is a bug report that when btrfs hits ENOSPC error in a critical path, btrfs flips RO (this part is expected, although the ENOSPC bug still needs to be addressed). The problem is after the RO flip, if there is a read repair pending, we can hit the ASSERT() inside btrfs_repair_io_failure() like the following: BTRFS info (device vdc): relocating block group 30408704 flags metadata\|raid1 ------------[ cut here ]------------ BTRFS: Transaction aborted (error -28) WARNING: fs/btrfs/extent-tree.c:3235 at __btrfs_free_extent.isra.0+0x453/0xfd0, CPU#1: btrfs/383844 Modules linked in: kvm_intel kvm irqbypass [...] ---[ end trace 0000000000000000 ]--- BTRFS info (device vdc state EA): 2 enospc errors during balance BTRFS info (device vdc state EA): balance: ended with status: -30 BTRFS error (device vdc state EA): parent transid verify failed on logical 30556160 mirror 2 wanted 8 found 6 BTRFS error (device vdc state EA): bdev /dev/nvme0n1 errs: wr 0, rd 0, flush 0, corrupt 10, gen 0 [...] assertion failed: !(fs_info->sb->s_flags & SB_RDONLY) :: 0, in fs/btrfs/bio.c:938 ------------[ cut here ]------------ assertion failed: !(fs_info->sb->s_flags & SB_RDONLY) :: 0, in fs/btrfs/bio.c:938 kernel BUG at fs/btrfs/bio.c:938! Oops: invalid opcode: 0000 [#1] SMP NOPTI CPU: 0 UID: 0 PID: 868 Comm: kworker/u8:13 Tainted: G W N 6.19.0-rc6+ #4788 PREEMPT(full) Tainted: [W]=WARN, [N]=TEST Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014 Workqueue: btrfs-endio simple_end_io_work RIP: 0010:btrfs_repair_io_failure.cold+0xb2/0x120 RSP: 0000:ffffc90001d2bcf0 EFLAGS: 00010246 RAX: 0000000000000051 RBX: 0000000000001000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff8305cf42 RDI: 00000000ffffffff RBP: 0000000000000002 R08: 00000000fffeffff R09: ffffffff837fa988 R10: ffffffff8327a9e0 R11: 6f69747265737361 R12: ffff88813018d310 R13: ffff888168b8a000 R14: ffffc90001d2bd90 R15: ffff88810a169000 FS: 0000000000000000(0000) GS:ffff8885e752c000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 ------------[ cut here ]------------ [CAUSE] The cause of -ENOSPC error during the test case btrfs/124 is still unknown, although it's known that we still have cases where metadata can be over-committed but can not be fulfilled correctly, thus if we hit such ENOSPC error inside a critical path, we have no choice but abort the current transaction. This will mark the fs read-only. The problem is inside the btrfs_repair_io_failure() path that we require the fs not to be mount read-only. This is normally fine, but if we are doing a read-repair meanwhile the fs flips RO due to a critical error, we can enter btrfs_repair_io_failure() with super block set to read-only, thus triggering the above crash. [FIX] Just replace the ASSERT() with a proper return if the fs is already read-only. Reported-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/linux-btrfs/20260126045555.GB31641@lst.de/ Tested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2026-02-18 15:25:53 +01:00
Jiasheng Jiang	5870ec7c8f	btrfs: reset block group size class when it becomes empty Block group size classes are managed consistently everywhere. Currently, btrfs_use_block_group_size_class() sets a block group's size class to specialize it for a specific allocation size. However, this size class remains "stale" even if the block group becomes completely empty (both used and reserved bytes reach zero). This happens in two scenarios: 1. When space reservations are freed (e.g., due to errors or transaction aborts) via btrfs_free_reserved_bytes(). 2. When the last extent in a block group is freed via btrfs_update_block_group(). While size classes are advisory, a stale size class can cause find_free_extent to unnecessarily skip candidate block groups during initial search loops. This undermines the purpose of size classes to reduce fragmentation by keeping block groups restricted to a specific size class when they could be reused for any size. Fix this by resetting the size class to BTRFS_BG_SZ_NONE whenever a block group's used and reserved counts both reach zero. This ensures that empty block groups are fully available for any allocation size in the next cycle. Fixes: `52bb7a2166` ("btrfs: introduce size class to block group allocator") Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Jiasheng Jiang <jiashengjiangcool@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2026-02-18 15:25:53 +01:00
Adarsh Das	be6324a809	btrfs: replace BUG() with error handling in __btrfs_balance() We search with offset (u64)-1 which should never match exactly. Previously this was handled with BUG(). Now logs an error and return -EUCLEAN. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Adarsh Das <adarshdas950@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2026-02-18 15:25:53 +01:00
Adarsh Das	1c88823a19	btrfs: handle unexpected exact match in btrfs_set_inode_index_count() We search with offset (u64)-1 which should never match exactly. Previously the code silently returned success without setting the index count. Now logs an error and return -EUCLEAN instead. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Adarsh Das <adarshdas950@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com>, Signed-off-by: David Sterba <dsterba@suse.com>	2026-02-18 15:25:53 +01:00
Jori Koolstra	ef0b64741a	minix: Correct errno in minix_new_inode The cases (!j \|\| j > sbi->s_ninodes) can never occur unless the filesystem is broken, so this should not return ENOSPC, but EFSCORRUPTED. Signed-off-by: Jori Koolstra <jkoolstra@xs4all.nl> Link: https://patch.msgid.link/20251201122338.90568-1-jkoolstra@xs4all.nl Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-02-18 14:04:42 +01:00
Christian Brauner	4a403d7aa9	namespace: fix proc mount iteration The m->index isn't updated when m->show() overflows and retains its value before the current mount causing a restart to start at the same value. If that happens in short order to due a quickly expanding mount table this would cause the same mount to be shown again and again. Ensure that pos always equals the mount id of the mount that was returned by start/next. On restart after overflow mnt_find_id_at(pos) finds the exact mount. This should avoid duplicates, avoid skips and should handle concurrent modification just fine. Cc: <stable@vger.kernel.org> Fixed: `2eea9ce431` ("mounts: keep list of mounts in an rbtree") Link: https://patch.msgid.link/20260129-geleckt-treuhand-4bb940acacd9@brauner Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-02-18 14:04:42 +01:00
Christian Brauner	a41dbf5e00	mount: hold namespace_sem across copy in create_new_namespace() Fix an oversight when creating a new mount namespace. If someone had the bright idea to make the real rootfs a shared or dependent mount and it is later copied the copy will become a peer of the old real rootfs mount or a dependent mount of it. The namespace semaphore is dropped and we use mount lock exact to lock the new real root mount. If that fails or the subsequent do_loopback() fails we rely on the copy of the real root mount to be cleaned up by path_put(). The problem is that this doesn't deal with mount propagation and will leave the mounts linked in the propagation lists. When creating a new mount namespace create_new_namespace() first acquires namespace_sem to clone the nullfs root, drops it, then reacquires it via LOCK_MOUNT_EXACT which takes inode_lock first to respect the inode_lock -> namespace_sem lock ordering. This drop-and-reacquire pattern is fragile and was the source of the propagation cleanup bug fixed in the preceding commit. Extend lock_mount_exact() with a copy_mount mode that clones the mount under the locks atomically. When copy_mount is true, path_overmounted() is skipped since we're copying the mount, not mounting on top of it - the nullfs root always has rootfs mounted on top so the check would always fail. If clone_mnt() fails after get_mountpoint() has pinned the mountpoint, __unlock_mount() is used to properly unpin the mountpoint and release both locks. This allows create_new_namespace() to use LOCK_MOUNT_EXACT_COPY which takes inode_lock and namespace_sem once and holds them throughout the clone and subsequent mount operations, eliminating the drop-and-reacquire pattern entirely. Reported-by: syzbot+a89f9434fb5a001ccd58@syzkaller.appspotmail.com Fixes: `9b8a0ba682` ("mount: add OPEN_TREE_NAMESPACE") # mainline only Link: https://lore.kernel.org/699047f6.050a0220.2757fb.0024.GAE@google.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-02-18 14:04:42 +01:00
Eric Biggers	5959495449	fsverity: remove fsverity_verify_page() Now that fsverity_verify_page() has no callers, remove it. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20260218010630.7407-4-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>	2026-02-17 23:11:36 -08:00
Eric Biggers	78cdb14893	f2fs: make f2fs_verify_cluster() partially large-folio-aware f2fs_verify_cluster() is the only remaining caller of the non-large-folio-aware function fsverity_verify_page(). To unblock the removal of that function, change f2fs_verify_cluster() to verify the entire folio of each page and mark it up-to-date. Note that this doesn't actually make f2fs_verify_cluster() large-folio-aware, as it is still passed an array of pages. Currently, it's never called with large folios. Suggested-by: Matthew Wilcox <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20260218010630.7407-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>	2026-02-17 23:11:36 -08:00
Eric Biggers	079220c56f	f2fs: remove unnecessary ClearPageUptodate in f2fs_verify_cluster() Remove the unnecessary clearing of PG_uptodate. It's guaranteed to already be clear. Suggested-by: Matthew Wilcox <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20260218010630.7407-2-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>	2026-02-17 23:11:36 -08:00
Linus Torvalds	75a452d31b	Merge tag 'ntfs3_for_7.0' of https://github.com/Paragon-Software-Group/linux-ntfs3 Pull ntfs3 updates from Konstantin Komarov: "New code: - improve readahead for bitmap initialization and large directory scans - fsync files by syncing parent inodes - drop of preallocated clusters for sparse and compressed files - zero-fill folios beyond i_valid in ntfs_read_folio() - implement llseek SEEK_DATA/SEEK_HOLE by scanning data runs - implement iomap-based file operations - allow explicit boolean acl/prealloc mount options - fall-through between switch labels - delayed-allocation (delalloc) support Fixes: - check return value of indx_find to avoid infinite loop - initialize new folios before use - infinite loop in attr_load_runs_range on inconsistent metadata - infinite loop triggered by zero-sized ATTR_LIST - ntfs_mount_options leak in ntfs_fill_super() - deadlock in ni_read_folio_cmpr - circular locking dependency in run_unpack_ex - prevent infinite loops caused by the next valid being the same - restore NULL folio initialization in ntfs_writepages() - slab-out-of-bounds read in DeleteIndexEntryRoot Updates: - allow readdir() to finish after directory mutations without rewinddir() - handle attr_set_size() errors when truncating files - make ntfs_writeback_ops static - refactor duplicate kmemdup pattern in do_action() - avoid calling run_get_entry() when run == NULL in ntfs_read_run_nb_ra() Replaced: - use wait_on_buffer() directly - rename ni_readpage_cmpr into ni_read_folio_cmpr" * tag 'ntfs3_for_7.0' of https://github.com/Paragon-Software-Group/linux-ntfs3: (26 commits) fs/ntfs3: add delayed-allocation (delalloc) support fs/ntfs3: avoid calling run_get_entry() when run == NULL in ntfs_read_run_nb_ra() fs/ntfs3: add fall-through between switch labels fs/ntfs3: allow explicit boolean acl/prealloc mount options fs/ntfs3: Fix slab-out-of-bounds read in DeleteIndexEntryRoot ntfs3: Restore NULL folio initialization in ntfs_writepages() ntfs3: Refactor duplicate kmemdup pattern in do_action() fs/ntfs3: prevent infinite loops caused by the next valid being the same fs/ntfs3: make ntfs_writeback_ops static ntfs3: fix circular locking dependency in run_unpack_ex fs/ntfs3: implement iomap-based file operations fs/ntfs3: fix deadlock in ni_read_folio_cmpr fs/ntfs3: implement llseek SEEK_DATA/SEEK_HOLE by scanning data runs fs/ntfs3: zero-fill folios beyond i_valid in ntfs_read_folio() fs/ntfs3: handle attr_set_size() errors when truncating files fs/ntfs3: drop preallocated clusters for sparse and compressed files fs/ntfs3: fsync files by syncing parent inodes fs/ntfs3: fix ntfs_mount_options leak in ntfs_fill_super() fs/ntfs3: allow readdir() to finish after directory mutations without rewinddir() fs/ntfs3: improve readahead for bitmap initialization and large directory scans ...	2026-02-17 15:37:06 -08:00
Linus Torvalds	87a367f1bf	Merge tag 'ceph-for-7.0-rc1' of https://github.com/ceph/ceph-client Pull ceph updates from Ilya Dryomov: "This adds support for the upcoming aes256k key type in CephX that is based on Kerberos 5 and brings a bunch of assorted CephFS fixes from Ethan and Sam. One of Sam's patches in particular undoes a change in the fscrypt area that had an inadvertent side effect of making CephFS behave as if mounted with wsize=4096 and leading to the corresponding degradation in performance, especially for sequential writes" * tag 'ceph-for-7.0-rc1' of https://github.com/ceph/ceph-client: ceph: assert loop invariants in ceph_writepages_start() ceph: remove error return from ceph_process_folio_batch() ceph: fix write storm on fscrypted files ceph: do not propagate page array emplacement errors as batch errors ceph: supply snapshot context in ceph_uninline_data() ceph: supply snapshot context in ceph_zero_partial_object() libceph: adapt ceph_x_challenge_blob hashing and msgr1 message signing libceph: add support for CEPH_CRYPTO_AES256KRB5 libceph: introduce ceph_crypto_key_prepare() libceph: generalize ceph_x_encrypt_offset() and ceph_x_encrypt_buflen() libceph: define and enforce CEPH_MAX_KEY_LEN	2026-02-17 15:18:51 -08:00
Linus Torvalds	0ba83f0968	Merge tag 'ovl-update-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs Pull overlayfs update from Amir Goldstein: "Relax the semantics of uuid=off to cater to a use case of overlayfs lower layers on btrfs clones, whose UUID are ephemeral and an upper layer on a different filesystem" * tag 'ovl-update-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs: ovl: relax requirement for uuid=off,index=on	2026-02-17 15:08:24 -08:00
Linus Torvalds	1d22968feb	Merge tag 'v7.0-rc-part2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 Pull smb client fixes from Steve French: - Fix three potential double free vulnerabilities - Fix data corruption due to racy lease checks - Enforce SMB1 signing verification checks - Fix invalid mount option parsing - Remove unneeded tracepoint - Various minor error code corrections - Minor cleanup * tag 'v7.0-rc-part2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb: client: terminate session upon failed client required signing cifs: some missing initializations on replay cifs: remove unnecessary tracing after put tcon cifs: update internal module version number smb: client: fix data corruption due to racy lease checks smb/client: move NT_STATUS_MORE_ENTRIES smb/client: rename to NT_ERROR_INVALID_DATATYPE smb/client: rename to NT_STATUS_SOME_NOT_MAPPED smb/client: map NT_STATUS_PRIVILEGE_NOT_HELD smb/client: map NT_STATUS_MORE_PROCESSING_REQUIRED smb/client: map NT_STATUS_BUFFER_OVERFLOW smb/client: map NT_STATUS_NOTIFY_ENUM_DIR cifs: SMB1 split: Remove duplicate include of cifs_debug.h smb: client: fix regression with mount options parsing	2026-02-17 15:02:49 -08:00
Linus Torvalds	45a43ac5ac	Merge tag 'vfs-7.0-rc1.misc.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull more misc vfs updates from Christian Brauner: "Features: - Optimize close_range() from O(range size) to O(active FDs) by using find_next_bit() on the open_fds bitmap instead of linearly scanning the entire requested range. This is a significant improvement for large-range close operations on sparse file descriptor tables. - Add FS_XFLAG_VERITY file attribute for fs-verity files, retrievable via FS_IOC_FSGETXATTR and file_getattr(). The flag is read-only. Add tracepoints for fs-verity enable and verify operations, replacing the previously removed debug printk's. - Prevent nfsd from exporting special kernel filesystems like pidfs and nsfs. These filesystems have custom ->open() and ->permission() export methods that are designed for open_by_handle_at(2) only and are incompatible with nfsd. Update the exportfs documentation accordingly. Fixes: - Fix KMSAN uninit-value in ovl_fill_real() where strcmp() was used on a non-null-terminated decrypted directory entry name from fscrypt. This triggered on encrypted lower layers when the decrypted name buffer contained uninitialized tail data. The fix also adds VFS-level name_is_dot(), name_is_dotdot(), and name_is_dot_dotdot() helpers, replacing various open-coded "." and ".." checks across the tree. - Fix read-only fsflags not being reset together with xflags in vfs_fileattr_set(). Currently harmless since no read-only xflags overlap with flags, but this would cause inconsistencies for any future shared read-only flag - Return -EREMOTE instead of -ESRCH from PIDFD_GET_INFO when the target process is in a different pid namespace. This lets userspace distinguish "process exited" from "process in another namespace", matching glibc's pidfd_getpid() behavior Cleanups: - Use C-string literals in the Rust seq_file bindings, replacing the kernel::c_str!() macro (available since Rust 1.77) - Fix typo in d_walk_ret enum comment, add porting notes for the readlink_copy() calling convention change" * tag 'vfs-7.0-rc1.misc.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: add porting notes about readlink_copy() pidfs: return -EREMOTE when PIDFD_GET_INFO is called on another ns nfsd: do not allow exporting of special kernel filesystems exportfs: clarify the documentation of open()/permission() expotrfs ops fsverity: add tracepoints fs: add FS_XFLAG_VERITY for fs-verity files rust: seq_file: replace `kernel::c_str!` with C-Strings fs: dcache: fix typo in enum d_walk_ret comment ovl: use name_is_dot* helpers in readdir code fs: add helpers name_is_dot{,dot,_dotdot} ovl: Fix uninit-value in ovl_fill_real fs: reset read-only fsflags together with xflags fs/file: optimize close_range() complexity from O(N) to O(Sparse)	2026-02-16 13:00:36 -08:00
Linus Torvalds	543b9b6339	Merge tag 'kernel-7.0-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull pidfs updates from Christian Brauner: - pid: introduce task_ppid_vnr() helper - pidfs: convert rb-tree to rhashtable Mateusz reported performance penalties during task creation because pidfs uses pidmap_lock to add elements into the rbtree. Switch to an rhashtable to have separate fine-grained locking and to decouple from pidmap_lock moving all heavy manipulations outside of it Also move inode allocation outside of pidmap_lock. With this there's nothing happening for pidfs under pidmap_lock - pid: reorder fields in pid_namespace to reduce false sharing - Revert "pid: make __task_pid_nr_ns(ns => NULL) safe for zombie callers" - ipc: Add SPDX license id to mqueue.c * tag 'kernel-7.0-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: pid: introduce task_ppid_vnr() helper pidfs: implement ino allocation without the pidmap lock Revert "pid: make __task_pid_nr_ns(ns => NULL) safe for zombie callers" pid: reorder fields in pid_namespace to reduce false sharing pidfs: convert rb-tree to rhashtable ipc: Add SPDX license id to mqueue.c	2026-02-16 12:37:13 -08:00
Konstantin Komarov	10d7c95af0	fs/ntfs3: add delayed-allocation (delalloc) support This patch implements delayed allocation (delalloc) in ntfs3 driver. It introduces an in-memory delayed-runlist (run_da) and the helpers to track, reserve and later convert those delayed reservations into real clusters at writeback time. The change keeps on-disk formats untouched and focuses on pagecache integration, correctness and safe interaction with fallocate, truncate, and dio/iomap paths. Key points: - add run_da (delay-allocated run tree) and bookkeeping for delayed clusters. - mark ranges as delalloc (DELALLOC_LCN) instead of immediately allocating. Actual allocation performed later (writeback / attr_set_size_ex / explicit flush paths). - direct i/o / iomap paths updated to avoid dio collisions with delalloc: dio falls back or forces allocation of delayed blocks before proceeding. - punch/collapse/truncate/fallocate check and cancel delay-alloc reservations. Sparse/compressed files handled specially. - free-space checks updated (ntfs_check_free_space) to account for reserved delalloc clusters and MFT record budgeting. - delayed allocations are committed on last writer (file release) and on explicit allocation flush paths. Tested-by: syzbot@syzkaller.appspotmail.com Reported-by: syzbot+2bd8e813c7f767aa9bb1@syzkaller.appspotmail.com Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>	2026-02-16 17:23:51 +01:00
Aaditya Kansal	dc96f01d54	smb: client: terminate session upon failed client required signing Currently, when smb signature verification fails, the behaviour is to log the failure without any action to terminate the session. Call cifs_reconnect() when client required signature verification fails. Otherwise, log the error without reconnecting. Signed-off-by: Aaditya Kansal <aadityakansal390@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-02-15 18:35:34 -06:00
Shyam Prasad N	14f66f4464	cifs: some missing initializations on replay In several places in the code, we have a label to signify the start of the code where a request can be replayed if necessary. However, some of these places were missing the necessary reinitializations of certain local variables before replay. This change makes sure that these variables get initialized after the label. Cc: stable@vger.kernel.org Reported-by: Yuchan Nam <entropy1110@gmail.com> Tested-by: Yuchan Nam <entropy1110@gmail.com> Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-02-15 18:32:39 -06:00
Shyam Prasad N	a5a50f1415	cifs: remove unnecessary tracing after put tcon This code was recently changed from manually decrementing tcon ref to using cifs_put_tcon. But even before that change this tracing happened after decrementing the ref count, which is wrong. With cifs_put_tcon, tracing already happens inside it. So just removing the extra tracing here. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-02-14 19:31:32 -06:00
Fedor Pchelkin	a09dc10d13	ksmbd: call ksmbd_vfs_kern_path_end_removing() on some error paths There are two places where ksmbd_vfs_kern_path_end_removing() needs to be called in order to balance what the corresponding successful call to ksmbd_vfs_kern_path_start_removing() has done, i.e. drop inode locks and put the taken references. Otherwise there might be potential deadlocks and unbalanced locks which are caught like: BUG: workqueue leaked lock or atomic: kworker/5:21/0x00000000/7596 last function: handle_ksmbd_work 2 locks held by kworker/5:21/7596: #0: ffff8881051ae448 (sb_writers#3){.+.+}-{0:0}, at: ksmbd_vfs_kern_path_locked+0x142/0x660 #1: ffff888130e966c0 (&type->i_mutex_dir_key#3/1){+.+.}-{4:4}, at: ksmbd_vfs_kern_path_locked+0x17d/0x660 CPU: 5 PID: 7596 Comm: kworker/5:21 Not tainted 6.1.162-00456-gc29b353f383b #138 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-debian-1.17.0-1 04/01/2014 Workqueue: ksmbd-io handle_ksmbd_work Call Trace: <TASK> dump_stack_lvl+0x44/0x5b process_one_work.cold+0x57/0x5c worker_thread+0x82/0x600 kthread+0x153/0x190 ret_from_fork+0x22/0x30 </TASK> Found by Linux Verification Center (linuxtesting.org). Fixes: `d5fc1400a3` ("smb/server: avoid deadlock when linking with ReplaceIfExists") Cc: stable@vger.kernel.org Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-02-14 19:26:26 -06:00
Chen Ni	1aade89ecc	smb: server: Remove duplicate include of misc.h Remove duplicate inclusion of misc.h in server.c to clean up redundant code. Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-02-14 19:26:26 -06:00

1 2 3 4 5 ...

104202 Commits