linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-19 03:32:45 -04:00

Author	SHA1	Message	Date
Linus Torvalds	b54345928f	Merge tag 'gfs2-for-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 revert from Andreas Gruenbacher: "Revert bad commit "gfs2: Fix use of bio_chain" I was originally assuming that there must be a bug in gfs2 because gfs2 chains bios in the opposite direction of what bio_chain_and_submit() expects. It turns out that the bio chains are set up in "reverse direction" intentionally so that the first bio's bi_end_io callback is invoked rather than the last bio's callback. We want the first bio's callback invoked for the following reason: The initial bio starts page aligned and covers one or more pages. When it terminates at a non-page-aligned offset, subsequent bios are added to handle the remaining portion of the final page. Upon completion of the bio chain, all affected pages need to be be marked as read, and only the first bio references all of these pages" * tag 'gfs2-for-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: Revert "gfs2: Fix use of bio_chain"	2026-01-13 10:04:34 -08:00
Andreas Gruenbacher	469d71512d	Revert "gfs2: Fix use of bio_chain" This reverts commit `8a157e0a0a`. That commit incorrectly assumed that the bio_chain() arguments were swapped in gfs2. However, gfs2 intentionally constructs bio chains so that the first bio's bi_end_io callback is invoked when all bios in the chain have completed, unlike bio chains where the last bio's callback is invoked. Fixes: `8a157e0a0a` ("gfs2: Fix use of bio_chain") Cc: stable@vger.kernel.org Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2026-01-12 14:58:32 +01:00
Thomas Gleixner	2e4b28c48f	treewide: Update email address In a vain attempt to consolidate the email zoo switch everything to the kernel.org account. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-01-11 06:09:11 -10:00
Gao Xiang	7893cc1225	erofs: fix file-backed mounts no longer working on EROFS partitions Sheng Yong reported [1] that Android APEX images didn't work with commit `072a7c7cdb` ("erofs: don't bother with s_stack_depth increasing for now") because "EROFS-formatted APEX file images can be stored within an EROFS-formatted Android system partition." In response, I sent a quick fat-fingered [PATCH v3] to address the report. Unfortunately, the updated condition was incorrect: if (erofs_is_fileio_mode(sbi)) { - sb->s_stack_depth = - file_inode(sbi->dif0.file)->i_sb->s_stack_depth + 1; - if (sb->s_stack_depth > FILESYSTEM_MAX_STACK_DEPTH) { - erofs_err(sb, "maximum fs stacking depth exceeded"); + inode = file_inode(sbi->dif0.file); + if ((inode->i_sb->s_op == &erofs_sops && !sb->s_bdev) \|\| + inode->i_sb->s_stack_depth) { The condition `!sb->s_bdev` is always true for all file-backed EROFS mounts, making the check effectively a no-op. The real fix tested and confirmed by Sheng Yong [2] at that time was [PATCH v3 RESEND], which correctly ensures the following EROFS^2 setup works: EROFS (on a block device) + EROFS (file-backed mount) But sadly I screwed it up again by upstreaming the outdated [PATCH v3]. This patch applies the same logic as the delta between the upstream [PATCH v3] and the real fix [PATCH v3 RESEND]. Reported-by: Sheng Yong <shengyong1@xiaomi.com> Closes: https://lore.kernel.org/r/3acec686-4020-4609-aee4-5dae7b9b0093@gmail.com [1] Fixes: `072a7c7cdb` ("erofs: don't bother with s_stack_depth increasing for now") Link: https://lore.kernel.org/r/243f57b8-246f-47e7-9fb1-27a771e8e9e8@gmail.com [2] Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-01-10 06:39:20 -10:00
Linus Torvalds	b6151c4e60	Merge tag 'erofs-for-6.19-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fix from Gao Xiang: - Don't increase s_stack_depth which caused regressions in some composefs mount setups (EROFS + ovl^2) Instead just allow one extra unaccounted fs stacking level for straightforward cases. * tag 'erofs-for-6.19-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: don't bother with s_stack_depth increasing for now	2026-01-09 19:34:50 -10:00
Gao Xiang	072a7c7cdb	erofs: don't bother with s_stack_depth increasing for now Previously, commit `d53cd891f0` ("erofs: limit the level of fs stacking for file-backed mounts") bumped `s_stack_depth` by one to avoid kernel stack overflow when stacking an unlimited number of EROFS on top of each other. This fix breaks composefs mounts, which need EROFS+ovl^2 sometimes (and such setups are already used in production for quite a long time). One way to fix this regression is to bump FILESYSTEM_MAX_STACK_DEPTH from 2 to 3, but proving that this is safe in general is a high bar. After a long discussion on GitHub issues [1] about possible solutions, one conclusion is that there is no need to support nesting file-backed EROFS mounts on stacked filesystems, because there is always the option to use loopback devices as a fallback. As a quick fix for the composefs regression for this cycle, instead of bumping `s_stack_depth` for file backed EROFS mounts, we disallow nesting file-backed EROFS over EROFS and over filesystems with `s_stack_depth` > 0. This works for all known file-backed mount use cases (composefs, containerd, and Android APEX for some Android vendors), and the fix is self-contained. Essentially, we are allowing one extra unaccounted fs stacking level of EROFS below stacking filesystems, but EROFS can only be used in the read path (i.e. overlayfs lower layers), which typically has much lower stack usage than the write path. We can consider increasing FILESYSTEM_MAX_STACK_DEPTH later, after more stack usage analysis or using alternative approaches, such as splitting the `s_stack_depth` limitation according to different combinations of stacking. Fixes: `d53cd891f0` ("erofs: limit the level of fs stacking for file-backed mounts") Reported-and-tested-by: Dusty Mabe <dusty@dustymabe.com> Reported-by: Timothée Ravier <tim@siosm.fr> Closes: https://github.com/coreos/fedora-coreos-tracker/issues/2087 [1] Reported-by: "Alekséi Naidénov" <an@digitaltide.io> Closes: https://lore.kernel.org/r/CAFHtUiYv4+=+JP_-JjARWjo6OwcvBj1wtYN=z0QXwCpec9sXtg@mail.gmail.com Acked-by: Amir Goldstein <amir73il@gmail.com> Acked-by: Alexander Larsson <alexl@redhat.com> Reviewed-and-tested-by: Sheng Yong <shengyong1@xiaomi.com> Reviewed-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2026-01-10 13:01:15 +08:00
Linus Torvalds	372800cb95	Merge tag 'for-6.19-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - fix potential NULL pointer dereference when replaying tree log after an error - release path before initializing extent tree to avoid potential deadlock when allocating new inode - on filesystems with block size > page size - fix potential read out of bounds during encoded read of an inline extent - only enforce free space tree if v1 cache is required - print correct tree id in error message * tag 'for-6.19-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: show correct warning if can't read data reloc tree btrfs: fix NULL pointer dereference in do_abort_log_replay() btrfs: force free space tree for bs > ps cases btrfs: only enforce free space tree if v1 cache is required for bs < ps cases btrfs: release path before initializing extent tree in btrfs_read_locked_inode() btrfs: avoid access-beyond-folio for bs > ps encoded writes	2026-01-09 07:02:38 -10:00
Linus Torvalds	2bfe3e0da6	Merge tag 'vfs-6.19-rc5.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Remove incorrect __user annotation from struct xattr_args::value - Documentation fix: Add missing kernel-doc description for the @isnew parameter in ilookup5_nowait() to silence Sphinx warnings - Documentation fix: Fix kernel-doc comment for __start_dirop() - the function name in the comment was wrong and the @state parameter was undocumented - Replace dynamic folio_batch allocation with stack allocation in iomap_zero_range(). The dynamic allocation was problematic for ext4-on-iomap work (didn't handle allocation failure properly) and triggered lockdep complaints. Uses a flag instead to control batch usage - Re-add #ifdef guards around PIDFD_GET_<ns-type>_NAMESPACE ioctls. When a namespace type is disabled, ns->ops is NULL, causes crashes during inode eviction when closing the fd. The ifdefs were removed in a recent simplification but are still needed - Fixe a race where a folio could be unlocked before the trailing zeros (for EOF within the page) were written - Split out a dedicated lease_dispose_list() helper since lease code paths always know they're disposing of leases. Removes unnecessary runtime flag checks and prepares for upcoming lease_manager enhancements - Fix userland delegation requests succeeding despite conflicting opens. Previously, FL_LAYOUT and FL_DELEG leases bypassed conflict checks (a hack for nfsd). Adds new ->lm_open_conflict() lease_manager operation so userland delegations get proper conflict checking while nfsd can continue its own conflict handling - Fix LOOKUP_CACHED path lookups incorrectly falling through to the slow path. After legitimize_links() calls were conditionally elided, the routine would always fail with LOOKUP_CACHED regardless of whether there were any links. Now the flag is checked at the two callsites before calling legitimize_links() - Fix bug in media fd allocation in media_request_alloc() - Fix mismatched API calls in ecryptfs_mknod(): was calling end_removing() instead of end_creating() after ecryptfs_start_creating_dentry() - Fix dentry reference count leak in ecryptfs_mkdir(): a dget() of the lower parent dir was added but never dput()'d, causing BUG during lower filesystem unmount due to the still-in-use dentry * tag 'vfs-6.19-rc5.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: pidfs: protect PIDFD_GET_* ioctls() via ifdef ecryptfs: Release lower parent dentry after creating dir ecryptfs: Fix improper mknod pairing of start_creating()/end_removing() get rid of bogus __user in struct xattr_args::value VFS: fix __start_dirop() kernel-doc warnings fs: Describe @isnew parameter in ilookup5_nowait() fs: make sure to fail try_to_unlazy() and try_to_unlazy() for LOOKUP_CACHED netfs: Fix early read unlock of page with EOF in middle filelock: allow lease_managers to dictate what qualifies as a conflict filelock: add lease_dispose_list() helper iomap: replace folio_batch allocation with stack allocation media: mc: fix potential use-after-free in media_request_alloc()	2026-01-09 05:57:57 -10:00
Christian Brauner	75ddaa4ddc	pidfs: protect PIDFD_GET_* ioctls() via ifdef We originally protected PIDFD_GET_<ns-type>_NAMESPACE ioctls() through ifdefs and recent rework made it possible to drop them. There was an oversight though. When the relevant namespace is turned off ns->ops will be NULL so even though opening a file descriptor is perfectly legitimate it would fail during inode eviction when the file was closed. The simple fix would be to check ns->ops for NULL and continue allow to retrieve namespace fds from pidfds but we don't allow retrieving them when the relevant namespace type is turned off. So keep the simplification but add the ifdefs back in. Link: https://lore.kernel.org/20251222214907.GA189632@quark Link: https://patch.msgid.link/20251224-ununterbrochen-gagen-ea949b83f8f2@brauner Fixes: `a71e4f103a` ("pidfs: simplify PIDFD_GET_<type>_NAMESPACE ioctls") Tested-by: Brendan Jackman <jackmanb@kernel.org> Tested-by: Eric Biggers <ebiggers@kernel.org> Reported-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-01-06 23:08:12 +01:00
Linus Torvalds	f0b9d8eb98	Merge tag 'nfsd-6.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: "A set of NFSD fixes for stable that arrived after the merge window: - Remove an invalid NFS status code - Fix an fstests failure when using pNFS - Fix a UAF in v4_end_grace() - Fix the administrative interface used to revoke NFSv4 state - Fix a memory leak reported by syzbot" * tag 'nfsd-6.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: NFSD: net ref data still needs to be freed even if net hasn't startup nfsd: check that server is running in unlock_filesystem nfsd: use correct loop termination in nfsd4_revoke_states() nfsd: provide locking for v4_end_grace NFSD: Fix permission check for read access to executable-only files NFSD: Remove NFSERR_EAGAIN	2026-01-06 09:12:52 -08:00
Mark Harmstone	2bb83bc42b	btrfs: show correct warning if can't read data reloc tree If a filesystem is missing its data reloc tree, we get something like this in dmesg: BTRFS warning (device loop11): failed to read root (objectid=4): -2 BTRFS error (device loop11): open_ctree failed: -2 objectid is BTRFS_DEV_TREE_OBJECTID, but this should actually be the value of BTRFS_DATA_RELOC_TREE_OBJECTID. btrfs_read_roots() prints location.objectid on failure, but this isn't set when reading the data reloc tree. Set location.objectid to the correct value on failure, so that the error message makes sense. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Mark Harmstone <mark@harmstone.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2026-01-06 01:23:00 +01:00
Suchit Karunakaran	530e3d4af5	btrfs: fix NULL pointer dereference in do_abort_log_replay() Coverity reported a NULL pointer dereference issue (CID 1666756) in do_abort_log_replay(). When btrfs_alloc_path() fails in replay_one_buffer(), wc->subvol_path is NULL, but btrfs_abort_log_replay() calls do_abort_log_replay() which unconditionally dereferences wc->subvol_path when attempting to print debug information. Fix this by adding a NULL check before dereferencing wc->subvol_path in do_abort_log_replay(). Fixes: `2753e49176` ("btrfs: dump detailed info and specific messages on log replay failures") Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Suchit Karunakaran <suchitkarunakaran@gmail.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2026-01-06 01:23:00 +01:00
Qu Wenruo	cefd809251	btrfs: force free space tree for bs > ps cases [BUG] Currently we only enforcing the free space tree for bs < ps cases, but with the recently added bs > ps support, we lack the free space tree enforcing, causing explicit v1 cache mount option to fail on bs > ps cases: # mount -o space_cache=v1 /dev/test/scratch1 /mnt/btrfs/ mount: /mnt/btrfs: wrong fs type, bad option, bad superblock on /dev/mapper/test-scratch1, missing codepage or helper program, or other error. dmesg(1) may have more information after failed mount system call. # dmesg -t \| tail -n7 BTRFS: device fsid ac14a6fa-4ec9-449e-aec9-7d1777bfdc06 devid 1 transid 11 /dev/mapper/test-scratch1 (253:3) scanned by mount (2849) BTRFS info (device dm-3): first mount of filesystem ac14a6fa-4ec9-449e-aec9-7d1777bfdc06 BTRFS info (device dm-3): using crc32c checksum algorithm BTRFS warning (device dm-3): support for block size 8192 with page size 4096 is experimental, some features may be missing BTRFS warning (device dm-3): space cache v1 is being deprecated and will be removed in a future release, please use -o space_cache=v2 BTRFS warning (device dm-3): v1 space cache is not supported for page size 4096 with sectorsize 8192 BTRFS error (device dm-3): open_ctree failed: -22 [FIX] Just enable the same free space tree for bs > ps cases, aligning the behavior to bs < ps cases. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2026-01-06 01:22:59 +01:00
Qu Wenruo	30bcf4e824	btrfs: only enforce free space tree if v1 cache is required for bs < ps cases [BUG] Since the introduction of btrfs bs < ps support, v1 cache was never on the plan due to its hard coded PAGE_SIZE usage, and the future plan to properly deprecate it. However for bs < ps cases, even if 'nospace_cache,clear_cache' mount option is specified, it's never respected and free space tree is always enabled: mkfs.btrfs -f -O ^bgt,fst $dev mount $dev $mnt -o clear_cache,nospace_cache umount $mnt btrfs ins dump-super $dev ... compat_ro_flags 0x3 ( FREE_SPACE_TREE \| FREE_SPACE_TREE_VALID ) ... This means a different behavior compared to bs >= ps cases. [CAUSE] The forcing usage of v2 space cache is done inside btrfs_set_free_space_cache_settings(), however it never checks if we're even using space cache but always enabling v2 cache. [FIX] Instead unconditionally enable v2 cache, only forcing v2 cache if the old v1 cache is required. Now v2 space cache can be properly disabled on bs < ps cases: mkfs.btrfs -f -O ^bgt,fst $dev mount $dev $mnt -o clear_cache,nospace_cache umount $mnt btrfs ins dump-super $dev ... compat_ro_flags 0x0 ... Fixes: `9f73f1aef9` ("btrfs: force v2 space cache usage for subpage mount") Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2026-01-06 01:22:59 +01:00
Filipe Manana	8731f2c50b	btrfs: release path before initializing extent tree in btrfs_read_locked_inode() In btrfs_read_locked_inode() we are calling btrfs_init_file_extent_tree() while holding a path with a read locked leaf from a subvolume tree, and btrfs_init_file_extent_tree() may do a GFP_KERNEL allocation, which can trigger reclaim. This can create a circular lock dependency which lockdep warns about with the following splat: [6.1433] ====================================================== [6.1574] WARNING: possible circular locking dependency detected [6.1583] 6.18.0+ #4 Tainted: G U [6.1591] ------------------------------------------------------ [6.1599] kswapd0/117 is trying to acquire lock: [6.1606] ffff8d9b6333c5b8 (&delayed_node->mutex){+.+.}-{3:3}, at: __btrfs_release_delayed_node.part.0+0x39/0x2f0 [6.1625] but task is already holding lock: [6.1633] ffffffffa4ab8ce0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x195/0xc60 [6.1646] which lock already depends on the new lock. [6.1657] the existing dependency chain (in reverse order) is: [6.1667] -> #2 (fs_reclaim){+.+.}-{0:0}: [6.1677] fs_reclaim_acquire+0x9d/0xd0 [6.1685] __kmalloc_cache_noprof+0x59/0x750 [6.1694] btrfs_init_file_extent_tree+0x90/0x100 [6.1702] btrfs_read_locked_inode+0xc3/0x6b0 [6.1710] btrfs_iget+0xbb/0xf0 [6.1716] btrfs_lookup_dentry+0x3c5/0x8e0 [6.1724] btrfs_lookup+0x12/0x30 [6.1731] lookup_open.isra.0+0x1aa/0x6a0 [6.1739] path_openat+0x5f7/0xc60 [6.1746] do_filp_open+0xd6/0x180 [6.1753] do_sys_openat2+0x8b/0xe0 [6.1760] __x64_sys_openat+0x54/0xa0 [6.1768] do_syscall_64+0x97/0x3e0 [6.1776] entry_SYSCALL_64_after_hwframe+0x76/0x7e [6.1784] -> #1 (btrfs-tree-00){++++}-{3:3}: [6.1794] lock_release+0x127/0x2a0 [6.1801] up_read+0x1b/0x30 [6.1808] btrfs_search_slot+0x8e0/0xff0 [6.1817] btrfs_lookup_inode+0x52/0xd0 [6.1825] __btrfs_update_delayed_inode+0x73/0x520 [6.1833] btrfs_commit_inode_delayed_inode+0x11a/0x120 [6.1842] btrfs_log_inode+0x608/0x1aa0 [6.1849] btrfs_log_inode_parent+0x249/0xf80 [6.1857] btrfs_log_dentry_safe+0x3e/0x60 [6.1865] btrfs_sync_file+0x431/0x690 [6.1872] do_fsync+0x39/0x80 [6.1879] __x64_sys_fsync+0x13/0x20 [6.1887] do_syscall_64+0x97/0x3e0 [6.1894] entry_SYSCALL_64_after_hwframe+0x76/0x7e [6.1903] -> #0 (&delayed_node->mutex){+.+.}-{3:3}: [6.1913] __lock_acquire+0x15e9/0x2820 [6.1920] lock_acquire+0xc9/0x2d0 [6.1927] __mutex_lock+0xcc/0x10a0 [6.1934] __btrfs_release_delayed_node.part.0+0x39/0x2f0 [6.1944] btrfs_evict_inode+0x20b/0x4b0 [6.1952] evict+0x15a/0x2f0 [6.1958] prune_icache_sb+0x91/0xd0 [6.1966] super_cache_scan+0x150/0x1d0 [6.1974] do_shrink_slab+0x155/0x6f0 [6.1981] shrink_slab+0x48e/0x890 [6.1988] shrink_one+0x11a/0x1f0 [6.1995] shrink_node+0xbfd/0x1320 [6.1002] balance_pgdat+0x67f/0xc60 [6.1321] kswapd+0x1dc/0x3e0 [6.1643] kthread+0xff/0x240 [6.1965] ret_from_fork+0x223/0x280 [6.1287] ret_from_fork_asm+0x1a/0x30 [6.1616] other info that might help us debug this: [6.1561] Chain exists of: &delayed_node->mutex --> btrfs-tree-00 --> fs_reclaim [6.1503] Possible unsafe locking scenario: [6.1110] CPU0 CPU1 [6.1411] ---- ---- [6.1707] lock(fs_reclaim); [6.1998] lock(btrfs-tree-00); [6.1291] lock(fs_reclaim); [6.1581] lock(&delayed_node->mutex); [6.1874] * DEADLOCK * [6.1716] 2 locks held by kswapd0/117: [6.1999] #0: ffffffffa4ab8ce0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x195/0xc60 [6.1294] #1: ffff8d998344b0e0 (&type->s_umount_key#40){++++}- {3:3}, at: super_cache_scan+0x37/0x1d0 [6.1596] stack backtrace: [6.1183] CPU: 11 UID: 0 PID: 117 Comm: kswapd0 Tainted: G U 6.18.0+ #4 PREEMPT(lazy) [6.1185] Tainted: [U]=USER [6.1186] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 2001 02/01/2023 [6.1187] Call Trace: [6.1187] <TASK> [6.1189] dump_stack_lvl+0x6e/0xa0 [6.1192] print_circular_bug.cold+0x17a/0x1c0 [6.1194] check_noncircular+0x175/0x190 [6.1197] __lock_acquire+0x15e9/0x2820 [6.1200] lock_acquire+0xc9/0x2d0 [6.1201] ? __btrfs_release_delayed_node.part.0+0x39/0x2f0 [6.1204] __mutex_lock+0xcc/0x10a0 [6.1206] ? __btrfs_release_delayed_node.part.0+0x39/0x2f0 [6.1208] ? __btrfs_release_delayed_node.part.0+0x39/0x2f0 [6.1211] ? __btrfs_release_delayed_node.part.0+0x39/0x2f0 [6.1213] __btrfs_release_delayed_node.part.0+0x39/0x2f0 [6.1215] btrfs_evict_inode+0x20b/0x4b0 [6.1217] ? lock_acquire+0xc9/0x2d0 [6.1220] evict+0x15a/0x2f0 [6.1222] prune_icache_sb+0x91/0xd0 [6.1224] super_cache_scan+0x150/0x1d0 [6.1226] do_shrink_slab+0x155/0x6f0 [6.1228] shrink_slab+0x48e/0x890 [6.1229] ? shrink_slab+0x2d2/0x890 [6.1231] shrink_one+0x11a/0x1f0 [6.1234] shrink_node+0xbfd/0x1320 [6.1236] ? shrink_node+0xa2d/0x1320 [6.1236] ? shrink_node+0xbd3/0x1320 [6.1239] ? balance_pgdat+0x67f/0xc60 [6.1239] balance_pgdat+0x67f/0xc60 [6.1241] ? finish_task_switch.isra.0+0xc4/0x2a0 [6.1246] kswapd+0x1dc/0x3e0 [6.1247] ? __pfx_autoremove_wake_function+0x10/0x10 [6.1249] ? __pfx_kswapd+0x10/0x10 [6.1250] kthread+0xff/0x240 [6.1251] ? __pfx_kthread+0x10/0x10 [6.1253] ret_from_fork+0x223/0x280 [6.1255] ? __pfx_kthread+0x10/0x10 [6.1257] ret_from_fork_asm+0x1a/0x30 [6.1260] </TASK> This is because: 1) The fsync task is holding an inode's delayed node mutex (for a directory) while calling __btrfs_update_delayed_inode() and that needs to do a search on the subvolume's btree (therefore read lock some extent buffers); 2) The lookup task, at btrfs_lookup(), triggered reclaim with the GFP_KERNEL allocation done by btrfs_init_file_extent_tree() while holding a read lock on a subvolume leaf; 3) The reclaim triggered kswapd which is doing inode eviction for the directory inode the fsync task is using as an argument to btrfs_commit_inode_delayed_inode() - but in that call chain we are trying to read lock the same leaf that the lookup task is holding while calling btrfs_init_file_extent_tree() and doing the GFP_KERNEL allocation. Fix this by calling btrfs_init_file_extent_tree() after we don't need the path anymore and release it in btrfs_read_locked_inode(). Reported-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://lore.kernel.org/linux-btrfs/6e55113a22347c3925458a5d840a18401a38b276.camel@linux.intel.com/ Fixes: `8679d2687c` ("btrfs: initialize inode::file_extent_tree after i_mode has been set") Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2026-01-06 01:22:59 +01:00
Qu Wenruo	1aff297ffb	btrfs: avoid access-beyond-folio for bs > ps encoded writes [POTENTIAL BUG] If the system page size is 4K and fs block size is 8K, and max_inline mount option is set to 6K, we can inline a 6K sized data extent. Then a encoded write submitted a compressed extent which is at file offset 0, and the compressed length is 6K, which is allowed to be inlined. Now a read beyond page boundary is triggered inside write_extent_buffer() from insert_inline_extent(). [CAUSE] Currently the function __cow_file_range_inline() can only accept a single folio. For regular compressed write path, we always allocate the compressed folios using the minimal order matching the block size, thus the @compressed_folio should always cover a full fs block thus it is fine. But for encoded writes, they allocate page size folios, this means we can hit a case where the compressed data is smaller than block size but still larger than page size, in that case __cow_file_range_inline() will be called with @compressed_size larger than a page. In that case we will trigger a read beyond the folio inside insert_inline_extent(). Thankfully this is not that common, as the default max_inline is only 2048 bytes, smaller than PAGE_SIZE, and bs > ps support is still experimental. [FIX] We need to either allow insert_inline_extent() to accept a page array to properly support such case, or reject such inline extent. The latter is a much simpler solution, and considering bs > ps will stay as a corner case and non-default max_inline will be even rarer, I don't think we really need to fulfill such niche. So just reject any inline extent that's larger than PAGE_SIZE, and add an extra ASSERT() to insert_inline_extent() to catch such beyond-boundary access. Fixes: `ec20799064` ("btrfs: enable encoded read/write/send for bs > ps cases") Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2026-01-06 01:22:59 +01:00
Linus Torvalds	7f98ab9da0	Merge tag 'for-6.19-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - fix potential deadlock due to mismatching transaction states when waiting for the current transaction - fix squota accounting with nested snapshots - fix quota inheritance of qgroups with multiple parent qgroups - fix NULL inode pointer in evict tracepoint - fix writes beyond end of file on systems with 64K page size and 4K block size - fix logging of inodes after exchange rename - fix use after free when using ref_tracker feature - space reservation fixes * tag 'for-6.19-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix reservation leak in some error paths when inserting inline extent btrfs: do not free data reservation in fallback from inline due to -ENOSPC btrfs: fix use-after-free warning in btrfs_get_or_create_delayed_node() btrfs: always detect conflicting inodes when logging inode refs btrfs: fix beyond-EOF write handling btrfs: fix deadlock in wait_current_trans() due to ignored transaction type btrfs: fix NULL dereference on root when tracing inode eviction btrfs: qgroup: update all parent qgroups when doing quick inherit btrfs: fix qgroup_snapshot_quick_inherit() squota bug	2026-01-05 14:10:48 -08:00
Edward Adam Davis	0b88bfa42e	NFSD: net ref data still needs to be freed even if net hasn't startup When the NFSD instance doesn't to startup, the net ref data memory is not properly reclaimed, which triggers the memory leak issue reported by syzbot [1]. To avoid the problem reported in [1], the net ref data memory reclamation action is moved outside of nfsd_net_up when the net is shutdown. [1] unreferenced object 0xffff88812a39dfc0 (size 64): backtrace (crc a2262fc6): percpu_ref_init+0x94/0x1e0 lib/percpu-refcount.c:76 nfsd_create_serv+0xbe/0x260 fs/nfsd/nfssvc.c:605 nfsd_nl_listener_set_doit+0x62/0xb00 fs/nfsd/nfsctl.c:1882 genl_family_rcv_msg_doit+0x11e/0x190 net/netlink/genetlink.c:1115 genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline] genl_rcv_msg+0x2fd/0x440 net/netlink/genetlink.c:1210 BUG: memory leak Reported-by: syzbot+6ee3b889bdeada0a6226@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=6ee3b889bdeada0a6226 Fixes: `39972494e3` ("nfsd: update percpu_ref to manage references on nfsd_net") Cc: stable@vger.kernel.org Signed-off-by: Edward Adam Davis <eadavis@qq.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2026-01-02 13:50:14 -05:00
Olga Kornievskaia	d0424066fc	nfsd: check that server is running in unlock_filesystem If we are trying to unlock the filesystem via an administrative interface and nfsd isn't running, it crashes the server. This happens currently because nfsd4_revoke_states() access state structures (eg., conf_id_hashtbl) that has been freed as a part of the server shutdown. [ 59.465072] Call trace: [ 59.465308] nfsd4_revoke_states+0x1b4/0x898 [nfsd] (P) [ 59.465830] write_unlock_fs+0x258/0x440 [nfsd] [ 59.466278] nfsctl_transaction_write+0xb0/0x120 [nfsd] [ 59.466780] vfs_write+0x1f0/0x938 [ 59.467088] ksys_write+0xfc/0x1f8 [ 59.467395] __arm64_sys_write+0x74/0xb8 [ 59.467746] invoke_syscall.constprop.0+0xdc/0x1e8 [ 59.468177] do_el0_svc+0x154/0x1d8 [ 59.468489] el0_svc+0x40/0xe0 [ 59.468767] el0t_64_sync_handler+0xa0/0xe8 [ 59.469138] el0t_64_sync+0x1ac/0x1b0 Ensure this can't happen by taking the nfsd_mutex and checking that the server is still up, and then holding the mutex across the call to nfsd4_revoke_states(). Reviewed-by: NeilBrown <neil@brown.name> Reviewed-by: Jeff Layton <jlayton@kernel.org> Fixes: `1ac3629bf0` ("nfsd: prepare for supporting admin-revocation of state") Cc: stable@vger.kernel.org Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2026-01-02 13:49:55 -05:00
NeilBrown	fb321998de	nfsd: use correct loop termination in nfsd4_revoke_states() The loop in nfsd4_revoke_states() stops one too early because the end value given is CLIENT_HASH_MASK where it should be CLIENT_HASH_SIZE. This means that an admin request to drop all locks for a filesystem will miss locks held by clients which hash to the maximum possible hash value. Fixes: `1ac3629bf0` ("nfsd: prepare for supporting admin-revocation of state") Cc: stable@vger.kernel.org Signed-off-by: NeilBrown <neil@brown.name> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2026-01-02 13:49:38 -05:00
NeilBrown	2857bd59fe	nfsd: provide locking for v4_end_grace Writing to v4_end_grace can race with server shutdown and result in memory being accessed after it was freed - reclaim_str_hashtbl in particularly. We cannot hold nfsd_mutex across the nfsd4_end_grace() call as that is held while client_tracking_op->init() is called and that can wait for an upcall to nfsdcltrack which can write to v4_end_grace, resulting in a deadlock. nfsd4_end_grace() is also called by the landromat work queue and this doesn't require locking as server shutdown will stop the work and wait for it before freeing anything that nfsd4_end_grace() might access. However, we must be sure that writing to v4_end_grace doesn't restart the work item after shutdown has already waited for it. For this we add a new flag protected with nn->client_lock. It is set only while it is safe to make client tracking calls, and v4_end_grace only schedules work while the flag is set with the spinlock held. So this patch adds a nfsd_net field "client_tracking_active" which is set as described. Another field "grace_end_forced", is set when v4_end_grace is written. After this is set, and providing client_tracking_active is set, the laundromat is scheduled. This "grace_end_forced" field bypasses other checks for whether the grace period has finished. This resolves a race which can result in use-after-free. Reported-by: Li Lingfeng <lilingfeng3@huawei.com> Closes: https://lore.kernel.org/linux-nfs/20250623030015.2353515-1-neil@brown.name/T/#t Fixes: `7f5ef2e900` ("nfsd: add a v4_end_grace file to /proc/fs/nfsd") Cc: stable@vger.kernel.org Signed-off-by: NeilBrown <neil@brown.name> Tested-by: Li Lingfeng <lilingfeng3@huawei.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2026-01-02 13:48:22 -05:00
Scott Mayhew	e901c7fce5	NFSD: Fix permission check for read access to executable-only files Commit `abc02e5602` ("NFSD: Support write delegations in LAYOUTGET") added NFSD_MAY_OWNER_OVERRIDE to the access flags passed from nfsd4_layoutget() to fh_verify(). This causes LAYOUTGET to fail for executable-only files, and causes xfstests generic/126 to fail on pNFS SCSI. To allow read access to executable-only files, what we really want is: 1. The "permissions" portion of the access flags (the lower 6 bits) must be exactly NFSD_MAY_READ 2. The "hints" portion of the access flags (the upper 26 bits) can contain any combination of NFSD_MAY_OWNER_OVERRIDE and NFSD_MAY_READ_IF_EXEC Fixes: `abc02e5602` ("NFSD: Support write delegations in LAYOUTGET") Cc: stable@vger.kernel.org # v6.6+ Signed-off-by: Scott Mayhew <smayhew@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2026-01-02 13:44:05 -05:00
Chuck Lever	c6c209ceb8	NFSD: Remove NFSERR_EAGAIN I haven't found an NFSERR_EAGAIN in RFCs 1094, 1813, 7530, or 8881. None of these RFCs have an NFS status code that match the numeric value "11". Based on the meaning of the EAGAIN errno, I presume the use of this status in NFSD means NFS4ERR_DELAY. So replace the one usage of nfserr_eagain, and remove it from NFSD's NFS status conversion tables. As far as I can tell, NFSERR_EAGAIN has existed since the pre-git era, but was not actually used by any code until commit `f4e44b3933` ("NFSD: delay unmount source's export after inter-server copy completed."), at which time it become possible for NFSD to return a status code of 11 (which is not valid NFS protocol). Fixes: `f4e44b3933` ("NFSD: delay unmount source's export after inter-server copy completed.") Cc: stable@vger.kernel.org Reviewed-by: NeilBrown <neil@brown.name> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2026-01-02 13:43:41 -05:00
Linus Torvalds	e3a97ab1bb	Merge tag 'v6.19-rc3-smb3-server-fixes' of git://git.samba.org/ksmbd Pull smb server fixes from Steve French: - Fix memory leak - Fix two refcount leaks - Fix error path in create_smb2_pipe * tag 'v6.19-rc3-smb3-server-fixes' of git://git.samba.org/ksmbd: smb/server: fix refcount leak in smb2_open() smb/server: fix refcount leak in parse_durable_handle_context() smb/server: call ksmbd_session_rpc_close() on error path in create_smb2_pipe() ksmbd: Fix memory leak in get_file_all_info()	2026-01-02 09:24:43 -08:00
Linus Torvalds	047b4e783c	Merge tag 'v6.19-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 Pull smb client fixes from Steve French: - Fix array out of bounds error in copy_file_range - Add tracepoint to help debug ioctl failures * tag 'v6.19-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb: client: fix UBSAN array-index-out-of-bounds in smb2_copychunk_range smb3 client: add missing tracepoint for unsupported ioctls	2026-01-02 09:14:13 -08:00
Linus Torvalds	c8ebd43345	Merge tag 'nfsd-6.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: "A set of NFSD fixes that arrived just a bit late for the 6.19 merge window. Regression fix: - Avoid unnecessarily breaking a timestamp delegation Stable fixes: - Fix a crasher in nlm4svc_proc_test() - Fix nfsd_file reference leak during write delegation - Fix error flow in client_states_open()" * tag 'nfsd-6.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: nfsd: Drop the client reference in client_states_open() nfsd: use ATTR_DELEG in nfsd4_finalize_deleg_timestamps() nfsd: fix nfsd_file reference leak in nfsd4_add_rdaccess_to_wrdeleg() lockd: fix vfs_test_lock() calls	2025-12-30 17:56:26 -08:00
Henrique Carvalho	fa2fd0b10f	smb: client: fix UBSAN array-index-out-of-bounds in smb2_copychunk_range struct copychunk_ioctl_req::ChunkCount is annotated with __counted_by_le() as the number of elements in Chunks[]. smb2_copychunk_range reuses ChunkCount to store the number of chunks sent in the current iteration. If a later iteration populates more chunks than a previous one, the stale smaller value trips UBSAN. Set ChunkCount to chunk_count (allocated capacity) before populating Chunks[]. Fixes: `cc26f593dc` ("smb: move copychunk definitions to common/smb2pdu.h") Link: https://lore.kernel.org/linux-cifs/CAH2r5ms9AWLy8WZ04Cpq5XOeVK64tcrUQ6__iMW+yk1VPzo1BA@mail.gmail.com Tested-by: Youling Tang <tangyouling@kylinos.cn> Acked-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2025-12-30 09:17:41 -06:00
Steve French	bc31161162	smb3 client: add missing tracepoint for unsupported ioctls In debugging a recent problem with an xfstest, noticed that we weren't tracing cases where the ioctl was not supported. Add dynamic tracepoint: "trace-cmd record -e smb3_unsupported_ioctl" and then after running an app which calls unsupported ioctl, "trace-cmd show"would display e.g. xfs_io-7289 [012] ..... 1205.137765: smb3_unsupported_ioctl: xid=19 fid=0x4535bb84 ioctl cmd=0x801c581f Acked-by: Bharath SM <bharathsm@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2025-12-30 09:16:45 -06:00
ZhangGuoDong	f416c55699	smb/server: fix refcount leak in smb2_open() When ksmbd_vfs_getattr() fails, the reference count of ksmbd_file must be released. Suggested-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: ZhangGuoDong <zhangguodong@kylinos.cn> Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2025-12-29 17:39:58 -06:00
ZhangGuoDong	3296c3012a	smb/server: fix refcount leak in parse_durable_handle_context() When the command is a replay operation and -ENOEXEC is returned, the refcount of ksmbd_file must be released. Signed-off-by: ZhangGuoDong <zhangguodong@kylinos.cn> Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2025-12-29 17:39:58 -06:00
ZhangGuoDong	7c28f8eef5	smb/server: call ksmbd_session_rpc_close() on error path in create_smb2_pipe() When ksmbd_iov_pin_rsp() fails, we should call ksmbd_session_rpc_close(). Signed-off-by: ZhangGuoDong <zhangguodong@kylinos.cn> Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2025-12-29 17:39:57 -06:00
Zilin Guan	0c56693b06	ksmbd: Fix memory leak in get_file_all_info() In get_file_all_info(), if vfs_getattr() fails, the function returns immediately without freeing the allocated filename, leading to a memory leak. Fix this by freeing the filename before returning in this error case. Fixes: `5614c8c487` ("ksmbd: replace generic_fillattr with vfs_getattr") Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2025-12-29 17:39:57 -06:00
Linus Torvalds	04688d6128	Merge tag 'v6.19-rc2-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6 Pull smb client fix from Steve French: - Fix potential memory leak * tag 'v6.19-rc2-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6: cifs: Fix memory and information leak in smb3_reconfigure()	2025-12-26 16:19:45 -08:00
Linus Torvalds	1e5e062ad8	Merge tag 'driver-core-6.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core Pull driver core fixes from Danilo Krummrich: - Introduce DMA Rust helpers to avoid build errors when !CONFIG_HAS_DMA - Remove unnecessary (and hence incorrect) endian conversion in the Rust PCI driver sample code - Fix memory leak in the unwind path of debugfs_change_name() - Support non-const struct software_node pointers in SOFTWARE_NODE_REFERENCE(), after introducing _Generic() - Avoid NULL pointer dereference in the unwind path of simple_xattrs_free() * tag 'driver-core-6.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core: fs/kernfs: null-ptr deref in simple_xattrs_free() software node: Also support referencing non-constant software nodes debugfs: Fix memleak in debugfs_change_name(). samples: rust: fix endianness issue in rust_driver_pci rust: dma: add helpers for architectures without CONFIG_HAS_DMA	2025-12-26 13:41:02 -08:00
Linus Torvalds	e2cc644089	Merge tag 'v6.19-rc2-smb3-server-fixes' of git://git.samba.org/ksmbd Pull smb server fixes from Steve French: - Fix parsing of SMB1 negotiate request by adjusting offsets affected by the removal of the RFC1002 length field from the SMB header - Update minimum PDU size macros for both SMB1 and SMB2 - Rename smb2_get_msg function to smb_get_msg to better reflect its role in handling both SMB1 and SMB2 requests * tag 'v6.19-rc2-smb3-server-fixes' of git://git.samba.org/ksmbd: smb/server: fix minimum SMB2 PDU size smb/server: fix minimum SMB1 PDU size ksmbd: rename smb2_get_msg to smb_get_msg ksmbd: Fix to handle removal of rfc1002 header from smb_hdr	2025-12-26 10:03:25 -08:00
Haoxiang Li	1f941b2c23	nfsd: Drop the client reference in client_states_open() In error path, call drop_client() to drop the reference obtained by get_nfsdfs_clp(). Fixes: `78599c42ae` ("nfsd4: add file to display list of client's opens") Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Haoxiang Li <lihaoxiang@isrc.iscas.ac.cn> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-12-24 21:33:12 -05:00
Jeff Layton	8f9e967830	nfsd: use ATTR_DELEG in nfsd4_finalize_deleg_timestamps() When finalizing timestamps that have never been updated and preparing to release the delegation lease, the notify_change() call can trigger a delegation break, and fail to update the timestamps. When this happens, there will be messages like this in dmesg: [ 2709.375785] Unable to update timestamps on inode 00:39:263: -11 Since this code is going to release the lease just after updating the timestamps, breaking the delegation is undesirable. Fix this by setting ATTR_DELEG in ia_valid, in order to avoid the delegation break. Fixes: `e5e9b24ab8` ("nfsd: freeze c/mtime updates with outstanding WRITE_ATTRS delegation") Cc: stable@vger.kernel.org Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-12-24 21:32:34 -05:00
Chuck Lever	8072e34e13	nfsd: fix nfsd_file reference leak in nfsd4_add_rdaccess_to_wrdeleg() nfsd4_add_rdaccess_to_wrdeleg() unconditionally overwrites fp->fi_fds[O_RDONLY] with a newly acquired nfsd_file. However, if the client already has a SHARE_ACCESS_READ open from a previous OPEN operation, this action overwrites the existing pointer without releasing its reference, orphaning the previous reference. Additionally, the function originally stored the same nfsd_file pointer in both fp->fi_fds[O_RDONLY] and fp->fi_rdeleg_file with only a single reference. When put_deleg_file() runs, it clears fi_rdeleg_file and calls nfs4_file_put_access() to release the file. However, nfs4_file_put_access() only releases fi_fds[O_RDONLY] when the fi_access[O_RDONLY] counter drops to zero. If another READ open exists on the file, the counter remains elevated and the nfsd_file reference from the delegation is never released. This potentially causes open conflicts on that file. Then, on server shutdown, these leaks cause __nfsd_file_cache_purge() to encounter files with an elevated reference count that cannot be cleaned up, ultimately triggering a BUG() in kmem_cache_destroy() because there are still nfsd_file objects allocated in that cache. Fixes: `e7a8ebc305` ("NFSD: Offer write delegation for OPEN with OPEN4_SHARE_ACCESS_WRITE") Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-12-24 21:29:24 -05:00
NeilBrown	a49a2a1baa	lockd: fix vfs_test_lock() calls Usage of vfs_test_lock() is somewhat confused. Documentation suggests it is given a "lock" but this is not the case. It is given a struct file_lock which contains some details of the sort of lock it should be looking for. In particular passing a "file_lock" containing fl_lmops or fl_ops is meaningless and possibly confusing. This is particularly problematic in lockd. nlmsvc_testlock() receives an initialised "file_lock" from xdr-decode, including manager ops and an owner. It then mistakenly passes this to vfs_test_lock() which might replace the owner and the ops. This can lead to confusion when freeing the lock. The primary role of the 'struct file_lock' passed to vfs_test_lock() is to report a conflicting lock that was found, so it makes more sense for nlmsvc_testlock() to pass "conflock", which it uses for returning the conflicting lock. With this change, freeing of the lock is not confused and code in __nlm4svc_proc_test() and __nlmsvc_proc_test() can be simplified. Documentation for vfs_test_lock() is improved to reflect its real purpose, and a WARN_ON_ONCE() is added to avoid a similar problem in the future. Reported-by: Olga Kornievskaia <okorniev@redhat.com> Closes: https://lore.kernel.org/all/20251021130506.45065-1-okorniev@redhat.com Signed-off-by: NeilBrown <neil@brown.name> Fixes: `20fa190272` ("nfs: add export operations") Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-12-24 21:18:46 -05:00
Linus Torvalds	ccd1cdca5c	Merge tag 'nfsd-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: "A set of NFSD fixes that arrived just a bit late for the 6.19 merge window. Regression fixes: - Mark variable __maybe_unused to avoid W=1 build break Stable fixes: - NFSv4 file creation neglects setting ACL - Clear TIME_DELEG in the suppattr_exclcreat bitmap - Clear SECLABEL in the suppattr_exclcreat bitmap - Fix memory leak in nfsd_create_serv error paths - Bound check rq_pages index in inline path - Return 0 on success from svc_rdma_copy_inline_range - Use rc_pageoff for memcpy byte offset - Avoid NULL deref on zero length gss_token in gss_read_proxy_verf" * tag 'nfsd-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: NFSD: NFSv4 file creation neglects setting ACL NFSD: Clear TIME_DELEG in the suppattr_exclcreat bitmap NFSD: Clear SECLABEL in the suppattr_exclcreat bitmap nfsd: fix memory leak in nfsd_create_serv error paths nfsd: Mark variable __maybe_unused to avoid W=1 build break svcrdma: bound check rq_pages index in inline path svcrdma: return 0 on success from svc_rdma_copy_inline_range svcrdma: use rc_pageoff for memcpy byte offset SUNRPC: svcauth_gss: avoid NULL deref on zero length gss_token in gss_read_proxy_verf	2025-12-24 09:23:04 -08:00
Linus Torvalds	ce93692d68	Merge tag 'erofs-for-6.19-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fix from Gao Xiang: "Junbeom reported that synchronous reads could hit unintended EIOs under memory pressure due to incorrect error propagation in z_erofs_decompress_queue(), where earlier physical clusters in the same decompression queue may be served for another readahead. This addresses the issue by decompressing each physical cluster independently as long as disk I/Os succeed, rather than being impacted by the error status of previous physical clusters in the same queue. Summary: - Fix unexpected EIOs under memory pressure caused by recent incorrect error propagation logic" * tag 'erofs-for-6.19-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: fix unexpected EIO under memory pressure	2025-12-24 09:15:30 -08:00
Zilin Guan	cb6d5aa9c0	cifs: Fix memory and information leak in smb3_reconfigure() In smb3_reconfigure(), if smb3_sync_session_ctx_passwords() fails, the function returns immediately without freeing and erasing the newly allocated new_password and new_password2. This causes both a memory leak and a potential information leak. Fix this by calling kfree_sensitive() on both password buffers before returning in this error case. Fixes: `0f0e357902` ("cifs: during remount, make sure passwords are in sync") Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Signed-off-by: Steve French <stfrench@microsoft.com>	2025-12-24 11:07:15 -06:00
Tyler Hicks	5c56afd204	ecryptfs: Release lower parent dentry after creating dir Fix a mkdir-induced usage count imbalance that tripped a umount_check() BUG while unmounting the lower filesystem. Commit `f046fbb4d8` ("ecryptfs: use new start_creating/start_removing APIs") added a new dget() of the lower parent dir, in ecryptfs_mkdir(), but did not dput() the dentry before returning from that function. The BUG output as seen while running the eCryptfs test suite: $ ./run_tests.sh -b 131072 -c safe,destructive -f ext4 -K -t lp-926292.sh ... Running eCryptfs filesystem tests on ext4 lp-926292 ------------[ cut here ]------------ BUG: Dentry ffff8e6692d11988{i=c,n=ECRYPTFS_FNEK_ENCRYPTED.FXZuRGZL7QAFtER.JeA46DtdKqkkQx9H2Vpmv234J5CU8YSsrUwZJK4AbXbrN5WkZ348wnqstovKKxA-} still in use (1) [unmount of ext4 loop0] WARNING: CPU: 7 PID: 950 at fs/dcache.c:1590 umount_check+0x5e/0x80 Modules linked in: md5 libmd5 ecryptfs encrypted_keys ext4 crc16 mbcache jbd2 CPU: 7 UID: 0 PID: 950 Comm: umount Not tainted 6.18.0-rc1-00013-gf046fbb4d81d #17 PREEMPT(full) Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 RIP: 0010:umount_check+0x5e/0x80 Code: 88 38 06 00 00 48 8b 40 28 4c 8b 08 48 8b 46 68 48 85 c0 74 04 48 8b 50 38 51 48 c7 c7 60 32 9c b5 48 89 f1 e8 43 5e ca ff 90 <0f> 0b 90 90 58 31 c0 e9 46 9d 6c 00 41 83 f8 01 75 b8 eb a3 66 66 RSP: 0018:ffffa19940c4bdd0 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff8e6692fad4c0 RCX: 0000000000000000 RDX: 0000000000000004 RSI: ffffa19940c4bc70 RDI: 00000000ffffffff RBP: ffffffffb4eb5930 R08: 00000000ffffdfff R09: 0000000000000001 R10: 00000000ffffdfff R11: ffffffffb5c8a9e0 R12: ffff8e6692fad4c0 R13: ffff8e6692fad4c0 R14: ffff8e6692d11a40 R15: ffff8e6692d11988 FS: 00007f6b4b491800(0000) GS:ffff8e670506e000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f6b4b5f8d40 CR3: 0000000114eb7001 CR4: 0000000000772ef0 PKRU: 55555554 Call Trace: <TASK> d_walk+0xfd/0x370 shrink_dcache_for_umount+0x4d/0x140 generic_shutdown_super+0x20/0x160 kill_block_super+0x1a/0x40 ext4_kill_sb+0x22/0x40 [ext4] deactivate_locked_super+0x33/0xa0 cleanup_mnt+0xba/0x150 task_work_run+0x5c/0xa0 exit_to_user_mode_loop+0xac/0xb0 do_syscall_64+0x2ab/0xfa0 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f6b4b6c2a2b Code: c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 b9 83 0d 00 f7 d8 RSP: 002b:00007ffcd5b8b498 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 RAX: 0000000000000000 RBX: 000055b84af0b9e0 RCX: 00007f6b4b6c2a2b RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000055b84af0bdf0 RBP: 00007ffcd5b8b570 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000103 R11: 0000000000000246 R12: 000055b84af0bae0 R13: 0000000000000000 R14: 000055b84af0bdf0 R15: 0000000000000000 </TASK> ---[ end trace 0000000000000000 ]--- EXT4-fs (loop0): unmounting filesystem 00d9ea41-f61e-43d0-a449-6be03e7e8428. EXT4-fs (loop0): sb orphan head is 12 sb_info orphan list: inode loop0:12 at ffff8e66950e1df0: mode 40700, nlink 0, next 0 Assertion failure in ext4_put_super() at fs/ext4/super.c:1345: 'list_empty(&sbi->s_orphan)' Fixes: `f046fbb4d8` ("ecryptfs: use new start_creating/start_removing APIs") Signed-off-by: Tyler Hicks <code@tyhicks.com> Link: https://patch.msgid.link/20251223194153.2818445-3-code@tyhicks.com Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-12-24 13:58:04 +01:00
Tyler Hicks	5f9ad16bcc	ecryptfs: Fix improper mknod pairing of start_creating()/end_removing() The ecryptfs_start_creating_dentry() function must be paired with the end_creating() function. Fix ecryptfs_mknod() so that end_creating() is properly called in the return path, instead of end_removing(). Fixes: `f046fbb4d8` ("ecryptfs: use new start_creating/start_removing APIs") Signed-off-by: Tyler Hicks <code@tyhicks.com> Link: https://patch.msgid.link/20251223194153.2818445-2-code@tyhicks.com Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-12-24 13:58:04 +01:00
Bagas Sanjaya	73a91ef328	VFS: fix __start_dirop() kernel-doc warnings Sphinx report kernel-doc warnings: WARNING: ./fs/namei.c:2853 function parameter 'state' not described in '__start_dirop' WARNING: ./fs/namei.c:2853 expecting prototype for start_dirop(). Prototype was for __start_dirop() instead Fix them up. Fixes: `ff7c4ea11a` ("VFS: add start_creating_killable() and start_removing_killable()") Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Link: https://patch.msgid.link/20251219024620.22880-3-bagasdotme@gmail.com Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-12-24 13:33:24 +01:00
Bagas Sanjaya	fe33729d29	fs: Describe @isnew parameter in ilookup5_nowait() Sphinx reports kernel-doc warning: WARNING: ./fs/inode.c:1607 function parameter 'isnew' not described in 'ilookup5_nowait' Describe the parameter. Fixes: `a27628f436` ("fs: rework I_NEW handling to operate without fences") Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Link: https://patch.msgid.link/20251219024620.22880-2-bagasdotme@gmail.com Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-12-24 13:33:24 +01:00
Mateusz Guzik	46af9ae130	fs: make sure to fail try_to_unlazy() and try_to_unlazy() for LOOKUP_CACHED Otherwise the slowpath can be taken by the caller, defeating the flag. This regressed after calls to legitimize_links() started being conditionally elided and stems from the routine always failing after seeing the flag, regardless if there were any links. In order to address both the bug and the weird semantics make it illegal to call legitimize_links() with LOOKUP_CACHED and handle the problem at the two callsites. Fixes: `7c179096e7` ("fs: add predicts based on nd->depth") Reported-by: Chris Mason <clm@meta.com> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://patch.msgid.link/20251220054023.142134-1-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-12-24 13:31:56 +01:00
David Howells	570ad253a3	netfs: Fix early read unlock of page with EOF in middle The read result collection for buffered reads seems to run ahead of the completion of subrequests under some circumstances, as can be seen in the following log snippet: 9p_client_res: client 18446612686390831168 response P9_TREAD tag 0 err 0 ... netfs_sreq: R=00001b55[1] DOWN TERM f=192 s=0 5fb2/5fb2 s=5 e=0 ... netfs_collect_folio: R=00001b55 ix=00004 r=4000-5000 t=4000/5fb2 netfs_folio: i=157f3 ix=00004-00004 read-done netfs_folio: i=157f3 ix=00004-00004 read-unlock netfs_collect_folio: R=00001b55 ix=00005 r=5000-5fb2 t=5000/5fb2 netfs_folio: i=157f3 ix=00005-00005 read-done netfs_folio: i=157f3 ix=00005-00005 read-unlock ... netfs_collect_stream: R=00001b55[0:] cto=5fb2 frn=ffffffff netfs_collect_state: R=00001b55 col=5fb2 cln=6000 n=c netfs_collect_stream: R=00001b55[0:] cto=5fb2 frn=ffffffff netfs_collect_state: R=00001b55 col=5fb2 cln=6000 n=8 ... netfs_sreq: R=00001b55[2] ZERO SUBMT f=000 s=5fb2 0/4e s=0 e=0 netfs_sreq: R=00001b55[2] ZERO TERM f=102 s=5fb2 4e/4e s=5 e=0 The 'cto=5fb2' indicates the collected file pos we've collected results to so far - but we still have 0x4e more bytes to go - so we shouldn't have collected folio ix=00005 yet. The 'ZERO' subreq that clears the tail happens after we unlock the folio, allowing the application to see the uncleared tail through mmap. The problem is that netfs_read_unlock_folios() will unlock a folio in which the amount of read results collected hits EOF position - but the ZERO subreq lies beyond that and so happens after. Fix this by changing the end check to always be the end of the folio and never the end of the file. In the future, I should look at clearing to the end of the folio here rather than adding a ZERO subreq to do this. On the other hand, the ZERO subreq can run in parallel with an async READ subreq. Further, the ZERO subreq may still be necessary to, say, handle extents in a ceph file that don't have any backing store and are thus implicitly all zeros. This can be reproduced by creating a file, the size of which doesn't align to a page boundary, e.g. 24998 (0x5fb2) bytes and then doing something like: xfs_io -c "mmap -r 0 0x6000" -c "madvise -d 0 0x6000" \ -c "mread -v 0 0x6000" /xfstest.test/x The last 0x4e bytes should all be 00, but if the tail hasn't been cleared yet, you may see rubbish there. This can be reproduced with kafs by modifying the kernel to disable the call to netfs_read_subreq_progress() and to stop afs_issue_read() from doing the async call for NETFS_READAHEAD. Reproduction can be made easier by inserting an mdelay(100) in netfs_issue_read() for the ZERO-subreq case. AFS and CIFS are normally unlikely to show this as they dispatch READ ops asynchronously, which allows the ZERO-subreq to finish first. 9P's READ op is completely synchronous, so the ZERO-subreq will always happen after. It isn't seen all the time, though, because the collection may be done in a worker thread. Reported-by: Christian Schoenebeck <linux_oss@crudebyte.com> Link: https://lore.kernel.org/r/8622834.T7Z3S40VBb@weasel/ Signed-off-by: David Howells <dhowells@redhat.com> Link: https://patch.msgid.link/938162.1766233900@warthog.procyon.org.uk Fixes: `e2d46f2ec3` ("netfs: Change the read result collector to only use one work item") Tested-by: Christian Schoenebeck <linux_oss@crudebyte.com> Acked-by: Dominique Martinet <asmadeus@codewreck.org> Suggested-by: Dominique Martinet <asmadeus@codewreck.org> cc: Dominique Martinet <asmadeus@codewreck.org> cc: Christian Schoenebeck <linux_oss@crudebyte.com> cc: v9fs@lists.linux.dev cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-12-24 13:30:24 +01:00
Will Rosenberg	2b74209458	fs/kernfs: null-ptr deref in simple_xattrs_free() There exists a null pointer dereference in simple_xattrs_free() as part of the __kernfs_new_node() routine. Within __kernfs_new_node(), err_out4 calls simple_xattr_free(), but kn->iattr may be NULL if __kernfs_setattr() was never called. As a result, the first argument to simple_xattrs_free() may be NULL + 0x38, and no NULL check is done internally, causing an incorrect pointer dereference. Add a check to ensure kn->iattr is not NULL, meaning __kernfs_setattr() has been called and kn->iattr is allocated. Note that struct kernfs_node kn is allocated with kmem_cache_zalloc, so we can assume kn->iattr will be NULL if not allocated. An alternative fix could be to not call simple_xattrs_free() at all. As was previously discussed during the initial patch, simple_xattrs_free() is not strictly needed and is included to be consistent with kernfs_free_rcu(), which also helps the function maintain correctness if changes are made in __kernfs_new_node(). Reported-by: syzbot+6aaf7f48ae034ab0ea97@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=6aaf7f48ae034ab0ea97 Fixes: `382b1e8f30` ("kernfs: fix memory leak of kernfs_iattrs in __kernfs_new_node") Signed-off-by: Will Rosenberg <whrosenb@asu.edu> Link: https://patch.msgid.link/20251217060107.4171558-1-whrosenb@asu.edu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-12-23 16:14:43 +01:00
ChenXiaoSong	4c7d8eb9a7	smb/server: fix minimum SMB2 PDU size The minimum SMB2 PDU size should be updated to the size of `struct smb2_pdu` (that is, the size of `struct smb2_hdr` + 2). Suggested-by: David Howells <dhowells@redhat.com> Suggested-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Reviewed-by: David Howells <dhowells@redhat.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2025-12-21 19:20:46 -06:00

1 2 3 4 5 ...

103123 Commits