linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-06-08 18:00:19 -04:00

Author	SHA1	Message	Date
Darrick J. Wong	bd3138e891	xfs: fix remote xattr valuelblk check In debugging other problems with generic/753, it turns out that it's possible for the system go to down in the middle of a remote xattr set operation such that the leaf block entry is marked incomplete and valueblk is set to zero. Make this no longer a failure. Cc: <stable@vger.kernel.org> # v4.15 Fixes: `13791d3b83` ("xfs: scrub extended attribute leaf space") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2026-01-23 09:27:33 -08:00
Darrick J. Wong	6fed827044	xfs: fix the xattr scrub to detect freemap/entries array collisions In the previous patches, we observed that it's possible for there to be freemap entries with zero size but a nonzero base. This isn't an inconsistency per se, but older kernels can get confused by this and corrupt the block, leading to corruption. If we see this, flag the xattr structure for optimization so that it gets rebuilt. Cc: <stable@vger.kernel.org> # v4.15 Fixes: `13791d3b83` ("xfs: scrub extended attribute leaf space") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2026-01-23 09:27:33 -08:00
Darrick J. Wong	27a0c41f33	xfs: strengthen attr leaf block freemap checking Check for erroneous overlapping freemap regions and collisions between freemap regions and the xattr leaf entry array. Note that we must explicitly zero out the extra freemaps in xfs_attr3_leaf_compact so that the in-memory buffer has a correctly initialized freemap array to satisfy the new verification code, even if subsequent code changes the contents before unlocking the buffer. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2026-01-23 09:27:32 -08:00
Darrick J. Wong	a165f7e763	xfs: refactor attr3 leaf table size computation Replace all the open-coded callsites with a single static inline helper. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2026-01-23 09:27:31 -08:00
Darrick J. Wong	3eefc0c2b7	xfs: fix freemap adjustments when adding xattrs to leaf blocks xfs/592 and xfs/794 both trip this assertion in the leaf block freemap adjustment code after ~20 minutes of running on my test VMs: ASSERT(ichdr->firstused >= ichdr->count * sizeof(xfs_attr_leaf_entry_t) + xfs_attr3_leaf_hdr_size(leaf)); Upon enabling quite a lot more debugging code, I narrowed this down to fsstress trying to set a local extended attribute with namelen=3 and valuelen=71. This results in an entry size of 80 bytes. At the start of xfs_attr3_leaf_add_work, the freemap looks like this: i 0 base 448 size 0 rhs 448 count 46 i 1 base 388 size 132 rhs 448 count 46 i 2 base 2120 size 4 rhs 448 count 46 firstused = 520 where "rhs" is the first byte past the end of the leaf entry array. This is inconsistent -- the entries array ends at byte 448, but freemap[1] says there's free space starting at byte 388! By the end of the function, the freemap is in worse shape: i 0 base 456 size 0 rhs 456 count 47 i 1 base 388 size 52 rhs 456 count 47 i 2 base 2120 size 4 rhs 456 count 47 firstused = 440 Important note: 388 is not aligned with the entries array element size of 8 bytes. Based on the incorrect freemap, the name area starts at byte 440, which is below the end of the entries array! That's why the assertion triggers and the filesystem shuts down. How did we end up here? First, recall from the previous patch that the freemap array in an xattr leaf block is not intended to be a comprehensive map of all free space in the leaf block. In other words, it's perfectly legal to have a leaf block with: * 376 bytes in use by the entries array * freemap[0] has [base = 376, size = 8] * freemap[1] has [base = 388, size = 1500] * the space between 376 and 388 is free, but the freemap stopped tracking that some time ago If we add one xattr, the entries array grows to 384 bytes, and freemap[0] becomes [base = 384, size = 0]. So far, so good. But if we add a second xattr, the entries array grows to 392 bytes, and freemap[0] gets pushed up to [base = 392, size = 0]. This is bad, because freemap[1] hasn't been updated, and now the entries array and the free space claim the same space. The fix here is to adjust all freemap entries so that none of them collide with the entries array. Note that this fix relies on commit `2a2b5932db` ("xfs: fix attr leaf header freemap.size underflow") and the previous patch that resets zero length freemap entries to have base = 0. Cc: <stable@vger.kernel.org> # v2.6.12 Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2026-01-23 09:27:31 -08:00
Darrick J. Wong	6f13c1d2a6	xfs: delete attr leaf freemap entries when empty Back in commit `2a2b5932db` ("xfs: fix attr leaf header freemap.size underflow"), Brian Foster observed that it's possible for a small freemap at the end of the end of the xattr entries array to experience a size underflow when subtracting the space consumed by an expansion of the entries array. There are only three freemap entries, which means that it is not a complete index of all free space in the leaf block. This code can leave behind a zero-length freemap entry with a nonzero base. Subsequent setxattr operations can increase the base up to the point that it overlaps with another freemap entry. This isn't in and of itself a problem because the code in _leaf_add that finds free space ignores any freemap entry with zero size. However, there's another bug in the freemap update code in _leaf_add, which is that it fails to update a freemap entry that begins midway through the xattr entry that was just appended to the array. That can result in the freemap containing two entries with the same base but different sizes (0 for the "pushed-up" entry, nonzero for the entry that's actually tracking free space). A subsequent _leaf_add can then allocate xattr namevalue entries on top of the entries array, leading to data loss. But fixing that is for later. For now, eliminate the possibility of confusion by zeroing out the base of any freemap entry that has zero size. Because the freemap is not intended to be a complete index of free space, a subsequent failure to find any free space for a new xattr will trigger block compaction, which regenerates the freemap. It looks like this bug has been in the codebase for quite a long time. Cc: <stable@vger.kernel.org> # v2.6.12 Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2026-01-23 09:27:30 -08:00
Wenwu Hou	a1ca658d64	xfs: fix incorrect context handling in xfs_trans_roll The memalloc_nofs_save() and memalloc_nofs_restore() calls are incorrectly paired in xfs_trans_roll. Call path: xfs_trans_alloc() __xfs_trans_alloc() // tp->t_pflags = memalloc_nofs_save(); xfs_trans_set_context() ... xfs_defer_trans_roll() xfs_trans_roll() xfs_trans_dup() // old_tp->t_pflags = 0; xfs_trans_switch_context() __xfs_trans_commit() xfs_trans_free() // memalloc_nofs_restore(tp->t_pflags); xfs_trans_clear_context() The code passes 0 to memalloc_nofs_restore() when committing the original transaction, but memalloc_nofs_restore() should always receive the flags returned from the paired memalloc_nofs_save() call. Before commit `3f6d5e6a46` ("mm: introduce memalloc_flags_{save,restore}"), calling memalloc_nofs_restore(0) would unset the PF_MEMALLOC_NOFS flag, which could cause memory allocation deadlocks[1]. Fortunately, after that commit, memalloc_nofs_restore(0) does nothing, so this issue is currently harmless. Fixes: `756b1c3433` ("xfs: use current->journal_info for detecting transaction recursion") Link: https://lore.kernel.org/linux-xfs/20251104131857.1587584-1-leo.lilong@huawei.com [1] Signed-off-by: Wenwu Hou <hwenwur@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-01-21 12:59:10 +01:00
Hans Holmberg	01a2896154	xfs: always allocate the free zone with the lowest index Zones in the beginning of the address space are typically mapped to higer bandwidth tracks on HDDs than those at the end of the address space. So, in stead of allocating zones "round robin" across the whole address space, always allocate the zone with the lowest index. This increases average write bandwidth for overwrite workloads when less than the full capacity is being used. At ~50% utilization this improves bandwidth for a random file overwrite benchmark with 128MiB files and 256MiB zone capacity by 30%. Running the same benchmark with small 2-8 MiB files at 67% capacity shows no significant difference in performance. Due to heavy fragmentation the whole zone range is in use, greatly limiting the number of free zones with high bw. Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-01-21 12:57:17 +01:00
Darrick J. Wong	4d6d335ea9	xfs: promote metadata directories and large block support Large block support was merged upstream in 6.12 (Dec 2024) and metadata directories was merged in 6.13 (Jan 2025). We've not received any serious complaints about the ondisk formats of these two features in the past year, so let's remove the experimental warnings. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-01-21 12:57:17 +01:00
Christoph Hellwig	12d12dcc15	xfs: use blkdev_get_zone_info to simplify zone reporting Unwind the callback based programming model by querying the cached zone information using blkdev_get_zone_info. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-01-21 12:57:17 +01:00
Christoph Hellwig	b37c1e4e9a	xfs: check that used blocks are smaller than the write pointer Any used block must have been written, this reject used blocks > write pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-01-21 12:57:17 +01:00
Christoph Hellwig	19c5b6051e	xfs: split and refactor zone validation Currently xfs_zone_validate mixes validating the software zone state in the XFS realtime group with validating the hardware state reported in struct blk_zone and deriving the write pointer from that. Move all code that works on the realtime group to xfs_init_zone, and only keep the hardware state validation in xfs_zone_validate. This makes the code more clear, and allows for better reuse in userspace. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-01-21 12:57:17 +01:00
Christoph Hellwig	776b76f754	xfs: pass the write pointer to xfs_init_zone Move the two methods to query the write pointer out of xfs_init_zone into the callers, so that xfs_init_zone doesn't have to bother with the blk_zone structure and instead operates purely at the XFS realtime group level. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-01-21 12:57:17 +01:00
Christoph Hellwig	fc633b5c5b	xfs: add a xfs_rtgroup_raw_size helper Add a helper to figure the on-disk size of a group, accounting for the XFS_SB_FEAT_INCOMPAT_ZONE_GAPS feature if needed. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-01-21 12:57:17 +01:00
Damien Le Moal	41263267ef	xfs: add missing forward declaration in xfs_zones.h Add the missing forward declaration for struct blk_zone in xfs_zones.h. This avoids headaches with the order of header file inclusion to avoid compilation errors. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-01-21 12:57:16 +01:00
Christoph Hellwig	3a65ea768b	xfs: remove xfs_attr_leaf_hasname The calling convention of xfs_attr_leaf_hasname() is problematic, because it returns a NULL buffer when xfs_attr3_leaf_read fails, a valid buffer when xfs_attr3_leaf_lookup_int returns -ENOATTR or -EEXIST, and a non-NULL buffer pointer for an already released buffer when xfs_attr3_leaf_lookup_int fails with other error values. Fix this by simply open coding xfs_attr_leaf_hasname in the callers, so that the buffer release code is done by each caller of xfs_attr3_leaf_read. Cc: stable@vger.kernel.org # v5.19+ Fixes: `07120f1abd` ("xfs: Add xfs_has_attr and subroutines") Reported-by: Mark Tinguely <mark.tinguely@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-01-21 12:57:16 +01:00
Darrick J. Wong	f39854a3fb	xfs: mark data structures corrupt on EIO and ENODATA I learned a few things this year: first, blk_status_to_errno can return ENODATA for critical media errors; and second, the scrub code doesn't mark data structures as corrupt on ENODATA or EIO. Currently, scrub failing to capture these errors isn't all that impactful -- the checking code will exit to userspace with EIO/ENODATA, and xfs_scrub will log a complaint and exit with nonzero status. Most people treat fsck tools failing as a sign that the fs is corrupt, but online fsck should mark the metadata bad and keep moving. Cc: stable@vger.kernel.org # v4.15 Fixes: `4700d22980` ("xfs: create helpers to record and deal with scrub problems") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-01-21 12:57:16 +01:00
Christoph Hellwig	102f444b57	xfs: rework zone GC buffer management The double buffering where just one scratch area is used at a time does not efficiently use the available memory. It was originally implemented when GC I/O could happen out of order, but that was removed before upstream submission to avoid fragmentation. Now that all GC I/Os are processed in order, just use a number of buffers as a simple ring buffer. For a synthetic benchmark that fills 256MiB HDD zones and punches out holes to free half the space this leads to a decrease of GC time by a little more than 25%. Thanks to Hans Holmberg <hans.holmberg@wdc.com> for testing and benchmarking. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-01-21 12:57:16 +01:00
Christoph Hellwig	0506d32f7c	xfs: use bio_reuse in the zone GC code Replace our somewhat fragile code to reuse the bio, which caused a regression in the past with the block layer bio_reuse helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-01-21 12:57:16 +01:00
Christoph Hellwig	cf9b52fa7d	xfs: directly include xfs_platform.h The xfs.h header conflicts with the public xfs.h in xfsprogs, leading to a spurious difference in all shared libxfs files that have to include libxfs_priv.h in userspace. Directly include xfs_platform.h so that we can add a header of the same name to xfsprogs and remove this major annoyance for the shared code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-01-21 12:57:16 +01:00
Christoph Hellwig	19a46f1246	xfs: move the remaining content from xfs.h to xfs_platform.h Move the global defines from xfs.h to xfs_platform.h to prepare for removing xfs.h. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-01-21 12:57:16 +01:00
Christoph Hellwig	501a5161d2	xfs: include global headers first in xfs_platform.h Ensure we have all kernel headers included by the time we do our own thing, just like the rest of the tree. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-01-21 12:57:16 +01:00
Christoph Hellwig	971ffb6341	xfs: rename xfs_linux.h to xfs_platform.h Rename xfs_linux.h to prepare for including including it directly from source files including those shared with xfsprogs. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-01-21 12:57:16 +01:00
Christoph Hellwig	a10b44cf10	xfs: factor out a xlog_write_space_advance helper Add a new xlog_write_space_advance that returns the current place in the iclog that data is written to, and advances the various counters by the amount taken from xlog_write_iovec, and also use it xlog_write_partial, which open codes the counter adjustments, but misses the asserts. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-01-21 12:57:16 +01:00
Christoph Hellwig	e2663443da	xfs: improve the iclog space assert in xlog_write_iovec We need enough space for the length we copy into the iclog, not just some space, so tighten up the check a bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-01-21 12:57:16 +01:00
Christoph Hellwig	865970d49a	xfs: add a xlog_write_space_left helper Various places check how much space is left in the current iclog, add a helper for that. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-01-21 12:57:16 +01:00
Christoph Hellwig	a3eb1f9cf8	xfs: improve the calling convention for the xlog_write helpers The xlog_write chain passes around the same seven variables that are often passed by reference. Add a xlog_write_data structure to contain them to improve code generation and readability. This change increases the generated code size by about 140 bytes for my x86_64 build, which is hopefully worth the much easier to follow code: $ size fs/xfs/xfs_log.o* text data bss dec hex filename 29300 1730 176 31206 79e6 fs/xfs/xfs_log.o 29160 1730 176 31066 795a fs/xfs/xfs_log.o.old Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-01-21 12:57:16 +01:00
Christoph Hellwig	a82d7aac75	xfs: regularize iclog space accounting in xlog_write_partial When xlog_write_partial splits a log region over multiple iclogs, it has to include the continuation ophder in the length requested for the new iclog. Currently is simply adds that to the request, which makes the accounting of the used space below look slightly different from the other users of iclog space that decrement it. To prepare for more code sharing, add the ophdr size to the len variable that tracks the number of bytes still are left in this xlog_write operation before the calling xlog_write_get_more_iclog_space, and then decrement it later when consuming that space. This changes the value of len when xlog_write_get_more_iclog_space returns an error, but as nothing looks at len in that case the difference doesn't matter. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-01-21 12:57:16 +01:00
Christoph Hellwig	2499d91180	xfs: move struct xfs_log_vec to xfs_log_priv.h The log_vec is a private type for the log/CIL code and should not be exposed to anything else. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-01-21 12:57:16 +01:00
Christoph Hellwig	0274105914	xfs: move struct xfs_log_iovec to xfs_log_priv.h This structure is now only used by the core logging and CIL code. Also remove the unused typedef. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-01-21 12:57:16 +01:00
Christoph Hellwig	8e76253443	xfs: improve the ->iop_format interface Export a higher level interface to format log items. The xlog_format_buf structure is hidden inside xfs_log_cil.c and only accessed using two helpers (and a wrapper build on top), hiding details of log iovecs from the log items. This also allows simply using an index into lv_iovecp instead of keeping a cursor vec. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-01-21 12:57:16 +01:00
Christoph Hellwig	c53fbeedbe	xfs: set lv_bytes in xlog_write_one_vec lv_bytes is mostly just use by the CIL code, but has crept into the low-level log writing code to decide on a full or partial iclog write. Ensure it is valid even for the special log writes that don't go through the CIL by initializing it in xlog_write_one_vec. Note that even without this fix, the checkpoint commits would never trigger a partial iclog write, as they have no payload beyond the opheader. The unmount record on the other hand could in theory trigger a an overflow of the iclog, but given that is has never been seen in the wild this has probably been masked by the small size of it and the fact that the unmount process does multiple log forces before writing the unmount record and we thus usually operate on an empty or almost empty iclog. Fixes: `110dc24ad2` ("xfs: log vector rounding leaks log space") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-01-21 12:57:16 +01:00
Christoph Hellwig	2d4521e4c0	xfs: add a xlog_write_one_vec helper Add a wrapper for xlog_write for the two callers who need to build a log_vec and add it to a single-entry chain instead of duplicating the code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-01-21 12:57:16 +01:00
Linus Torvalds	f8907398a6	Merge tag 'ext4_for_linus-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: - Fix an inconsistency in structure size on 32-bit platforms caused by padding differences for the new EXT4_IOC_[GS]ET_TUNE_SB_PARAM ioctls - Fix a buffer leak on the error path when dropping the refcount an xattr value stored in an inode - Fix missing locking on the error path for the file defragmentation ioctl leading to a BUG * tag 'ext4_for_linus-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix iloc.bh leak in ext4_xattr_inode_update_ref ext4: add missing down_write_data_sem in mext_move_extent(). ext4: fix ext4_tune_sb_params padding	2026-01-18 14:01:20 -08:00
Yang Erkun	d250bdf531	ext4: fix iloc.bh leak in ext4_xattr_inode_update_ref The error branch for ext4_xattr_inode_update_ref forget to release the refcount for iloc.bh. Find this when review code. Fixes: `57295e8354` ("ext4: guard against EA inode refcount underflow in xattr update") Signed-off-by: Yang Erkun <yangerkun@huawei.com> Reviewed-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20251213055706.3417529-1-yangerkun@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org	2026-01-18 11:23:10 -05:00
Julian Sun	0ef7ef4227	ext4: add missing down_write_data_sem in mext_move_extent(). Commit `962e8a01ea` ("ext4: introduce mext_move_extent()") attempts to call ext4_swap_extents() on the failure path to recover the swapped extents, but fails to acquire locks for the two inode->i_data_sem, triggering the BUG_ON statement in ext4_swap_extents(). This issue can be fixed by calling ext4_double_down_write_data_sem() before ext4_swap_extents(). Signed-off-by: Julian Sun <sunjunchao@bytedance.com> Reported-by: syzbot+4ea6bd8737669b423aae@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/69368649.a70a0220.38f243.0093.GAE@google.com/ Fixes: `962e8a01ea` ("ext4: introduce mext_move_extent()") Reviewed-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Link: https://patch.msgid.link/20251208123713.1971068-1-sunjunchao@bytedance.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2026-01-18 11:23:10 -05:00
Linus Torvalds	e84d960149	Merge tag 'for-6.19-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - with large folios in use, fix partial incorrect update of a reflinked range - fix potential deadlock in iget when lookup fails and eviction is needed - in send, validate inline extent type while detecting file holes - fix memory leak after an error when creating a space info - remove zone statistics from sysfs again, the output size limitations make it unusable, we'll do it in another way in another release - test fixes: - return proper error codes from block remapping tests - fix tree root leaks in qgroup tests after errors * tag 'for-6.19-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: remove zoned statistics from sysfs btrfs: fix memory leaks in create_space_info() error paths btrfs: invalidate pages instead of truncate after reflinking btrfs: update the Kconfig string for CONFIG_BTRFS_EXPERIMENTAL btrfs: send: check for inline extents in range_is_hole_in_parent() btrfs: tests: fix return 0 on rmap test failure btrfs: tests: fix root tree leak in btrfs_test_qgroups() btrfs: release path before iget_failed() in btrfs_read_locked_inode()	2026-01-17 19:29:32 -08:00
Linus Torvalds	353c6f43ab	Merge tag 'xfs-fixes-6.19-rc6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs fixes from Carlos Maiolino: "Just a few obvious fixes and some 'cosmetic' changes" * tag 'xfs-fixes-6.19-rc6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: set max_agbno to allow sparse alloc of last full inode chunk xfs: Fix xfs_grow_last_rtg() xfs: improve the assert at the top of xfs_log_cover xfs: fix an overly long line in xfs_rtgroup_calc_geometry xfs: mark __xfs_rtgroup_extents static xfs: Fix the return value of xfs_rtcopy_summary() xfs: fix memory leak in xfs_growfs_check_rtgeom()	2026-01-16 09:09:41 -08:00
Linus Torvalds	603c05a163	Merge tag 'nfs-for-6.19-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client fixes from Trond Myklebust: - Fix another deadlock involving nfs_release_folio() - localio: - Stop I/O upon hitting a fatal error - Deal with page offsets that are > PAGE_SIZE - Fix size read races in truncate, fallocate and copy offload - Several bugfixes for the NFSv4.x directory delegation client code - pNFS: - Fix a deadlock when returning delegations during open - Fix memory leaks in various error paths * tag 'nfs-for-6.19-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFS: Fix size read races in truncate, fallocate and copy offload NFS: Don't immediately return directory delegations when disabled NFS/localio: Deal with page bases that are > PAGE_SIZE NFS/localio: Stop further I/O upon hitting an error NFSv4.x: Directory delegations don't require any state recovery NFSv4: Don't free slots prematurely if requesting a directory delegation NFSv4: Fix nfs_clear_verifier_delegated() for delegated directories NFS: Fix directory delegation verifier checks pnfs/blocklayout: Fix memory leak in bl_parse_scsi() pnfs/flexfiles: Fix memory leak in nfs4_ff_alloc_deviceid_node() NFS: Fix a deadlock involving nfs_release_folio() pNFS: Fix a deadlock when returning a delegation during open()	2026-01-15 11:59:49 -08:00
Trond Myklebust	d5811e6297	NFS: Fix size read races in truncate, fallocate and copy offload If the pre-operation file size is read before locking the inode and quiescing O_DIRECT writes, then nfs_truncate_last_folio() might end up overwriting valid file data. Fixes: `b1817b18ff` ("NFS: Protect against 'eof page pollution'") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2026-01-15 14:38:25 -05:00
Johannes Thumshirn	437cc6057e	btrfs: remove zoned statistics from sysfs Remove the newly introduced zoned statistics from sysfs, as sysfs can only show a single page this will truncate the output on a busy filesystem. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2026-01-14 22:08:04 +01:00
Linus Torvalds	b54345928f	Merge tag 'gfs2-for-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 revert from Andreas Gruenbacher: "Revert bad commit "gfs2: Fix use of bio_chain" I was originally assuming that there must be a bug in gfs2 because gfs2 chains bios in the opposite direction of what bio_chain_and_submit() expects. It turns out that the bio chains are set up in "reverse direction" intentionally so that the first bio's bi_end_io callback is invoked rather than the last bio's callback. We want the first bio's callback invoked for the following reason: The initial bio starts page aligned and covers one or more pages. When it terminates at a non-page-aligned offset, subsequent bios are added to handle the remaining portion of the final page. Upon completion of the bio chain, all affected pages need to be be marked as read, and only the first bio references all of these pages" * tag 'gfs2-for-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: Revert "gfs2: Fix use of bio_chain"	2026-01-13 10:04:34 -08:00
Brian Foster	c360004c01	xfs: set max_agbno to allow sparse alloc of last full inode chunk Sparse inode cluster allocation sets min/max agbno values to avoid allocating an inode cluster that might map to an invalid inode chunk. For example, we can't have an inode record mapped to agbno 0 or that extends past the end of a runt AG of misaligned size. The initial calculation of max_agbno is unnecessarily conservative, however. This has triggered a corner case allocation failure where a small runt AG (i.e. 2063 blocks) is mostly full save for an extent to the EOFS boundary: [2050,13]. max_agbno is set to 2048 in this case, which happens to be the offset of the last possible valid inode chunk in the AG. In practice, we should be able to allocate the 4-block cluster at agbno 2052 to map to the parent inode record at agbno 2048, but the max_agbno value precludes it. Note that this can result in filesystem shutdown via dirty trans cancel on stable kernels prior to commit `9eb775968b` ("xfs: walk all AGs if TRYLOCK passed to xfs_alloc_vextent_iterate_ags") because the tail AG selection by the allocator sets t_highest_agno on the transaction. If the inode allocator spins around and finds an inode chunk with free inodes in an earlier AG, the subsequent dir name creation path may still fail to allocate due to the AG restriction and cancel. To avoid this problem, update the max_agbno calculation to the agbno prior to the last chunk aligned agbno in the AG. This is not necessarily the last valid allocation target for a sparse chunk, but since inode chunks (i.e. records) are chunk aligned and sparse allocs are cluster sized/aligned, this allows the sb_spino_align alignment restriction to take over and round down the max effective agbno to within the last valid inode chunk in the AG. Note that even though the allocator improvements in the aforementioned commit seem to avoid this particular dirty trans cancel situation, the max_agbno logic improvement still applies as we should be able to allocate from an AG that has been appropriately selected. The more important target for this patch however are older/stable kernels prior to this allocator rework/improvement. Cc: stable@vger.kernel.org # v4.2 Fixes: `56d1115c9b` ("xfs: allocate sparse inode chunks on full chunk allocation failure") Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-01-13 10:45:55 +01:00
Nirjhar Roy (IBM)	a65fd81207	xfs: Fix xfs_grow_last_rtg() The last rtg should be able to grow when the size of the last is less than (and not equal to) sb_rgextents. xfs_growfs with realtime groups fails without this patch. The reason is that, xfs_growfs_rtg() tries to grow the last rt group even when the last rt group is at its maximal size i.e, sb_rgextents. It fails with the following messages: XFS (loop0): Internal error block >= mp->m_rsumblocks at line 253 of file fs/xfs/libxfs/xfs_rtbitmap.c. Caller xfs_rtsummary_read_buf+0x20/0x80 XFS (loop0): Corruption detected. Unmount and run xfs_repair XFS (loop0): Internal error xfs_trans_cancel at line 976 of file fs/xfs/xfs_trans.c. Caller xfs_growfs_rt_bmblock+0x402/0x450 XFS (loop0): Corruption of in-memory data (0x8) detected at xfs_trans_cancel+0x10a/0x1f0 (fs/xfs/xfs_trans.c:977). Shutting down filesystem. XFS (loop0): Please unmount the filesystem and rectify the problem(s) Signed-off-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-01-13 10:40:45 +01:00
Christoph Hellwig	df7ec7226f	xfs: improve the assert at the top of xfs_log_cover Move each condition into a separate assert so that we can see which on triggered. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-01-13 10:36:23 +01:00
Christoph Hellwig	baed03efe2	xfs: fix an overly long line in xfs_rtgroup_calc_geometry Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-01-13 10:34:29 +01:00
Christoph Hellwig	e0aea42a32	xfs: mark __xfs_rtgroup_extents static __xfs_rtgroup_extents is not used outside of xfs_rtgroup.c, so mark it static. Move it and xfs_rtgroup_extents up in the file to avoid forward declarations. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-01-13 10:34:29 +01:00
Nirjhar Roy (IBM)	6b2d155366	xfs: Fix the return value of xfs_rtcopy_summary() xfs_rtcopy_summary() should return the appropriate error code instead of always returning 0. The caller of this function which is xfs_growfs_rt_bmblock() is already handling the error. Fixes: `e94b53ff69` ("xfs: cache last bitmap block in realtime allocator") Signed-off-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org # v6.7 Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-01-13 10:32:12 +01:00
Anna Schumaker	803e18641f	NFS: Don't immediately return directory delegations when disabled The function nfs_inode_evict_delegation() immediately and synchronously returns a delegation when called. This means we can't call it from nfs4_have_delegation(), since that function could be called under a lock. Instead we should mark the delegation for return and let the state manager handle it for us. Fixes: `b6d2a520f4` ("NFS: Add a module option to disable directory delegations") Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2026-01-12 11:50:22 -05:00
Jiasheng Jiang	a11224a016	btrfs: fix memory leaks in create_space_info() error paths In create_space_info(), the 'space_info' object is allocated at the beginning of the function. However, there are two error paths where the function returns an error code without freeing the allocated memory: 1. When create_space_info_sub_group() fails in zoned mode. 2. When btrfs_sysfs_add_space_info_type() fails. In both cases, 'space_info' has not yet been added to the fs_info->space_info list, resulting in a memory leak. Fix this by adding an error handling label to kfree(space_info) before returning. Fixes: `2be12ef79f` ("btrfs: Separate space_info create/update") Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Jiasheng Jiang <jiashengjiangcool@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>	2026-01-12 16:21:55 +01:00

1 2 3 4 5 ...

103189 Commits