linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-16 09:02:21 -04:00

Author	SHA1	Message	Date
Brian Foster	c35a3e273e	xfs: flush eof folio before insert range size update The flush in xfs_buffered_write_iomap_begin() for zero range over a data fork hole fronted by COW fork prealloc is primarily designed to provide correct zeroing behavior in particular pagecache conditions. As it turns out, this also partially masks some odd behavior in insert range (via zero range via setattr). Insert range bumps i_size the length of the new range, flushes, unmaps pagecache and cancels COW prealloc, and then right shifts extents from the end of the file back to the target offset of the insert. Since the i_size update occurs before the pagecache flush, this creates a transient situation where writeback around EOF can behave differently. This appears to be corner case situation, but if happens to be fronted by COW fork speculative preallocation and a large, dirty folio that contains at least one full COW block beyond EOF, the writeback after i_size is bumped may remap that COW fork block into the data fork within EOF. The block is zeroed and then shifted back out to post-eof, but this is unexpected in that it leads to a written post-eof data fork block. This can cause a zero range warning on a subsequent size extension, because we should never find blocks that require physical zeroing beyond i_size. To avoid this quirk, flush the EOF folio before the i_size update during insert range. The entire range will be flushed, unmapped and invalidated anyways, so this should be relatively unnoticeable. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-03-23 11:07:59 +01:00
Brian Foster	a35bb0dec9	iomap, xfs: lift zero range hole mapping flush into xfs iomap zero range has a wart in that it also flushes dirty pagecache over hole mappings (rather than only unwritten mappings). This was included to accommodate a quirk in XFS where COW fork preallocation can exist over a hole in the data fork, and the associated range is reported as a hole. This is because the range actually is a hole, but XFS also has an optimization where if COW fork blocks exist for a range being written to, those blocks are used regardless of whether the data fork blocks are shared or not. For zeroing, COW fork blocks over a data fork hole are only relevant if the range is dirty in pagecache, otherwise the range is already considered zeroed. The easiest way to deal with this corner case is to flush the pagecache to trigger COW remapping into the data fork, and then operate on the updated on-disk state. The problem is that ext4 cannot accommodate a flush from this context due to being a transaction deadlock vector. Outside of the hole quirk, ext4 can avoid the flush for zero range by using the recently introduced folio batch lookup mechanism for unwritten mappings. Therefore, take the next logical step and lift the hole handling logic into the XFS iomap_begin handler. iomap will still flush on unwritten mappings without a folio batch, and XFS will flush and retry mapping lookups in the case where it would otherwise report a hole with dirty pagecache during a zero range. Note that this is intended to be a fairly straightforward lift and otherwise not change behavior. Now that the flush exists within XFS, follow on patches can further optimize it. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-03-23 11:07:59 +01:00
Brian Foster	2f46c239fc	xfs: flush dirty pagecache over hole in zoned mode zero range For zoned filesystems a window exists between the first write to a sparse range (i.e. data fork hole) and writeback completion where we might spuriously observe holes in both the COW and data forks. This occurs because a buffered write populates the COW fork with delalloc, writeback submission removes the COW fork delalloc blocks and unlocks the inode, and then writeback completion remaps the physically allocated blocks into the data fork. If a zero range operation does a lookup during this window where both forks show a hole, it incorrectly reports a hole mapping for a range that contains data. This currently works because iomap checks for dirty pagecache over holes and unwritten mappings. If found, it flushes and retries the lookup. We plan to remove the hole flush logic from iomap, however, so lift the flush into xfs_zoned_buffered_write_iomap_begin() to preserve behavior and document the purpose for it. Zoned XFS filesystems don't support unwritten extents, so if zoned mode can come up with a way to close this transient hole window in the future, this flush can likely be removed. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-03-23 11:07:59 +01:00
Brian Foster	92e9dff9ca	xfs: fix iomap hole map reporting for zoned zero range The hole mapping logic for zero range in zoned mode is not quite correct. It currently reports a hole whenever one exists in the data fork. If the first write to a sparse range has completed and not yet written back, the blocks exist in the COW fork as delalloc until writeback completes, at which point they are allocated and mapped into the data fork. If a zero range occurs on a range that has not yet populated the data fork, we will incorrectly report it as a hole. Note that this currently functions correctly because we are bailed out by the pagecache flush in iomap_zero_range(). If a hole or unwritten mapping is reported with dirty pagecache, it assumes there is pending data, flushes to induce any pending block allocations/remaps, and retries the lookup. We want to remove this hack from iomap, however, so update iomap_begin() to only report a hole for zeroing when one exists in both forks. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-03-23 11:07:59 +01:00
Lukas Herbolt	8da6fd0884	xfs: Use xarray to track SB UUIDs instead of plain array. Removing the plain array to track the UUIDs and switch to xarray makes it more readable. Signed-off-by: Lukas Herbolt <lukas@herbolt.com> [cem: remove unneeded return from xfs_uuid_unmount] Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-03-23 10:40:18 +01:00
Damien Le Moal	c1f9554374	xfs: avoid unnecessary calculations in xfs_zoned_need_gc() If zonegc_low_space is set to zero (which is the default), the second condition in xfs_zoned_need_gc() that triggers GC never evaluates to true because the calculated threshold will always be 0. So there is no need to calculate the threshold and to evaluate that condition. Return early when zonegc_low_space is zero. While at it, add comments to document the intent of each of the 3 tests used to determine the return value to control the execution of garbage collection. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-03-18 10:08:07 +01:00
Damien Le Moal	68aa101bf2	xfs: display more zone related information in mountstats Modify xfs_zoned_show_stats() to add to the information displayed with /proc/self/mountstats the total number of zones (RT groups) and the number of open zones together with the maximum number of open zones. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-03-18 10:08:07 +01:00
Damien Le Moal	6a82a691b0	xfs: fix a comment typo in xfs_select_zone_nowait() Fix a typo in the comment describing the second call to xfs_select_open_zone_lru() in xfs_select_zone_nowait(). Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-03-18 10:08:07 +01:00
Damien Le Moal	770323d418	xfs: avoid unnecessary open zone check in xfs_select_zone_nowait() When xfs_select_zone_nowait() is called with pack_tight equal to true, the function xfs_select_open_zone_mru() is called if no open zone is returned by xfs_select_open_zone_lru(), that is, when oz is NULL. The open zone pointer return of xfs_select_zone_nowait() is then checked, but this check is outside of the "if (pack_tight)" that trigered the call to xfs_select_open_zone_mru(). In other word, this check is unnecessarily done even when pack_tight is false. Move the check for the return value of the call to xfs_select_open_zone_mru() inside the if that controls the call to this function, so that we do not uselessly test again the value of oz when pack_tight is false. No functional changes. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-03-18 10:08:07 +01:00
Carlos Maiolino	01478f356f	xfs: opencode xfs_zone_record_blocks We only have a single caller, no need to keep it in its own function. Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> [hch: add zone_record_blocks trace back] Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-03-18 09:33:52 +01:00
Carlos Maiolino	3bdc20b005	xfs: factor out xfs_zone_inc_written Move the written blocks increment and full zone check into a new helper. Also add an assert to ensure rmap lock is held here. Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-03-12 08:49:36 +01:00
Carlos Maiolino	02a5d8993b	xfs: factor out xfs_dio_write_zoned_end_io Stop sharing direct IO end_io between regular and zoned devices by factoring out zoned dio end_io to its own function. Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-03-12 08:49:36 +01:00
Carlos Maiolino	db8367f63b	xfs: factor out isize updates from xfs_dio_write_end_io This is the only code needed for zoned inodes, so factor it out so we can move zoned inodes ioend to its own callback. Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-03-12 08:49:36 +01:00
Wilfred Mallawa	c6ce65cb17	xfs: add write pointer to xfs_rtgroup_geometry There is currently no XFS ioctl that allows userspace to retrieve the write pointer for a specific realtime group block for zoned XFS. On zoned block devices, userspace can obtain this information via zone reports from the underlying device. However, for zoned XFS operating on regular block devices, no equivalent mechanism exists. Access to the realtime group write pointer is useful to userspace development and analysis tools such as Zonar [1]. So extend the existing struct xfs_rtgroup_geometry to add a new rg_writepointer field. This field is valid if XFS_RTGROUP_GEOM_WRITEPOINTER flag is set. The rg_writepointer field specifies the location of the current writepointer as a block offset into the respective rtgroup. [1] https://lwn.net/Articles/1059364/ Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-03-04 09:58:24 +01:00
Wilfred Mallawa	650b774cf9	xfs: add static size checks for ioctl UABI The ioctl structures in libxfs/xfs_fs.h are missing static size checks. It is useful to have static size checks for these structures as adding new fields to them could cause issues (e.g. extra padding that may be inserted by the compiler). So add these checks to xfs/xfs_ondisk.h. Due to different padding/alignment requirements across different architectures, to avoid build failures, some structures are ommited from the size checks. For example, structures with "compat_" definitions in xfs/xfs_ioctl32.h are ommited. Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-02-25 13:58:50 +01:00
Wilfred Mallawa	e97cbf863d	xfs: remove duplicate static size checks In libxfs/xfs_ondisk.h, remove some duplicate entries of XFS_CHECK_STRUCT_SIZE(). Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-02-25 13:58:49 +01:00
Nirjhar Roy (IBM)	9a654a8fa3	xfs: Add comments for usages of some macros. Add comments explaining when to use XFS_IS_CORRUPT() and ASSERT() Suggested-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-02-25 13:58:49 +01:00
Nirjhar Roy (IBM)	c2368fc89a	xfs: Update lazy counters in xfs_growfs_rt_bmblock() Update lazy counters in xfs_growfs_rt_bmblock() similar to the way it is done xfs_growfs_data_private(). This is because the lazy counters are not always updated and synching the counters will avoid inconsistencies between frexents and rtextents(total realtime extent count). This will be more useful once realtime shrink is implemented as this will prevent some transient state to occur where frexents might be greater than total rtextents. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-02-25 13:58:49 +01:00
Nirjhar Roy (IBM)	ac1d977096	xfs: Add a comment in xfs_log_sb() Add a comment explaining why the sb_frextents are updated outside the if (xfs_has_lazycount(mp) check even though it is a lazycounter. RT groups are supported only in v5 filesystems which always have lazycounter enabled - so putting it inside the if(xfs_has_lazycount(mp) check is redundant. Suggested-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-02-25 13:58:49 +01:00
Nirjhar Roy (IBM)	8baa9bccc0	xfs: Fix xfs_last_rt_bmblock() Bug description: If the size of the last rtgroup i.e, the rtg passed to xfs_last_rt_bmblock() is such that the last rtextent falls in 0th word offset of a bmblock of the bitmap file tracking this (last) rtgroup, then in that case xfs_last_rt_bmblock() incorrectly returns the next bmblock number instead of the current/last used bmblock number. When xfs_last_rt_bmblock() incorrectly returns the next bmblock, the loop to grow/modify the bmblocks in xfs_growfs_rtg() doesn't execute and xfs_growfs basically does a nop in certain cases. xfs_growfs will do a nop when the new size of the fs will have the same number of rtgroups i.e, we are only growing the last rtgroup. Reproduce: $ mkfs.xfs -m metadir=0 -r rtdev=/dev/loop1 /dev/loop0 \ -r size=32769b -f $ mount -o rtdev=/dev/loop1 /dev/loop0 /mnt/scratch $ xfs_growfs -R $(( 32769 + 1 )) /mnt/scratch $ xfs_info /mnt/scratch \| grep rtextents $ # We can see that rtextents hasn't changed Fix: Fix this by returning the current/last used bmblock when the last rtgroup size is not a multiple xfs_rtbitmap_rtx_per_rbmblock() and the next bmblock when the rtgroup size is a multiple of xfs_rtbitmap_rtx_per_rbmblock() i.e, the existing blocks are completely used up. Also, I have renamed xfs_last_rt_bmblock() to xfs_last_rt_bmblock_to_extend() to signify that this function returns the bmblock number to extend and NOT always the last used bmblock number. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-02-25 13:58:49 +01:00
Darrick J. Wong	115ea07b94	xfs: don't report half-built inodes to fserror Sam Sun apparently found a syzbot way to fuzz a filesystem such that xfs_iget_cache_miss would free the inode before the fserror code could catch up. Frustratingly he doesn't use the syzbot dashboard so there's no C reproducer and not even a full error report, so I'm guessing that: Inodes that are being constructed or torn down inside XFS are not visible to the VFS. They should never be reported to fserror. Also, any inode that has been freshly allocated in _cache_miss should be marked INEW immediately because, well, it's an incompletely constructed inode that isn't yet visible to the VFS. Reported-by: Sam Sun <samsun1006219@gmail.com> Fixes: `5eb4cb18e4` ("xfs: convey metadata health events to the health monitor") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-02-25 13:58:49 +01:00
Darrick J. Wong	75690e5fdd	xfs: don't report metadata inodes to fserror Internal metadata inodes are not exposed to userspace programs, so it makes no sense to pass them to the fserror functions (aka fsnotify). Instead, report metadata file problems as general filesystem corruption. Fixes: `5eb4cb18e4` ("xfs: convey metadata health events to the health monitor") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-02-25 13:58:49 +01:00
Darrick J. Wong	94014a23e9	xfs: fix potential pointer access race in xfs_healthmon_get Pankaj Raghav asks about this code in xfs_healthmon_get: hm = mp->m_healthmon; if (hm && !refcount_inc_not_zero(&hm->ref)) hm = NULL; rcu_read_unlock(); return hm; (slightly edited to compress a mailing list thread) "Nit: Should we do a READ_ONCE(mp->m_healthmon) here to avoid any compiler tricks that can result in an undefined behaviour? I am not sure if I am being paranoid here. "So this is my understanding: RCU guarantees that we get a valid object (actual data of m_healthmon) but does not guarantee the compiler will not reread the pointer between checking if hm is !NULL and accessing the pointer as we are doing it lockless. "So just a barrier() call in rcu_read_lock is enough to make sure this doesn't happen and probably adding a READ_ONCE() is not needed?" After some initial confusion I concluded that he's correct. The compiler could very well eliminate the hm variable in favor of walking the pointers twice, turning the code into: if (mp->m_healthmon && !refcount_inc_not_zero(&mp->m_healthmon->ref)) If this happens, then xfs_healthmon_detach can sneak in between the two sides of the && expression and set mp->m_healthmon to NULL, and thereby cause a null pointer dereference crash. Fix this by using the rcu pointer assignment and dereference functions, which ensure that the proper reordering barriers are in place. Practically speaking, gcc seems to allocate an actual variable for hm and only reads mp->m_healthmon once (as intended), but we ought to be more explicit about requiring this. Reported-by: Pankaj Raghav <pankaj.raghav@linux.dev> Fixes: `a48373e7d3` ("xfs: start creating infrastructure for health monitoring") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-02-25 13:58:49 +01:00
Darrick J. Wong	eb8550fb75	xfs: fix xfs_group release bug in xfs_dax_notify_dev_failure Chris Mason reports that his AI tools noticed that we were using xfs_perag_put and xfs_group_put to release the group reference returned by xfs_group_next_range. However, the iterator function returns an object with an active refcount, which means that we must use the correct function to release the active refcount, which is _rele. Cc: <stable@vger.kernel.org> # v6.0 Fixes: `6f643c57d5` ("xfs: implement ->notify_failure() for XFS") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-02-25 13:58:49 +01:00
Darrick J. Wong	161456987a	xfs: fix xfs_group release bug in xfs_verify_report_losses Chris Mason reports that his AI tools noticed that we were using xfs_perag_put and xfs_group_put to release the group reference returned by xfs_group_next_range. However, the iterator function returns an object with an active refcount, which means that we must use the correct function to release the active refcount, which is _rele. Fixes: `b8accfd65d` ("xfs: add media verification ioctl") Reported-by: Chris Mason <clm@meta.com> Link: https://lore.kernel.org/linux-xfs/20260206030527.2506821-1-clm@meta.com/ Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-02-25 13:58:48 +01:00
Darrick J. Wong	e764dd439d	xfs: fix copy-paste error in previous fix Chris Mason noticed that there is a copy-paste error in a recent change to xrep_dir_teardown that nulls out pointers after freeing the resources. Fixes: `ba408d299a` ("xfs: only call xf{array,blob}_destroy if we have a valid pointer") Link: https://lore.kernel.org/linux-xfs/20260205194211.2307232-1-clm@meta.com/ Reported-by: Chris Mason <clm@meta.com> Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-02-25 13:58:48 +01:00
Ethan Tidmore	cddfa648f1	xfs: Fix error pointer dereference The function try_lookup_noperm() can return an error pointer and is not checked for one. Add checks for error pointer in xrep_adoption_check_dcache() and xrep_adoption_zap_dcache(). Detected by Smatch: fs/xfs/scrub/orphanage.c:449 xrep_adoption_check_dcache() error: 'd_child' dereferencing possible ERR_PTR() fs/xfs/scrub/orphanage.c:485 xrep_adoption_zap_dcache() error: 'd_child' dereferencing possible ERR_PTR() Fixes: `73597e3e42` ("xfs: ensure dentry consistency when the orphanage adopts a file") Cc: stable@vger.kernel.org # v6.16 Signed-off-by: Ethan Tidmore <ethantidmore06@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-02-25 13:58:48 +01:00
Christoph Hellwig	47553dd60b	xfs: remove metafile inodes from the active inode stat The active inode (or active vnode until recently) stat can get much larger than expected on file systems with a lot of metafile inodes like zoned file systems on SMR hard disks with 10.000s of rtg rmap inodes. Remove all metafile inodes from the active counter to make it more useful to track actual workloads and add a separate counter for active metafile inodes. This fixes xfs/177 on SMR hard drives. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-02-25 13:58:48 +01:00
Christoph Hellwig	03a6d6c4c8	xfs: cleanup inode counter stats Most of them are unused, so mark them as such. Give the remaining ones names that match their use instead of the historic IRIX ones based on vnodes. Note that the names are purely internal to the XFS code, the user interface is based on section names and arrays of counters. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-02-25 13:58:48 +01:00
Wilfred Mallawa	fd81d3fd01	xfs: fix code alignment issues in xfs_ondisk.c Fixup some code alignment issues in xfs_ondisk.c Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-02-25 13:58:48 +01:00
Nirjhar Roy (IBM)	4ad85e633b	xfs: Replace &rtg->rtg_group with rtg_group() Use the already existing rtg_group() wrapper instead of directly accessing the struct xfs_group member in struct xfs_rtgroup. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com> [cem: Conflict resolution against `06873dbd94`] Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-02-25 13:58:48 +01:00
Nirjhar Roy (IBM)	a49b7ff63f	xfs: Refactoring the nagcount and delta calculation Introduce xfs_growfs_compute_delta() to calculate the nagcount and delta blocks and refactor the code from xfs_growfs_data_private(). No functional changes. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-02-25 13:58:48 +01:00
Nirjhar Roy (IBM)	18c16f602a	xfs: Replace ASSERT with XFS_IS_CORRUPT in xfs_rtcopy_summary() Replace ASSERT(sum > 0) with an XFS_IS_CORRUPT() and place it just after the call to xfs_rtget_summary() so that we don't end up using an illegal value of sum. Signed-off-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-02-25 13:58:48 +01:00
Linus Torvalds	6de23f81a5	Linux 7.0-rc1 v7.0-rc1	2026-02-22 13:18:59 -08:00
Linus Torvalds	fbf3380361	Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux Pull fsverity fixes from Eric Biggers: - Fix a build error on parisc - Remove the non-large-folio-aware function fsverity_verify_page() * tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux: fsverity: fix build error by adding fsverity_readahead() stub fsverity: remove fsverity_verify_page() f2fs: make f2fs_verify_cluster() partially large-folio-aware f2fs: remove unnecessary ClearPageUptodate in f2fs_verify_cluster()	2026-02-22 13:12:04 -08:00
Linus Torvalds	75e1f66a9e	Merge tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux Pull crypto library fix from Eric Biggers: "Fix a big endian specific issue in the PPC64-optimized AES code" * tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: lib/crypto: powerpc/aes: Fix rndkey_from_vsx() on big endian CPUs	2026-02-22 13:09:33 -08:00
Mark Brown	aaf96df959	CREDITS: Add -next to Stephen Rothwell's entry Stephen retired and stepped back from -next maintainership, update his entry in CREDITS to recognise his 18 years of hard work making it what it is today and all the impact it's had on our development process. Also update to his current GnuPG key while we're here. Acked-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: SeongJae Park <sj@kernel.org> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Mark Brown <broonie@kernel.org> Acked-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-22 12:11:33 -08:00
Arnd Bergmann	746b9ef5d5	x509: select CONFIG_CRYPTO_LIB_SHA256 The x509 public key code gained a dependency on the sha256 hash implementation, causing a rare link time failure in randconfig builds: arm-linux-gnueabi-ld: crypto/asymmetric_keys/x509_public_key.o: in function `x509_get_sig_params': x509_public_key.c:(.text.x509_get_sig_params+0x12): undefined reference to `sha256' arm-linux-gnueabi-ld: (sha256): Unknown destination type (ARM/Thumb) in crypto/asymmetric_keys/x509_public_key.o x509_public_key.c:(.text.x509_get_sig_params+0x12): dangerous relocation: unsupported relocation Select the necessary library code from Kconfig. Fixes: `2c62068ac8` ("x509: Separately calculate sha256 for blacklist") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-22 12:09:23 -08:00
Haiyue Wang	fd1d6b9d13	xz: fix arm fdt compile error for kmalloc replacement Align to the commit `bf4afc53b7` ("Convert 'alloc_obj' family to use the new default GFP_KERNEL argument") update the 'kmalloc_obj' declaration for userspace to fix below compile error: In file included from arch/arm/boot/compressed/../../../../lib/decompress_unxz.c:241, from arch/arm/boot/compressed/decompress.c:56: arch/arm/boot/compressed/../../../../lib/xz/xz_dec_stream.c: In function 'xz_dec_init': arch/arm/boot/compressed/../../../../lib/xz/xz_dec_stream.c:787:28: error: implicit declaration of function 'kmalloc_obj'; did you mean 'kmalloc'? [-Wimplicit-function-declaration] 787 \| struct xz_dec s = kmalloc_obj(s); \| ^~~~~~~~~~~ \| kmalloc Signed-off-by: Haiyue Wang <haiyuewa@163.com> Fixes: `69050f8d6d` ("treewide: Replace kmalloc with kmalloc_obj for non-scalar types") Fixes: `bf4afc53b7` ("Convert 'alloc_obj' family to use the new default GFP_KERNEL argument") Reviewed-by: Kees Cook <kees@kernel.org> Acked-by: Lasse Collin <lasse.collin@tukaani.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-22 12:05:31 -08:00
Linus Torvalds	5f2eac7767	Merge tag 'rtc-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux Pull RTC updates from Alexandre Belloni: - loongson: Loongson-2K0300 support - s35390a: nvmem support - zynqmp: rework calibration * tag 'rtc-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: rtc: ds1390: fix number of bytes read from RTC rtc: class: Remove duplicate check for alarm rtc: optee: simplify OP-TEE context match rtc: interface: Alarm race handling should not discard preceding error rtc: s35390a: implement nvmem support rtc: loongson: Add Loongson-2K0300 support dt-bindings: rtc: loongson: Document Loongson-2K0300 compatible dt-bindings: rtc: loongson: Correct Loongson-1C interrupts property dt-bindings: rtc: renesas,rz-rtca3: Add RZ/V2N support dt-bindings: rtc: cpcap: convert to schema rtc: zynqmp: use dynamic max and min offset ranges rtc: zynqmp: rework set_offset rtc: zynqmp: rework read_offset rtc: zynqmp: check calibration max value rtc: zynqmp: correct frequency value rtc: amlogic-a4: Remove IRQF_ONESHOT rtc: pcf8563: use correct of_node for output clock rtc: max31335: use correct CONFIG symbol in IS_REACHABLE() rtc: nvvrs: Add ARCH_TEGRA to the NV VRS RTC driver	2026-02-22 09:43:11 -08:00
Linus Torvalds	1dd419145d	Merge tag 'rust-fixes-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux Pull rust fixes from Miguel Ojeda: "Toolchain and infrastructure: - Pass '-Zunstable-options' flag required by the future Rust 1.95.0 - Fix 'objtool' warning for Rust 1.84.0 'kernel' crate: - 'irq' module: add missing bound detected by the future Rust 1.95.0 - 'list' module: add missing 'unsafe' blocks and placeholder safety comments to macros (an issue for future callers within the crate) 'pin-init' crate: - Clean Clippy warning that changed behavior in the future Rust 1.95.0" * tag 'rust-fixes-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux: rust: list: Add unsafe blocks for container_of and safety comments rust: pin-init: replace clippy `expect` with `allow` rust: irq: add `'static` bounds to irq callbacks objtool/rust: add one more `noreturn` Rust function rust: kbuild: pass `-Zunstable-options` for Rust 1.95.0	2026-02-22 08:43:31 -08:00
Linus Torvalds	d2ba6e9c0a	Merge tag 'trace-rv-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull runtime verifier fix from Steven Rostedt: - Fix multiple definition of __pcpu_unique_da_mon_this After refactoring monitors, we used static per-cpu variables with the same names across different per-cpu monitors. This is explicitly disallowed for modules on some architectures (alpha) or if CONFIG_DEBUG_FORCE_WEAK_PER_CPU is enabled (e.g. Fedora's debug kernel). Make sure all those variables have different names to avoid compilation issues. * tag 'trace-rv-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: rv: Fix multiple definition of __pcpu_unique_da_mon_this	2026-02-22 08:40:13 -08:00
Kees Cook	189f164e57	Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL uses Conversion performed via this Coccinelle script: // SPDX-License-Identifier: GPL-2.0-only // Options: --include-headers-for-types --all-includes --include-headers --keep-comments virtual patch @gfp depends on patch && !(file in "tools") && !(file in "samples")@ identifier ALLOC = {kmalloc_obj,kmalloc_objs,kmalloc_flex, kzalloc_obj,kzalloc_objs,kzalloc_flex, kvmalloc_obj,kvmalloc_objs,kvmalloc_flex, kvzalloc_obj,kvzalloc_objs,kvzalloc_flex}; @@ ALLOC(... - , GFP_KERNEL ) $ make coccicheck MODE=patch COCCI=gfp.cocci Build and boot tested x86_64 with Fedora 42's GCC and Clang: Linux version 6.19.0+ (user@host) (gcc (GCC) 15.2.1 20260123 (Red Hat 15.2.1-7), GNU ld version 2.44-12.fc42) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Linux version 6.19.0+ (user@host) (clang version 20.1.8 (Fedora 20.1.8-4.fc42), LLD 20.1.8) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-22 08:26:33 -08:00
Linus Torvalds	32a92f8c89	Convert more 'alloc_obj' cases to default GFP_KERNEL arguments This converts some of the visually simpler cases that have been split over multiple lines. I only did the ones that are easy to verify the resulting diff by having just that final GFP_KERNEL argument on the next line. Somebody should probably do a proper coccinelle script for this, but for me the trivial script actually resulted in an assertion failure in the middle of the script. I probably had made it a bit _too_ trivial. So after fighting that far a while I decided to just do some of the syntactically simpler cases with variations of the previous 'sed' scripts. The more syntactically complex multi-line cases would mostly really want whitespace cleanup anyway. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 20:03:00 -08:00
Linus Torvalds	323bbfcf1e	Convert 'alloc_flex' family to use the new default GFP_KERNEL argument This is the exact same thing as the 'alloc_obj()' version, only much smaller because there are a lot fewer users of the alloc_flex() interface. As with alloc_obj() version, this was done entirely with mindless brute force, using the same script, except using 'flex' in the pattern rather than 'objs'. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 17:09:51 -08:00
Linus Torvalds	bf4afc53b7	Convert 'alloc_obj' family to use the new default GFP_KERNEL argument This was done entirely with mindless brute force, using git grep -l '\<k[vmz]alloc_objs(., GFP_KERNEL)' \| xargs sed -i 's/$alloc_objs(.*$, GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 17:09:51 -08:00
Linus Torvalds	e19e1b480a	add default_gfp() helper macro and use it in the new alloc_obj() helpers Most simple allocations use GFP_KERNEL, and with the new allocation helpers being introduced, let's just take advantage of that to simplify that default case. It's a numbers game: git grep 'alloc_obj(' \| sed 's/.$GFP_[_A-Z]$./\1/' \| sort \| uniq -c \| sort -n \| tail shows that about 90% of all those new allocator instances just use that standard GFP_KERNEL. Those helpers are already macros, and we can easily just make it be the default case when the gfp argument is missing. And yes, we could do that for all the legacy interfaces too, but let's keep it to just the new ones at least for now, since those all got converted recently anyway, so this is not any "extra" noise outside of that limited conversion. And, in fact, I want to do this before doing the -rc1 release, exactly so that we don't get extra merge conflicts. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 17:09:50 -08:00
Linus Torvalds	fa5c82f4d2	slab.h: disable completely broken overflow handling in flex allocations Commit `69050f8d6d` ("treewide: Replace kmalloc with kmalloc_obj for non-scalar types") started using the new allocation helpers, and in the process showed that they were completely non-working. The overflow logic in overflows_flex_counter_type() is completely the wrong way around, and that broke __alloc_flex() completely. By chance, the resulting code was then such a mess that clang generated sufficiently garbage code that objtool warned about it all. Which made it somewhat quicker to narrow things down. While fixing overflows_flex_counter_type() would presumably fix this all, I'm excising the whole broken overflow logic from __alloc_flex(), because we don't want that kind of code in basic allocation functions anyway. That (no longer) broken overflows_flex_counter_type() thing needs to be inserted into the actual __set_flex_counter() logic in the unlikely case that we ever want this at all. And made conditional. Fixes: `81cee9166a` ("compiler_types: Introduce __flex_counter() and family") Fixes: `69050f8d6d` ("treewide: Replace kmalloc with kmalloc_obj for non-scalar types") Cc: Kees Cook <kees@kernel.org> Link: https://lore.kernel.org/all/CAHk-=whEd020BYzGTzYrENjD9Z5_82xx6h8HsQvH5xDSnv0=Hw@mail.gmail.com/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 15:12:09 -08:00
Linus Torvalds	8934827db5	Merge tag 'kmalloc_obj-treewide-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull kmalloc_obj conversion from Kees Cook: "This does the tree-wide conversion to kmalloc_obj() and friends using coccinelle, with a subsequent small manual cleanup of whitespace alignment that coccinelle does not handle. This uncovered a clang bug in __builtin_counted_by_ref(), so the conversion is preceded by disabling that for current versions of clang. The imminent clang 22.1 release has the fix. I've done allmodconfig build tests for x86_64, arm64, i386, and arm. I did defconfig builds for alpha, m68k, mips, parisc, powerpc, riscv, s390, sparc, sh, arc, csky, xtensa, hexagon, and openrisc" * tag 'kmalloc_obj-treewide-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: kmalloc_obj: Clean up after treewide replacements treewide: Replace kmalloc with kmalloc_obj for non-scalar types compiler_types: Disable __builtin_counted_by_ref for Clang	2026-02-21 11:02:58 -08:00
Linus Torvalds	c7decec2f2	Merge tag 'perf-tools-for-v7.0-1-2026-02-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools updates from Arnaldo Carvalho de Melo: - Introduce 'perf sched stats' tool with record/report/diff workflows using schedstat counters - Add a faster libdw based addr2line implementation and allow selecting it or its alternatives via 'perf config addr2line.style=' - Data-type profiling fixes and improvements including the ability to select fields using 'perf report''s -F/-fields, e.g.: 'perf report --fields overhead,type' - Add 'perf test' regression tests for Data-type profiling with C and Rust workloads - Fix srcline printing with inlines in callchains, make sure this has coverage in 'perf test' - Fix printing of leaf IP in LBR callchains - Fix display of metrics without sufficient permission in 'perf stat' - Print all machines in 'perf kvm report -vvv', not just the host - Switch from SHA-1 to BLAKE2s for build ID generation, remove SHA-1 code - Fix 'perf report's histogram entry collapsing with '-F' option - Use system's cacheline size instead of a hardcoded value in 'perf report' - Allow filtering conversion by time range in 'perf data' - Cover conversion to CTF using 'perf data' in 'perf test' - Address newer glibc const-correctness (-Werror=discarded-qualifiers) issues - Fixes and improvements for ARM's CoreSight support, simplify ARM SPE event config in 'perf mem', update docs for 'perf c2c' including the ARM events it can be used with - Build support for generating metrics from arch specific python script, add extra AMD, Intel, ARM64 metrics using it - Add AMD Zen 6 events and metrics - Add JSON file with OpenHW Risc-V CVA6 hardware counters - Add 'perf kvm' stats live testing - Add more 'perf stat' tests to 'perf test' - Fix segfault in `perf lock contention -b/--use-bpf` - Fix various 'perf test' cases for s390 - Build system cleanups, bump minimum shellcheck version to 0.7.2 - Support building the capstone based annotation routines as a plugin - Allow passing extra Clang flags via EXTRA_BPF_FLAGS * tag 'perf-tools-for-v7.0-1-2026-02-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (255 commits) perf test script: Add python script testing support perf test script: Add perl script testing support perf script: Allow the generated script to be a path perf test: perf data --to-ctf testing perf test: Test pipe mode with data conversion --to-json perf json: Pipe mode --to-ctf support perf json: Pipe mode --to-json support perf check: Add libbabeltrace to the listed features perf build: Allow passing extra Clang flags via EXTRA_BPF_FLAGS perf test data_type_profiling.sh: Skip just the Rust tests if code_with_type workload is missing tools build: Fix feature test for rust compiler perf libunwind: Fix calls to thread__e_machine() perf stat: Add no-affinity flag perf evlist: Reduce affinity use and move into iterator, fix no affinity perf evlist: Missing TPEBS close in evlist__close() perf evlist: Special map propagation for tool events that read on 1 CPU perf stat-shadow: In prepare_metric fix guard on reading NULL perf_stat_evsel Revert "perf tool_pmu: More accurately set the cpus for tool events" tools build: Emit dependencies file for test-rust.bin tools build: Make test-rust.bin be removed by the 'clean' target ...	2026-02-21 10:51:08 -08:00

1 2 3 4 5 ...

1426930 Commits