linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-04-29 02:19:54 -04:00

Author	SHA1	Message	Date
Lukas Herbolt	64c80dfd04	xfs: Print XFS UUID on mount and umount events. As of now only device names are printed out over __xfs_printk(). The device names are not persistent across reboots which in case of searching for origin of corruption brings another task to properly identify the devices. This patch add XFS UUID upon every mount/umount event which will make the identification much easier. Signed-off-by: Lukas Herbolt <lukas@herbolt.com> [sandeen: rebase onto current upstream kernel] Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2022-11-16 19:20:21 -08:00
Long Li	59f6ab40fd	xfs: fix sb write verify for lazysbcount When lazysbcount is enabled, fsstress and loop mount/unmount test report the following problems: XFS (loop0): SB summary counter sanity check failed XFS (loop0): Metadata corruption detected at xfs_sb_write_verify+0x13b/0x460, xfs_sb block 0x0 XFS (loop0): Unmount and run xfs_repair XFS (loop0): First 128 bytes of corrupted metadata buffer: 00000000: 58 46 53 42 00 00 10 00 00 00 00 00 00 28 00 00 XFSB.........(.. 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000020: 69 fb 7c cd 5f dc 44 af 85 74 e0 cc d4 e3 34 5a i.\|._.D..t....4Z 00000030: 00 00 00 00 00 20 00 06 00 00 00 00 00 00 00 80 ..... .......... 00000040: 00 00 00 00 00 00 00 81 00 00 00 00 00 00 00 82 ................ 00000050: 00 00 00 01 00 0a 00 00 00 00 00 04 00 00 00 00 ................ 00000060: 00 00 0a 00 b4 b5 02 00 02 00 00 08 00 00 00 00 ................ 00000070: 00 00 00 00 00 00 00 00 0c 09 09 03 14 00 00 19 ................ XFS (loop0): Corruption of in-memory data (0x8) detected at _xfs_buf_ioapply +0xe1e/0x10e0 (fs/xfs/xfs_buf.c:1580). Shutting down filesystem. XFS (loop0): Please unmount the filesystem and rectify the problem(s) XFS (loop0): log mount/recovery failed: error -117 XFS (loop0): log mount failed This corruption will shutdown the file system and the file system will no longer be mountable. The following script can reproduce the problem, but it may take a long time. #!/bin/bash device=/dev/sda testdir=/mnt/test round=0 function fail() { echo "$" exit 1 } mkdir -p $testdir while [ $round -lt 10000 ] do echo "**** round $round ******" mkfs.xfs -f $device mount $device $testdir \|\| fail "mount failed!" fsstress -d $testdir -l 0 -n 10000 -p 4 >/dev/null & sleep 4 killall -w fsstress umount $testdir xfs_repair -e $device > /dev/null if [ $? -eq 2 ];then echo "ERR CODE 2: Dirty log exception during repair." exit 1 fi round=$(($round+1)) done With lazysbcount is enabled, There is no additional lock protection for reading m_ifree and m_icount in xfs_log_sb(), if other cpu modifies the m_ifree, this will make the m_ifree greater than m_icount. For example, consider the following sequence and ifreedelta is postive: CPU0 CPU1 xfs_log_sb xfs_trans_unreserve_and_mod_sb ---------- ------------------------------ percpu_counter_sum(&mp->m_icount) percpu_counter_add_batch(&mp->m_icount, idelta, XFS_ICOUNT_BATCH) percpu_counter_add(&mp->m_ifree, ifreedelta); percpu_counter_sum(&mp->m_ifree) After this, incorrect inode count (sb_ifree > sb_icount) will be writen to the log. In the subsequent writing of sb, incorrect inode count (sb_ifree > sb_icount) will fail to pass the boundary check in xfs_validate_sb_write() that cause the file system shutdown. When lazysbcount is enabled, we don't need to guarantee that Lazy sb counters are completely correct, but we do need to guarantee that sb_ifree <= sb_icount. On the other hand, the constraint that m_ifree <= m_icount must be satisfied any time that there /cannot/ be other threads allocating or freeing inode chunks. If the constraint is violated under these circumstances, sb_i{count,free} (the ondisk superblock inode counters) maybe incorrect and need to be marked sick at unmount, the count will be rebuilt on the next mount. Fixes: `8756a5af18` ("libxfs: add more bounds checking to sb sanity checks") Signed-off-by: Long Li <leo.lilong@huawei.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2022-11-16 19:20:20 -08:00
Darrick J. Wong	2653d53345	xfs: fix incorrect error-out in xfs_remove Clean up resources if resetting the dotdot entry doesn't succeed. Observed through code inspection. Fixes: `5838d0356b` ("xfs: reset child dir '..' entry when unlinking child") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>	2022-11-16 19:20:20 -08:00
Darrick J. Wong	7b082b5e8a	Merge tag 'scrub-check-metadata-inode-records-6.2_2022-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.2-mergeA xfs: scrub inode core when checking metadata files Running the online fsck QA fuzz tests, I noticed that we were consistently missing fuzzed records in the inode cores of the realtime freespace files and the quota files. This patch adds the ability to check inode cores in xchk_metadata_inode_forks. Signed-off-by: Darrick J. Wong <djwong@kernel.org> * tag 'scrub-check-metadata-inode-records-6.2_2022-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: check inode core when scrubbing metadata files xfs: don't warn about files that are exactly s_maxbytes long	2022-11-16 19:19:30 -08:00
Darrick J. Wong	cc5f38fa12	Merge tag 'scrub-bmap-enhancements-6.2_2022-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.2-mergeA xfs: strengthen file mapping scrub This series strengthens the file extent mapping scrubber in various ways, such as confirming that there are enough bmap records to match up with the rmap records for this file, checking delalloc reservations, checking for no unwritten extents in metadata files, invalid CoW fork formats, and weird things like shared CoW fork extents. Signed-off-by: Darrick J. Wong <djwong@kernel.org> * tag 'scrub-bmap-enhancements-6.2_2022-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: teach scrub to flag non-extents format cow forks xfs: check that CoW fork extents are not shared xfs: check quota files for unwritten extents xfs: block map scrub should handle incore delalloc reservations xfs: teach scrub to check for adjacent bmaps when rmap larger than bmap xfs: fix perag loop in xchk_bmap_check_rmaps	2022-11-16 19:19:22 -08:00
Darrick J. Wong	7aab8a05e7	Merge tag 'scrub-fscounters-enhancements-6.2_2022-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.2-mergeA xfs: enhance fs summary counter scrubber This series makes two changes to the fs summary counter scrubber: first, we should mark the scrub incomplete when we can't read the AG headers. Second, it fixes a functionality gap where we don't actually check the free rt extent count. v23.2: fix pointless inline Signed-off-by: Darrick J. Wong <djwong@kernel.org> * tag 'scrub-fscounters-enhancements-6.2_2022-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: online checking of the free rt extent count xfs: skip fscounters comparisons when the scan is incomplete	2022-11-16 19:19:13 -08:00
Darrick J. Wong	b76f593b33	Merge tag 'scrub-fix-rtmeta-ilocking-6.2_2022-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.2-mergeA xfs: improve rt metadata use for scrub This short series makes some small changes to the way we handle the realtime metadata inodes. First, we now make sure that the bitmap and summary file forks are always loaded at mount time so that every scrubber won't have to call xfs_iread_extents. This won't be easy if we're, say, cross-referencing realtime space allocations. The second change makes the ILOCK annotations more consistent with the rest of XFS. Signed-off-by: Darrick J. Wong <djwong@kernel.org> * tag 'scrub-fix-rtmeta-ilocking-6.2_2022-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: make rtbitmap ILOCKing consistent when scanning the rt bitmap file xfs: load rtbitmap and rtsummary extent mapping btrees at mount time	2022-11-16 19:19:05 -08:00
Darrick J. Wong	3d8426b13b	Merge tag 'scrub-fix-return-value-6.2_2022-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.2-mergeA xfs: fix incorrect return values in online fsck Here we fix a couple of problems with the errno values that we return to userspace. v23.2: fix vague wording of comment v23.3: fix the commit message to discuss what's really going on in this patch Signed-off-by: Darrick J. Wong <djwong@kernel.org> * tag 'scrub-fix-return-value-6.2_2022-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: don't return -EFSCORRUPTED from repair when resources cannot be grabbed xfs: don't retry repairs harder when EAGAIN is returned xfs: fix return code when fatal signal encountered during dquot scrub xfs: return EINTR when a fatal signal terminates scrub	2022-11-16 19:18:54 -08:00
Darrick J. Wong	af1077fa87	Merge tag 'scrub-cleanup-malloc-6.2_2022-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.2-mergeA xfs: clean up memory allocations in online fsck This series standardizes the GFP_ flags that we use for memory allocation in online scrub, and convert the callers away from the old kmem_alloc code that was ported from Irix. Signed-off-by: Darrick J. Wong <djwong@kernel.org> * tag 'scrub-cleanup-malloc-6.2_2022-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: pivot online scrub away from kmem.[ch] xfs: initialize the check_owner object fully xfs: standardize GFP flags usage in online scrub	2022-11-16 19:18:38 -08:00
Darrick J. Wong	823ca26a8f	Merge tag 'scrub-fix-ag-header-handling-6.2_2022-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.2-mergeA xfs: fix handling of AG[IF] header buffers during scrub While reading through the online fsck code, I noticed that the setup code for AG metadata scrubs will attach the AGI, the AGF, and the AGFL buffers to the transaction. It isn't necessary to hold the AGFL buffer, since any code that wants to do anything with the AGFL will need to hold the AGF to know which parts of the AGFL are active. Therefore, we only need to hold the AGFL when scrubbing the AGFL itself. The second bug fixed by this patchset is one that I observed while testing online repair. When a buffer is held across a transaction roll, its buffer log item will be detached if the bli was clean before the roll. If we are holding the AG headers to maintain a lock on an AG, we then need to set the buffer type on the new bli to avoid confusing the logging code later. There's also a bug fix for uninitialized memory in the directory scanner that didn't fit anywhere else. Ths patchset finishes off by teaching the AGFL repair function to look for and discard crosslinked blocks instead of putting them back on the AGFL. v23.2: Log the buffers before rolling the transaction to keep the moving forward in the log and avoid the bli falling off. Signed-off-by: Darrick J. Wong <djwong@kernel.org> * tag 'scrub-fix-ag-header-handling-6.2_2022-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: make AGFL repair function avoid crosslinked blocks xfs: log the AGI/AGF buffers when rolling transactions during an AG repair xfs: don't track the AGFL buffer in the scrub AG context xfs: fully initialize xfs_da_args in xchk_directory_blocks	2022-11-16 19:18:11 -08:00
Darrick J. Wong	f36b954a1f	xfs: check inode core when scrubbing metadata files Metadata files (e.g. realtime bitmaps and quota files) do not show up in the bulkstat output, which means that scrub-by-handle does not work; they can only be checked through a specific scrub type. Therefore, each scrub type calls xchk_metadata_inode_forks to check the metadata for whatever's in the file. Unfortunately, that function doesn't actually check the inode record itself. Refactor the function a bit to make that happen. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2022-11-16 16:11:51 -08:00
Darrick J. Wong	bd5ab5f987	xfs: don't warn about files that are exactly s_maxbytes long We can handle files that are exactly s_maxbytes bytes long; we just can't handle anything larger than that. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2022-11-16 16:11:51 -08:00
Darrick J. Wong	5eef46358f	xfs: teach scrub to flag non-extents format cow forks CoW forks only exist in memory, which means that they can only ever have an incore extent tree. Hence they must always be FMT_EXTENTS, so check this when we're scrubbing them. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2022-11-16 15:25:05 -08:00
Darrick J. Wong	3178553701	xfs: check that CoW fork extents are not shared Ensure that extents in an inode's CoW fork are not marked as shared in the refcount btree. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2022-11-16 15:25:04 -08:00
Darrick J. Wong	f23c40443d	xfs: check quota files for unwritten extents Teach scrub to flag quota files containing unwritten extents. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2022-11-16 15:25:04 -08:00
Darrick J. Wong	830ffa09fb	xfs: block map scrub should handle incore delalloc reservations Enhance the block map scrubber to check delayed allocation reservations. Though there are no physical space allocations to check, we do need to make sure that the range of file offsets being mapped are correct, and to bump the lastoff cursor so that key order checking works correctly. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2022-11-16 15:25:04 -08:00
Darrick J. Wong	6a5777865e	xfs: teach scrub to check for adjacent bmaps when rmap larger than bmap When scrub is checking file fork mappings against rmap records and the rmap record starts before or ends after the bmap record, check the adjacent bmap records to make sure that they're adjacent to the one we're checking. This helps us to detect cases where the rmaps cover territory that the bmaps do not. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2022-11-16 15:25:04 -08:00
Darrick J. Wong	033985b6fe	xfs: fix perag loop in xchk_bmap_check_rmaps sparse complains that we can return an uninitialized error from this function and that pag could be uninitialized. We know that there are no zero-AG filesystems and hence we had to call xchk_bmap_check_ag_rmaps at least once, so this is not actually possible, but I'm too worn out from automated complaints from unsophisticated AIs so let's just fix this and move on to more interesting problems, eh? Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2022-11-16 15:25:04 -08:00
Darrick J. Wong	e74331d6fa	xfs: online checking of the free rt extent count Teach the summary count checker to count the number of free realtime extents and compare that to the superblock copy. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2022-11-16 15:25:03 -08:00
Darrick J. Wong	5f369dc5b4	xfs: make rtbitmap ILOCKing consistent when scanning the rt bitmap file xfs_rtalloc_query_range scans the realtime bitmap file in order of increasing file offset, so this caller can take ILOCK_SHARED on the rt bitmap inode instead of ILOCK_EXCL. This isn't going to yield any practical benefits at mount time, but we'd like to make the locking usage consistent around xfs_rtalloc_query_all calls. Make all the places we do this use the same xfs_ilock lockflags for consistency. Fixes: `4c934c7dd6` ("xfs: report realtime space information via the rtbitmap") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2022-11-16 15:25:03 -08:00
Darrick J. Wong	93b0c58ed0	xfs: don't return -EFSCORRUPTED from repair when resources cannot be grabbed If we tried to repair something but the repair failed with -EDEADLOCK, that means that the repair function couldn't grab some resource it needed and wants us to try again. If we try again (with TRY_HARDER) but still can't get all the resources we need, the repair fails and errors remain on the filesystem. Right now, repair returns the -EDEADLOCK to the caller as -EFSCORRUPTED, which results in XFS_SCRUB_OFLAG_CORRUPT being passed out to userspace. This is not correct because repair has not determined that anything is corrupt. If the repair had been invoked on an object that could be optimized but wasn't corrupt (OFLAG_PREEN), the inability to grab resources will be reported to userspace as corrupt metadata, and users will be unnecessarily alarmed that their suboptimal metadata turned into a corruption. Fix this by returning zero so that the results of the actual scrub will be copied back out to userspace. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2022-11-16 15:25:03 -08:00
Darrick J. Wong	11f97e6845	xfs: skip fscounters comparisons when the scan is incomplete If any part of the per-AG summary counter scan loop aborts without collecting all of the data we need, the scrubber's observation data will be invalid. Set the incomplete flag so that we abort the scrub without reporting false corruptions. Document the data dependency here too. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2022-11-16 15:25:03 -08:00
Darrick J. Wong	9e13975bb0	xfs: load rtbitmap and rtsummary extent mapping btrees at mount time It turns out that GETFSMAP and online fsck have had a bug for years due to their use of ILOCK_SHARED to coordinate their linear scans of the realtime bitmap. If the bitmap file's data fork happens to be in BTREE format and the scan occurs immediately after mounting, the incore bmbt will not be populated, leading to ASSERTs tripping over the incorrect inode state. Because the bitmap scans always lock bitmap buffers in increasing order of file offset, it is appropriate for these two callers to take a shared ILOCK to improve scalability. To fix this problem, load both data and attr fork state into memory when mounting the realtime inodes. Realtime metadata files aren't supposed to have an attr fork so the second step is likely a nop. On most filesystems this is unlikely since the rtbitmap data fork is usually in extents format, but it's possible to craft a filesystem that will by fragmenting the free space in the data section and growfsing the rt section. Fixes: `4c934c7dd6` ("xfs: report realtime space information via the rtbitmap") Also-Fixes: `46d9bfb5e7` ("xfs: cross-reference the realtime bitmap") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2022-11-16 15:25:03 -08:00
Darrick J. Wong	306195f355	xfs: pivot online scrub away from kmem.[ch] Convert all the online scrub code to use the Linux slab allocator functions directly instead of going through the kmem wrappers. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2022-11-16 15:25:02 -08:00
Darrick J. Wong	6bf2f87915	xfs: don't retry repairs harder when EAGAIN is returned Repair functions will not return EAGAIN -- if they were not able to obtain resources, they should return EDEADLOCK (like the rest of online fsck) to signal that we need to grab all the resources and try again. Hence we don't need to deal with this case except as a debugging assertion. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2022-11-16 15:25:02 -08:00
Darrick J. Wong	fcd2a43488	xfs: initialize the check_owner object fully Initialize the check_owner list head so that we don't corrupt the list. Reduce the scope of the object pointer. Fixes: `858333dcf0` ("xfs: check btree block ownership with bnobt/rmapbt when scrubbing btree") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2022-11-16 15:25:02 -08:00
Darrick J. Wong	0a713bd41e	xfs: fix return code when fatal signal encountered during dquot scrub If the scrub process is sent a fatal signal while we're checking dquots, the predicate for this will set the error code to -EINTR. Don't then squash that into -ECANCELED, because the wrong errno turns up in the trace output. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2022-11-16 15:25:02 -08:00
Darrick J. Wong	a7a0f9a550	xfs: return EINTR when a fatal signal terminates scrub If the program calling online fsck is terminated with a fatal signal, bail out to userspace by returning EINTR, not EAGAIN. EAGAIN is used by scrubbers to indicate that we should try again with more resources locked, and not to indicate that the operation was cancelled. The miswiring is mostly harmless, but it shows up in the trace data. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2022-11-16 15:25:02 -08:00
Darrick J. Wong	b255fab0f8	xfs: make AGFL repair function avoid crosslinked blocks Teach the AGFL repair function to check each block of the proposed AGFL against the rmap btree. If the rmapbt finds any mappings that are not OWN_AG, strike that block from the list. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2022-11-16 15:25:01 -08:00
Darrick J. Wong	48ff40458f	xfs: standardize GFP flags usage in online scrub Memory allocation usage is the same throughout online fsck -- we want kernel memory, we have to be able to back out if we can't allocate memory, and we don't want to spray dmesg with memory allocation failure reports. Standardize the GFP flag usage and document these requirements. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2022-11-16 15:25:01 -08:00
Darrick J. Wong	3e59c0103e	xfs: log the AGI/AGF buffers when rolling transactions during an AG repair Currently, the only way to lock an allocation group is to hold the AGI and AGF buffers. If a repair needs to roll the transaction while repairing some AG metadata, it maintains that lock by holding the two buffers across the transaction roll and joins them afterwards. However, repair is not like other parts of XFS that employ the bhold - roll - bjoin sequence because it's possible that the AGI or AGF buffers are not actually dirty before the roll. This presents two problems -- First, we need to redirty those buffers to keep them moving along in the log to avoid pinning the log tail. Second, a clean buffer log item can detach from the buffer. If this happens, the buffer type state is discarded along with the bli and must be reattached before the next time the buffer is logged. If it is not, the logging code will complain and log recovery will not work properly. An earlier version of this patch tried to fix the second problem by re-setting the buffer type in the bli after joining the buffer to the new transaction, but that looked weird and didn't solve the first problem. Instead, solve both problems by logging the buffer before rolling the transaction. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2022-11-16 15:25:01 -08:00
Darrick J. Wong	be1317fdb8	xfs: don't track the AGFL buffer in the scrub AG context While scrubbing an allocation group, we don't need to hold the AGFL buffer as part of the scrub context. All that is necessary to lock an AG is to hold the AGI and AGF buffers, so fix all the existing users of the AGFL buffer to grab them only when necessary. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2022-11-16 15:25:01 -08:00
Darrick J. Wong	9a48b4a6fd	xfs: fully initialize xfs_da_args in xchk_directory_blocks While running the online fsck test suite, I noticed the following assertion in the kernel log (edited for brevity): XFS: Assertion failed: 0, file: fs/xfs/xfs_health.c, line: 571 ------------[ cut here ]------------ WARNING: CPU: 3 PID: 11667 at fs/xfs/xfs_message.c:104 assfail+0x46/0x4a [xfs] CPU: 3 PID: 11667 Comm: xfs_scrub Tainted: G W 5.19.0-rc7-xfsx #rc7 6e6475eb29fd9dda3181f81b7ca7ff961d277a40 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014 RIP: 0010:assfail+0x46/0x4a [xfs] Call Trace: <TASK> xfs_dir2_isblock+0xcc/0xe0 xchk_directory_blocks+0xc7/0x420 xchk_directory+0x53/0xb0 xfs_scrub_metadata+0x2b6/0x6b0 xfs_scrubv_metadata+0x35e/0x4d0 xfs_ioc_scrubv_metadata+0x111/0x160 xfs_file_ioctl+0x4ec/0xef0 __x64_sys_ioctl+0x82/0xa0 do_syscall_64+0x2b/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 This assertion triggers in xfs_dirattr_mark_sick when the caller passes in a whichfork value that is neither of XFS_{DATA,ATTR}_FORK. The cause of this is that xchk_directory_blocks only partially initializes the xfs_da_args structure that is passed to xfs_dir2_isblock. If the data fork is not correct, the XFS_IS_CORRUPT clause will trigger. My development branch reports this failure to the health monitoring subsystem, which accesses the uninitialized args->whichfork field, leading the the assertion tripping. We really shouldn't be passing random stack contents around, so the solution here is to force the compiler to zero-initialize the struct. Found by fuzzing u3.bmx[0].blockcount = middlebit on xfs/1554. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2022-11-16 15:25:01 -08:00
Linus Torvalds	f0c4d9fc9c	Linux 6.1-rc4 v6.1-rc4	2022-11-06 15:07:11 -08:00
Linus Torvalds	16c7a368c8	Merge tag 'cxl-fixes-for-6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull cxl fixes from Dan Williams: "Several fixes for CXL region creation crashes, leaks and failures. This is mainly fallout from the original implementation of dynamic CXL region creation (instantiate new physical memory pools) that arrived in v6.0-rc1. Given the theme of "failures in the presence of pass-through decoders" this also includes new regression test infrastructure for that case. Summary: - Fix region creation crash with pass-through decoders - Fix region creation crash when no decoder allocation fails - Fix region creation crash when scanning regions to enforce the increasing physical address order constraint that CXL mandates - Fix a memory leak for cxl_pmem_region objects, track 1:N instead of 1:1 memory-device-to-region associations. - Fix a memory leak for cxl_region objects when regions with active targets are deleted - Fix assignment of NUMA nodes to CXL regions by CFMWS (CXL Window) emulated proximity domains. - Fix region creation failure for switch attached devices downstream of a single-port host-bridge - Fix false positive memory leak of cxl_region objects by recycling recently used region ids rather than freeing them - Add regression test infrastructure for a pass-through decoder configuration - Fix some mailbox payload handling corner cases" * tag 'cxl-fixes-for-6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: cxl/region: Recycle region ids cxl/region: Fix 'distance' calculation with passthrough ports tools/testing/cxl: Add a single-port host-bridge regression config tools/testing/cxl: Fix some error exits cxl/pmem: Fix cxl_pmem_region and cxl_memdev leak cxl/region: Fix cxl_region leak, cleanup targets at region delete cxl/region: Fix region HPA ordering validation cxl/pmem: Use size_add() against integer overflow cxl/region: Fix decoder allocation crash ACPI: NUMA: Add CXL CFMWS 'nodes' to the possible nodes set cxl/pmem: Fix failure to account for 8 byte header for writes to the device LSA. cxl/region: Fix null pointer dereference due to pass through decoder commit cxl/mbox: Add a check on input payload size	2022-11-06 13:09:52 -08:00
Linus Torvalds	aa52994915	Merge tag 'hwmon-for-v6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon fixes from Guenter Roeck: "Fix two regressions: - Commit `54cc3dbfc1` ("hwmon: (pmbus) Add regulator supply into macro") resulted in regulator undercount when disabling regulators. Revert it. - The thermal subsystem rework caused the scmi driver to no longer register with the thermal subsystem because index values no longer match. To fix the problem, the scmi driver now directly registers with the thermal subsystem, no longer through the hwmon core" * tag 'hwmon-for-v6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: Revert "hwmon: (pmbus) Add regulator supply into macro" hwmon: (scmi) Register explicitly with Thermal Framework	2022-11-06 12:59:12 -08:00
Linus Torvalds	727ea09e99	Merge tag 'perf_urgent_for_v6.1_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Borislav Petkov: - Add Cooper Lake's stepping to the PEBS guest/host events isolation fixed microcode revisions checking quirk - Update Icelake and Sapphire Rapids events constraints - Use the standard energy unit for Sapphire Rapids in RAPL - Fix the hw_breakpoint test to fail more graciously on !SMP configs * tag 'perf_urgent_for_v6.1_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Add Cooper Lake stepping to isolation_ucodes[] perf/x86/intel: Fix pebs event constraints for SPR perf/x86/intel: Fix pebs event constraints for ICL perf/x86/rapl: Use standard Energy Unit for SPR Dram RAPL domain perf/hw_breakpoint: test: Skip the test if dependencies unmet	2022-11-06 12:41:32 -08:00
Linus Torvalds	f6f5204727	Merge tag 'x86_urgent_for_v6.1_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Add new Intel CPU models - Enforce that TDX guests are successfully loaded only on TDX hardware where virtualization exception (#VE) delivery on kernel memory is disabled because handling those in all possible cases is "essentially impossible" - Add the proper include to the syscall wrappers so that BTF can see the real pt_regs definition and not only the forward declaration * tag 'x86_urgent_for_v6.1_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu: Add several Intel server CPU model numbers x86/tdx: Panic on bad configs that #VE on "private" memory access x86/tdx: Prepare for using "INFO" call for a second purpose x86/syscall: Include asm/ptrace.h in syscall_wrapper header	2022-11-06 12:36:47 -08:00
Linus Torvalds	35697d81a7	Merge tag 'kbuild-fixes-v6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - Use POSIX-compatible grep options - Document git-related tips for reproducible builds - Fix a typo in the modpost rule - Suppress SIGPIPE error message from gcc-ar and llvm-ar - Fix segmentation fault in the menuconfig search * tag 'kbuild-fixes-v6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kconfig: fix segmentation fault in menuconfig search kbuild: fix SIGPIPE error message for AR=gcc-ar and AR=llvm-ar kbuild: fix typo in modpost Documentation: kbuild: Add description of git for reproducible builds kbuild: use POSIX-compatible grep option	2022-11-06 12:23:10 -08:00
Linus Torvalds	089d1c3122	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm fixes from Paolo Bonzini: "ARM: - Fix the pKVM stage-1 walker erronously using the stage-2 accessor - Correctly convert vcpu->kvm to a hyp pointer when generating an exception in a nVHE+MTE configuration - Check that KVM_CAP_DIRTY_LOG_* are valid before enabling them - Fix SMPRI_EL1/TPIDR2_EL0 trapping on VHE - Document the boot requirements for FGT when entering the kernel at EL1 x86: - Use SRCU to protect zap in __kvm_set_or_clear_apicv_inhibit() - Make argument order consistent for kvcalloc() - Userspace API fixes for DEBUGCTL and LBRs" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: Fix a typo about the usage of kvcalloc() KVM: x86: Use SRCU to protect zap in __kvm_set_or_clear_apicv_inhibit() KVM: VMX: Ignore guest CPUID for host userspace writes to DEBUGCTL KVM: VMX: Fold vmx_supported_debugctl() into vcpu_supported_debugctl() KVM: VMX: Advertise PMU LBRs if and only if perf supports LBRs arm64: booting: Document our requirements for fine grained traps with SME KVM: arm64: Fix SMPRI_EL1/TPIDR2_EL0 trapping on VHE KVM: Check KVM_CAP_DIRTY_LOG_{RING, RING_ACQ_REL} prior to enabling them KVM: arm64: Fix bad dereference on MTE-enabled systems KVM: arm64: Use correct accessor to parse stage-1 PTEs	2022-11-06 10:46:59 -08:00
Linus Torvalds	6e8c78d32b	Merge tag 'for-linus-6.1-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: "One fix for silencing a smatch warning, and a small cleanup patch" * tag 'for-linus-6.1-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/xen: simplify sysenter and syscall setup x86/xen: silence smatch warning in pmu_msr_chk_emulated()	2022-11-06 10:42:29 -08:00
Linus Torvalds	9761070d14	Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Fix a number of bugs, including some regressions, the most serious of which was one which would cause online resizes to fail with file systems with metadata checksums enabled. Also fix a warning caused by the newly added fortify string checker, plus some bugs that were found using fuzzed file systems" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix fortify warning in fs/ext4/fast_commit.c:1551 ext4: fix wrong return err in ext4_load_and_init_journal() ext4: fix warning in 'ext4_da_release_space' ext4: fix BUG_ON() when directory entry has invalid rec_len ext4: update the backup superblock's at the end of the online resize	2022-11-06 10:30:29 -08:00
Linus Torvalds	90153f928b	Merge tag '6.1-rc4-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6 Pull cifs fixes from Steve French: "One symlink handling fix and two fixes foir multichannel issues with iterating channels, including for oplock breaks when leases are disabled" * tag '6.1-rc4-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: fix use-after-free on the link name cifs: avoid unnecessary iteration of tcp sessions cifs: always iterate smb sessions using primary channel	2022-11-06 10:19:39 -08:00
Linus Torvalds	8391aa4b4c	Merge tag 'trace-v6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull `lTracing fixes for 6.1-rc3: - Fixed NULL pointer dereference in the ring buffer wait-waiters code for machines that have less CPUs than what nr_cpu_ids returns. The buffer array is of size nr_cpu_ids, but only the online CPUs get initialized. - Fixed use after free call in ftrace_shutdown. - Fix accounting of if a kprobe is enabled - Fix NULL pointer dereference on error path of fprobe rethook_alloc(). - Fix unregistering of fprobe_kprobe_handler - Fix memory leak in kprobe test module * tag 'trace-v6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: kprobe: Fix memory leak in test_gen_kprobe/kretprobe_cmd() tracing/fprobe: Fix to check whether fprobe is registered correctly fprobe: Check rethook_alloc() return in rethook initialization kprobe: reverse kp->flags when arm_kprobe failed ftrace: Fix use-after-free for dynamic ftrace_ops ring-buffer: Check for NULL cpu_buffer in ring_buffer_wake_waiters()	2022-11-06 09:57:38 -08:00
Paolo Bonzini	f4298cac2b	Merge tag 'kvmarm-fixes-6.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD * Fix the pKVM stage-1 walker erronously using the stage-2 accessor * Correctly convert vcpu->kvm to a hyp pointer when generating an exception in a nVHE+MTE configuration * Check that KVM_CAP_DIRTY_LOG_* are valid before enabling them * Fix SMPRI_EL1/TPIDR2_EL0 trapping on VHE * Document the boot requirements for FGT when entering the kernel at EL1	2022-11-06 03:30:49 -05:00
Paolo Bonzini	1462014966	Merge branch 'kvm-master' into HEAD x86: * Use SRCU to protect zap in __kvm_set_or_clear_apicv_inhibit() * Make argument order consistent for kvcalloc() * Userspace API fixes for DEBUGCTL and LBRs	2022-11-06 03:30:38 -05:00
Theodore Ts'o	0d043351e5	ext4: fix fortify warning in fs/ext4/fast_commit.c:1551 With the new fortify string system, rework the memcpy to avoid this warning: memcpy: detected field-spanning write (size 60) of single field "&raw_inode->i_generation" at fs/ext4/fast_commit.c:1551 (size 4) Cc: stable@kernel.org Fixes: `54d9469bc5` ("fortify: Add run-time WARN for cross-field memcpy()") Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-11-06 01:07:59 -04:00
Jason Yan	9f2a1d9fb3	ext4: fix wrong return err in ext4_load_and_init_journal() The return value is wrong in ext4_load_and_init_journal(). The local variable 'err' need to be initialized before goto out. The original code in __ext4_fill_super() is fine because it has two return values 'ret' and 'err' and 'ret' is initialized as -EINVAL. After we factor out ext4_load_and_init_journal(), this code is broken. So fix it by directly returning -EINVAL in the error handler path. Cc: stable@kernel.org Fixes: `9c1dd22d74` ("ext4: factor out ext4_load_and_init_journal()") Signed-off-by: Jason Yan <yanaijie@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20221025040206.3134773-1-yanaijie@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-11-06 01:07:59 -04:00
Ye Bin	1b8f787ef5	ext4: fix warning in 'ext4_da_release_space' Syzkaller report issue as follows: EXT4-fs (loop0): Free/Dirty block details EXT4-fs (loop0): free_blocks=0 EXT4-fs (loop0): dirty_blocks=0 EXT4-fs (loop0): Block reservation details EXT4-fs (loop0): i_reserved_data_blocks=0 EXT4-fs warning (device loop0): ext4_da_release_space:1527: ext4_da_release_space: ino 18, to_free 1 with only 0 reserved data blocks ------------[ cut here ]------------ WARNING: CPU: 0 PID: 92 at fs/ext4/inode.c:1528 ext4_da_release_space+0x25e/0x370 fs/ext4/inode.c:1524 Modules linked in: CPU: 0 PID: 92 Comm: kworker/u4:4 Not tainted 6.0.0-syzkaller-09423-g493ffd6605b2 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022 Workqueue: writeback wb_workfn (flush-7:0) RIP: 0010:ext4_da_release_space+0x25e/0x370 fs/ext4/inode.c:1528 RSP: 0018:ffffc900015f6c90 EFLAGS: 00010296 RAX: 42215896cd52ea00 RBX: 0000000000000000 RCX: 42215896cd52ea00 RDX: 0000000000000000 RSI: 0000000080000001 RDI: 0000000000000000 RBP: 1ffff1100e907d96 R08: ffffffff816aa79d R09: fffff520002bece5 R10: fffff520002bece5 R11: 1ffff920002bece4 R12: ffff888021fd2000 R13: ffff88807483ecb0 R14: 0000000000000001 R15: ffff88807483e740 FS: 0000000000000000(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005555569ba628 CR3: 000000000c88e000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ext4_es_remove_extent+0x1ab/0x260 fs/ext4/extents_status.c:1461 mpage_release_unused_pages+0x24d/0xef0 fs/ext4/inode.c:1589 ext4_writepages+0x12eb/0x3be0 fs/ext4/inode.c:2852 do_writepages+0x3c3/0x680 mm/page-writeback.c:2469 __writeback_single_inode+0xd1/0x670 fs/fs-writeback.c:1587 writeback_sb_inodes+0xb3b/0x18f0 fs/fs-writeback.c:1870 wb_writeback+0x41f/0x7b0 fs/fs-writeback.c:2044 wb_do_writeback fs/fs-writeback.c:2187 [inline] wb_workfn+0x3cb/0xef0 fs/fs-writeback.c:2227 process_one_work+0x877/0xdb0 kernel/workqueue.c:2289 worker_thread+0xb14/0x1330 kernel/workqueue.c:2436 kthread+0x266/0x300 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306 </TASK> Above issue may happens as follows: ext4_da_write_begin ext4_create_inline_data ext4_clear_inode_flag(inode, EXT4_INODE_EXTENTS); ext4_set_inode_flag(inode, EXT4_INODE_INLINE_DATA); __ext4_ioctl ext4_ext_migrate -> will lead to eh->eh_entries not zero, and set extent flag ext4_da_write_begin ext4_da_convert_inline_data_to_extent ext4_da_write_inline_data_begin ext4_da_map_blocks ext4_insert_delayed_block if (!ext4_es_scan_clu(inode, &ext4_es_is_delonly, lblk)) if (!ext4_es_scan_clu(inode, &ext4_es_is_mapped, lblk)) ext4_clu_mapped(inode, EXT4_B2C(sbi, lblk)); -> will return 1 allocated = true; ext4_es_insert_delayed_block(inode, lblk, allocated); ext4_writepages mpage_map_and_submit_extent(handle, &mpd, &give_up_on_write); -> return -ENOSPC mpage_release_unused_pages(&mpd, give_up_on_write); -> give_up_on_write == 1 ext4_es_remove_extent ext4_da_release_space(inode, reserved); if (unlikely(to_free > ei->i_reserved_data_blocks)) -> to_free == 1 but ei->i_reserved_data_blocks == 0 -> then trigger warning as above To solve above issue, forbid inode do migrate which has inline data. Cc: stable@kernel.org Reported-by: syzbot+c740bb18df70ad00952e@syzkaller.appspotmail.com Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20221018022701.683489-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-11-06 01:07:59 -04:00
Luís Henriques	17a0bc9bd6	ext4: fix BUG_ON() when directory entry has invalid rec_len The rec_len field in the directory entry has to be a multiple of 4. A corrupted filesystem image can be used to hit a BUG() in ext4_rec_len_to_disk(), called from make_indexed_dir(). ------------[ cut here ]------------ kernel BUG at fs/ext4/ext4.h:2413! ... RIP: 0010:make_indexed_dir+0x53f/0x5f0 ... Call Trace: <TASK> ? add_dirent_to_buf+0x1b2/0x200 ext4_add_entry+0x36e/0x480 ext4_add_nondir+0x2b/0xc0 ext4_create+0x163/0x200 path_openat+0x635/0xe90 do_filp_open+0xb4/0x160 ? __create_object.isra.0+0x1de/0x3b0 ? _raw_spin_unlock+0x12/0x30 do_sys_openat2+0x91/0x150 __x64_sys_open+0x6c/0xa0 do_syscall_64+0x3c/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 The fix simply adds a call to ext4_check_dir_entry() to validate the directory entry, returning -EFSCORRUPTED if the entry is invalid. CC: stable@kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=216540 Signed-off-by: Luís Henriques <lhenriques@suse.de> Link: https://lore.kernel.org/r/20221012131330.32456-1-lhenriques@suse.de Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-11-06 01:07:49 -04:00

1 2 3 4 5 ...

1137335 Commits