Commit Graph

869 Commits

Author SHA1 Message Date
Darrick J. Wong
6e33017c32 xfs: fix data fork format filtering during inode repair
Coverity noticed that xrep_dinode_bad_metabt_fork never runs because
XFS_DINODE_FMT_META_BTREE is always filtered out in the mode selection
switch of xrep_dinode_check_dfork.

Metadata btrees are allowed only in the data forks of regular files, so
add this case explicitly.  I guess this got fubard during a refactoring
prior to 6.13 and I didn't notice until now. :/

Coverity-id: 1617714
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-02-14 09:40:24 +01:00
Darrick J. Wong
66314e9a57 xfs: fix online repair probing when CONFIG_XFS_ONLINE_REPAIR=n
I received a report from the release engineering side of the house that
xfs_scrub without the -n flag (aka fix it mode) would try to fix a
broken filesystem even on a kernel that doesn't have online repair built
into it:

 # xfs_scrub -dTvn /mnt/test
 EXPERIMENTAL xfs_scrub program in use! Use at your own risk!
 Phase 1: Find filesystem geometry.
 /mnt/test: using 1 threads to scrub.
 Phase 1: Memory used: 132k/0k (108k/25k), time:  0.00/ 0.00/ 0.00s
 <snip>
 Phase 4: Repair filesystem.
 <snip>
 Info: /mnt/test/some/victimdir directory entries: Attempting repair. (repair.c line 351)
 Corruption: /mnt/test/some/victimdir directory entries: Repair unsuccessful; offline repair required. (repair.c line 204)

Source: https://blogs.oracle.com/linux/post/xfs-online-filesystem-repair

It is strange that xfs_scrub doesn't refuse to run, because the kernel
is supposed to return EOPNOTSUPP if we actually needed to run a repair,
and xfs_io's repair subcommand will perror that.  And yet:

 # xfs_io -x -c 'repair probe' /mnt/test
 #

The first problem is commit dcb660f922 (4.15) which should have had
xchk_probe set the CORRUPT OFLAG so that any of the repair machinery
will get called at all.

It turns out that some refactoring that happened in the 6.6-6.8 era
broke the operation of this corner case.  What we *really* want to
happen is that all the predicates that would steer xfs_scrub_metadata()
towards calling xrep_attempt() should function the same way that they do
when repair is compiled in; and then xrep_attempt gets to return the
fatal EOPNOTSUPP error code that causes the probe to fail.

Instead, commit 8336a64eb7 (6.6) started the failwhale swimming by
hoisting OFLAG checking logic into a helper whose non-repair stub always
returns false, causing scrub to return "repair not needed" when in fact
the repair is not supported.  Prior to that commit, the oflag checking
that was open-coded in scrub.c worked correctly.

Similarly, in commit 4bdfd7d157 (6.8) we hoisted the IFLAG_REPAIR
and ALREADY_FIXED logic into a helper whose non-repair stub always
returns false, so we never enter the if test body that would have called
xrep_attempt, let alone fail to decode the OFLAGs correctly.

The final insult (yes, we're doing The Naked Gun now) is commit
48a72f6086 (6.8) in which we hoisted the "are we going to try a
repair?" predicate into yet another function with a non-repair stub
always returns false.

Fix xchk_probe to trigger xrep_probe if repair is enabled, or return
EOPNOTSUPP directly if it is not.  For all the other scrub types, we
need to fix the header predicates so that the ->repair functions (which
are all xrep_notsupported) get called to return EOPNOTSUPP.  Commit
48a72 is tagged here because the scrub code prior to LTS 6.12 are
incomplete and not worth patching.

Reported-by: David Flynn <david.flynn@oracle.com>
Cc: <stable@vger.kernel.org> # v6.8
Fixes: 8336a64eb7 ("xfs: don't complain about unfixed metadata when repairs were injected")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-02-14 09:37:25 +01:00
Christoph Hellwig
183d988ae9 xfs: constify feature checks
They will eventually be needed to be const for zoned growfs, but even
now having such simpler helpers as const as possible is a good thing.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-01-13 14:57:08 +01:00
Darrick J. Wong
fd97fe1112 xfs: fix CoW forks for realtime files
Port the copy on write fork repair to realtime files.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-23 13:06:17 -08:00
Darrick J. Wong
12f4d20328 xfs: check for shared rt extents when rebuilding rt file's data fork
When we're rebuilding the data fork of a realtime file, we need to
cross-reference each mapping with the rt refcount btree to ensure that
the reflink flag is set if there are any shared extents found.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-23 13:06:16 -08:00
Darrick J. Wong
92b2019493 xfs: repair inodes that have a refcount btree in the data fork
Plumb knowledge of refcount btrees into the inode core repair code.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-23 13:06:16 -08:00
Darrick J. Wong
83ccffc489 xfs: online repair of the realtime refcount btree
Port the data device's refcount btree repair code to the realtime
refcount btree.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-23 13:06:16 -08:00
Darrick J. Wong
fe2efe9508 xfs: capture realtime CoW staging extents when rebuilding rt rmapbt
Walk the realtime refcount btree to find the CoW staging extents when
we're rebuilding the realtime rmap btree.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-23 13:06:16 -08:00
Darrick J. Wong
477493082f xfs: walk the rt reference count tree when rebuilding rmap
When we're rebuilding the data device rmap, if we encounter a "refcount"
format fork, we have to walk the (realtime) refcount btree inode to
build the appropriate mappings.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-23 13:06:16 -08:00
Darrick J. Wong
6470ceef32 xfs: check new rtbitmap records against rt refcount btree
When we're rebuilding the realtime bitmap, check the proposed free
extents against the rt refcount btree to make sure we don't commit any
grievous errors.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-23 13:06:16 -08:00
Darrick J. Wong
cca34a3054 xfs: don't flag quota rt block usage on rtreflink filesystems
Quota space usage is allowed to exceed the size of the physical storage
when reflink is enabled.  Now that we have reflink for the realtime
volume, apply this same logic to the rtb repair logic.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-23 13:06:15 -08:00
Darrick J. Wong
ca757af07f xfs: scrub the metadir path of rt refcount btree files
Add a new XFS_SCRUB_METAPATH subtype so that we can scrub the metadata
directory tree path to the refcount btree file for each rt group.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-23 13:06:15 -08:00
Darrick J. Wong
a9600db96f xfs: detect and repair misaligned rtinherit directory cowextsize hints
If we encounter a directory that has been configured to pass on a CoW
extent size hint to a new realtime file and the hint isn't an integer
multiple of the rt extent size, we should flag the hint for
administrative review and/or turn it off because that is a
misconfiguration.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-23 13:06:15 -08:00
Darrick J. Wong
48bc170f2c xfs: allow dquot rt block count to exceed rt blocks on reflink fs
Update the quota scrubber to allow dquots where the realtime block count
exceeds the block count of the rt volume if reflink is enabled.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-23 13:06:15 -08:00
Darrick J. Wong
30f47950dc xfs: check reference counts of gaps between rt refcount records
If there's a gap between records in the rt refcount btree, we ought to
cross-reference the gap with the rtrmap records to make sure that there
aren't any overlapping records for a region that doesn't have any shared
ownership.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-23 13:06:15 -08:00
Darrick J. Wong
2d9a3e9805 xfs: allow overlapping rtrmapbt records for shared data extents
Allow overlapping realtime reverse mapping records if they both describe
shared data extents and the fs supports reflink on the realtime volume.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-23 13:06:15 -08:00
Darrick J. Wong
91683bb3f2 xfs: cross-reference checks with the rt refcount btree
Use the realtime refcount btree to implement cross-reference checks in
other data structures.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-23 13:06:14 -08:00
Darrick J. Wong
c27929670d xfs: scrub the realtime refcount btree
Add code to scrub realtime refcount btrees.  Similar to the refcount
btree checking code for the data device, we walk the rmap btree for each
refcount record to confirm that the reference counts are correct.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-23 13:06:14 -08:00
Darrick J. Wong
c3d3605f96 xfs: allow inodes to have the realtime and reflink flags
Now that we can share blocks between realtime files, allow this
combination.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-23 13:06:13 -08:00
Darrick J. Wong
fd9300679c xfs: add a realtime flag to the refcount update log redo items
Extend the refcount update (CUI) log items with a new realtime flag that
indicates that the updates apply against the realtime refcountbt.  We'll
wire up the actual refcount code later.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-23 13:06:11 -08:00
Darrick J. Wong
70fcf68665 xfs: namespace the maximum length/refcount symbols
Actually namespace these variables properly, so that readers can tell
that this is an XFS symbol, and that it's for the refcount
functionality.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-23 13:06:10 -08:00
Darrick J. Wong
9515572be6 xfs: hook live realtime rmap operations during a repair operation
Hook the regular realtime rmap code when an rtrmapbt repair operation is
running so that we can unlock the AGF buffer to scan the filesystem and
keep the in-memory btree up to date during the scan.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-23 13:06:09 -08:00
Darrick J. Wong
4a61f12eb1 xfs: create a shadow rmap btree during realtime rmap repair
Create an in-memory btree of rmap records instead of an array.  This
enables us to do live record collection instead of freezing the fs.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-23 13:06:09 -08:00
Darrick J. Wong
6a849bd81b xfs: online repair of the realtime rmap btree
Repair the realtime rmap btree while mounted.  Similar to the regular
rmap btree repair code, we walk the data fork mappings of every realtime
file in the filesystem to collect reverse-mapping records in an xfarray.
Then we sort the xfarray, and use the btree bulk loader to create a new
rtrmap btree ondisk.  Finally, we swap the btree roots, and reap the old
blocks in the usual way.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-23 13:06:09 -08:00
Darrick J. Wong
c6904f6788 xfs: support repairing metadata btrees rooted in metadir inodes
Adapt the repair code so that we can stage a new btree in the data fork
area of a metadir inode and reap the old blocks.  We already have nearly
all of the infrastructure; the only parts that were missing were the
metadata inode reservation handling.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-23 13:06:08 -08:00
Darrick J. Wong
8defee8dff xfs: online repair of realtime bitmaps for a realtime group
For a given rt group, regenerate the bitmap contents from the group's
realtime rmap btree.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-23 13:06:08 -08:00
Darrick J. Wong
3dd3aba6b9 xfs: repair rmap btree inodes
Teach the inode repair code how to deal with realtime rmap btree inodes
that won't load properly.  This is most likely moot since the filesystem
generally won't mount without the rtrmapbt inodes being usable, but
we'll add this for completeness.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-23 13:06:08 -08:00
Darrick J. Wong
1bd0843027 xfs: repair inodes that have realtime extents
Plumb into the inode core repair code the ability to search for extents
on realtime devices.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-23 13:06:08 -08:00
Darrick J. Wong
f1a6d9b4c3 xfs: online repair of realtime file bmaps
Now that we have a reverse-mapping index of the realtime device, we can
rebuild the data fork forward-mappings of any realtime file.  Enhance
the existing bmbt repair code to walk the rtrmap btrees to gather this
information.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-23 13:06:08 -08:00
Darrick J. Wong
2e0629e17c xfs: walk the rt reverse mapping tree when rebuilding rmap
When we're rebuilding the data device rmap, if we encounter an "rmap"
format fork, we have to walk the (realtime) rmap btree inode to build
the appropriate mappings.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-23 13:06:08 -08:00
Darrick J. Wong
366243cc99 xfs: scrub the metadir path of rt rmap btree files
Add a new XFS_SCRUB_METAPATH subtype so that we can scrub the metadata
directory tree path to the rmap btree file for each rt group.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-23 13:06:07 -08:00
Darrick J. Wong
a5542712f9 xfs: scan rt rmap when we're doing an intense rmap check of bmbt mappings
Teach the bmbt scrubber how to perform a comprehensive check that the
rmapbt does not contain /any/ mappings that are not described by bmbt
records when it's dealing with a realtime file.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-23 13:06:07 -08:00
Darrick J. Wong
037a44d827 xfs: cross-reference the realtime rmapbt
Teach the data fork and realtime bitmap scrubbers to cross-reference
information with the realtime rmap btree.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-23 13:06:07 -08:00
Darrick J. Wong
1ebecab5ad xfs: cross-reference realtime bitmap to realtime rmapbt scrubber
When we're checking the realtime rmap btree entries, cross-reference
those entries with the realtime bitmap too.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-23 13:06:07 -08:00
Darrick J. Wong
9a6cc4f6d0 xfs: scrub the realtime rmapbt
Check the realtime reverse mapping btree against the rtbitmap, and
modify the rtbitmap scrub to check against the rtrmapbt.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-23 13:06:07 -08:00
Darrick J. Wong
428e488465 xfs: allow queued realtime intents to drain before scrubbing
When a writer thread executes a chain of log intent items for the
realtime volume, the ILOCKs taken during each step are for each rt
metadata file, not the entire rt volume itself.  Although scrub takes
all rt metadata ILOCKs, this isn't sufficient to guard against scrub
checking the rt volume while that writer thread is in the middle of
finishing a chain because there's no higher level locking primitive
guarding the realtime volume.

When there's a collision, cross-referencing between data structures
(e.g. rtrmapbt and rtrefcountbt) yields false corruption events; if
repair is running, this results in incorrect repairs, which is
catastrophic.

Fix this by adding to the mount structure the same drain that we use to
protect scrub against concurrent AG updates, but this time for the
realtime volume.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-23 13:06:06 -08:00
Darrick J. Wong
702c90f451 xfs: support file data forks containing metadata btrees
Create a new fork format type for metadata btrees.  This fork type
requires that the inode is in the metadata directory tree, and only
applies to the data fork.  The actual type of the metadata btree itself
is determined by the di_metatype field.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-23 13:06:05 -08:00
Darrick J. Wong
9e823fc274 xfs: add a realtime flag to the rmap update log redo items
Extend the rmap update (RUI) log items to handle realtime volumes by
adding a new log intent item type.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-23 13:06:04 -08:00
Darrick J. Wong
953f76bf7a xfs: simplify the xfs_rmap_{alloc,free}_extent calling conventions
Simplify the calling conventions by allowing callers to pass a fsbno
(xfs_fsblock_t) directly into these functions, since we're just going to
set it in a struct anyway.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-23 13:06:03 -08:00
Darrick J. Wong
af32541081 xfs: add some rtgroup inode helpers
Create some simple helpers to reduce the amount of typing whenever we
access rtgroup inodes.  Conversion was done with this spatch and some
minor reformatting:

@@
expression rtg;
@@

- rtg->rtg_inodes[XFS_RTGI_BITMAP]
+ rtg_bitmap(rtg)

@@
expression rtg;
@@

- rtg->rtg_inodes[XFS_RTGI_SUMMARY]
+ rtg_summary(rtg)

and the CLI command:

$ spatch --sp-file /tmp/moo.cocci --dir fs/xfs/ --use-gitgrep --in-place

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-23 13:06:03 -08:00
Darrick J. Wong
84140a96cf xfs: prepare to reuse the dquot pointer space in struct xfs_inode
Files participating in the metadata directory tree are not accounted to
the quota subsystem.  Therefore, the i_[ugp]dquot pointers in struct
xfs_inode are never used and should always be NULL.

In the next patch we want to add a u64 count of fs blocks reserved for
metadata btree expansion, but we don't want every inode in the fs to pay
the memory price for this feature.  The intent is to union those three
pointers with the u64 counter, but for that to work we must guard
against all access to the dquot pointers for metadata files.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-23 13:06:03 -08:00
Darrick J. Wong
c004a793e0 xfs: fix zero byte checking in the superblock scrubber
The logic to check that the region past the end of the superblock is all
zeroes is wrong -- we don't want to check only the bytes past the end of
the maximally sized ondisk superblock structure as currently defined in
xfs_format.h; we want to check the bytes beyond the end of the ondisk as
defined by the feature bits.

Port the superblock size logic from xfs_repair and then put it to use in
xfs_scrub.

Cc: <stable@vger.kernel.org> # v4.15
Fixes: 21fb4cb198 ("xfs: scrub the secondary superblocks")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-12 17:45:13 -08:00
Darrick J. Wong
06b20ef09b xfs: check pre-metadir fields correctly
The checks that were added to the superblock scrubber for metadata
directories aren't quite right -- the old inode pointers are now defined
to be zeroes until someone else reuses them.  Also consolidate the new
metadir field checks to one place; they were inexplicably scattered
around.

Cc: <stable@vger.kernel.org> # v6.13-rc1
Fixes: 28d756d4d5 ("xfs: update sb field checks when metadir is turned on")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-12 17:45:12 -08:00
Darrick J. Wong
3853b5e1d7 xfs: don't move nondir/nonreg temporary repair files to the metadir namespace
Only directories or regular files are allowed in the metadata directory
tree.  Don't move the repair tempfile to the metadir namespace if this
is not true; this will cause the inode verifiers to trip.

xrep_tempfile_adjust_directory_tree opportunistically moves sc->tempip
from the regular directory tree to the metadata directory tree if sc->ip
is part of the metadata directory tree.  However, the scrub setup
functions grab sc->ip and create sc->tempip before we actually get
around to checking if the file mode is the right type for the scrubber.

IOWs, you can invoke the symlink scrubber with the file handle of a
subdirectory in the metadir.  xrep_setup_symlink will create a temporary
symlink file, xrep_tempfile_adjust_directory_tree will foolishly try to
set the METADATA flag on the temp symlink, which trips the inode
verifier in the inode item precommit, which shuts down the filesystem
when expensive checks are turned on.  If they're /not/ turned on, then
xchk_symlink will return ENOENT when it sees that it's been passed a
symlink, but the invalid inode could still get flushed to disk.  We
don't want that.

Cc: <stable@vger.kernel.org> # v6.13-rc1
Fixes: 9dc31acb01 ("xfs: move repair temporary files to the metadata directory tree")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-12 17:45:12 -08:00
Darrick J. Wong
ffc3ea4f3c xfs: fix scrub tracepoints when inode-rooted btrees are involved
Fix a minor mistakes in the scrub tracepoints that can manifest when
inode-rooted btrees are enabled.  The existing code worked fine for bmap
btrees, but we should tighten the code up to be less sloppy.

Cc: <stable@vger.kernel.org> # v5.7
Fixes: 92219c292a ("xfs: convert btree cursor inode-private member names")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-12 17:45:10 -08:00
Darrick J. Wong
dc5a052739 xfs: mark metadir repair tempfiles with IRECOVERY
Once in a long while, xfs/566 and xfs/801 report directory corruption in
one of the metadata subdirectories while it's forcibly rebuilding all
filesystem metadata.  I observed the following sequence of events:

1. Initiate a repair of the parent pointers for the /quota/user file.
   This is the secret file containing user quota data.

2. The pptr repair thread creates a temporary file and begins staging
   parent pointers in the ondisk metadata in preparation for an
   exchange-range to commit the new pptr data.

3. At the same time, initiate a repair of the /quota directory itself.

4. The dir repair thread finds the temporary file from (2), scans it for
   parent pointers, and stages a dirent in its own temporary dir in
   preparation to commit the fixed directory.

5. The parent pointer repair completes and frees the temporary file.

6. The dir repair commits the new directory and scans it again.  It
   finds the dirent that points to the old temporary file in (2) and
   marks the directory corrupt.

Oops!  Repair code must never scan the temporary files that other repair
functions create to stage new metadata.  They're not supposed to do
that, but the predicate function xrep_is_tempfile is incorrect because
it assumes that any XFS_DIFLAG2_METADATA file cannot ever be a temporary
file, but xrep_tempfile_adjust_directory_tree creates exactly that.

Fix this by setting the IRECOVERY flag on temporary metadata directory
inodes and using that to correct the predicate.  Repair code is supposed
to erase all the data in temporary files before releasing them, so it's
ok if a thread scans the temporary file after we drop IRECOVERY.

Cc: <stable@vger.kernel.org> # v6.13-rc1
Fixes: bb6cdd5529 ("xfs: hide metadata inodes from everyone because they are special")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-12 17:45:09 -08:00
Darrick J. Wong
6f4669708a xfs: set XFS_SICK_INO_SYMLINK_ZAPPED explicitly when zapping a symlink
If we need to reset a symlink target to the "durr it's busted" string,
then we clear the zapped flag as well.  However, this should be using
the provided helper so that we don't set the zapped state on an
otherwise ok symlink.

Cc: <stable@vger.kernel.org> # v6.10
Fixes: 2651923d8d ("xfs: online repair of symbolic links")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-12 17:45:09 -08:00
Darrick J. Wong
aa7bfb537e xfs: separate healthy clearing mask during repair
In commit d9041681dd we introduced some XFS_SICK_*ZAPPED flags so
that the inode record repair code could clean up a damaged inode record
enough to iget the inode but still be able to remember that the higher
level repair code needs to be called.  As part of that, we introduced a
xchk_mark_healthy_if_clean helper that is supposed to cause the ZAPPED
state to be removed if that higher level metadata actually checks out.
This was done by setting additional bits in sick_mask hoping that
xchk_update_health will clear all those bits after a healthy scrub.

Unfortunately, that's not quite what sick_mask means -- bits in that
mask are indeed cleared if the metadata is healthy, but they're set if
the metadata is NOT healthy.  fsck is only intended to set the ZAPPED
bits explicitly.

If something else sets the CORRUPT/XCORRUPT state after the
xchk_mark_healthy_if_clean call, we end up marking the metadata zapped.
This can happen if the following sequence happens:

1. Scrub runs, discovers that the metadata is fine but could be
   optimized and calls xchk_mark_healthy_if_clean on a ZAPPED flag.
   That causes the ZAPPED flag to be set in sick_mask because the
   metadata is not CORRUPT or XCORRUPT.

2. Repair runs to optimize the metadata.

3. Some other metadata used for cross-referencing in (1) becomes
   corrupt.

4. Post-repair scrub runs, but this time it sets CORRUPT or XCORRUPT due
   to the events in (3).

5. Now the xchk_health_update sets the ZAPPED flag on the metadata we
   just repaired.  This is not the correct state.

Fix this by moving the "if healthy" mask to a separate field, and only
ever using it to clear the sick state.

Cc: <stable@vger.kernel.org> # v6.8
Fixes: d9041681dd ("xfs: set inode sick state flags when we zap either ondisk fork")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-12 17:45:09 -08:00
Darrick J. Wong
bd27c7bcdc xfs: return a 64-bit block count from xfs_btree_count_blocks
With the nrext64 feature enabled, it's possible for a data fork to have
2^48 extent mappings.  Even with a 64k fsblock size, that maps out to
a bmbt containing more than 2^32 blocks.  Therefore, this predicate must
return a u64 count to avoid an integer wraparound that will cause scrub
to do the wrong thing.

It's unlikely that any such filesystem currently exists, because the
incore bmbt would consume more than 64GB of kernel memory on its own,
and so far nobody except me has driven a filesystem that far, judging
from the lack of complaints.

Cc: <stable@vger.kernel.org> # v5.19
Fixes: df9ad5cc7a ("xfs: Introduce macros to represent new maximum extent counts for data/attr forks")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-12 17:45:09 -08:00
Darrick J. Wong
e1d8602b6c xfs: keep quota directory inode loaded
In the same vein as the previous patch, there's no point in the metapath
scrub setup function doing a lookup on the quota metadir just so it can
validate that lookups work correctly.  Instead, retain the quota
directory inode in memory for the lifetime of the mount so that we can
check this meaningfully.

Cc: <stable@vger.kernel.org> # v6.13-rc1
Fixes: 128a055291 ("xfs: scrub quota file metapaths")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-12 17:45:08 -08:00