Author SHA1 Message Date
Kent Overstreet
09fb85ae56 bcachefs: Run may_delete_deleted_inode() checks in bch2_inode_rm()
We had a bug where bch2_evict_inode() incorrectly called bch2_inode_rm()
- the journal clearly showed the inode was not unlinked.

We already have checks that we use in recovery when cleaning up deleted
inodes; lift them to bch2_inode_rm() as well.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-04 16:45:41 -04:00
Kent Overstreet
bb6689bbee bcachefs: delete dead code from may_delete_deleted_inode()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-04 16:45:41 -04:00
Kent Overstreet
bfaac2c546 bcachefs: Add flags to subvolume_to_text()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-04 16:45:41 -04:00
Kent Overstreet
9f2dc5f394 bcachefs: Fix oops in btree_node_seq_matches()
btree_update_nodes_written() needs to wait on in-flight writes to old
nodes before marking them as freed. But it has no reason to pin those
old nodes in memory, so some trickiness ensues.

The update we're completing deleted references to those nodes from the
btree, so we know if they've been evicted they can't be pulled back in.
We just have to check if the nodes we have pointers to are still those
old nodes, and haven't been reused.

To do that we check the node's "sequence number" (actually a random
64-bit cookie), but that lives in the node's data buffer. 'struct btree'
can't be freed until filesystem shutdown (as they're quite small), but
the data buffers can be freed or swapped around.

Commit 1f88c35674, which was fixing a kmsan warning, assumed that we
could safely do this locklessly with just a READ_ONCE() - if we've got a
non-null ptr it would be safe to read from.

But that's not true if the data buffer is a vmalloc allocation, so we
need to restore the locking that commit deleted (or alternatively RCU
free those data buffers, but there's no other reason for that).

Fixes: 1f88c35674 ("bcachefs: Fix a KMSAN splat in btree_update_nodes_written()")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-04 16:45:41 -04:00
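A minimal userspace sketch of the locking this fix restores. It assumes a pthread mutex in place of whatever node lock the real code takes, and `struct node`, `node_seq_matches()` and `node_free_data()` are hypothetical names, not bcachefs code. The point is that both the buffer pointer and the sequence number inside it are read under the same lock that guards freeing or swapping the buffer:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the node's data buffer, which holds the seq. */
struct node_data {
	uint64_t seq;			/* random 64-bit cookie */
};

/* Hypothetical stand-in for 'struct btree': outlives its data buffer. */
struct node {
	pthread_mutex_t lock;		/* protects ->data against free/swap */
	struct node_data *data;		/* may be freed or swapped at any time */
};

/*
 * Does @n still hold the node identified by @seq? The pointer and the
 * buffer it points to are only read under the lock: a non-NULL pointer
 * read locklessly (e.g. with READ_ONCE()) may refer to a buffer that is
 * being freed concurrently.
 */
static bool node_seq_matches(struct node *n, uint64_t seq)
{
	bool ret;

	pthread_mutex_lock(&n->lock);
	ret = n->data && n->data->seq == seq;
	pthread_mutex_unlock(&n->lock);
	return ret;
}

/* Freeing the buffer takes the same lock, so readers never race with it. */
static void node_free_data(struct node *n)
{
	pthread_mutex_lock(&n->lock);
	free(n->data);
	n->data = NULL;
	pthread_mutex_unlock(&n->lock);
}
```

The alternative mentioned in the commit, RCU-freeing the data buffers, would let the read side stay lockless, at the cost of deferring every buffer free.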
Kent Overstreet
2bf380c005 bcachefs: Fix dirent_casefold_mismatch repair
Instead of simply recreating a mis-casefolded dirent, use the str_hash
repair code, which will rename it if necessary - the dirent might have
been created again with the correct casefolding.

Factor out bch2_str_hash_repair_key() from
__bch2_str_hash_check_key() for the new path to use, and export
bch2_dirent_create_key() as well.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-04 16:45:41 -04:00
Kent Overstreet
b938d3c970 bcachefs: Fix bch2_fsck_rename_dirent() for casefold
bch2_fsck_rename_dirent() was creating bch_dirent keys open-coded - but
we need to use the appropriate helper if the directory is casefolded.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-04 16:45:41 -04:00
Kent Overstreet
35c1f131bc bcachefs: Redo bch2_dirent_init_name()
Redo (and simplify somewhat) how casefolded and non-casefolded dirents
are initialized, and export this to be used by fsck_rename_dirent().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-04 16:45:41 -04:00
Nathan Chancellor
01d925f7e1 bcachefs: Fix -Wc23-extensions in bch2_check_dirents()
Clang warns (or errors with CONFIG_WERROR=y):

  fs/bcachefs/fsck.c:2325:2: error: label followed by a declaration is a C23 extension [-Werror,-Wc23-extensions]
   2325 |         int ret = bch2_trans_run(c,
        |         ^

On clang-17 and older, this is an unconditional error:

  fs/bcachefs/fsck.c:2325:2: error: expected expression
   2325 |         int ret = bch2_trans_run(c,
        |         ^

Move the declaration of ret to the top of the function to resolve both
ways this issue manifests.

Fixes: c72def5237 ("bcachefs: Run check_dirents second time if required")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-04 16:45:38 -04:00
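A minimal sketch of the shape of this fix, assuming a hypothetical `run_pass()` that reports whether another pass is needed (loosely modelled on check_dirents re-running after moving a key backwards). `ret` is declared at the top of the function, so the statement following the `again:` label is an assignment rather than a declaration:

```c
#include <assert.h>

/* Hypothetical pass: returns nonzero while another pass is needed. */
static int run_pass(int *remaining)
{
	return --*remaining > 0;
}

/* Runs passes until one reports it is done; returns how many ran. */
static int run_until_done(int remaining)
{
	int ret;	/* declared at the top, before the label */
	int runs = 0;

again:
	/*
	 * Writing "int ret = run_pass(...);" directly after the label is a
	 * C23 extension (clang -Wc23-extensions), and a hard error on
	 * clang-17 and older. A plain assignment after a label is valid C.
	 */
	ret = run_pass(&remaining);
	runs++;
	if (ret)
		goto again;
	return runs;
}
```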
Kent Overstreet
c72def5237 bcachefs: Run check_dirents second time if required
If we move a key backwards, we'll need a second pass to run the rest of
the fsck checks.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-02 12:16:36 -04:00
Kent Overstreet
a4907d7f33 bcachefs: Run snapshot deletion out of system_long_wq
We don't want this running out of the same workqueue as writes, and
blocking them.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-02 12:16:36 -04:00
Kent Overstreet
e49cf9b54b bcachefs: Make check_key_has_snapshot safer
Snapshot deletion v2 added sentinel values for deleted snapshots, so
"key for deleted snapshot" - i.e. snapshot deletion missed something -
is safe to repair automatically.

But if we find a key for a missing snapshot we have no idea what
happened, and we shouldn't delete it unless we're very sure that
everything else is consistent.

So hook it up to the new bch2_require_recovery_pass(): we'll now only
delete if snapshots and subvolumes have recently been checked.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-02 12:16:36 -04:00
Kent Overstreet
0942b852d4 bcachefs: BCH_RECOVERY_PASS_NO_RATELIMIT
Add a superblock flag to temporarily disable ratelimiting for a recovery
pass.

This will be used to make check_key_has_snapshot safer: we don't want to
delete a key for a missing snapshot unless we know that the snapshots
and subvolumes btrees are consistent, i.e. check_snapshots and
check_subvols have run recently.

Changing those btrees - creating/deleting a subvolume or snapshot - will
set the "disable ratelimit" flag, ensuring that those passes run if
check_key_has_snapshot discovers an error.

We're only disabling ratelimiting in the snapshot/subvol delete paths;
we're not so concerned about the create paths.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-02 12:16:36 -04:00
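A minimal sketch of a per-pass "disable ratelimit" flag, under stated assumptions: `struct pass_state`, `PASS_RATELIMIT` and the three helpers are hypothetical, and the real flag lives in the superblock rather than a plain struct. Changing the checked btrees sets the flag; running the pass records the time and clears it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-pass state; the real superblock layout differs. */
struct pass_state {
	uint64_t last_run;	/* time the pass last completed */
	bool	 no_ratelimit;	/* set when the btrees it checks change */
};

#define PASS_RATELIMIT	3600	/* hypothetical minimum interval, seconds */

/* Called from (hypothetical) snapshot/subvolume delete paths. */
static void pass_disable_ratelimit(struct pass_state *p)
{
	p->no_ratelimit = true;
}

/* Should an error found at time @now trigger this pass? */
static bool pass_should_run(const struct pass_state *p, uint64_t now)
{
	return p->no_ratelimit || now - p->last_run >= PASS_RATELIMIT;
}

/* Running the pass records the time and re-arms the ratelimit. */
static void pass_ran(struct pass_state *p, uint64_t now)
{
	p->last_run = now;
	p->no_ratelimit = false;
}
```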
Kent Overstreet
a2ffab0e65 bcachefs: bch2_require_recovery_pass()
Add a helper for requiring that a recovery pass has already run: if
we're still in recovery, run it directly; if we're not, check whether it
has run recently and schedule it if it hasn't.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-02 12:16:35 -04:00
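The three-way decision described above can be sketched as follows. `struct fs_sketch`, `RECENT_INTERVAL`, `run_pass()` and the boolean return convention are assumptions for illustration; the real bch2_require_recovery_pass() has its own signature and return values:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical filesystem state; not the real bch_fs fields. */
struct fs_sketch {
	bool	 in_recovery;
	uint64_t now;
	uint64_t pass_last_run;		/* 0 = never */
	bool	 pass_scheduled;
	int	 pass_runs;
};

#define RECENT_INTERVAL	3600	/* hypothetical: what counts as "recently" */

static void run_pass(struct fs_sketch *c)
{
	c->pass_runs++;
	c->pass_last_run = c->now;
}

/*
 * Returns true if the pass is known to have run (so the caller may go
 * ahead with a destructive repair), false if it was only scheduled.
 */
static bool require_recovery_pass(struct fs_sketch *c)
{
	if (c->in_recovery) {
		run_pass(c);			/* still in recovery: run it now */
		return true;
	}
	if (c->pass_last_run && c->now - c->pass_last_run < RECENT_INTERVAL)
		return true;			/* ran recently enough */
	c->pass_scheduled = true;		/* otherwise schedule it */
	return false;
}
```

A caller such as check_key_has_snapshot would only delete the key when this returns true.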
Kent Overstreet
09b9c72bd4 bcachefs: bch_err_throw()
Add a tracepoint for any time we return an error and unwind.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-02 12:16:35 -04:00
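A minimal sketch of routing every error return through one macro so it can be traced. `err_throw()` and `trace_err_throw()` are hypothetical; the real bch_err_throw() takes different arguments and fires a kernel tracepoint, where this version just counts and logs to stderr:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Stand-in for a tracepoint: counts and logs each error return. */
static unsigned err_throws;

static int trace_err_throw(int err, const char *func)
{
	err_throws++;
	fprintf(stderr, "%s() returning error %d\n", func, err);
	return err;
}

/*
 * Hypothetical err_throw(): every error return goes through this macro,
 * so each one can be traced. It is a macro so __func__ names the caller.
 */
#define err_throw(err)	trace_err_throw(-(err), __func__)

static int parse_positive(int v)
{
	if (v <= 0)
		return err_throw(EINVAL);
	return v;
}
```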
Kent Overstreet
36a2fdf7c5 bcachefs: Repair code for directory i_size
We had a bug due to an incomplete revert of the patch implementing
directory i_size (summing up the size of the dirents), leading to
completely screwy i_size values that underflow.

Most userspace programs don't seem to care (e.g. du ignores it), but it
turns out this broke sshfs, so it needs to be repaired.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-02 12:16:35 -04:00
Kent Overstreet
95fafc0f34 bcachefs: Kill un-reverted directory i_size code
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-02 12:16:35 -04:00
Kent Overstreet
d47db3e636 bcachefs: Delete redundant fsck_err()
Delete the 'inode_has_wrong_backpointer' error; we have more specific
errors for every case afterwards.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-02 12:16:35 -04:00
Kent Overstreet
165815c296 bcachefs: Convert BUG() to error
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-02 12:16:35 -04:00
Kent Overstreet
132263220d bcachefs: Add better logging to fsck_rename_dirent()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-01 00:03:12 -04:00
Kent Overstreet
18dad454cd bcachefs: Replace rcu_read_lock() with guards
The new guard() and scoped_guard() helpers allow for more natural code.

Some uses with creative flow control have been left as they are.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-01 00:03:12 -04:00
Kent Overstreet
9cb49fbf73 bcachefs: CLASS(btree_trans)
Allow btree_trans to be used with CLASS().

Automatic cleanup, instead of manually calling bch2_trans_put().

We don't use DEFINE_CLASS because using a static inline for the
constructor breaks bch2_trans_get()'s use of __func__, so we have to
open code it.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-01 00:03:12 -04:00
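The kernel's CLASS() machinery (include/linux/cleanup.h) is built on the compiler's `cleanup` attribute. Below is a userspace sketch of an open-coded class, with hypothetical names (`struct trans`, `trans_get()`, `trans_put()`, `CLASS_trans()`), showing why the constructor has to be a macro: `__func__` must expand in the caller, and inside a static inline constructor it would name the helper instead:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical transaction that records which function allocated it. */
struct trans {
	const char *fn;
};

static int trans_live;	/* transactions not yet freed */

static struct trans *trans_get_fn(const char *fn)
{
	struct trans *t = malloc(sizeof(*t));

	if (t) {
		t->fn = fn;
		trans_live++;
	}
	return t;
}

/* A macro, not a static inline, so __func__ names the caller. */
#define trans_get()	trans_get_fn(__func__)

/* Cleanup handler: receives a pointer to the variable leaving scope. */
static void trans_put(struct trans **t)
{
	if (*t) {
		free(*t);
		trans_live--;
	}
}

/* Open-coded "class": trans_put() runs automatically at scope exit. */
#define CLASS_trans(name)						\
	struct trans *name __attribute__((cleanup(trans_put))) = trans_get()

static int use_trans(void)
{
	CLASS_trans(t);		/* no explicit trans_put() needed */

	return t && strcmp(t->fn, "use_trans") == 0;
}
```

The return value is computed before the cleanup handler runs, so `t` is still valid in the `return` expression.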
Kent Overstreet
42359f1615 bcachefs: CLASS(darray)
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-31 22:03:17 -04:00
Kent Overstreet
237a8e16bd bcachefs: CLASS(printbuf)
Add a DEFINE_CLASS() for printbufs.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-31 22:03:17 -04:00
Kent Overstreet
a0f7437906 bcachefs: sysfs trigger_journal_commit
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-31 22:03:17 -04:00
Kent Overstreet
1f42a0335a bcachefs: sysfs trigger_emergency_read_only
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-31 22:03:17 -04:00
Kent Overstreet
5802caf74f bcachefs: darray_find(), darray_find_p()
New helpers to avoid open coded loops.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-31 22:03:17 -04:00
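A sketch of what such find helpers might look like, assuming a minimal darray of just `data` and `nr`, and assuming darray_find() matches by value while darray_find_p() takes a predicate over an element pointer. The definitions below are illustrative, not copied from bcachefs, and rely on GNU statement expressions and `__typeof__`:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal darray: just a pointer and an element count. */
#define DARRAY(type)	struct { type *data; size_t nr; }

/*
 * Pointer to the first element for which @cond holds (with @_i pointing
 * at the current element), or NULL if there is none.
 */
#define darray_find_p(_d, _i, cond)					\
({									\
	__typeof__((_d).data) _ret = NULL;				\
									\
	for (__typeof__((_d).data) _i = (_d).data;			\
	     _i < (_d).data + (_d).nr; _i++)				\
		if (cond) {						\
			_ret = _i;					\
			break;						\
		}							\
	_ret;								\
})

/* Match by value, expressed via darray_find_p(). */
#define darray_find(_d, _item)	darray_find_p(_d, _i, *_i == (_item))
```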
Kent Overstreet
9a1accd3a5 bcachefs: Journal keys are retained until shutdown, or journal replay finishes
If we don't finish journal replay we need to keep journal keys around
until the filesystem shuts down - otherwise e.g. -o norecovery, various
tools (dump, list) break, and eventually we'll be doing journal replay
in the background.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-31 22:03:17 -04:00
Kent Overstreet
6447544c3d bcachefs: Improve error printing in btree_node_check_topology()
We had a bug report where the errors from btree_node_check_topology()
don't seem to be getting printed; log_fsck_err() does some fancy
ratelimiting-type stuff that we don't want here.

Instead, just use bch2_count_fsck_err(); this is simpler, and modelled
after how we're currently handling bucket ref update errors in
buckets.c.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-31 22:03:17 -04:00
Kent Overstreet
f402d9710b bcachefs: bch2_readdir() now calls str_hash_check_key()
More self healing code: readdir will now notice if there are dirents
hashed incorrectly, and it'll repair them if errors=fix_safe.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-31 22:03:17 -04:00
Kent Overstreet
a592268260 bcachefs: bch2_str_hash_check_key() may now be called without snapshots_seen
We don't track snapshot overwrites outside of fsck, so for this to be
called at runtime outside of fsck we need to create it on demand, when
we have repair to do.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-31 22:03:17 -04:00
Kent Overstreet
cb6f5d0dec bcachefs: __bch2_insert_snapshot_whiteouts() refactoring
Now uses bch2_get_snapshot_overwrites(), and is much shorter.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-31 22:03:17 -04:00
Kent Overstreet
801cb2bd6c bcachefs: bch2_get_snapshot_overwrites()
New helper for getting a list of snapshot IDs that have overwritten a
given key.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-31 22:03:17 -04:00
Kent Overstreet
d21262d4e3 bcachefs: bch2_dev_journal_bucket_delete()
Recover from "journal and btree in same bucket".

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-31 22:03:17 -04:00
Kent Overstreet
0224d17d76 bcachefs: Runtime self healing for keys for deleted snapshots
If snapshot deletion incorrectly misses some keys and leaves keys for
deleted snapshots, that causes a bit of a problem for data move - we
can't move an extent for a nonexistent snapshot, because the extent
might have to be fragmented, and maintaining correct visibility in child
snapshots doesn't work if it doesn't have a snapshot.

Previously we'd just skip these keys, but it turns out that causes
copygc to spin.

So we need runtime self healing, i.e. calling check_key_has_snapshot()
from the data move path.

Snapshot deletion v2 included sentinel values for deleted snapshot
nodes, so this is quite safe.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-31 22:03:17 -04:00
Kent Overstreet
f02d153274 bcachefs: Don't unlock trans before data_update_init()
data_update_init() needs to do btree operations, so delay the
unlock-before-io until after it.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-31 22:03:17 -04:00
Kent Overstreet
642c1aabb0 bcachefs: Use bch2_err_matches() for BCH_ERR_fsck_(fix|ignore)
We'll be adding subtypes of these errors, and new error code tracing.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-31 22:03:16 -04:00
Kent Overstreet
dc43f6a70b bcachefs: Mark bch_errcode helpers __attribute__((const))
These don't access global memory or dereference pointer arguments -
this enables CSE optimizations.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 11:20:18 -04:00
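A hypothetical model of hierarchical error codes, to illustrate why callers use a matches() helper instead of `==` once subtypes exist, and why such a helper can be marked `__attribute__((const))`. The error names, numbering and parent table are assumptions, not the real bcachefs tables. GCC documents that a const function may still read non-volatile constants such as the table here:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical private error codes, numbered above the errno range. */
enum {
	ERR_START = 2048,
	ERR_fsck,
	ERR_fsck_fix,
	ERR_fsck_ignore,
	ERR_fsck_fix_silent,		/* hypothetical subtype of fsck_fix */
	ERR_MAX,
};

/* Parent class of each private code; 0 means "no parent". */
static const int err_parent[ERR_MAX - ERR_START] = {
	[ERR_fsck_fix - ERR_START]	  = ERR_fsck,
	[ERR_fsck_ignore - ERR_START]	  = ERR_fsck,
	[ERR_fsck_fix_silent - ERR_START] = ERR_fsck_fix,
};

/*
 * True if @err (positive or negative) is @cls or a subtype of it.
 * The result depends only on the arguments plus a constant table, so
 * it can be marked const and repeated calls can be merged (CSE).
 */
static __attribute__((const)) bool err_matches(int err, int cls)
{
	err = err < 0 ? -err : err;

	while (err > ERR_START && err < ERR_MAX) {
		if (err == cls)
			return true;
		err = err_parent[err - ERR_START];
	}
	return err == cls;
}
```

With a subtype like the hypothetical `ERR_fsck_fix_silent`, a plain `ret == -ERR_fsck_fix` check would miss it, while `err_matches(ret, ERR_fsck_fix)` still matches.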
Kent Overstreet
66621f016d bcachefs: Add missing printbuf_reset() in bch2_check_dirent_inode_dirent()
We were accidentally including the contents from the previous
fsck_err().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 11:20:18 -04:00
Kent Overstreet
f1dc067bc1 bcachefs: sysfs/errors
Make the superblock error counters available in sysfs; the only other
way they can be seen is 'show-super', but we don't write the superblock
every time the error count gets incremented.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 11:20:18 -04:00
Kent Overstreet
66b7c51ceb bcachefs: bch2_check_fix_ptrs() can now repair btree roots
This is straightforward enough: check_fix_ptrs() currently only runs
before we go RW, so updating the btree root pointer in c->btree_roots
suffices - it'll be written out in the first journal write we do.

For that, do_bch2_trans_commit_to_journal_replay() now handles
JSET_ENTRY_btree_root entries.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:13 -04:00
Kent Overstreet
a7c9add482 bcachefs: Include b->ob.nr in cached_btree_node_to_text()
We have a bug report that looks like we might be leaking open buckets -
let's check if they got left attached to the cached btree node.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:13 -04:00
Kent Overstreet
e87de7d491 bcachefs: Move devs_sorted to alloc_request
More stack usage work.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:13 -04:00
Kent Overstreet
ff6369da9a bcachefs: reduce stack usage in alloc_sectors_start()
With typical config options, variables in different inline functions
aren't sharing stack space - and these are slowpaths.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:13 -04:00
Kent Overstreet
eabef52ff8 bcachefs: bch2_alloc_v4_to_text()
Specialize the .to_text() for alloc_v4, to avoid the temporary on the
stack for conversion from old versions.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:13 -04:00
Kent Overstreet
0c34e7ff69 bcachefs: Tweak bch2_data_update_init() for stack usage
- Separate out a slowpath for bkey_nocow_lock()
- Don't call bch2_bkey_ptrs_c() or loop over pointers more than
  necessary

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:13 -04:00
Kent Overstreet
56e5c7f65f bcachefs: kill replicas_sectors arg to __trigger_extent()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:13 -04:00
Kent Overstreet
92caf17189 bcachefs: Don't stack allocate bch_writepage_state
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:12 -04:00
Kent Overstreet
cd831a9494 bcachefs: factor out break_cycle_fail()
More stack usage work.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:12 -04:00
Kent Overstreet
19c0a8aa8a bcachefs: btree_node_missing_err()
Factor out an error path for a small stack usage improvement.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:12 -04:00
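A generic sketch of factoring an error path out for stack usage, with hypothetical names (`report_missing()`, `lookup()`). Marking the slowpath `noinline` keeps its large local buffer out of the caller's frame, so the fastpath does not pay for it; `cold` additionally hints that the path is rarely taken. How much this saves depends on the compiler and config:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical slowpath with a large on-stack buffer, needed only when
 * reporting an error. Kept out of line so its frame is only pushed
 * when the error actually happens.
 */
static __attribute__((noinline, cold)) int report_missing(unsigned id)
{
	char msg[512];

	snprintf(msg, sizeof(msg), "node %u missing", id);
	fprintf(stderr, "%s\n", msg);
	return -1;
}

static int lookup(const unsigned *ids, unsigned nr, unsigned id)
{
	for (unsigned i = 0; i < nr; i++)
		if (ids[i] == id)
			return (int) i;
	return report_missing(id);	/* fastpath never reserves msg[] */
}
```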
Kent Overstreet
0d25264ecf bcachefs: Kill bkey_buf in btree_path_down()
Allocate some (smaller) temporary storage in btree_trans for this -
btree_path_down() is in our max-stack call stack.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:12 -04:00
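A sketch of the general technique of moving temporary storage off a deep call stack and into a longer-lived context object. `struct trans_ctx`, its `scratch` field and `path_down()` are hypothetical. The scratch area lives in the heap-allocated context, so the function's own frame stays small; this is only safe if nothing else uses the scratch space at the same time:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical transaction context with a small reusable scratch area. */
struct trans_ctx {
	unsigned char	scratch[64];
};

/*
 * Copy the key into the context's scratch space instead of a local
 * buffer, keeping this frame small on a deep call stack.
 */
static int path_down(struct trans_ctx *ctx, const void *key, size_t len)
{
	if (len > sizeof(ctx->scratch))
		return -1;		/* key too big for the scratch area */
	memcpy(ctx->scratch, key, len);
	return 0;
}
```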