Commit Graph

1352605 Commits

Author SHA1 Message Date
Kent Overstreet
c72def5237 bcachefs: Run check_dirents second time if required
If we move a key backwards, we'll need a second pass to run the rest of
the fsck checks.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-02 12:16:36 -04:00
Kent Overstreet
a4907d7f33 bcachefs: Run snapshot deletion out of system_long_wq
We don't want this running out of the same workqueue, and blocking,
writes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-02 12:16:36 -04:00
Kent Overstreet
e49cf9b54b bcachefs: Make check_key_has_snapshot safer
Snapshot deletion v2 added sentinal values for deleted snapshots, so
"key for deleted snapshot" - i.e. snapshot deletion missed something -
is safe to repair automatically.

But if we find a key for a missing snapshot we have no idea what
happened, and we shouldn't delete it unless we're very sure that
everything else is consistent.

So hook it up to the new bch2_require_recovery_pass(), we'll now only
delete if snapshots and subvolumes have recenlty been checked.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-02 12:16:36 -04:00
Kent Overstreet
0942b852d4 bcachefs: BCH_RECOVERY_PASS_NO_RATELIMIT
Add a superblock flag to temporarily disable ratelimiting for a recovery
pass.

This will be used to make check_key_has_snapshot safer: we don't want to
delete a key for a missing snapshot unless we know that the snapshots
and subvolumes btrees are consistent, i.e. check_snapshots and
check_subvols have run recently.

Changing those btrees - creating/deleting a subvolume or snapshot - will
set the "disable ratelimit" flag, i.e. ensuring that those passes run if
check_key_has_snapshot discovers an error.

We're only disabling ratelimiting in the snapshot/subvol delete paths,
we're not so concerned about the create paths.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-02 12:16:36 -04:00
Kent Overstreet
a2ffab0e65 bcachefs: bch2_require_recovery_pass()
Add a helper for requiring that a recovery pass has already run: either
run it directly, if we're still in recovery, or if we're not in recovery
check if it has run recently and schedule it if it hasn't.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-02 12:16:35 -04:00
Kent Overstreet
09b9c72bd4 bcachefs: bch_err_throw()
Add a tracepoint for any time we return an error and unwind.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-02 12:16:35 -04:00
Kent Overstreet
36a2fdf7c5 bcachefs: Repair code for directory i_size
We had a bug due due to an incomplete revert of the patch implementing
directory i_size (summing up the size of the dirents), leading to
completely screwy i_size values that underflow.

Most userspace programs don't seem to care (e.g. du ignores it), but it
turns out this broke sshfs, so needs to be repaired.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-02 12:16:35 -04:00
Kent Overstreet
95fafc0f34 bcachefs: Kill un-reverted directory i_size code
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-02 12:16:35 -04:00
Kent Overstreet
d47db3e636 bcachefs: Delete redundant fsck_err()
'inode_has_wrong_backpointer'; we have more specific errors for every
case afterwards.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-02 12:16:35 -04:00
Kent Overstreet
165815c296 bcachefs: Convert BUG() to error
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-02 12:16:35 -04:00
Kent Overstreet
132263220d bcachefs: Add better logging to fsck_rename_dirent()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-01 00:03:12 -04:00
Kent Overstreet
18dad454cd bcachefs: Replace rcu_read_lock() with guards
The new guard(), scoped_guard() allow for more natural code.

Some of the uses with creative flow control have been left.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-01 00:03:12 -04:00
Kent Overstreet
9cb49fbf73 bcachefs: CLASS(btree_trans)
Allow btree_trans to be used with CLASS().

Automatic cleanup, instead of manually calling bch2_trans_put().

We don't use DEFINE_CLASS because using a static inline for the
constructor breaks bch2_trans_get()'s use of __func__, so we have to
open code it.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-01 00:03:12 -04:00
Kent Overstreet
42359f1615 bcachefs: CLASS(darray)
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-31 22:03:17 -04:00
Kent Overstreet
237a8e16bd bcachefs: CLASS(printbuf)
Add a DEFINE_CLASS() for printbufs.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-31 22:03:17 -04:00
Kent Overstreet
a0f7437906 bcachefs: sysfs trigger_journal_commit
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-31 22:03:17 -04:00
Kent Overstreet
1f42a0335a bcachefs: sysfs trigger_emergency_read_only
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-31 22:03:17 -04:00
Kent Overstreet
5802caf74f bcachefs: darray_find(), darray_find_p()
New helpers to avoid open coded loops.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-31 22:03:17 -04:00
Kent Overstreet
9a1accd3a5 bcachefs: Journal keys are retained until shutdown, or journal replay finishes
If we don't finish journal replay we need to keep journal keys around
until the filesystem shuts down - otherwise e.g. -o norecovery, various
tools (dump, list) break, and eventually we'll be doing journal replay
in the background.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-31 22:03:17 -04:00
Kent Overstreet
6447544c3d bcachefs: Improve error printing in btree_node_check_topology()
We had a bug report where the errors from btree_node_check_topology()
don't seem to be getting printed; log_fsck_err() does some fancy
ratelimiting-type stuff that we don't want here.

Instead, just use bch2_count_fsck_err(); this is simpler, and modelled
after how we're currently handling bucket ref update errors in
buckets.c.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-31 22:03:17 -04:00
Kent Overstreet
f402d9710b bcachefs: bch2_readdir() now calls str_hash_check_key()
More self healing code: readdir will now notice if there are dirents
hashed incorrectly, and it'll repair them if errors=fix_safe.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-31 22:03:17 -04:00
Kent Overstreet
a592268260 bcachefs: bch2_str_hash_check_key() may now be called without snapshots_seen
We don't track snapshot overwrites outside of fsck, so for this to be
called at runtime outside of fsck we need to create it on demand, when
we have repair to do.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-31 22:03:17 -04:00
Kent Overstreet
cb6f5d0dec bcachefs: __bch2_insert_snapshot_whiteouts() refactoring
Now uses bch2_get_snapshot_overwrites(), and much shorter.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-31 22:03:17 -04:00
Kent Overstreet
801cb2bd6c bcachefs: bch2_get_snapshot_overwrites()
New helper for getting a list of snapshot IDs that have overwritten a
given key.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-31 22:03:17 -04:00
Kent Overstreet
d21262d4e3 bcachefs: bch2_dev_journal_bucket_delete()
Recover from "journal and btree in same bucket".

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-31 22:03:17 -04:00
Kent Overstreet
0224d17d76 bcachefs: Runtime self healing for keys for deleted snapshots
If snapshot deletion incorrectly missing some keys and leaves keys for
deleted snapshots, that causes a bit of a problem for data move - we
can't move an extent for a nonexistent snapshot, because the extent
might have to be fragmented, and maintaining correct visibility in child
snapshots doesn't work if it doesn't have a snapshot.

Previously we'd just skip these keys, but it turns out that causes
copygc to spin.

So we need runtime self healing, i.e. calling check_key_has_snapshot()
from the data move path.

Snapshot deletion v2 included sentinal values for deleted snapshot
nodes, so this is quite safe.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-31 22:03:17 -04:00
Kent Overstreet
f02d153274 bcachefs: Don't unlock trans before data_update_init()
data_update_init() does need to do btree operations, delay doing the
unlock-before-io.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-31 22:03:17 -04:00
Kent Overstreet
642c1aabb0 bcachefs: Use bch2_err_matches() for BCH_ERR_fsck_(fix|ignore)
We'll be adding subtypes of these errors, and new error code tracing.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-31 22:03:16 -04:00
Kent Overstreet
dc43f6a70b bcachefs: Mark bch_errcode helpers __attribute__((const))
These don't access global memory or defer pointer arguments - this
enables CSE optimizations.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 11:20:18 -04:00
Kent Overstreet
66621f016d bcachefs: Add missing printbuf_reset() in bch2_check_dirent_inode_dirent()
We were accidentally including the contents from the previous
fsck_err().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 11:20:18 -04:00
Kent Overstreet
f1dc067bc1 bcachefs: sysfs/errors
Make the superblock error counters available in sysfs; the only other
way they can be seen is 'show-super', but we don't write the superblock
every time the error count gets incremented.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 11:20:18 -04:00
Kent Overstreet
66b7c51ceb bcachefs: bch2_check_fix_ptrs() can now repair btree roots
This is straightforward enough: check_fix_ptrs() currently only runs
before we go RW, so updating the btree root pointer in c->btree_roots
suffices - it'll be written out in the first journal write we do.

For that, do_bch2_trans_commit_to_journal_replay() now handles
JSET_ENTRY_btree_root entries.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:13 -04:00
Kent Overstreet
a7c9add482 bcachefs: Include b->ob.nr in cached_btree_node_to_text()
We have a bug report that looks like we might be leaking open buckets -
let's check if they got left attached to the cached btree node.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:13 -04:00
Kent Overstreet
e87de7d491 bcachefs: Move devs_sorted to alloc_request
More stack usage work.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:13 -04:00
Kent Overstreet
ff6369da9a bcachefs: reduce stack usage in alloc_sectors_start()
with typical config options, variables in different inline functions
aren't sharing stack space - and these are slowpaths.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:13 -04:00
Kent Overstreet
eabef52ff8 bcachefs: bch2_alloc_v4_to_text()
Specialize the .to_text() for alloc_v4, to avoid the temporary on the
stack for conversion from old versions.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:13 -04:00
Kent Overstreet
0c34e7ff69 bcachefs: Tweak bch2_data_update_init() for stack usage
- Separate out a slowpath for bkey_nocow_lock()
- Don't call bch2_bkey_ptrs_c() or loop over pointers more than
  necessary

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:13 -04:00
Kent Overstreet
56e5c7f65f bcachefs: kill replicas_sectors arg to __trigger_extent()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:13 -04:00
Kent Overstreet
92caf17189 bcachefs: Don't stack allocate bch_writepage_state
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:12 -04:00
Kent Overstreet
cd831a9494 bcachefs: factor out break_cycle_fail()
More stack usage work.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:12 -04:00
Kent Overstreet
19c0a8aa8a bcachefs: btree_node_missing_err()
Factor out an error path for a small stack usage improvement.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:12 -04:00
Kent Overstreet
0d25264ecf bcachefs: Kill bkey_buf in btree_path_down()
Allocate some (smaller) temporary storage in btree_trans for this -
btree_path_down() is in our max-stack call stack.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:12 -04:00
Kent Overstreet
99813d88e3 bcachefs: Add missing error logging in delete_dead_inodes()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:12 -04:00
Kent Overstreet
f54b2a80d0 bcachefs: Fix misaligned bucket check in journal space calculations
Fix an assertion pop in the tiering_misaligned test: rounding down to
bucket size at the end of the journal space calculations leaves
cur_entry_sectors == 0, which is incorrect with !cur_entry_err.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:12 -04:00
Kent Overstreet
813825d241 bcachefs: Fix incorrect multiple dev check in journal write path
It's uncomon to have multiple devices with journalling only on a subset,
but can be specified with the 'data_allowed' option. We need to know if
we're doing data/metadata writes to multiple devices, as that requires
issuing flushes before the journal writes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:12 -04:00
Kent Overstreet
327971cef5 bcachefs: Catch data_update_done events in trace_io_move_start_fail
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:12 -04:00
Kent Overstreet
c7897b5055 bcachefs: io_move_evacuate_bucket tracepoint, counter
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:12 -04:00
Kent Overstreet
060ff4b794 bcachefs: trace_io_move_pred
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:12 -04:00
Kent Overstreet
d6efd42a84 bcachefs: Fix infinite loop in journal_entry_btree_keys_to_text()
Fix an infinite loop when bkey_i->k.u64s is 0.

This only happens in userspace, where 'bcachefs list_journal' can print
the entire contents of the journal, and non-dirty entries aren't
validated.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:12 -04:00
Kent Overstreet
cd04497b10 bcachefs: Journal read error message improvements
- Don't print a checksum error when we first read a journal entry: we
  print a checksum error later if we'll be using the journal entry.

- Continuing with the theme of of improving error messages and grouping
  errors into a single log message per error, print a single 'checksum
  error' message per journal entry, and use bch2_journal_ptr_to_text()
  to print out where on the device it was.

- Factor out checksum error messages and checking for missing journal
  entries into helpers, bch2_journal_read() has gotten obnoxiously big.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:12 -04:00