Commit Graph

1337140 Commits

Author SHA1 Message Date
Andreas Gruenbacher
d7cd33f7ef bcachefs: add eytzinger0_find_gt self test
Add an eytzinger0_find_gt() self test similar to eytzinger0_find_le().

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:13 -04:00
Andreas Gruenbacher
d384dada0e bcachefs: simplify eytzinger0_find_le
Replace the over-complicated implementation of eytzinger0_find_le() by
an equivalent, simpler version.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:13 -04:00
Andreas Gruenbacher
d148d804f2 bcachefs: convert eytzinger0_find_le to be 1-based
eytzinger0_find_le() is also easy to concert to 1-based eytzinger (but
see the next commit).

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:13 -04:00
Andreas Gruenbacher
c722b818a2 bcachefs: improve eytzinger0_find_le self test
Rename eytzinger0_find_test_val() to eytzinger0_find_test_le() and add a
new eytzinger0_find_test_val() wrapper that calls it.

We have already established that the array is sorted in eytzinger order,
so we can use the eytzinger iterator functions and check the boundary
conditions to verify the result of eytzinger0_find_le().

Only scan the entire array if we get an incorrect result.  When we need
to scan, use eytzinger0_for_each_prev() so that we'll stop at the
highest matching element in the array in case there are duplicates;
going through the array linearly wouldn't give us that.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:13 -04:00
Andreas Gruenbacher
dc5ceaaad8 bcachefs: add eytzinger0_for_each_prev
Add an eytzinger0_for_each_prev() macro for iterating through an
eytzinger array in reverse.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:13 -04:00
Andreas Gruenbacher
e8a0966ffa bcachefs: eytzinger0_find_test improvement
In eytzinger0_find_test(), remember the smallest element seen so far
instead of comparing adjacent array elements.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:13 -04:00
Andreas Gruenbacher
ec70103f9b bcachefs: eytzinger[01]_test improvement
In eytzinger[01]_test(), make sure that eytzinger[01]_for_each()
iterates over all array elements.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:13 -04:00
Andreas Gruenbacher
0766f5599c bcachefs: eytzinger self tests: fix cmp_u16 typo
Fix an obvious typo in cmp_u16().

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:13 -04:00
Andreas Gruenbacher
0ede49212a bcachefs: eytzinger self tests: missing newline termination
pr_info() format strings need to be newline terminated.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:13 -04:00
Andreas Gruenbacher
217ad1d7c7 bcachefs: eytzinger self tests: loop cleanups
The iterator variable of eytzinger0_for_each() loops has been changed to
be locally scoped at some point, so remove variables defined outside the
loop that are now unused.  In addition and for clarity, use a different
variable inside those loops where an outside variable would be shadowed.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:13 -04:00
Andreas Gruenbacher
d54b82ecc4 bcachefs: EYTZINGER_DEBUG fix
When EYTZINGER_DEBUG is defined, <linux/bug.h> needs to be included.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:13 -04:00
Andreas Gruenbacher
f7f9be0238 bcachefs: bch2_blacklist_entries_gc cleanup
Use an eytzinger0_for_each() loop here.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:13 -04:00
Kent Overstreet
34a493089a bcachefs: bch2_bkey_ptr_data_type() now correctly returns cached for cached ptrs
Necessary for adding backpointers for cached pointers.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:13 -04:00
Kent Overstreet
fd49882f12 bcachefs: Add time_stat for btree writes
We have other metadata IO types covered, this was missing.

Note: this includes the time until completion, i.e. including parent
pointer update.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:13 -04:00
Kent Overstreet
b7f648e2ec bcachefs: Add comment explaining why asserts in invalidate_one_bucket() are impossible
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:13 -04:00
Kent Overstreet
7606fb4d26 bcachefs: Ignore backpointers to stripes in ec_stripe_update_extents()
Prep work for stripe backpointers: this path previously would get very
confused at being asked to process (remove redundant replicas) stripes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:13 -04:00
Kent Overstreet
898bda5b72 bcachefs: Increase JOURNAL_BUF_NR
Increase journal pipelining.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:13 -04:00
Kent Overstreet
35282ce9e8 bcachefs: Free journal bufs when not in use
Since we're increasing the number of 'struct journal_bufs', we don't
want them all permanently holding onto buffers for the journal data -
that'd be 16 * 2MB = 32MB, or potentially more.

Add a single-element mempool (open coded, since buffer size varies),
this also means we won't be hitting the memory allocator every time we
open and close a journal entry/buffer.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:13 -04:00
Kent Overstreet
2e853fdbc7 bcachefs: Don't touch journal_buf->data->seq in journal_res_get
This is a small optimization, reducing the number of cachelines we touch
in the fast path - and it's also necessary for the next patch that
increases JOURNAL_BUF_NR.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:13 -04:00
Kent Overstreet
199a3578ed bcachefs: Kill journal_res.idx
More dead code.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:12 -04:00
Kent Overstreet
c2be81d48a bcachefs: Kill journal_res_state.unwritten_idx
Dead code

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:12 -04:00
Kent Overstreet
3eccc02035 bcachefs: add progress indicator to check_allocations
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:12 -04:00
Kent Overstreet
491eda6394 bcachefs: Add a progress indicator to bch2_dev_data_drop()
This code needs quite a bit of work: we don't want to be walking all
metadata in the filesystem, we should just be walking backpointers, and
it should be switched to a data ioctl that can report progress via a
file descriptor, not the system console.

But that'll take more work - before we can safely walk only backpointers
we need to change device add to not reuse device indexes, since with
that change accounting being wrong introduces the possibility of
removing a device that still has pointers.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:12 -04:00
Kent Overstreet
baabeb4997 bcachefs: Factor out progress.[ch]
the backpointers code has progress indicators; these aren't great, since
they print to the dmesg console and we much prefer to have progress
indicators reporting to a specific userspace program so they're not
spamming the system console.

But not all codepaths that need progress indicators support that yet,
and we don't want users to think "this is hung".

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:12 -04:00
Kent Overstreet
06284963e3 bcachefs: bch2_inum_offset_err_msg_trans() no longer handles transaction restarts
we're starting to use error messages with paths in fsck_errors(), where
we do not want nested transaction restart handling, so let's prepare for
that.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:12 -04:00
Kent Overstreet
45f0e6c838 bcachefs: bch2_indirect_extent_missing_error() prints path, not just inode number
We want all error messages converted to print paths, not just inode
numbers - users want this information, and it speeds up debugging too.

Auditing and converting all error messages is going to be a big project,
so for the moment we're just doing this incrementally.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:12 -04:00
Kent Overstreet
e63cf203d7 bcachefs: Convert migrate to move_data_phys()
Iterating over backpointers on a specific device is potentially much
cheaper than walking all filesystem data.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:12 -04:00
Kent Overstreet
157ea58341 bcachefs: Read/move path counter work
Reorganize counters a bit, grouping related counters together.

New counters:
- io_read_inline
- io_read_hole

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:12 -04:00
Alan Huang
7d8321a286 bcachefs: Fix subtraction underflow
When ancestor is less than IS_ANCESTOR_BITMAP, we would get an incorrect
result.

Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:12 -04:00
Kent Overstreet
f269ae55d2 bcachefs: Scrub
Add a new data op to walk all data and metadata in a filesystem,
checking if it can be read successfully, and on error repairing from
another copy if possible.

- New helper: bch2_dev_idx_is_online(), so that we can bail out and
  report to userspace when we're unable to scrub because the device is
  offline

- data_update_opts, which controls the data move path, now understands
  scrub: data is only read, not written. The read path is responsible
  for rewriting on read error, as with other reads.

- scrub_pred skips data extents that don't have checksums

- bch_ioctl_data has a new scrub member, which has a data_types field
  for data types to check - i.e. all data types, or only metadata.

- Add new entries to bch_move_stats so that we can report numbers for
  corrected and uncorrected errors

- Add a new enum to bch_ioctl_data_event for explicitly reporting
  completion and return code (i.e. device offline)

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:12 -04:00
Kent Overstreet
3e2ad29865 bcachefs: bch2_btree_node_scrub()
Add a function for scrubbing btree nodes - reading them in, and kicking
off a rewrite if there's an error.

The btree_node_read_done() checks have to be duplicated because we're
not using a pointer to a struct btree - the btree node might already be
in cache, and we need to check a specific replica, which might not be
the one we previously read from.

This will be used in the next patch implementing high-level scrub.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:12 -04:00
Kent Overstreet
ca24130ee4 bcachefs: bch2_bkey_pick_read_device() can now specify a device
To be used for scrub, where we want the read to come from a specific
device.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:12 -04:00
Kent Overstreet
2a2f7aaa8d bcachefs: __bch2_move_data_phys() now uses bch2_btree_node_rewrite_pos()
Kill most of the separate logic for btree nodes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:12 -04:00
Kent Overstreet
987fdbdb40 bcachefs: bch2_move_data_phys()
Add a more general version of bch2_evacuate_bucket - to be used for
scrub.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:12 -04:00
Kent Overstreet
12188c9e2b bcachefs: bch2_btree_node_rewrite_pos()
Add a new helper for rewriting a btree node given a just the key, not a
pointer to the node itself.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:12 -04:00
Kent Overstreet
ca16fa6b86 bcachefs: backpointer_get_key() doesn't pull in btree node
We may not need to pull in a btree node when walking backpointers -
don't do so unnecessarily when using backpointer_get_key().

It'll still fall back to backpointer_get_node() in a few situations,
including btree roots (where an iterator can't point at just the key),
and races due to the interior update path not having deleted a
backpointer to an old node yet.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:12 -04:00
Kent Overstreet
dff6de9518 bcachefs: Internal reads can now correct errors
Rework the read path so that BCH_READ_NODECODE reads now also self-heal
after a read error and a successful retry - prerequisite for scrub.

- __bch2_read_endio() now handles a read that's both BCH_READ_NODECODE
  and a bounce.

  Normally, we don't want a BCH_READ_NODECODE read to ever allocate a
  split bch_read_bio: we want to maintain the relationship between the
  bch_read_bio and the data_update it's embedded in.

  But correcting read errors requires allocating a split/bounce rbio
  that's embedded in a promote_op. We do still have a 1-1 relationship,
  i.e. we only allocate a single split/bounce if it's a
  BCH_READ_NODECODE, so things hopefully don't get too crazy.

- __bch2_read_extent() now is allowed to allocate the promote_op for
  rewriting after a failed read, even if it's BCH_READ_NODECODE.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:12 -04:00
Kent Overstreet
7b1d655106 bcachefs: Don't self-heal if a data update is already rewriting
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:12 -04:00
Kent Overstreet
4dfb76e0ad bcachefs: Don't start promotes from bch2_rbio_free()
we don't want to block completion of the read - starting a promote calls
into the write path, which will block.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:12 -04:00
Kent Overstreet
7e9ed60f5f bcachefs: Bail out early on alloc_nowait data updates
If a data update doesn't want to block on allocations (promotes, self
healing on read error) - check if the allocation would fail before
kicking off the data update and calling into the write path.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:11 -04:00
Kent Overstreet
c37d42a0e2 bcachefs: Rework init order in bch2_data_update_init()
Initialize the write op first, so that in the next patch we can check if
the allocator would block (for BCH_WRITE_alloc_nowait ops) and bail out
before taking nocow locks/dev refs.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:11 -04:00
Kent Overstreet
29ad31c780 bcachefs: Self healing writes are BCH_WRITE_alloc_nowait
If a drive is failing and we're moving data off of it, we can't
necessairly depend on capacity/disk reservation calculations to avoid
deadlocking/blocking on the allocator.

And, we don't want to queue up infinite self healing moves anyways.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:11 -04:00
Kent Overstreet
8ff92a9e4e bcachefs: Promotes should use BCH_WRITE_only_specified_devs
Promotes, like most other internal moves, should only go to the
specified target and not fall back to allocating from the full
filesystem.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:11 -04:00
Kent Overstreet
d0148e7169 bcachefs: Be stricter in bch2_read_retry_nodecode()
Now that data_update embeds bch_read_bio, BCH_READ_NODECODE means that
the read is embedded in a a data_update - and we can check in the retry
path if the extent has changed and bail out.

This likely fixes some subtle bugs with read errors and data moves.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:11 -04:00
Kent Overstreet
6f7111f820 bcachefs: cleanup redundant code around data_update_op initialization
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:11 -04:00
Kent Overstreet
536d789781 bcachefs: bch2_update_unwritten_extent() no longer depends on wbio
Prep work for improving bch2_data_update_init().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:11 -04:00
Kent Overstreet
8f97793d67 bcachefs: promote_op uses embedded bch_read_bio
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:11 -04:00
Kent Overstreet
a70bd97630 bcachefs: data_update now embeds bch_read_bio
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:11 -04:00
Kent Overstreet
dfa204b169 bcachefs: rbio_init() cleanup
Move more initialization to rbio_init(), to assist in further cleanups.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:11 -04:00
Kent Overstreet
0f856b7228 bcachefs: rbio_init_fragment()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:11 -04:00