Commit Graph

1337121 Commits

Author SHA1 Message Date
Kent Overstreet
199a3578ed bcachefs: Kill journal_res.idx
More dead code.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:12 -04:00
Kent Overstreet
c2be81d48a bcachefs: Kill journal_res_state.unwritten_idx
Dead code

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:12 -04:00
Kent Overstreet
3eccc02035 bcachefs: add progress indicator to check_allocations
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:12 -04:00
Kent Overstreet
491eda6394 bcachefs: Add a progress indicator to bch2_dev_data_drop()
This code needs quite a bit of work: we don't want to be walking all
metadata in the filesystem, we should just be walking backpointers, and
it should be switched to a data ioctl that can report progress via a
file descriptor, not the system console.

But that'll take more work - before we can safely walk only backpointers
we need to change device add to not reuse device indexes, since with
that change accounting being wrong introduces the possibility of
removing a device that still has pointers.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:12 -04:00
Kent Overstreet
baabeb4997 bcachefs: Factor out progress.[ch]
the backpointers code has progress indicators; these aren't great, since
they print to the dmesg console and we much prefer to have progress
indicators reporting to a specific userspace program so they're not
spamming the system console.

But not all codepaths that need progress indicators support that yet,
and we don't want users to think "this is hung".

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:12 -04:00
Kent Overstreet
06284963e3 bcachefs: bch2_inum_offset_err_msg_trans() no longer handles transaction restarts
we're starting to use error messages with paths in fsck_errors(), where
we do not want nested transaction restart handling, so let's prepare for
that.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:12 -04:00
Kent Overstreet
45f0e6c838 bcachefs: bch2_indirect_extent_missing_error() prints path, not just inode number
We want all error messages converted to print paths, not just inode
numbers - users want this information, and it speeds up debugging too.

Auditing and converting all error messages is going to be a big project,
so for the moment we're just doing this incrementally.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:12 -04:00
Kent Overstreet
e63cf203d7 bcachefs: Convert migrate to move_data_phys()
Iterating over backpointers on a specific device is potentially much
cheaper than walking all filesystem data.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:12 -04:00
Kent Overstreet
157ea58341 bcachefs: Read/move path counter work
Reorganize counters a bit, grouping related counters together.

New counters:
- io_read_inline
- io_read_hole

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:12 -04:00
Alan Huang
7d8321a286 bcachefs: Fix subtraction underflow
When ancestor is less than IS_ANCESTOR_BITMAP, we would get an incorrect
result.

Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:12 -04:00
Kent Overstreet
f269ae55d2 bcachefs: Scrub
Add a new data op to walk all data and metadata in a filesystem,
checking if it can be read successfully, and on error repairing from
another copy if possible.

- New helper: bch2_dev_idx_is_online(), so that we can bail out and
  report to userspace when we're unable to scrub because the device is
  offline

- data_update_opts, which controls the data move path, now understands
  scrub: data is only read, not written. The read path is responsible
  for rewriting on read error, as with other reads.

- scrub_pred skips data extents that don't have checksums

- bch_ioctl_data has a new scrub member, which has a data_types field
  for data types to check - i.e. all data types, or only metadata.

- Add new entries to bch_move_stats so that we can report numbers for
  corrected and uncorrected errors

- Add a new enum to bch_ioctl_data_event for explicitly reporting
  completion and return code (i.e. device offline)

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:12 -04:00
Kent Overstreet
3e2ad29865 bcachefs: bch2_btree_node_scrub()
Add a function for scrubbing btree nodes - reading them in, and kicking
off a rewrite if there's an error.

The btree_node_read_done() checks have to be duplicated because we're
not using a pointer to a struct btree - the btree node might already be
in cache, and we need to check a specific replica, which might not be
the one we previously read from.

This will be used in the next patch implementing high-level scrub.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:12 -04:00
Kent Overstreet
ca24130ee4 bcachefs: bch2_bkey_pick_read_device() can now specify a device
To be used for scrub, where we want the read to come from a specific
device.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:12 -04:00
Kent Overstreet
2a2f7aaa8d bcachefs: __bch2_move_data_phys() now uses bch2_btree_node_rewrite_pos()
Kill most of the separate logic for btree nodes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:12 -04:00
Kent Overstreet
987fdbdb40 bcachefs: bch2_move_data_phys()
Add a more general version of bch2_evacuate_bucket - to be used for
scrub.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:12 -04:00
Kent Overstreet
12188c9e2b bcachefs: bch2_btree_node_rewrite_pos()
Add a new helper for rewriting a btree node given a just the key, not a
pointer to the node itself.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:12 -04:00
Kent Overstreet
ca16fa6b86 bcachefs: backpointer_get_key() doesn't pull in btree node
We may not need to pull in a btree node when walking backpointers -
don't do so unnecessarily when using backpointer_get_key().

It'll still fall back to backpointer_get_node() in a few situations,
including btree roots (where an iterator can't point at just the key),
and races due to the interior update path not having deleted a
backpointer to an old node yet.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:12 -04:00
Kent Overstreet
dff6de9518 bcachefs: Internal reads can now correct errors
Rework the read path so that BCH_READ_NODECODE reads now also self-heal
after a read error and a successful retry - prerequisite for scrub.

- __bch2_read_endio() now handles a read that's both BCH_READ_NODECODE
  and a bounce.

  Normally, we don't want a BCH_READ_NODECODE read to ever allocate a
  split bch_read_bio: we want to maintain the relationship between the
  bch_read_bio and the data_update it's embedded in.

  But correcting read errors requires allocating a split/bounce rbio
  that's embedded in a promote_op. We do still have a 1-1 relationship,
  i.e. we only allocate a single split/bounce if it's a
  BCH_READ_NODECODE, so things hopefully don't get too crazy.

- __bch2_read_extent() now is allowed to allocate the promote_op for
  rewriting after a failed read, even if it's BCH_READ_NODECODE.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:12 -04:00
Kent Overstreet
7b1d655106 bcachefs: Don't self-heal if a data update is already rewriting
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:12 -04:00
Kent Overstreet
4dfb76e0ad bcachefs: Don't start promotes from bch2_rbio_free()
we don't want to block completion of the read - starting a promote calls
into the write path, which will block.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:12 -04:00
Kent Overstreet
7e9ed60f5f bcachefs: Bail out early on alloc_nowait data updates
If a data update doesn't want to block on allocations (promotes, self
healing on read error) - check if the allocation would fail before
kicking off the data update and calling into the write path.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:11 -04:00
Kent Overstreet
c37d42a0e2 bcachefs: Rework init order in bch2_data_update_init()
Initialize the write op first, so that in the next patch we can check if
the allocator would block (for BCH_WRITE_alloc_nowait ops) and bail out
before taking nocow locks/dev refs.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:11 -04:00
Kent Overstreet
29ad31c780 bcachefs: Self healing writes are BCH_WRITE_alloc_nowait
If a drive is failing and we're moving data off of it, we can't
necessairly depend on capacity/disk reservation calculations to avoid
deadlocking/blocking on the allocator.

And, we don't want to queue up infinite self healing moves anyways.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:11 -04:00
Kent Overstreet
8ff92a9e4e bcachefs: Promotes should use BCH_WRITE_only_specified_devs
Promotes, like most other internal moves, should only go to the
specified target and not fall back to allocating from the full
filesystem.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:11 -04:00
Kent Overstreet
d0148e7169 bcachefs: Be stricter in bch2_read_retry_nodecode()
Now that data_update embeds bch_read_bio, BCH_READ_NODECODE means that
the read is embedded in a a data_update - and we can check in the retry
path if the extent has changed and bail out.

This likely fixes some subtle bugs with read errors and data moves.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:11 -04:00
Kent Overstreet
6f7111f820 bcachefs: cleanup redundant code around data_update_op initialization
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:11 -04:00
Kent Overstreet
536d789781 bcachefs: bch2_update_unwritten_extent() no longer depends on wbio
Prep work for improving bch2_data_update_init().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:11 -04:00
Kent Overstreet
8f97793d67 bcachefs: promote_op uses embedded bch_read_bio
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:11 -04:00
Kent Overstreet
a70bd97630 bcachefs: data_update now embeds bch_read_bio
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:11 -04:00
Kent Overstreet
dfa204b169 bcachefs: rbio_init() cleanup
Move more initialization to rbio_init(), to assist in further cleanups.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:11 -04:00
Kent Overstreet
0f856b7228 bcachefs: rbio_init_fragment()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:11 -04:00
Kent Overstreet
14e2523fc5 bcachefs: Rename BCH_WRITE flags fer consistency with other x-macros enums
The uppercase/lowercase style is nice for making the namespace explicit.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:11 -04:00
Kent Overstreet
9157b3ddfb bcachefs: x-macroize BCH_READ flags
Will be adding a bch2_read_bio_to_text().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:11 -04:00
Kent Overstreet
9f37016cb2 bcachefs: kill bch_read_bio.devs_have
Dead code.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:11 -04:00
Kent Overstreet
3075e68d26 bcachefs: bch2_data_update_inflight_to_text()
Add a new helper for bch2_moving_ctxt_to_text(), which may be used to
debug if moving_ios are getting stuck.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:11 -04:00
Kent Overstreet
50ca857457 bcachefs: BCH_IOCTL_QUERY_COUNTERS
Add an ioctl for querying counters, the same ones provided in
/sys/fs/bcachefs/<uuid>/counters/, but more suitable for a 'bcachefs
top' command.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:11 -04:00
Kent Overstreet
5ee760f667 bcachefs: BCH_COUNTER_bucket_discard_fast
Add a separate counter for fastpath bucket discards, which don't require
a journal flush.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:11 -04:00
Kent Overstreet
bbd804f2ad bcachefs: enum bch_persistent_counters_stable
Persistent counters, like recovery passes, include a stable enum in
their definition - but this was never correctly plumbed.

This allows us to add new counters and properly organize them with a
non-stable "presentation order", which can also be used in userspace by
the new 'bcachefs fs top' tool.

Fortunatel, since we haven't yet added any new counters where
presentation order ID doesn't match stable ID, this won't cause any
reordering issues.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:11 -04:00
Kent Overstreet
999cc1bb68 bcachefs: Separate running/runnable in wp stats
We've got per-writepoint statistics to see how well the writepoint index
update threads are pipelining; this separates running vs. runnable so we
can see at a glance if they're blocking.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:11 -04:00
Kent Overstreet
78c9c6f6cd bcachefs: Move write_points to debugfs
this was hitting the sysfs 4k limit

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:11 -04:00
Kent Overstreet
55a132c37a bcachefs: Don't inc io_(read|write) counters for moves
This makes 'bcachefs fs top' more useful; we can now see at a glance
whether the IO to the device is being done for user reads/writes, or
copygc/rebalance.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:10 -04:00
Kent Overstreet
e5a63ad343 bcachefs: Fix missing increment of move_extent_write counter
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:10 -04:00
Kent Overstreet
c3c9957c81 bcachefs: check_bp_exists() check for backpointers for stale pointers
Early version of 'bcachefs_metadata_version_cached_backpointers' was
creating backpointers for stale cached pointers - whoops. Now we have to
repair those.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:10 -04:00
Kent Overstreet
2deae55804 bcachefs: btree_node_(rewrite|update_key) cleanup
Factor out get_iter_to_node() and use it for
btree_node_rewrite_get_iter(), to be used for fixing btree node write
error behaviour.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:10 -04:00
Kent Overstreet
be212d86b1 bcachefs: bs > ps support
bcachefs removed most PAGE_SIZE references long ago, so this is easy;
only readpage_bio_extend() has to be tweaked to respect the minimum
order.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 19:45:57 -04:00
Kent Overstreet
1a2b74d0a2 bcachefs: fix build on 32 bit in get_random_u64_below()
bare 64 bit divides not allowed, whoops

arm-linux-gnueabi-ld: drivers/char/random.o: in function `__get_random_u64_below':
drivers/char/random.c:602:(.text+0xc70): undefined reference to `__aeabi_uldivmod'

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 19:45:54 -04:00
Kent Overstreet
90fd9ad5b0 bcachefs: Change btree wb assert to runtime error
We just had a report of the assert for "btree in write buffer for
non-write buffer btree" popping during the 6.14 upgrade.

- 150TB filesystem, after a reboot the upgrade was able to continue from
  where it left off, so no major damage.

But with 6.14 about to come out we want to get this tracked down asap,
and need more data if other users hit this.

Convert the BUG_ON() to an emergency read-only, and print out btree, the
key itself, and stack trace from the original write buffer update (which
did not have this check before).

Reported-by: Stijn Tintel <stijn@linux-ipv6.be>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 10:25:25 -04:00
Kent Overstreet
9c18ea7ffe bcachefs: bch2_get_random_u64_below()
steal the (clever) algorithm from get_random_u32_below()

this fixes a bug where we were passing roundup_pow_of_two() a 64 bit
number - we're squaring device latencies now:

[  +1.681698] ------------[ cut here ]------------
[  +0.000010] UBSAN: shift-out-of-bounds in ./include/linux/log2.h:57:13
[  +0.000011] shift exponent 64 is too large for 64-bit type 'long unsigned int'
[  +0.000011] CPU: 1 UID: 0 PID: 196 Comm: kworker/u32:13 Not tainted 6.14.0-rc6-dave+ #10
[  +0.000012] Hardware name: ASUS System Product Name/PRIME B460I-PLUS, BIOS 1301 07/13/2021
[  +0.000005] Workqueue: events_unbound __bch2_read_endio [bcachefs]
[  +0.000354] Call Trace:
[  +0.000005]  <TASK>
[  +0.000007]  dump_stack_lvl+0x5d/0x80
[  +0.000018]  ubsan_epilogue+0x5/0x30
[  +0.000008]  __ubsan_handle_shift_out_of_bounds.cold+0x61/0xe6
[  +0.000011]  bch2_rand_range.cold+0x17/0x20 [bcachefs]
[  +0.000231]  bch2_bkey_pick_read_device+0x547/0x920 [bcachefs]
[  +0.000229]  __bch2_read_extent+0x1e4/0x18e0 [bcachefs]
[  +0.000241]  ? bch2_btree_iter_peek_slot+0x3df/0x800 [bcachefs]
[  +0.000180]  ? bch2_read_retry_nodecode+0x270/0x330 [bcachefs]
[  +0.000230]  bch2_read_retry_nodecode+0x270/0x330 [bcachefs]
[  +0.000230]  bch2_rbio_retry+0x1fa/0x600 [bcachefs]
[  +0.000224]  ? bch2_printbuf_make_room+0x71/0xb0 [bcachefs]
[  +0.000243]  ? bch2_read_csum_err+0x4a4/0x610 [bcachefs]
[  +0.000278]  bch2_read_csum_err+0x4a4/0x610 [bcachefs]
[  +0.000227]  ? __bch2_read_endio+0x58b/0x870 [bcachefs]
[  +0.000220]  __bch2_read_endio+0x58b/0x870 [bcachefs]
[  +0.000268]  ? try_to_wake_up+0x31c/0x7f0
[  +0.000011]  ? process_one_work+0x176/0x330
[  +0.000008]  process_one_work+0x176/0x330
[  +0.000008]  worker_thread+0x252/0x390
[  +0.000008]  ? __pfx_worker_thread+0x10/0x10
[  +0.000006]  kthread+0xec/0x230
[  +0.000011]  ? __pfx_kthread+0x10/0x10
[  +0.000009]  ret_from_fork+0x31/0x50
[  +0.000009]  ? __pfx_kthread+0x10/0x10
[  +0.000008]  ret_from_fork_asm+0x1a/0x30
[  +0.000012]  </TASK>
[  +0.000046] ---[ end trace ]---

Reported-by: Roland Vet <vet.roland@protonmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-13 12:40:22 -04:00
Kent Overstreet
69a5a13a22 bcachefs: target_congested -> get_random_u32_below()
get_random_u32_below() has a better algorithm than bch2_rand_range(),
it just didn't exist at the time.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-13 12:39:21 -04:00
Kent Overstreet
3bcde88d38 bcachefs: fix tiny leak in bch2_dev_add()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-13 00:23:19 -04:00