Commit Graph

1216993 Commits

Author SHA1 Message Date
Kent Overstreet
e28307a106 bcachefs: Silence unimportant tracepoints
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:35 -04:00
Kent Overstreet
7c0732b88d bcachefs: Fix move path when move_stats == NULL
This isn't done very often, but it is legitimate

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:35 -04:00
Kent Overstreet
4081ace307 bcachefs: Get ref on c->writes in move.c
There's no point reading an extent in order to move it if the write is
going to fail because we're shutting down. This patch changes the move
path so that moving_io now owns a ref on c->writes - as a bonus,
rebalance and copygc will now notice that we're shutting down and exit
quicker.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:35 -04:00
Kent Overstreet
0337cc7eee bcachefs: move.c refactoring
- add bch2_moving_ctxt_(init|exit)
 - split out __bch2_evacutae_bucket() which takes an existing
   moving_ctxt, this will be used for improving copygc performance by
   pipelining across multiple buckets

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:35 -04:00
Daniel Hill
c91996c50a bcachefs: data jobs, including rebalance wait for copygc.
move_ratelimit() now has a bool that specifies whether we want to
wait for copygc to finish.

When copygc is running, we're probably low on free buckets instead
of consuming the remaining buckets, we want to wait for copygc to
finish.

This should help with performance, and run away bucket fragmentation.

Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:35 -04:00
Kent Overstreet
7f5c5d20f0 bcachefs: Redo data_update interface
This patch significantly cleans up and simplifies the data_update
interface. Instead of only being able to specify a single pointer by
device to rewrite, we're now able to specify any or all of the pointers
in the original extent to be rewrited, as a bitmask.

data_cmd is no more: the various pred functions now just return true if
the extent should be moved/updated. All the data_update path does is
rewrite existing replicas, or add new ones.

This fixes a bug where with background compression on replicated
filesystems, where rebalance -> data_update would incorrectly drop the
wrong old replica, and keep trying to recompress an extent pointer and
each time failing to drop the right replica. Oops.

Now, the data update path doesn't look at the io options to decide which
pointers to keep and which to drop - it only goes off of the
data_update_options passed to it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:34 -04:00
Kent Overstreet
47ab0c5f6a bcachefs: Fix bch2_check_alloc_key()
bch2_check_alloc_key() was failing to check buckets that didn't have
alloc keys yet (because they'd never been used) - they still need to be
added to the freespace btree.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:34 -04:00
Kent Overstreet
e34da43e33 bcachefs: Improve bch2_check_alloc_info
- In check_alloc_key(), previously we were re-initializing iterators
   for the need_discard and freespace btrees for every alloc key we
   checked. But this was causing us to redo lookups into the journal
   keys every time, since those lookups are cached in struct btree_iter.
   This initializes the iterators in bch2_check_alloc_info and passes
   them into check_alloc_key().

 - Make the looping more consistent/efficient in bch2_check_alloc_info()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:34 -04:00
Kent Overstreet
22add2ec67 bcachefs: Use BTREE_INSERT_LAZY_RW in bch2_check_alloc_info()
This runs before we go rw for journal replay, but after we're allowed to
go rw. It might be time to consider killing BTREE_INSERT_LAZY_RW,
though.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:34 -04:00
Kent Overstreet
3858536744 bcachefs: Bucket invalidate path improvements
- invalidate_one_bucket() now returns 1 when we don't have any buckets
   on this device to invalidate, ensuring we don't spin
 - the tracepoint invocation is moved to after the transaction commit,
   and we now include the number of cached sectors in the tracepoint

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:34 -04:00
Kent Overstreet
962ad1a766 bcachefs: Don't BUG_ON() inode link count underflow
This switches that assertion to a bch2_trans_inconsistent() call, as it
should be.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:34 -04:00
Kent Overstreet
7a47d0993b bcachefs: Always descend to leaf nodes it btree_gc
If a btree node is unreadable, it's the topology repair that fixes that
and it's kicked off by btree_gc, so btree_gc needs to touch every node
and very that they can be read.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:34 -04:00
Daniel Hill
58aaa0836b bcachefs: fix __dev_available().
__dev_available() now calculates available buckets correctly. Previously
it would almost always return 0 when we have cached data.

Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:34 -04:00
Kent Overstreet
2817d45381 bcachefs: Fix assertion in topology repair
If we were at the end of the node, when breaking out of the loop we'd
pop the assertion on line 446 when cur wasn't NULL.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:34 -04:00
Kent Overstreet
5a3c24714c bcachefs: Make verbose option settable at runtime
-o verbose is very useful, and we're starting to use it more for runtime
debug statements - making it possible to enable at runtime is a no
brainer.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:34 -04:00
Kent Overstreet
54feff0a7a bcachefs: Improve "copygc requested to run" error message
This improves the "copygc requested to run but no buckets found" to show
the device that requires copygc to be run on - we'll definitely need to
improve this more.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:34 -04:00
Kent Overstreet
c501fef6de bcachefs: Pull out data_update.c
This is the start of reorganizing the data IO paths. The plan is to also
break apart io.c into data_read.c and data_write.c, and migrate_write
will be renamed to the data_update path.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:34 -04:00
Kent Overstreet
30f0349d62 bcachefs: Split out dev_buckets_free()
Previously, dev_buckets_available() only counted buckets that are
eligible to be allocated right now - i.e. buckets that don't have cached
data, or need discard, or need gc gens, etc.

But most users of this function want to know how many buckets are
eligible to be allocated from without moving data around - copygc,
allocator striping, which means we should be including cached data
buckets etc.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:34 -04:00
Kent Overstreet
8f7f566f57 bcachefs: btree key cache pcpu freedlist
Originally, the btree key cache code would always allocate new entries
by reusing from the recently-freed list, if that list wasn't empty. But
that behaviour was dropped, for lock contention reasons.

But it seems that entries stranded on the freed list have been
contributing to some of our oom issues, because long running btree
transactions will prevent them from being freed.

This patch re-adds allocating from the freed list, but it also adds
percpu buffers to solve the lock contention issues - and the new percpu
freed lists will improve the evict paths, too.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:34 -04:00
Kent Overstreet
7bb61e8c0e bcachefs: Make IO in flight by copygc/rebalance configurable
This adds a new option, move_bytes_in_flight, for configuring the amount
of IO in flight by copygc/rebalance - users with many devices in their
filesystem will want to increase this.

In the future we should be smarter about this, but this is an easy
improvement.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:34 -04:00
Kent Overstreet
b5f73fd79f bcachefs: Check for extents with too many ptrs
We have a hardcoded maximum on number of pointers in an extent that's
used by some other data structures - notably bch_devs_list - but we
weren't actually checking for it. Oops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:34 -04:00
Kent Overstreet
1c6ff39445 bcachefs: Fix refcount leak in bch2_do_invalidates()
If we fail to queue the work item because it's already in process, we
need to drop the ref we just took.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:34 -04:00
Kent Overstreet
a3d7afa5c1 bcachefs: Always use percpu_ref_tryget_live() on c->writes
If we're trying to get a ref and the refcount has been killed, it means
we're doing an emergency shutdown - we always want tryget_live().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:34 -04:00
Kent Overstreet
23189da9eb bcachefs: Improve checksum error messages
We're seeing checksum errors in the bch2_rechecksum_bio() path - give it
a better error message to help track this down.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:34 -04:00
Kent Overstreet
50b13beef0 bcachefs: Improve an error message
When inserting a key type that's not valid for a given btree, we should
print out which btree we were inserting into.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:34 -04:00
Kent Overstreet
2ed6248ab3 bcachefs: Fix assertion in bch2_dev_list_add_dev()
We were only allowing 4 devices in a dev_list, not 16.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:34 -04:00
Kent Overstreet
b7c1104612 bcachefs: Increase max size for btree_trans bump allocator
With backpointers, alloc keys have gotten bigger, so we're needing more
memory here.

We're probably going to need to go with something more sophisticated
than a bump allocator, but - let's see if we can avoid doing that just
yet.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:33 -04:00
Kent Overstreet
6f44a9940c bcachefs: Add a persistent counter for bucket discards
Like the previous patch for bucket invalidates, add another counter for
a core allocator path.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:33 -04:00
Kent Overstreet
c9bd67321e bcachefs: Fix btree node read retries
b->written wasn't being reset to 0 in the btree node read retry path,
causing decrypting & validation of previously read bsets to not be
re-run - ouch.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:33 -04:00
Kent Overstreet
440c15cc91 bcachefs: Add a persistent counter for bucket invalidation
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:33 -04:00
Kent Overstreet
0e96f5dcd7 bcachefs: Call bch2_do_invalidates() when going read write
Like bch2_do_discards(), we should check if this needs to be done when
going rw.

Also, add some sysfs code for debugging bucket invalidation.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:33 -04:00
Kent Overstreet
a5d18f9ec0 bcachefs: Improved human readable integer parsing
Printbufs recently switched to using string_get_size() for printing
integers in human readable units. This updates __bch2_strtoh() to parse
numbers printed by string_get_size() - we now have to handle floating
point numbers, and new unit suffixes.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:33 -04:00
Kent Overstreet
df8c2ccb93 bcachefs: Fix freespace initialization
bch2_dev_freespace_init() was using __bch2_trans_do() incorrectly, and
calling bch2_bucket_do_index() with a stale alloc key.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:33 -04:00
Kent Overstreet
401ec4db63 bcachefs: Printbuf rework
This converts bcachefs to the modern printbuf interface/implementation,
synced with the version to be submitted upstream.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:33 -04:00
Kent Overstreet
652018d661 bcachefs: Fix btree node read error path
We were forgetting to clear the read_in_flight flag - oops. This also
fixes it to not call bch2_fatal_error() before topology repair has had a
chance to do its thing.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:33 -04:00
Kent Overstreet
576179021c bcachefs: Fix btree_and_journal_iter
We had a bug where btree_and_journal_iter would return the same key
twice - after deleting it (perhaps because it was present in both the
btree and the journal?)

This reworks btree_and_journal_iter to track the current position, much
like btree_paths, which makes the logic considerably simpler and more
robust.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:33 -04:00
Kent Overstreet
f2aa026575 bcachefs: Fix for cmd_list_journal
cmd_list_journal wasn't correctly listing the most recent journal
entries as blacklisted - because in the recovery path when just reading
the journal, we were failing to add those to the blacklist table.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:33 -04:00
Kent Overstreet
cb685ce72c bcachefs: Also log overwrites in journal
Lately we've been doing a lot of debugging by looking at the journal to
see what was changed, and by what code path. This patch adds a new
journal entry type for recording overwrites, so that we don't have to
search backwards through the journal to see what was being overwritten
in order to work out what the triggers were supposed to be doing.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:33 -04:00
Kent Overstreet
43ddf44834 bcachefs: Refactor journal entry adding
This takes copying the payload out of bch2_journal_add_entry(), which
means we can use it for journal_transaction_name() - also prep work for
journalling overwrites.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:33 -04:00
Kent Overstreet
4a7a7ea1f5 bcachefs: Add some missing error messages
bch2_opt_parse() was failing to generate error messages in error path.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:33 -04:00
Kent Overstreet
636d4eef1e bcachefs: Fix memory corruption in encryption path
When do_encrypt() was passed a vmalloc address and the buffer spanned
more than a single page, we were encrypting/decrypting completely
different pages than the ones intended.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:33 -04:00
Kent Overstreet
0fbf71f80d bcachefs: bch2_trans_reset_updates()
Factor out a new helper.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:33 -04:00
Kent Overstreet
9b688da350 bcachefs: Fix error checking in bch2_fs_alloc()
One of the init calls had a ; instead of a ?:, and errors after that got
dropped - oops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:33 -04:00
Kent Overstreet
c737267821 bcachefs: Print message on btree node read retry success
Right now, we print an error message on btree node read error, and we
print that we're retrying, but we don't explicitly say if the retry
succeeded - this makes things a little clearer.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:33 -04:00
Kent Overstreet
30525f6863 bcachefs: Fix journal_keys_search() overhead
Previously, on every btree_iter_peek() operation we were searching the
journal keys, doing a full binary search - which was slow.

This patch fixes that by saving our position in the journal keys, so
that we only do a full binary search when moving our position backwards
or a large jump forwards.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:33 -04:00
Kent Overstreet
11f5e595bf bcachefs: Always print when doing journal replay in fsck
This logging improvement helps see when the previous fsck pass has
completed.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:33 -04:00
Daniel Hill
a8dea22703 bcachefs: Rename group to label for remaining strings.
Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:33 -04:00
Kent Overstreet
c346def9af bcachefs: Fix encryption path on arm
flush_dcache_page() is not a noop on arm, but we were using
virt_to_page() instead of vmalloc_to_page() for an address on the kernel
stack - vmalloc memory, leading to an oops in flush_dcache_page().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:33 -04:00
Kent Overstreet
232697ab9d bcachefs: Switch to key_type_user, not logon
The only difference key_type_logon and key_type_user is that
key_type_logon keys can't be read by userspace.

However, userspace has actually been adding keys to both the logon and
user keychains, because userspace fsck requires the keychain interface -
so we might as well just use user and drop the logon keychain.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:32 -04:00
Kent Overstreet
d8a161ad54 bcachefs: LRU repair tweaks
- Drop old unneeded parameter for whether we're in initial GC - which
   was from when btree updates had to be done differently before we
   went RW.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:32 -04:00