Originally, btree splits always succeeded once we got to the point of
recursing to the btree_insert_node() call.
But that changed when we switched to not taking intent locks all the way
up to the root, and that introduced a bug, because
bch2_btree_interior_update_will_free_node() cancels pending writes and
reparents a node that's going to be made visible on disk by another
btree update to the current btree update.
This was discovered in recent backpointers work, because
bch2_btree_interior_update_will_free_node() also clears the
will_make_reachable flag, causing backpointer target lookup to
spuriously think it had found a dangling backpointer (when the
backpointer just hadn't been created yet by
btree_update_nodes_written()).
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We should always signal to rewind if the requested pass hasn't been run,
even if called multiple times.
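
A minimal sketch of the intended behaviour, under assumed names: struct
recovery_sketch, request_pass() and the bitmask layout are hypothetical
and are not the bcachefs code. The point it illustrates is that the
rewind decision depends on whether the pass has completed, not on
whether this is the first request for it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical recovery state; field names are illustrative only. */
struct recovery_sketch {
	uint64_t passes_requested;	/* bit n set: pass n was asked for */
	uint64_t passes_complete;	/* bit n set: pass n has finished */
};

/*
 * Request a pass. Returns true when the caller must rewind, i.e. whenever
 * the pass has not run yet, no matter how many times it was requested.
 */
static bool request_pass(struct recovery_sketch *r, unsigned pass)
{
	uint64_t bit = UINT64_C(1) << pass;

	r->passes_requested |= bit;
	return !(r->passes_complete & bit);
}
```
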
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This lets us use darray macros on dev_alloc_list (and it will become a
darray eventually, when we increase the maximum number of devices).
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
When allocating a journal write failed and was then retried after doing
discards, we were failing to count the replicas already allocated.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Add a tracepoint for inserting new accounting entries: we're seeing odd
spinning behaviour in accounting read.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The validate late path was iterating over accounting entries in
eytzinger order, which is unnecessarily tricky when we may have to
remove entries.
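
As an illustration of why plain index order is easier when entries may
be removed, here is a sketch that compacts a flat array in place while
walking it linearly. The names drop_invalid() and entry_valid(), and the
rule that 0 means "invalid", are assumptions made for the example and
not the accounting code itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical validity check: here, an entry of 0 counts as invalid. */
static bool entry_valid(uint64_t e)
{
	return e != 0;
}

/*
 * Walk the array in index order and compact it in place, dropping
 * invalid entries. Returns the new number of entries.
 */
static size_t drop_invalid(uint64_t *v, size_t nr)
{
	size_t dst = 0;

	for (size_t src = 0; src < nr; src++)
		if (entry_valid(v[src]))
			v[dst++] = v[src];

	return dst;
}
```

Removing entries changes the array size, so an eytzinger layout would
have to be rebuilt afterwards; that step is left out of the sketch.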
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We wish to use the logged ops btree for other items that aren't strictly
logged ops: cursors for inode allocation.
There's no reason to create another cached btree for inode allocator
cursors - so reserve different parts of the keyspace for different
purposes.
Older versions will ignore or delete the cursors.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Introduce a typedef to handle the difference between unsigned
long/struct urcu_gp_poll_state.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
When tracing is disabled, there is no point in asking the user about
enabling extra btree_path tracepoints in bcachefs.
Fixes: 32ed4a620c ("bcachefs: Btree path tracepoints")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The "journal space available" calculations didn't take into account
mismatched bucket sizes; we need to take the minimum space available out
of our devices.
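
A simplified sketch of the calculation, assuming a hypothetical
per-device summary (struct dev_journal_space and min_journal_sectors()
are illustrative names, not bcachefs structures): each device's free
buckets are converted to sectors first, and only then is the minimum
taken.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device summary; not the bcachefs structures. */
struct dev_journal_space {
	uint64_t free_buckets;		/* journal buckets still free */
	uint64_t bucket_sectors;	/* bucket size, in sectors */
};

/*
 * Space we can count on from every device: convert each device's free
 * buckets to sectors, then take the minimum across devices.
 */
static uint64_t min_journal_sectors(const struct dev_journal_space *devs,
				    size_t nr)
{
	uint64_t min = UINT64_MAX;

	for (size_t i = 0; i < nr; i++) {
		uint64_t sectors = devs[i].free_buckets * devs[i].bucket_sectors;

		if (sectors < min)
			min = sectors;
	}

	return nr ? min : 0;
}
```

With devices of (8 buckets x 1024 sectors), (4 x 4096) and (16 x 256),
the limiting device is the third one, even though it has the most free
buckets; comparing bucket counts alone would pick the wrong device.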
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Add a method to flush btree node rewrites at the end of recovery, to
ensure that corrected errors are persisted.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Ensure that "invalid bkey" repair gets persisted, so that it doesn't
repeatedly spam the logs.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Add a function for walking backpointers to find a path from a given
inode number, and convert various error messages to use it.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
When calling check_discard_freespace_key from the allocator, we can't
repair without recursing - run it asynchronously instead.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We should add support for cryptographic macs on the superblock - and it
won't be hard, but it'll need an incompatible feature bit (and we have a
new incompatible feature versioning scheme coming).
For now, just add a guard to avoid a null ptr deref in gen_poly_key().
Reported-by: syzbot+dd3d9835055dacb66f35@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Transaction commits invalidate pointers to btree values, and they also
downgrade intent locks.
This breaks the interior btree update path, which takes intent locks and
then calls into the allocator.
This isn't an ideal solution: we can't unconditionally issue a restart
after a transaction commit, because that would break other codepaths.
Reported-by: syzbot+78d82470c16a49702682@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Wraparound is impractical to handle since in various places we use 0 as
a sentinel value - but 64 bits (or 56, because the btree write buffer
steals a few bits) is enough for all practical purposes.
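
A hedged sketch of what refusing to wrap could look like, assuming the
56-bit width mentioned above. SEQ_BITS_SKETCH, SEQ_MAX_SKETCH and
seq_advance() are hypothetical names for illustration; what the caller
does on failure is not shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed width: 56 usable bits, per the text above. */
#define SEQ_BITS_SKETCH	56
#define SEQ_MAX_SKETCH	((UINT64_C(1) << SEQ_BITS_SKETCH) - 1)

/*
 * Advance a sequence number. Returns false instead of wrapping once the
 * maximum is reached, so 0 never reappears and stays usable as a sentinel.
 */
static bool seq_advance(uint64_t cur, uint64_t *next)
{
	if (cur >= SEQ_MAX_SKETCH)
		return false;

	*next = cur + 1;
	return true;
}
```

For scale: at one million sequence numbers per second, 2^56 values would
last over two thousand years.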
Reported-by: syzbot+73ed43fbe826227bd4e0@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
These repair paths are well tested; we can repair these errors without
explicit user intervention.
This also tweaks bch2_topology_error() so that we run topology repair if
we're in recovery, not just fsck.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Add a new parameter to bkey validate functions, and use it to improve
invalid bkey error messages: we can now print the btree and depth it
came from, or if it came from the journal, or is a btree root.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Previously, when mounting read-write after a clean shutdown, we wouldn't
go read-write until after all the recovery passes completed.
Now, go RW early in recovery, the same as in any other situation where
we'll need to go read-write. This fixes a bug where we discover unlinked inodes
after a clean shutdown: repair fails because we're read only.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Fix an assertion pop from the recent btree cache freelist fixes.
Fixes: baefd3f849 ("bcachefs: btree_cache.freeable list fixes")
Reported-by: Tyler <th020394@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6.11 had a bug where we'd sometimes create disk accounting keys with
version 0, which causes issues for journal replay - but we don't need to
delete existing accounting keys with version 0.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>