1352564 Commits

Author SHA1 Message Date
Kent Overstreet
0d25264ecf bcachefs: Kill bkey_buf in btree_path_down()
Allocate some (smaller) temporary storage in btree_trans for this -
btree_path_down() is in our max-stack call stack.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:12 -04:00
Kent Overstreet
99813d88e3 bcachefs: Add missing error logging in delete_dead_inodes()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:12 -04:00
Kent Overstreet
f54b2a80d0 bcachefs: Fix misaligned bucket check in journal space calculations
Fix an assertion pop in the tiering_misaligned test: rounding down to
bucket size at the end of the journal space calculations leaves
cur_entry_sectors == 0, which is incorrect with !cur_entry_err.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:12 -04:00
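
A rough illustration of the rounding problem described in f54b2a80d0, not the actual patch: when the available space is smaller than one bucket, rounding down to the bucket size yields zero sectors. The names below (entry_space, entry_space_calc) are hypothetical, and setting the error flag is only one assumed way to keep "zero sectors" and "no error" from coexisting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round v down to a multiple of bucket_size (need not be a power of two). */
static uint64_t round_down_to(uint64_t v, uint64_t bucket_size)
{
	assert(bucket_size != 0);
	return v - v % bucket_size;
}

/* Hypothetical stand-in for the cur_entry_sectors / cur_entry_err pair. */
struct entry_space {
	uint64_t sectors;
	bool     err;
};

static struct entry_space entry_space_calc(uint64_t avail_sectors,
					   uint64_t bucket_size)
{
	struct entry_space r;

	r.sectors = round_down_to(avail_sectors, bucket_size);
	/*
	 * Assumption: if rounding left nothing usable, report an error
	 * rather than returning sectors == 0 with err == false.
	 */
	r.err = r.sectors == 0;
	return r;
}
```

With 100 available sectors and a 128-sector bucket, the rounded result is 0 and the sketch flags it; with 300 sectors it returns 256 and no error.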
Kent Overstreet
813825d241 bcachefs: Fix incorrect multiple dev check in journal write path
It's uncommon to have multiple devices with journalling only on a subset,
but can be specified with the 'data_allowed' option. We need to know if
we're doing data/metadata writes to multiple devices, as that requires
issuing flushes before the journal writes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:12 -04:00
Kent Overstreet
327971cef5 bcachefs: Catch data_update_done events in trace_io_move_start_fail
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:12 -04:00
Kent Overstreet
c7897b5055 bcachefs: io_move_evacuate_bucket tracepoint, counter
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:12 -04:00
Kent Overstreet
060ff4b794 bcachefs: trace_io_move_pred
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:12 -04:00
Kent Overstreet
d6efd42a84 bcachefs: Fix infinite loop in journal_entry_btree_keys_to_text()
Fix an infinite loop when bkey_i->k.u64s is 0.

This only happens in userspace, where 'bcachefs list_journal' can print
the entire contents of the journal, and non-dirty entries aren't
validated.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:12 -04:00
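
To see why a zero u64s field loops forever, consider a simplified walk over variable-length keys where each key's size in 64-bit words is what advances the cursor. This is a minimal sketch with a made-up layout (size in the low 8 bits of the key's first word), not the bkey format or the real journal_entry_btree_keys_to_text() code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count the keys packed into buf[0..nr_u64s).
 * Hypothetical layout: the low 8 bits of a key's first word hold its
 * total size in u64s, header included.
 */
static size_t count_keys(const uint64_t *buf, size_t nr_u64s)
{
	size_t pos = 0, nr = 0;

	assert(buf != NULL);

	while (pos < nr_u64s) {
		size_t u64s = buf[pos] & 0xff;

		/*
		 * A size of 0 would leave pos unchanged and spin forever;
		 * a size running past the buffer is equally invalid.
		 */
		if (u64s == 0 || u64s > nr_u64s - pos)
			break;

		nr++;
		pos += u64s;
	}

	return nr;
}
```

Without the u64s == 0 check, the loop body would add zero to pos on every iteration and never terminate.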
Kent Overstreet
cd04497b10 bcachefs: Journal read error message improvements
- Don't print a checksum error when we first read a journal entry: we
  print a checksum error later if we'll be using the journal entry.

- Continuing with the theme of improving error messages and grouping
  errors into a single log message per error, print a single 'checksum
  error' message per journal entry, and use bch2_journal_ptr_to_text()
  to print out where on the device it was.

- Factor out checksum error messages and checking for missing journal
  entries into helpers, bch2_journal_read() has gotten obnoxiously big.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:12 -04:00
Kent Overstreet
72ab5136e8 bcachefs: Don't rewind to run a recovery pass we already ran
Fix a small regression from the "run recovery passes" rewrite, which
enabled async recovery passes.

This fixes getting stuck in a loop in recovery.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-27 00:03:45 -04:00
Kent Overstreet
686db67a8e bcachefs: Move unicode message to after the startup message
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-27 00:03:45 -04:00
Kent Overstreet
1cda5b88e6 bcachefs: Fix missing commit in check_dirents
Other repair code seems to be doing commits themselves, but
check_key_has_snapshot() does not.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-27 00:02:44 -04:00
Kent Overstreet
9e2c3c2ed4 bcachefs: Fix lost rebalance wakeups
Fix a missing wakeup in

'bcachefs set-file-option' -> xattr option update -> inode_write

this was missing because the wakeup needs to happen after transaction
commit. Also, add a 'kick' counter, to make sure we don't miss a wakeup
that occurred right after we finished checking the rebalance_work btree.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-27 00:02:44 -04:00
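
The 'kick' counter idea from 9e2c3c2ed4 can be sketched as follows, assuming a generic pattern rather than the actual rebalance code: the worker snapshots the counter before scanning for work, and only sleeps if the scan found nothing and the counter has not moved since. All names here are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical state shared between the worker and whoever queues work. */
struct rebalance_sketch {
	atomic_uint kick;
};

/* Called after new work has been committed. */
static void rebalance_kick(struct rebalance_sketch *r)
{
	atomic_fetch_add(&r->kick, 1);
	/* A real implementation would also wake the worker thread here. */
}

/* Worker: take this snapshot *before* scanning for work. */
static unsigned rebalance_kick_snapshot(struct rebalance_sketch *r)
{
	return atomic_load(&r->kick);
}

/*
 * Worker: after the scan, sleep only if nothing was found and nobody
 * kicked us since the snapshot was taken.
 */
static bool rebalance_may_sleep(struct rebalance_sketch *r,
				unsigned snapshot, bool found_work)
{
	return !found_work && atomic_load(&r->kick) == snapshot;
}
```

A kick that lands between the end of the scan and the decision to sleep changes the counter, so the worker rescans instead of sleeping through it.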
Kent Overstreet
dc37dcca8c bcachefs: bch2_kthread_io_clock_wait_once()
Add a version of bch2_kthread_io_clock_wait() that only schedules once -
behaving more like schedule_timeout().

This will be used for fixing rebalance wakeups.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-27 00:02:44 -04:00
Kent Overstreet
ff875d4b47 bcachefs: Ensure we print output of run_recovery_pass if it errors
Also, don't error out in bucket_ref_update_err(): we don't want to
return -BCH_ERR_cannot_rewind_recovery if it's not an insert, if it's an
overwrite we continue.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-27 00:02:44 -04:00
Kent Overstreet
97e69f12ed bcachefs: Fix missing BTREE_UPDATE_internal_snapshot_node
Repair code will do updates on older snapshot versions, so needs the
correct annotation.

Reported-by: syzbot+42581416dba62b364750@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-25 03:22:18 -04:00
Kent Overstreet
7098ba57c4 bcachefs: fix REFLINK_P_MAY_UPDATE_OPTIONS
If we're doing a reflink copy of existing reflinked data, we may only
set REFLINK_P_MAY_UPDATE_OPTIONS if it was set on the reflink pointer
we're copying from.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-25 03:22:18 -04:00
Kent Overstreet
9caea9208f bcachefs: Don't mount bs > ps without TRANSPARENT_HUGEPAGE
Large folios aren't supported without TRANSPARENT_HUGEPAGE

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23 22:00:07 -04:00
Kent Overstreet
3f2f028814 bcachefs: Fix btree_iter_next_node() for new locking asserts
We can't unlock a should_be_locked path unless we're in a transaction
restart.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23 22:00:07 -04:00
Kent Overstreet
521f9584c2 bcachefs: Ensure we don't use a blacklisted journal seq
Different versions differ on the size of the blacklist range; it is
theoretically possible that we could end up with blacklisted journal
sequence numbers newer than the newest seq we find in the journal, and
pick a new start seq that's blacklisted.

Explicitly check for this in bch2_fs_journal_start().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23 19:52:31 -04:00
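
A minimal sketch of the kind of check 521f9584c2 describes, assuming blacklist entries are half-open [start, end) ranges. The struct and function names are hypothetical, and whether bch2_fs_journal_start() skips ahead or reports an error is not stated in the message; skipping ahead is an assumption made here for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical blacklist entry covering seqs in [start, end). */
struct seq_range {
	uint64_t start;
	uint64_t end;
};

/* Return the range containing seq, or NULL if seq is not blacklisted. */
static const struct seq_range *seq_blacklisted(uint64_t seq,
					       const struct seq_range *bl,
					       size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		if (seq >= bl[i].start && seq < bl[i].end)
			return &bl[i];
	return NULL;
}

/* Assumed policy: advance past every blacklisted range seq falls in. */
static uint64_t first_usable_seq(uint64_t seq,
				 const struct seq_range *bl, size_t nr)
{
	const struct seq_range *r;

	while ((r = seq_blacklisted(seq, bl, nr))) {
		assert(r->end > seq);	/* guarantees forward progress */
		seq = r->end;
	}
	return seq;
}
```

Because each step moves seq strictly forward and the list is finite, the loop terminates even when ranges are adjacent or unsorted.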
Kent Overstreet
9b133c0d74 bcachefs: Small check_fix_ptr fixes
We don't want to change the bucket gen, on gen mismatch: it's possible
to have multiple btree nodes with different gens in the same bucket that
we want to keep, if we have to recover from btree node scan.

It's also not necessary to set g->gen_valid; add a comment to that
effect.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23 19:52:31 -04:00
Kent Overstreet
cade003209 bcachefs: Fix opts.recovery_pass_last
This was lost in the giant recovery pass rework - but it's used heavily
by bcachefs subcommand utilities.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23 19:52:31 -04:00
Kent Overstreet
f351d91edd bcachefs: Fix allocate -> self healing path
When we go to allocate and find that a bucket in the freespace btree is
actually allocated, we're supposed to return nonzero to tell the
allocator to skip it.

This fixes an emergency read only due to a bucket/ptr gen mismatch - we
also don't return the correct bucket gen when this happens.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23 19:52:31 -04:00
Kent Overstreet
016c4b48b8 bcachefs: Fix endianness in casefold check/repair
Fixes: 010c894681 ("bcachefs: Check for casefolded dirents in non casefolded dirs")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23 19:52:31 -04:00
Kent Overstreet
b41ac97fe0 bcachefs: Path must be locked if trans->locked && should_be_locked
If path->should_be_locked is true, that means user code (of the btree
API) has seen, in this transaction, something guarded by the node this
path has locked, and we have to keep it locked until the end of the
transaction.

Assert that we're not violating this; should_be_locked should also be
cleared only in _very_ special situations.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23 07:59:43 -04:00
Kent Overstreet
22e921a6f9 bcachefs: Simplify bch2_path_put()
Simplify the "do we need to keep this locked?" checks.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23 07:59:43 -04:00
Kent Overstreet
80a160e494 bcachefs: Plumb btree_trans for more locking asserts
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23 07:59:43 -04:00
Kent Overstreet
df92f3500b bcachefs: Clear trans->locked before unlock
We're adding new should_be_locked assertions: it's going to be illegal
to unlock a should_be_locked path when trans->locked is true.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23 07:59:43 -04:00
Kent Overstreet
eb34365ada bcachefs: Clear should_be_locked before unlock in key_cache_drop()
We're adding new should_be_locked assertions, also add a comment
explaining why clearing should_be_locked is safe here.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23 07:59:43 -04:00
Kent Overstreet
be9fecdcda bcachefs: bch2_path_get() reuses paths if upgrade_fails & !should_be_locked
Small additional optimization over the previous patch, bringing us
closer to the original behaviour, except when we need to clone to avoid
a transaction restart.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23 07:59:43 -04:00
Kent Overstreet
aac49471b6 bcachefs: Give out new path if upgrade fails
Avoid transaction restarts due to failure to upgrade - we can traverse a
new iterator without a transaction restart.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23 07:59:43 -04:00
Kent Overstreet
66782b2acb bcachefs: Fix btree_path_get_locks when not doing trans restart
btree_path_get_locks, on failure, shouldn't unlock if we're not issuing
a transaction restart: we might drop locks we're not supposed to (if
path->should_be_locked is set).

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23 07:59:43 -04:00
Kent Overstreet
5b7b342c40 bcachefs: btree_node_locked_type_nowrite()
Small helper to improve locking assertions.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23 07:59:43 -04:00
Kent Overstreet
659489f37b bcachefs: Kill bch2_path_put_nokeep()
bch2_path_put_nokeep() was intended for paths we wouldn't need to
preserve for a transaction restart - it always frees them right away
when the ref hits 0.

But since paths are shared, freeing unconditionally is a bug, the path
might have been used elsewhere and have should_be_locked set, i.e. we
need to keep it locked until the end of the transaction.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23 07:59:43 -04:00
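
The rule stated in 659489f37b can be illustrated with a small model, assuming a path is just a refcount plus a should_be_locked flag. This is not bch2_path_put(); the struct and its fields are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified model of a shared btree path. */
struct path_sketch {
	unsigned ref;
	bool     should_be_locked;
	bool     allocated;
};

/* Drop one reference; free only when that is safe. */
static void path_put(struct path_sketch *p)
{
	assert(p->ref > 0);

	if (--p->ref)
		return;		/* still referenced elsewhere */

	/*
	 * Another user of this shared path may have set should_be_locked,
	 * in which case it has to stay around (and locked) until the end
	 * of the transaction, even though the refcount is now 0.
	 */
	if (p->should_be_locked)
		return;

	p->allocated = false;	/* stands in for actually freeing it */
}
```

An unconditional free at ref == 0, as the removed helper did, would skip the should_be_locked test and release a path that the transaction still depends on.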
Kent Overstreet
2a6c0136ae bcachefs: bch2_journal_write_checksum()
We need to delay checksumming the journal write; we don't know the
blocksize until after we allocate the write.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22 15:13:17 -04:00
Kent Overstreet
d385ca5603 bcachefs: Reduce stack usage in data_update_index_update()
Separate tracepoint message generation and other slowpath code into
non-inline functions, and use bch2_trans_log_str() instead of using a
printbuf for our journal message.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22 15:13:17 -04:00
Kent Overstreet
7d886a82bf bcachefs: bch2_trans_log_str()
The data update path doesn't need a printbuf for its log message - this
will help reduce stack usage.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22 15:13:17 -04:00
Kent Overstreet
4a9eb20efa bcachefs: Kill bkey_buf usage in data_update_index_update()
Reduce stack usage - bkey_buf has a 96 byte buffer on the stack, but the
btree_trans bump allocator works just fine here.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22 15:13:17 -04:00
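
For readers unfamiliar with the term, a bump allocator hands out memory by advancing an offset into a preallocated region and frees everything at once by resetting that offset. The sketch below is a generic, fixed-size version with hypothetical names; it does not reproduce the btree_trans allocator, which is not shown in this log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size bump allocator over caller-provided memory. */
struct bump {
	unsigned char *mem;
	size_t         size;
	size_t         pos;
};

/* Allocate bytes rounded up to 8; returns NULL when out of space. */
static void *bump_alloc(struct bump *b, size_t bytes)
{
	size_t aligned = (bytes + 7) & ~(size_t)7;
	void *p;

	assert(b->pos <= b->size);

	if (aligned < bytes || aligned > b->size - b->pos)
		return NULL;	/* overflow, or not enough room left */

	p = b->mem + b->pos;
	b->pos += aligned;
	return p;
}

/* Release every allocation at once. */
static void bump_reset(struct bump *b)
{
	b->pos = 0;
}
```

Temporary storage obtained this way lives in the allocator's backing memory rather than in the caller's stack frame, which is the stack saving the commit message refers to.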
Kent Overstreet
bfc0c6fecf bcachefs: Drop empty accounting updates
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:19:24 -04:00
Kent Overstreet
136d082abc bcachefs: Improve trace_trans_restart_upgrade
- Convert to a 'fs_str' tracepoint that just emits as a string: this
  lets us build up the tracepoint with a printbuf, using our pretty
  printers, and they're much easier to manage

- Include locks_held, before and after

- Include the btree node pointer we failed on (error pointer, null, or
  real node)

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:15:11 -04:00
Kent Overstreet
f638b84224 bcachefs: fix bch2_inum_snapshot_to_path()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:15:11 -04:00
Kent Overstreet
2faa8ab0d0 bcachefs: fix duplicate printk
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:15:10 -04:00
Kent Overstreet
4ba99dde33 bcachefs: BCH_INODE_has_case_insensitive
Add a flag for tracking whether a directory has case-insensitive
descendants - so that overlayfs can disallow mounting, even though the
filesystem supports case insensitivity.

This is a new on disk format version, with a (cheap) upgrade to ensure
the flag is correctly set on existing inodes.

Create, rename and fssetxattr are all plumbed to ensure the new flag is
set, and we've got new fsck code that hooks into check_inode().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:15:10 -04:00
Kent Overstreet
77eac89c79 bcachefs: bch2_inode_find_by_inum_snapshot()
Move a fsck.c helper into inode.c, eliminate some duplication and organize
the inode lookup helpers.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:15:09 -04:00
Kent Overstreet
77aeaa2f0f bcachefs: bch2_inum_snapshot_to_path()
Add a better helper for printing out paths of inodes when we don't know
the subvolume, for fsck.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:15:09 -04:00
Kent Overstreet
7c4f22af25 bcachefs: bch2_rename_trans() only runs rename-to-dir code if needed
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:15:08 -04:00
Kent Overstreet
011d644b76 bcachefs: subvol_inum_eq()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:15:08 -04:00
Kent Overstreet
c3a7fd95e0 bcachefs: Don't set bi_casefold on non directories
bi_casefold only makes sense for directories, and since it's one of the
variable length fields setting it unnecessarily wastes space.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:15:08 -04:00
Alan Huang
a96c5e5045 bcachefs: Remove duplicate call to bch2_trans_begin()
There is one in for_each_btree_key_max().

Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:15:08 -04:00
Kent Overstreet
c631bb41f5 bcachefs: Call bch2_bkey_set_needs_rebalance() earlier in write path
There's no reason to be running this inside our transaction; it forces
us to copy the key we're updating to a temporary, which we'd like to
skip.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:15:07 -04:00