We now more often do repair automatically, without the user invoking
fsck - and sometimes that can involve fixing lots of errors, so let's
avoid flooding the dmesg log.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The inconsistency error path calls print_string_as_lines, which calls
console_lock, which is a potentially-sleeping function and so can't be
called in an atomic context.
Replace calls to it with the nonblocking variant which is safe to call.
Signed-off-by: Bharadwaj Raju <bharadwaj.raju777@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Two fixes from the recent logging changes:
bch2_inconsistent(), bch2_fs_inconsistent() be called from interrupt
context, or with rcu_read_lock() held.
The one syzbot found is in
bch2_bkey_pick_read_device
bch2_dev_rcu
bch2_fs_inconsistent
We're starting to switch to lift the printbufs up to higher levels so we
can emit better log messages and print them all in one go (avoid
garbling), so that conversion will help with spotting these in the
future; when we declare a printbuf it must be flagged if we're in an
atomic context.
Secondly, in btree_node_write_endio:
00085 BUG: sleeping function called from invalid context at include/linux/sched/mm.h:321
00085 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 618, name: bch-reclaim/fa6
00085 preempt_count: 10001, expected: 0
00085 RCU nest depth: 0, expected: 0
00085 4 locks held by bch-reclaim/fa6/618:
00085 #0: ffffff80d7ccad68 (&j->reclaim_lock){+.+.}-{4:4}, at: bch2_journal_reclaim_thread+0x84/0x198
00085 #1: ffffff80d7c84218 (&c->btree_trans_barrier){.+.+}-{0:0}, at: __bch2_trans_get+0x1c0/0x440
00085 #2: ffffff80cd3f8140 (bcachefs_btree){+.+.}-{0:0}, at: __bch2_trans_get+0x22c/0x440
00085 #3: ffffff80c3823c20 (&vblk->vqs[i].lock){-.-.}-{3:3}, at: virtblk_done+0x58/0x130
00085 irq event stamp: 328
00085 hardirqs last enabled at (327): [<ffffffc080073a14>] finish_task_switch.isra.0+0xbc/0x2a0
00085 hardirqs last disabled at (328): [<ffffffc080971a10>] el1_interrupt+0x20/0x60
00085 softirqs last enabled at (0): [<ffffffc08002f920>] copy_process+0x7c8/0x2118
00085 softirqs last disabled at (0): [<0000000000000000>] 0x0
00085 Preemption disabled at:
00085 [<ffffffc08003ada0>] irq_enter_rcu+0x18/0x90
00085 CPU: 8 UID: 0 PID: 618 Comm: bch-reclaim/fa6 Not tainted 6.14.0-rc6-ktest-g04630bde23e8 #18798
00085 Hardware name: linux,dummy-virt (DT)
00085 Call trace:
00085 show_stack+0x1c/0x30 (C)
00085 dump_stack_lvl+0x84/0xc0
00085 dump_stack+0x14/0x20
00085 __might_resched+0x180/0x288
00085 __might_sleep+0x4c/0x88
00085 __kmalloc_node_track_caller_noprof+0x34c/0x3e0
00085 krealloc_noprof+0x1a0/0x2d8
00085 bch2_printbuf_make_room+0x9c/0x120
00085 bch2_prt_printf+0x60/0x1b8
00085 btree_node_write_endio+0x1b0/0x2d8
00085 bio_endio+0x138/0x1f0
00085 btree_node_write_endio+0xe8/0x2d8
00085 bio_endio+0x138/0x1f0
00085 blk_update_request+0x220/0x4c0
00085 blk_mq_end_request+0x28/0x148
00085 virtblk_request_done+0x64/0xe8
00085 blk_mq_complete_request+0x34/0x40
00085 virtblk_done+0x78/0x130
00085 vring_interrupt+0x6c/0xb0
00085 __handle_irq_event_percpu+0x8c/0x2e0
00085 handle_irq_event+0x50/0xb0
00085 handle_fasteoi_irq+0xc4/0x250
00085 handle_irq_desc+0x44/0x60
00085 generic_handle_domain_irq+0x20/0x30
00085 gic_handle_irq+0x54/0xc8
00085 call_on_irq_stack+0x24/0x40
Reported-by: syzbot+c82cd2906e2f192410bb@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Build up and emit the error message for an inconsistency error all at
once, instead of spread over multiple printk calls, so they're not
jumbled in the dmesg log.
Also, add better indenting.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Factor out a helper from __bch2_fsck_err(), for counting the error in
the superblock and deciding whether to print or ratelimit - will be used
to replace some log_fsck_err() calls, where we want to lift out printing
the error message.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
An inconsistency error often happens as part of an event with multiple
error messages, and we want to build up one single error message with
proper indenting to produce more readable log messages that don't get
garbled.
Add new helpers that emit messages to a printbuf instead of printing
them directly, next patch will convert to use them.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Add the new helper printbuf_indent_add_nextline(), and use it in
__bch2_fsck_err() to centralize setting the indentation of multiline
fsck errors.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We're improving our handling of write errors - we shouldn't write
degraded data just because a write failed once, we should retry it (on
other devices, if possible).
But for this to work, we need to kick devices out when they're only
returning errors - otherwise those retries will loop infinitely.
This adds a configurable timeout - if writes are failing for too long,
we'll set that device read-only.
In the future we should also implement more tracking and another knob
for an "allowed error rate", so that we can kick out drives that are
acting "unhealthy".
Another thing we'll want is a mechanism (likely in userspace) for
bringing a device back in after a transient error - perhaps a cable was
jiggled, or there was a controller reset.
After transient errors we also need a mechanism to walk (from the
journal) recent btree updates that weren't flushed to that device and
treat them as "degraded", since unflushed data may well not have been
written. Out of scope for this patch, but becoming relevant.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
A user has been seeing the "error verifying existing checksum while
rewriting existing data (memory corruption?)" error.
This generally indicates a hardware issue (and that may be the case
here), but it might also indicate a bug, in which case we need more
information to look for patterns.
Reported-by: Roland Vet <vet.roland@protonmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
we're starting to use error messages with paths in fsck_errors(), where
we do not want nested transaction restart handling, so let's prepare for
that.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We want all error messages converted to print paths, not just inode
numbers - users want this information, and it speeds up debugging too.
Auditing and converting all error messages is going to be a big project,
so for the moment we're just doing this incrementally.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This lets us print the exact location in the journal if it was found in
the journal, or correctly print if it was found in the superblock.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Add a function for walking backpointers to find a path from a given
inode number, and convert various error messages to use it.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Add a new parameter to bkey validate functions, and use it to improve
invalid bkey error messages: we can now print the btree and depth it
came from, or if it came from the journal, or is a btree root.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Factor out a common helper, need_discard_or_freespace_err(), which is
now used by both fsck and the runtime checks, and can repair.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
If we find an error that indicates that we need to run fsck, we can
specify that directly with run_explicit_recovery_pass().
These are now log_fsck_err() calls: we're just logging in the superblock
that an error occurred - and possibly doing an emergency shutdown,
depending on policy.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Newly generated keys, in the transaction commit path or write path,
should not be AUTOFIX; those indicate bugs that we need to fail fast
for.
Fixes: 5612daafb7 ("bcachefs: Fix fsck warnings from bkey validation")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
__bch2_fsck_err() warns if the current task has a btree_trans object and
it wasn't passed in, because if it has to prompt for user input it has
to be able to unlock it.
But plumbing the btree_trans through bkey_validate(), as well as
transaction restarts, is problematic - so instead make bkey fsck errors
FSCK_AUTOFIX, which doesn't need to warn.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
bkey_fsck_err() was added as an interface that looks like fsck_err(),
but previously all it did was ensure that the appropriate error counter
was incremented in the superblock.
This is a cleanup and bugfix patch that converts it to a wrapper around
fsck_err(). This is needed to fix an issue with the upgrade path to
disk_accounting_v3, where the "silent fix" error list now includes
bkey_fsck errors; fsck_err() handles this in a unified way, and since we
need to change printing of bkey fsck errors from the caller to the inner
bkey_fsck_err() calls, this ends up being a pretty big change.
Als,, rename .invalid() methods to .validate(), for clarity, while we're
changing the function signature anyways (to drop the printbuf argument).
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We now read the line from the buffer atomically, which means we have to
allow the buffer to grow past STDIO_REDIRECT_BUFSIZE if we're waiting
for a full line - this behaviour is necessary for
stdio_redirect_readline_timeout() in the next patch.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
fsck_err() now optionally takes a btree_trans; if the current thread has
one, it is required that it be passed.
The next patch will use this to unlock when waiting for user input.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
i.e. the start of automatic self healing:
If errors=continue or fix_safe, we now automatically fix simple errors
without user intervention.
New error action option: fix_safe
This replaces the existing errors=ro option, which gets a new slot, i.e.
existing errors=ro users now get errors=fix_safe.
This is currently only enabled for a limited set of errors - initially
just disk accounting; errors we would never not want to fix, and we
don't want to require user intervention (i.e. to make sure a bug report
gets filed).
Errors will still be counted in the superblock, so we (developers) will
still know they've been occuring if a bug report gets filed (as bug
reports typically include the errors superblock section).
Eventually we'll be enabling this for a much wider set of errors, after
we've done thorough error injection testing.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We've grown a fair amount of code for managing recovery passes; tracking
which ones we're running, which ones need to be run, and flagging in the
superblock which ones need to be run on the next recovery.
So it's worth splitting out into its own file, this code is pretty
different from the code in recovery.c.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This converts -EIOs related to btree node errors to private error codes,
which will help with some ongoing debugging by giving us better error
messages.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
BCH_FS_fsck_done -> BCH_FS_fsck_running; set when we might be fixing
fsck errors. Also; set fix_errors to ask by default when fsck is
running.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Now we can print out filesystem flags in sysfs, useful for debugging
various "what's my filesystem doing" issues.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Add two new superblock fields. Since the main section of the superblock
is now fully, we have to add a new variable length section for them -
bch_sb_field_ext.
- recovery_passes_requried: recovery passes that must be run on the
next mount
- errors_silent: errors that will be silently fixed
These are to improve upgrading and dwongrading: these fields won't be
cleared until after recovery successfully completes, so there won't be
any issues with crashing partway through an upgrade or a downgrade.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This patch adds a superblock error counter for every distinct fsck
error; this means that when analyzing filesystems out in the wild we'll
be able to see what sorts of inconsistencies are being found and repair,
and hence what bugs to look for.
Errors validating bkeys are not yet considered distinct fsck errors, but
this patch adds a new helper, bkey_fsck_err(), in order to add distinct
error types for them as well.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Add a new superblock section to keep counts of errors seen since
filesystem creation: we'll be addingcounters for every distinct fsck
error.
The new superblock section has entries of the for [ id, count,
time_of_last_error ]; this is intended to let us see what errors are
occuring - and getting fixed - via show-super output.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We now track IO errors per device since filesystem creation.
IO error counts can be viewed in sysfs, or with the 'bcachefs
show-super' command.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
More reorganization, this splits up io.c into
- io_read.c
- io_misc.c - fallocate, fpunch, truncate
- io_write.c
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This adds bch2_run_explicit_recovery_pass(), for rewinding recovery and
explicitly running a specific recovery pass - this is a more general
replacement for how we were running topology repair before.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Before, it was parsed as a bool but internally it was really an enum:
this lets us pass in all the possible values.
But we special case the option parsing: no supplied value is parsed as
FSCK_FIX_yes, to match the previous behaviour.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
- getline() output includes a newline, without stripping that we were
just looping
- Make the prompt clearer
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We had some bugs with setting/using first_this_inode in the inode
walker in the dirents/xattr code.
This patch changes to not clear first_this_inode until after
initializing the new hash info.
Also, we fix an error message to not print on transaction restart, and
add a comment to related fsck error code.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This changes the ask_yn() function used by fsck to accept Y or N,
meaning yes or no for all errors of a given type.
With this, the user can be prompted only for distinct error types -
useful when a filesystem has lots of errors.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>