Commit Graph

1337207 Commits

Author SHA1 Message Date
Kent Overstreet
e75993b0bf bcachefs: Fix BCH_ERR_data_read_csum_err_maybe_userspace in retry path
When we do a read to a buffer that's mapped into userspace, it's
possible to get a spurious checksum error if userspace was modifying the
buffer at the same time.

When we retry those, they have to be bounced before we know definitively
whether we're reading corrupt data.

But the retry path propagates read flags differently, so needs special
handling.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-16 13:47:55 -04:00
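
The situation e75993b0bf describes can be modelled outside the kernel. The following is a minimal Python sketch, not the bcachefs read path: `read_extent`, the `read_into` callback, and the use of CRC32 are illustrative assumptions. A first read goes straight into the user-visible buffer; if its checksum fails, the read is retried into a private bounce buffer, and only a failure there is treated as real corruption.

```python
import zlib


def read_extent(read_into, expected_crc, user_buf):
    """Return True if a bounced retry was needed, False otherwise.

    read_into(buf) is a hypothetical callback that fills buf from the device.
    """
    # Fast path: read directly into the buffer userspace can see (and modify).
    read_into(user_buf)
    if zlib.crc32(user_buf) == expected_crc:
        return False

    # The mismatch may be spurious, so retry into memory only we can touch.
    bounce = bytearray(len(user_buf))
    read_into(bounce)
    if zlib.crc32(bounce) != expected_crc:
        raise IOError("checksum error: data on disk is corrupt")

    # The bounced copy verified, so it is safe to hand to the caller.
    user_buf[:] = bounce
    return True
```

In this model, a checksum error seen on the direct read is only a "maybe"; the bounced retry is what decides whether the data is actually bad.
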
Kent Overstreet
943f0cfb15 bcachefs: Convert read path to standard error codes
Kill the READ_ERR/READ_RETRY/READ_RETRY_AVOID enums, and add standard
error codes that describe precisely which error occurred.

This is going to be used for the data move path, to move but poison
extents with checksum errors.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-16 13:47:55 -04:00
Kent Overstreet
5a06cb8000 bcachefs: Debug params for data corruption injection
dm-flakey is busted, and this is simpler anyways - this lets us test the
checksum error retry paths.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-16 13:47:55 -04:00
Kent Overstreet
6d80fca9ef bcachefs: Don't create bch_io_failures unless it's needed
Only needed in retry path, no point in wasting stack space.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-16 13:47:55 -04:00
Kent Overstreet
9ec0089149 bcachefs: bch2_bkey_ptrs_rebalance_opts()
Small optimization for bch2_bkey_sectors_need_rebalance()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-16 13:47:55 -04:00
Kent Overstreet
7c1e2a254f bcachefs: Add a cond_resched() to btree cache teardown
[12308.606480] watchdog: BUG: soft lockup - CPU#18 stuck for 26s! [umount:48479]
[12308.606485] Modules linked in: bcachefs lz4hc_compress lz4_compress lz4_decompress sunrpc overlay nf_conntrack_netlink xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE bridge stp llc xfrm_user ip6table_nat ip6table_filter ip6_tables iptable_nat xt_addrtype iptable_filter ip_tables x_tables nfnetlink_cttimeout nfnetlink openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 psample ext4 mbcache jbd2 nls_iso8859_1 nls_cp850 vfat fat binfmt_misc skx_edac_common nfit edac_core libnvdimm cbc encrypted_keys intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common ipmi_ssif x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm drivetemp rapl intel_cstate coretemp mgag200 i2c_algo_bit ixgbe drm_shmem_helper drm_kms_helper mdio_devres xfrm_algo mdio drm ptp intel_uncore mei_me efi_pstore evdev uas pl2303 pps_core libphy usb_storage usbserial lpc_ich mei drm_panel_orientation_quirks acpi_power_meter tiny_power_button ipmi_si mfd_core intel_pch_thermal acpi_tad acpi_ipmi ioatdma
[12308.606541]  ipmi_devintf ipmi_msghandler dca wmi button efivarfs polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 sha1_generic xhci_pci xhci_hcd aesni_intel ehci_pci ehci_hcd gf128mul crypto_simd cryptd usbcore hpwdt usb_common
[12308.606557] CPU: 18 UID: 0 PID: 48479 Comm: umount Tainted: G             L     6.14.0-rc6-x86_64-00159-ga09496a03e63 #1
[12308.606560] Tainted: [L]=SOFTLOCKUP
[12308.606561] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 07/20/2023
[12308.606563] RIP: 0010:clear_page_erms+0x7/0x10
[12308.606570] Code: 48 89 47 38 48 8d 7f 40 75 d9 90 c3 cc cc cc cc 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 b9 00 10 00 00 31 c0 <f3> aa c3 cc cc cc cc 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90
[12308.606572] RSP: 0018:ffff9ed5b622fba0 EFLAGS: 00010246
[12308.606574] RAX: 0000000000000000 RBX: ffff90347fffe6c0 RCX: 00000000000004c0
[12308.606575] RDX: ffffe34ea9bec1c0 RSI: 00000000000405f0 RDI: ffff902eafb07b40
[12308.606576] RBP: ffff9ed5b622fbf0 R08: 0000000000000001 R09: 0000000000000006
[12308.606577] R10: 0000000000040001 R11: 0000000000000000 R12: ffffe34ea9bec000
[12308.606578] R13: 0000000000000000 R14: 0000000000000006 R15: ffffe34ea9bed000
[12308.606580] FS:  00007fe704ecfb68(0000) GS:ffff9053fea00000(0000) knlGS:0000000000000000
[12308.606581] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12308.606582] CR2: 00007f18159068ae CR3: 00000001314d0005 CR4: 00000000007726f0
[12308.606583] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[12308.606584] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[12308.606584] PKRU: 55555554
[12308.606585] Call Trace:
[12308.606587]  <IRQ>
[12308.606590]  ? show_regs.cold+0x19/0x28
[12308.606595]  ? watchdog_timer_fn.cold+0x3d/0x9d
[12308.606598]  ? __pfx_watchdog_timer_fn+0x10/0x10
[12308.606602]  ? __hrtimer_run_queues+0x12e/0x250
[12308.606607]  ? hrtimer_interrupt+0xfd/0x220
[12308.606609]  ? __sysvec_apic_timer_interrupt+0x53/0xe0
[12308.606614]  ? sysvec_apic_timer_interrupt+0x76/0xa0
[12308.606619]  </IRQ>
[12308.606620]  <TASK>
[12308.606620]  ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
[12308.606626]  ? clear_page_erms+0x7/0x10
[12308.606628]  ? __free_pages_ok+0x374/0x640
[12308.606633]  free_frozen_pages+0x34/0x570
[12308.606636]  __folio_put+0x87/0xe0
[12308.606641]  free_large_kmalloc+0x70/0x80
[12308.606645]  kfree+0x2f6/0x390
[12308.606648]  kvfree+0x2d/0x40
[12308.606653]  __btree_node_data_free+0xaf/0xf0 [bcachefs]
[12308.606726]  btree_node_data_free+0x6a/0x80 [bcachefs]
[12308.606778]  bch2_fs_btree_cache_exit+0x262/0x440 [bcachefs]
[12308.606829]  bch2_fs_release+0xe8/0x340 [bcachefs]
[12308.606905]  kobject_put+0x60/0xc0
[12308.606908]  bch2_fs_free+0xdd/0x120 [bcachefs]
[12308.606981]  bch2_kill_sb+0x1e/0x30 [bcachefs]
[12308.607051]  deactivate_locked_super+0x32/0xb0
[12308.607055]  deactivate_super+0x40/0x50
[12308.607057]  cleanup_mnt+0xc3/0x160
[12308.607060]  __cleanup_mnt+0x12/0x20
[12308.607062]  task_work_run+0x5f/0xa0
[12308.607064]  syscall_exit_to_user_mode+0x194/0x1a0
[12308.607066]  do_syscall_64+0x67/0x170
[12308.607068]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[12308.607070] RIP: 0033:0x7fe704e66eed
[12308.607073] Code: 08 49 89 ca b8 a5 00 00 00 0f 05 48 89 c7 e8 8a e6 ff ff 48 83 c4

Reported-by: Stijn Tintel <stijn@linux-ipv6.be>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-16 13:47:55 -04:00
Kent Overstreet
c991fbee8e bcachefs: rebalance, copygc status also print stacktrace
These are commonly needed when debugging, and this saves us from having
to ask users to dig.

Also, rebalance_status now includes pending rebalance work.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-16 13:47:55 -04:00
Kent Overstreet
8dc4514d58 bcachefs: Kill bch2_remount()
Single caller, so inline it.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:03:16 -04:00
Kent Overstreet
a2e9e68746 bcachefs: Kill a bit of dead code
Found with CC=clang W=1

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:16 -04:00
Thorsten Blum
ff4cb203cc bcachefs: Use max() to improve gen_after()
Use max() to simplify gen_after() and improve its readability.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:16 -04:00
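
As a rough illustration of the simplification in ff4cb203cc, here is a Python model (not the kernel source). It assumes bucket generations are 8-bit counters compared by wraparound-aware signed subtraction; `gen_cmp` below is written from that assumption.

```python
def gen_cmp(a: int, b: int) -> int:
    """Signed 8-bit difference a - b, so the comparison survives wraparound."""
    return ((a - b + 128) & 0xFF) - 128


def gen_after(a: int, b: int) -> int:
    """How many generations a is ahead of b, or 0 if it is not ahead."""
    return max(0, gen_cmp(a, b))
```

For example, `gen_after(2, 250)` is 8 in this model, because generation 2 is reached eight increments after 250 once the counter wraps.
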
Thorsten Blum
c073ec6bec bcachefs: Remove unnecessary byte allocation
The extra byte is not used - remove it.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:16 -04:00
Kent Overstreet
94373026d9 bcachefs: We no longer read stripes into memory at startup
And the stripes heap gets deleted.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:16 -04:00
Kent Overstreet
434a3f2ffa bcachefs: trace_stripe_create
Add a simple tracepoint for stripe creation, we'll want to expand this
later.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:16 -04:00
Kent Overstreet
6c336144b9 bcachefs: get_existing_stripe() uses new stripe lru
Convert to the new persistent stripe LRU.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:16 -04:00
Kent Overstreet
039790cfb5 bcachefs: ec_stripe_delete() uses new stripe lru
Convert to the new persistent stripe LRU.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:16 -04:00
Kent Overstreet
4b0fac4bed bcachefs: journal write path comment
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:16 -04:00
Kent Overstreet
981e380144 bcachefs: Kick devices out after too many write IO errors
We're improving our handling of write errors - we shouldn't write
degraded data just because a write failed once, we should retry it (on
other devices, if possible).

But for this to work, we need to kick devices out when they're only
returning errors - otherwise those retries will loop infinitely.

This adds a configurable timeout - if writes are failing for too long,
we'll set that device read-only.

In the future we should also implement more tracking and another knob
for an "allowed error rate", so that we can kick out drives that are
acting "unhealthy".

Another thing we'll want is a mechanism (likely in userspace) for
bringing a device back in after a transient error - perhaps a cable was
jiggled, or there was a controller reset.

After transient errors we also need a mechanism to walk (from the
journal) recent btree updates that weren't flushed to that device and
treat them as "degraded", since unflushed data may well not have been
written. Out of scope for this patch, but becoming relevant.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:16 -04:00
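
The timeout policy described in 981e380144 can be sketched as follows. This is a simplified Python model under stated assumptions: the class, the field names, and the rule that any successful write resets the failure window are hypothetical and are not taken from the patch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Device:
    write_error_timeout: float             # seconds; hypothetical name for the knob
    failing_since: Optional[float] = None  # start of the current run of failures
    read_only: bool = False

    def account_write(self, ok: bool, now: float) -> None:
        """Record one write completion observed at time `now`."""
        if ok:
            # Assumption: one success ends the current run of failures.
            self.failing_since = None
            return
        if self.failing_since is None:
            self.failing_since = now
        elif now - self.failing_since >= self.write_error_timeout:
            # Writes have failed for too long: stop retrying on this device.
            self.read_only = True
```

Without some rule of this kind, retrying failed writes against a device that only returns errors would never terminate, which is the loop the commit message warns about.
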
Kent Overstreet
d71e023376 bcachefs: Change BCH_MEMBER_STATE_failed semantics
Previously, we wouldn't try to read at all from a failed device - that
doesn't make much sense, the device may be unhealthy (perhaps taking
longer than it should to service reads), but if it's our only option we
should still try to read from it.

Now, bch2_bkey_pick_read_device() will pick failed devices only if there
are no non-failed replicas to read from.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:16 -04:00
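
The new selection rule from d71e023376 can be illustrated with a small Python model. It is only a sketch: `Replica` and `pick_read_device` are hypothetical names, and any other criteria the real bch2_bkey_pick_read_device() applies when choosing a replica are left out.

```python
from typing import NamedTuple, Optional, Sequence


class Replica(NamedTuple):
    dev: int       # device index holding this copy
    failed: bool   # True if the device is in the "failed" member state


def pick_read_device(replicas: Sequence[Replica]) -> Optional[Replica]:
    """Prefer any non-failed replica; use a failed one only as a last resort."""
    healthy = [r for r in replicas if not r.failed]
    if healthy:
        return healthy[0]
    # No healthy copy exists: a failed device is still better than no read.
    return replicas[0] if replicas else None
```
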
Kent Overstreet
cf164a9106 bcachefs: bch2_dev_get_ioref() may now sleep
The next patch implementing freezing will change bch2_dev_get_ioref() to
sleep if a device is currently frozen.

Add an annotation and fix the journal code accordingly.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:16 -04:00
Kent Overstreet
2efa8397ca bcachefs: Fix btree_node_scan io_ref handling
This was completely fubar; it's now simplified a bit as well.
Note that for_each_online_member() takes and releases io_refs as it
iterates, so we need to release that if we break.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:16 -04:00
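
The note in 2efa8397ca about breaking out of for_each_online_member() can be modelled in Python. This is a sketch of the reference-counting pattern only; `Dev`, `OnlineMembers`, and `find_dev` are hypothetical stand-ins, not kernel interfaces.

```python
class Dev:
    def __init__(self, name):
        self.name = name
        self.io_refs = 0


class OnlineMembers:
    """Iterator that holds an io_ref on the current device only."""

    def __init__(self, devs):
        self._devs = iter(devs)
        self._cur = None

    def __iter__(self):
        return self

    def __next__(self):
        # Moving on to the next device drops the ref on the previous one.
        if self._cur is not None:
            self._cur.io_refs -= 1
            self._cur = None
        dev = next(self._devs)  # StopIteration ends the loop with no ref held
        dev.io_refs += 1
        self._cur = dev
        return dev


def find_dev(devs, name):
    for dev in OnlineMembers(devs):
        if dev.name == name:
            # Leaving the loop early: the iterator will not run again,
            # so the ref it took for us must be released by hand.
            dev.io_refs -= 1
            return dev.name
    return None
```

If the early-exit branch skipped the manual release, the matching device would be left with a ref that nothing ever drops.
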
Kent Overstreet
d5308203a8 bcachefs: Implement blk_holder_ops
We can't use the standard fs_holder_ops because they're meant for single
device filesystems - fs_bdev_mark_dead() in particular - and they assume
that the blk_holder is the super_block, which also doesn't work for a
multi device filesystem.

These generally follow the standard fs_holder_ops; the
locking/refcounting is a bit simplified because c->ro_ref suffices, and
bch2_fs_bdev_mark_dead() is not necessarily shutting down the entire
filesystem.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:16 -04:00
Kent Overstreet
1fdbe0b184 bcachefs: Make sure c->vfs_sb is set before starting fs
This is necessary for the new blk_holder_ops, which want the vfs
super_block available for synchronization.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:16 -04:00
Kent Overstreet
13fd6be102 bcachefs: Stash a pointer to the filesystem for blk_holder_ops
Note that we open block devices before we allocate bch_fs, but once
attached to a filesystem they will be closed before the bch_fs is torn
down - so stashing a pointer without a refcount looks incorrect but it's
not.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:16 -04:00
Kent Overstreet
b31c070407 bcachefs: Finish bch2_account_io_completion() conversions
More prep work for automatically kicking devices out after too many IO
errors.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:16 -04:00
Kent Overstreet
3526bca36b bcachefs: bch2_account_io_completion()
We need to start accounting successes for every IO, not just failures,
so introduce a unified hook for io completion accounting and convert
io_read.c.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:16 -04:00
Kent Overstreet
3480aecd5f bcachefs: Fix read path io_ref handling
We were using our device pointer after we'd released our ref to it.

Unlikely to be a race that's practical to hit, since actually removing a
member device is a whole process besides just taking it offline, but -
needs to be fixed.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:16 -04:00
Kent Overstreet
7bc5808168 bcachefs: data_update now checks for extents that can't be moved
If a device is ro or failed, we might not have anywhere to move a
replica.

Check for this early, before doing the read and attempting to write.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:15 -04:00
Kent Overstreet
fba513a9ee bcachefs: give bch2_write_super() a proper error code
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:15 -04:00
Kent Overstreet
4a90675cfe bcachefs: bcachefs_metadata_version_extent_flags
This implements a new extent field bitflags that apply to the whole
extent. There's been a couple things we've wanted this for in the past,
but the immediate need is extent poisoning, to solve a rebalance issue.

Unknown extent fields can't be parsed (we won't know their size, so we
can't advance to the next field), so this is an incompat feature, and
using it prevents the filesystem from being mounted by old versions.

This also adds the BCH_EXTENT_poisoned flag; this indicates that the
data is known to be bad (i.e. there was a checksum error, and we had to
write a new checksum) and reads will return errors.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:15 -04:00
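
Why an unknown extent field forces an incompat feature, as 4a90675cfe explains, can be shown with a toy parser. This Python sketch assumes a made-up layout (one type byte followed by a payload whose size is fixed per type); the tags and sizes are invented for illustration and do not match the on-disk format.

```python
# Hypothetical table: type tag -> payload size in bytes (made-up values).
KNOWN_SIZES = {0: 7, 1: 3}


def parse_fields(buf: bytes, known_sizes=KNOWN_SIZES):
    """Split buf into (tag, payload) pairs using per-type sizes."""
    fields, pos = [], 0
    while pos < len(buf):
        tag = buf[pos]
        if tag not in known_sizes:
            # No length prefix: without the size we cannot skip this field.
            raise ValueError(f"unknown field type {tag}: cannot find the next field")
        size = known_sizes[tag]
        fields.append((tag, buf[pos + 1 : pos + 1 + size]))
        pos += 1 + size
    return fields
```

A reader whose table lacks the new type has no way to step over it, so in this model an older version must refuse the data rather than skip the field.
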
Kent Overstreet
6422bf8117 bcachefs: bch2_request_incompat_feature() now returns error code
For future usage, we'll want a dedicated error code for better
debugging.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:15 -04:00
Thorsten Blum
bafd41b435 bcachefs: Fix error type in bch2_alloc_v3_validate()
Use error type alloc_v3_unpack_error in bch2_alloc_v3_validate().

Fixes: b65db750e2 ("bcachefs: Enumerate fsck errors")
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:15 -04:00
Kent Overstreet
fb195fa753 bcachefs: BCH_SB_FEATURES_ALL includes BCH_FEATURE_incompat_verison_field
These features are set on format and incompat upgrade.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:15 -04:00
Bagas Sanjaya
a42d685ff2 Documentation: bcachefs: SubmittingPatches: Convert footnotes to reST syntax
The footnote list is rendered in htmldocs as a single long-running
paragraph. Use reST numbered footnote syntax instead.

Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:15 -04:00
Bagas Sanjaya
76d6305dca Documentation: bcachefs: SubmittingPatches: Demote section headings
SubmittingPatches.rst has 4 section headings, all at the same heading
level. In the absence of a title heading, these section headings all
end up as title headings in the docs output, which also affects
the index toctree (increasing titles to 6 from the original 2)
due to the :numbered: option.

Demote the second through last section headings, making "Submitting
patches to bcachefs" the title heading.

Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:15 -04:00
Bagas Sanjaya
93422e0b33 Documentation: bcachefs: Split index toctree
bcachefs subsystem currently has 4 docs: two are development notes and
the rest are actual filesystem docs. These two groups are clearly
distinct and can be organized.

Split the toctree into two, one for each docs group. While at it, also
reduce :maxdepth: so that only title headings are listed in the
toctrees.

Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:15 -04:00
Bagas Sanjaya
7442ef7082 Documentation: bcachefs: Add casefolding toctree entry
Sphinx reports htmldocs toctree warning:

Documentation/filesystems/bcachefs/casefolding.rst: WARNING: document isn't included in any toctree

Fix the warning by adding casefolding documentation entry to bcachefs
toctree.

Fixes: bc5cc09246c5 ("bcachefs: bcachefs_metadata_version_casefolding")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/linux-next/20250221161728.32739f85@canb.auug.org.au/
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:15 -04:00
Bagas Sanjaya
47d4100b15 Documentation: bcachefs: casefolding: Use bullet list for dirent structure
The doc lists the dirent structure for both regular and casefolded names,
yet it is written (and rendered) as a long paragraph instead.

Write the structure list as a bullet list.

Fixes: bc5cc09246c5 ("bcachefs: bcachefs_metadata_version_casefolding")
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:15 -04:00
Bagas Sanjaya
210997859a Documentation: bcachefs: casefolding: Fix dentry/dcache considerations section
Sphinx reports htmldocs warnings on dentry/dcache section:

Documentation/filesystems/bcachefs/casefolding.rst:75: WARNING: Title underline too short.

dentry/dcache considerations
--------- [docutils]
Documentation/filesystems/bcachefs/casefolding.rst:84: WARNING: Definition list ends without a blank line; unexpected unindent. [docutils]

Fix the section by:

* Extending the section underline to match the section title length;
* Separating problem list from surrounding paragraphs.

Fixes: bc5cc09246c5 ("bcachefs: bcachefs_metadata_version_casefolding")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/linux-next/20250221161911.2d16138b@canb.auug.org.au/
Closes: https://lore.kernel.org/linux-next/20250221162135.79be0147@canb.auug.org.au/
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:15 -04:00
Bagas Sanjaya
82b5666912 Documentation: bcachefs: casefolding: Do not italicize NUL
Sphinx reports htmldocs warning:

Documentation/filesystems/bcachefs/casefolding.rst:36: WARNING: Inline interpreted text or phrase reference start-string without end-string. [docutils]

That's because the word NUL is italicized but written in plural form
(`NUL`s). Sphinx, however, doesn't trip up when an italicized word
is followed by punctuation instead.

Do not italicize the word to keep Sphinx happy.

Fixes: bc5cc09246c5 ("bcachefs: bcachefs_metadata_version_casefolding")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/linux-next/20250221162135.79be0147@canb.auug.org.au/
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:15 -04:00
Kent Overstreet
24d790a7da bcachefs: sysfs internal/trigger_btree_updates
Add a debug knob to manually trigger the btree updates worker.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:15 -04:00
Joshua Ashton
d37c14ac6f bcachefs: bcachefs_metadata_version_casefolding
This patch implements support for case-insensitive file name lookups
in bcachefs.

The implementation uses the same UTF-8 lowering and normalization that
ext4 and f2fs use.

More information is provided in Documentation/bcachefs/casefolding.rst

Compatibility notes:

This uses the new versioning scheme for incompatible features where an
incompatible feature is tied to a version number: the superblock says
"we may use incompat features up to x" and "incompat features up to x
are in use", disallowing mounting by previous versions.

Additionally, an old-style incompat feature bit is used, so that
kernels without utf8 casefolding support know if casefolding
specifically is in use and they're allowed to mount.

Signed-off-by: Joshua Ashton <joshua@froggi.es>
Cc: André Almeida <andrealmeid@igalia.com>
Cc: Gabriel Krisman Bertazi <krisman@suse.de>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:15 -04:00
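
The versioning scheme in the compatibility notes of d37c14ac6f can be modelled roughly as below. This Python sketch uses hypothetical field and function names for the two superblock values quoted in the message, and it does not model the additional old-style feature bit.

```python
from dataclasses import dataclass


@dataclass
class Superblock:
    incompat_allowed: int  # "we may use incompat features up to x"
    incompat_in_use: int   # "incompat features up to x are in use"


def can_mount(sb: Superblock, kernel_version: int) -> bool:
    """A kernel can mount only if it understands every feature in use."""
    return sb.incompat_in_use <= kernel_version


def request_incompat_feature(sb: Superblock, feature_version: int) -> bool:
    """Start using a feature if permitted, recording that it is in use."""
    if feature_version > sb.incompat_allowed:
        return False
    sb.incompat_in_use = max(sb.incompat_in_use, feature_version)
    return True
```

In this model a filesystem stays mountable by older kernels until a newer incompatible feature is actually requested, at which point the in-use version rises and older versions are shut out.
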
Joshua Ashton
76872d46b7 bcachefs: Split out dirent alloc and name initialization
Splits out the code that allocates the dirent and initializes the name,
to make casefolding easier to implement in a future commit.

Cc: André Almeida <andrealmeid@igalia.com>
Cc: Gabriel Krisman Bertazi <krisman@suse.de>
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:15 -04:00
Kent Overstreet
72f4edcf45 bcachefs: Kill dirent_occupied_size() in create path
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:15 -04:00
Kent Overstreet
68171d91ce bcachefs: Kill dirent_occupied_size() in rename path
Cc: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:15 -04:00
Kent Overstreet
6756e385a5 bcachefs: bcachefs_metadata_version_stripe_lru
Add a persistent LRU for stripes, ordered by "number of empty blocks",
i.e. order in which we wish to reuse them.

This will replace the in-memory stripes heap, so we can kill off reading
stripes into memory at startup.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:15 -04:00
Kent Overstreet
88d961b518 bcachefs: bcachefs_metadata_version_stripe_backpointers
Stripes now have backpointers.

This is needed for proper scrub - stripe checksums need to be verified,
separately from extents within the stripe, since a block may not be full
of live extents but it's still needed for reconstruct.

And this will be needed for (efficient) evacuate/repair paths.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:15 -04:00
Kent Overstreet
69bd8a9277 bcachefs: Advance bch_alloc.oldest_gen if no stale pointers
Now that we've got cached backpointers and aren't leaving around stale
pointers on bucket invalidation, we no longer need the periodic (rare)
gc_gens - which recalculates each bucket's oldest gen to avoid wraparound.

We can't delete that code because we've got to support existing
filesystems that will still have stale pointers, but this gets rid of
another scalability limit.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:14 -04:00
Kent Overstreet
942a418c7a bcachefs: Invalidate cached data by backpointers
If we don't leave stale pointers around, we won't have to deal with
bucket gen wraparound.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:14 -04:00
Kent Overstreet
15800f3d4b bcachefs: bcachefs_metadata_version_cached_backpointers
Cached pointers now have backpointers.

This means that we'll be able to kill cached pointers in the
bucket_invalidate path, when invalidating/reusing buckets containing
cached data, instead of leaving them around to be cleaned up by gc_gens
garbage collection - which requires a full metadata scan.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:14 -04:00
Kent Overstreet
65bc7688b8 bcachefs: rework bch2_trans_commit_run_triggers()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:14 -04:00