Commit Graph

Mark Harmstone
68d3b27e05 btrfs: move priv off stack in btrfs_encoded_read_regular_fill_pages()
Change btrfs_encoded_read_regular_fill_pages() so that the priv struct
is allocated rather than stored on the stack, in preparation for adding
an asynchronous mode to the function.
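
A minimal sketch of the pattern, assuming the priv struct keeps its
existing wait/pending/status members (the allocation flags here are
illustrative):

    struct btrfs_encoded_read_private *priv;

    /* heap allocation so the struct can outlive the caller's frame */
    priv = kmalloc(sizeof(*priv), GFP_NOFS);
    if (!priv)
        return -ENOMEM;

    init_waitqueue_head(&priv->wait);
    atomic_set(&priv->pending, 1);
    priv->status = 0;

    /* ... submit bios that reference priv ... */

    /* in the synchronous case, free once all bios have completed */
    kfree(priv);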

Signed-off-by: Mark Harmstone <maharmstone@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:21 +01:00
Mark Harmstone
973a432637 btrfs: don't sleep in btrfs_encoded_read() if IOCB_NOWAIT is set
Change btrfs_encoded_read() so that it returns -EAGAIN rather than sleeping
if IOCB_NOWAIT is set in iocb->ki_flags. The conditions that require
sleeping are: inode lock, writeback, extent lock, ordered range.
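
A sketch of the general nowait pattern for one of those conditions, the
extent lock (the helper names follow the generic extent locking helpers
and are meant as illustration, not as the exact diff):

    if (iocb->ki_flags & IOCB_NOWAIT) {
        /* try-lock instead of blocking, bail out if contended */
        if (!try_lock_extent(io_tree, start, lockend, &cached_state))
            return -EAGAIN;
    } else {
        lock_extent(io_tree, start, lockend, &cached_state);
    }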

Signed-off-by: Mark Harmstone <maharmstone@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:21 +01:00
Mark Harmstone
26efd44796 btrfs: change btrfs_encoded_read() so that reading of extent is done by caller
Change the behaviour of btrfs_encoded_read() so that if it needs to read
an extent from disk, it leaves the extent and inode locked and returns
-EIOCBQUEUED. The caller is then responsible for doing the I/O via
btrfs_encoded_read_regular() and unlocking the extent and inode.
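
A simplified sketch of the resulting caller-side contract (the argument
lists here are hypothetical):

    ret = btrfs_encoded_read(iocb, iter, &encoded);
    if (ret == -EIOCBQUEUED) {
        /*
         * The extent and inode are still locked: the caller now owns
         * both the actual read from disk and the unlocking.
         */
        ret = btrfs_encoded_read_regular(iocb, iter, &encoded);
    }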

Signed-off-by: Mark Harmstone <maharmstone@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:21 +01:00
Mark Harmstone
4bca7412b8 btrfs: remove pointless iocb::ki_pos addition in btrfs_encoded_read()
iocb->ki_pos isn't used after this function, so there's no point in
changing its value.

Signed-off-by: Mark Harmstone <maharmstone@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:21 +01:00
Filipe Manana
7f13360ef9 btrfs: remove no longer used delayed ref head search functionality
After the previous patch, which converted the rb-tree used to track
delayed ref heads into an xarray, the find_ref_head() function is now
used only by one caller which always passes false to the 'return_bigger'
argument. So remove the 'return_bigger' logic, simplifying the function,
and move all the function code to the single caller.

Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:21 +01:00
Filipe Manana
928ed1349d btrfs: track delayed ref heads in an xarray
Currently we use a red black tree (rb-tree) to track the delayed ref
heads (in struct btrfs_delayed_ref_root::href_root). This however is not
very efficient when the number of delayed ref heads is large (and it's
very common to be at least in the order of thousands) since rb-trees are
binary trees. For example for 10K delayed ref heads, the tree has a depth
of 13. Besides that, inserting into the tree requires navigating through
it and pulling useless cache lines in the process since the red black tree
nodes are embedded within the delayed ref head structure - on the other
hand, by being embedded, it requires no extra memory allocations.

We can improve this by using an xarray instead, which has a much higher
branching factor than a red black tree (a balanced binary tree) and is
more cache friendly. It behaves like a resizable array, with a much
better search and insertion complexity than a red black tree. The only
small disadvantage is that insertion will sometimes require allocating
memory for the xarray - which may fail (not that often since it uses a
kmem_cache) - but on the other hand we can reduce the delayed ref head
structure size by 24 bytes (from 152 down to 128 bytes) after removing
the embedded red black tree node, meaning that we can now fit 32 delayed
ref heads per 4K page instead of 26, and that gain compensates for the
occasional memory allocations needed for the xarray nodes. We also end
up using only 2 cache lines instead of 3 per delayed ref head.
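
As a rough illustration of the new structure (field names and the index
choice are assumptions, not the literal patch - indexing by byte number
divided by the sector size keeps the keys dense, which is one plausible
reason fs_info was plumbed into these helpers earlier in the series):

    unsigned long index = bytenr >> fs_info->sectorsize_bits;

    xa_init(&delayed_refs->head_refs);

    /* insertion may need to allocate xarray nodes and can fail */
    ret = xa_insert(&delayed_refs->head_refs, index, head, GFP_ATOMIC);
    if (ret == -ENOMEM)
        return ret;

    /* lookup is a simple indexed load instead of an rb-tree walk */
    head = xa_load(&delayed_refs->head_refs, index);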

Running the following fs_mark test showed some improvements:

    $ cat test.sh
    #!/bin/bash

    DEV=/dev/nullb0
    MNT=/mnt/nullb0
    MOUNT_OPTIONS="-o ssd"
    FILES=100000
    THREADS=$(nproc --all)

    echo "performance" | \
        tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

    mkfs.btrfs -f $DEV
    mount $MOUNT_OPTIONS $DEV $MNT

    OPTS="-S 0 -L 5 -n $FILES -s 0 -t $THREADS -k"
    for ((i = 1; i <= $THREADS; i++)); do
        OPTS="$OPTS -d $MNT/d$i"
    done

    fs_mark $OPTS

    umount $MNT

Before this patch:

   FSUse%        Count         Size    Files/sec     App Overhead
       10      1200000            0     171845.7         12253839
       16      2400000            0     230898.7         12308254
       23      3600000            0     212292.9         12467768
       30      4800000            0     195737.8         12627554
       46      6000000            0     171055.2         12783329

After this patch:

   FSUse%        Count         Size    Files/sec     App Overhead
       10      1200000            0     173835.0         12246131
       16      2400000            0     233537.8         12271746
       23      3600000            0     220398.7         12307737
       30      4800000            0     204483.6         12392318
       40      6000000            0     182923.3         12771843

Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:21 +01:00
Filipe Manana
d3aaeea771 btrfs: add comments regarding locking to struct btrfs_delayed_ref_root
Add some comments to struct btrfs_delayed_ref_root's fields to mention
what its spinlock protects.

Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:21 +01:00
Filipe Manana
a8985ac6be btrfs: assert delayed refs lock is held at add_delayed_ref_head()
The delayed refs lock must be held when calling add_delayed_ref_head(),
so assert that it's being held.
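
The assertion itself is most likely a one-line lockdep check at the top
of the function, along these lines (signature abbreviated):

    static void add_delayed_ref_head(struct btrfs_trans_handle *trans,
                                     struct btrfs_delayed_ref_head *head)
    {
        struct btrfs_delayed_ref_root *delayed_refs =
            &trans->transaction->delayed_refs;

        /* catch callers that forgot to take the delayed refs lock */
        lockdep_assert_held(&delayed_refs->lock);

        /* ... existing insertion logic ... */
    }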

Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:20 +01:00
Filipe Manana
64a71f0b8a btrfs: assert delayed refs lock is held at find_first_ref_head()
The delayed refs lock must be held when calling find_first_ref_head(), so
assert that it's being held.

Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:20 +01:00
Filipe Manana
7226ed7d44 btrfs: assert delayed refs lock is held at find_ref_head()
We have 3 callers of find_ref_head(), so assert at find_ref_head() itself
that the delayed refs lock is held, and remove the assertion from one of
its callers (btrfs_find_delayed_ref_head()).

Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:20 +01:00
Filipe Manana
5f54384c73 btrfs: pass fs_info to btrfs_delete_ref_head()
One of the following patches in the series will need to access fs_info at
btrfs_delete_ref_head(), so pass a fs_info argument to it.

Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:20 +01:00
Filipe Manana
765f828902 btrfs: pass fs_info to functions that search for delayed ref heads
One of the following patches in the series will need to access fs_info in
the function find_ref_head(), so pass a fs_info argument to it as well as
to the functions btrfs_select_ref_head() and btrfs_find_delayed_ref_head()
which call find_ref_head().

Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:20 +01:00
Filipe Manana
58a4391810 btrfs: move delayed ref head unselection to delayed-ref.c
The unselect_delayed_ref_head() at extent-tree.c doesn't really belong in
that file as it's a delayed refs specific detail and therefore should be
at delayed-ref.c. Further its inverse, btrfs_select_ref_head(), is at
delayed-ref.c, so it only makes sense to have it there too.

So move unselect_delayed_ref_head() into delayed-ref.c and rename it to
btrfs_unselect_ref_head() so that its name closely matches its inverse
(btrfs_select_ref_head()).

Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:20 +01:00
Filipe Manana
a98048e10d btrfs: simplify obtaining a delayed ref head
Instead of obtaining a delayed ref head in two steps outside of
delayed-ref.c, leaking low level details such as locking, move the logic
entirely to delayed-ref.c under btrfs_select_ref_head(), reducing code
and making things simpler for the caller.
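
Conceptually the caller goes from a lock/select/lock-head/unlock dance
to a single call; a hedged before/after sketch:

    /* before: the caller deals with the delayed refs spinlock itself */
    spin_lock(&delayed_refs->lock);
    head = btrfs_select_ref_head(fs_info, delayed_refs);
    /* ... lock the head, handle contention, unlock ... */
    spin_unlock(&delayed_refs->lock);

    /* after: locking is an internal detail of delayed-ref.c */
    head = btrfs_select_ref_head(fs_info, delayed_refs);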

Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:20 +01:00
Filipe Manana
7ef3604886 btrfs: change return type of btrfs_delayed_ref_lock() to boolean
The function only returns 0, meaning it was able to lock the delayed ref
head, or -EAGAIN in case it wasn't able to lock it. So simplify this and
use a boolean return type instead, returning true if it was able to lock
and false otherwise.
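
A sketch of the prototype change (bodies elided):

    /* before: 0 on success, -EAGAIN if the head could not be locked */
    int btrfs_delayed_ref_lock(struct btrfs_delayed_ref_root *delayed_refs,
                               struct btrfs_delayed_ref_head *head);

    /* after: true if the head was locked, false otherwise */
    bool btrfs_delayed_ref_lock(struct btrfs_delayed_ref_root *delayed_refs,
                                struct btrfs_delayed_ref_head *head);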

Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:20 +01:00
Filipe Manana
f7d4b4924d btrfs: remove num_entries atomic counter from delayed ref root
The atomic counter 'num_entries' is not used anymore, we increment it
and decrement it but then we don't ever read it to use for any logic.
Its last use was removed with commit 61a56a992f ("btrfs: delayed refs
pre-flushing should only run the heads we have"). So remove it.

Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:20 +01:00
Filipe Manana
055903c4e7 btrfs: use helper to find first ref head at btrfs_destroy_delayed_refs()
Instead of open coding it, use the find_first_ref_head() helper at
btrfs_destroy_delayed_refs(). This avoids duplicating the logic,
especially with the upcoming changes in subsequent patches.

Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:20 +01:00
Filipe Manana
8d07a8f4c6 btrfs: remove duplicated code to drop delayed ref during transaction abort
When destroying delayed refs during a transaction abort, we have open
coded the removal of a delayed ref, which is also done by the static
helper function drop_delayed_ref(). So remove that duplicated code and
use drop_delayed_ref() instead.

Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:20 +01:00
Filipe Manana
c3a5888e0f btrfs: remove fs_info parameter from btrfs_cleanup_one_transaction()
The fs_info parameter is redundant because it can be extracted from the
transaction given as another parameter. So remove it and use the fs_info
accessible from the transaction.
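
The replacement is the usual one-line extraction at the top of the
function, assuming the standard transaction layout:

    void btrfs_cleanup_one_transaction(struct btrfs_transaction *trans)
    {
        struct btrfs_fs_info *fs_info = trans->fs_info;

        /* ... cleanup using fs_info ... */
    }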

Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:19 +01:00
Filipe Manana
2f6e05a5cc btrfs: remove fs_info parameter from btrfs_destroy_delayed_refs()
The fs_info parameter is redundant because it can be extracted from the
transaction given as another parameter. So remove it and use the fs_info
accessible from the transaction.

Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:19 +01:00
Filipe Manana
22a0ae1889 btrfs: move btrfs_destroy_delayed_refs() to delayed-ref.c
It's better suited to delayed-ref.c since it's about delayed refs and
contains logic to iterate over them (using the red black tree, doing all
the locking, freeing, etc). So move it from disk-io.c, which is pretty
big, into delayed-ref.c, hiding implementation details of how delayed
refs are tracked and managed. This also facilitates the next patches in
the series.

This change moves the code between files but also does the following
simple cleanups:

1) Rename the 'cache' variable to 'bg', since it's a block group
   (the 'cache' logic comes from old days where the block group
   structure was named 'btrfs_block_group_cache');

2) Move the 'ref' variable declaration to the scope of the inner
   while loop, since it's not used outside that loop.

Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:19 +01:00
Filipe Manana
00f529661b btrfs: remove BUG_ON() at btrfs_destroy_delayed_refs()
At btrfs_destroy_delayed_refs() it's unexpected to not find the block
group to which a delayed reference's extent belongs, so we have this
BUG_ON(), not just because it's highly unexpected but also because we
don't know what to do there.

Since we are in the transaction abort path, there's nothing we can do
other than proceed and cleanup all used resources we can. So remove
the BUG_ON() and deal with a missing block group by logging an error
message and continuing to cleanup all we can related to the current
delayed ref head and moving to other delayed refs.
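
A sketch of the new error path (the message text is made up for
illustration):

    bg = btrfs_lookup_block_group(fs_info, head->bytenr);
    if (!bg) {
        /*
         * Transaction abort: log the inconsistency and keep
         * cleaning up instead of crashing the machine.
         */
        btrfs_err(fs_info,
                  "no block group found for delayed ref head %llu",
                  head->bytenr);
        continue;
    }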

Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:19 +01:00
Robbie Ko
1d16c2761b btrfs: reduce extent tree lock contention when searching for inline backref
When inserting an extent backref, in order to check whether refs other
than inline refs are used, we always keep the path's locks (keep_locks)
during the tree search, which increases lock contention on the extent
tree.

We do not need the parent node every time to determine whether normal
refs are used.  It is only needed when the extent item is the last item
in a leaf.

Therefore, we change it to first search with keep_locks=0.  If the
extent item happens to be the last item in the leaf, we then redo the
search with keep_locks=1, reducing lock contention.
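
A hedged sketch of the two-pass search (variable names are
illustrative):

    path->keep_locks = 0;
    ret = btrfs_search_slot(trans, root, &key, path, extra_size, 1);
    if (ret < 0)
        goto out;

    leaf = path->nodes[0];
    if (path->slots[0] == btrfs_header_nritems(leaf) - 1) {
        /*
         * The extent item is the last item in its leaf, so deciding
         * whether non-inline refs exist needs the next leaf: redo
         * the search keeping the locks on the way down.
         */
        btrfs_release_path(path);
        path->keep_locks = 1;
        ret = btrfs_search_slot(trans, root, &key, path, extra_size, 1);
    }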

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Robbie Ko <robbieko@synology.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:19 +01:00
Johannes Thumshirn
6e6ecdec22 btrfs: tests: implement case for partial RAID stripe-tree delete
Implement self-tests for partial deletion of RAID stripe-tree entries.

These two new tests cover both the deletion of the front of a RAID
stripe-tree stripe extent as well as truncation of an item to make it
smaller.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:19 +01:00
Johannes Thumshirn
6aea95ee31 btrfs: implement partial deletion of RAID stripe extents
In our CI system, the RAID stripe tree configuration sometimes fails with
the following ASSERT():

  assertion failed: found_start >= start && found_end <= end, in fs/btrfs/raid-stripe-tree.c:64

This ASSERT()ion triggers, because for the initial design of RAID
stripe-tree, I had the "one ordered-extent equals one bio" rule of zoned
btrfs in mind.

But this rule doesn't apply to a RAID stripe-tree based system that is
hosted not on a zoned storage device but on a regular one.

So in case the range we want to delete starts in the middle of the
previous item, grab the item and "truncate" its length. That is, clone
the item, subtract the deleted portion from the key's offset, delete the
old item and insert the new one.

In case the range to delete ends in the middle of an item, we have to
adjust both the item's key as well as the stripe extents and then
re-insert the modified clone into the tree after deleting the old stripe
extent.
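
A rough sketch of the front-truncation case (helper and variable names
are illustrative; the key layout assumed here is objectid = logical
start, offset = length):

    btrfs_item_key_to_cpu(leaf, &key, path->slots[0]);

    /* portion of the stripe extent that is being deleted */
    diff = start - key.objectid;

    /* clone the key and shrink the extent */
    new_key = key;
    new_key.objectid += diff;   /* new logical start */
    new_key.offset -= diff;     /* new length */

    /* delete the old item, then insert the adjusted clone */
    ret = btrfs_del_item(trans, stripe_root, path);
    if (!ret)
        ret = btrfs_insert_item(trans, stripe_root, &new_key,
                                stripe_extent, item_size);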

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:19 +01:00
Anand Jain
d07eaa9995 btrfs: use filemap_get_folio() helper
When fgp_flags and gfp_flags are zero, use filemap_get_folio(A, B)
instead of __filemap_get_folio(A, B, 0, 0), since there is no need to
pass the extra zero arguments.
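
The two calls are equivalent when the extra flags are zero, since
filemap_get_folio() is a thin wrapper around __filemap_get_folio():

    /* before */
    folio = __filemap_get_folio(mapping, index, 0, 0);

    /* after: same behaviour, clearer intent */
    folio = filemap_get_folio(mapping, index);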

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:19 +01:00
Qu Wenruo
e820dbeb6a btrfs: convert btrfs_buffered_write() to use folios
The buffered write path is still heavily utilizing the page interface.
Since we have converted it to do page-by-page copying, it's much easier
to convert all involved functions to the folio interface. This involves:

- btrfs_copy_from_user()
- btrfs_drop_folio()
- prepare_uptodate_page()
- prepare_one_page()
- lock_and_cleanup_extent_if_need()
- btrfs_dirty_page()

All functions are changed to accept a folio parameter, and if the word
"page" is in the function name, it is changed to "folio" too.

The function btrfs_dirty_page() is exported for the v1 space cache;
convert the v1 cache call site to pass a folio to the new interface.

And there is a small enhancement for prepare_one_folio(): instead of
manually waiting for page writeback, let __filemap_get_folio() handle
that by using FGP_WRITEBEGIN, which implies
(FGP_LOCK | FGP_WRITE | FGP_CREAT | FGP_STABLE).
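
A sketch of what the FGP_WRITEBEGIN lookup looks like (the gfp mask
here is an assumption):

    /*
     * FGP_WRITEBEGIN returns the folio created if absent, locked,
     * and with any writeback already finished (FGP_STABLE), so no
     * manual waiting is needed.
     */
    folio = __filemap_get_folio(inode->i_mapping, index, FGP_WRITEBEGIN,
                                mapping_gfp_mask(inode->i_mapping));
    if (IS_ERR(folio))
        return PTR_ERR(folio);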

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:19 +01:00
Qu Wenruo
c87c299776 btrfs: make buffered write to copy one page a time
Currently btrfs_buffered_write() is preparing multiple pages at a time,
allowing better performance.

But the current trend is to support larger folios as an optimization,
instead of implementing our own multi-page optimization.

This is inspired by generic_perform_write(), which copies one folio at
a time.

Such a change will prepare us to migrate to implementing the
write_begin() and write_end() callbacks, and makes every involved
function a little simpler.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:19 +01:00
Mark Harmstone
b1c5f6eda2 btrfs: fix wrong sizeof in btrfs_do_encoded_write()
btrfs_do_encoded_write() was converted to use folios in 400b172b8c,
but we're still allocating based on sizeof(struct page *) rather than
sizeof(struct folio *). There's no functional change, since both are
pointer types of the same size.
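
The fix is the classic wrong-sizeof cleanup; the allocation flags below
are illustrative:

    /* before: harmless only because both are pointer types */
    folios = kcalloc(nr_folios, sizeof(struct page *), GFP_KERNEL);

    /* after: matches what the array actually stores */
    folios = kcalloc(nr_folios, sizeof(struct folio *), GFP_KERNEL);

Using sizeof(*folios) would avoid this class of mismatch entirely.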

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Mark Harmstone <maharmstone@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:18 +01:00
Thorsten Blum
4f285a7752 btrfs: use str_yes_no() helper function in btrfs_dump_free_space()
Remove hard-coded strings by using the str_yes_no() and str_no_yes()
helper functions.
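
str_yes_no() comes from <linux/string_choices.h> and maps a bool to
"yes"/"no" (str_no_yes() is the inverted variant); a representative
call, with made-up message text:

    #include <linux/string_choices.h>

    /* 'valid' is a plain bool here */
    btrfs_info(fs_info, "space cache valid: %s", str_yes_no(valid));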

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:18 +01:00
Qu Wenruo
0f71202665 btrfs: rename btrfs_folio_(set|start|end)_writer_lock()
Since there are no users of the reader locks anymore, rename the writer
locks to more generic names by removing the "_writer" part.

And also rename btrfs_subpage::writer into btrfs_subpage::locked.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:18 +01:00
Qu Wenruo
336e69f302 btrfs: unify to use writer locks for subpage locking
Since commit d7172f52e9 ("btrfs: use per-buffer locking for
extent_buffer reading"), metadata read no longer relies on the subpage
reader locking.

This means we do not need to maintain a different metadata/data split
for locking, so we can convert the existing reader lock users by:

- add_ra_bio_pages()
  Convert to btrfs_folio_set_writer_lock()

- end_folio_read()
  Convert to btrfs_folio_end_writer_lock()

- begin_folio_read()
  Convert to btrfs_folio_set_writer_lock()

- folio_range_has_eb()
  Remove the subpage->readers checks, since it is always 0.

- Remove btrfs_subpage_start_reader() and btrfs_subpage_end_reader()

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:18 +01:00
Qu Wenruo
8511074c42 btrfs: remove unused btrfs_folio_start_writer_lock()
This function is not really suitable to lock a folio, as it lacks the
proper mapping checks, thus the locked folio may not even belong to
btrfs.

For the above reason, its last user inside lock_delalloc_folios() has
already been removed, so we can now remove this function.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:18 +01:00
Boris Burkov
70958a949d btrfs: do not clear read-only when adding sprout device
If you follow the seed/sprout wiki, it suggests the following workflow:

btrfstune -S 1 seed_dev
mount seed_dev mnt
btrfs device add sprout_dev mnt
mount -o remount,rw mnt

The first mount mounts the FS readonly, which results in not setting
BTRFS_FS_OPEN, and setting the readonly bit on the sb. The device add
somewhat surprisingly clears the readonly bit on the sb (though the
mount is still practically readonly, from the user's perspective...).
Finally, the remount checks the readonly bit on the sb against the flag
and sees no change, so it does not run the code intended to run on
ro->rw transitions, leaving BTRFS_FS_OPEN unset.

As a result, when the cleaner_kthread runs, it sees no BTRFS_FS_OPEN and
does no work. This results in leaking deleted snapshots until we run out
of space.

I propose fixing it at the first departure from what feels reasonable:
when we clear the readonly bit on the sb during device add.

A new fstest I have written reproduces the bug and confirms the fix.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:18 +01:00
Filipe Manana
4b5c1200f7 btrfs: remove local generation variable from read_block_for_search()
It's redundant to have the 'gen' variable since we already have the same
value in the local btrfs_tree_parent_check structure. So remove it and
instead use the structure's field.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:18 +01:00
Filipe Manana
b8e63ea405 btrfs: remove redundant initializations for struct btrfs_tree_parent_check
It's pointless to initialize the has_first_key field of the stack local
btrfs_tree_parent_check structure at read_block_for_search() and at
btrfs_qgroup_trace_subtree() since all fields not explicitly initialized
are zeroed out. In the case of the first function it's a bit odd because
we are assigning 0 to a field of type bool, however it's not incorrect
since a 0 is converted to false.

Just remove the explicit initializations due to their redundancy.
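
This relies on C's guarantee that a designated initializer zeroes every
member not explicitly named, e.g.:

    struct btrfs_tree_parent_check check = {
        .level = parent_level,
        .transid = gen,
        /* has_first_key and all other members are implicitly zero */
    };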

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:18 +01:00
Filipe Manana
c88ebf1db5 btrfs: simplify arguments for btrfs_verify_level_key()
The only caller of btrfs_verify_level_key() is read_block_for_search() and
it's passing 3 arguments to it that can be extracted from its on stack
variable of type struct btrfs_tree_parent_check.

So change btrfs_verify_level_key() to accept an argument of type
struct btrfs_tree_parent_check instead of level, first key and parent
transid arguments.
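
A sketch of the simplified prototype, assuming the check struct already
carries the level, first key and parent transid:

    int btrfs_verify_level_key(struct extent_buffer *eb,
                               const struct btrfs_tree_parent_check *check);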

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:18 +01:00
Filipe Manana
2b1ef80d68 btrfs: remove redundant level argument from read_block_for_search()
The level parameter passed to read_block_for_search() always matches the
level of the extent buffer passed in the "eb_ret" parameter, which we are
also extracting into the "parent_level" local variable.

So remove the level parameter and instead use the "parent_level" variable
which in fact has a better name (it's the level of the parent node from
which we are reading a child node/leaf).

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:18 +01:00
Filipe Manana
a8371fccf0 btrfs: re-enable the extent map shrinker
Now that the extent map shrinker can only be run by a single task and runs
asynchronously as a work queue job, enable it as it can no longer cause
stalls on tasks allocating memory and entering the extent map shrinker
through the fs shrinker (implemented by btrfs_free_cached_objects()).

This is crucial to prevent exhaustion of memory due to unbounded extent
map creation, primarily with direct IO but also for buffered IO on files
with holes. This problem, for the direct IO case, was first reported in
the Link tag below. That report was added to a Link tag of the first patch
that introduced the extent map shrinker, commit 956a17d9d0 ("btrfs: add
a shrinker for extent maps"), however the Link tag disappeared somehow
from the committed patch (but was included in the submitted patch to the
mailing list), so adding it below for future reference.

Link: https://lore.kernel.org/linux-btrfs/13f94633dcf04d29aaf1f0a43d42c55e@amazon.com/
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:18 +01:00
Filipe Manana
e7fa845010 btrfs: rename extent map shrinker members from struct btrfs_fs_info
The names for the members of struct btrfs_fs_info related to the extent
map shrinker are a bit too long, so rename them to be shorter by replacing
the "extent_map_" prefix with the "em_" prefix.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:17 +01:00
Filipe Manana
70a5f9e266 btrfs: simplify tracking progress for the extent map shrinker
Now that the extent map shrinker can only be run by a single task (as a
work queue item) there is no need to keep the progress of the shrinker
protected by a spinlock and passing the progress to trace events as
parameters. So remove the lock and simplify the arguments for the trace
events.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:17 +01:00
Filipe Manana
1020443840 btrfs: make the extent map shrinker run asynchronously as a work queue job
Currently the extent map shrinker is run synchronously for kswapd tasks
that end up calling the fs shrinker (fs/super.c:super_cache_scan()).
This has some disadvantages, and for heavy workloads with memory
pressure it can cause delays and stalls that make a machine unresponsive
for some periods. This happens because:

1) We can have several kswapd tasks on machines with multiple NUMA zones,
   and running the extent map shrinker concurrently can cause high
   contention on some spin locks, namely the spin locks that protect
   the radix tree that tracks roots, the per root xarray that tracks
   open inodes and the list of delayed iputs. This not only delays the
   shrinker but also causes high CPU consumption and makes the task
   running the shrinker monopolize a core, resulting in the symptoms
   of an unresponsive system. This was noted in previous commits such as
   commit ae1e766f62 ("btrfs: only run the extent map shrinker from
   kswapd tasks");

2) The extent map shrinker's iteration over inodes can often be slow, even
   after changing the data structure that tracks open inodes for a root
   from a red black tree (up to kernel 6.10) to an xarray (kernel 6.10+).
   While the transition to the xarray made things a bit faster, it's
   still somewhat slow - for example in a test scenario with 10000 inodes
   that have no extent maps loaded, the extent map shrinker took between
   5ms to 8ms, using a release, non-debug kernel. Iterating over the
   extent maps of an inode can also be slow if we have an inode with many
   thousands of extent maps, since we use a red black tree to track and
   search extent maps. So having the extent map shrinker run synchronously
   adds extra delay for other things a kswapd task does.

So make the extent map shrinker run asynchronously as a job for the
system unbounded workqueue, just like what we do for data and metadata
space reclaim jobs.
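
A hedged sketch of that pattern (the member and function names follow
the em_ rename from the earlier patch but may not match exactly):

    /* setup, done once when the fs_info is initialized */
    INIT_WORK(&fs_info->em_shrinker_work, btrfs_extent_map_shrinker_worker);

    /* from the fs shrinker: kick the job and return immediately */
    queue_work(system_unbound_wq, &fs_info->em_shrinker_work);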

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:17 +01:00
Filipe Manana
03ba050583 btrfs: add and use helper to remove extent map from its inode's tree
Move the common code to remove an extent map from its inode's tree into a
helper function and use it, reducing duplicated code.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:17 +01:00
Robbie Ko
99785998ed btrfs: reduce lock contention when eb cache miss for btree search
When crawling the btree, if an eb cache miss occurs, we switch to the eb
read lock and release all previous locks (including the parent lock) to
reduce lock contention.

If an eb cache miss occurs in a leaf and IO is needed, before this
change we released locks only from level 2 and up and read the leaf's
content from disk while holding a lock on its parent (level 1), causing
unnecessary lock contention on the parent. After this change we release
locks from level 1 and up, but we lock level 0, and read the leaf's
content from disk.

Because we have prepared the check parameters and hold the eb's read
lock, we can ensure that no race will occur during the check and cause
unexpected errors.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Robbie Ko <robbieko@synology.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:17 +01:00
David Sterba
a9c50c9756 btrfs: drop unused parameter level from alloc_heuristic_ws()
The compression heuristic pass does not need a level, so we can drop the
parameter.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:17 +01:00
David Sterba
8c7cd2b6c9 btrfs: drop unused parameter fs_info from btrfs_match_dir_item_name()
Cascaded removal of fs_info that is not needed in several functions.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:17 +01:00
David Sterba
d12a1a2a30 btrfs: drop unused parameter transaction from alloc_log_tree()
The function got split in commit 6ab6ebb760 ("btrfs: split
alloc_log_tree()") and since then transaction parameter has been unused.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:17 +01:00
David Sterba
01c5db782e btrfs: drop unused parameter data from btrfs_fill_super()
The only caller passes NULL, so we can drop the parameter. This has been
the case since the new mount option parser added in 3bb17a25bc ("btrfs:
add get_tree callback for new mount API").

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:17 +01:00
David Sterba
87cbab8636 btrfs: drop unused parameter options from open_ctree()
Since the new mount option parser in commit ad21f15b0f ("btrfs:
switch to the new mount API") we don't pass the options like that
anymore.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:17 +01:00
David Sterba
ec315b4b9f btrfs: drop unused parameter fs_info from folio_range_has_eb()
The parameter was added in 8ff8466d29 ("btrfs: support subpage for
extent buffer page release") for the page release code but hasn't been
used since, so we can drop it.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:17 +01:00