mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
synced 2026-05-05 21:44:23 -04:00
35d99c68af40a8ca175babc5a89ef7e2226fb3ca
1325358 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
35d99c68af |
btrfs: fix a leaked chunk map issue in read_one_chunk()
Add btrfs_free_chunk_map() to free the memory allocated
by btrfs_alloc_chunk_map() if btrfs_add_chunk_map() fails.
Fixes:
|
||
|
|
2df2c6ed89 |
btrfs: replace deprecated strncpy() with strscpy()
strncpy() is deprecated for NUL-terminated destination buffers. Use strscpy() instead and don't zero-initialize the param array. Link: https://github.com/KSPP/linux/issues/90 Cc: linux-hardening@vger.kernel.org Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> |
||
|
|
5a4041f2c4 |
btrfs: zoned: fix extent range end unlock in cow_file_range()
Running generic/751 on the for-next branch often results in a hang like
below. They are both stack by locking an extent. This suggests someone
forget to unlock an extent.
INFO: task kworker/u128:1:12 blocked for more than 323 seconds.
Not tainted 6.13.0-BTRFS-ZNS+ #503
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u128:1 state:D stack:0 pid:12 tgid:12 ppid:2 flags:0x00004000
Workqueue: btrfs-fixup btrfs_work_helper [btrfs]
Call Trace:
<TASK>
__schedule+0x534/0xdd0
schedule+0x39/0x140
__lock_extent+0x31b/0x380 [btrfs]
? __pfx_autoremove_wake_function+0x10/0x10
btrfs_writepage_fixup_worker+0xf1/0x3a0 [btrfs]
btrfs_work_helper+0xff/0x480 [btrfs]
? lock_release+0x178/0x2c0
process_one_work+0x1ee/0x570
? srso_return_thunk+0x5/0x5f
worker_thread+0x1d1/0x3b0
? __pfx_worker_thread+0x10/0x10
kthread+0x10b/0x230
? __pfx_kthread+0x10/0x10
ret_from_fork+0x30/0x50
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
INFO: task kworker/u134:0:184 blocked for more than 323 seconds.
Not tainted 6.13.0-BTRFS-ZNS+ #503
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u134:0 state:D stack:0 pid:184 tgid:184 ppid:2 flags:0x00004000
Workqueue: writeback wb_workfn (flush-btrfs-4)
Call Trace:
<TASK>
__schedule+0x534/0xdd0
schedule+0x39/0x140
__lock_extent+0x31b/0x380 [btrfs]
? __pfx_autoremove_wake_function+0x10/0x10
find_lock_delalloc_range+0xdb/0x260 [btrfs]
writepage_delalloc+0x12f/0x500 [btrfs]
? srso_return_thunk+0x5/0x5f
extent_write_cache_pages+0x232/0x840 [btrfs]
btrfs_writepages+0x72/0x130 [btrfs]
do_writepages+0xe7/0x260
? srso_return_thunk+0x5/0x5f
? lock_acquire+0xd2/0x300
? srso_return_thunk+0x5/0x5f
? find_held_lock+0x2b/0x80
? wbc_attach_and_unlock_inode.part.0+0x102/0x250
? wbc_attach_and_unlock_inode.part.0+0x102/0x250
__writeback_single_inode+0x5c/0x4b0
writeback_sb_inodes+0x22d/0x550
__writeback_inodes_wb+0x4c/0xe0
wb_writeback+0x2f6/0x3f0
wb_workfn+0x32a/0x510
process_one_work+0x1ee/0x570
? srso_return_thunk+0x5/0x5f
worker_thread+0x1d1/0x3b0
? __pfx_worker_thread+0x10/0x10
kthread+0x10b/0x230
? __pfx_kthread+0x10/0x10
ret_from_fork+0x30/0x50
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
This happens because we have another success path for the zoned mode. When
there is no active zone available, btrfs_reserve_extent() returns
-EAGAIN. In this case, we have two reactions.
(1) If the given range is never allocated, we can only wait for someone
to finish a zone, so wait on BTRFS_FS_NEED_ZONE_FINISH bit and retry
afterward.
(2) Or, if some allocations are already done, we must bail out and let
the caller to send IOs for the allocation. This is because these IOs
may be necessary to finish a zone.
The commit
|
||
|
|
efa11fd269 |
btrfs: fix data overwriting bug during buffered write when block size < page size
[BUG]
When running generic/418 with a btrfs whose block size < page size
(subpage cases), it always fails.
And the following minimal reproducer is more than enough to trigger it
reliably:
workload()
{
mkfs.btrfs -s 4k -f $dev > /dev/null
dmesg -C
mount $dev $mnt
$fsstree_dir/src/dio-invalidate-cache -r -b 4096 -n 3 -i 1 -f $mnt/diotest
ret=$?
umount $mnt
stop_trace
if [ $ret -ne 0 ]; then
fail
fi
}
for (( i = 0; i < 1024; i++)); do
echo "=== $i/$runtime ==="
workload
done
[CAUSE]
With extra trace printk added to the following functions:
- btrfs_buffered_write()
* Which folio is touched
* The file offset (start) where the buffered write is at
* How many bytes are copied
* The content of the write (the first 2 bytes)
- submit_one_sector()
* Which folio is touched
* The position inside the folio
* The content of the page cache (the first 2 bytes)
- pagecache_isize_extended()
* The parameters of the function itself
* The parameters of the folio_zero_range()
Which are enough to show the problem:
22.158114: btrfs_buffered_write: folio pos=0 start=0 copied=4096 content=0x0101
22.158161: submit_one_sector: r/i=5/257 folio=0 pos=0 content=0x0101
22.158609: btrfs_buffered_write: folio pos=0 start=4096 copied=4096 content=0x0101
22.158634: btrfs_buffered_write: folio pos=0 start=8192 copied=4096 content=0x0101
22.158650: pagecache_isize_extended: folio=0 from=4096 to=8192 bsize=4096 zero off=4096 len=8192
22.158682: submit_one_sector: r/i=5/257 folio=0 pos=4096 content=0x0000
22.158686: submit_one_sector: r/i=5/257 folio=0 pos=8192 content=0x0101
The tool dio-invalidate-cache will start 3 threads, each doing a buffered
write with 0x01 at offset 0, 4096 and 8192, do a fsync, then do a direct read,
and compare the read buffer with the write buffer.
Note that all 3 btrfs_buffered_write() are writing the correct 0x01 into
the page cache.
But at submit_one_sector(), at file offset 4096, the content is zeroed
out, by pagecache_isize_extended().
The race happens like this:
Thread A is writing into range [4K, 8K).
Thread B is writing into range [8K, 12k).
Thread A | Thread B
-------------------------------------+------------------------------------
btrfs_buffered_write() | btrfs_buffered_write()
|- old_isize = 4K; | |- old_isize = 4096;
|- btrfs_inode_lock() | |
|- write into folio range [4K, 8K) | |
|- pagecache_isize_extended() | |
| extend isize from 4096 to 8192 | |
| no folio_zero_range() called | |
|- btrfs_inode_lock() | |
| |- btrfs_inode_lock()
| |- write into folio range [8K, 12K)
| |- pagecache_isize_extended()
| | calling folio_zero_range(4K, 8K)
| | This is caused by the old_isize is
| | grabbed too early, without any
| | inode lock.
| |- btrfs_inode_unlock()
The @old_isize is grabbed without inode lock, causing race between two
buffered write threads and making pagecache_isize_extended() to zero
range which is still containing cached data.
And this is only affecting subpage btrfs, because for regular blocksize
== page size case, the function pagecache_isize_extended() will do
nothing if the block size >= page size.
[FIX]
Grab the old i_size while holding the inode lock.
This means each buffered write thread will have a stable view of the
old inode size, thus avoid the above race.
CC: stable@vger.kernel.org # 5.15+
Fixes:
|
||
|
|
b1bf18223a |
btrfs: output an error message if btrfs failed to find the seed fsid
[BUG]
If btrfs failed to locate the seed device for whatever reason, mounting
the sprouted device will fail without any meaning error message:
# mkfs.btrfs -f /dev/test/scratch1
# btrfstune -S1 /dev/test/scratch1
# mount /dev/test/scratch1 /mnt/btrfs
# btrfs dev add -f /dev/test/scratch2 /mnt/btrfs
# umount /mnt/btrfs
# btrfs dev scan -u
# btrfs mount /dev/test/scratch2 /mnt/btrfs
mount: /mnt/btrfs: fsconfig system call failed: No such file or directory.
dmesg(1) may have more information after failed mount system call.
# dmesg -t | tail -n6
BTRFS info (device dm-5): first mount of filesystem 64252ded-5953-4868-b962-cea48f7ac4ea
BTRFS info (device dm-5): using crc32c (crc32c-generic) checksum algorithm
BTRFS info (device dm-5): using free-space-tree
BTRFS error (device dm-5): failed to read chunk tree: -2
BTRFS error (device dm-5): open_ctree failed: -2
[CAUSE]
The failure to mount is pretty straight forward, just unable to find the
seed device and its fsid, caused by `btrfs dev scan -u`.
But the lack of any useful info is a problem.
[FIX]
Just add an extra error message in open_seed_devices() to indicate the
error.
Now the error message would look like this:
BTRFS info (device dm-4): first mount of filesystem 7769223d-4db1-4e4c-ac29-0a96f53576ab
BTRFS info (device dm-4): using crc32c (crc32c-generic) checksum algorithm
BTRFS info (device dm-4): using free-space-tree
BTRFS error (device dm-4): failed to find fsid e87c12e6-584b-4e98-8b88-962c33a619ff when attempting to open seed devices
BTRFS error (device dm-4): failed to read chunk tree: -2
BTRFS error (device dm-4): open_ctree failed: -2
Link: https://github.com/kdave/btrfs-progs/issues/959
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
||
|
|
15b3b3254d |
btrfs: do regular iput instead of delayed iput during extent map shrinking
The extent map shrinker now runs in the system unbound workqueue and no longer in kswapd context so it can directly do an iput() on inodes even if that blocks or needs to acquire any lock (we aren't holding any locks when requesting the delayed iput from the shrinker). So we don't need to add a delayed iput, wake up the cleaner and delegate the iput() to the cleaner, which also adds extra contention on the spinlock that protects the delayed iputs list. Reported-by: Ivan Shapovalov <intelfx@intelfx.name> Tested-by: Ivan Shapovalov <intelfx@intelfx.name> Link: https://lore.kernel.org/linux-btrfs/0414d690ac5680d0d77dfc930606cdc36e42e12f.camel@intelfx.name/ CC: stable@vger.kernel.org # 6.12+ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> |
||
|
|
c6c9c4d564 |
btrfs: skip inodes without loaded extent maps when shrinking extent maps
If there are inodes that don't have any loaded extent maps, we end up grabbing a reference on them and later adding a delayed iput, which wakes up the cleaner and makes it do unnecessary work. This is common when for example the inodes were open only to run stat(2) or all their extent maps were already released through the folio release callback (btrfs_release_folio()) or released by a previous run of the shrinker, or directories which never have extent maps. Reported-by: Ivan Shapovalov <intelfx@intelfx.name> Tested-by: Ivan Shapovalov <intelfx@intelfx.name> Link: https://lore.kernel.org/linux-btrfs/0414d690ac5680d0d77dfc930606cdc36e42e12f.camel@intelfx.name/ CC: stable@vger.kernel.org # 6.13+ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> |
||
|
|
59f37036bb |
btrfs: fix use-after-free on inode when scanning root during em shrinking
At btrfs_scan_root() we are accessing the inode's root (and fs_info) in a
call to btrfs_fs_closing() after we have scheduled the inode for a delayed
iput, and that can result in a use-after-free on the inode in case the
cleaner kthread does the iput before we dereference the inode in the call
to btrfs_fs_closing().
Fix this by using the fs_info stored already in a local variable instead
of doing inode->root->fs_info.
Fixes:
|
||
|
|
290237fde9 |
btrfs: selftests: fix btrfs_test_delayed_refs() leak of transaction
The btrfs_transaction struct leaks, which can cause sporadic fstests
failures when kmemleak checking is enabled:
kmemleak: 5 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff88810fdc6c00 (size 512):
comm "modprobe", pid 203, jiffies 4294892552
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace (crc 6736050f):
__kmalloc_cache_noprof+0x133/0x2c0
btrfs_test_delayed_refs+0x6f/0xbb0 [btrfs]
btrfs_run_sanity_tests.cold+0x91/0xf9 [btrfs]
0xffffffffa02fd055
do_one_initcall+0x49/0x1c0
do_init_module+0x5b/0x1f0
init_module_from_file+0x70/0x90
idempotent_init_module+0xe8/0x2c0
__x64_sys_finit_module+0x6b/0xd0
do_syscall_64+0x54/0x110
entry_SYSCALL_64_after_hwframe+0x76/0x7e
The transaction struct was initially stack-allocated but switched to
heap following frame size compiler warnings.
Fixes:
|
||
|
|
da2dccd745 |
btrfs: fix hole expansion when writing at an offset beyond EOF
At btrfs_write_check() if our file's i_size is not sector size aligned and
we have a write that starts at an offset larger than the i_size that falls
within the same page of the i_size, then we end up not zeroing the file
range [i_size, write_offset).
The code is this:
start_pos = round_down(pos, fs_info->sectorsize);
oldsize = i_size_read(inode);
if (start_pos > oldsize) {
/* Expand hole size to cover write data, preventing empty gap */
loff_t end_pos = round_up(pos + count, fs_info->sectorsize);
ret = btrfs_cont_expand(BTRFS_I(inode), oldsize, end_pos);
if (ret)
return ret;
}
So if our file's i_size is 90269 bytes and a write at offset 90365 bytes
comes in, we get 'start_pos' set to 90112 bytes, which is less than the
i_size and therefore we don't zero out the range [90269, 90365) by
calling btrfs_cont_expand().
This is an old bug introduced in commit
|
||
|
|
acc18e1c1d |
btrfs: fix stale page cache after race between readahead and direct IO write
After commit |
||
|
|
01af106a07 |
btrfs: fix two misuses of folio_shift()
It is meaningless to shift a byte count by folio_shift(). The folio index is in units of PAGE_SIZE, not folio_size(). We can use folio_contains() to make this work for arbitrary-order folios, so remove the assertion that the folios are of order 0. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> |
||
|
|
fdef89ce6f |
btrfs: avoid starting new transaction when cleaning qgroup during subvolume drop
At btrfs_qgroup_cleanup_dropped_subvolume() all we want to commit the current transaction in order to have all the qgroup rfer/excl numbers up to date. However we are using btrfs_start_transaction(), which joins the current transaction if there is one that is not yet committing, but also starts a new one if there is none or if the current one is already committing (its state is >= TRANS_STATE_COMMIT_START). This later case results in unnecessary IO, wasting time and a pointless rotation of the backup roots in the super block. So instead of using btrfs_start_transaction() followed by a btrfs_commit_transaction(), use btrfs_commit_current_transaction() which achieves our purpose and avoids starting and committing new transactions. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> |
||
|
|
e2f0943cf3 |
btrfs: fix use-after-free when attempting to join an aborted transaction
When we are trying to join the current transaction and if it's aborted,
we read its 'aborted' field after unlocking fs_info->trans_lock and
without holding any extra reference count on it. This means that a
concurrent task that is aborting the transaction may free the transaction
before we read its 'aborted' field, leading to a use-after-free.
Fix this by reading the 'aborted' field while holding fs_info->trans_lock
since any freeing task must first acquire that lock and set
fs_info->running_transaction to NULL before freeing the transaction.
This was reported by syzbot and Dmitry with the following stack traces
from KASAN:
==================================================================
BUG: KASAN: slab-use-after-free in join_transaction+0xd9b/0xda0 fs/btrfs/transaction.c:278
Read of size 4 at addr ffff888011839024 by task kworker/u4:9/1128
CPU: 0 UID: 0 PID: 1128 Comm: kworker/u4:9 Not tainted 6.13.0-rc7-syzkaller-00019-gc45323b7560e #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Workqueue: events_unbound btrfs_async_reclaim_data_space
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:378 [inline]
print_report+0x169/0x550 mm/kasan/report.c:489
kasan_report+0x143/0x180 mm/kasan/report.c:602
join_transaction+0xd9b/0xda0 fs/btrfs/transaction.c:278
start_transaction+0xaf8/0x1670 fs/btrfs/transaction.c:697
flush_space+0x448/0xcf0 fs/btrfs/space-info.c:803
btrfs_async_reclaim_data_space+0x159/0x510 fs/btrfs/space-info.c:1321
process_one_work kernel/workqueue.c:3236 [inline]
process_scheduled_works+0xa66/0x1840 kernel/workqueue.c:3317
worker_thread+0x870/0xd30 kernel/workqueue.c:3398
kthread+0x2f0/0x390 kernel/kthread.c:389
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
</TASK>
Allocated by task 5315:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
poison_kmalloc_redzone mm/kasan/common.c:377 [inline]
__kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:394
kasan_kmalloc include/linux/kasan.h:260 [inline]
__kmalloc_cache_noprof+0x243/0x390 mm/slub.c:4329
kmalloc_noprof include/linux/slab.h:901 [inline]
join_transaction+0x144/0xda0 fs/btrfs/transaction.c:308
start_transaction+0xaf8/0x1670 fs/btrfs/transaction.c:697
btrfs_create_common+0x1b2/0x2e0 fs/btrfs/inode.c:6572
lookup_open fs/namei.c:3649 [inline]
open_last_lookups fs/namei.c:3748 [inline]
path_openat+0x1c03/0x3590 fs/namei.c:3984
do_filp_open+0x27f/0x4e0 fs/namei.c:4014
do_sys_openat2+0x13e/0x1d0 fs/open.c:1402
do_sys_open fs/open.c:1417 [inline]
__do_sys_creat fs/open.c:1495 [inline]
__se_sys_creat fs/open.c:1489 [inline]
__x64_sys_creat+0x123/0x170 fs/open.c:1489
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Freed by task 5336:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:582
poison_slab_object mm/kasan/common.c:247 [inline]
__kasan_slab_free+0x59/0x70 mm/kasan/common.c:264
kasan_slab_free include/linux/kasan.h:233 [inline]
slab_free_hook mm/slub.c:2353 [inline]
slab_free mm/slub.c:4613 [inline]
kfree+0x196/0x430 mm/slub.c:4761
cleanup_transaction fs/btrfs/transaction.c:2063 [inline]
btrfs_commit_transaction+0x2c97/0x3720 fs/btrfs/transaction.c:2598
insert_balance_item+0x1284/0x20b0 fs/btrfs/volumes.c:3757
btrfs_balance+0x992/0x10c0 fs/btrfs/volumes.c:4633
btrfs_ioctl_balance+0x493/0x7c0 fs/btrfs/ioctl.c:3670
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:906 [inline]
__se_sys_ioctl+0xf5/0x170 fs/ioctl.c:892
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
The buggy address belongs to the object at ffff888011839000
which belongs to the cache kmalloc-2k of size 2048
The buggy address is located 36 bytes inside of
freed 2048-byte region [ffff888011839000, ffff888011839800)
The buggy address belongs to the physical page:
page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x11838
head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0xfff00000000040(head|node=0|zone=1|lastcpupid=0x7ff)
page_type: f5(slab)
raw: 00fff00000000040 ffff88801ac42000 ffffea0000493400 dead000000000002
raw: 0000000000000000 0000000000080008 00000001f5000000 0000000000000000
head: 00fff00000000040 ffff88801ac42000 ffffea0000493400 dead000000000002
head: 0000000000000000 0000000000080008 00000001f5000000 0000000000000000
head: 00fff00000000003 ffffea0000460e01 ffffffffffffffff 0000000000000000
head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 57, tgid 57 (kworker/0:2), ts 67248182943, free_ts 67229742023
set_page_owner include/linux/page_owner.h:32 [inline]
post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1558
prep_new_page mm/page_alloc.c:1566 [inline]
get_page_from_freelist+0x365c/0x37a0 mm/page_alloc.c:3476
__alloc_pages_noprof+0x292/0x710 mm/page_alloc.c:4753
alloc_pages_mpol_noprof+0x3e1/0x780 mm/mempolicy.c:2269
alloc_slab_page+0x6a/0x110 mm/slub.c:2423
allocate_slab+0x5a/0x2b0 mm/slub.c:2589
new_slab mm/slub.c:2642 [inline]
___slab_alloc+0xc27/0x14a0 mm/slub.c:3830
__slab_alloc+0x58/0xa0 mm/slub.c:3920
__slab_alloc_node mm/slub.c:3995 [inline]
slab_alloc_node mm/slub.c:4156 [inline]
__do_kmalloc_node mm/slub.c:4297 [inline]
__kmalloc_node_track_caller_noprof+0x2e9/0x4c0 mm/slub.c:4317
kmalloc_reserve+0x111/0x2a0 net/core/skbuff.c:609
__alloc_skb+0x1f3/0x440 net/core/skbuff.c:678
alloc_skb include/linux/skbuff.h:1323 [inline]
alloc_skb_with_frags+0xc3/0x820 net/core/skbuff.c:6612
sock_alloc_send_pskb+0x91a/0xa60 net/core/sock.c:2884
sock_alloc_send_skb include/net/sock.h:1803 [inline]
mld_newpack+0x1c3/0xaf0 net/ipv6/mcast.c:1747
add_grhead net/ipv6/mcast.c:1850 [inline]
add_grec+0x1492/0x19a0 net/ipv6/mcast.c:1988
mld_send_cr net/ipv6/mcast.c:2114 [inline]
mld_ifc_work+0x691/0xd90 net/ipv6/mcast.c:2651
page last free pid 5300 tgid 5300 stack trace:
reset_page_owner include/linux/page_owner.h:25 [inline]
free_pages_prepare mm/page_alloc.c:1127 [inline]
free_unref_page+0xd3f/0x1010 mm/page_alloc.c:2659
__slab_free+0x2c2/0x380 mm/slub.c:4524
qlink_free mm/kasan/quarantine.c:163 [inline]
qlist_free_all+0x9a/0x140 mm/kasan/quarantine.c:179
kasan_quarantine_reduce+0x14f/0x170 mm/kasan/quarantine.c:286
__kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:329
kasan_slab_alloc include/linux/kasan.h:250 [inline]
slab_post_alloc_hook mm/slub.c:4119 [inline]
slab_alloc_node mm/slub.c:4168 [inline]
__do_kmalloc_node mm/slub.c:4297 [inline]
__kmalloc_noprof+0x236/0x4c0 mm/slub.c:4310
kmalloc_noprof include/linux/slab.h:905 [inline]
kzalloc_noprof include/linux/slab.h:1037 [inline]
fib_create_info+0xc14/0x25b0 net/ipv4/fib_semantics.c:1435
fib_table_insert+0x1f6/0x1f20 net/ipv4/fib_trie.c:1231
fib_magic+0x3d8/0x620 net/ipv4/fib_frontend.c:1112
fib_add_ifaddr+0x40c/0x5e0 net/ipv4/fib_frontend.c:1156
fib_netdev_event+0x375/0x490 net/ipv4/fib_frontend.c:1494
notifier_call_chain+0x1a5/0x3f0 kernel/notifier.c:85
__dev_notify_flags+0x207/0x400
dev_change_flags+0xf0/0x1a0 net/core/dev.c:9045
do_setlink+0xc90/0x4210 net/core/rtnetlink.c:3109
rtnl_changelink net/core/rtnetlink.c:3723 [inline]
__rtnl_newlink net/core/rtnetlink.c:3875 [inline]
rtnl_newlink+0x1bb6/0x2210 net/core/rtnetlink.c:4012
Memory state around the buggy address:
ffff888011838f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888011838f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888011839000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888011839080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888011839100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
Reported-by: syzbot+45212e9d87a98c3f5b42@syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/678e7da5.050a0220.303755.007c.GAE@google.com/
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Link: https://lore.kernel.org/linux-btrfs/CACT4Y+ZFBdo7pT8L2AzM=vegZwjp-wNkVJZQf0Ta3vZqtExaSw@mail.gmail.com/
Fixes:
|
||
|
|
c9c8637933 |
btrfs: do not output error message if a qgroup has been already cleaned up
[BUG]
There is a bug report that btrfs outputs the following error message:
BTRFS info (device nvme0n1p2): qgroup scan completed (inconsistency flag cleared)
BTRFS warning (device nvme0n1p2): failed to cleanup qgroup 0/1179: -2
[CAUSE]
The error itself is pretty harmless, and the end user should ignore it.
When a subvolume is fully dropped, btrfs will call
btrfs_qgroup_cleanup_dropped_subvolume() to delete the qgroup.
However if a qgroup rescan happened before a subvolume fully dropped,
qgroup for that subvolume will not be re-created, as rescan will only
create new qgroup if there is a BTRFS_ROOT_REF_KEY found.
But before we drop a subvolume, the subvolume is unlinked thus there is no
BTRFS_ROOT_REF_KEY.
In that case, btrfs_remove_qgroup() will fail with -ENOENT and trigger
the above error message.
[FIX]
Just ignore -ENOENT error from btrfs_remove_qgroup() inside
btrfs_qgroup_cleanup_dropped_subvolume().
Reported-by: John Shand <jshand2013@gmail.com>
Link: https://bugzilla.suse.com/show_bug.cgi?id=1236056
Fixes:
|
||
|
|
0d85f5c2dd |
btrfs: fix assertion failure when splitting ordered extent after transaction abort
If while we are doing a direct IO write a transaction abort happens, we
mark all existing ordered extents with the BTRFS_ORDERED_IOERR flag (done
at btrfs_destroy_ordered_extents()), and then after that if we enter
btrfs_split_ordered_extent() and the ordered extent has bytes left
(meaning we have a bio that doesn't cover the whole ordered extent, see
details at btrfs_extract_ordered_extent()), we will fail on the following
assertion at btrfs_split_ordered_extent():
ASSERT(!(flags & ~BTRFS_ORDERED_TYPE_FLAGS));
because the BTRFS_ORDERED_IOERR flag is set and the definition of
BTRFS_ORDERED_TYPE_FLAGS is just the union of all flags that identify the
type of write (regular, nocow, prealloc, compressed, direct IO, encoded).
Fix this by returning an error from btrfs_extract_ordered_extent() if we
find the BTRFS_ORDERED_IOERR flag in the ordered extent. The error will
be the error that resulted in the transaction abort or -EIO if no
transaction abort happened.
This was recently reported by syzbot with the following trace:
FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 1
CPU: 0 UID: 0 PID: 5321 Comm: syz.0.0 Not tainted 6.13.0-rc5-syzkaller #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
fail_dump lib/fault-inject.c:53 [inline]
should_fail_ex+0x3b0/0x4e0 lib/fault-inject.c:154
should_failslab+0xac/0x100 mm/failslab.c:46
slab_pre_alloc_hook mm/slub.c:4072 [inline]
slab_alloc_node mm/slub.c:4148 [inline]
__do_kmalloc_node mm/slub.c:4297 [inline]
__kmalloc_noprof+0xdd/0x4c0 mm/slub.c:4310
kmalloc_noprof include/linux/slab.h:905 [inline]
kzalloc_noprof include/linux/slab.h:1037 [inline]
btrfs_chunk_alloc_add_chunk_item+0x244/0x1100 fs/btrfs/volumes.c:5742
reserve_chunk_space+0x1ca/0x2c0 fs/btrfs/block-group.c:4292
check_system_chunk fs/btrfs/block-group.c:4319 [inline]
do_chunk_alloc fs/btrfs/block-group.c:3891 [inline]
btrfs_chunk_alloc+0x77b/0xf80 fs/btrfs/block-group.c:4187
find_free_extent_update_loop fs/btrfs/extent-tree.c:4166 [inline]
find_free_extent+0x42d1/0x5810 fs/btrfs/extent-tree.c:4579
btrfs_reserve_extent+0x422/0x810 fs/btrfs/extent-tree.c:4672
btrfs_new_extent_direct fs/btrfs/direct-io.c:186 [inline]
btrfs_get_blocks_direct_write+0x706/0xfa0 fs/btrfs/direct-io.c:321
btrfs_dio_iomap_begin+0xbb7/0x1180 fs/btrfs/direct-io.c:525
iomap_iter+0x697/0xf60 fs/iomap/iter.c:90
__iomap_dio_rw+0xeb9/0x25b0 fs/iomap/direct-io.c:702
btrfs_dio_write fs/btrfs/direct-io.c:775 [inline]
btrfs_direct_write+0x610/0xa30 fs/btrfs/direct-io.c:880
btrfs_do_write_iter+0x2a0/0x760 fs/btrfs/file.c:1397
do_iter_readv_writev+0x600/0x880
vfs_writev+0x376/0xba0 fs/read_write.c:1050
do_pwritev fs/read_write.c:1146 [inline]
__do_sys_pwritev2 fs/read_write.c:1204 [inline]
__se_sys_pwritev2+0x196/0x2b0 fs/read_write.c:1195
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f1281f85d29
RSP: 002b:00007f12819fe038 EFLAGS: 00000246 ORIG_RAX: 0000000000000148
RAX: ffffffffffffffda RBX: 00007f1282176080 RCX: 00007f1281f85d29
RDX: 0000000000000001 RSI: 0000000020000240 RDI: 0000000000000005
RBP: 00007f12819fe090 R08: 0000000000000000 R09: 0000000000000003
R10: 0000000000007000 R11: 0000000000000246 R12: 0000000000000002
R13: 0000000000000000 R14: 00007f1282176080 R15: 00007ffcb9e23328
</TASK>
BTRFS error (device loop0 state A): Transaction aborted (error -12)
BTRFS: error (device loop0 state A) in btrfs_chunk_alloc_add_chunk_item:5745: errno=-12 Out of memory
BTRFS info (device loop0 state EA): forced readonly
assertion failed: !(flags & ~BTRFS_ORDERED_TYPE_FLAGS), in fs/btrfs/ordered-data.c:1234
------------[ cut here ]------------
kernel BUG at fs/btrfs/ordered-data.c:1234!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
CPU: 0 UID: 0 PID: 5321 Comm: syz.0.0 Not tainted 6.13.0-rc5-syzkaller #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:btrfs_split_ordered_extent+0xd8d/0xe20 fs/btrfs/ordered-data.c:1234
RSP: 0018:ffffc9000d1df2b8 EFLAGS: 00010246
RAX: 0000000000000057 RBX: 000000000006a000 RCX: 9ce21886c4195300
RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000
RBP: 0000000000000091 R08: ffffffff817f0a3c R09: 1ffff92001a3bdf4
R10: dffffc0000000000 R11: fffff52001a3bdf5 R12: 1ffff1100a45f401
R13: ffff8880522fa018 R14: dffffc0000000000 R15: 000000000006a000
FS: 00007f12819fe6c0(0000) GS:ffff88801fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000557750bd7da8 CR3: 00000000400ea000 CR4: 0000000000352ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
btrfs_extract_ordered_extent fs/btrfs/direct-io.c:702 [inline]
btrfs_dio_submit_io+0x4be/0x6d0 fs/btrfs/direct-io.c:737
iomap_dio_submit_bio fs/iomap/direct-io.c:85 [inline]
iomap_dio_bio_iter+0x1022/0x1740 fs/iomap/direct-io.c:447
__iomap_dio_rw+0x13b7/0x25b0 fs/iomap/direct-io.c:703
btrfs_dio_write fs/btrfs/direct-io.c:775 [inline]
btrfs_direct_write+0x610/0xa30 fs/btrfs/direct-io.c:880
btrfs_do_write_iter+0x2a0/0x760 fs/btrfs/file.c:1397
do_iter_readv_writev+0x600/0x880
vfs_writev+0x376/0xba0 fs/read_write.c:1050
do_pwritev fs/read_write.c:1146 [inline]
__do_sys_pwritev2 fs/read_write.c:1204 [inline]
__se_sys_pwritev2+0x196/0x2b0 fs/read_write.c:1195
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f1281f85d29
RSP: 002b:00007f12819fe038 EFLAGS: 00000246 ORIG_RAX: 0000000000000148
RAX: ffffffffffffffda RBX: 00007f1282176080 RCX: 00007f1281f85d29
RDX: 0000000000000001 RSI: 0000000020000240 RDI: 0000000000000005
RBP: 00007f12819fe090 R08: 0000000000000000 R09: 0000000000000003
R10: 0000000000007000 R11: 0000000000000246 R12: 0000000000000002
R13: 0000000000000000 R14: 00007f1282176080 R15: 00007ffcb9e23328
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:btrfs_split_ordered_extent+0xd8d/0xe20 fs/btrfs/ordered-data.c:1234
RSP: 0018:ffffc9000d1df2b8 EFLAGS: 00010246
RAX: 0000000000000057 RBX: 000000000006a000 RCX: 9ce21886c4195300
RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000
RBP: 0000000000000091 R08: ffffffff817f0a3c R09: 1ffff92001a3bdf4
R10: dffffc0000000000 R11: fffff52001a3bdf5 R12: 1ffff1100a45f401
R13: ffff8880522fa018 R14: dffffc0000000000 R15: 000000000006a000
FS: 00007f12819fe6c0(0000) GS:ffff88801fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000557750bd7da8 CR3: 00000000400ea000 CR4: 0000000000352ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
In this case the transaction abort was due to (an injected) memory
allocation failure when attempting to allocate a new chunk.
Reported-by: syzbot+f60d8337a5c8e8d92a77@syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/6777f2dd.050a0220.178762.0045.GAE@google.com/
Fixes:
|
||
|
|
a216542027 |
btrfs: fix lockdep splat while merging a relocation root
When COWing a relocation tree path, at relocation.c:replace_path(), we can trigger a lockdep splat while we are in the btrfs_search_slot() call against the relocation root. This happens in that callchain at ctree.c:read_block_for_search() when we happen to find a child extent buffer already loaded through the fs tree with a lockdep class set to the fs tree. So when we attempt to lock that extent buffer through a relocation tree we have to reset the lockdep class to the class for a relocation tree, since a relocation tree has extent buffers that used to belong to a fs tree and may currently be already loaded (we swap extent buffers between the two trees at the end of replace_path()). However we are missing calls to btrfs_maybe_reset_lockdep_class() to reset the lockdep class at ctree.c:read_block_for_search() before we read lock an extent buffer, just like we did for btrfs_search_slot() in commit |
||
|
|
9d0c23db26 |
btrfs: selftests: add a selftest for deleting two out of three extents
Add a selftest creating three extents and then deleting two out of the three extents. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com> |
||
|
|
cfda28fb70 |
btrfs: selftests: add test for punching a hole into 3 RAID stripe-extents
Test creating a range of three RAID stripe-extents and then punch a hole in the middle, deleting all of the middle extents and partially deleting the "book ends". Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com> |
||
|
|
27ae15b25b |
btrfs: selftests: add selftest for punching holes into the RAID stripe extents
Add a selftest for punching a hole into a RAID stripe extent. The test create an 1M extent and punches a 64k bytes long hole at offset of 32k from the start of the extent. Afterwards it verifies the start and length of both resulting new extents "left" and "right" as well as the absence of the hole. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com> |
||
|
|
1d395c3926 |
btrfs: selftests: test RAID stripe-tree deletion spanning two items
Add a selftest for RAID stripe-tree deletion with a delete range spanning two items, so that we're punching a hole into two adjacent RAID stripe extents truncating the first and "moving" the second to the right. The following diagram illustrates the operation: |--- RAID Stripe Extent ---||--- RAID Stripe Extent ---| |----- keep -----|--- drop ---|----- keep ----| Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com> |
||
|
|
a0afdec255 |
btrfs: selftests: don't split RAID extents in half
The selftests for partially deleting the start or tail of RAID stripe-extents split these extents in half. This can hide errors in the calculation, so don't split the RAID stripe-extents in half but delete the first or last 16K of the 64K extents. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com> |
||
|
|
d44d3d724b |
btrfs: selftests: check for correct return value of failed lookup
Commit
|
||
|
|
dc14ba1078 |
btrfs: don't use btrfs_set_item_key_safe on RAID stripe-extents
Don't use btrfs_set_item_key_safe() to modify the keys in the RAID
stripe-tree, as this can lead to corruption of the tree, which is caught
by the checks in btrfs_set_item_key_safe():
BTRFS info (device nvme1n1): leaf 49168384 gen 15 total ptrs 194 free space 8329 owner 12
BTRFS info (device nvme1n1): refs 2 lock_owner 1030 current 1030
[ snip ]
item 105 key (354549760 230 20480) itemoff 14587 itemsize 16
stride 0 devid 5 physical 67502080
item 106 key (354631680 230 4096) itemoff 14571 itemsize 16
stride 0 devid 1 physical 88559616
item 107 key (354631680 230 32768) itemoff 14555 itemsize 16
stride 0 devid 1 physical 88555520
item 108 key (354717696 230 28672) itemoff 14539 itemsize 16
stride 0 devid 2 physical 67604480
[ snip ]
BTRFS critical (device nvme1n1): slot 106 key (354631680 230 32768) new key (354635776 230 4096)
------------[ cut here ]------------
kernel BUG at fs/btrfs/ctree.c:2602!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP PTI
CPU: 1 UID: 0 PID: 1055 Comm: fsstress Not tainted 6.13.0-rc1+ #1464
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
RIP: 0010:btrfs_set_item_key_safe+0xf7/0x270
Code: <snip>
RSP: 0018:ffffc90001337ab0 EFLAGS: 00010287
RAX: 0000000000000000 RBX: ffff8881115fd000 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000001 RDI: 00000000ffffffff
RBP: ffff888110ed6f50 R08: 00000000ffffefff R09: ffffffff8244c500
R10: 00000000ffffefff R11: 00000000ffffffff R12: ffff888100586000
R13: 00000000000000c9 R14: ffffc90001337b1f R15: ffff888110f23b58
FS: 00007f7d75c72740(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa811652c60 CR3: 0000000111398001 CR4: 0000000000370eb0
Call Trace:
<TASK>
? __die_body.cold+0x14/0x1a
? die+0x2e/0x50
? do_trap+0xca/0x110
? do_error_trap+0x65/0x80
? btrfs_set_item_key_safe+0xf7/0x270
? exc_invalid_op+0x50/0x70
? btrfs_set_item_key_safe+0xf7/0x270
? asm_exc_invalid_op+0x1a/0x20
? btrfs_set_item_key_safe+0xf7/0x270
btrfs_partially_delete_raid_extent+0xc4/0xe0
btrfs_delete_raid_extent+0x227/0x240
__btrfs_free_extent.isra.0+0x57f/0x9c0
? exc_coproc_segment_overrun+0x40/0x40
__btrfs_run_delayed_refs+0x2fa/0xe80
btrfs_run_delayed_refs+0x81/0xe0
btrfs_commit_transaction+0x2dd/0xbe0
? preempt_count_add+0x52/0xb0
btrfs_sync_file+0x375/0x4c0
do_fsync+0x39/0x70
__x64_sys_fsync+0x13/0x20
do_syscall_64+0x54/0x110
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f7d7550ef90
Code: <snip>
RSP: 002b:00007ffd70237248 EFLAGS: 00000202 ORIG_RAX: 000000000000004a
RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f7d7550ef90
RDX: 000000000000013a RSI: 000000000040eb28 RDI: 0000000000000004
RBP: 000000000000001b R08: 0000000000000078 R09: 00007ffd7023725c
R10: 00007f7d75400390 R11: 0000000000000202 R12: 028f5c28f5c28f5c
R13: 8f5c28f5c28f5c29 R14: 000000000040b520 R15: 00007f7d75c726c8
</TASK>
While the root cause of the tree order corruption isn't clear, using
btrfs_duplicate_item() to copy the item and then adjusting both the key
and the per-device physical addresses is a safe way to counter this
problem.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
||
|
|
6aa0e7cc56 |
btrfs: implement hole punching for RAID stripe extents
If the stripe extent we want to delete starts before the range we want to delete and ends after the range we want to delete we're punching a hole in the stripe extent: |--- RAID Stripe Extent ---| | keep |--- drop ---| keep | This means we need to a) truncate the existing item and b) create a second item for the remaining range. Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com> |
||
|
|
7664311904 |
btrfs: fix deletion of a range spanning parts two RAID stripe extents
When a user requests the deletion of a range that spans multiple stripe
extents and btrfs_search_slot() returns us the second RAID stripe extent,
we need to pick the previous item and truncate it, if there's still a
range to delete left, move on to the next item.
The following diagram illustrates the operation:
|--- RAID Stripe Extent ---||--- RAID Stripe Extent ---|
|--- keep ---|--- drop ---|
While at it, comment the trivial case of a whole item delete as well.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
||
|
|
50cae2ca69 |
btrfs: fix tail delete of RAID stripe-extents
Fix tail delete of RAID stripe-extents, if there is a range to be deleted as well after the tail delete of the extent. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com> |
||
|
|
a678543e60 |
btrfs: fix front delete range calculation for RAID stripe extents
When deleting the front of a RAID stripe-extent the delete code miscalculates the size on how much to pad the remaining extent part in the front. Fix the calculation so we're always having the sizes we expect. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com> |
||
|
|
5a0e38eab7 |
btrfs: assert RAID stripe-extent length is always greater than 0
When modifying a RAID stripe-extent, ASSERT() that the length of the new RAID stripe-extent is always greater than 0. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com> |
||
|
|
9257d8632a |
btrfs: don't try to delete RAID stripe-extents if we don't need to
Even if the RAID stripe-tree is not enabled in the filesystem, do_free_extent_accounting() still calls into btrfs_delete_raid_extent(). Check if the extent in question is on a block-group that has a profile which is used by RAID stripe-tree before attempting to delete a stripe extent. Return early if it doesn't, otherwise we're doing a unnecessary search. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com> |
||
|
|
c221a9a29d |
btrfs: selftests: correct RAID stripe-tree feature flag setting
RAID stripe-tree is an incompatible feature not a read-only compatible, so set the incompat flag not a compat_ro one in the selftest code. Subsequent changes in btrfs_delete_raid_extent() will start checking for this flag. Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com> |
||
|
|
e32dcdb0af |
btrfs: add io_uring interface for encoded writes
Add an io_uring interface for encoded writes, with the same parameters as the BTRFS_IOC_ENCODED_WRITE ioctl. As with the encoded reads code, there's a test program for this at https://github.com/maharmstone/io_uring-encoded, and I'll get this worked into an fstest. How io_uring works is that it initially calls btrfs_uring_cmd with the IO_URING_F_NONBLOCK flag set, and if we return -EAGAIN it tries again in a kthread with the flag cleared. Ideally we'd honour this and call try_lock etc., but there's still a lot of work to be done to create non-blocking versions of all the functions in our write path. Instead, just validate the input in btrfs_uring_encoded_write() on the first pass and return -EAGAIN, with a view to properly optimizing the optimistic path later on. Signed-off-by: Mark Harmstone <maharmstone@fb.com> Signed-off-by: David Sterba <dsterba@suse.com> |
||
|
|
bf50aca633 |
btrfs: remove the unused locked_folio parameter from btrfs_cleanup_ordered_extents()
The function btrfs_cleanup_ordered_extents() is only called in error handling path, and the last caller with a @locked_folio parameter was removed to fix a bug in the btrfs_run_delalloc_range() error handling. There is no need to pass @locked_folio parameter anymore. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> |
||
|
|
975a6a8855 |
btrfs: add extra error messages for delalloc range related errors
All the error handling bugs I hit so far are all -ENOSPC from either: - cow_file_range() - run_delalloc_nocow() - submit_uncompressed_range() Previously when those functions failed, there was no error message at all, making the debugging much harder. So here we introduce extra error messages for: - cow_file_range() - run_delalloc_nocow() - submit_uncompressed_range() - writepage_delalloc() when btrfs_run_delalloc_range() failed - extent_writepage() when extent_writepage_io() failed One example of the new debug error messages is the following one: run fstests generic/750 at 2024-12-08 12:41:41 BTRFS: device fsid 461b25f5-e240-4543-8deb-e7c2bd01a6d3 devid 1 transid 8 /dev/mapper/test-scratch1 (253:4) scanned by mount (2436600) BTRFS info (device dm-4): first mount of filesystem 461b25f5-e240-4543-8deb-e7c2bd01a6d3 BTRFS info (device dm-4): using crc32c (crc32c-arm64) checksum algorithm BTRFS info (device dm-4): forcing free space tree for sector size 4096 with page size 65536 BTRFS info (device dm-4): using free-space-tree BTRFS warning (device dm-4): read-write for sector size 4096 with page size 65536 is experimental BTRFS info (device dm-4): checking UUID tree BTRFS error (device dm-4): cow_file_range failed, root=363 inode=412 start=503808 len=98304: -28 BTRFS error (device dm-4): run_delalloc_nocow failed, root=363 inode=412 start=503808 len=98304: -28 BTRFS error (device dm-4): failed to run delalloc range, root=363 ino=412 folio=458752 submit_bitmap=11-15 start=503808 len=98304: -28 Which shows an error from cow_file_range() which is called inside a nocow write attempt, along with the extra bitmap from writepage_delalloc(). Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> |
||
|
|
61d730731b |
btrfs: subpage: dump the involved bitmap when ASSERT() failed
For btrfs_folio_assert_not_dirty() and btrfs_folio_set_lock(), we call bitmap_test_range_all_zero() to ensure the involved range has no dirty/lock bit already set. However with my recent enhanced delalloc range error handling, I was hitting the ASSERT() inside btrfs_folio_set_lock(), and it turns out that some error handling path is not properly updating the folio flags. So add some extra dumping for the ASSERTs to dump the involved bitmap to help debug. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> |
||
|
|
396294d1af |
btrfs: subpage: fix the bitmap dump of the locked flags
We're dumping the locked bitmap into the @checked_bitmap variable,
printing incorrect values during debug.
Thankfully even during my development I haven't hit a case where I need
to dump the locked bitmap. But for the sake of consistency, fix it by
dupping the locked bitmap into @locked_bitmap variable for output.
Fixes:
|
||
|
|
c2b47df81c |
btrfs: do proper folio cleanup when run_delalloc_nocow() failed
[BUG] With CONFIG_DEBUG_VM set, test case generic/476 has some chance to crash with the following VM_BUG_ON_FOLIO(): BTRFS error (device dm-3): cow_file_range failed, start 1146880 end 1253375 len 106496 ret -28 BTRFS error (device dm-3): run_delalloc_nocow failed, start 1146880 end 1253375 len 106496 ret -28 page: refcount:4 mapcount:0 mapping:00000000592787cc index:0x12 pfn:0x10664 aops:btrfs_aops [btrfs] ino:101 dentry name(?):"f1774" flags: 0x2fffff80004028(uptodate|lru|private|node=0|zone=2|lastcpupid=0xfffff) page dumped because: VM_BUG_ON_FOLIO(!folio_test_locked(folio)) ------------[ cut here ]------------ kernel BUG at mm/page-writeback.c:2992! Internal error: Oops - BUG: 00000000f2000800 [#1] SMP CPU: 2 UID: 0 PID: 3943513 Comm: kworker/u24:15 Tainted: G OE 6.12.0-rc7-custom+ #87 Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022 Workqueue: events_unbound btrfs_async_reclaim_data_space [btrfs] pc : folio_clear_dirty_for_io+0x128/0x258 lr : folio_clear_dirty_for_io+0x128/0x258 Call trace: folio_clear_dirty_for_io+0x128/0x258 btrfs_folio_clamp_clear_dirty+0x80/0xd0 [btrfs] __process_folios_contig+0x154/0x268 [btrfs] extent_clear_unlock_delalloc+0x5c/0x80 [btrfs] run_delalloc_nocow+0x5f8/0x760 [btrfs] btrfs_run_delalloc_range+0xa8/0x220 [btrfs] writepage_delalloc+0x230/0x4c8 [btrfs] extent_writepage+0xb8/0x358 [btrfs] extent_write_cache_pages+0x21c/0x4e8 [btrfs] btrfs_writepages+0x94/0x150 [btrfs] do_writepages+0x74/0x190 filemap_fdatawrite_wbc+0x88/0xc8 start_delalloc_inodes+0x178/0x3a8 [btrfs] btrfs_start_delalloc_roots+0x174/0x280 [btrfs] shrink_delalloc+0x114/0x280 [btrfs] flush_space+0x250/0x2f8 [btrfs] btrfs_async_reclaim_data_space+0x180/0x228 [btrfs] process_one_work+0x164/0x408 worker_thread+0x25c/0x388 kthread+0x100/0x118 ret_from_fork+0x10/0x20 Code: 910a8021 a90363f7 a9046bf9 94012379 (d4210000) ---[ end trace 0000000000000000 ]--- [CAUSE] The first two lines of extra debug messages show the problem is caused by the error handling of run_delalloc_nocow(). E.g. we have the following dirtied range (4K blocksize 4K page size): 0 16K 32K |//////////////////////////////////////| | Pre-allocated | And the range [0, 16K) has a preallocated extent. - Enter run_delalloc_nocow() for range [0, 16K) Which found range [0, 16K) is preallocated, can do the proper NOCOW write. - Enter fallback_to_fow() for range [16K, 32K) Since the range [16K, 32K) is not backed by preallocated extent, we have to go COW. - cow_file_range() failed for range [16K, 32K) So cow_file_range() will do the clean up by clearing folio dirty, unlock the folios. Now the folios in range [16K, 32K) is unlocked. - Enter extent_clear_unlock_delalloc() from run_delalloc_nocow() Which is called with PAGE_START_WRITEBACK to start page writeback. But folios can only be marked writeback when it's properly locked, thus this triggered the VM_BUG_ON_FOLIO(). Furthermore there is another hidden but common bug that run_delalloc_nocow() is not clearing the folio dirty flags in its error handling path. This is the common bug shared between run_delalloc_nocow() and cow_file_range(). [FIX] - Clear folio dirty for range [@start, @cur_offset) Introduce a helper, cleanup_dirty_folios(), which will find and lock the folio in the range, clear the dirty flag and start/end the writeback, with the extra handling for the @locked_folio. - Introduce a helper to clear folio dirty, start and end writeback - Introduce a helper to record the last failed COW range end This is to trace which range we should skip, to avoid double unlocking. - Skip the failed COW range for the error handling CC: stable@vger.kernel.org Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> |
||
|
|
06f3642847 |
btrfs: do proper folio cleanup when cow_file_range() failed
[BUG] When testing with COW fixup marked as BUG_ON() (this is involved with the new pin_user_pages*() change, which should not result new out-of-band dirty pages), I hit a crash triggered by the BUG_ON() from hitting COW fixup path. This BUG_ON() happens just after a failed btrfs_run_delalloc_range(): BTRFS error (device dm-2): failed to run delalloc range, root 348 ino 405 folio 65536 submit_bitmap 6-15 start 90112 len 106496: -28 ------------[ cut here ]------------ kernel BUG at fs/btrfs/extent_io.c:1444! Internal error: Oops - BUG: 00000000f2000800 [#1] SMP CPU: 0 UID: 0 PID: 434621 Comm: kworker/u24:8 Tainted: G OE 6.12.0-rc7-custom+ #86 Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022 Workqueue: events_unbound btrfs_async_reclaim_data_space [btrfs] pc : extent_writepage_io+0x2d4/0x308 [btrfs] lr : extent_writepage_io+0x2d4/0x308 [btrfs] Call trace: extent_writepage_io+0x2d4/0x308 [btrfs] extent_writepage+0x218/0x330 [btrfs] extent_write_cache_pages+0x1d4/0x4b0 [btrfs] btrfs_writepages+0x94/0x150 [btrfs] do_writepages+0x74/0x190 filemap_fdatawrite_wbc+0x88/0xc8 start_delalloc_inodes+0x180/0x3b0 [btrfs] btrfs_start_delalloc_roots+0x174/0x280 [btrfs] shrink_delalloc+0x114/0x280 [btrfs] flush_space+0x250/0x2f8 [btrfs] btrfs_async_reclaim_data_space+0x180/0x228 [btrfs] process_one_work+0x164/0x408 worker_thread+0x25c/0x388 kthread+0x100/0x118 ret_from_fork+0x10/0x20 Code: aa1403e1 9402f3ef aa1403e0 9402f36f (d4210000) ---[ end trace 0000000000000000 ]--- [CAUSE] That failure is mostly from cow_file_range(), where we can hit -ENOSPC. Although the -ENOSPC is already a bug related to our space reservation code, let's just focus on the error handling. For example, we have the following dirty range [0, 64K) of an inode, with 4K sector size and 4K page size: 0 16K 32K 48K 64K |///////////////////////////////////////| |#######################################| Where |///| means page are still dirty, and |###| means the extent io tree has EXTENT_DELALLOC flag. - Enter extent_writepage() for page 0 - Enter btrfs_run_delalloc_range() for range [0, 64K) - Enter cow_file_range() for range [0, 64K) - Function btrfs_reserve_extent() only reserved one 16K extent So we created extent map and ordered extent for range [0, 16K) 0 16K 32K 48K 64K |////////|//////////////////////////////| |<- OE ->|##############################| And range [0, 16K) has its delalloc flag cleared. But since we haven't yet submit any bio, involved 4 pages are still dirty. - Function btrfs_reserve_extent() returns with -ENOSPC Now we have to run error cleanup, which will clear all EXTENT_DELALLOC* flags and clear the dirty flags for the remaining ranges: 0 16K 32K 48K 64K |////////| | | | | Note that range [0, 16K) still has its pages dirty. - Some time later, writeback is triggered again for the range [0, 16K) since the page range still has dirty flags. - btrfs_run_delalloc_range() will do nothing because there is no EXTENT_DELALLOC flag. - extent_writepage_io() finds page 0 has no ordered flag Which falls into the COW fixup path, triggering the BUG_ON(). Unfortunately this error handling bug dates back to the introduction of btrfs. Thankfully with the abuse of COW fixup, at least it won't crash the kernel. [FIX] Instead of immediately unlocking the extent and folios, we keep the extent and folios locked until either erroring out or the whole delalloc range finished. When the whole delalloc range finished without error, we just unlock the whole range with PAGE_SET_ORDERED (and PAGE_UNLOCK for !keep_locked cases), with EXTENT_DELALLOC and EXTENT_LOCKED cleared. And the involved folios will be properly submitted, with their dirty flags cleared during submission. For the error path, it will be a little more complex: - The range with ordered extent allocated (range (1)) We only clear the EXTENT_DELALLOC and EXTENT_LOCKED, as the remaining flags are cleaned up by btrfs_mark_ordered_io_finished()->btrfs_finish_one_ordered(). For folios we finish the IO (clear dirty, start writeback and immediately finish the writeback) and unlock the folios. - The range with reserved extent but no ordered extent (range(2)) - The range we never touched (range(3)) For both range (2) and range(3) the behavior is not changed. Now even if cow_file_range() failed halfway with some successfully reserved extents/ordered extents, we will keep all folios clean, so there will be no future writeback triggered on them. CC: stable@vger.kernel.org Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> |
||
|
|
a7858d5c36 |
btrfs: fix error handling of submit_uncompressed_range()
[BUG]
If we failed to compress the range, or cannot reserve a large enough
data extent (e.g. too fragmented free space), we will fall back to
submit_uncompressed_range().
But inside submit_uncompressed_range(), run_delalloc_cow() can also fail
due to -ENOSPC or any other error.
In that case there are 3 bugs in the error handling:
1) Double freeing for the same ordered extent
This can lead to crash due to ordered extent double accounting
2) Start/end writeback without updating the subpage writeback bitmap
3) Unlock the folio without clear the subpage lock bitmap
Both bugs 2) and 3) will crash the kernel if the btrfs block size is
smaller than folio size, as the next time the folio gets writeback/lock
updates, subpage will find the bitmap already have the range set,
triggering an ASSERT().
[CAUSE]
Bug 1) happens in the following call chain:
submit_uncompressed_range()
|- run_delalloc_cow()
| |- cow_file_range()
| |- btrfs_reserve_extent()
| Failed with -ENOSPC or whatever error
|
|- btrfs_clean_up_ordered_extents()
| |- btrfs_mark_ordered_io_finished()
| Which cleans all the ordered extents in the async_extent range.
|
|- btrfs_mark_ordered_io_finished()
Which cleans the folio range.
The finished ordered extents may not be immediately removed from the
ordered io tree, as they are removed inside a work queue.
So the second btrfs_mark_ordered_io_finished() may find the finished but
not-yet-removed ordered extents, and double free them.
Furthermore, the second btrfs_mark_ordered_io_finished() is not subpage
compatible, as it uses fixed folio_pos() with PAGE_SIZE, which can cover
other ordered extents.
Bugs 2) and 3) are more straightforward, btrfs just calls folio_unlock(),
folio_start_writeback() and folio_end_writeback(), other than the helpers
which handle subpage cases.
[FIX]
For bug 1) since the first btrfs_cleanup_ordered_extents() call is
handling the whole range, we should not do the second
btrfs_mark_ordered_io_finished() call.
And for the first btrfs_cleanup_ordered_extents(), we no longer need to
pass the @locked_page parameter, as we are already in the async extent
context, thus will never rely on the error handling inside
btrfs_run_delalloc_range().
So just let the btrfs_clean_up_ordered_extents() handle every folio
equally.
For bug 2) we should not even call
folio_start_writeback()/folio_end_writeback() anymore.
As the error handling protocol, cow_file_range() should clear
dirty flag and start/finish the writeback for the whole range passed in.
For bug 3) just change the folio_unlock() to btrfs_folio_end_lock()
helper.
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
||
|
|
8bf334beb3 |
btrfs: fix double accounting race when extent_writepage_io() failed
[BUG]
If submit_one_sector() failed inside extent_writepage_io() for sector
size < page size cases (e.g. 4K sector size and 64K page size), then
we can hit double ordered extent accounting error.
This should be very rare, as submit_one_sector() only fails when we
failed to grab the extent map, and such extent map should exist inside
the memory and has been pinned.
[CAUSE]
For example we have the following folio layout:
0 4K 32K 48K 60K 64K
|//| |//////| |///|
Where |///| is the dirty range we need to writeback. The 3 different
dirty ranges are submitted for regular COW.
Now we hit the following sequence:
- submit_one_sector() returned 0 for [0, 4K)
- submit_one_sector() returned 0 for [32K, 48K)
- submit_one_sector() returned error for [60K, 64K)
- btrfs_mark_ordered_io_finished() called for the whole folio
This will mark the following ranges as finished:
* [0, 4K)
* [32K, 48K)
Both ranges have their IO already submitted, this cleanup will
lead to double accounting.
* [60K, 64K)
That's the correct cleanup.
The only good news is, this error is only theoretical, as the target
extent map is always pinned, thus we should directly grab it from
memory, other than reading it from the disk.
[FIX]
Instead of calling btrfs_mark_ordered_io_finished() for the whole folio
range, which can touch ranges we should not touch, instead
move the error handling inside extent_writepage_io().
So that we can cleanup exact sectors that ought to be submitted but failed.
This provides much more accurate cleanup, avoiding the double accounting.
CC: stable@vger.kernel.org # 5.15+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
||
|
|
72dad8e377 |
btrfs: fix double accounting race when btrfs_run_delalloc_range() failed
[BUG] When running btrfs with block size (4K) smaller than page size (64K, aarch64), there is a very high chance to crash the kernel at generic/750, with the following messages: (before the call traces, there are 3 extra debug messages added) BTRFS warning (device dm-3): read-write for sector size 4096 with page size 65536 is experimental BTRFS info (device dm-3): checking UUID tree hrtimer: interrupt took |
||
|
|
311473984c |
btrfs: async-thread: rename DFT_THRESHOLD to DEFAULT_THRESHOLD
Rename the macro so it's obvious what it means. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com> |
||
|
|
ef8c0047aa |
btrfs: remove redundant variables from __process_folios_contig() and lock_delalloc_folios()
Same pattern in both functions, we really only use index, start_index is redundant. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com> |
||
|
|
248c4ff393 |
btrfs: split waiting from read_extent_buffer_pages(), drop parameter wait
There are only 2 WAIT_* values left for wait parameter, we can encode this to the function name if the waiting functionality is split. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com> |
||
|
|
db9eef2ea8 |
btrfs: remove unused define WAIT_PAGE_LOCK for extent io
Last use was in the readahead code that got removed by
|
||
|
|
f8e0b8f9c2 |
btrfs: unwrap folio locking helpers
Another conversion to folio API, use the folio locking directly instead of back and forth page <-> folio conversions. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com> |
||
|
|
549a88acbe |
btrfs: change return type to bool type of check_eb_alignment()
The check function pattern is supposed to return true/false, currently there's only one error code. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com> |
||
|
|
a43caf82a1 |
btrfs: switch grab_extent_buffer() to folios
Use the folio API, remove page_folio/folio_page conversions. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com> |
||
|
|
cc8f51a355 |
btrfs: rename btrfs_release_extent_buffer_pages() to mention folios
Continue page to folio updates, sync what the function does with it's name. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com> |
||
|
|
a722c72bef |
btrfs: open code __free_extent_buffer()
Using the kmem cache freeing directly is clear enough, we don't need to wrap it. All the users are in the same file. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com> |