Pull f2fs updates from Jaegeuk Kim:
"This focuses on two primary updates for Android devices.
First, it sets hash-based file name lookup as the default method to
improve performance, while retaining an option to fall back to a
linear lookup.
Second, it resolves a persistent issue with the 'checkpoint=enable'
feature.
The update further boosts performance by prefetching node blocks,
merging FUA writes more efficiently, and optimizing block allocation
policies.
The release is rounded out by a comprehensive set of bug fixes that
address memory safety, data integrity, and potential system hangs,
along with minor documentation and code clean-ups.
Enhancements:
- add mount option and sysfs entry to tune the lookup mode
- dump more information and add a timeout when enabling/disabling
checkpoints
- readahead node blocks in F2FS_GET_BLOCK_PRECACHE mode
- merge FUA command with the existing writes
- allocate HOT_DATA for IPU writes
- Use allocate_section_policy to control write priority in
multi-devices setups
- add reserved nodes for privileged users
- Add bggc_io_aware to adjust the priority of BG_GC when issuing IO
- show the list of donation files
Bug fixes:
- add missing dput() when printing the donation list
- fix UAF issue in f2fs_merge_page_bio()
- add sanity check on ei.len in __update_extent_tree_range()
- fix infinite loop in __insert_extent_tree()
- fix zero-sized extent for precache extents
- fix to mitigate overhead of f2fs_zero_post_eof_page()
- fix to avoid migrating empty section
- fix to truncate first page in error path of f2fs_truncate()
- fix to update map->m_next_extent correctly in f2fs_map_blocks()
- fix wrong layout information on 16KB page
- fix to do sanity check on node footer for non inode dnode
- fix to avoid NULL pointer dereference in
f2fs_check_quota_consistency()
- fix to detect potential corrupted nid in free_nid_list
- fix to clear unusable_cap for checkpoint=enable
- fix to zero data after EOF for compressed file correctly
- fix to avoid overflow while left shift operation
- fix condition in __allow_reserved_blocks()"
* tag 'f2fs-for-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (43 commits)
f2fs: add missing dput() when printing the donation list
f2fs: fix UAF issue in f2fs_merge_page_bio()
f2fs: readahead node blocks in F2FS_GET_BLOCK_PRECACHE mode
f2fs: add sanity check on ei.len in __update_extent_tree_range()
f2fs: fix infinite loop in __insert_extent_tree()
f2fs: fix zero-sized extent for precache extents
f2fs: fix to mitigate overhead of f2fs_zero_post_eof_page()
f2fs: fix to avoid migrating empty section
f2fs: fix to truncate first page in error path of f2fs_truncate()
f2fs: fix to update map->m_next_extent correctly in f2fs_map_blocks()
f2fs: fix wrong layout information on 16KB page
f2fs: clean up error handing of f2fs_submit_page_read()
f2fs: avoid unnecessary folio_clear_uptodate() for cleanup
f2fs: merge FUA command with the existing writes
f2fs: allocate HOT_DATA for IPU writes
f2fs: Use allocate_section_policy to control write priority in multi-devices setups
Documentation: f2fs: Reword title
Documentation: f2fs: Indent compression_mode option list
Documentation: f2fs: Wrap snippets in literal code blocks
Documentation: f2fs: Span write hint table section rows
...
Introduces two new sys nodes: allocate_section_hint and
allocate_section_policy. The allocate_section_hint identifies the boundary
between devices, measured in sections; it defaults to the end of the device
for single storage setups, and the end of the first device for multiple
storage setups. The allocate_section_policy determines the write strategy,
with a default value of 0 for normal sequential write strategy. A value of
1 prioritizes writes before the allocate_section_hint, while a value of 2
prioritizes writes after it.
This strategy addresses the issue where, despite F2FS supporting multiple
devices, SOC vendors lack multi-devices support (currently only supporting
zoned devices). As a workaround, multiple storage devices are mapped to a
single dm device. Both this workaround and the F2FS multi-devices solution
may require prioritizing writing to certain devices, such as a device with
better performance or when switching is needed due to performance
degradation near a device's end. For scenarios with more than two devices,
sort them at mount time to utilize this feature.
When using this feature with a single storage device, it has almost no
impact. However, for configurations where multiple storage devices are
mapped to the same dm device using F2FS, utilizing this feature can provide
some optimization benefits. Therefore, I believe it should not be limited
to just multi-devices usage.
Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
As syzbot reported below:
------------[ cut here ]------------
kernel BUG at fs/f2fs/file.c:1243!
Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
CPU: 0 UID: 0 PID: 5354 Comm: syz.0.0 Not tainted 6.17.0-rc1-syzkaller-00211-g90d970cade8e #0 PREEMPT(full)
RIP: 0010:f2fs_truncate_hole+0x69e/0x6c0 fs/f2fs/file.c:1243
Call Trace:
<TASK>
f2fs_punch_hole+0x2db/0x330 fs/f2fs/file.c:1306
f2fs_fallocate+0x546/0x990 fs/f2fs/file.c:2018
vfs_fallocate+0x666/0x7e0 fs/open.c:342
ksys_fallocate fs/open.c:366 [inline]
__do_sys_fallocate fs/open.c:371 [inline]
__se_sys_fallocate fs/open.c:369 [inline]
__x64_sys_fallocate+0xc0/0x110 fs/open.c:369
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f1e65f8ebe9
w/ a fuzzed image, f2fs may encounter panic due to it detects inconsistent
truncation range in direct node in f2fs_truncate_hole().
The root cause is: a non-inode dnode may has the same footer.ino and
footer.nid, so the dnode will be parsed as an inode, then ADDRS_PER_PAGE()
may return wrong blkaddr count which may be 923 typically, by chance,
dn.ofs_in_node is equal to 923, then count can be calculated to 0 in below
statement, later it will trigger panic w/ f2fs_bug_on(, count == 0 || ...).
count = min(end_offset - dn.ofs_in_node, pg_end - pg_start);
This patch introduces a new node_type NODE_TYPE_NON_INODE, then allowing
passing the new_type to sanity_check_node_footer in f2fs_get_node_folio()
to detect corruption that a non-inode dnode has the same footer.ino and
footer.nid.
Scripts to reproduce:
mkfs.f2fs -f /dev/vdb
mount /dev/vdb /mnt/f2fs
touch /mnt/f2fs/foo
touch /mnt/f2fs/bar
dd if=/dev/zero of=/mnt/f2fs/foo bs=1M count=8
umount /mnt/f2fs
inject.f2fs --node --mb i_nid --nid 4 --idx 0 --val 5 /dev/vdb
mount /dev/vdb /mnt/f2fs
xfs_io /mnt/f2fs/foo -c "fpunch 6984k 4k"
Cc: stable@kernel.org
Reported-by: syzbot+b9c7ffd609c3f09416ab@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/68a68e27.050a0220.1a3988.0002.GAE@google.com
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Move the fsverity_info pointer into the filesystem-specific part of the
inode by adding the field f2fs_inode_info::i_verity_info and configuring
fsverity_operations::inode_info_offs accordingly.
This is a prerequisite for a later commit that removes
inode::i_verity_info, saving memory and improving cache efficiency on
filesystems that don't support fsverity.
Co-developed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Link: https://lore.kernel.org/20250810075706.172910-11-ebiggers@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
Move the fscrypt_inode_info pointer into the filesystem-specific part of
the inode by adding the field f2fs_inode_info::i_crypt_info and
configuring fscrypt_operations::inode_info_offs accordingly.
This is a prerequisite for a later commit that removes
inode::i_crypt_info, saving memory and improving cache efficiency with
filesystems that don't support fscrypt.
Co-developed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Link: https://lore.kernel.org/20250810075706.172910-5-ebiggers@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
This patch allows privileged users to reserve nodes via the
'reserve_node' mount option, which is similar to the existing
'reserve_root' option.
"-o reserve_node=<N>" means <N> nodes are reserved for privileged
users only.
Signed-off-by: Chunhai Guo <guochunhai@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Currently, we have encountered some issues while testing ZUFS. In
situations near the storage limit (e.g., 50GB remaining), and after
simulating fragmentation by repeatedly writing and deleting data, we found
that application installation and startup tests conducted after idling for
a few minutes take significantly longer several times that of traditional
UFS. Tracing the operations revealed that the majority of I/Os were issued
by background GC, which blocks normal I/O operations.
Under normal circumstances, ZUFS indeed requires more background GC and
employs a more aggressive GC strategy. However, I aim to find a way to
minimize the impact on regular I/O operations under these near-limit
conditions. To address this, I have introduced a bggc_io_aware feature,
which controls the prioritization of background GC in the presence of I/Os.
This switch can be adjusted at the framework level to implement different
strategies. If set to AWARE_ALL_IO, all background GC operations will be
skipped during active I/O issuance. The default option remains consistent
with the current strategy, ensuring no change in behavior.
Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
During f2fs_enable_checkpoint() in remount(), if we flush a large
amount of dirty pages into slow device, it may take long time which
will block write IO, let's add a timeout machanism during dirty
pages flush to avoid long time block in f2fs_enable_checkpoint().
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
For casefolded directories, f2fs may fall back to a linear search if
a hash-based lookup fails. This can cause severe performance
regressions.
While this behavior can be controlled by userspace tools (e.g. mkfs,
fsck) by setting an on-disk flag, a kernel-level solution is needed
to guarantee the lookup behavior regardless of the on-disk state.
This commit introduces the 'lookup_mode' mount option to provide this
kernel-side control.
The option accepts three values:
- perf: (Default) Enforces a hash-only lookup. The linear fallback
is always disabled.
- compat: Enables the linear search fallback for compatibility with
directory entries from older kernels.
- auto: Determines the mode based on the on-disk flag, preserving the
userspace-based behavior.
Signed-off-by: Daniel Lee <chullee@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If reserve_root mount option is not assigned, __allow_reserved_blocks()
will return false, it's not correct, fix it.
Fixes: 7e65be49ed ("f2fs: add reserved blocks for root user")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
checkpoint was blocked for 18643 ms
Step 0: 0 ms
Step 1: 38 ms
Step 2: 63 ms
Step 3: 4 ms
Step 4: 0 ms
Step 5: 0 ms
Step 6: 9 ms
Step 7: 0 ms
Step 8: 18277 ms
Step 9: 249 ms
Cc: Jan Prusakowski <jprusakowski@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
generic/299 w/ mode=lfs will cause long time latency of checkpoint,
let's dump more information once we hit case.
CP merge:
- Queued : 0
- Issued : 1
- Total : 1
- Cur time : 9765(ms)
- Peak time : 9765(ms)
F2FS-fs (vdc): blocked on checkpoint for 9765 ms
CPU: 11 UID: 0 PID: 237 Comm: kworker/u128:29 Tainted: G O 6.16.0-rc3+ #409 PREEMPT(voluntary)
Tainted: [O]=OOT_MODULE
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
Workqueue: writeback wb_workfn (flush-253:32)
Call Trace:
<TASK>
dump_stack_lvl+0x6e/0xa0
f2fs_issue_checkpoint+0x268/0x280
f2fs_write_node_pages+0x6a/0x2c0
do_writepages+0xd0/0x170
__writeback_single_inode+0x56/0x4c0
writeback_sb_inodes+0x22a/0x550
__writeback_inodes_wb+0x4c/0xf0
wb_writeback+0x300/0x400
wb_workfn+0x3de/0x500
process_one_work+0x230/0x5c0
worker_thread+0x1da/0x3d0
kthread+0x10d/0x250
ret_from_fork+0x164/0x190
ret_from_fork_asm+0x1a/0x30
Cc: Jan Prusakowski <jprusakowski@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Pull f2fs updates from Jaegeuk Kim:
"Three main updates: folio conversion by Matthew, switch to a new mount
API by Hongbo and Eric, and several sysfs entries to tune GCs for ZUFS
with finer granularity by Daeho.
There are also patches to address bugs and issues in the existing
features such as GCs, file pinning, write-while-dio-read, contingous
block allocation, and memory access violations.
Enhancements:
- switch to new mount API and folio conversion
- add sysfs nodes to controle F2FS GCs for ZUFS
- improve performance on the nat entry cache
- drop inode from the donation list when the last file is closed
- avoid splitting bio when reading multiple pages
Bug fixes:
- fix to trigger foreground gc during f2fs_map_blocks() in lfs mode
- make sure zoned device GC to use FG_GC in shortage of free section
- fix to calculate dirty data during has_not_enough_free_secs()
- fix to update upper_p in __get_secs_required() correctly
- wait for inflight dio completion, excluding pinned files read using dio
- don't break allocation when crossing contiguous sections
- vm_unmap_ram() may be called from an invalid context
- fix to avoid out-of-boundary access in dnode page
- fix to avoid panic in f2fs_evict_inode
- fix to avoid UAF in f2fs_sync_inode_meta()
- fix to use f2fs_is_valid_blkaddr_raw() in do_write_page()
- fix UAF of f2fs_inode_info in f2fs_free_dic
- fix to avoid invalid wait context issue
- fix bio memleak when committing super block
- handle nat.blkaddr corruption in f2fs_get_node_info()
In addition, there are also clean-ups and minor bug fixes"
* tag 'f2fs-for-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (109 commits)
f2fs: drop inode from the donation list when the last file is closed
f2fs: add gc_boost_gc_greedy sysfs node
f2fs: add gc_boost_gc_multiple sysfs node
f2fs: fix to trigger foreground gc during f2fs_map_blocks() in lfs mode
f2fs: fix to calculate dirty data during has_not_enough_free_secs()
f2fs: fix to update upper_p in __get_secs_required() correctly
f2fs: directly add newly allocated pre-dirty nat entry to dirty set list
f2fs: avoid redundant clean nat entry move in lru list
f2fs: zone: wait for inflight dio completion, excluding pinned files read using dio
f2fs: ignore valid ratio when free section count is low
f2fs: don't break allocation when crossing contiguous sections
f2fs: remove unnecessary tracepoint enabled check
f2fs: merge the two conditions to avoid code duplication
f2fs: vm_unmap_ram() may be called from an invalid context
f2fs: fix to avoid out-of-boundary access in dnode page
f2fs: switch to the new mount api
f2fs: introduce fs_context_operation structure
f2fs: separate the options parsing and options checking
f2fs: Add f2fs_fs_context to record the mount options
f2fs: Allow sbi to be NULL in f2fs_printk
...
Let's drop the inode from the donation list when there is no other
open file.
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
- touch /mnt/f2fs/012345678901234567890123456789012345678901234567890123
- truncate -s $((1024*1024*1024)) \
/mnt/f2fs/012345678901234567890123456789012345678901234567890123
- touch /mnt/f2fs/file
- truncate -s $((1024*1024*1024)) /mnt/f2fs/file
- mkfs.f2fs /mnt/f2fs/012345678901234567890123456789012345678901234567890123 \
-c /mnt/f2fs/file
- mount /mnt/f2fs/012345678901234567890123456789012345678901234567890123 \
/mnt/f2fs/loop
[16937.192225] F2FS-fs (loop0): Mount Device [ 0]: /mnt/f2fs/012345678901234567890123456789012345678901234567890123\xff\x01, 511, 0 - 3ffff
[16937.192268] F2FS-fs (loop0): Failed to find devices
If device path length equals to MAX_PATH_LEN, sbi->devs.path[] may
not end up w/ null character due to path array is fully filled, So
accidently, fields locate after path[] may be treated as part of
device path, result in parsing wrong device path.
struct f2fs_dev_info {
...
char path[MAX_PATH_LEN];
...
};
Let's add one byte space for sbi->devs.path[] to store null
character of device path string.
Fixes: 3c62be17d4 ("f2fs: support multiple devices")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
All callers have been converted to F2FS_F_SB() so delete this wrapper.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Most callers pass NULL, and the one that passes a page already has a
folio. Also convert __submit_merged_write_cond() to take a folio.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
All callers can simply call folio_detach_private(). This was the
only way that clear_page_private_data() could be called, so remove
that too.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
The only caller already has a folio so pass it in.
f2fs_cache_compressed_page() is not used outside compress.c so
make it static. This requires a forward declaration (or would require
rearranging this file, but I've chosen not to do that for readability of
the diff).
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
All callers now have a folio so pass it in. Also remove the test for
the private flag; it is redundant with checking folio->private for being
NULL.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
All callers now have a folio so pass it in. Removes a call to
compound_head().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
The only caller already has a folio so convert this function to be folio
based.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
The only caller has a folio, so pass it in and operate on it.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Convert the passed page to a folio and use it throughout. Removes
a use of fscrypt_is_bounce_page(), which we're trying to remove.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Name these new functions folio_test_f2fs_*(), folio_set_f2fs_*() and
folio_clear_f2fs_*(). Convert all callers which currently have a folio
and cast back to a page.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
All callers now have a folio so pass it in. Also make it const to help
the compiler.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Put fio->page insto a union with fio->folio. This lets us remove a
lot of folio->page and page->folio conversions.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
One caller passes NULL and the other caller already has a folio so
pass it in.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Most callers pass NULL, and the one which passes a page already has a
folio, so we can pass it in.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
All callers have a folio so pass it in. Also make the argument const
as the function does not modify it. Removes a call to compound_head().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
All callers now have a folio, so pass it in. Also make it const as
F2FS_INODE() does not modify the struct folio passed in (the data it
describes is mutable, but it does not change the contents of the struct).
This may improve code generation.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
The decompress_io_ctx may be released asynchronously after
I/O completion. If this file is deleted immediately after read,
and the kworker of processing post_read_wq has not been executed yet
due to high workloads, It is possible that the inode(f2fs_inode_info)
is evicted and freed before it is used f2fs_free_dic.
The UAF case as below:
Thread A Thread B
- f2fs_decompress_end_io
- f2fs_put_dic
- queue_work
add free_dic work to post_read_wq
- do_unlink
- iput
- evict
- call_rcu
This file is deleted after read.
Thread C kworker to process post_read_wq
- rcu_do_batch
- f2fs_free_inode
- kmem_cache_free
inode is freed by rcu
- process_scheduled_works
- f2fs_late_free_dic
- f2fs_free_dic
- f2fs_release_decomp_mem
read (dic->inode)->i_compress_algorithm
This patch store compress_algorithm and sbi in dic to avoid inode UAF.
In addition, the previous solution is deprecated in [1] may cause system hang.
[1] https://lore.kernel.org/all/c36ab955-c8db-4a8b-a9d0-f07b5f426c3f@kernel.org
Cc: Daeho Jeong <daehojeong@google.com>
Fixes: bff139b49d ("f2fs: handle decompress only post processing in softirq")
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Baocong Liu <baocong.liu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch introduces /sys/fs/f2fs/<dev>/reserved_pin_section for tuning
@needed parameter of has_not_enough_free_secs(), if we configure it w/
zero, it can avoid f2fs_gc() as much as possible while fallocating on
pinned file.
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: wangzijie <wangzijie1@honor.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Introduce a new fault type FAULT_VMALLOC to simulate no memory error in
f2fs_vmalloc().
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
.init_{,de}compress_ctx uses kvmalloc() to alloc memory, it will try
to allocate physically continuous page first, it may cause more memory
allocation pressure, let's use vmalloc instead to mitigate it.
[Test]
cd /data/local/tmp
touch file
f2fs_io setflags compression file
f2fs_io getflags file
for i in $(seq 1 10); do sync; echo 3 > /proc/sys/vm/drop_caches;\
time f2fs_io write 512 0 4096 zero osync file; truncate -s 0 file;\
done
[Result]
Before After Delta
21.243 21.694 -2.12%
For compression, we recommend to use ioctl to compress file data in
background for workaround.
For decompression, only zstd will be affected.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Since __f2fs_crc32() now calls crc32() directly, it no longer uses its
sbi argument. Remove that, and simplify its callers accordingly.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Support to inject a timeout fault into function, currently it only
support to inject timeout to commit_atomic_write flow to reproduce
inconsistent bug, like the bug fixed by commit f098aeba04 ("f2fs:
fix to avoid atomicity corruption of atomic file").
By default, the new type fault will inject 1000ms timeout, and the
timeout process can be interrupted by SIGKILL.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In cases of removing memory donation, we need to handle some error cases
like ENOENT and EACCES (indicating the range already has been donated).
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>