It missed to call dec_valid_node_count() to release node block count
in error path, fix it.
Fixes: 141170b759 ("f2fs: fix to avoid use f2fs_bug_on() in f2fs_new_node_page()")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
It needs to cover {reserve,release}_compress_blocks() w/ cp_rwsem lock
to avoid racing with checkpoint, otherwise, filesystem metadata including
blkaddr in dnode, inode fields and .total_valid_block_count may be
corrupted after SPO case.
Fixes: ef8d563f18 ("f2fs: introduce F2FS_IOC_RELEASE_COMPRESS_BLOCKS")
Fixes: c75488fb4d ("f2fs: introduce F2FS_IOC_RESERVE_COMPRESS_BLOCKS")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If inc_valid_block_count() can not allocate all requested blocks,
it needs to release block count in .total_valid_block_count and
resevation blocks in inode.
Fixes: 5460749487 ("f2fs: compress: fix to avoid inconsistence bewteen i_blocks and dnode")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Previously, we account reserved blocks and compressed blocks into
@compr_blocks, then, f2fs_i_compr_blocks_update(,compr_blocks) will
update i_compr_blocks incorrectly, fix it.
Meanwhile, for the case all blocks in cluster were reserved, fix to
update dn->ofs_in_node correctly.
Fixes: eb8fbaa533 ("f2fs: compress: fix to check unreleased compressed cluster")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
- It missed to check validation of fault attrs in parse_options(),
let's fix to add check condition in f2fs_build_fault_attr().
- Use f2fs_build_fault_attr() in __sbi_store() to clean up code.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
type of f2fs_inode.i_gc_failures, f2fs_inode_info.i_gc_failures, and
f2fs_sb_info.gc_pin_file_threshold is __le16, unsigned int, and u64,
so it will cause truncation during comparison and persistence.
Unifying variable of these three variables to unsigned short, and
add an upper boundary limitation for gc_pin_file_threshold.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
After commit 3db1de0e58 ("f2fs: change the current atomic write way"),
we removed all GC_FAILURE_ATOMIC usage, let's change i_gc_failures[]
array to i_pin_failure for cleanup.
Meanwhile, let's define i_current_depth and i_gc_failures as union
variable due to they won't be valid at the same time.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Commit b1c9d3f833 ("f2fs: support printk_ratelimited() in f2fs_printk()")
missed some cases, cover all remains for cleanup.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
As for zoned-UFS, f2fs section size is forced to zone size. And zone
size may not aligned to pow2.
Fixes: 859fca6b70 ("f2fs: swap: support migrating swapfile in aligned write mode")
Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com>
Signed-off-by: Wu Bo <bo.wu@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If active_log is not 6, we never use WARM_DATA segment, let's
avoid allocating WARM_DATA segment for direct IO.
Signed-off-by: Yunlei He <heyunlei@oppo.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
To make code clean, use blk_zone_cond_str() to print debug information.
Signed-off-by: Wu Bo <bo.wu@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Commit d7e9a9037d ("f2fs: Support Block Size == Page Size") missed to
adjust comment in sanity_check_raw_super(), fix it.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Convert f2fs__page tracepoint class() and its instances to use folio
and related functionality, and rename it to f2fs__folio().
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Convert f2fs_read_inline_data() to use folio and related
functionality, and also convert its caller to use folio.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Convert f2fs_read_single_page() to use folio and related
functionality.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Convert f2fs_mpage_readpages() to use folio and related
functionality.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This reverts commit 930e260763 ("f2fs: remove obsolete whint_mode"), as we
decide to pass write hints to the disk.
Cc: Hyunchul Lee <cheol.lee@lge.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Since the allocation happens in conventional LU for zoned storage, we
can allow direct io for that.
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In a case writing without fallocate(), we can't guarantee it's allocated
in the conventional area for zoned stroage. To make it consistent across
storage devices, we disallow it regardless of storage device types.
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
some user behaviors requested filesystem operations, which
will cause filesystem not idle.
Meanwhile adjust some f2fs_update_time(REQ_TIME) positions.
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
While do not allocating a new section in advance for file pinning area, I
missed that we should write the sum block for the last segment of a file
pinning section.
Fixes: 9703d69d9d ("f2fs: support file pinning for zoned devices")
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
ioctl(F2FS_IOC_MOVE_RANGE) can truncate or punch hole on pinned file,
fix to disallow it.
Fixes: 5fed0be858 ("f2fs: do not allow partial truncation on pinned file")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
compress and pinfile flag should be checked after inode lock held to
avoid race condition, fix it.
Fixes: 4c8ff7095b ("f2fs: support data compression")
Fixes: 5fed0be858 ("f2fs: do not allow partial truncation on pinned file")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If the max open zones of zoned devices are less than
the active logs of F2FS, the device may error due to
insufficient zone resources when multiple active logs
are being written at the same time.
Signed-off-by: Wenjie Qi <qwjhust@gmail.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
A length that exceeds the real size of the inode may be
specified from user, although these out-of-range areas
are not mapped, but they still need to be check in
while loop, which is unnecessary.
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In f2fs_remount, SB_INLINECRYPT flag will be clear and re-set.
If create new file or open file during this gap, these files
will not use inlinecrypt. Worse case, it may lead to data
corruption if wrappedkey_v0 is enable.
Thread A: Thread B:
-f2fs_remount -f2fs_file_open or f2fs_new_inode
-default_options
<- clear SB_INLINECRYPT flag
-fscrypt_select_encryption_impl
-parse_options
<- set SB_INLINECRYPT again
Signed-off-by: Yunlei He <heyunlei@oppo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
As reported by Yi Zhang in mailing list [1], kernel warning was catched
during zbd/010 test as below:
./check zbd/010
zbd/010 (test gap zone support with F2FS) [failed]
runtime ... 3.752s
something found in dmesg:
[ 4378.146781] run blktests zbd/010 at 2024-02-18 11:31:13
[ 4378.192349] null_blk: module loaded
[ 4378.209860] null_blk: disk nullb0 created
[ 4378.413285] scsi_debug:sdebug_driver_probe: scsi_debug: trim
poll_queues to 0. poll_q/nr_hw = (0/1)
[ 4378.422334] scsi host15: scsi_debug: version 0191 [20210520]
dev_size_mb=1024, opts=0x0, submit_queues=1, statistics=0
[ 4378.434922] scsi 15:0:0:0: Direct-Access-ZBC Linux
scsi_debug 0191 PQ: 0 ANSI: 7
[ 4378.443343] scsi 15:0:0:0: Power-on or device reset occurred
[ 4378.449371] sd 15:0:0:0: Attached scsi generic sg5 type 20
[ 4378.449418] sd 15:0:0:0: [sdf] Host-managed zoned block device
...
(See '/mnt/tests/gitlab.com/api/v4/projects/19168116/repository/archive.zip/storage/blktests/blk/blktests/results/nodev/zbd/010.dmesg'
WARNING: CPU: 22 PID: 44011 at fs/iomap/iter.c:51
CPU: 22 PID: 44011 Comm: fio Not tainted 6.8.0-rc3+ #1
RIP: 0010:iomap_iter+0x32b/0x350
Call Trace:
<TASK>
__iomap_dio_rw+0x1df/0x830
f2fs_file_read_iter+0x156/0x3d0 [f2fs]
aio_read+0x138/0x210
io_submit_one+0x188/0x8c0
__x64_sys_io_submit+0x8c/0x1a0
do_syscall_64+0x86/0x170
entry_SYSCALL_64_after_hwframe+0x6e/0x76
Shinichiro Kawasaki helps to analyse this issue and proposes a potential
fixing patch in [2].
Quoted from reply of Shinichiro Kawasaki:
"I confirmed that the trigger commit is dbf8e63f48 as Yi reported. I took a
look in the commit, but it looks fine to me. So I thought the cause is not
in the commit diff.
I found the WARN is printed when the f2fs is set up with multiple devices,
and read requests are mapped to the very first block of the second device in the
direct read path. In this case, f2fs_map_blocks() and f2fs_map_blocks_cached()
modify map->m_pblk as the physical block address from each block device. It
becomes zero when it is mapped to the first block of the device. However,
f2fs_iomap_begin() assumes that map->m_pblk is the physical block address of the
whole f2fs, across the all block devices. It compares map->m_pblk against
NULL_ADDR == 0, then go into the unexpected branch and sets the invalid
iomap->length. The WARN catches the invalid iomap->length.
This WARN is printed even for non-zoned block devices, by following steps.
- Create two (non-zoned) null_blk devices memory backed with 128MB size each:
nullb0 and nullb1.
# mkfs.f2fs /dev/nullb0 -c /dev/nullb1
# mount -t f2fs /dev/nullb0 "${mount_dir}"
# dd if=/dev/zero of="${mount_dir}/test.dat" bs=1M count=192
# dd if="${mount_dir}/test.dat" of=/dev/null bs=1M count=192 iflag=direct
..."
So, the root cause of this issue is: when multi-devices feature is on,
f2fs_map_blocks() may return zero blkaddr in non-primary device, which is
a verified valid block address, however, f2fs_iomap_begin() treats it as
an invalid block address, and then it triggers the warning in iomap
framework code.
Finally, as discussed, we decide to use a more simple and direct way that
checking (map.m_flags & F2FS_MAP_MAPPED) condition instead of
(map.m_pblk != NULL_ADDR) to fix this issue.
Thanks a lot for the effort of Yi Zhang and Shinichiro Kawasaki on this
issue.
[1] https://lore.kernel.org/linux-f2fs-devel/CAHj4cs-kfojYC9i0G73PRkYzcxCTex=-vugRFeP40g_URGvnfQ@mail.gmail.com/
[2] https://lore.kernel.org/linux-f2fs-devel/gngdj77k4picagsfdtiaa7gpgnup6fsgwzsltx6milmhegmjff@iax2n4wvrqye/
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Closes: https://lore.kernel.org/linux-f2fs-devel/CAHj4cs-kfojYC9i0G73PRkYzcxCTex=-vugRFeP40g_URGvnfQ@mail.gmail.com/
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Fixes: 1517c1a7a4 ("f2fs: implement iomap operations")
Fixes: 8d3c1fa3fa ("f2fs: don't rely on F2FS_MAP_* in f2fs_iomap_begin")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
As Roman Smirnov reported as below:
"
There is a possible bug in f2fs_truncate_inode_blocks():
if (err < 0 && err != -ENOENT)
goto fail;
...
offset[1] = 0;
offset[0]++;
nofs += err;
If err = -ENOENT then nofs will sum with an error code,
which is strange behaviour. Also if nofs < ENOENT this will
cause an overflow. err will be equal to -ENOENT with the
following call stack:
truncate_nodes()
f2fs_get_node_page()
__get_node_page()
read_node_page()
"
If nat is corrupted, truncate_nodes() may return -ENOENT, and
f2fs_truncate_inode_blocks() doesn't handle such error correctly,
fix it.
Reported-by: Roman Smirnov <r.smirnov@omp.ru>
Closes: https://lore.kernel.org/linux-f2fs-devel/085b27fd2b364a3c8c3a9ca77363e246@omp.ru
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If f2fs_evict_inode is called between freeze_super and thaw_super, the
s_writer rwsem count may become negative, resulting in hang.
CPU1 CPU2
f2fs_resize_fs() f2fs_evict_inode()
f2fs_freeze
set SBI_IS_FREEZING
skip sb_start_intwrite
f2fs_unfreeze
clear SBI_IS_FREEZING
sb_end_intwrite
To solve this problem, the call to sb_end_write is determined by whether
sb_start_intwrite is called, rather than the current freezing status.
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Reviewed-by: Sunmin Jeong <s_min.jeong@samsung.com>
Signed-off-by: Yeongjin Gil <youngjin.gil@samsung.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Support .shutdown callback in f2fs_sops, then, it can be called to
shut down the file system when underlying block device is marked dead.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Pull gfs2 fix from Andreas Gruenbacher:
- Fix boundary check in punch_hole
* tag 'gfs2-v6.8-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Fix invalid metadata access in punch_hole
Pull crypto fixes from Herbert Xu:
"This fixes a regression that broke iwd as well as a divide by zero in
iaa"
* tag 'v6.9-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: iaa - Fix nr_cpus < nr_iaa case
Revert "crypto: pkcs7 - remove sha1 support"
Pull EFI fixes from Ard Biesheuvel:
- Fix logic that is supposed to prevent placement of the kernel image
below LOAD_PHYSICAL_ADDR
- Use the firmware stack in the EFI stub when running in mixed mode
- Clear BSS only once when using mixed mode
- Check efi.get_variable() function pointer for NULL before trying to
call it
* tag 'efi-fixes-for-v6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efi: fix panic in kdump kernel
x86/efistub: Don't clear BSS twice in mixed mode
x86/efistub: Call mixed mode boot services on the firmware's stack
efi/libstub: fix efi_random_alloc() to allocate memory at alloc_min or higher address
Pull x86 fixes from Thomas Gleixner:
- Ensure that the encryption mask at boot is properly propagated on
5-level page tables, otherwise the PGD entry is incorrectly set to
non-encrypted, which causes system crashes during boot.
- Undo the deferred 5-level page table setup as it cannot work with
memory encryption enabled.
- Prevent inconsistent XFD state on CPU hotplug, where the MSR is reset
to the default value but the cached variable is not, so subsequent
comparisons might yield the wrong result and as a consequence the
result prevents updating the MSR.
- Register the local APIC address only once in the MPPARSE enumeration
to prevent triggering the related WARN_ONs() in the APIC and topology
code.
- Handle the case where no APIC is found gracefully by registering a
fake APIC in the topology code. That makes all related topology
functions work correctly and does not affect the actual APIC driver
code at all.
- Don't evaluate logical IDs during early boot as the local APIC IDs
are not yet enumerated and the invoked function returns an error
code. Nothing requires the logical IDs before the final CPUID
enumeration takes place, which happens after the enumeration.
- Cure the fallout of the per CPU rework on UP which misplaced the
copying of boot_cpu_data to per CPU data so that the final update to
boot_cpu_data got lost which caused inconsistent state and boot
crashes.
- Use copy_from_kernel_nofault() in the kprobes setup as there is no
guarantee that the address can be safely accessed.
- Reorder struct members in struct saved_context to work around another
kmemleak false positive
- Remove the buggy code which tries to update the E820 kexec table for
setup_data as that is never passed to the kexec kernel.
- Update the resource control documentation to use the proper units.
- Fix a Kconfig warning observed with tinyconfig
* tag 'x86-urgent-2024-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot/64: Move 5-level paging global variable assignments back
x86/boot/64: Apply encryption mask to 5-level pagetable update
x86/cpu: Add model number for another Intel Arrow Lake mobile processor
x86/fpu: Keep xfd_state in sync with MSR_IA32_XFD
Documentation/x86: Document that resctrl bandwidth control units are MiB
x86/mpparse: Register APIC address only once
x86/topology: Handle the !APIC case gracefully
x86/topology: Don't evaluate logical IDs during early boot
x86/cpu: Ensure that CPU info updates are propagated on UP
kprobes/x86: Use copy_from_kernel_nofault() to read from unsafe address
x86/pm: Work around false positive kmemleak report in msr_build_context()
x86/kexec: Do not update E820 kexec table for setup_data
x86/config: Fix warning for 'make ARCH=x86_64 tinyconfig'
Pull scheduler doc clarification from Thomas Gleixner:
"A single update for the documentation of the base_slice_ns tunable to
clarify that any value which is less than the tick slice has no effect
because the scheduler tick is not guaranteed to happen within the set
time slice"
* tag 'sched-urgent-2024-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/doc: Update documentation for base_slice_ns and CONFIG_HZ relation