Now, if dmesg is cleared, we have no way of knowing if the file system has
been shutdown. Moreover, ext4 allows directory reads even after the file
system has been shutdown, so when reading a file returns -EIO, we cannot
determine whether this is a hardware issue or if the file system has been
shutdown.
Therefore, when ext4 file system is shutdown, we're adding a 'shutdown'
hint to commands like mount so users can easily check the file system's
status.
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20250122114130.229709-8-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
After commit d3476f3dad ("ext4: don't set SB_RDONLY after filesystem
errors") in v6.12-rc1, the 'errors=remount-ro' mode no longer sets
SB_RDONLY on errors, which results in us seeing the filesystem is still
in rw state after errors.
Therefore, after setting EXT4_FLAGS_EMERGENCY_RO, display the emergency_ro
option so that users can query whether the current file system has become
emergency read-only due to errors through commands such as 'mount' or
'cat /proc/fs/ext4/sdx/options'.
Fixes: d3476f3dad ("ext4: don't set SB_RDONLY after filesystem errors")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20250122114130.229709-7-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Since both SHUTDOWN and EMERGENCY_RO are emergency states of the ext4 file
system, and they are checked in similar locations, we have added a helper
function, ext4_emergency_state(), to determine whether the current file
system is in one of these two emergency states.
Then, replace calls to ext4_forced_shutdown() with ext4_emergency_state()
in those functions that could potentially trigger write operations.
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20250122114130.229709-4-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
EXT4_FLAGS_EMERGENCY_RO Indicates that the current file system has become
read-only due to some error. Compared to SB_RDONLY, setting it does not
require a lock because we won't clear it, which avoids over-coupling with
vfs freeze. Also, add a helper function ext4_emergency_ro() to check if
the bit is set.
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20250122114130.229709-3-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
We now print error messages in ext4_end_bio() when page writeback
encounters an error. If data_err=abort is set, the journal will also
be aborted in a kworker. This means that we now check all Buffer I/O
in all modes and decide whether to abort the journal based on the
data_err option. Therefore, we remove the ordered mode restriction
in the descriptions of data_err=abort and data_err=ignore.
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20250122110533.4116662-8-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The data_err=abort was initially introduced to address users' worries
about data corruption spreading unnoticed. With direct writes, we can
rely on return values to confirm successful writes to disk. But with
buffered writes, a successful return only means the data has been written
to memory. Users have no way of knowing if the data has actually written
it to disk unless they use fsync (which impacts performance and can
sometimes miss errors).
The current data_err=abort implementation relies on the ordered data list,
but past changes have inadvertently altered its behavior. For example, if
an extent is unwritten, we do not add the inode to the ordered data list.
Therefore, jbd2 will not wait for the data write-back of that inode to
complete and check for errors in the inode mapping. Moreover, the checks
performed by jbd2 can also miss errors.
Now, all buffered writes eventually call ext4_end_bio(), where I/O errors
are checked. Therefore, we can check for the data_err=abort mode at this
point and abort the journal in a kworker (due to the interrupt context).
Therefore, when data_err=abort is enabled, the journal is aborted in
ext4_end_io_end() when an I/O error is detected in ext4_end_bio() to make
users who are concerned about the contents of the file happy.
Suggested-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/c7ab26f3-85ad-4b31-b132-0afb0e07bf79@huawei.com
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20250122110533.4116662-6-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When dioread_nolock is turned on (the default), it will convert unwritten
extents to written at ext4_end_io_end(), even if the data writeback fails.
It leads to the possibility that stale data may be exposed when the
physical block corresponding to the file data is read-only (i.e., writes
return -EIO, but reads are normal).
Therefore a new ext4_io_end->flags EXT4_IO_END_FAILED is added, which
indicates that some bio write-back failed in the current ext4_io_end.
When this flag is set, the unwritten to written conversion is no longer
performed. Users can read the data normally until the caches are dropped,
after that, the failed extents can only be read to all 0.
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20250122110533.4116662-3-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
This patch addresses an issue where some files in case-insensitive
directories become inaccessible due to changes in how the kernel
function, utf8_casefold(), generates case-folded strings from the
commit 5c26d2f1d3 ("unicode: Don't special case ignorable code
points").
There are good reasons why this change should be made; it's actually
quite stupid that Unicode seems to think that the characters ❤ and ❤️
should be casefolded. Unfortimately because of the backwards
compatibility issue, this commit was reverted in 231825b2e1.
This problem is addressed by instituting a brute-force linear fallback
if a lookup fails on case-folded directory, which does result in a
performance hit when looking up files affected by the changing how
thekernel treats ignorable Uniode characters, or when attempting to
look up non-existent file names. So this fallback can be disabled by
setting an encoding flag if in the future, the system administrator or
the manufacturer of a mobile handset or tablet can be sure that there
was no opportunity for a kernel to insert file names with incompatible
encodings.
Fixes: 5c26d2f1d3 ("unicode: Don't special case ignorable code points")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Currently, all zeroing ranges, punch holes, collapse ranges, and insert
ranges first wait for all existing direct I/O workers to complete, and
then they acquire the mapping's invalidate lock before performing the
actual work. These common components are nearly identical, so we can
simplify the code by factoring them out into the ext4_fallocate().
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/20241220011637.1157197-11-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Simplify ext4_insert_range() and align its code style with that of
ext4_collapse_range(). Refactor it by: a) renaming variables, b)
removing redundant input parameter checks and moving the remaining
checks under i_rwsem in preparation for future refactoring, and c)
renaming the three stale error tags.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/20241220011637.1157197-8-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Simplify ext4_collapse_range() and align its code style with that of
ext4_zero_range() and ext4_punch_hole(). Refactor it by: a) renaming
variables, b) removing redundant input parameter checks and moving
the remaining checks under i_rwsem in preparation for future
refactoring, and c) renaming the three stale error tags.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/20241220011637.1157197-7-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The current implementation of ext4_zero_range() contains complex
position calculations and stale error tags. To improve the code's
clarity and maintainability, it is essential to clean up the code and
improve its readability, this can be achieved by: a) simplifying and
renaming variables, making the style the same as ext4_punch_hole(); b)
eliminating unnecessary position calculations, writing back all data in
data=journal mode, and drop page cache from the original offset to the
end, rather than using aligned blocks; c) renaming the stale out_mutex
tags.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/20241220011637.1157197-6-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The current implementation of ext4_punch_hole() contains complex
position calculations and stale error tags. To improve the code's
clarity and maintainability, it is essential to clean up the code and
improve its readability, this can be achieved by: a) simplifying and
renaming variables; b) eliminating unnecessary position calculations;
c) writing back all data in data=journal mode, and drop page cache from
the original offset to the end, rather than using aligned blocks,
d) renaming the stale error tags.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/20241220011637.1157197-5-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
There is no need to write back all data before punching a hole in
non-journaled mode since it will be dropped soon after removing space.
Therefore, the call to filemap_write_and_wait_range() can be eliminated.
Besides, similar to ext4_zero_range(), we must address the case of
partially punched folios when block size < page size. It is essential to
remove writable userspace mappings to ensure that the folio can be
faulted again during subsequent mmap write access.
In journaled mode, we need to write dirty pages out before discarding
page cache in case of crash before committing the freeing data
transaction, which could expose old, stale data, even if synchronization
has been performed.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/20241220011637.1157197-4-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
After commit 'ad5cd4f4ee4d ("ext4: fix fallocate to use file_modified to
update permissions consistently"), we can update mtime and ctime
appropriately through file_modified() when doing zero range, collapse
rage, insert range and punch hole, hence there is no need to explicit
update times in those paths, just drop them.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/20241220011637.1157197-3-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When zeroing a range of folios on the filesystem which block size is
less than the page size, the file's mapped blocks within one page will
be marked as unwritten, we should remove writable userspace mappings to
ensure that ext4_page_mkwrite() can be called during subsequent write
access to these partial folios. Otherwise, data written by subsequent
mmap writes may not be saved to disk.
$mkfs.ext4 -b 1024 /dev/vdb
$mount /dev/vdb /mnt
$xfs_io -t -f -c "pwrite -S 0x58 0 4096" -c "mmap -rw 0 4096" \
-c "mwrite -S 0x5a 2048 2048" -c "fzero 2048 2048" \
-c "mwrite -S 0x59 2048 2048" -c "close" /mnt/foo
$od -Ax -t x1z /mnt/foo
000000 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
*
000800 59 59 59 59 59 59 59 59 59 59 59 59 59 59 59 59
*
001000
$umount /mnt && mount /dev/vdb /mnt
$od -Ax -t x1z /mnt/foo
000000 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
*
000800 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
001000
Fix this by introducing ext4_truncate_page_cache_block_range() to remove
writable userspace mappings when truncating a partial folio range.
Additionally, move the journal data mode-specific handlers and
truncate_pagecache_range() into this function, allowing it to serve as a
common helper that correctly manages the page cache in preparation for
block range manipulations.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/20241220011637.1157197-2-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Pull Kbuild fixes from Masahiro Yamada:
- Suppress false-positive -Wformat-{overflow,truncation}-non-kprintf
warnings regardless of the W= option
- Avoid CONFIG_TRIM_UNUSED_KSYMS dropping symbols passed to symbol_get()
- Fix a build regression of the Debian linux-headers package
* tag 'kbuild-fixes-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: install-extmod-build: add missing quotation marks for CC variable
kbuild: fix misspelling in scripts/Makefile.lib
kbuild: keep symbols for symbol_get() even with CONFIG_TRIM_UNUSED_KSYMS
scripts/Makefile.extrawarn: Do not show clang's non-kprintf warnings at W=1