linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-04 10:56:06 -04:00

Author	SHA1	Message	Date
Zizhi Wo	243efbdf8e	ext4: update the comment about mb_optimize_scan Commit `196e402adf` ("ext4: improve cr 0 / cr 1 group scanning") introduces the sysfs control interface "mb_max_linear_groups" to address the problem that rotational devices performance degrades when the "mb_optimize_scan" feature is enabled, which may result in distant block group allocation. However, the name of the interface was incorrect in the comment to the ext4/mballoc.c file, and this patch fixes it, without further changes. Signed-off-by: Zizhi Wo <wozizhi@huawei.com> Reviewed-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250224012005.689549-1-wozizhi@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-03-18 00:15:25 -04:00
Zhang Yi	18aba2adb3	jbd2: fix off-by-one while erasing journal In __jbd2_journal_erase(), the block_stop parameter includes the last block of a contiguous region; however, the calculation of byte_stop is incorrect, as it does not account for the bytes in that last block. Consequently, the page cache is not cleared properly, which occasionally causes the ext4/050 test to fail. Since block_stop operates on inclusion semantics, it involves repeated increments and decrements by 1, significantly increasing the complexity of the calculations. Optimize the calculation and fix the incorrect byte_stop by making both block_stop and byte_stop use exclusion semantics. This fixes a failure in fstests ext4/050. Fixes: `01d5d96542` ("ext4: add discard/zeroout flags to journal flush") Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250217065955.3829229-1-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-03-18 00:15:25 -04:00
Matthew Wilcox (Oracle)	08be56fec0	ext4: remove references to bh->b_page Buffer heads are attached to folios, not to pages. Also flush_dcache_page() is now deprecated in favour of flush_dcache_folio(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250213182303.2133205-1-willy@infradead.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-03-18 00:15:25 -04:00
Baokun Li	7e91ae31e2	ext4: goto right label 'out_mmap_sem' in ext4_setattr() Otherwise, if ext4_inode_attach_jinode() fails, a hung task will happen because filemap_invalidate_unlock() isn't called to unlock mapping->invalidate_lock. Like this: EXT4-fs error (device sda) in ext4_setattr:5557: Out of memory INFO: task fsstress:374 blocked for more than 122 seconds. Not tainted 6.14.0-rc1-next-20250206-xfstests-dirty #726 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:fsstress state:D stack:0 pid:374 tgid:374 ppid:373 task_flags:0x440140 flags:0x00000000 Call Trace: <TASK> __schedule+0x2c9/0x7f0 schedule+0x27/0xa0 schedule_preempt_disabled+0x15/0x30 rwsem_down_read_slowpath+0x278/0x4c0 down_read+0x59/0xb0 page_cache_ra_unbounded+0x65/0x1b0 filemap_get_pages+0x124/0x3e0 filemap_read+0x114/0x3d0 vfs_read+0x297/0x360 ksys_read+0x6c/0xe0 do_syscall_64+0x4b/0x110 entry_SYSCALL_64_after_hwframe+0x76/0x7e Fixes: `c7fc0366c6` ("ext4: partial zero eof block on unaligned inode size extension") Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Brian Foster <bfoster@redhat.com> Link: https://patch.msgid.link/20250213112247.3168709-1-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-03-18 00:15:25 -04:00
Ye Bin	5701875f96	ext4: fix out-of-bound read in ext4_xattr_inode_dec_ref_all() There's issue as follows: BUG: KASAN: use-after-free in ext4_xattr_inode_dec_ref_all+0x6ff/0x790 Read of size 4 at addr ffff88807b003000 by task syz-executor.0/15172 CPU: 3 PID: 15172 Comm: syz-executor.0 Call Trace: __dump_stack lib/dump_stack.c:82 [inline] dump_stack+0xbe/0xfd lib/dump_stack.c:123 print_address_description.constprop.0+0x1e/0x280 mm/kasan/report.c:400 __kasan_report.cold+0x6c/0x84 mm/kasan/report.c:560 kasan_report+0x3a/0x50 mm/kasan/report.c:585 ext4_xattr_inode_dec_ref_all+0x6ff/0x790 fs/ext4/xattr.c:1137 ext4_xattr_delete_inode+0x4c7/0xda0 fs/ext4/xattr.c:2896 ext4_evict_inode+0xb3b/0x1670 fs/ext4/inode.c:323 evict+0x39f/0x880 fs/inode.c:622 iput_final fs/inode.c:1746 [inline] iput fs/inode.c:1772 [inline] iput+0x525/0x6c0 fs/inode.c:1758 ext4_orphan_cleanup fs/ext4/super.c:3298 [inline] ext4_fill_super+0x8c57/0xba40 fs/ext4/super.c:5300 mount_bdev+0x355/0x410 fs/super.c:1446 legacy_get_tree+0xfe/0x220 fs/fs_context.c:611 vfs_get_tree+0x8d/0x2f0 fs/super.c:1576 do_new_mount fs/namespace.c:2983 [inline] path_mount+0x119a/0x1ad0 fs/namespace.c:3316 do_mount+0xfc/0x110 fs/namespace.c:3329 __do_sys_mount fs/namespace.c:3540 [inline] __se_sys_mount+0x219/0x2e0 fs/namespace.c:3514 do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x67/0xd1 Memory state around the buggy address: ffff88807b002f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff88807b002f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff88807b003000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ ffff88807b003080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff88807b003100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff The above issue happens because ext4_xattr_delete_inode() does not check whether the xattr is valid when the xattr is stored in the inode. To solve this, call xattr_check_inode() to check whether the in-inode xattr is valid. 
In fact, we can directly verify in ext4_iget_extra_inode(), so that there is no divergent verification. Fixes: `e50e5129f3` ("ext4: xattr-in-inode support") Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250208063141.1539283-3-yebin@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-03-18 00:15:19 -04:00
Ye Bin	69f3a3039b	ext4: introduce ITAIL helper Introduce ITAIL helper to get the bound of xattr in inode. Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250208063141.1539283-2-yebin@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-03-18 00:14:47 -04:00
Eric Biggers	f6fc1584f5	jbd2: remove redundant function jbd2_journal_has_csum_v2or3_feature Since commit `dd348f054b` ("jbd2: switch to using the crc32c library"), jbd2_journal_has_csum_v2or3() and jbd2_journal_has_csum_v2or3_feature() are the same. Remove jbd2_journal_has_csum_v2or3_feature() and just keep jbd2_journal_has_csum_v2or3(). Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://patch.msgid.link/20250207031424.42755-1-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-03-17 11:19:41 -04:00
Eric Biggers	e224fa3b8a	ext4: remove redundant function ext4_has_metadata_csum Since commit `f2b4fa1964` ("ext4: switch to using the crc32c library"), ext4_has_metadata_csum() is just an alias for ext4_has_feature_metadata_csum(). ext4_has_feature_metadata_csum() is generated by EXT4_FEATURE_RO_COMPAT_FUNCS and uses the regular naming convention for checking a single ext4 feature. Therefore, remove ext4_has_metadata_csum() and update all its callers to use ext4_has_feature_metadata_csum() directly. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://patch.msgid.link/20250207031335.42637-1-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-03-17 11:19:41 -04:00
Jan Kara	a662f3c03b	jbd2: do not try to recover wiped journal If a journal is wiped, we will set journal->j_tail to 0. However, if the 'write' argument is not set (as happens for a read-only device or for ocfs2), the on-disk superblock is not updated accordingly and thus jbd2_journal_recover() can try to recover the wiped journal. Fix the check in jbd2_journal_recover() to use journal->j_tail for checking empty journal instead. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250206094657.20865-4-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-03-17 11:19:41 -04:00
Jan Kara	e6eff39dd0	jbd2: remove wrong sb->s_sequence check Journal emptiness is not determined by sb->s_sequence == 0 but rather by sb->s_start == 0 (which is set a few lines above). Furthermore 0 is a valid transaction ID so the check can spuriously trigger. Remove the invalid WARN_ON. CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250206094657.20865-3-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-03-17 11:19:41 -04:00
Jan Kara	5f920d5d60	ext4: verify fast symlink length Verify fast symlink length stored in inode->i_size matches the string stored in the inode to avoid surprises from corrupted filesystems. Reported-by: syzbot+48a99e426f29859818c0@syzkaller.appspotmail.com Tested-by: syzbot+48a99e426f29859818c0@syzkaller.appspotmail.com Fixes: `bae80473f7` ("ext4: use inode_set_cached_link()") Suggested-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://patch.msgid.link/20250206094454.20522-2-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-03-17 11:19:41 -04:00
Bhupesh	c8e008b604	ext4: ignore xattrs past end Once inside 'ext4_xattr_inode_dec_ref_all' we should ignore xattrs entries past the 'end' entry. This fixes the following KASAN reported issue: ================================================================== BUG: KASAN: slab-use-after-free in ext4_xattr_inode_dec_ref_all+0xb8c/0xe90 Read of size 4 at addr ffff888012c120c4 by task repro/2065 CPU: 1 UID: 0 PID: 2065 Comm: repro Not tainted 6.13.0-rc2+ #11 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x1fd/0x300 ? tcp_gro_dev_warn+0x260/0x260 ? _printk+0xc0/0x100 ? read_lock_is_recursive+0x10/0x10 ? irq_work_queue+0x72/0xf0 ? __virt_addr_valid+0x17b/0x4b0 print_address_description+0x78/0x390 print_report+0x107/0x1f0 ? __virt_addr_valid+0x17b/0x4b0 ? __virt_addr_valid+0x3ff/0x4b0 ? __phys_addr+0xb5/0x160 ? ext4_xattr_inode_dec_ref_all+0xb8c/0xe90 kasan_report+0xcc/0x100 ? ext4_xattr_inode_dec_ref_all+0xb8c/0xe90 ext4_xattr_inode_dec_ref_all+0xb8c/0xe90 ? ext4_xattr_delete_inode+0xd30/0xd30 ? __ext4_journal_ensure_credits+0x5f0/0x5f0 ? __ext4_journal_ensure_credits+0x2b/0x5f0 ? inode_update_timestamps+0x410/0x410 ext4_xattr_delete_inode+0xb64/0xd30 ? ext4_truncate+0xb70/0xdc0 ? ext4_expand_extra_isize_ea+0x1d20/0x1d20 ? __ext4_mark_inode_dirty+0x670/0x670 ? ext4_journal_check_start+0x16f/0x240 ? ext4_inode_is_fast_symlink+0x2f2/0x3a0 ext4_evict_inode+0xc8c/0xff0 ? ext4_inode_is_fast_symlink+0x3a0/0x3a0 ? do_raw_spin_unlock+0x53/0x8a0 ? ext4_inode_is_fast_symlink+0x3a0/0x3a0 evict+0x4ac/0x950 ? proc_nr_inodes+0x310/0x310 ? trace_ext4_drop_inode+0xa2/0x220 ? _raw_spin_unlock+0x1a/0x30 ? iput+0x4cb/0x7e0 do_unlinkat+0x495/0x7c0 ? try_break_deleg+0x120/0x120 ? 0xffffffff81000000 ? __check_object_size+0x15a/0x210 ? strncpy_from_user+0x13e/0x250 ? 
getname_flags+0x1dc/0x530 __x64_sys_unlinkat+0xc8/0xf0 do_syscall_64+0x65/0x110 entry_SYSCALL_64_after_hwframe+0x67/0x6f RIP: 0033:0x434ffd Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 8 RSP: 002b:00007ffc50fa7b28 EFLAGS: 00000246 ORIG_RAX: 0000000000000107 RAX: ffffffffffffffda RBX: 00007ffc50fa7e18 RCX: 0000000000434ffd RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000005 RBP: 00007ffc50fa7be0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001 R13: 00007ffc50fa7e08 R14: 00000000004bbf30 R15: 0000000000000001 </TASK> The buggy address belongs to the object at ffff888012c12000 which belongs to the cache filp of size 360 The buggy address is located 196 bytes inside of freed 360-byte region [ffff888012c12000, ffff888012c12168) The buggy address belongs to the physical page: page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x12c12 head: order:1 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0 flags: 0x40(head\|node=0\|zone=0) page_type: f5(slab) raw: 0000000000000040 ffff888000ad7640 ffffea0000497a00 dead000000000004 raw: 0000000000000000 0000000000100010 00000001f5000000 0000000000000000 head: 0000000000000040 ffff888000ad7640 ffffea0000497a00 dead000000000004 head: 0000000000000000 0000000000100010 00000001f5000000 0000000000000000 head: 0000000000000001 ffffea00004b0481 ffffffffffffffff 0000000000000000 head: 0000000000000002 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888012c11f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff888012c12000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > ffff888012c12080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff888012c12100: fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc ffff888012c12180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc 
================================================================== Reported-by: syzbot+b244bda78289b00204ed@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=b244bda78289b00204ed Suggested-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com> Signed-off-by: Bhupesh <bhupesh@igalia.com> Link: https://patch.msgid.link/20250128082751.124948-2-bhupesh@igalia.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-03-16 22:41:17 -04:00
Kemeng Shi	477aa77cce	ext4: remove unused input "inode" in ext4_find_dest_de Remove unused input "inode" in ext4_find_dest_de. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250123162050.2114499-4-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-03-16 22:41:17 -04:00
Kemeng Shi	e8eac9fc48	ext4: remove unneeded forward declaration in namei.c Remove unneeded forward declaration in namei.c Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250123162050.2114499-3-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-03-16 22:41:17 -04:00
Kemeng Shi	eb640af64d	ext4: add missing brelse() for bh2 in ext4_dx_add_entry() Add missing brelse() for bh2 in ext4_dx_add_entry(). Fixes: `ac27a0ec11` ("[PATCH] ext4: initial copy of files from ext3") Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250123162050.2114499-2-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-03-16 22:41:17 -04:00
Kemeng Shi	fd3b3d7f51	jbd2: Correct stale comment of release_buffer_page Update stale lock info in comment of release_buffer_page. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250123155014.2097920-7-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-03-13 10:41:21 -04:00
Kemeng Shi	da5803391e	jbd2: correct stale function name in comment Rename stale journal_clear_revoked_flag to jbd2_clear_buffer_revoked_flags. Rename stale journal_switch_revoke to jbd2_journal_switch_revoke_table. Rename stale __journal_file_buffer to __jbd2_journal_file_buffer. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250123155014.2097920-6-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-03-13 10:41:21 -04:00
Kemeng Shi	6c14627790	jbd2: remove stale comment of update_t_max_wait Commit `2d44292058` ("jbd2: remove CONFIG_JBD2_DEBUG to update t_max_wait") removed jbd2_journal_enable_debug, just remove stale comment about jbd2_journal_enable_debug. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250123155014.2097920-5-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-03-13 10:41:21 -04:00
Kemeng Shi	0d26708d8e	jbd2: remove unused return value of do_readahead Remove unused return value of do_readahead. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250123155014.2097920-4-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-03-13 10:41:21 -04:00
Kemeng Shi	9e6d3f9c8a	jbd2: remove unused return value of jbd2_journal_cancel_revoke Remove unused return value of jbd2_journal_cancel_revoke. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250123155014.2097920-3-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-03-13 10:41:21 -04:00
Kemeng Shi	ec22493849	jbd2: remove unused h_jdata flag of handle Flag h_jdata is not used, just remove it. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250123155014.2097920-2-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-03-13 10:41:21 -04:00
Baokun Li	5855c35194	ext4: show 'shutdown' hint when ext4 is forced to shutdown Now, if dmesg is cleared, we have no way of knowing if the file system has been shutdown. Moreover, ext4 allows directory reads even after the file system has been shutdown, so when reading a file returns -EIO, we cannot determine whether this is a hardware issue or if the file system has been shutdown. Therefore, when ext4 file system is shutdown, we're adding a 'shutdown' hint to commands like mount so users can easily check the file system's status. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250122114130.229709-8-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-03-13 10:16:35 -04:00
Baokun Li	6b76715d5e	ext4: show 'emergency_ro' when EXT4_FLAGS_EMERGENCY_RO is set After commit `d3476f3dad` ("ext4: don't set SB_RDONLY after filesystem errors") in v6.12-rc1, the 'errors=remount-ro' mode no longer sets SB_RDONLY on errors, which results in us seeing the filesystem is still in rw state after errors. Therefore, after setting EXT4_FLAGS_EMERGENCY_RO, display the emergency_ro option so that users can query whether the current file system has become emergency read-only due to errors through commands such as 'mount' or 'cat /proc/fs/ext4/sdx/options'. Fixes: `d3476f3dad` ("ext4: don't set SB_RDONLY after filesystem errors") Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250122114130.229709-7-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-03-13 10:16:34 -04:00
Baokun Li	8f984530c2	ext4: correct behavior under errors=remount-ro mode And after commit `95257987a6` ("ext4: drop EXT4_MF_FS_ABORTED flag") in v6.6-rc1, the EXT4_FLAGS_SHUTDOWN bit is set in ext4_handle_error() under errors=remount-ro mode. This causes the read to fail even when the error is triggered in errors=remount-ro mode. To correct the behavior under errors=remount-ro, EXT4_FLAGS_SHUTDOWN is replaced by the newly introduced EXT4_FLAGS_EMERGENCY_RO. This new flag only prevents writes, matching the previous behavior with SB_RDONLY. Fixes: `95257987a6` ("ext4: drop EXT4_MF_FS_ABORTED flag") Closes: https://lore.kernel.org/all/22d652f6-cb3c-43f5-b2fe-0a4bb6516a04@huawei.com/ Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250122114130.229709-6-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-03-13 10:16:34 -04:00
Baokun Li	5bc27f4d73	ext4: add more ext4_emergency_state() checks around sb_rdonly() Some functions check sb_rdonly() to make sure the file system isn't modified after it's read-only. Since we also don't want the file system modified if it's in an emergency state (shutdown or emergency_ro), we're adding additional ext4_emergency_state() checks where sb_rdonly() is checked. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250122114130.229709-5-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-03-13 10:16:34 -04:00
Baokun Li	0a1b2f5ea9	ext4: add ext4_emergency_state() helper function Since both SHUTDOWN and EMERGENCY_RO are emergency states of the ext4 file system, and they are checked in similar locations, we have added a helper function, ext4_emergency_state(), to determine whether the current file system is in one of these two emergency states. Then, replace calls to ext4_forced_shutdown() with ext4_emergency_state() in those functions that could potentially trigger write operations. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250122114130.229709-4-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-03-13 10:16:34 -04:00
Baokun Li	f3054e53c2	ext4: add EXT4_FLAGS_EMERGENCY_RO bit EXT4_FLAGS_EMERGENCY_RO Indicates that the current file system has become read-only due to some error. Compared to SB_RDONLY, setting it does not require a lock because we won't clear it, which avoids over-coupling with vfs freeze. Also, add a helper function ext4_emergency_ro() to check if the bit is set. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250122114130.229709-3-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-03-13 10:16:34 -04:00
Baokun Li	99708f8a9d	ext4: convert EXT4_FLAGS_* defines to enum Do away with the defines and use an enum as it's cleaner. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250122114130.229709-2-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-03-13 10:16:34 -04:00
Baokun Li	bd29881aff	ext4: pack holes in ext4_inode_info When CONFIG_DEBUG_SPINLOCK is not enabled (general case), there are four 4-byte holes and one 2-byte hole in struct ext4_inode_info. Move the members to pack the four 4-byte holes. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250122110533.4116662-10-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-03-13 10:08:09 -04:00
Baokun Li	5a1cd0e975	ext4: remove unused member 'i_unwritten' from 'ext4_inode_info' After commit `378f32bab3` ("ext4: introduce direct I/O write using iomap infrastructure"), no one cares about the value of i_unwritten, so there is no need to maintain this variable, remove it, and clean up the associated logic. Suggested-by: Zhang Yi <yi.zhang@huawei.com> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250122110533.4116662-9-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-03-13 10:08:09 -04:00
Baokun Li	62c3da1eac	ext4: update the descriptions of data_err=abort and data_err=ignore We now print error messages in ext4_end_bio() when page writeback encounters an error. If data_err=abort is set, the journal will also be aborted in a kworker. This means that we now check all Buffer I/O in all modes and decide whether to abort the journal based on the data_err option. Therefore, we remove the ordered mode restriction in the descriptions of data_err=abort and data_err=ignore. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250122110533.4116662-8-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-03-13 10:08:09 -04:00
Baokun Li	6e969ef3d7	jbd2: drop JBD2_ABORT_ON_SYNCDATA_ERR Since ext4's data_err=abort mode doesn't depend on JBD2_ABORT_ON_SYNCDATA_ERR anymore, and nobody else uses it, we can drop it and only warn in jbd2 as it used to be long ago. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250122110533.4116662-7-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-03-13 10:08:09 -04:00
Baokun Li	ce51afb8cc	ext4: abort journal on data writeback failure if in data_err=abort mode The data_err=abort was initially introduced to address users' worries about data corruption spreading unnoticed. With direct writes, we can rely on return values to confirm successful writes to disk. But with buffered writes, a successful return only means the data has been written to memory. Users have no way of knowing if the data has actually been written to disk unless they use fsync (which impacts performance and can sometimes miss errors). The current data_err=abort implementation relies on the ordered data list, but past changes have inadvertently altered its behavior. For example, if an extent is unwritten, we do not add the inode to the ordered data list. Therefore, jbd2 will not wait for the data write-back of that inode to complete and check for errors in the inode mapping. Moreover, the checks performed by jbd2 can also miss errors. Now, all buffered writes eventually call ext4_end_bio(), where I/O errors are checked. Therefore, we can check for the data_err=abort mode at this point and abort the journal in a kworker (due to the interrupt context). Therefore, when data_err=abort is enabled, the journal is aborted in ext4_end_io_end() when an I/O error is detected in ext4_end_bio() to make users who are concerned about the contents of the file happy. Suggested-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/c7ab26f3-85ad-4b31-b132-0afb0e07bf79@huawei.com Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250122110533.4116662-6-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-03-13 10:08:09 -04:00
Baokun Li	b1a49bd813	ext4: extract ext4_has_journal_option() from __ext4_fill_super() Extract the ext4_has_journal_option() helper function to reduce code duplication. No functional changes. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250122110533.4116662-5-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-03-13 10:08:09 -04:00
Baokun Li	26343ca0df	ext4: reject the 'data_err=abort' option in nojournal mode data_err=abort aborts the journal on I/O errors. However, this option is meaningless if journal is disabled, so it is rejected in nojournal mode to reduce unnecessary checks. Also, this option is ignored upon remount. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250122110533.4116662-4-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-03-13 10:08:08 -04:00
Baokun Li	e856f93e0f	ext4: do not convert the unwritten extents if data writeback fails When dioread_nolock is turned on (the default), it will convert unwritten extents to written at ext4_end_io_end(), even if the data writeback fails. It leads to the possibility that stale data may be exposed when the physical block corresponding to the file data is read-only (i.e., writes return -EIO, but reads are normal). Therefore a new ext4_io_end->flags EXT4_IO_END_FAILED is added, which indicates that some bio write-back failed in the current ext4_io_end. When this flag is set, the unwritten to written conversion is no longer performed. Users can read the data normally until the caches are dropped, after that, the failed extents can only be read to all 0. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250122110533.4116662-3-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-03-13 10:08:08 -04:00
Baokun Li	2f94b537c4	ext4: replace opencoded ext4_end_io_end() in ext4_put_io_end() This reduces duplicate code and ensures that a “potential data loss” warning is available if the unwritten conversion fails. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250122110533.4116662-2-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-03-13 10:08:08 -04:00
Charles Han	57e7239ce0	ext4: fix potential null dereference in ext4 kunit test kunit_kzalloc() may return a NULL pointer, dereferencing it without NULL check may lead to NULL dereference. Add a NULL check for grp. Fixes: `ac96b56a2f` ("ext4: Add unit test for mb_mark_used") Fixes: `b7098e1fa7` ("ext4: Add unit test for mb_free_blocks") Signed-off-by: Charles Han <hanchunchao@inspur.com> Reviewed-by: Kemeng Shi <shikemeng@huaweicloud.com> Link: https://patch.msgid.link/20250110092421.35619-1-hanchunchao@inspur.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-03-13 10:01:14 -04:00
Julian Sun	30cbe84d48	ext4: Refactor out ext4_try_to_write_inline_data() Refactor ext4_try_to_write_inline_data() to simplify its implementation by directly invoking ext4_generic_write_inline_data(). Signed-off-by: Julian Sun <sunjunchao2870@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250107045730.1837808-1-sunjunchao2870@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-03-13 09:57:19 -04:00
Julian Sun	f9bdb042df	ext4: Replace ext4_da_write_inline_data_begin() with ext4_generic_write_inline_data(). Replace the call to ext4_da_write_inline_data_begin() with ext4_generic_write_inline_data(), and delete the ext4_da_write_inline_data_begin(). Signed-off-by: Julian Sun <sunjunchao2870@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250107045710.1837756-1-sunjunchao2870@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-03-13 09:57:19 -04:00
Julian Sun	3db572f780	ext4: Introduce a new helper function ext4_generic_write_inline_data() A new function, ext4_generic_write_inline_data(), is introduced to provide a generic implementation of the common logic found in ext4_da_write_inline_data_begin() and ext4_try_to_write_inline_data(). This function will be utilized in the subsequent two patches. Signed-off-by: Julian Sun <sunjunchao2870@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250107045549.1837589-1-sunjunchao2870@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-03-13 09:57:19 -04:00
Julian Sun	90c764b4b7	ext4: Don't set EXT4_STATE_MAY_INLINE_DATA for ea inodes Setting the EXT4_STATE_MAY_INLINE_DATA flag for ea inodes is meaningless because ea inodes do not use functions like ext4_write_begin(). Signed-off-by: Julian Sun <sunjunchao2870@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250107044702.1836852-3-sunjunchao2870@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-03-13 09:57:19 -04:00
Julian Sun	f896776a70	ext4: Remove a redundant return statement Remove a redundant return statement in the ext4_es_remove_extent() function. Signed-off-by: Julian Sun <sunjunchao2870@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250107044702.1836852-2-sunjunchao2870@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-03-13 09:57:19 -04:00
Ojaswin Mujoo	530fea29ef	ext4: protect ext4_release_dquot against freezing Protect ext4_release_dquot against freezing so that we don't try to start a transaction when FS is frozen, leading to warnings. Further, avoid taking the freeze protection if a transaction is already running so that we don't end up in a deadlock as described in `46e294efc3` ext4: fix deadlock with fs freezing and EA inodes Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20241121123855.645335-3-ojaswin@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-03-05 22:12:27 -05:00
Theodore Ts'o	9e28059d56	ext4: introduce linear search for dentries This patch addresses an issue where some files in case-insensitive directories become inaccessible due to changes in how the kernel function, utf8_casefold(), generates case-folded strings from the commit `5c26d2f1d3` ("unicode: Don't special case ignorable code points"). There are good reasons why this change should be made; it's actually quite stupid that Unicode seems to think that the characters ❤ and ❤️ should be casefolded. Unfortunately, because of the backwards compatibility issue, this commit was reverted in `231825b2e1`. This problem is addressed by instituting a brute-force linear fallback if a lookup fails on a case-folded directory, which does result in a performance hit when looking up files affected by the change in how the kernel treats ignorable Unicode characters, or when attempting to look up non-existent file names. So this fallback can be disabled by setting an encoding flag if in the future, the system administrator or the manufacturer of a mobile handset or tablet can be sure that there was no opportunity for a kernel to insert file names with incompatible encodings. Fixes: `5c26d2f1d3` ("unicode: Don't special case ignorable code points") Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>	2025-02-13 15:05:53 -05:00
Jan Kara	a399af4e3b	jbd2: Avoid long replay times due to high number or revoke blocks Some users are reporting journal replay takes a long time when there is an excessive number of revoke blocks in the journal. Reported times are like: 1048576 records - 95 seconds 2097152 records - 580 seconds The problem is that hash chains in the revoke table get excessively long in these cases. Fix the problem by sizing the revoke table appropriately before the revoke pass. Thanks to Alexey Zhuravlev <azhuravlev@ddn.com> for benchmarking the patch with large numbers of revoke blocks [1]. [1] https://lore.kernel.org/all/20250113183107.7bfef7b6@x390.bzzz77.ru Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250121140925.17231-2-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-02-13 00:19:10 -05:00
Zhang Yi	2890e5e0f4	ext4: move out common parts into ext4_fallocate() Currently, all zeroing ranges, punch holes, collapse ranges, and insert ranges first wait for all existing direct I/O workers to complete, and then they acquire the mapping's invalidate lock before performing the actual work. These common components are nearly identical, so we can simplify the code by factoring them out into the ext4_fallocate(). Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/20241220011637.1157197-11-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-02-10 07:48:25 -05:00
Zhang Yi	ea3f17efd3	ext4: move out inode_lock into ext4_fallocate() Currently, all five sub-functions of ext4_fallocate() acquire the inode's i_rwsem at the beginning and release it before exiting. This process can be simplified by factoring out the management of i_rwsem into the ext4_fallocate() function. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/20241220011637.1157197-10-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-02-10 07:48:25 -05:00
Zhang Yi	fd2f764826	ext4: factor out ext4_do_fallocate() Now the real job of normal fallocate is open coded in ext4_fallocate(). Factor out a new helper ext4_do_fallocate() to do the real job, like other functions (e.g. ext4_zero_range()) in ext4_fallocate() do. This makes the code clearer, no functional changes. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/20241220011637.1157197-9-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-02-10 07:48:25 -05:00
Zhang Yi	4942550437	ext4: refactor ext4_insert_range() Simplify ext4_insert_range() and align its code style with that of ext4_collapse_range(). Refactor it by: a) renaming variables, b) removing redundant input parameter checks and moving the remaining checks under i_rwsem in preparation for future refactoring, and c) renaming the three stale error tags. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/20241220011637.1157197-8-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-02-10 07:48:25 -05:00

1335794 Commits