linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-16 11:21:26 -04:00

Author	SHA1	Message	Date
Li Chen	be81084e03	ocfs2: use jbd2 jinode dirty range accessor ocfs2 journal commit callback reads jbd2_inode dirty range fields without holding journal->j_list_lock. Use jbd2_jinode_get_dirty_range() to get the range in bytes. Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Li Chen <me@linux.beauty> Link: https://patch.msgid.link/20260306085643.465275-4-me@linux.beauty Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2026-04-09 10:51:05 -04:00
Li Chen	660d236699	ext4: use jbd2 jinode dirty range accessor ext4 journal commit callbacks access jbd2_inode dirty range fields without holding journal->j_list_lock. Use jbd2_jinode_get_dirty_range() to get the range in bytes, and read i_transaction with READ_ONCE() in the redirty check. Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Li Chen <me@linux.beauty> Link: https://patch.msgid.link/20260306085643.465275-3-me@linux.beauty Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2026-04-09 10:51:05 -04:00
Li Chen	5267f6ef49	jbd2: add jinode dirty range accessors Provide a helper to fetch jinode dirty ranges in bytes. This lets filesystem callbacks avoid depending on the internal representation, preparing for a later conversion to page units. Suggested-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Li Chen <me@linux.beauty> Link: https://patch.msgid.link/20260306085643.465275-2-me@linux.beauty Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2026-04-09 10:51:05 -04:00
Milos Nikic	f7fc28b014	jbd2: gracefully abort on transaction state corruptions Auditing the jbd2 codebase reveals several legacy J_ASSERT calls that enforce internal state machine invariants (e.g., verifying jh->b_transaction or jh->b_next_transaction pointers). When these invariants are broken, the journal is in a corrupted state. However, triggering a fatal panic brings down the entire system for a localized filesystem error. This patch targets a specific class of these asserts: those residing inside functions that natively return integer error codes, booleans, or error pointers. It replaces the hard J_ASSERTs with WARN_ON_ONCE to capture the offending stack trace, safely drops any held locks, gracefully aborts the journal, and returns -EINVAL. This prevents a catastrophic kernel panic while ensuring the corrupted journal state is safely contained and upstream callers (like ext4 or ocfs2) can gracefully handle the aborted handle. Functions modified in fs/jbd2/transaction.c: - jbd2__journal_start() - do_get_write_access() - jbd2_journal_dirty_metadata() - jbd2_journal_forget() - jbd2_journal_try_to_free_buffers() - jbd2_journal_file_inode() Signed-off-by: Milos Nikic <nikic.milos@gmail.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://patch.msgid.link/20260304172016.23525-3-nikic.milos@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2026-04-09 10:46:36 -04:00
Milos Nikic	64924362f8	jbd2: gracefully abort instead of panicking on unlocked buffer In jbd2_journal_get_create_access(), if the caller passes an unlocked buffer, the code currently triggers a fatal J_ASSERT. While an unlocked buffer here is a clear API violation and a bug in the caller, crashing the entire system is an overly severe response. It brings down the whole machine for a localized filesystem inconsistency. Replace the J_ASSERT with a WARN_ON_ONCE to capture the offending caller's stack trace, and return an error (-EINVAL). This allows the journal to gracefully abort the transaction, protecting data integrity without causing a kernel panic. Signed-off-by: Milos Nikic <nikic.milos@gmail.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://patch.msgid.link/20260304172016.23525-2-nikic.milos@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2026-04-09 10:46:36 -04:00
Weixie Cui	af1502f98e	ext4: simplify mballoc preallocation size rounding for small files The if-else ladder in ext4_mb_normalize_request() manually rounds up the preallocation size to the next power of two for files up to 1MB, enumerating each step from 16KB to 1MB individually. Replace this with a single roundup_pow_of_two() call clamped to a 16KB minimum, which is functionally equivalent but much more concise. Also replace raw byte constants with SZ_1M and SZ_16K from <linux/sizes.h> for clarity, and remove the stale "XXX: should this table be tunable?" comment that has been there since the original mballoc code. No functional change. Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Weixie Cui <cuiweixie@gmail.com> Link: https://patch.msgid.link/tencent_E9C5F1B2E9939B3037501FD04A7E9CF0C407@qq.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2026-04-09 10:43:06 -04:00
Julia Lawall	a804ecc399	ext4/move_extent: use folio_next_pos() A series of patches such as commit `60a70e6143` ("mm: Use folio_next_pos()") replace folio_pos() + folio_size() by folio_next_pos(). The former performs x << z + y << z while the latter performs (x + y) << z, which is slightly more efficient. This case was not taken into account, perhaps because the argument is not named folio. The change was performed using the following Coccinelle semantic patch: @@ expression folio; @@ - folio_pos(folio) + folio_size(folio) + folio_next_pos(folio) Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20260222125049.1309075-1-Julia.Lawall@inria.fr Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2026-04-09 10:40:31 -04:00
Guoqing Jiang	2f17d1993b	ext4: remove tl argument from ext4_fc_replay_{add,del}_range Since commit `a7ba36bc94` ("ext4: fix fast commit alignment issues"), both ext4_fc_replay_add_range and ext4_fc_replay_del_range get ex based on 'val' instead of 'tl'. Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20260121063805.19863-1-guoqing.jiang@linux.dev Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2026-04-09 10:36:08 -04:00
Li Chen	eb10607628	ext4: remove unused i_fc_wait i_fc_wait is only initialized in ext4_fc_init_inode() and never used for waiting or wakeups. Drop it. Signed-off-by: Li Chen <me@linux.beauty> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20260120121941.144192-1-me@linux.beauty Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2026-04-09 10:34:53 -04:00
Deepanshu Kartikey	9b25f381de	ext4: unmap invalidated folios from page tables in mpage_release_unused_pages() When delayed block allocation fails (e.g., due to filesystem corruption detected in ext4_map_blocks()), the writeback error handler calls mpage_release_unused_pages(invalidate=true) which invalidates affected folios by clearing their uptodate flag via folio_clear_uptodate(). However, these folios may still be mapped in process page tables. If a subsequent operation (such as ftruncate calling ext4_block_truncate_page) triggers a write fault, the existing page table entry allows access to the now-invalidated folio. This leads to ext4_page_mkwrite() being called with a non-uptodate folio, which then gets marked dirty, triggering: WARNING: CPU: 0 PID: 5 at mm/page-writeback.c:2960 __folio_mark_dirty+0x578/0x880 Call Trace: fault_dirty_shared_page+0x16e/0x2d0 do_wp_page+0x38b/0xd20 handle_pte_fault+0x1da/0x450 The sequence leading to this warning is: 1. Process writes to mmap'd file, folio becomes uptodate and dirty 2. Writeback begins, but delayed allocation fails due to corruption 3. mpage_release_unused_pages(invalidate=true) is called: - block_invalidate_folio() clears dirty flag - folio_clear_uptodate() clears uptodate flag - But folio remains mapped in page tables 4. Later, ftruncate triggers ext4_block_truncate_page() 5. This causes a write fault on the still-mapped folio 6. ext4_page_mkwrite() is called with folio that is !uptodate 7. block_page_mkwrite() marks buffers dirty 8. fault_dirty_shared_page() tries to mark folio dirty 9. block_dirty_folio() calls __folio_mark_dirty(warn=1) 10. WARNING triggers: WARN_ON_ONCE(warn && !uptodate && !dirty) Fix this by unmapping folios from page tables before invalidating them using unmap_mapping_pages(). This ensures that subsequent accesses trigger new page faults rather than reusing invalidated folios through stale page table entries. Note that this results in data loss for any writes to the mmap'd region that couldn't be written back, but this is expected behavior when writeback fails due to filesystem corruption. The existing error message already states "This should not happen!! Data will be lost". Reported-by: syzbot+b0a0670332b6b3230a0a@syzkaller.appspotmail.com Tested-by: syzbot+b0a0670332b6b3230a0a@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=b0a0670332b6b3230a0a Suggested-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com> Link: https://patch.msgid.link/20251205055914.1393799-1-kartikey406@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2026-04-09 10:17:14 -04:00
Theodore Ts'o	9ee29d20aa	ext4: always drain queued discard work in ext4_mb_release() While reviewing recent ext4 patch[1], Sashiko raised the following concern[2]: > If the filesystem is initially mounted with the discard option, > deleting files will populate sbi->s_discard_list and queue > s_discard_work. If it is then remounted with nodiscard, the > EXT4_MOUNT_DISCARD flag is cleared, but the pending s_discard_work is > neither cancelled nor flushed. [1] https://lore.kernel.org/r/20260319094545.19291-1-qiang.zhang@linux.dev/ [2] https://sashiko.dev/#/patchset/20260319094545.19291-1-qiang.zhang%40linux.dev The concern was valid, but it had nothing to do with the patch[1]. One of the problems with Sashiko in its current (early) form is that it will detect pre-existing issues and report it as a problem with the patch that it is reviewing. In practice, it would be hard to hit deliberately (unless you are a malicious syzkaller fuzzer), since it would involve mounting the file system with -o discard, and then deleting a large number of files, remounting the file system with -o nodiscard, and then immediately unmounting the file system before the queued discard work has a change to drain on its own. Fix it because it's a real bug, and to avoid Sashiko from raising this concern when analyzing future patches to mballoc.c. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Fixes: `55cdd0af2b` ("ext4: get discard out of jbd2 commit kthread contex") Cc: stable@kernel.org	2026-03-27 23:39:10 -04:00
Theodore Ts'o	bb81702370	ext4: handle wraparound when searching for blocks for indirect mapped blocks Commit `4865c768b5` ("ext4: always allocate blocks only from groups inode can use") restricts what blocks will be allocated for indirect block based files to block numbers that fit within 32-bit block numbers. However, when using a review bot running on the latest Gemini LLM to check this commit when backporting into an LTS based kernel, it raised this concern: If ac->ac_g_ex.fe_group is >= ngroups (for instance, if the goal group was populated via stream allocation from s_mb_last_groups), then start will be >= ngroups. Does this allow allocating blocks beyond the 32-bit limit for indirect block mapped files? The commit message mentions that ext4_mb_scan_groups_linear() takes care to not select unsupported groups. However, its loop uses group = *start, and the very first iteration will call ext4_mb_scan_group() with this unsupported group because next_linear_group() is only called at the end of the iteration. After reviewing the code paths involved and considering the LLM review, I determined that this can happen when there is a file system where some files/directories are extent-mapped and others are indirect-block mapped. To address this, add a safety clamp in ext4_mb_scan_groups(). Fixes: `4865c768b5` ("ext4: always allocate blocks only from groups inode can use") Cc: Jan Kara <jack@suse.cz> Reviewed-by: Baokun Li <libaokun@linux.alibaba.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Link: https://patch.msgid.link/20260326045834.1175822-1-tytso@mit.edu Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org	2026-03-27 23:39:02 -04:00
hongao	3ceda17325	ext4: skip split extent recovery on corruption ext4_split_extent_at() retries after ext4_ext_insert_extent() fails by refinding the original extent and restoring its length. That recovery is only safe for transient resource failures such as -ENOSPC, -EDQUOT, and -ENOMEM. When ext4_ext_insert_extent() fails because the extent tree is already corrupted, ext4_find_extent() can return a leaf path without p_ext. ext4_split_extent_at() then dereferences path[depth].p_ext while trying to fix up the original extent length, causing a NULL pointer dereference while handling a pre-existing filesystem corruption. Do not enter the recovery path for corruption errors, and validate p_ext after refinding the extent before touching it. This keeps the recovery path limited to cases it can actually repair and turns the syzbot-triggered crash into a proper corruption report. Fixes: `716b9c23b8` ("ext4: refactor split and convert extents") Reported-by: syzbot+1ffa5d865557e51cb604@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=1ffa5d865557e51cb604 Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Signed-off-by: hongao <hongao@uniontech.com> Link: https://patch.msgid.link/EF77870F23FF9C90+20260324015815.35248-1-hongao@uniontech.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org	2026-03-27 23:38:52 -04:00
Baokun Li	ec0a7500d8	ext4: fix iloc.bh leak in ext4_fc_replay_inode() error paths During code review, Joseph found that ext4_fc_replay_inode() calls ext4_get_fc_inode_loc() to get the inode location, which holds a reference to iloc.bh that must be released via brelse(). However, several error paths jump to the 'out' label without releasing iloc.bh: - ext4_handle_dirty_metadata() failure - sync_dirty_buffer() failure - ext4_mark_inode_used() failure - ext4_iget() failure Fix this by introducing an 'out_brelse' label placed just before the existing 'out' label to ensure iloc.bh is always released. Additionally, make ext4_fc_replay_inode() propagate errors properly instead of always returning 0. Reported-by: Joseph Qi <joseph.qi@linux.alibaba.com> Fixes: `8016e29f43` ("ext4: fast commit recovery path") Signed-off-by: Baokun Li <libaokun@linux.alibaba.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260323060836.3452660-1-libaokun@linux.alibaba.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org	2026-03-27 23:38:01 -04:00
Jan Kara	0c90eed1b9	ext4: fix deadlock on inode reallocation Currently there is a race in ext4 when reallocating freed inode resulting in a deadlock: Task1 Task2 ext4_evict_inode() handle = ext4_journal_start(); ... if (IS_SYNC(inode)) handle->h_sync = 1; ext4_free_inode() ext4_new_inode() handle = ext4_journal_start() finds the bit in inode bitmap already clear insert_inode_locked() waits for inode to be removed from the hash. ext4_journal_stop(handle) jbd2_journal_stop(handle) jbd2_log_wait_commit(journal, tid); - deadlocks waiting for transaction handle Task2 holds Fix the problem by removing inode from the hash already in ext4_clear_inode() by which time all IO for the inode is done so reuse is already fine but we are still before possibly blocking on transaction commit. Reported-by: "Lai, Yi" <yi1.lai@linux.intel.com> Link: https://lore.kernel.org/all/abNvb2PcrKj1FBeC@ly-workstation Fixes: `88ec797c46` ("fs: make insert_inode_locked() wait for inode destruction") CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260320090428.24899-2-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org	2026-03-27 23:37:48 -04:00
Jiayuan Chen	d15e4b0a41	ext4: fix use-after-free in update_super_work when racing with umount Commit `b98535d091` ("ext4: fix bug_on in start_this_handle during umount filesystem") moved ext4_unregister_sysfs() before flushing s_sb_upd_work to prevent new error work from being queued via /proc/fs/ext4/xx/mb_groups reads during unmount. However, this introduced a use-after-free because update_super_work calls ext4_notify_error_sysfs() -> sysfs_notify() which accesses the kobject's kernfs_node after it has been freed by kobject_del() in ext4_unregister_sysfs(): update_super_work ext4_put_super ----------------- -------------- ext4_unregister_sysfs(sb) kobject_del(&sbi->s_kobj) __kobject_del() sysfs_remove_dir() kobj->sd = NULL sysfs_put(sd) kernfs_put() // RCU free ext4_notify_error_sysfs(sbi) sysfs_notify(&sbi->s_kobj) kn = kobj->sd // stale pointer kernfs_get(kn) // UAF on freed kernfs_node ext4_journal_destroy() flush_work(&sbi->s_sb_upd_work) Instead of reordering the teardown sequence, fix this by making ext4_notify_error_sysfs() detect that sysfs has already been torn down by checking s_kobj.state_in_sysfs, and skipping the sysfs_notify() call in that case. A dedicated mutex (s_error_notify_mutex) serializes ext4_notify_error_sysfs() against kobject_del() in ext4_unregister_sysfs() to prevent TOCTOU races where the kobject could be deleted between the state_in_sysfs check and the sysfs_notify() call. Fixes: `b98535d091` ("ext4: fix bug_on in start_this_handle during umount filesystem") Cc: Jiayuan Chen <jiayuan.chen@linux.dev> Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260319120336.157873-1-jiayuan.chen@linux.dev Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org	2026-03-27 23:37:39 -04:00
Zqiang	496bb99b7e	ext4: fix the might_sleep() warnings in kvfree() Use the kvfree() in the RCU read critical section can trigger the following warnings: EXT4-fs (vdb): unmounting filesystem cd983e5b-3c83-4f5a-a136-17b00eb9d018. WARNING: suspicious RCU usage ./include/linux/rcupdate.h:409 Illegal context switch in RCU read-side critical section! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 Call Trace: <TASK> dump_stack_lvl+0xbb/0xd0 dump_stack+0x14/0x20 lockdep_rcu_suspicious+0x15a/0x1b0 __might_resched+0x375/0x4d0 ? put_object.part.0+0x2c/0x50 __might_sleep+0x108/0x160 vfree+0x58/0x910 ? ext4_group_desc_free+0x27/0x270 kvfree+0x23/0x40 ext4_group_desc_free+0x111/0x270 ext4_put_super+0x3c8/0xd40 generic_shutdown_super+0x14c/0x4a0 ? __pfx_shrinker_free+0x10/0x10 kill_block_super+0x40/0x90 ext4_kill_sb+0x6d/0xb0 deactivate_locked_super+0xb4/0x180 deactivate_super+0x7e/0xa0 cleanup_mnt+0x296/0x3e0 __cleanup_mnt+0x16/0x20 task_work_run+0x157/0x250 ? __pfx_task_work_run+0x10/0x10 ? exit_to_user_mode_loop+0x6a/0x550 exit_to_user_mode_loop+0x102/0x550 do_syscall_64+0x44a/0x500 entry_SYSCALL_64_after_hwframe+0x77/0x7f </TASK> BUG: sleeping function called from invalid context at mm/vmalloc.c:3441 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 556, name: umount preempt_count: 1, expected: 0 CPU: 3 UID: 0 PID: 556 Comm: umount Call Trace: <TASK> dump_stack_lvl+0xbb/0xd0 dump_stack+0x14/0x20 __might_resched+0x275/0x4d0 ? put_object.part.0+0x2c/0x50 __might_sleep+0x108/0x160 vfree+0x58/0x910 ? ext4_group_desc_free+0x27/0x270 kvfree+0x23/0x40 ext4_group_desc_free+0x111/0x270 ext4_put_super+0x3c8/0xd40 generic_shutdown_super+0x14c/0x4a0 ? __pfx_shrinker_free+0x10/0x10 kill_block_super+0x40/0x90 ext4_kill_sb+0x6d/0xb0 deactivate_locked_super+0xb4/0x180 deactivate_super+0x7e/0xa0 cleanup_mnt+0x296/0x3e0 __cleanup_mnt+0x16/0x20 task_work_run+0x157/0x250 ? __pfx_task_work_run+0x10/0x10 ? exit_to_user_mode_loop+0x6a/0x550 exit_to_user_mode_loop+0x102/0x550 do_syscall_64+0x44a/0x500 entry_SYSCALL_64_after_hwframe+0x77/0x7f The above scenarios occur in initialization failures and teardown paths, there are no parallel operations on the resources released by kvfree(), this commit therefore remove rcu_read_lock/unlock() and use rcu_access_pointer() instead of rcu_dereference() operations. Fixes: `7c990728b9` ("ext4: fix potential race between s_flex_groups online resizing and access") Fixes: `df3da4ea5a` ("ext4: fix potential race between s_group_info online resizing and access") Signed-off-by: Zqiang <qiang.zhang@linux.dev> Reviewed-by: Baokun Li <libaokun@linux.alibaba.com> Link: https://patch.msgid.link/20260319094545.19291-1-qiang.zhang@linux.dev Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org	2026-03-27 23:36:20 -04:00
Helen Koike	3822743dc2	ext4: reject mount if bigalloc with s_first_data_block != 0 bigalloc with s_first_data_block != 0 is not supported, reject mounting it. Signed-off-by: Helen Koike <koike@igalia.com> Suggested-by: Theodore Ts'o <tytso@mit.edu> Reported-by: syzbot+b73703b873a33d8eb8f6@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=b73703b873a33d8eb8f6 Link: https://patch.msgid.link/20260317142325.135074-1-koike@igalia.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org	2026-03-27 23:36:11 -04:00
Ye Bin	9e1b14320b	ext4: fix extents-test.c is not compiled when EXT4_KUNIT_TESTS=M Now, only EXT4_KUNIT_TESTS=Y testcase will be compiled in 'extents.c'. To solve this issue, the ext4 test code needs to be decoupled. The 'extents-test' module is compiled into 'ext4-test' module. Fixes: `cb1e0c1d1f` ("ext4: kunit tests for extent splitting and conversion") Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260314075258.1317579-4-yebin@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2026-03-27 23:36:06 -04:00
Ye Bin	519b76ac0b	ext4: fix mballoc-test.c is not compiled when EXT4_KUNIT_TESTS=M Now, only EXT4_KUNIT_TESTS=Y testcase will be compiled in 'mballoc.c'. To solve this issue, the ext4 test code needs to be decoupled. The ext4 test module is compiled into a separate module. Reported-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Closes: https://patchwork.kernel.org/project/cifs-client/patch/20260118091313.1988168-2-chenxiaosong.chenxiaosong@linux.dev/ Fixes: `7c9fa399a3` ("ext4: add first unit test for ext4_mb_new_blocks_simple in mballoc") Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260314075258.1317579-3-yebin@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2026-03-27 23:36:02 -04:00
Ye Bin	49504a5125	ext4: introduce EXPORT_SYMBOL_FOR_EXT4_TEST() helper Introduce EXPORT_SYMBOL_FOR_EXT4_TEST() helper for kuint test. Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260314075258.1317579-2-yebin@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2026-03-27 23:34:19 -04:00
Milos Nikic	bac3190a8e	jbd2: gracefully abort on checkpointing state corruptions This patch targets two internal state machine invariants in checkpoint.c residing inside functions that natively return integer error codes. - In jbd2_cleanup_journal_tail(): A blocknr of 0 indicates a severely corrupted journal superblock. Replaced the J_ASSERT with a WARN_ON_ONCE and a graceful journal abort, returning -EFSCORRUPTED. - In jbd2_log_do_checkpoint(): Replaced the J_ASSERT_BH checking for an unexpected buffer_jwrite state. If the warning triggers, we explicitly drop the just-taken get_bh() reference and call __flush_batch() to safely clean up any previously queued buffers in the j_chkpt_bhs array, preventing a memory leak before returning -EFSCORRUPTED. Signed-off-by: Milos Nikic <nikic.milos@gmail.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Baokun Li <libaokun@linux.alibaba.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260311041548.159424-1-nikic.milos@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org	2026-03-27 23:34:09 -04:00
Edward Adam Davis	5422fe71d2	ext4: avoid infinite loops caused by residual data On the mkdir/mknod path, when mapping logical blocks to physical blocks, if inserting a new extent into the extent tree fails (in this example, because the file system disabled the huge file feature when marking the inode as dirty), ext4_ext_map_blocks() only calls ext4_free_blocks() to reclaim the physical block without deleting the corresponding data in the extent tree. This causes subsequent mkdir operations to reference the previously reclaimed physical block number again, even though this physical block is already being used by the xattr block. Therefore, a situation arises where both the directory and xattr are using the same buffer head block in memory simultaneously. The above causes ext4_xattr_block_set() to enter an infinite loop about "inserted" and cannot release the inode lock, ultimately leading to the 143s blocking problem mentioned in [1]. If the metadata is corrupted, then trying to remove some extent space can do even more harm. Also in case EXT4_GET_BLOCKS_DELALLOC_RESERVE was passed, remove space wrongly update quota information. Jan Kara suggests distinguishing between two cases: 1) The error is ENOSPC or EDQUOT - in this case the filesystem is fully consistent and we must maintain its consistency including all the accounting. However these errors can happen only early before we've inserted the extent into the extent tree. So current code works correctly for this case. 2) Some other error - this means metadata is corrupted. We should strive to do as few modifications as possible to limit damage. So I'd just skip freeing of allocated blocks. [1] INFO: task syz.0.17:5995 blocked for more than 143 seconds. Call Trace: inode_lock_nested include/linux/fs.h:1073 [inline] __start_dirop fs/namei.c:2923 [inline] start_dirop fs/namei.c:2934 [inline] Reported-by: syzbot+512459401510e2a9a39f@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=1659aaaaa8d9d11265d7 Tested-by: syzbot+1659aaaaa8d9d11265d7@syzkaller.appspotmail.com Reported-by: syzbot+1659aaaaa8d9d11265d7@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=512459401510e2a9a39f Tested-by: syzbot+1659aaaaa8d9d11265d7@syzkaller.appspotmail.com Signed-off-by: Edward Adam Davis <eadavis@qq.com> Reviewed-by: Jan Kara <jack@suse.cz> Tested-by: syzbot+512459401510e2a9a39f@syzkaller.appspotmail.com Link: https://patch.msgid.link/tencent_43696283A68450B761D76866C6F360E36705@qq.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org	2026-03-27 23:33:56 -04:00
Tejas Bharambe	2acb5c12eb	ext4: validate p_idx bounds in ext4_ext_correct_indexes ext4_ext_correct_indexes() walks up the extent tree correcting index entries when the first extent in a leaf is modified. Before accessing path[k].p_idx->ei_block, there is no validation that p_idx falls within the valid range of index entries for that level. If the on-disk extent header contains a corrupted or crafted eh_entries value, p_idx can point past the end of the allocated buffer, causing a slab-out-of-bounds read. Fix this by validating path[k].p_idx against EXT_LAST_INDEX() at both access sites: before the while loop and inside it. Return -EFSCORRUPTED if the index pointer is out of range, consistent with how other bounds violations are handled in the ext4 extent tree code. Reported-by: syzbot+04c4e65cab786a2e5b7e@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=04c4e65cab786a2e5b7e Signed-off-by: Tejas Bharambe <tejas.bharambe@outlook.com> Link: https://patch.msgid.link/JH0PR06MB66326016F9B6AD24097D232B897CA@JH0PR06MB6632.apcprd06.prod.outlook.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org	2026-03-27 23:33:46 -04:00
Ye Bin	73bf12adbe	ext4: test if inode's all dirty pages are submitted to disk The commit `aa373cf550` ("writeback: stop background/kupdate works from livelocking other works") introduced an issue where unmounting a filesystem in a multi-logical-partition scenario could lead to batch file data loss. This problem was not fixed until the commit `d92109891f` ("fs/writeback: bail out if there is no more inodes for IO and queued once"). It took considerable time to identify the root cause. Additionally, in actual production environments, we frequently encountered file data loss after normal system reboots. Therefore, we are adding a check in the inode release flow to verify whether all dirty pages have been flushed to disk, in order to determine whether the data loss is caused by a logic issue in the filesystem code. Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260303012242.3206465-1-yebin@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org	2026-03-27 23:33:34 -04:00
Ojaswin Mujoo	c4a48e9eee	ext4: minor fix for ext4_split_extent_zeroout() We missed storing the error which triggerd smatch warning: fs/ext4/extents.c:3369 ext4_split_extent_zeroout() warn: duplicate zero check 'err' (previous on line 3363) fs/ext4/extents.c 3361 3362 err = ext4_ext_get_access(handle, inode, path + depth); 3363 if (err) 3364 return err; 3365 3366 ext4_ext_mark_initialized(ex); 3367 3368 ext4_ext_dirty(handle, inode, path + depth); --> 3369 if (err) 3370 return err; 3371 3372 return 0; 3373 } Fix it by correctly storing the err value from ext4_ext_dirty(). Link: https://lore.kernel.org/all/aYXvVgPnKltX79KE@stanley.mountain/ Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Fixes: `a985e07c26` ("ext4: refactor zeroout path and handle all cases") Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Baokun Li <libaokun@linux.alibaba.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://patch.msgid.link/20260302143811.605174-1-ojaswin@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org	2026-03-27 23:33:22 -04:00
Ye Bin	46066e3a06	ext4: avoid allocate block from corrupted group in ext4_mb_find_by_goal() There's issue as follows: ... EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 206 at logical offset 0 with max blocks 1 with error 117 EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 206 at logical offset 0 with max blocks 1 with error 117 EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 206 at logical offset 0 with max blocks 1 with error 117 EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 206 at logical offset 0 with max blocks 1 with error 117 EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 2243 at logical offset 0 with max blocks 1 with error 117 EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 2239 at logical offset 0 with max blocks 1 with error 117 EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost EXT4-fs (mmcblk0p1): error count since last fsck: 1 EXT4-fs (mmcblk0p1): initial error at time 1765597433: ext4_mb_generate_buddy:760 EXT4-fs (mmcblk0p1): last error at time 1765597433: ext4_mb_generate_buddy:760 ... According to the log analysis, blocks are always requested from the corrupted block group. This may happen as follows: ext4_mb_find_by_goal ext4_mb_load_buddy ext4_mb_load_buddy_gfp ext4_mb_init_cache ext4_read_block_bitmap_nowait ext4_wait_block_bitmap ext4_validate_block_bitmap if (!grp \|\| EXT4_MB_GRP_BBITMAP_CORRUPT(grp)) return -EFSCORRUPTED; // There's no logs. if (err) return err; // Will return error ext4_lock_group(ac->ac_sb, group); if (unlikely(EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info))) // Unreachable goto out; After commit `9008a58e5d` ("ext4: make the bitmap read routines return real error codes") merged, Commit `163a203ddb` ("ext4: mark block group as corrupt on block bitmap error") is no real solution for allocating blocks from corrupted block groups. This is because if 'EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info)' is true, then 'ext4_mb_load_buddy()' may return an error. This means that the block allocation will fail. Therefore, check block group if corrupted when ext4_mb_load_buddy() returns error. Fixes: `163a203ddb` ("ext4: mark block group as corrupt on block bitmap error") Fixes: `9008a58e5d` ("ext4: make the bitmap read routines return real error codes") Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260302134619.3145520-1-yebin@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org	2026-03-27 23:33:08 -04:00
Ritesh Harjani (IBM)	afe376d2c1	ext4: kunit: extents-test: lix percpu_counters list corruption commit `82f80e2e3b` ("ext4: add extent status cache support to kunit tests"), added ext4_es_register_shrinker() in extents_kunit_init() function but failed to add the unregister shrinker routine in extents_kunit_exit(). This could cause the following percpu_counters list corruption bug. ok 1 split unwrit extent to 2 extents and convert 1st half writ slab kmalloc-4k start c0000002007ff000 pointer offset 1448 size 4096 list_add corruption. next->prev should be prev (c000000004bc9e60), but was 0000000000000000. (next=c0000002007ff5a8). ------------[ cut here ]------------ kernel BUG at lib/list_debug.c:29! cpu 0x2: Vector: 700 (Program Check) at [c000000241927a30] pc: c000000000f26ed0: __list_add_valid_or_report+0x120/0x164 lr: c000000000f26ecc: __list_add_valid_or_report+0x11c/0x164 sp: c000000241927cd0 msr: 800000000282b033 current = 0xc000000241215200 paca = 0xc0000003fffff300 irqmask: 0x03 irq_happened: 0x09 pid = 258, comm = kunit_try_catch kernel BUG at lib/list_debug.c:29! enter ? for help __percpu_counter_init_many+0x148/0x184 ext4_es_register_shrinker+0x74/0x23c extents_kunit_init+0x100/0x308 kunit_try_run_case+0x78/0x1f8 kunit_generic_run_threadfn_adapter+0x40/0x70 kthread+0x190/0x1a0 start_kernel_thread+0x14/0x18 2:mon> This happens because: extents_kunit_init(test N): ext4_es_register_shrinker(sbi) percpu_counters_init() x 4; // this adds 4 list nodes to global percpu_counters list list_add(&fbc->list, &percpu_counters); shrinker_register(); extents_kunit_exit(test N): kfree(sbi); // frees sbi w/o removing those 4 list nodes. // So, those list node now becomes dangling pointers extents_kunit_init(test N+1): kzalloc_obj(ext4_sb_info) // allocator returns same page, but zeroed. ext4_es_register_shrinker(sbi) percpu_counters_init() list_add(&fbc->list, &percpu_counters); __list_add_valid(new, prev, next); next->prev != prev // list corruption bug detected, since next->prev = NULL Fixes: `82f80e2e3b` ("ext4: add extent status cache support to kunit tests") Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/5bb9041471dab8ce870c191c19cbe4df57473be8.1772381213.git.ritesh.list@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org	2026-03-27 23:32:23 -04:00
Li Chen	1aec30021e	ext4: publish jinode after initialization ext4_inode_attach_jinode() publishes ei->jinode to concurrent users. It used to set ei->jinode before jbd2_journal_init_jbd_inode(), allowing a reader to observe a non-NULL jinode with i_vfs_inode still unset. The fast commit flush path can then pass this jinode to jbd2_wait_inode_data(), which dereferences i_vfs_inode->i_mapping and may crash. Below is the crash I observe: ``` BUG: unable to handle page fault for address: 000000010beb47f4 PGD 110e51067 P4D 110e51067 PUD 0 Oops: Oops: 0000 [#1] SMP NOPTI CPU: 1 UID: 0 PID: 4850 Comm: fc_fsync_bench_ Not tainted 6.18.0-00764-g795a690c06a5 #1 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.17.0-2-2 04/01/2014 RIP: 0010:xas_find_marked+0x3d/0x2e0 Code: e0 03 48 83 f8 02 0f 84 f0 01 00 00 48 8b 47 08 48 89 c3 48 39 c6 0f 82 fd 01 00 00 48 85 c9 74 3d 48 83 f9 03 77 63 4c 8b 0f <49> 8b 71 08 48 c7 47 18 00 00 00 00 48 89 f1 83 e1 03 48 83 f9 02 RSP: 0018:ffffbbee806e7bf0 EFLAGS: 00010246 RAX: 000000000010beb4 RBX: 000000000010beb4 RCX: 0000000000000003 RDX: 0000000000000001 RSI: 0000002000300000 RDI: ffffbbee806e7c10 RBP: 0000000000000001 R08: 0000002000300000 R09: 000000010beb47ec R10: ffff9ea494590090 R11: 0000000000000000 R12: 0000002000300000 R13: ffffbbee806e7c90 R14: ffff9ea494513788 R15: ffffbbee806e7c88 FS: 00007fc2f9e3e6c0(0000) GS:ffff9ea6b1444000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000010beb47f4 CR3: 0000000119ac5000 CR4: 0000000000750ef0 PKRU: 55555554 Call Trace: <TASK> filemap_get_folios_tag+0x87/0x2a0 __filemap_fdatawait_range+0x5f/0xd0 ? srso_alias_return_thunk+0x5/0xfbef5 ? __schedule+0x3e7/0x10c0 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? preempt_count_sub+0x5f/0x80 ? srso_alias_return_thunk+0x5/0xfbef5 ? cap_safe_nice+0x37/0x70 ? srso_alias_return_thunk+0x5/0xfbef5 ? preempt_count_sub+0x5f/0x80 ? srso_alias_return_thunk+0x5/0xfbef5 filemap_fdatawait_range_keep_errors+0x12/0x40 ext4_fc_commit+0x697/0x8b0 ? ext4_file_write_iter+0x64b/0x950 ? srso_alias_return_thunk+0x5/0xfbef5 ? preempt_count_sub+0x5f/0x80 ? srso_alias_return_thunk+0x5/0xfbef5 ? vfs_write+0x356/0x480 ? srso_alias_return_thunk+0x5/0xfbef5 ? preempt_count_sub+0x5f/0x80 ext4_sync_file+0xf7/0x370 do_fsync+0x3b/0x80 ? syscall_trace_enter+0x108/0x1d0 __x64_sys_fdatasync+0x16/0x20 do_syscall_64+0x62/0x2c0 entry_SYSCALL_64_after_hwframe+0x76/0x7e ... ``` Fix this by initializing the jbd2_inode first. Use smp_wmb() and WRITE_ONCE() to publish ei->jinode after initialization. Readers use READ_ONCE() to fetch the pointer. Fixes: `a361293f5f` ("jbd2: Fix oops in jbd2_journal_file_inode()") Cc: stable@vger.kernel.org Signed-off-by: Li Chen <me@linux.beauty> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260225082617.147957-1-me@linux.beauty Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org	2026-03-27 23:32:07 -04:00
Yuto Ohnuki	356227096e	ext4: replace BUG_ON with proper error handling in ext4_read_inline_folio Replace BUG_ON() with proper error handling when inline data size exceeds PAGE_SIZE. This prevents kernel panic and allows the system to continue running while properly reporting the filesystem corruption. The error is logged via ext4_error_inode(), the buffer head is released to prevent memory leak, and -EFSCORRUPTED is returned to indicate filesystem corruption. Signed-off-by: Yuto Ohnuki <ytohnuki@amazon.com> Link: https://patch.msgid.link/20260223123345.14838-2-ytohnuki@amazon.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org	2026-03-27 23:31:52 -04:00
Jan Kara	1308255bbf	ext4: fix fsync(2) for nojournal mode When inode metadata is changed, we sometimes just call ext4_mark_inode_dirty() to track modified metadata. This copies inode metadata into block buffer which is enough when we are journalling metadata. However when we are running in nojournal mode we currently fail to write the dirtied inode buffer during fsync(2) because the inode is not marked as dirty. Use explicit ext4_write_inode() call to make sure the inode table buffer is written to the disk. This is a band aid solution but proper solution requires a much larger rewrite including changes in metadata bh tracking infrastructure. Reported-by: Free Ekanayaka <free.ekanayaka@gmail.com> Link: https://lore.kernel.org/all/87il8nhxdm.fsf@x1.mail-host-address-is-not-set/ CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20260216164848.3074-4-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org	2026-03-27 23:31:43 -04:00
Jan Kara	bd060afa7c	ext4: make recently_deleted() properly work with lazy itable initialization recently_deleted() checks whether inode has been used in the near past. However this can give false positive result when inode table is not initialized yet and we are in fact comparing to random garbage (or stale itable block of a filesystem before mkfs). Ultimately this results in uninitialized inodes being skipped during inode allocation and possibly they are never initialized and thus e2fsck complains. Verify if the inode has been initialized before checking for dtime. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20260216164848.3074-3-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org	2026-03-27 23:30:37 -04:00
Simon Weber	b1d682f199	ext4: fix journal credit check when setting fscrypt context Fix an issue arising when ext4 features has_journal, ea_inode, and encrypt are activated simultaneously, leading to ENOSPC when creating an encrypted file. Fix by passing XATTR_CREATE flag to xattr_set_handle function if a handle is specified, i.e., when the function is called in the control flow of creating a new inode. This aligns the number of jbd2 credits set_handle checks for with the number allocated for creating a new inode. ext4_set_context must not be called with a non-null handle (fs_data) if fscrypt context xattr is not guaranteed to not exist yet. The only other usage of this function currently is when handling the ioctl FS_IOC_SET_ENCRYPTION_POLICY, which calls it with fs_data=NULL. Fixes: `c1a5d5f6ab` ("ext4: improve journal credit handling in set xattr paths") Co-developed-by: Anthony Durrer <anthonydev@fastmail.com> Signed-off-by: Anthony Durrer <anthonydev@fastmail.com> Signed-off-by: Simon Weber <simon.weber.39@gmail.com> Reviewed-by: Eric Biggers <ebiggers@kernel.org> Link: https://patch.msgid.link/20260207100148.724275-4-simon.weber.39@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org	2026-03-27 23:30:25 -04:00
Deepanshu Kartikey	ed9356a30e	ext4: convert inline data to extents when truncate exceeds inline size Add a check in ext4_setattr() to convert files from inline data storage to extent-based storage when truncate() grows the file size beyond the inline capacity. This prevents the filesystem from entering an inconsistent state where the inline data flag is set but the file size exceeds what can be stored inline. Without this fix, the following sequence causes a kernel BUG_ON(): 1. Mount filesystem with inode that has inline flag set and small size 2. truncate(file, 50MB) - grows size but inline flag remains set 3. sendfile() attempts to write data 4. ext4_write_inline_data() hits BUG_ON(write_size > inline_capacity) The crash occurs because ext4_write_inline_data() expects inline storage to accommodate the write, but the actual inline capacity (~60 bytes for i_block + ~96 bytes for xattrs) is far smaller than the file size and write request. The fix checks if the new size from setattr exceeds the inode's actual inline capacity (EXT4_I(inode)->i_inline_size) and converts the file to extent-based storage before proceeding with the size change. This addresses the root cause by ensuring the inline data flag and file size remain consistent during truncate operations. Reported-by: syzbot+7de5fe447862fc37576f@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=7de5fe447862fc37576f Tested-by: syzbot+7de5fe447862fc37576f@syzkaller.appspotmail.com Signed-off-by: Deepanshu Kartikey <Kartikey406@gmail.com> Link: https://patch.msgid.link/20260207043607.1175976-1-kartikey406@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org	2026-03-27 23:29:54 -04:00
Jan Kara	f4a2b42e78	ext4: fix stale xarray tags after writeback There are cases where ext4_bio_write_page() gets called for a page which has no buffers to submit. This happens e.g. when the part of the file is actually a hole, when we cannot allocate blocks due to being called from jbd2, or in data=journal mode when checkpointing writes the buffers earlier. In these cases we just return from ext4_bio_write_page() however if the page didn't need redirtying, we will leave stale DIRTY and/or TOWRITE tags in xarray because those get cleared only in __folio_start_writeback(). As a result we can leave these tags set in mappings even after a final sync on filesystem that's getting remounted read-only or that's being frozen. Various assertions can then get upset when writeback is started on such filesystems (Gerald reported assertion in ext4_journal_check_start() firing). Fix the problem by cycling the page through writeback state even if we decide nothing needs to be written for it so that xarray tags get properly updated. This is slightly silly (we could update the xarray tags directly) but I don't think a special helper messing with xarray tags is really worth it in this relatively rare corner case. Reported-by: Gerald Yang <gerald.yang@canonical.com> Link: https://lore.kernel.org/all/20260128074515.2028982-1-gerald.yang@canonical.com Fixes: `dff4ac75ee` ("ext4: move keep_towrite handling to ext4_bio_write_page()") Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260205092223.21287-2-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org	2026-03-27 23:29:39 -04:00
Zhang Yi	84e21e3fb8	ext4: do not check fast symlink during orphan recovery Commit '5f920d5d6083 ("ext4: verify fast symlink length")' causes the generic/475 test to fail during orphan cleanup of zero-length symlinks. generic/475 84s ... _check_generic_filesystem: filesystem on /dev/vde is inconsistent The fsck reports are provided below: Deleted inode 9686 has zero dtime. Deleted inode 158230 has zero dtime. ... Inode bitmap differences: -9686 -158230 Orphan file (inode 12) block 13 is not clean. Failed to initialize orphan file. In ext4_symlink(), a newly created symlink can be added to the orphan list due to ENOSPC. Its data has not been initialized, and its size is zero. Therefore, we need to disregard the length check of the symbolic link when cleaning up orphan inodes. Instead, we should ensure that the nlink count is zero. Fixes: `5f920d5d60` ("ext4: verify fast symlink length") Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260131091156.1733648-1-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org	2026-03-27 23:14:25 -04:00
Theodore Ts'o	82c4f23d27	Update MAINTAINERS file to add reviewers for ext4 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>	2026-03-27 23:14:20 -04:00
Linus Torvalds	f338e77383	Linux 7.0-rc4 v7.0-rc4	2026-03-15 13:52:05 -07:00
Linus Torvalds	5c2fe8d11a	Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "The one core change is a re-roll of the tag allocation fix from the last pull request that uses the correct goto to unroll all the allocations. The remianing fixes are all small ones in drivers" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: hisi_sas: Fix NULL pointer exception during user_scan() scsi: qla2xxx: Completely fix fcport double free scsi: ufs: core: Fix SError in ufshcd_rtc_work() during UFS suspend scsi: core: Fix error handling for scsi_alloc_sdev()	2026-03-15 13:15:39 -07:00
Linus Torvalds	d9bf296c39	Merge tag 'probes-fixes-v7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probes fixes from Masami Hiramatsu: - Avoid crash when rmmod/insmod after ftrace killed This fixes a kernel crash caused by kprobes on the symbol in a module which is unloaded after ftrace_kill() is called. - Remove unneeded warnings from __arm_kprobe_ftrace() Remove unneeded WARN messages which can be triggered if the kprobe is using ftrace and it fails to enable the ftrace. Since kprobes correctly handle such failure, we don't need to warn it. * tag 'probes-fixes-v7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: kprobes: Remove unneeded warnings from __arm_kprobe_ftrace() kprobes: avoid crash when rmmod/insmod after ftrace killed	2026-03-15 13:08:05 -07:00
Linus Torvalds	62cda74c79	Merge tag 'bootconfig-fixes-v7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull bootconfig fixes from Masami Hiramatsu: - fix off-by-one in xbc_verify_tree() unclosed brace error. This fixes a wrong error place in unclosed brace error message - check bounds before writing in __xbc_open_brace(). This fixes to check the array index before setting array, so that the bootconfig can support 16th-depth nested brace correctly - fix snprintf truncation check in xbc_node_compose_key_after(). This fixes to handle the return value of snprintf() correctly in case of the return value == size - Add bootconfig tests about braces Add test cases for checking error position about unclosed brace and ensuring supporting 16th depth nested braces correctly * tag 'bootconfig-fixes-v7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: bootconfig: Add bootconfig tests about braces lib/bootconfig: fix snprintf truncation check in xbc_node_compose_key_after() lib/bootconfig: check bounds before writing in __xbc_open_brace() lib/bootconfig: fix off-by-one in xbc_verify_tree() unclosed brace error	2026-03-15 12:50:05 -07:00
Linus Torvalds	11e8c7e947	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm fixes from Paolo Bonzini: "Quite a large pull request, partly due to skipping last week and therefore having material from ~all submaintainers in this one. About a fourth of it is a new selftest, and a couple more changes are large in number of files touched (fixing a -Wflex-array-member-not-at-end compiler warning) or lines changed (reformatting of a table in the API documentation, thanks rST). But who am I kidding---it's a lot of commits and there are a lot of bugs being fixed here, some of them on the nastier side like the RISC-V ones. ARM: - Correctly handle deactivation of interrupts that were activated from LRs. Since EOIcount only denotes deactivation of interrupts that are not present in an LR, start EOIcount deactivation walk after the last irq that made it into an LR - Avoid calling into the stubs to probe for ICH_VTR_EL2.TDS when pKVM is already enabled -- not only thhis isn't possible (pKVM will reject the call), but it is also useless: this can only happen for a CPU that has already booted once, and the capability will not change - Fix a couple of low-severity bugs in our S2 fault handling path, affecting the recently introduced LS64 handling and the even more esoteric handling of hwpoison in a nested context - Address yet another syzkaller finding in the vgic initialisation, where we would end-up destroying an uninitialised vgic with nasty consequences - Address an annoying case of pKVM failing to boot when some of the memblock regions that the host is faulting in are not page-aligned - Inject some sanity in the NV stage-2 walker by checking the limits against the advertised PA size, and correctly report the resulting faults PPC: - Fix a PPC e500 build error due to a long-standing wart that was exposed by the recent conversion to kmalloc_obj(); rip out all the ugliness that led to the wart RISC-V: - Prevent speculative out-of-bounds access using array_index_nospec() in APLIC interrupt handling, ONE_REG regiser access, AIA CSR access, float register access, and PMU counter access - Fix potential use-after-free issues in kvm_riscv_gstage_get_leaf(), kvm_riscv_aia_aplic_has_attr(), and kvm_riscv_aia_imsic_has_attr() - Fix potential null pointer dereference in kvm_riscv_vcpu_aia_rmw_topei() - Fix off-by-one array access in SBI PMU - Skip THP support check during dirty logging - Fix error code returned for Smstateen and Ssaia ONE_REG interface - Check host Ssaia extension when creating AIA irqchip x86: - Fix cases where CPUID mitigation features were incorrectly marked as available whenever the kernel used scattered feature words for them - Validate _all_ GVAs, rather than just the first GVA, when processing a range of GVAs for Hyper-V's TLB flush hypercalls - Fix a brown paper bug in add_atomic_switch_msr() - Use hlist_for_each_entry_srcu() when traversing mask_notifier_list, to fix a lockdep warning; KVM doesn't hold RCU, just irq_srcu - Ensure AVIC VMCB fields are initialized if the VM has an in-kernel local APIC (and AVIC is enabled at the module level) - Update CR8 write interception when AVIC is (de)activated, to fix a bug where the guest can run in perpetuity with the CR8 intercept enabled - Add a quirk to skip the consistency check on FREEZE_IN_SMM, i.e. to allow L1 hypervisors to set FREEZE_IN_SMM. This reverts (by default) an unintentional tightening of userspace ABI in 6.17, and provides some amount of backwards compatibility with hypervisors who want to freeze PMCs on VM-Entry - Validate the VMCS/VMCB on return to a nested guest from SMM, because either userspace or the guest could stash invalid values in memory and trigger the processor's consistency checks Generic: - Remove a subtle pseudo-overlay of kvm_stats_desc, which, aside from being unnecessary and confusing, triggered compiler warnings due to -Wflex-array-member-not-at-end - Document that vcpu->mutex is take outside of kvm->slots_lock and kvm->slots_arch_lock, which is intentional and desirable despite being rather unintuitive Selftests: - Increase the maximum number of NUMA nodes in the guest_memfd selftest to 64 (from 8)" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (43 commits) KVM: selftests: Verify SEV+ guests can read and write EFER, CR0, CR4, and CR8 Documentation: kvm: fix formatting of the quirks table KVM: x86: clarify leave_smm() return value selftests: kvm: add a test that VMX validates controls on RSM selftests: kvm: extract common functionality out of smm_test.c KVM: SVM: check validity of VMCB controls when returning from SMM KVM: VMX: check validity of VMCS controls when returning from SMM KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated KVM: SVM: Initialize AVIC VMCB fields if AVIC is enabled with in-kernel APIC KVM: x86: Introduce KVM_X86_QUIRK_VMCS12_ALLOW_FREEZE_IN_SMM KVM: x86: Fix SRCU list traversal in kvm_fire_mask_notifiers() KVM: VMX: Fix a wrong MSR update in add_atomic_switch_msr() KVM: x86: hyper-v: Validate all GVAs during PV TLB flush KVM: x86: synthesize CPUID bits only if CPU capability is set KVM: PPC: e500: Rip out "struct tlbe_ref" KVM: PPC: e500: Fix build error due to using kmalloc_obj() with wrong type KVM: selftests: Increase 'maxnode' for guest_memfd tests KVM: arm64: pkvm: Don't reprobe for ICH_VTR_EL2.TDS on CPU hotplug KVM: arm64: vgic: Pick EOIcount deactivations from AP-list tail KVM: arm64: Remove the redundant ISB in __kvm_at_s1e2() ...	2026-03-15 12:22:10 -07:00
Linus Torvalds	4f3df2e5ea	Merge tag 'powerpc-7.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Madhavan Srinivasan: - Fix KUAP warning in VMX usercopy path - Fix lockdep warning during PCI enumeration - Fix to move CMA reservations to arch_mm_preinit - Fix to check current->mm is alive before getting user callchain Thanks to Aboorva Devarajan, Christophe Leroy (CS GROUP), Dan Horák, Nicolin Chen, Nilay Shroff, Qiao Zhao, Ritesh Harjani (IBM), Saket Kumar Bhaskar, Sayali Patil, Shrikanth Hegde, Venkat Rao Bagalkote, and Viktor Malik. * tag 'powerpc-7.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/iommu: fix lockdep warning during PCI enumeration powerpc/selftests/copyloops: extend selftest to exercise __copy_tofrom_user_power7_vmx powerpc: fix KUAP warning in VMX usercopy path powerpc, perf: Check that current->mm is alive before getting user callchain powerpc/mem: Move CMA reservations to arch_mm_preinit	2026-03-15 11:36:11 -07:00
Linus Torvalds	13af67f599	Merge tag 'x86-urgent-2026-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Ingo Molnar: "Work around S2RAM hang if the firmware unexpectedly re-enables the x2apic hardware while it was disabled by the kernel. Force-disable it again and issue a warning into the syslog" * tag 'x86-urgent-2026-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/apic: Disable x2apic on resume if the kernel expects so	2026-03-15 11:26:36 -07:00
Linus Torvalds	164cb546e9	Merge tag 'timers-urgent-2026-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Ingo Molnar: "Fix function tracer recursion bug by marking jiffies_64_to_clock_t() notrace" * tag 'timers-urgent-2026-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: time/jiffies: Mark jiffies_64_to_clock_t() notrace	2026-03-15 11:14:09 -07:00
Linus Torvalds	63724e9519	Merge tag 'sched-urgent-2026-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "More MM-CID fixes, mostly fixing hangs/races: - Fix CID hangs due to a race between concurrent forks - Fix vfork()/CLONE_VM MMCID bug causing hangs - Remove pointless preemption guard - Fix CID task list walk performance regression on large systems by removing the known-flaky and slow counting logic using for_each_process_thread() in mm_cid_fixup_tasks_to_cpus(), and implementing a simple sched_mm_cid::node list instead" tag 'sched-urgent-2026-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/mmcid: Avoid full tasklist walks sched/mmcid: Remove pointless preempt guard sched/mmcid: Handle vfork()/CLONE_VM correctly sched/mmcid: Prevent CID stalls due to concurrent forks	2026-03-15 10:49:47 -07:00
Linus Torvalds	9745031130	Merge tag 'objtool-urgent-2026-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool fixes from Ingo Molnar: - Fix cross-build bug by using HOSTCFLAGS for HAVE_XXHASH test - Fix klp bug by fixing detection of corrupt static branch/call entries - Handle unsupported pr_debug() usage more gracefully - Fix hypothetical klp bug by avoiding NULL pointer dereference when printing code symbol name - Fix data alignment bug in elf_add_data() causing mangled strings - Fix confusing ERROR_INSN() error message - Handle unexpected Clang RSP musical chairs causing false positive warnings - Fix another objtool stack overflow in validate_branch() * tag 'objtool-urgent-2026-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Fix another stack overflow in validate_branch() objtool: Handle Clang RSP musical chairs objtool: Fix ERROR_INSN() error message objtool: Fix data alignment in elf_add_data() objtool: Use HOSTCFLAGS for HAVE_XXHASH test objtool/klp: Avoid NULL pointer dereference when printing code symbol name objtool/klp: Disable unsupported pr_debug() usage objtool/klp: Fix detection of corrupt static branch/call entries	2026-03-15 10:36:01 -07:00
Linus Torvalds	be2e3750ce	Merge tag 'irq-urgent-2026-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Ingo Molnar: "Two fixes for the riscv-aplic irqchip driver: - Fix probing dependency bug on probing failure - Fix double register_syscore() bug" * tag 'irq-urgent-2026-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/riscv-aplic: Register syscore operations only once irqchip/riscv-aplic: Do not clear ACPI dependencies on probe failure	2026-03-15 10:32:57 -07:00
Linus Torvalds	9a48d4a130	Merge tag 'i3c/fixes-for-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux Pull i3c fixes from Alexandre Belloni: "This introduces the I3C_OR_I2C symbol which is not a fix per se but is affecting multiple subsystems so it is included to ease synchronization. Apart from that, Adrian is mostly fixing the mipi-i3c-hci driver DMA handling, and I took the opportunity to add two fixes for the dw-i3c driver. Subsystem: - simplify combined i3c/i2c dependencies Drivers: - dw: handle 2C properly, fix possible race condition - mipi-i3c-hci: many DMA related fixes" * tag 'i3c/fixes-for-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux: i3c: dw-i3c-master: Set SIR_REJECT in DAT on device attach and reattach i3c: master: dw-i3c: Fix missing of_node for virtual I2C adapter i3c: mipi-i3c-hci: Fallback to software reset when bus disable fails i3c: mipi-i3c-hci: Fix handling of shared IRQs during early initialization i3c: mipi-i3c-hci: Fix race in DMA error handling in interrupt context i3c: mipi-i3c-hci: Consolidate common xfer processing logic i3c: mipi-i3c-hci: Restart DMA ring correctly after dequeue abort i3c: mipi-i3c-hci: Add missing TID field to no-op command descriptor i3c: mipi-i3c-hci: Correct RING_CTRL_ABORT handling in DMA dequeue i3c: mipi-i3c-hci: Fix race between DMA ring dequeue and interrupt handler i3c: mipi-i3c-hci: Fix race in DMA ring dequeue i3c: mipi-i3c-hci: Fix race in DMA ring enqueue for parallel xfers i3c: mipi-i3c-hci: Consolidate spinlocks i3c: mipi-i3c-hci: Factor out DMA mapping from queuing path i3c: mipi-i3c-hci: Fix Hot-Join NACK i3c: mipi-i3c-hci: Use ETIMEDOUT instead of ETIME for timeout errors i3c: simplify combined i3c/i2c dependencies	2026-03-14 16:25:10 -07:00
Linus Torvalds	f26de90c68	Merge tag 'i2c-for-7.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fix from Wolfram Sang: "Designware DT binding maintainer update" * tag 'i2c-for-7.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: dt-bindings: i2c: dw: Update maintainer	2026-03-14 16:15:49 -07:00

1 2 3 4 5 ...

1428449 Commits