linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-24 06:13:36 -04:00

Author	SHA1	Message	Date
Christoph Hellwig	7d460636b6	ntfs3: stop using write_cache_pages Stop using the obsolete write_cache_pages and use writeback_iter directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>	2025-09-10 11:01:41 +02:00
Moon Hee Lee	0dc7117da8	fs/ntfs3: reject index allocation if $BITMAP is empty but blocks exist Index allocation requires at least one bit in the $BITMAP attribute to track usage of index entries. If the bitmap is empty while index blocks are already present, this reflects on-disk corruption. syzbot triggered this condition using a malformed NTFS image. During a rename() operation involving a long filename (which spans multiple index entries), the empty bitmap allowed the name to be added without valid tracking. Subsequent deletion of the original entry failed with -ENOENT, due to unexpected index state. Reject such cases by verifying that the bitmap is not empty when index blocks exist. Reported-by: syzbot+b0373017f711c06ada64@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=b0373017f711c06ada64 Fixes: `d99208b919` ("fs/ntfs3: cancle set bad inode after removing name fails") Tested-by: syzbot+b0373017f711c06ada64@syzkaller.appspotmail.com Signed-off-by: Moon Hee Lee <moonhee.lee.ca@gmail.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>	2025-09-10 11:01:41 +02:00
Haoxiang Li	d68318471a	fs/ntfs3: Fix a resource leak bug in wnd_extend() Add put_bh() to decrease the refcount of 'bh' after the job is finished, preventing a resource leak. Fixes: `3f3b442b5a` ("fs/ntfs3: Add bitmap") Cc: stable@vger.kernel.org Signed-off-by: Haoxiang Li <haoxiang_li2024@163.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>	2025-09-10 11:01:40 +02:00
Vitaly Grigoryev	736fc7bf5f	fs: ntfs3: Fix integer overflow in run_unpack() The MFT record relative to the file being opened contains its runlist, an array containing information about the file's location on the physical disk. Analysis of all Call Stack paths showed that the values of the runlist array, from which LCNs are calculated, are not validated before run_unpack function. The run_unpack function decodes the compressed runlist data format from MFT attributes (for example, $DATA), converting them into a runs_tree structure, which describes the mapping of virtual clusters (VCN) to logical clusters (LCN). The NTFS3 subsystem also has a shortcut for deleting files from MFT records - in this case, the RUN_DEALLOCATE command is sent to the run_unpack input, and the function logic provides that all data transferred to the runlist about file or directory is deleted without creating a runs_tree structure. Substituting the runlist in the $DATA attribute of the MFT record for an arbitrary file can lead either to access to arbitrary data on the disk bypassing access checks to them (since the inode access check occurs above) or to destruction of arbitrary data on the disk. Add overflow check for addition operation. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: `4342306f0f` ("fs/ntfs3: Add file operations and implementation") Signed-off-by: Vitaly Grigoryev <Vitaly.Grigoryev@kaspersky.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>	2025-09-10 11:01:39 +02:00
Tetsuo Handa	4e8011ffec	ntfs3: pretend $Extend records as regular files Since commit `af153bb63a` ("vfs: catch invalid modes in may_open()") requires any inode be one of S_IFDIR/S_IFLNK/S_IFREG/S_IFCHR/S_IFBLK/ S_IFIFO/S_IFSOCK type, use S_IFREG for $Extend records. Reported-by: syzbot <syzbot+895c23f6917da440ed0d@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=895c23f6917da440ed0d Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>	2025-09-10 11:01:38 +02:00
Ethan Ferguson	21dc07ac9c	ntfs3: add FS_IOC_SETFSLABEL ioctl Add support for the FS_IOC_SETFSLABEL ioctl. Signed-off-by: Ethan Ferguson <ethan.ferguson@zetier.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>	2025-09-10 11:01:38 +02:00
Ethan Ferguson	e4dff97009	ntfs3: add FS_IOC_GETFSLABEL ioctl Add support for the FS_IOC_GETFSLABEL ioctl. Signed-off-by: Ethan Ferguson <ethan.ferguson@zetier.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>	2025-09-10 11:01:37 +02:00
Ethan Ferguson	80ff677b55	ntfs3: transition magic number to shared constant Use the common FSLABEL_MAX constant instead of a hardcoded magic constant of 256. Signed-off-by: Ethan Ferguson <ethan.ferguson@zetier.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>	2025-09-10 11:01:18 +02:00
Yuezhang Mo	181993bb0d	erofs: fix runtime warning on truncate_folio_batch_exceptionals() Commit 0e2f80afcfa6("fs/dax: ensure all pages are idle prior to filesystem unmount") introduced the WARN_ON_ONCE to capture whether the filesystem has removed all DAX entries or not and applied the fix to xfs and ext4. Apply the missed fix on erofs to fix the runtime warning: [ 5.266254] ------------[ cut here ]------------ [ 5.266274] WARNING: CPU: 6 PID: 3109 at mm/truncate.c:89 truncate_folio_batch_exceptionals+0xff/0x260 [ 5.266294] Modules linked in: [ 5.266999] CPU: 6 UID: 0 PID: 3109 Comm: umount Tainted: G S 6.16.0+ #6 PREEMPT(voluntary) [ 5.267012] Tainted: [S]=CPU_OUT_OF_SPEC [ 5.267017] Hardware name: Dell Inc. OptiPlex 5000/05WXFV, BIOS 1.5.1 08/24/2022 [ 5.267024] RIP: 0010:truncate_folio_batch_exceptionals+0xff/0x260 [ 5.267076] Code: 00 00 41 39 df 7f 11 eb 78 83 c3 01 49 83 c4 08 41 39 df 74 6c 48 63 f3 48 83 fe 1f 0f 83 3c 01 00 00 43 f6 44 26 08 01 74 df <0f> 0b 4a 8b 34 22 4c 89 ef 48 89 55 90 e8 ff 54 1f 00 48 8b 55 90 [ 5.267083] RSP: 0018:ffffc900013f36c8 EFLAGS: 00010202 [ 5.267095] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 [ 5.267101] RDX: ffffc900013f3790 RSI: 0000000000000000 RDI: ffff8882a1407898 [ 5.267108] RBP: ffffc900013f3740 R08: 0000000000000000 R09: 0000000000000000 [ 5.267113] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 [ 5.267119] R13: ffff8882a1407ab8 R14: ffffc900013f3888 R15: 0000000000000001 [ 5.267125] FS: 00007aaa8b437800(0000) GS:ffff88850025b000(0000) knlGS:0000000000000000 [ 5.267132] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 5.267138] CR2: 00007aaa8b3aac10 CR3: 000000024f764000 CR4: 0000000000f52ef0 [ 5.267144] PKRU: 55555554 [ 5.267150] Call Trace: [ 5.267154] <TASK> [ 5.267181] truncate_inode_pages_range+0x118/0x5e0 [ 5.267193] ? save_trace+0x54/0x390 [ 5.267296] truncate_inode_pages_final+0x43/0x60 [ 5.267309] evict+0x2a4/0x2c0 [ 5.267339] dispose_list+0x39/0x80 [ 5.267352] evict_inodes+0x150/0x1b0 [ 5.267376] generic_shutdown_super+0x41/0x180 [ 5.267390] kill_block_super+0x1b/0x50 [ 5.267402] erofs_kill_sb+0x81/0x90 [erofs] [ 5.267436] deactivate_locked_super+0x32/0xb0 [ 5.267450] deactivate_super+0x46/0x60 [ 5.267460] cleanup_mnt+0xc3/0x170 [ 5.267475] __cleanup_mnt+0x12/0x20 [ 5.267485] task_work_run+0x5d/0xb0 [ 5.267499] exit_to_user_mode_loop+0x144/0x170 [ 5.267512] do_syscall_64+0x2b9/0x7c0 [ 5.267523] ? __lock_acquire+0x665/0x2ce0 [ 5.267535] ? __lock_acquire+0x665/0x2ce0 [ 5.267560] ? lock_acquire+0xcd/0x300 [ 5.267573] ? find_held_lock+0x31/0x90 [ 5.267582] ? mntput_no_expire+0x97/0x4e0 [ 5.267606] ? mntput_no_expire+0xa1/0x4e0 [ 5.267625] ? mntput+0x24/0x50 [ 5.267634] ? path_put+0x1e/0x30 [ 5.267647] ? do_faccessat+0x120/0x2f0 [ 5.267677] ? do_syscall_64+0x1a2/0x7c0 [ 5.267686] ? from_kgid_munged+0x17/0x30 [ 5.267703] ? from_kuid_munged+0x13/0x30 [ 5.267711] ? __do_sys_getuid+0x3d/0x50 [ 5.267724] ? do_syscall_64+0x1a2/0x7c0 [ 5.267732] ? irqentry_exit+0x77/0xb0 [ 5.267743] ? clear_bhb_loop+0x30/0x80 [ 5.267752] ? clear_bhb_loop+0x30/0x80 [ 5.267765] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 5.267772] RIP: 0033:0x7aaa8b32a9fb [ 5.267781] Code: c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 e9 83 0d 00 f7 d8 [ 5.267787] RSP: 002b:00007ffd7c4c9468 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 [ 5.267796] RAX: 0000000000000000 RBX: 00005a61592a8b00 RCX: 00007aaa8b32a9fb [ 5.267802] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00005a61592b2080 [ 5.267806] RBP: 00007ffd7c4c9540 R08: 00007aaa8b403b20 R09: 0000000000000020 [ 5.267812] R10: 0000000000000001 R11: 0000000000000246 R12: 00005a61592a8c00 [ 5.267817] R13: 0000000000000000 R14: 00005a61592b2080 R15: 00005a61592a8f10 [ 5.267849] </TASK> [ 5.267854] irq event stamp: 4721 [ 5.267859] hardirqs last enabled at (4727): [<ffffffff814abf50>] __up_console_sem+0x90/0xa0 [ 5.267873] hardirqs last disabled at (4732): [<ffffffff814abf35>] __up_console_sem+0x75/0xa0 [ 5.267884] softirqs last enabled at (3044): [<ffffffff8132adb3>] kernel_fpu_end+0x53/0x70 [ 5.267895] softirqs last disabled at (3042): [<ffffffff8132b5f4>] kernel_fpu_begin_mask+0xc4/0x120 [ 5.267905] ---[ end trace 0000000000000000 ]--- Fixes: `bde708f1a6` ("fs/dax: always remove DAX page-cache entries when breaking layouts") Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Friendy Su <friendy.su@sony.com> Reviewed-by: Daniel Palmer <daniel.palmer@sony.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2025-09-10 14:11:06 +08:00
Paulo Alcantara	c5ea306558	smb: client: fix data loss due to broken rename(2) Rename of open files in SMB2+ has been broken for a very long time, resulting in data loss as the CIFS client would fail the rename(2) call with -ENOENT and then removing the target file. Fix this by implementing ->rename_pending_delete() for SMB2+, which will rename busy files to random filenames (e.g. silly rename) during unlink(2) or rename(2), and then marking them to delete-on-close. Besides, introduce a FIND_WR_NO_PENDING_DELETE flag to prevent open(2) from reusing open handles that had been marked as delete pending. Handle it in cifs_get_readable_path() as well. Reported-by: Jean-Baptiste Denis <jbdenis@pasteur.fr> Closes: https://marc.info/?i=16aeb380-30d4-4551-9134-4e7d1dc833c0@pasteur.fr Reviewed-by: David Howells <dhowells@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Cc: Frank Sorenson <sorenson@redhat.com> Cc: Olga Kornievskaia <okorniev@redhat.com> Cc: Benjamin Coddington <bcodding@redhat.com> Cc: Scott Mayhew <smayhew@redhat.com> Cc: linux-cifs@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>	2025-09-09 18:39:58 -05:00
Paulo Alcantara	90f7c100d2	smb: client: fix compound alignment with encryption The encryption layer can't handle the padding iovs, so flatten the compound request into a single buffer with required padding to prevent the server from dropping the connection when finding unaligned compound requests. Fixes: `bc925c1216` ("smb: client: improve compound padding in encryption") Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Reviewed-by: David Howells <dhowells@redhat.com> Cc: linux-cifs@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>	2025-09-09 17:30:11 -05:00
Jaegeuk Kim	a33be64b98	f2fs: fix wrong layout information on 16KB page This patch fixes to support different block size. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-09-09 19:33:43 +00:00
Kang Chen	bea3e1d446	hfsplus: fix slab-out-of-bounds read in hfsplus_uni2asc() BUG: KASAN: slab-out-of-bounds in hfsplus_uni2asc+0xa71/0xb90 fs/hfsplus/unicode.c:186 Read of size 2 at addr ffff8880289ef218 by task syz.6.248/14290 CPU: 0 UID: 0 PID: 14290 Comm: syz.6.248 Not tainted 6.16.4 #1 PREEMPT(full) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x116/0x1b0 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:378 [inline] print_report+0xca/0x5f0 mm/kasan/report.c:482 kasan_report+0xca/0x100 mm/kasan/report.c:595 hfsplus_uni2asc+0xa71/0xb90 fs/hfsplus/unicode.c:186 hfsplus_listxattr+0x5b6/0xbd0 fs/hfsplus/xattr.c:738 vfs_listxattr+0xbe/0x140 fs/xattr.c:493 listxattr+0xee/0x190 fs/xattr.c:924 filename_listxattr fs/xattr.c:958 [inline] path_listxattrat+0x143/0x360 fs/xattr.c:988 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xcb/0x4c0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fe0e9fae16d Code: 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fe0eae67f98 EFLAGS: 00000246 ORIG_RAX: 00000000000000c3 RAX: ffffffffffffffda RBX: 00007fe0ea205fa0 RCX: 00007fe0e9fae16d RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000200000000000 RBP: 00007fe0ea0480f0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007fe0ea206038 R14: 00007fe0ea205fa0 R15: 00007fe0eae48000 </TASK> Allocated by task 14290: kasan_save_stack+0x24/0x50 mm/kasan/common.c:47 kasan_save_track+0x14/0x30 mm/kasan/common.c:68 poison_kmalloc_redzone mm/kasan/common.c:377 [inline] __kasan_kmalloc+0xaa/0xb0 mm/kasan/common.c:394 kasan_kmalloc include/linux/kasan.h:260 [inline] __do_kmalloc_node mm/slub.c:4333 [inline] __kmalloc_noprof+0x219/0x540 mm/slub.c:4345 kmalloc_noprof include/linux/slab.h:909 [inline] hfsplus_find_init+0x95/0x1f0 fs/hfsplus/bfind.c:21 hfsplus_listxattr+0x331/0xbd0 fs/hfsplus/xattr.c:697 vfs_listxattr+0xbe/0x140 fs/xattr.c:493 listxattr+0xee/0x190 fs/xattr.c:924 filename_listxattr fs/xattr.c:958 [inline] path_listxattrat+0x143/0x360 fs/xattr.c:988 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xcb/0x4c0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f When hfsplus_uni2asc is called from hfsplus_listxattr, it actually passes in a struct hfsplus_attr_unistr. The size of the corresponding structure is different from that of hfsplus_unistr, so the previous fix (`94458781ae`) is insufficient. The pointer on the unicode buffer is still going beyond the allocated memory. This patch introduces two warpper functions hfsplus_uni2asc_xattr_str and hfsplus_uni2asc_str to process two unicode buffers, struct hfsplus_attr_unistr and struct hfsplus_unistr* respectively. When ustrlen value is bigger than the allocated memory size, the ustrlen value is limited to an safe size. Fixes: `94458781ae` ("hfsplus: fix slab-out-of-bounds read in hfsplus_uni2asc()") Signed-off-by: Kang Chen <k.chen@smail.nju.edu.cn> Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com> Link: https://lore.kernel.org/r/20250909031316.1647094-1-k.chen@smail.nju.edu.cn Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>	2025-09-09 11:44:38 -07:00
Keith Busch	7eac331869	iomap: simplify direct io validity check The block layer checks all the segments for validity later, so no need for an early check. Just reduce it to a simple position and total length check, and defer the more invasive segment checks to the block layer. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-09-09 10:27:01 -06:00
Keith Busch	743bf2e0c4	block: add size alignment to bio_iov_iter_get_pages The block layer tries to align bio vectors to the block device's logical block size. Some cases don't have a block device, or we may need to align to something larger, which we can't derive it from the queue limits. Have the caller specify what they want, or allow any length alignment if nothing was specified. Since the most common use case relies on the block device's limits, a helper function is provided. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-09-09 10:27:01 -06:00
Christoph Hellwig	d86eaa0f3c	block: remove the bi_inline_vecs variable sized array from struct bio Bios are embedded into other structures, and at least spare is unhappy about embedding structures with variable sized arrays. There's no real need to the array anyway, we can replace it with a helper pointing to the memory just behind the bio, and with the previous cleanups there is very few site doing anything special with it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-09-09 07:31:59 -06:00
Christoph Hellwig	70a6f71b1a	block: add a bio_init_inline helper Just a simpler wrapper around bio_init for callers that want to initialize a bio with inline bvecs. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-09-09 07:31:59 -06:00
Max Kellermann	249e0a47cd	ceph: fix crash after fscrypt_encrypt_pagecache_blocks() error The function move_dirty_folio_in_page_array() was created by commit `ce80b76dd3` ("ceph: introduce ceph_process_folio_batch() method") by moving code from ceph_writepages_start() to this function. This new function is supposed to return an error code which is checked by the caller (now ceph_process_folio_batch()), and on error, the caller invokes redirty_page_for_writepage() and then breaks from the loop. However, the refactoring commit has gone wrong, and it by accident, it always returns 0 (= success) because it first NULLs the pointer and then returns PTR_ERR(NULL) which is always 0. This means errors are silently ignored, leaving NULL entries in the page array, which may later crash the kernel. The simple solution is to call PTR_ERR() before clearing the pointer. Cc: stable@vger.kernel.org Fixes: `ce80b76dd3` ("ceph: introduce ceph_process_folio_batch() method") Link: https://lore.kernel.org/ceph-devel/aK4v548CId5GIKG1@swift.blarg.de/ Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2025-09-09 12:57:03 +02:00
Max Kellermann	cce7c15faa	ceph: always call ceph_shift_unused_folios_left() The function ceph_process_folio_batch() sets folio_batch entries to NULL, which is an illegal state. Before folio_batch_release() crashes due to this API violation, the function ceph_shift_unused_folios_left() is supposed to remove those NULLs from the array. However, since commit `ce80b76dd3` ("ceph: introduce ceph_process_folio_batch() method"), this shifting doesn't happen anymore because the "for" loop got moved to ceph_process_folio_batch(), and now the `i` variable that remains in ceph_writepages_start() doesn't get incremented anymore, making the shifting effectively unreachable much of the time. Later, commit `1551ec61dc` ("ceph: introduce ceph_submit_write() method") added more preconditions for doing the shift, replacing the `i` check (with something that is still just as broken): - if ceph_process_folio_batch() fails, shifting never happens - if ceph_move_dirty_page_in_page_array() was never called (because ceph_process_folio_batch() has returned early for some of various reasons), shifting never happens - if `processed_in_fbatch` is zero (because ceph_process_folio_batch() has returned early for some of the reasons mentioned above or because ceph_move_dirty_page_in_page_array() has failed), shifting never happens Since those two commits, any problem in ceph_process_folio_batch() could crash the kernel, e.g. this way: BUG: kernel NULL pointer dereference, address: 0000000000000034 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: Oops: 0002 [#1] SMP NOPTI CPU: 172 UID: 0 PID: 2342707 Comm: kworker/u778:8 Not tainted 6.15.10-cm4all1-es #714 NONE Hardware name: Dell Inc. PowerEdge R7615/0G9DHV, BIOS 1.6.10 12/08/2023 Workqueue: writeback wb_workfn (flush-ceph-1) RIP: 0010:folios_put_refs+0x85/0x140 Code: 83 c5 01 39 e8 7e 76 48 63 c5 49 8b 5c c4 08 b8 01 00 00 00 4d 85 ed 74 05 41 8b 44 ad 00 48 8b 15 b0 > RSP: 0018:ffffb880af8db778 EFLAGS: 00010207 RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000003 RDX: ffffe377cc3b0000 RSI: 0000000000000000 RDI: ffffb880af8db8c0 RBP: 0000000000000000 R08: 000000000000007d R09: 000000000102b86f R10: 0000000000000001 R11: 00000000000000ac R12: ffffb880af8db8c0 R13: 0000000000000000 R14: 0000000000000000 R15: ffff9bd262c97000 FS: 0000000000000000(0000) GS:ffff9c8efc303000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000034 CR3: 0000000160958004 CR4: 0000000000770ef0 PKRU: 55555554 Call Trace: <TASK> ceph_writepages_start+0xeb9/0x1410 The crash can be reproduced easily by changing the ceph_check_page_before_write() return value to `-E2BIG`. (Interestingly, the crash happens only if `huge_zero_folio` has already been allocated; without `huge_zero_folio`, is_huge_zero_folio(NULL) returns true and folios_put_refs() skips NULL entries instead of dereferencing them. That makes reproducing the bug somewhat unreliable. See https://lore.kernel.org/20250826231626.218675-1-max.kellermann@ionos.com for a discussion of this detail.) My suggestion is to move the ceph_shift_unused_folios_left() to right after ceph_process_folio_batch() to ensure it always gets called to fix up the illegal folio_batch state. Cc: stable@vger.kernel.org Fixes: `ce80b76dd3` ("ceph: introduce ceph_process_folio_batch() method") Link: https://lore.kernel.org/ceph-devel/aK4v548CId5GIKG1@swift.blarg.de/ Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2025-09-09 12:57:02 +02:00
Alex Markuze	bec324f33d	ceph: fix race condition where r_parent becomes stale before sending message When the parent directory's i_rwsem is not locked, req->r_parent may become stale due to concurrent operations (e.g. rename) between dentry lookup and message creation. Validate that r_parent matches the encoded parent inode and update to the correct inode if a mismatch is detected. [ idryomov: folded a follow-up fix from Alex to drop extra reference from ceph_get_reply_dir() in ceph_fill_trace(): ceph_get_reply_dir() may return a different, referenced inode when r_parent is stale and the parent directory lock is not held. ceph_fill_trace() used that inode but failed to drop the reference when it differed from req->r_parent, leaking an inode reference. Keep the directory inode in a local variable and iput() it at function end if it does not match req->r_parent. ] Cc: stable@vger.kernel.org Signed-off-by: Alex Markuze <amarkuze@redhat.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2025-09-09 12:57:02 +02:00
Alex Markuze	15f519e9f8	ceph: fix race condition validating r_parent before applying state Add validation to ensure the cached parent directory inode matches the directory info in MDS replies. This prevents client-side race conditions where concurrent operations (e.g. rename) cause r_parent to become stale between request initiation and reply processing, which could lead to applying state changes to incorrect directory inodes. [ idryomov: folded a kerneldoc fixup and a follow-up fix from Alex to move CEPH_CAP_PIN reference when r_parent is updated: When the parent directory lock is not held, req->r_parent can become stale and is updated to point to the correct inode. However, the associated CEPH_CAP_PIN reference was not being adjusted. The CEPH_CAP_PIN is a reference on an inode that is tracked for accounting purposes. Moving this pin is important to keep the accounting balanced. When the pin was not moved from the old parent to the new one, it created two problems: The reference on the old, stale parent was never released, causing a reference leak. A reference for the new parent was never acquired, creating the risk of a reference underflow later in ceph_mdsc_release_request(). This patch corrects the logic by releasing the pin from the old parent and acquiring it for the new parent when r_parent is switched. This ensures reference accounting stays balanced. ] Cc: stable@vger.kernel.org Signed-off-by: Alex Markuze <amarkuze@redhat.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2025-09-09 12:57:02 +02:00
Reinette Chatre	d2e1b84c51	fs/resctrl: Eliminate false positive lockdep warning when reading SNC counters Running resctrl_tests on an SNC-2 system with lockdep debugging enabled triggers several warnings with following trace: WARNING: CPU: 0 PID: 1914 at kernel/cpu.c:528 lockdep_assert_cpus_held ... Call Trace: __mon_event_count ? __lock_acquire ? __pfx___mon_event_count mon_event_count ? __pfx_smp_mon_event_count smp_mon_event_count smp_call_on_cpu_callback get_cpu_cacheinfo_level() called from __mon_event_count() requires CPU hotplug lock to be held. The hotplug lock is indeed held during this time, as confirmed by the lockdep_assert_cpus_held() within mon_event_read() that calls mon_event_count() via IPI, but the lockdep tracking is not able to follow the IPI. Fresh CPU cache information via get_cpu_cacheinfo_level() from __mon_event_count() was added to support the fix for the issue where resctrl inappropriately maintained links to L3 cache information that will be stale in the case when the associated CPU goes offline. Keep the cacheinfo ID in struct rdt_mon_domain to ensure that resctrl does not maintain stale cache information while CPUs can go offline. Return to using a pointer to the L3 cache information (struct cacheinfo) in struct rmid_read, rmid_read::ci. Initialize rmid_read::ci before the IPI where it is used. CPU hotplug lock is held across rmid_read::ci initialization and use to ensure that it points to accurate cache information. Fixes: `594902c986` ("x86,fs/resctrl: Remove inappropriate references to cacheinfo in the resctrl subsystem") Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>	2025-09-09 12:43:36 +02:00
wangzijie	0ce9398aa0	proc: fix type confusion in pde_set_flags() Commit `2ce3d282bd` ("proc: fix missing pde_set_flags() for net proc files") missed a key part in the definition of proc_dir_entry: union { const struct proc_ops proc_ops; const struct file_operations proc_dir_ops; }; So dereference of ->proc_ops assumes it is a proc_ops structure results in type confusion and make NULL check for 'proc_ops' not work for proc dir. Add !S_ISDIR(dp->mode) test before calling pde_set_flags() to fix it. Link: https://lkml.kernel.org/r/20250904135715.3972782-1-wangzijie1@honor.com Fixes: `2ce3d282bd` ("proc: fix missing pde_set_flags() for net proc files") Signed-off-by: wangzijie <wangzijie1@honor.com> Reported-by: Brad Spengler <spender@grsecurity.net> Closes: https://lore.kernel.org/all/20250903065758.3678537-1-wangzijie1@honor.com/ Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jiri Slaby <jirislaby@kernel.org> Cc: Stefano Brivio <sbrivio@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-08 23:45:12 -07:00
Mark Tinguely	04100f775c	ocfs2: fix recursive semaphore deadlock in fiemap call syzbot detected a OCFS2 hang due to a recursive semaphore on a FS_IOC_FIEMAP of the extent list on a specially crafted mmap file. context_switch kernel/sched/core.c:5357 [inline] __schedule+0x1798/0x4cc0 kernel/sched/core.c:6961 __schedule_loop kernel/sched/core.c:7043 [inline] schedule+0x165/0x360 kernel/sched/core.c:7058 schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:7115 rwsem_down_write_slowpath+0x872/0xfe0 kernel/locking/rwsem.c:1185 __down_write_common kernel/locking/rwsem.c:1317 [inline] __down_write kernel/locking/rwsem.c:1326 [inline] down_write+0x1ab/0x1f0 kernel/locking/rwsem.c:1591 ocfs2_page_mkwrite+0x2ff/0xc40 fs/ocfs2/mmap.c:142 do_page_mkwrite+0x14d/0x310 mm/memory.c:3361 wp_page_shared mm/memory.c:3762 [inline] do_wp_page+0x268d/0x5800 mm/memory.c:3981 handle_pte_fault mm/memory.c:6068 [inline] __handle_mm_fault+0x1033/0x5440 mm/memory.c:6195 handle_mm_fault+0x40a/0x8e0 mm/memory.c:6364 do_user_addr_fault+0x764/0x1390 arch/x86/mm/fault.c:1387 handle_page_fault arch/x86/mm/fault.c:1476 [inline] exc_page_fault+0x76/0xf0 arch/x86/mm/fault.c:1532 asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:623 RIP: 0010:copy_user_generic arch/x86/include/asm/uaccess_64.h:126 [inline] RIP: 0010:raw_copy_to_user arch/x86/include/asm/uaccess_64.h:147 [inline] RIP: 0010:_inline_copy_to_user include/linux/uaccess.h:197 [inline] RIP: 0010:_copy_to_user+0x85/0xb0 lib/usercopy.c:26 Code: e8 00 bc f7 fc 4d 39 fc 72 3d 4d 39 ec 77 38 e8 91 b9 f7 fc 4c 89 f7 89 de e8 47 25 5b fd 0f 01 cb 4c 89 ff 48 89 d9 4c 89 f6 <f3> a4 0f 1f 00 48 89 cb 0f 01 ca 48 89 d8 5b 41 5c 41 5d 41 5e 41 RSP: 0018:ffffc9000403f950 EFLAGS: 00050256 RAX: ffffffff84c7f101 RBX: 0000000000000038 RCX: 0000000000000038 RDX: 0000000000000000 RSI: ffffc9000403f9e0 RDI: 0000200000000060 RBP: ffffc9000403fa90 R08: ffffc9000403fa17 R09: 1ffff92000807f42 R10: dffffc0000000000 R11: fffff52000807f43 R12: 0000200000000098 R13: 00007ffffffff000 R14: ffffc9000403f9e0 R15: 0000200000000060 copy_to_user include/linux/uaccess.h:225 [inline] fiemap_fill_next_extent+0x1c0/0x390 fs/ioctl.c:145 ocfs2_fiemap+0x888/0xc90 fs/ocfs2/extent_map.c:806 ioctl_fiemap fs/ioctl.c:220 [inline] do_vfs_ioctl+0x1173/0x1430 fs/ioctl.c:532 __do_sys_ioctl fs/ioctl.c:596 [inline] __se_sys_ioctl+0x82/0x170 fs/ioctl.c:584 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f5f13850fd9 RSP: 002b:00007ffe3b3518b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 0000200000000000 RCX: 00007f5f13850fd9 RDX: 0000200000000040 RSI: 00000000c020660b RDI: 0000000000000004 RBP: 6165627472616568 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe3b3518f0 R13: 00007ffe3b351b18 R14: 431bde82d7b634db R15: 00007f5f1389a03b ocfs2_fiemap() takes a read lock of the ip_alloc_sem semaphore (since v2.6.22-527-g7307de80510a) and calls fiemap_fill_next_extent() to read the extent list of this running mmap executable. The user supplied buffer to hold the fiemap information page faults calling ocfs2_page_mkwrite() which will take a write lock (since v2.6.27-38-g00dc417fa3e7) of the same semaphore. This recursive semaphore will hold filesystem locks and causes a hang of the fileystem. The ip_alloc_sem protects the inode extent list and size. Release the read semphore before calling fiemap_fill_next_extent() in ocfs2_fiemap() and ocfs2_fiemap_inline(). This does an unnecessary semaphore lock/unlock on the last extent but simplifies the error path. Link: https://lkml.kernel.org/r/61d1a62b-2631-4f12-81e2-cd689914360b@oracle.com Fixes: `00dc417fa3` ("ocfs2: fiemap support") Signed-off-by: Mark Tinguely <mark.tinguely@oracle.com> Reported-by: syzbot+541dcc6ee768f77103e7@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=541dcc6ee768f77103e7 Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-08 23:45:11 -07:00
Chao Yu	d0236266cb	f2fs: clean up error handing of f2fs_submit_page_read() Below two functions should never fail, clean up error handling in their callers: 1) f2fs_grab_read_bio() in f2fs_submit_page_read() 2) bio_add_folio() in f2fs_submit_page_read() Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-09-09 03:26:41 +00:00
Chao Yu	2f84e99d61	f2fs: avoid unnecessary folio_clear_uptodate() for cleanup In error path of __get_node_folio(), if the folio is not uptodate, let's avoid unnecessary folio_clear_uptodate() for cleanup. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-09-09 03:26:41 +00:00
Jaegeuk Kim	44749759d5	f2fs: merge FUA command with the existing writes FUA writes can be merged to the existing write IOs. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-09-09 03:26:36 +00:00
Jonathan Curley	dd2fa82473	NFSv4/flexfiles: Fix layout merge mirror check. Typo in ff_lseg_match_mirrors makes the diff ineffective. This results in merge happening all the time. Merge happening all the time is problematic because it marks lsegs invalid. Marking lsegs invalid causes all outstanding IO to get restarted with EAGAIN and connections to get closed. Closing connections constantly triggers race conditions in the RDMA implementation... Fixes: `660d1eb223` ("pNFS/flexfile: Don't merge layout segments if the mirrors don't match") Signed-off-by: Jonathan Curley <jcurley@purestorage.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2025-09-08 14:37:55 -04:00
Yikang Yue	32058c38d3	fs/hpfs: Fix error code for new_inode() failure in mkdir/create/mknod/symlink The function call new_inode() is a primitive for allocating an inode in memory, rather than planning disk space for it. Therefore, -ENOMEM should be returned as the error code rather than -ENOSPC. To be specific, new_inode()'s call path looks like this: new_inode new_inode_pseudo alloc_inode ops->alloc_inode (hpfs_alloc_inode) alloc_inode_sb kmem_cache_alloc_lru Therefore, the failure of new_inode() indicates a memory presure issue (-ENOMEM), not a lack of disk space. However, the current implementation of hpfs_mkdir/create/mknod/symlink incorrectly returns -ENOSPC when new_inode() fails. This patch fix this by set err to -ENOMEM before the goto statement. BTW, we also noticed that other nested calls within these four functions, like hpfs_alloc_f/dnode and hpfs_add_dirent, might also fail due to memory presure. But similarly, only -ENOSPC is returned. Addressing these will involve code modifications in other functions, and we plan to submit dedicated patches for these issues in the future. For this patch, we focus on new_inode(). Signed-off-by: Yikang Yue <yikangy2@illinois.edu> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>	2025-09-08 17:26:05 +02:00
Su Hui	fd8a620f19	hpfs: Replace simple_strtoul with kstrtoint in hpfs_parse_param kstrtoint() is better because simple_strtoul() ignores overflow and the type of 'timeshift' is 'int' rather than 'unsigned long'. Using kstrtoint() is more concise too. Signed-off-by: Su Hui <suhui@nfschina.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>	2025-09-08 17:23:34 +02:00
Linus Torvalds	f777d1112e	Merge tag 'vfs-6.17-rc6.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: "fuse: - Prevent opening of non-regular backing files. Fuse doesn't support non-regular files anyway. - Check whether copy_file_range() returns a larger size than requested. - Prevent overflow in copy_file_range() as fuse currently only supports 32-bit sized copies. - Cache the blocksize value if the server returned a new value as inode->i_blkbits isn't modified directly anymore. - Fix i_blkbits handling for iomap partial writes. By default i_blkbits is set to PAGE_SIZE which causes iomap to mark the whole folio as uptodate even on a partial write. But fuseblk filesystems support choosing a blocksize smaller than PAGE_SIZE risking data corruption. Simply enforce PAGE_SIZE as blocksize for fuseblk's internal inode for now. - Prevent out-of-bounds acces in fuse_dev_write() when the number of bytes to be retrieved is truncated to the fc->max_pages limit. virtiofs: - Fix page faults for DAX page addresses. Misc: - Tighten file handle decoding from userns. Check that the decoded dentry itself has a valid idmapping in the user namespace. - Fix mount-notify selftests. - Fix some indentation errors. - Add an FMODE_ flag to indicate IOCB_HAS_METADATA availability. This will be moved to an FOP_* flag with a bit more rework needed for that to happen not suitable for a fix. - Don't silently ignore metadata for sync read/write. - Don't pointlessly log warning when reading coredump sysctls" * tag 'vfs-6.17-rc6.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fuse: virtio_fs: fix page fault for DAX page address selftests/fs/mount-notify: Fix compilation failure. fhandle: use more consistent rules for decoding file handle from userns fuse: Block access to folio overlimit fuse: fix fuseblk i_blkbits for iomap partial writes fuse: reflect cached blocksize if blocksize was changed fuse: prevent overflow in copy_file_range return value fuse: check if copy_file_range() returns larger than requested size fuse: do not allow mapping a non-regular backing file coredump: don't pointlessly check and spew warnings fs: fix indentation style block: don't silently ignore metadata for sync read/write fs: add a FMODE_ flag to indicate IOCB_HAS_METADATA availability Please enter a commit message to explain why this merge is necessary, especially if it merges an updated upstream into a topic branch.	2025-09-08 07:53:01 -07:00
Gustavo A. R. Silva	68a7449062	fs: hpfs: Avoid multiple -Wflex-array-member-not-at-end warnings -Wflex-array-member-not-at-end was introduced in GCC-14, and we are getting ready to enable it, globally. So, in order to avoid ending up with a flexible-array member in the middle of other structs, we use the `struct_group_tagged()` helper to create a new tagged `struct bplus_header_fixed`. This structure groups together all the members of the flexible `struct bplus_header` except the flexible array. As a result, the array is effectively separated from the rest of the members without modifying the memory layout of the flexible structure. We then change the type of the middle struct member currently causing trouble from `struct bplus_header` to `struct bplus_header_fixed`. We also want to ensure that when new members need to be added to the flexible structure, they are always included within the newly created tagged struct. For this, we use `static_assert()`. This ensures that the memory layout for both the flexible structure and the new tagged struct is the same after any changes. This approach avoids having to implement `struct bplus_header_fixed` as a completely separate structure, thus preventing having to maintain two independent but basically identical structures, closing the door to potential bugs in the future. We also use `container_of()` (via a wrapper) whenever we need to retrieve a pointer to the flexible structure, through which we can access the flexible-array member, if necessary. So, with these changes, fix 26 of the following type of warnings: fs/hpfs/hpfs.h:456:23: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] fs/hpfs/hpfs.h:498:23: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>	2025-09-08 16:28:38 +02:00
Trond Myklebust	c12b6a7b12	NFS: Fix the marking of the folio as up to date Since all callers of nfs_page_group_covers_page() have already ensured that there is only one group member, all that is required is to check that the entire folio contains dirty data. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2025-09-06 16:51:26 -04:00
Trond Myklebust	b7b8574225	NFS: nfs_invalidate_folio() must observe the offset and size arguments If we're truncating part of the folio, then we need to write out the data on the part that is not covered by the cancellation. Fixes: `d47992f86b` ("mm: change invalidatepage prototype to accept length") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2025-09-06 16:51:26 -04:00
Trond Myklebust	ca247c8990	NFSv4.2: Serialise O_DIRECT i/o and copy range Ensure that all O_DIRECT reads and writes complete before copying a file range, so that the destination is up to date. Fixes: `a5864c999d` ("NFS: Do not serialise O_DIRECT reads and writes") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2025-09-06 16:51:25 -04:00
Trond Myklebust	c80ebeba11	NFSv4.2: Serialise O_DIRECT i/o and clone range Ensure that all O_DIRECT reads and writes complete before cloning a file range, so that both the source and destination are up to date. Fixes: `a5864c999d` ("NFS: Do not serialise O_DIRECT reads and writes") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2025-09-06 16:51:25 -04:00
Trond Myklebust	b93128f297	NFSv4.2: Serialise O_DIRECT i/o and fallocate() Ensure that all O_DIRECT reads and writes complete before calling fallocate so that we don't race w.r.t. attribute updates. Fixes: `99f2378322` ("NFSv4.2: Always flush out writes in nfs42_proc_fallocate()") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2025-09-06 16:51:25 -04:00
Trond Myklebust	9eb90f4354	NFS: Serialise O_DIRECT i/o and truncate() Ensure that all O_DIRECT reads and writes are complete, and prevent the initiation of new i/o until the setattr operation that will truncate the file is complete. Fixes: `a5864c999d` ("NFS: Do not serialise O_DIRECT reads and writes") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2025-09-06 16:51:25 -04:00
Trond Myklebust	b2036bb651	NFSv4.2: Protect copy offload and clone against 'eof page pollution' The NFSv4.2 copy offload and clone functions can also end up extending the size of the destination file, so they too need to call nfs_truncate_last_folio(). Reported-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2025-09-06 16:51:25 -04:00
Trond Myklebust	b1817b18ff	NFS: Protect against 'eof page pollution' This commit fixes the failing xfstest 'generic/363'. When the user mmaps() an area that extends beyond the end of file, and proceeds to write data into the folio that straddles that eof, we're required to discard that folio data if the user calls some function that extends the file length. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2025-09-06 16:51:25 -04:00
Tigran Mkrtchyan	5a46d2339a	flexfiles/pNFS: fix NULL checks on result of ff_layout_choose_ds_for_read Recent commit `f06bedfa62` ("pNFS/flexfiles: don't attempt pnfs on fatal DS errors") has changed the error return type of ff_layout_choose_ds_for_read() from NULL to an error pointer. However, not all code paths have been updated to match the change. Thus, some non-NULL checks will accept error pointers as a valid return value. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Suggested-by: Dan Carpenter <dan.carpenter@linaro.org> Fixes: `f06bedfa62` ("pNFS/flexfiles: don't attempt pnfs on fatal DS errors") Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2025-09-06 16:51:25 -04:00
Mike Snitzer	d3684397ea	nfs/localio: avoid bouncing LOCALIO if nfs_client_is_local() Previously nfs_local_probe() was made to disable and then attempt to re-enable LOCALIO (via LOCALIO protocol handshake) if/when it was called and LOCALIO already enabled. Vague memory for _why_ this was the case is that this was useful if/when a local NFS server were to be restarted with a local NFS client connected to it. But as it happens this causes an absurd amount of LOCALIO flapping which has a side-effect of too much IO being needlessly sent to NFSD (using RPC over the loopback network interface). This is the definition of "serious performance loss" (that negates the point of having LOCALIO). So remove this mis-optimization for re-enabling LOCALIO if/when an NFS server is restarted (which is an extremely rare thing to do). Will revisit testing that scenario again but in the meantime this patch restores the full benefit of LOCALIO. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2025-09-06 16:51:21 -04:00
Chen Ridong	3c9ba2777d	kernfs: Fix UAF in polling when open file is released A use-after-free (UAF) vulnerability was identified in the PSI (Pressure Stall Information) monitoring mechanism: BUG: KASAN: slab-use-after-free in psi_trigger_poll+0x3c/0x140 Read of size 8 at addr ffff3de3d50bd308 by task systemd/1 psi_trigger_poll+0x3c/0x140 cgroup_pressure_poll+0x70/0xa0 cgroup_file_poll+0x8c/0x100 kernfs_fop_poll+0x11c/0x1c0 ep_item_poll.isra.0+0x188/0x2c0 Allocated by task 1: cgroup_file_open+0x88/0x388 kernfs_fop_open+0x73c/0xaf0 do_dentry_open+0x5fc/0x1200 vfs_open+0xa0/0x3f0 do_open+0x7e8/0xd08 path_openat+0x2fc/0x6b0 do_filp_open+0x174/0x368 Freed by task 8462: cgroup_file_release+0x130/0x1f8 kernfs_drain_open_files+0x17c/0x440 kernfs_drain+0x2dc/0x360 kernfs_show+0x1b8/0x288 cgroup_file_show+0x150/0x268 cgroup_pressure_write+0x1dc/0x340 cgroup_file_write+0x274/0x548 Reproduction Steps: 1. Open test/cpu.pressure and establish epoll monitoring 2. Disable monitoring: echo 0 > test/cgroup.pressure 3. Re-enable monitoring: echo 1 > test/cgroup.pressure The race condition occurs because: 1. When cgroup.pressure is disabled (echo 0 > cgroup.pressure), it: - Releases PSI triggers via cgroup_file_release() - Frees of->priv through kernfs_drain_open_files() 2. While epoll still holds reference to the file and continues polling 3. Re-enabling (echo 1 > cgroup.pressure) accesses freed of->priv epolling disable/enable cgroup.pressure fd=open(cpu.pressure) while(1) ... epoll_wait kernfs_fop_poll kernfs_get_active = true echo 0 > cgroup.pressure ... cgroup_file_show kernfs_show // inactive kn kernfs_drain_open_files cft->release(of); kfree(ctx); ... kernfs_get_active = false echo 1 > cgroup.pressure kernfs_show kernfs_activate_one(kn); kernfs_fop_poll kernfs_get_active = true cgroup_file_poll psi_trigger_poll // UAF ... end: close(fd) To address this issue, introduce kernfs_get_active_of() for kernfs open files to obtain active references. This function will fail if the open file has been released. Replace kernfs_get_active() with kernfs_get_active_of() to prevent further operations on released file descriptors. Fixes: `34f26a1561` ("sched/psi: Per-cgroup PSI accounting disable/re-enable interface") Cc: stable <stable@kernel.org> Reported-by: Zhang Zhaotian <zhangzhaotian@huawei.com> Signed-off-by: Chen Ridong <chenridong@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20250822070715.1565236-2-chenridong@huaweicloud.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-09-06 20:11:27 +02:00
Eric Biggers	19591f7e78	fscrypt: use HMAC-SHA512 library for HKDF For the HKDF-SHA512 key derivation needed by fscrypt, just use the HMAC-SHA512 library functions directly. These functions were introduced in v6.17, and they provide simple and efficient direct support for HMAC-SHA512. This ends up being quite a bit simpler and more efficient than using crypto/hkdf.c, as it avoids the generic crypto layer: - The HMAC library can't fail, so callers don't need to handle errors - No inefficient indirect calls - No inefficient and error-prone dynamic allocations - No inefficient and error-prone loading of algorithm by name - Less stack usage Benchmarks on x86_64 show that deriving a per-file key gets about 30% faster, and FS_IOC_ADD_ENCRYPTION_KEY gets nearly twice as fast. The only small downside is the HKDF-Expand logic gets duplicated again. Then again, even considering that, the new fscrypt_hkdf_expand() is only 7 lines longer than the version that called hkdf_expand(). Later we could add HKDF support to lib/crypto/, but for now let's just do this. Link: https://lore.kernel.org/r/20250906035913.1141532-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>	2025-09-05 21:01:51 -07:00
Scott Mayhew	992203a1fb	nfs/localio: restore creds before releasing pageio data Otherwise if the nfsd filecache code releases the nfsd_file immediately, it can trigger the BUG_ON(cred == current->cred) in __put_cred() when it puts the nfsd_file->nf_file->f-cred. Fixes: `b9f5dd57f4` ("nfs/localio: use dedicated workqueues for filesystem read and write") Signed-off-by: Scott Mayhew <smayhew@redhat.com> Reviewed-by: Mike Snitzer <snitzer@kernel.org> Link: https://lore.kernel.org/r/20250807164938.2395136-1-smayhew@redhat.com Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2025-09-05 15:23:49 -04:00
Carlos Maiolino	e90dcba0a3	Merge tag 'kconfig-2025-changes_2025-09-05' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.18-merge xfs: kconfig and feature changes for 2025 LTS [6.18 v2 2/2] Ahead of the 2025 LTS kernel, disable by default the two features that we promised to turn off in September 2025: V4 filesystems, and the long-broken ASCII case insensitive directories. Since online fsck has not had any major issues in the 16 months since it was merged upstream, let's also turn that on by default. With a bit of luck, this should all go splendidly. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2025-09-05 20:27:25 +02:00
Carlos Maiolino	482c57805c	Merge tag 'fix-scrub-reap-calculations_2025-09-05' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.18-merge xfs: improve online repair reap calculations [6.18 v2 1/2] A few months ago, the multi-fsblock untorn writes patchset added a bunch of log intent item helper functions to estimate the number of intent items that could be added to a particular transaction. Those helpers enabled us to compute a safe upper bound on the number of blocks that could be written in an untorn fashion with filesystem-provided out of place writes. Currently, the online fsck code employs static limits on the number of intent items that it's willing to accrue to a single transaction when it's trying to reap what it thinks are the old blocks from a corrupt structure. There have been no problems reported with this approach after years of testing, but static limits are scary and gross because overestimating the intent item limit could result in transaction overflows and dead filesystems; and underestimating causes unnecessary overhead. This series uses the new log intent item size helpers to estimate the limits dynamically based on worst-case per-block repair work vs. the size of the scrub transaction. After several months of testing this, there don't seem to be any problems here either. v2: rearrange patches, add review tags This has been running on the djcloud for months with no problems. Enjoy! Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2025-09-05 20:26:42 +02:00
Linus Torvalds	c2f3b108c0	Merge tag '6.17-RC4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 Pull smb client fixes from Steve French: - Fix two potential NULL pointer references - Two debugging improvements (to help debug recent issues) a new tracepoint, and minor improvement to DebugData - Trivial comment cleanup * tag '6.17-RC4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: prevent NULL pointer dereference in UTF16 conversion smb: client: show negotiated cipher in DebugData smb: client: add new tracepoint to trace lease break notification smb: client: fix spellings in comments smb: client: Fix NULL pointer dereference in cifs_debug_dirs_proc_show()	2025-09-05 11:14:23 -07:00
Mark Harmstone	3d1267475b	btrfs: don't allow adding block device of less than 1 MB Commit 15ae0410c37a79 ("btrfs-progs: add error handling for device_get_partition_size_fd_stat()") in btrfs-progs inadvertently changed it so that if the BLKGETSIZE64 ioctl on a block device returned a size of 0, this was no longer seen as an error condition. Unfortunately this is how disconnected NBD devices behave, meaning that with btrfs-progs 6.16 it's now possible to add a device you can't remove: # btrfs device add /dev/nbd0 /root/temp # btrfs device remove /dev/nbd0 /root/temp ERROR: error removing device '/dev/nbd0': Invalid argument This check should always have been done kernel-side anyway, so add a check in btrfs_init_new_device() that the new device doesn't have a size less than BTRFS_DEVICE_RANGE_RESERVED (i.e. 1 MB). Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Mark Harmstone <mark@harmstone.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-09-05 19:52:10 +02:00
Darrick J. Wong	0ff51a1fd7	xfs: enable online fsck by default in Kconfig Online fsck has been a part of upstream for over a year now without any serious problems. Turn it on by default in time for the 2025 LTS kernel, and get rid of the "say N if unsure" messages for the default Y options. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>	2025-09-05 08:48:24 -07:00

... 19 20 21 22 23 ...

102029 Commits