linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-22 21:25:25 -04:00

Author	SHA1	Message	Date
Darrick J. Wong	07c34f8cef	xfs: use deferred reaping for data device cow extents Don't roll the whole transaction after every extent, that's rather inefficient. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2025-09-05 08:48:23 -07:00
Darrick J. Wong	21d59d0022	xfs: remove deprecated sysctl knobs These sysctl knobs were scheduled for removal in September 2025. That time has come, so remove them. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>	2025-09-05 08:48:23 -07:00
Darrick J. Wong	d5b157e088	xfs: remove static reap limits from repair.h Delete XREAP_MAX_BINVAL and XREAP_MAX_DEFER_CHAIN because the reap code now calculates those limits dynamically, so they're no longer needed. Move the third limit (XREP_MAX_ITRUNCATE_EFIS) to the one file that uses it. Note that the btree rebuilding code should reserve exactly the number of blocks needed to rebuild a btree, so it is rare that the newbt code will need to add any EFIs to the commit transaction. That's why that static limit remains. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2025-09-05 08:48:23 -07:00
Darrick J. Wong	b9a176e541	xfs: remove deprecated mount options These four mount options were scheduled for removal in September 2025, so remove them now. Cc: preichl@redhat.com Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>	2025-09-05 08:48:23 -07:00
Darrick J. Wong	f69260511c	xfs: disable deprecated features by default in Kconfig We promised to turn off these old features by default in September 2025. Do so now. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>	2025-09-05 08:48:23 -07:00
Darrick J. Wong	e4c7eece76	xfs: compute file mapping reap limits dynamically Reaping file fork mappings is a little different -- log recovery can free the blocks for us, so we only try to process a single mapping at a time. Therefore, we only need to figure out the maximum number of blocks that we can invalidate in a single transaction. The rough calculation here is: nr_extents = (logres - reservation used by any one step) / (space used per binval) Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2025-09-05 08:48:22 -07:00
Darrick J. Wong	74fc66ee17	xfs: compute realtime device CoW staging extent reap limits dynamically Calculate the maximum number of CoW staging extents that can be reaped in a single transaction chain. The rough calculation here is: nr_extents = (logres - reservation used by any one step) / (space used by intents per extent) Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2025-09-05 08:48:22 -07:00
Darrick J. Wong	442bc127d4	xfs: compute data device CoW staging extent reap limits dynamically Calculate the maximum number of CoW staging extents that can be reaped in a single transaction chain. The rough calculation here is: nr_extents = (logres - reservation used by any one step) / (space used by intents per extent + space used for a few buffer invalidations) Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2025-09-05 08:48:22 -07:00
Darrick J. Wong	b2311ec677	xfs: compute per-AG extent reap limits dynamically Calculate the maximum number of extents that can be reaped in a single transaction chain, and the number of buffers that can be invalidated in a single transaction. The rough calculation here is: nr_extents = (logres - reservation used by any one step) / (space used by intents per extent + space used per binval) Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2025-09-05 08:48:22 -07:00
Darrick J. Wong	ef930cc371	xfs: convert the ifork reap code to use xreap_state Convert the file fork reaping code to use struct xreap_state so that we can reuse the dynamic state tracking code. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2025-09-05 08:48:22 -07:00
Darrick J. Wong	82e374405e	xfs: prepare reaping code for dynamic limits The online repair block reaping code employs static limits to decide if it's time to roll the transaction or finish the deferred item chains to avoid overflowing the scrub transaction's reservation. However, static limits aren't great -- btree blocks are assumed to be scattered around the AG and the buffers need to be invalidated, whereas COW staging extents are usually contiguous and do not have buffers. We would like to configure the limits dynamically. To get ready for this, reorganize struct xreap_state to store dynamic limits, and add helpers to hide some of the details of how the limits are enforced. Also rename the "xreap roll" functions to include the word "binval" because they only exist to decide when we should roll the transaction to deal with buffer invalidations. No functional changes intended here. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2025-09-05 08:48:22 -07:00
Darrick J. Wong	cd32a0c0dc	xfs: use deferred intent items for reaping crosslinked blocks When we're removing rmap records for crosslinked blocks, use deferred intent items so that we can try to free/unmap as many of the old data structure's blocks as we can in the same transaction as the commit. Cc: <stable@vger.kernel.org> # v6.6 Fixes: `1c7ce115e5` ("xfs: reap large AG metadata extents when possible") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2025-09-05 08:48:21 -07:00
Anderson Nascimento	62e59ffe87	fanotify: Validate the return value of mnt_ns_from_dentry() before dereferencing The function do_fanotify_mark() does not validate if mnt_ns_from_dentry() returns NULL before dereferencing mntns->user_ns. This causes a NULL pointer dereference in do_fanotify_mark() if the path is not a mount namespace object. Fix this by checking mnt_ns_from_dentry()'s return value before dereferencing it. Before the patch $ gcc fanotify_nullptr.c -o fanotify_nullptr $ mkdir A $ ./fanotify_nullptr Fanotify fd: 3 fanotify_mark: Operation not permitted $ unshare -Urm Fanotify fd: 3 Killed int main(void){ int ffd; ffd = fanotify_init(FAN_CLASS_NOTIF \| FAN_REPORT_MNT, 0); if(ffd < 0){ perror("fanotify_init"); exit(EXIT_FAILURE); } printf("Fanotify fd: %d\n",ffd); if(fanotify_mark(ffd, FAN_MARK_ADD \| FAN_MARK_MNTNS, FAN_MNT_ATTACH, AT_FDCWD, "A") < 0){ perror("fanotify_mark"); exit(EXIT_FAILURE); } return 0; } After the patch $ gcc fanotify_nullptr.c -o fanotify_nullptr $ mkdir A $ ./fanotify_nullptr Fanotify fd: 3 fanotify_mark: Operation not permitted $ unshare -Urm Fanotify fd: 3 fanotify_mark: Invalid argument [ 25.694973] BUG: kernel NULL pointer dereference, address: 0000000000000038 [ 25.695006] #PF: supervisor read access in kernel mode [ 25.695012] #PF: error_code(0x0000) - not-present page [ 25.695017] PGD 109a30067 P4D 109a30067 PUD 142b46067 PMD 0 [ 25.695025] Oops: Oops: 0000 [#1] SMP NOPTI [ 25.695032] CPU: 4 UID: 1000 PID: 1478 Comm: fanotify_nullpt Not tainted 6.17.0-rc4 #1 PREEMPT(lazy) [ 25.695040] Hardware name: VMware, Inc. 
VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020 [ 25.695049] RIP: 0010:do_fanotify_mark+0x817/0x950 [ 25.695066] Code: 04 00 00 e9 45 fd ff ff 48 8b 7c 24 48 4c 89 54 24 18 4c 89 5c 24 10 4c 89 0c 24 e8 b3 11 fc ff 4c 8b 54 24 18 4c 8b 5c 24 10 <48> 8b 78 38 4c 8b 0c 24 49 89 c4 e9 13 fd ff ff 8b 4c 24 28 85 c9 [ 25.695081] RSP: 0018:ffffd31c469e3c08 EFLAGS: 00010203 [ 25.695104] RAX: 0000000000000000 RBX: 0000000001000000 RCX: ffff8eb48aebd220 [ 25.695110] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8eb4835e8180 [ 25.695115] RBP: 0000000000000111 R08: 0000000000000000 R09: 0000000000000000 [ 25.695142] R10: ffff8eb48a7d56c0 R11: ffff8eb482bede00 R12: 00000000004012a7 [ 25.695148] R13: 0000000000000110 R14: 0000000000000001 R15: ffff8eb48a7d56c0 [ 25.695154] FS: 00007f8733bda740(0000) GS:ffff8eb61ce5f000(0000) knlGS:0000000000000000 [ 25.695162] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 25.695170] CR2: 0000000000000038 CR3: 0000000136994006 CR4: 00000000003706f0 [ 25.695201] Call Trace: [ 25.695209] <TASK> [ 25.695215] __x64_sys_fanotify_mark+0x1f/0x30 [ 25.695222] do_syscall_64+0x82/0x2c0 ... Fixes: `58f5fbeb36` ("fanotify: support watching filesystems and mounts inside userns") Link: https://patch.msgid.link/CAPhRvkw4ONypNsJrCnxbKnJbYmLHTDEKFC4C_num_5sVBVa8jg@mail.gmail.com Signed-off-by: Anderson Nascimento <anderson@allelesecurity.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz>	2025-09-05 16:02:55 +02:00
Haiyue Wang	e1bf212d06	fuse: virtio_fs: fix page fault for DAX page address The commit `ced17ee32a` ("Revert "virtio: reject shm region if length is zero"") exposes the following DAX page fault bug (this fix the failure that getting shm region alway returns false because of zero length): The commit `21aa65bf82` ("mm: remove callers of pfn_t functionality") handles the DAX physical page address incorrectly: the removed macro 'phys_to_pfn_t()' should be replaced with 'PHYS_PFN()'. [ 1.390321] BUG: unable to handle page fault for address: ffffd3fb40000008 [ 1.390875] #PF: supervisor read access in kernel mode [ 1.391257] #PF: error_code(0x0000) - not-present page [ 1.391509] PGD 0 P4D 0 [ 1.391626] Oops: Oops: 0000 [#1] SMP NOPTI [ 1.391806] CPU: 6 UID: 1000 PID: 162 Comm: weston Not tainted 6.17.0-rc3-WSL2-STABLE #2 PREEMPT(none) [ 1.392361] RIP: 0010:dax_to_folio+0x14/0x60 [ 1.392653] Code: 52 c9 c3 00 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 48 c1 ef 05 48 c1 e7 06 48 03 3d 34 b5 31 01 <48> 8b 57 08 48 89 f8 f6 c2 01 75 2b 66 90 c3 cc cc cc cc f7 c7 ff [ 1.393727] RSP: 0000:ffffaf7d04407aa8 EFLAGS: 00010086 [ 1.394003] RAX: 000000a000000000 RBX: ffffaf7d04407bb0 RCX: 0000000000000000 [ 1.394524] RDX: ffffd17b40000008 RSI: 0000000000000083 RDI: ffffd3fb40000000 [ 1.394967] RBP: 0000000000000011 R08: 000000a000000000 R09: 0000000000000000 [ 1.395400] R10: 0000000000001000 R11: ffffaf7d04407c10 R12: 0000000000000000 [ 1.395806] R13: ffffa020557be9c0 R14: 0000014000000001 R15: 0000725970e94000 [ 1.396268] FS: 000072596d6d2ec0(0000) GS:ffffa0222dc59000(0000) knlGS:0000000000000000 [ 1.396715] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1.397100] CR2: ffffd3fb40000008 CR3: 000000011579c005 CR4: 0000000000372ef0 [ 1.397518] Call Trace: [ 1.397663] <TASK> [ 1.397900] dax_insert_entry+0x13b/0x390 [ 1.398179] dax_fault_iter+0x2a5/0x6c0 [ 1.398443] dax_iomap_pte_fault+0x193/0x3c0 [ 1.398750] __fuse_dax_fault+0x8b/0x270 [ 1.398997] ? 
vm_mmap_pgoff+0x161/0x210 [ 1.399175] __do_fault+0x30/0x180 [ 1.399360] do_fault+0xc4/0x550 [ 1.399547] __handle_mm_fault+0x8e3/0xf50 [ 1.399731] ? do_syscall_64+0x72/0x1e0 [ 1.399958] handle_mm_fault+0x192/0x2f0 [ 1.400204] do_user_addr_fault+0x20e/0x700 [ 1.400418] exc_page_fault+0x66/0x150 [ 1.400602] asm_exc_page_fault+0x26/0x30 [ 1.400831] RIP: 0033:0x72596d1bf703 [ 1.401076] Code: 31 f6 45 31 e4 48 8d 15 b3 73 00 00 e8 06 03 00 00 8b 83 68 01 00 00 e9 8e fa ff ff 0f 1f 00 48 8b 44 24 08 4c 89 ee 48 89 df <c7> 00 21 43 34 12 e8 72 09 00 00 e9 6a fa ff ff 0f 1f 44 00 00 e8 [ 1.402172] RSP: 002b:00007ffc350f6dc0 EFLAGS: 00010202 [ 1.402488] RAX: 0000725970e94000 RBX: 00005b7c642c2560 RCX: 0000725970d359a7 [ 1.402898] RDX: 0000000000000003 RSI: 00007ffc350f6dc0 RDI: 00005b7c642c2560 [ 1.403284] RBP: 00007ffc350f6e90 R08: 000000000000000d R09: 0000000000000000 [ 1.403634] R10: 00007ffc350f6dd8 R11: 0000000000000246 R12: 0000000000000001 [ 1.404078] R13: 00007ffc350f6dc0 R14: 0000725970e29ce0 R15: 0000000000000003 [ 1.404450] </TASK> [ 1.404570] Modules linked in: [ 1.404821] CR2: ffffd3fb40000008 [ 1.405029] ---[ end trace 0000000000000000 ]--- [ 1.405323] RIP: 0010:dax_to_folio+0x14/0x60 [ 1.405556] Code: 52 c9 c3 00 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 48 c1 ef 05 48 c1 e7 06 48 03 3d 34 b5 31 01 <48> 8b 57 08 48 89 f8 f6 c2 01 75 2b 66 90 c3 cc cc cc cc f7 c7 ff [ 1.406639] RSP: 0000:ffffaf7d04407aa8 EFLAGS: 00010086 [ 1.406910] RAX: 000000a000000000 RBX: ffffaf7d04407bb0 RCX: 0000000000000000 [ 1.407379] RDX: ffffd17b40000008 RSI: 0000000000000083 RDI: ffffd3fb40000000 [ 1.407800] RBP: 0000000000000011 R08: 000000a000000000 R09: 0000000000000000 [ 1.408246] R10: 0000000000001000 R11: ffffaf7d04407c10 R12: 0000000000000000 [ 1.408666] R13: ffffa020557be9c0 R14: 0000014000000001 R15: 0000725970e94000 [ 1.409170] FS: 000072596d6d2ec0(0000) GS:ffffa0222dc59000(0000) knlGS:0000000000000000 [ 1.409608] CS: 0010 DS: 0000 ES: 0000 CR0: 
0000000080050033 [ 1.409977] CR2: ffffd3fb40000008 CR3: 000000011579c005 CR4: 0000000000372ef0 [ 1.410437] Kernel panic - not syncing: Fatal exception [ 1.410857] Kernel Offset: 0xc000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) Fixes: `21aa65bf82` ("mm: remove callers of pfn_t functionality") Signed-off-by: Haiyue Wang <haiyuewa@163.com> Link: https://lore.kernel.org/20250904120339.972-1-haiyuewa@163.com Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-09-05 15:56:30 +02:00
Nam Cao	0c43094f8c	eventpoll: Replace rwlock with spinlock The ready event list of an epoll object is protected by read-write semaphore: - The consumer (waiter) acquires the write lock and takes items. - the producer (waker) takes the read lock and adds items. The point of this design is enabling epoll to scale well with large number of producers, as multiple producers can hold the read lock at the same time. Unfortunately, this implementation may cause scheduling priority inversion problem. Suppose the consumer has higher scheduling priority than the producer. The consumer needs to acquire the write lock, but may be blocked by the producer holding the read lock. Since read-write semaphore does not support priority-boosting for the readers (even with CONFIG_PREEMPT_RT=y), we have a case of priority inversion: a higher priority consumer is blocked by a lower priority producer. This problem was reported in [1]. Furthermore, this could also cause stall problem, as described in [2]. Fix this problem by replacing rwlock with spinlock. This reduces the event bandwidth, as the producers now have to contend with each other for the spinlock. According to the benchmark from https://github.com/rouming/test-tools/blob/master/stress-epoll.c: On 12 x86 CPUs: Before After Diff threads events/ms events/ms 8 7162 4956 -31% 16 8733 5383 -38% 32 7968 5572 -30% 64 10652 5739 -46% 128 11236 5931 -47% On 4 riscv CPUs: Before After Diff threads events/ms events/ms 8 2958 2833 -4% 16 3323 3097 -7% 32 3451 3240 -6% 64 3554 3178 -11% 128 3601 3235 -10% Although the numbers look bad, it should be noted that this benchmark creates multiple threads who do nothing except constantly generating new epoll events, thus contention on the spinlock is high. For real workload, the event rate is likely much lower, and the performance drop is not as bad. 
Using another benchmark (perf bench epoll wait) where spinlock contention is lower, improvement is even observed on x86: On 12 x86 CPUs: Before: Averaged 110279 operations/sec (+- 1.09%), total secs = 8 After: Averaged 114577 operations/sec (+- 2.25%), total secs = 8 On 4 riscv CPUs: Before: Averaged 175767 operations/sec (+- 0.62%), total secs = 8 After: Averaged 167396 operations/sec (+- 0.23%), total secs = 8 In conclusion, no one is likely to be upset over this change. After all, spinlock was used originally for years, and the commit which converted to rwlock didn't mention a real workload, just that the benchmark numbers are nice. This patch is not exactly the revert of commit `a218cc4914` ("epoll: use rwlock in order to reduce ep_poll_callback() contention"), because git revert conflicts in some places which are not obvious on the resolution. This patch is intended to be backported, therefore go with the obvious approach: - Replace rwlock_t with spinlock_t one to one - Delete list_add_tail_lockless() and chain_epi_lockless(). These were introduced to allow producers to concurrently add items to the list. But now that spinlock no longer allows producers to touch the event list concurrently, these two functions are not necessary anymore. Fixes: `a218cc4914` ("epoll: use rwlock in order to reduce ep_poll_callback() contention") Signed-off-by: Nam Cao <namcao@linutronix.de> Link: https://lore.kernel.org/ec92458ea357ec503c737ead0f10b2c6e4c37d47.1752581388.git.namcao@linutronix.de Tested-by: K Prateek Nayak <kprateek.nayak@amd.com> Cc: stable@vger.kernel.org Reported-by: Frederic Weisbecker <frederic@kernel.org> Closes: https://lore.kernel.org/linux-rt-users/20210825132754.GA895675@lothringen/ [1] Reported-by: Valentin Schneider <vschneid@redhat.com> Closes: https://lore.kernel.org/linux-rt-users/xhsmhttqvnall.mognet@vschneid.remote.csb/ [2] Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-09-05 15:51:24 +02:00
Marcelo Moreira	33ddc796ec	xfs: Replace strncpy with memcpy This change modernizes the code by aligning it with current kernel best practices. It improves code clarity and consistency, as strncpy is deprecated as explained in Documentation/process/deprecated.rst. This change does not alter the functionality or introduce any behavioral changes. Suggested-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Marcelo Moreira <marcelomoreira1905@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2025-09-05 10:18:59 +02:00
Miklos Szeredi	3f29d59e92	fuse: add prune notification Some fuse servers need to prune their caches, which can only be done if the kernel's own dentry/inode caches are pruned first to avoid dangling references. Add FUSE_NOTIFY_PRUNE, which takes an array of node ID's to try and get rid of. Inodes with active references are skipped. A similar functionality is already provided by FUSE_NOTIFY_INVAL_ENTRY with the FUSE_EXPIRE_ONLY flag. Differences in the interface are FUSE_NOTIFY_INVAL_ENTRY: - can only prune one dentry - dentry is determined by parent ID and name - if inode has multiple aliases (cached hard links), then they would have to be invalidated individually to be able to get rid of the inode FUSE_NOTIFY_PRUNE: - can prune multiple inodes - inodes determined by their node ID - aliases are taken care of automatically Reviewed-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2025-09-05 09:11:28 +02:00
Miklos Szeredi	60e1579a0d	fuse: remove redundant calls to fuse_copy_finish() in fuse_notify() Remove tail calls of fuse_copy_finish(), since it's now done from fuse_dev_do_write(). No functional change. Reviewed-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2025-09-05 09:11:28 +02:00
Miklos Szeredi	0b563aad1c	fuse: fix possibly missing fuse_copy_finish() call in fuse_notify() In case of FUSE_NOTIFY_RESEND and FUSE_NOTIFY_INC_EPOCH fuse_copy_finish() isn't called. Fix by always calling fuse_copy_finish() after fuse_notify(). It's a no-op if called a second time. Fixes: `760eac73f9` ("fuse: Introduce a new notification type for resend pending requests") Fixes: `2396356a94` ("fuse: add more control over cache invalidation behaviour") Cc: <stable@vger.kernel.org> # v6.9 Reviewed-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2025-09-05 09:11:16 +02:00
Al Viro	57e62089f8	do_nfs4_mount(): switch to vfs_parse_fs_string() Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-09-04 15:20:58 -04:00
Al Viro	b28f9eba12	change the calling conventions for vfs_parse_fs_string() Absolute majority of callers are passing the 4th argument equal to strlen() of the 3rd one. Drop the v_size argument, add vfs_parse_fs_qstr() for the cases that want independent length. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-09-04 15:20:51 -04:00
Makar Semyonov	70bccd9855	cifs: prevent NULL pointer dereference in UTF16 conversion There can be a NULL pointer dereference bug here. NULL is passed to __cifs_sfu_make_node without checks, which passes it unchecked to cifs_strndup_to_utf16, which in turn passes it to cifs_local_to_utf16_bytes where '*from' is dereferenced, causing a crash. This patch adds a check for NULL 'src' in cifs_strndup_to_utf16 and returns NULL early to prevent dereferencing NULL pointer. Found by Linux Verification Center (linuxtesting.org) with SVACE Signed-off-by: Makar Semyonov <m.semenov@tssltd.ru> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>	2025-09-04 11:43:31 -05:00
Xichao Zhao	462272dd73	configfs: use PTR_ERR_OR_ZERO() to simplify code Use the standard error pointer macro to shorten the code and simplify. Signed-off-by: Xichao Zhao <zhao.xichao@vivo.com> Reviewed-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20250812082709.49796-1-zhao.xichao@vivo.com Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>	2025-09-04 16:49:17 +02:00
Svetlana Parfenova	8c94db0ae9	binfmt_elf: preserve original ELF e_flags for core dumps Some architectures, such as RISC-V, use the ELF e_flags field to encode ABI-specific information (e.g., ISA extensions, fpu support). Debuggers like GDB rely on these flags in core dumps to correctly interpret optional register sets. If the flags are missing or incorrect, GDB may warn and ignore valid data, for example: warning: Unexpected size of section '.reg2/213' in core file. This can prevent access to fpu or other architecture-specific registers even when they were dumped. Save the e_flags field during ELF binary loading (in load_elf_binary()) into the mm_struct, and later retrieve it during core dump generation (in fill_note_info()). Kconfig option CONFIG_ARCH_HAS_ELF_CORE_EFLAGS is introduced for architectures that require this behaviour. Signed-off-by: Svetlana Parfenova <svetlana.parfenova@syntacore.com> Link: https://lore.kernel.org/r/20250901135350.619485-1-svetlana.parfenova@syntacore.com Signed-off-by: Kees Cook <kees@kernel.org>	2025-09-03 20:49:32 -07:00
Linus Torvalds	08b06c30a4	Merge tag 'v6.17-rc4-ksmbd-fix' of git://git.samba.org/ksmbd Pull smb server fix from Steve French: - fix handling filenames with ":" (colon) in them * tag 'v6.17-rc4-ksmbd-fix' of git://git.samba.org/ksmbd: ksmbd: allow a filename to contain colons on SMB3.1.1 posix extensions	2025-09-03 20:44:15 -07:00
Shashank A P	72b7ceca85	fs: quota: create dedicated workqueue for quota_release_work There is a kernel panic due to WARN_ONCE when panic_on_warn is set. This issue occurs when writeback is triggered due to sync call for an opened file(ie, writeback reason is WB_REASON_SYNC). When f2fs balance is needed at sync path, flush for quota_release_work is triggered. By default quota_release_work is queued to "events_unbound" queue which does not have WQ_MEM_RECLAIM flag. During f2fs balance "writeback" workqueue tries to flush quota_release_work causing kernel panic due to MEM_RECLAIM flag mismatch errors. This patch creates dedicated workqueue with WQ_MEM_RECLAIM flag for work quota_release_work. ------------[ cut here ]------------ WARNING: CPU: 4 PID: 14867 at kernel/workqueue.c:3721 check_flush_dependency+0x13c/0x148 Call trace: check_flush_dependency+0x13c/0x148 __flush_work+0xd0/0x398 flush_delayed_work+0x44/0x5c dquot_writeback_dquots+0x54/0x318 f2fs_do_quota_sync+0xb8/0x1a8 f2fs_write_checkpoint+0x3cc/0x99c f2fs_gc+0x190/0x750 f2fs_balance_fs+0x110/0x168 f2fs_write_single_data_page+0x474/0x7dc f2fs_write_data_pages+0x7d0/0xd0c do_writepages+0xe0/0x2f4 __writeback_single_inode+0x44/0x4ac writeback_sb_inodes+0x30c/0x538 wb_writeback+0xf4/0x440 wb_workfn+0x128/0x5d4 process_scheduled_works+0x1c4/0x45c worker_thread+0x32c/0x3e8 kthread+0x11c/0x1b0 ret_from_fork+0x10/0x20 Kernel panic - not syncing: kernel: panic_on_warn set ... Fixes: `ac6f420291` ("quota: flush quota_release_work upon quota writeback") CC: stable@vger.kernel.org Signed-off-by: Shashank A P <shashank.ap@samsung.com> Link: https://patch.msgid.link/20250901092905.2115-1-shashank.ap@samsung.com Signed-off-by: Jan Kara <jack@suse.cz>	2025-09-03 12:35:00 +02:00
Bharath SM	91be128b49	smb: client: show negotiated cipher in DebugData Print the negotiated encryption cipher type in DebugData Signed-off-by: Bharath SM <bharathsm@microsoft.com> Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2025-09-02 20:38:00 -05:00
Bharath SM	72595cb6da	smb: client: add new tracepoint to trace lease break notification Add smb3_lease_break_enter to trace lease break notifications, recording lease state, flags, epoch, and lease key. Align smb3_lease_not_found to use the same payload and print format. Signed-off-by: Bharath SM <bharathsm@microsoft.com> Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2025-09-02 20:37:44 -05:00
Bharath SM	0c3813d855	smb: client: fix spellings in comments correct spellings in comments Signed-off-by: Bharath SM <bharathsm@microsoft.com> Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2025-09-02 20:37:17 -05:00
Al Viro	f1f486b841	finish_automount(): use __free() to deal with dropping mnt on failure same story as with do_new_mount_fc(). Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-09-02 19:35:59 -04:00
Al Viro	308a022f41	do_new_mount_fc(): use __free() to deal with dropping mnt on failure do_add_mount() consumes vfsmount on success; just follow it with conditional retain_and_null_ptr() on success and we can switch to __free() for mnt and be done with that - unlock_mount() is in the very end. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-09-02 19:35:59 -04:00
Al Viro	9bf5d48852	finish_automount(): take the lock_mount() analogue into a helper finish_automount() can't use lock_mount() - it treats finding something already mounted as "quietly drop our mount and return 0", not as "mount on top of whatever mounted there". It's been open-coded; let's take it into a helper similar to lock_mount(). "something's already mounted" => -EBUSY, finish_automount() needs to distinguish it from the normal case and it can't happen in other failure cases. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-09-02 19:35:59 -04:00
Al Viro	6bbbc4a04a	pivot_root(2): use __free() to deal with struct path in it preparations for making unlock_mount() a __cleanup(); can't have path_put() inside mount_lock scope. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-09-02 19:35:58 -04:00
Al Viro	76dfde13d6	do_loopback(): use __free(path_put) to deal with old_path preparations for making unlock_mount() a __cleanup(); can't have path_put() inside mount_lock scope. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-09-02 19:35:58 -04:00
Al Viro	11941610b0	finish_automount(): simplify the ELOOP check It's enough to check that dentries match; if path->dentry is equal to m->mnt_root, superblocks will match as well. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-09-02 19:35:58 -04:00
Al Viro	d29da1a8f1	move_mount(2): take sanity checks in 'beneath' case into do_lock_mount() We want to mount beneath the given location. For that operation to make sense, location must be the root of some mount that has something under it. Currently we let it proceed if those requirements are not met, with rather meaningless results, and have that bogosity caught further down the road; let's fail early instead - do_lock_mount() doesn't make sense unless those conditions hold, and checking them there makes things simpler. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-09-02 19:35:58 -04:00
Al Viro	c1ab70be88	do_move_mount(): deal with the checks on old_path early 1) checking that location we want to move does point to root of some mount can be done before anything else; that property is not going to change and having it already verified simplifies the analysis. 2) checking the type agreement between what we are trying to move and what we are trying to move it onto also belongs in the very beginning - do_lock_mount() might end up switching new_path to something that overmounts the original location, but... the same type agreement applies to overmounts, so we could just as well check against the original location. 3) since we know that old_path->dentry is the root of old_path->mnt, there's no point bothering with path_is_overmounted() in can_move_mount_beneath(); it's simply a check for the mount we are trying to move having non-NULL ->overmount. And with that, we can switch can_move_mount_beneath() to taking old instead of old_path, leaving no uses of old_path past the original checks. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-09-02 19:35:58 -04:00
Al Viro	a666bbcf7e	do_move_mount(): trim local variables Both 'parent' and 'ns' are used at most once, no point precalculating those... Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-09-02 19:35:58 -04:00
Al Viro	5423426a79	switch do_new_mount_fc() to fc_mount() Prior to the call of do_new_mount_fc() the caller has just done successful vfs_get_tree(). Then do_new_mount_fc() does several checks on resulting superblock, and either does fc_drop_locked() and returns an error or proceeds to unlock the superblock and call vfs_create_mount(). The thing is, there's no reason to delay that unlock + vfs_create_mount() - the tests do not rely upon the state of ->s_umount and fc_drop_locked() put_fs_context() is equivalent to unlock ->s_umount put_fs_context() Doing vfs_create_mount() before the checks allows us to move vfs_get_tree() from caller to do_new_mount_fc() and collapse it with vfs_create_mount() into an fc_mount() call. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-09-02 19:35:58 -04:00
Al Viro	8281f98a68	current_chrooted(): use guards here a use of __free(path_put) for dropping fs_root is enough to make guard(mount_locked_reader) fit... Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-09-02 19:35:58 -04:00
Al Viro	6b6516c56b	current_chrooted(): don't bother with follow_down_one() All we need here is to follow ->overmount on root mount of namespace... Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-09-02 19:35:57 -04:00
Al Viro	2aec880c1c	path_is_under(): use guards ... and document that locking requirements for is_path_reachable(). There is one questionable caller in do_listmount() where we are not holding mount_lock and might not have the first argument mounted. However, in that case it will immediately return true without having to look at the ancestors. Might be cleaner to move the check into non-LSTM_ROOT case which it really belongs in - there the check is not always true and is_mounted() is guaranteed. Document the locking environments for is_path_reachable() callers: get_peer_under_root() get_dominating_id() do_statmount() do_listmount() Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-09-02 19:35:57 -04:00
Al Viro	2605d86843	mnt_set_expiry(): use guards The reason why it needs only mount_locked_reader is that there's no lockless accesses of expiry lists. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-09-02 19:35:57 -04:00
Al Viro	f80b84358f	has_locked_children(): use guards ... and document the locking requirements of __has_locked_children() Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-09-02 19:35:57 -04:00
Al Viro	511db073b2	propagate_mnt(): use scoped_guard(mount_locked_reader) for mnt_set_mountpoint() Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-09-02 19:35:57 -04:00
Al Viro	6b448d7a7c	check_for_nsfs_mounts(): no need to take locks Currently we are taking mount_writer; what that function needs is either mount_locked_reader (we are not changing anything, we just want to iterate through the subtree) or namespace_shared and a reference held by caller on the root of subtree - that's also enough to stabilize the topology. The thing is, all callers are already holding at least namespace_shared as well as a reference to the root of subtree. Let's make the callers provide locking warranties - don't mess with mount_lock in check_for_nsfs_mounts() itself and document the locking requirements. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-09-02 19:35:57 -04:00
Al Viro	747e91e5b7	mnt_already_visible(): use guards clean fit; namespace_shared due to iterating through ns->mounts. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-09-02 19:35:57 -04:00
Al Viro	61e68af33a	put_mnt_ns(): use guards clean fit; guards can't be weaker due to umount_tree() call. Setting emptied_ns requires namespace_excl, but not anything mount_lock-related. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-09-02 19:35:56 -04:00
Al Viro	550dda45df	mark_mounts_for_expiry(): use guards Clean fit; guards can't be weaker due to umount_tree() calls. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-09-02 19:35:56 -04:00
Al Viro	7b99ee2c5c	do_set_group(): use guards clean fit; namespace_excl to modify propagation graph Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-09-02 19:35:56 -04:00
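
The xfs reap-limit commits listed above (e4c7eece76, 74fc66ee17, 442bc127d4, b2311ec677) all describe the same rough calculation: nr_extents = (logres - reservation used by any one step) / (per-extent cost). The sketch below works through that arithmetic in plain userspace C. It is not the kernel's implementation: `struct reap_budget`, its field names, and `reap_max_extents()` are hypothetical names chosen for illustration, and all quantities are assumed to be in the same unit (bytes of log reservation).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical inputs for the rough calculation quoted in the commit
 * messages. These are illustrative names, not kernel structures.
 */
struct reap_budget {
	unsigned int logres;             /* total log reservation */
	unsigned int step_overhead;      /* reservation used by any one step */
	unsigned int per_extent_intents; /* space used by intents per extent */
	unsigned int per_binval;         /* space used per buffer invalidation */
};

/*
 * nr_extents = (logres - step_overhead) /
 *              (per_extent_intents + per_binval)
 *
 * Returns 0 when nothing is left after the per-step overhead, or when
 * the per-extent cost is zero (which would otherwise divide by zero).
 */
static unsigned int reap_max_extents(const struct reap_budget *b)
{
	unsigned int per_extent;

	assert(b != NULL);

	per_extent = b->per_extent_intents + b->per_binval;
	if (per_extent == 0 || b->logres <= b->step_overhead)
		return 0;

	return (b->logres - b->step_overhead) / per_extent;
}
```

With made-up numbers (logres = 1000, step_overhead = 200, per_extent_intents = 60, per_binval = 40) the result is (1000 - 200) / (60 + 40) = 8 extents. Setting one of the two per-extent terms to zero gives the other variants from the commit messages: intents only for the realtime CoW staging case, and buffer invalidations only for file mappings.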


102029 Commits