Commit Graph

102029 Commits

Author SHA1 Message Date
Gao Xiang
2a13fc417f erofs: consolidate z_erofs_extent_lookback()
The initial m.delta[0] also needs to be checked against zero.

In addition, also drop the redundant logic that errors out for
lcn == 0 / m.delta[0] == 1 case.

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2025-10-22 07:54:31 +08:00
Gao Xiang
e13d315ae0 erofs: avoid infinite loops due to corrupted subpage compact indexes
Robert reported an infinite loop observed by two crafted images.

The root cause is that `clusterofs` can be larger than `lclustersize`
for !NONHEAD `lclusters` in corrupted subpage compact indexes, e.g.:

  blocksize = lclustersize = 512   lcn = 6   clusterofs = 515

Move the corresponding check for full compress indexes to
`z_erofs_load_lcluster_from_disk()` to also cover subpage compact
compress indexes.

It also fixes the position of `m->type >= Z_EROFS_LCLUSTER_TYPE_MAX`
check, since it should be placed right after
`z_erofs_load_{compact,full}_lcluster()`.

Fixes: 8d2517aaee ("erofs: fix up compacted indexes for block size < 4096")
Fixes: 1a5223c182 ("erofs: do sanity check on m->type in z_erofs_load_compact_lcluster()")
Reported-by: Robert Morris <rtm@csail.mit.edu>
Closes: https://lore.kernel.org/r/35167.1760645886@localhost
Reviewed-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2025-10-22 07:54:11 +08:00
Deepanshu Kartikey
cec944dd32 hugetlbfs: move lock assertions after early returns in huge_pmd_unshare()
When hugetlb_vmdelete_list() processes VMAs during truncate operations, it
may encounter VMAs where huge_pmd_unshare() is called without the required
shareable lock.  This triggers an assertion failure in
hugetlb_vma_assert_locked().

The previous fix in commit dd83609b88 ("hugetlbfs: skip VMAs without
shareable locks in hugetlb_vmdelete_list") skipped entire VMAs without
shareable locks to avoid the assertion.  However, this prevented pages
from being unmapped and freed, causing a regression in
fallocate(PUNCH_HOLE) operations where pages were not freed immediately,
as reported by Mark Brown.

Instead of checking locks in the caller or skipping VMAs, move the lock
assertions in huge_pmd_unshare() to after the early return checks.  The
assertions are only needed when actual PMD unsharing work will be
performed.  If the function returns early because sz != PMD_SIZE or the
PMD is not shared, no locks are required and assertions should not fire.

This approach reverts the VMA skipping logic from commit dd83609b88
("hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list")
while moving the assertions to avoid the assertion failure, keeping all
the logic within huge_pmd_unshare() itself and allowing page unmapping and
freeing to proceed for all VMAs.

Link: https://lkml.kernel.org/r/20251014113344.21194-1-kartikey406@gmail.com
Fixes: dd83609b88 ("hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list")
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
Reported-by: <syzbot+f26d7c75c26ec19790e7@syzkaller.appspotmail.com>
Reported-by: Mark Brown <broonie@kernel.org>
Closes: https://syzkaller.appspot.com/bug?extid=f26d7c75c26ec19790e7
Suggested-by: David Hildenbrand <david@redhat.com>
Suggested-by: Oscar Salvador <osalvador@suse.de>
Tested-by: <syzbot+f26d7c75c26ec19790e7@syzkaller.appspotmail.com>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-10-21 15:46:17 -07:00
Chuck Lever
3e7f011c25 Revert "NFSD: Remove the cap on number of operations per NFSv4 COMPOUND"
I've found that pynfs COMP6 now leaves the connection or lease in a
strange state, which causes CLOSE9 to hang indefinitely. I've dug
into it a little, but I haven't been able to root-cause it yet.
However, I bisected to commit 48aab1606f ("NFSD: Remove the cap on
number of operations per NFSv4 COMPOUND").

Tianshuo Han also reports a potential vulnerability when decoding
an NFSv4 COMPOUND. An attacker can place an arbitrarily large op
count in the COMPOUND header, which results in:

[   51.410584] nfsd: vmalloc error: size 1209533382144, exceeds total
pages, mode:0xdc0(GFP_KERNEL|__GFP_ZERO),
nodemask=(null),cpuset=/,mems_allowed=0

when NFSD attempts to allocate the COMPOUND op array.

Let's restore the operation-per-COMPOUND limit, but increased to 200
for now.

Reported-by: tianshuo han <hantianshuo233@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Cc: stable@vger.kernel.org
Tested-by: Tianshuo Han <hantianshuo233@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-10-21 11:03:50 -04:00
Nathan Chancellor
29cdfb4950 nfsd: Avoid strlen conflict in nfsd4_encode_components_esc()
There is an error building nfs4xdr.c with CONFIG_SUNRPC_DEBUG_TRACE=y
and CONFIG_FORTIFY_SOURCE=n due to the local variable strlen conflicting
with the function strlen():

  In file included from include/linux/cpumask.h:11,
                   from arch/x86/include/asm/paravirt.h:21,
                   from arch/x86/include/asm/irqflags.h:102,
                   from include/linux/irqflags.h:18,
                   from include/linux/spinlock.h:59,
                   from include/linux/mmzone.h:8,
                   from include/linux/gfp.h:7,
                   from include/linux/slab.h:16,
                   from fs/nfsd/nfs4xdr.c:37:
  fs/nfsd/nfs4xdr.c: In function 'nfsd4_encode_components_esc':
  include/linux/kernel.h:321:46: error: called object 'strlen' is not a function or function pointer
    321 |                 __trace_puts(_THIS_IP_, str, strlen(str));              \
        |                                              ^~~~~~
  include/linux/kernel.h:265:17: note: in expansion of macro 'trace_puts'
    265 |                 trace_puts(fmt);                        \
        |                 ^~~~~~~~~~
  include/linux/sunrpc/debug.h:34:41: note: in expansion of macro 'trace_printk'
     34 | #  define __sunrpc_printk(fmt, ...)     trace_printk(fmt, ##__VA_ARGS__)
        |                                         ^~~~~~~~~~~~
  include/linux/sunrpc/debug.h:42:17: note: in expansion of macro '__sunrpc_printk'
     42 |                 __sunrpc_printk(fmt, ##__VA_ARGS__);                    \
        |                 ^~~~~~~~~~~~~~~
  include/linux/sunrpc/debug.h:25:9: note: in expansion of macro 'dfprintk'
     25 |         dfprintk(FACILITY, fmt, ##__VA_ARGS__)
        |         ^~~~~~~~
  fs/nfsd/nfs4xdr.c:2646:9: note: in expansion of macro 'dprintk'
   2646 |         dprintk("nfsd4_encode_components(%s)\n", components);
        |         ^~~~~~~
  fs/nfsd/nfs4xdr.c:2643:13: note: declared here
   2643 |         int strlen, count=0;
        |             ^~~~~~

This dprintk() instance is not particularly useful, so just remove it
altogether to get rid of the immediate strlen() conflict.

At the same time, eliminate the local strlen variable to avoid potential
conflicts with strlen() in the future.

Fixes: ec7d8e68ef ("sunrpc: add a Kconfig option to redirect dfprintk() output to trace buffer")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: NeilBrown <neil@brown.name>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-10-21 11:03:19 -04:00
Chuck Lever
abb1f08a21 NFSD: Fix crash in nfsd4_read_release()
When tracing is enabled, the trace_nfsd_read_done trace point
crashes during the pynfs read.testNoFh test.

Fixes: 15a8b55dbb ("nfsd: call op_release, even when op_func returns an error")
Cc: stable@vger.kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-10-21 11:03:19 -04:00
Chuck Lever
4f76435fd5 NFSD: Define actions for the new time_deleg FATTR4 attributes
NFSv4 clients won't send legitimate GETATTR requests for these new
attributes because they are intended to be used only with CB_GETATTR
and SETATTR. But NFSD has to do something besides crashing if it
ever sees a GETATTR request that queries these attributes.

RFC 8881 Section 18.7.3 states:

> The server MUST return a value for each attribute that the client
> requests if the attribute is supported by the server for the
> target file system. If the server does not support a particular
> attribute on the target file system, then it MUST NOT return the
> attribute value and MUST NOT set the attribute bit in the result
> bitmap. The server MUST return an error if it supports an
> attribute on the target but cannot obtain its value. In that case,
> no attribute values will be returned.

Further, RFC 9754 Section 5 states:

> These new attributes are invalid to be used with GETATTR, VERIFY,
> and NVERIFY, and they can only be used with CB_GETATTR and SETATTR
> by a client holding an appropriate delegation.

Thus there does not appear to be a specific server response mandated
by specification. Taking the guidance that querying these attributes
via GETATTR is "invalid", NFSD will return nfserr_inval, failing the
request entirely.

Reported-by: Robert Morris <rtm@csail.mit.edu>
Closes: https://lore.kernel.org/linux-nfs/7819419cf0cb50d8130dc6b747765d2b8febc88a.camel@kernel.org/T/#t
Fixes: 51c0d4f7e3 ("nfsd: add support for FATTR4_OPEN_ARGUMENTS")
Cc: stable@vger.kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-10-21 11:03:19 -04:00
Christoph Hellwig
0f41997b1b xfs: don't use __GFP_NOFAIL in xfs_init_fs_context
With enough debug options enabled, struct xfs_mount is larger
than 4k and thus NOFAIL allocations won't work for it.

xfs_init_fs_context is early in the mount process, and if we really
are out of memory there we'd better give up ASAP anyway.

Fixes: 7b77b46a61 ("xfs: use kmem functions for struct xfs_mount")
Reported-by: syzbot+359a67b608de1ef72f65@syzkaller.appspotmail.com
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-10-21 11:32:50 +02:00
Christoph Hellwig
ca3d643a97 xfs: cache open zone in inode->i_private
The MRU cache for open zones is unfortunately still not ideal, as it can
time out pretty easily when doing heavy I/O to hard disks using up most
or all open zones.  One option would be to just increase the timeout,
but while looking into that I realized we're just better off caching it
indefinitely as there is no real downside to that once we don't hold a
reference to the cache open zone.

So switch the open zone to RCU freeing, and then stash the last used
open zone into inode->i_private.  This helps to significantly reduce
fragmentation by keeping I/O localized to zones for workloads that
write using many open files to HDD.

Fixes: 4e4d520755 ("xfs: add the zoned space allocator")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Tested-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-10-21 11:32:50 +02:00
Christoph Hellwig
a8c861f401 xfs: avoid busy loops in GCD
When GCD has no new work to handle, but read, write or reset commands
are outstanding, it currently busy loops, which is a bit suboptimal,
and can lead to softlockup warnings in case of stuck commands.

Change the code so that the task state is only set to running when work
is performed, which looks a bit tricky due to the design of the
reading/writing/resetting lists that contain both in-flight and finished
commands.

Fixes: 080d01c41d ("xfs: implement zoned garbage collection")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-10-21 11:32:50 +02:00
Geert Uytterhoeven
f5caeb3689 xfs: XFS_ONLINE_SCRUB_STATS should depend on DEBUG_FS
Currently, XFS_ONLINE_SCRUB_STATS selects DEBUG_FS.  However, DEBUG_FS
is meant for debugging, and people may want to disable it on production
systems.  Since commit 0ff51a1fd7 ("xfs: enable online fsck by
default in Kconfig")), XFS_ONLINE_SCRUB_STATS is enabled by default,
forcing DEBUG_FS enabled too.

Fix this by replacing the selection of DEBUG_FS by a dependency on
DEBUG_FS, which is what most other options controlling the gathering and
exposing of statistics do.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-10-21 09:52:59 +02:00
Damien Le Moal
b00bcb190e xfs: do not tightly pack-write large files
When using a zoned realtime device, tightly packing of data blocks
belonging to multiple closed files into the same realtime group (RTG)
is very efficient at improving write performance. This is especially
true with SMR HDDs as this can reduce, and even suppress, disk head
seeks.

However, such tight packing does not make sense for large files that
require at least a full RTG. If tight packing placement is applied for
such files, the VM writeback thread switching between inodes result in
the large files to be fragmented, thus increasing the garbage collection
penalty later when the RTG needs to be reclaimed.

This problem can be avoided with a simple heuristic: if the size of the
inode being written back is at least equal to the RTG size, do not use
tight-packing. Modify xfs_zoned_pack_tight() to always return false in
this case.

With this change, a multi-writer workload writing files of 256 MB on a
file system backed by an SMR HDD with 256 MB zone size as a realtime
device sees all files occupying exactly one RTG (i.e. one device zone),
thus completely removing the heavy fragmentation observed without this
change.

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-10-21 09:49:39 +02:00
Damien Le Moal
914f377075 xfs: Improve CONFIG_XFS_RT Kconfig help
Improve the description of the XFS_RT configuration option to document
that this option is required for zoned block devices.

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-10-21 09:48:31 +02:00
David Howells
5da6fb6356 cifs: Add a couple of missing smb3_rw_credits tracepoints
Add missing smb3_rw_credits tracepoints to cifs_readv_callback() (for SMB1)
to match those of SMB2/3.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.org>
cc: Shyam Prasad N <sprasad@microsoft.com>
cc: Tom Talpey <tom@talpey.com>
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-10-20 16:48:05 -05:00
Linus Torvalds
380cb5d353 Merge tag 'fsnotify_for_v6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fsnotify fixes from Jan Kara:

 - Stop-gap solution for a race between unmount of a filesystem with
   fsnotify marks and someone inspecting fdinfo of fsnotify group with
   those marks in procfs.

   A proper solution is in the works but it will get a while to settle.

 - Fix for non-decodable file handles (used by unprivileged apps using
   fanotify)

* tag 'fsnotify_for_v6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  fs/notify: call exportfs_encode_fid with s_umount
  expfs: Fix exportfs_can_encode_fh() for EXPORT_FH_FID
2025-10-20 09:35:13 -10:00
Babu Moger
19de7113bf x86,fs/resctrl: Fix NULL pointer dereference with events force-disabled in mbm_event mode
The following NULL pointer dereference is encountered on mount of resctrl fs
after booting a system that supports assignable counters with the
"rdt=!mbmtotal,!mbmlocal" kernel parameters:

  BUG: kernel NULL pointer dereference, address: 0000000000000008
  RIP: 0010:mbm_cntr_get
  Call Trace:
  rdtgroup_assign_cntr_event
  rdtgroup_assign_cntrs
  rdt_get_tree

Specifying the kernel parameter "rdt=!mbmtotal,!mbmlocal" effectively disables
the legacy X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL features
and the MBM events they represent. This results in the per-domain MBM event
related data structures to not be allocated during early initialization.

resctrl fs initialization follows by implicitly enabling both MBM total and
local events on a system that supports assignable counters (mbm_event mode),
but this enabling occurs after the per-domain data structures have been
created.

After booting, resctrl fs assumes that an enabled event can access all its
state. This results in NULL pointer dereference when resctrl attempts to
access the un-allocated structures of an enabled event.

Remove the late MBM event enabling from resctrl fs.

This leaves a problem where the X86_FEATURE_CQM_MBM_TOTAL and
X86_FEATURE_CQM_MBM_LOCAL features may be disabled while assignable counter
(mbm_event) mode is enabled without any events to support. Switching between
the "default" and "mbm_event" mode without any events is not practical.

Create a dependency between the X86_FEATURE_{CQM_MBM_TOTAL,CQM_MBM_LOCAL} and
X86_FEATURE_ABMC (assignable counter) hardware features. An x86 system that
supports assignable counters now requires support of X86_FEATURE_CQM_MBM_TOTAL
or X86_FEATURE_CQM_MBM_LOCAL.

This ensures all needed MBM related data structures are created before use and
that it is only possible to switch between "default" and "mbm_event" mode when
the same events are available in both modes. This dependency does not exist in
the hardware but this usage of these feature settings work for known systems.

  [ bp: Massage commit message. ]

Fixes: 13390861b4 ("x86,fs/resctrl: Detect Assignable Bandwidth Monitoring feature details")
Co-developed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://patch.msgid.link/a62e6ac063d0693475615edd213d5be5e55443e6.1760560934.git.babu.moger@amd.com
2025-10-20 18:06:31 +02:00
Stefan Metzmacher
e607ef686a smb: client: allocate enough space for MR WRs and ib_drain_qp()
The IB_WR_REG_MR and IB_WR_LOCAL_INV operations for smbdirect_mr_io
structures should never fail because the submission or completion queues
are too small. So we allocate more send_wr depending on the (local) max
number of MRs.

While there also add additional space for ib_drain_qp().

This should make sure ib_post_send() will never fail
because the submission queue is full.

Fixes: f198186aa9 ("CIFS: SMBD: Establish SMB Direct connection")
Fixes: cc55f65dd3 ("smb: client: make use of common smbdirect_socket_parameters")
Cc: stable@vger.kernel.org
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-10-19 20:59:38 -05:00
Linus Torvalds
847f242f7a Merge tag 'exfat-for-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat
Pull exfat fixes from Namjae Jeon:

 - Fix out-of-bounds in FS_IOC_SETFSLABEL

 - Add validation for stream entry size to prevent infinite loop

* tag 'exfat-for-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
  exfat: fix out-of-bounds in exfat_nls_to_ucs2()
  exfat: fix improper check of dentry.stream.valid_size
2025-10-18 07:23:59 -10:00
Linus Torvalds
2d07c6c209 Merge tag 'nfs-for-6.18-2' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client fixes from Anna Schumaker:

 - Fix for FlexFiles mirror->dss allocation

 - Apply delay_retrans to async operations

 - Check if suid/sgid is cleared after a write when needed

 - Fix setting the state renewal timer for early mounts after a reboot

* tag 'nfs-for-6.18-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  NFS4: Fix state renewals missing after boot
  NFS: check if suid/sgid was cleared after a write as needed
  NFS4: Apply delay_retrans to async operations
  NFSv4/flexfiles: fix to allocate mirror->dss before use
2025-10-18 07:18:48 -10:00
Linus Torvalds
4ccb3a8000 Merge tag '6.18-rc1-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
 "smb client fixes, security and smbdirect improvements, and some minor cleanup:

   - Important OOB DFS fix

   - Fix various potential tcon refcount leaks

   - smbdirect (RDMA) fixes (following up from test event a few weeks
     ago):

      - Fixes to improve and simplify handling of memory lifetime of
        smbdirect_mr_io structures, when a connection gets disconnected

      - Make sure we really wait to reach SMBDIRECT_SOCKET_DISCONNECTED
        before destroying resources

      - Make sure the send/recv submission/completion queues are large
        enough to avoid ib_post_send() from failing under pressure

   - convert cifs.ko to use the recommended crypto libraries (instead of
     crypto_shash), this also can improve performance

   - Three small cleanup patches"

* tag '6.18-rc1-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: (24 commits)
  smb: client: Consolidate cmac(aes) shash allocation
  smb: client: Remove obsolete crypto_shash allocations
  smb: client: Use HMAC-MD5 library for NTLMv2
  smb: client: Use MD5 library for SMB1 signature calculation
  smb: client: Use MD5 library for M-F symlink hashing
  smb: client: Use HMAC-SHA256 library for SMB2 signature calculation
  smb: client: Use HMAC-SHA256 library for key generation
  smb: client: Use SHA-512 library for SMB3.1.1 preauth hash
  cifs: parse_dfs_referrals: prevent oob on malformed input
  smb: client: Fix refcount leak for cifs_sb_tlink
  smb: client: let smbd_destroy() wait for SMBDIRECT_SOCKET_DISCONNECTED
  smb: move some duplicate definitions to common/cifsglob.h
  smb: client: let destroy_mr_list() keep smbdirect_mr_io memory if registered
  smb: client: let destroy_mr_list() call ib_dereg_mr() before ib_dma_unmap_sg()
  smb: client: call ib_dma_unmap_sg if mr->sgt.nents is not 0
  smb: client: improve logic in smbd_deregister_mr()
  smb: client: improve logic in smbd_register_mr()
  smb: client: improve logic in allocate_mr_list()
  smb: client: let destroy_mr_list() remove locked from the list
  smb: client: let destroy_mr_list() call list_del(&mr->list)
  ...
2025-10-18 07:11:32 -10:00
Ting-Chang Hou
1fabe43b4e btrfs: send: fix duplicated rmdir operations when using extrefs
Commit 29d6d30f5c ("Btrfs: send, don't send rmdir for same target
multiple times") has fixed an issue that a send stream contained a rmdir
operation for the same directory multiple times. After that fix we keep
track of the last directory for which we sent a rmdir operation and
compare with it before sending a rmdir for the parent inode of a deleted
hardlink we are processing. But there is still a corner case that in
between rmdir dir operations for the same inode we find deleted hardlinks
for other parent inodes, so tracking just the last inode for which we sent
a rmdir operation is not enough.

Hardlinks of a file in the same directory are stored in the same INODE_REF
item, but if the number of hardlinks is too large and can not fit in a
leaf, we use INODE_EXTREF items to store them. The key of an INODE_EXTREF
item is (inode_id, INODE_EXTREF, hash[name, parent ino]), so between two
hardlinks for the same parent directory, we can find others for other
parent directories. For example for the reproducer below we get the
following (from a btrfs inspect-internal dump-tree output):

    item 0 key (259 INODE_EXTREF 2309449) itemoff 16257 itemsize 26
            index 6925 parent 257 namelen 8 name: foo.6923
    item 1 key (259 INODE_EXTREF 2311350) itemoff 16231 itemsize 26
            index 6588 parent 258 namelen 8 name: foo.6587
    item 2 key (259 INODE_EXTREF 2457395) itemoff 16205 itemsize 26
            index 6611 parent 257 namelen 8 name: foo.6609
    (...)

So tracking the last directory's inode number does not work in this case
since we process a link for parent inode 257, then for 258 and then back
again for 257, and that second time we process a deleted link for 257 we
think we have not yet sent a rmdir operation.

Fix this by using a rbtree to keep track of all the directories for which
we have already sent rmdir operations, and add those directories to the
'check_dirs' ref list in process_recorded_refs() only if the directory is
not yet in the rbtree, otherwise skip it since it means we have already
sent a rmdir operation for that directory.

The following test script reproduces the problem:

  $ cat test.sh
  #!/bin/bash

  DEV=/dev/sdi
  MNT=/mnt/sdi

  mkfs.btrfs -f $DEV
  mount $DEV $MNT

  mkdir $MNT/a $MNT/b

  echo 123 > $MNT/a/foo
  for ((i = 1; i <= 1000; i++)); do
     ln $MNT/a/foo $MNT/a/foo.$i
     ln $MNT/a/foo $MNT/b/foo.$i
  done

  btrfs subvolume snapshot -r $MNT $MNT/snap1
  btrfs send $MNT/snap1 -f /tmp/base.send

  rm -r $MNT/a $MNT/b

  btrfs subvolume snapshot -r $MNT $MNT/snap2
  btrfs send -p $MNT/snap1 $MNT/snap2 -f /tmp/incremental.send

  umount $MNT
  mkfs.btrfs -f $DEV
  mount $DEV $MNT

  btrfs receive $MNT -f /tmp/base.send
  btrfs receive $MNT -f /tmp/incremental.send

  rm -f /tmp/base.send /tmp/incremental.send

  umount $MNT

When running it, it fails like this:

  $ ./test.sh
  (...)
  At subvol snap1
  At snapshot snap2
  ERROR: rmdir o257-9-0 failed: No such file or directory

CC: <stable@vger.kernel.org>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Ting-Chang Hou <tchou@synology.com>
[ Updated changelog ]
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-10-17 18:33:34 +02:00
Dewei Meng
17679ac6df btrfs: directly free partially initialized fs_info in btrfs_check_leaked_roots()
If fs_info->super_copy or fs_info->super_for_commit allocated failed in
btrfs_get_tree_subvol(), then no need to call btrfs_free_fs_info().
Otherwise btrfs_check_leaked_roots() would access NULL pointer because
fs_info->allocated_roots had not been initialised.

syzkaller reported the following information:
  ------------[ cut here ]------------
  BUG: unable to handle page fault for address: fffffffffffffbb0
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 64c9067 P4D 64c9067 PUD 64cb067 PMD 0
  Oops: Oops: 0000 [#1] SMP KASAN PTI
  CPU: 0 UID: 0 PID: 1402 Comm: syz.1.35 Not tainted 6.15.8 #4 PREEMPT(lazy)
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), (...)
  RIP: 0010:arch_atomic_read arch/x86/include/asm/atomic.h:23 [inline]
  RIP: 0010:raw_atomic_read include/linux/atomic/atomic-arch-fallback.h:457 [inline]
  RIP: 0010:atomic_read include/linux/atomic/atomic-instrumented.h:33 [inline]
  RIP: 0010:refcount_read include/linux/refcount.h:170 [inline]
  RIP: 0010:btrfs_check_leaked_roots+0x18f/0x2c0 fs/btrfs/disk-io.c:1230
  [...]
  Call Trace:
   <TASK>
   btrfs_free_fs_info+0x310/0x410 fs/btrfs/disk-io.c:1280
   btrfs_get_tree_subvol+0x592/0x6b0 fs/btrfs/super.c:2029
   btrfs_get_tree+0x63/0x80 fs/btrfs/super.c:2097
   vfs_get_tree+0x98/0x320 fs/super.c:1759
   do_new_mount+0x357/0x660 fs/namespace.c:3899
   path_mount+0x716/0x19c0 fs/namespace.c:4226
   do_mount fs/namespace.c:4239 [inline]
   __do_sys_mount fs/namespace.c:4450 [inline]
   __se_sys_mount fs/namespace.c:4427 [inline]
   __x64_sys_mount+0x28c/0x310 fs/namespace.c:4427
   do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
   do_syscall_64+0x92/0x180 arch/x86/entry/syscall_64.c:94
   entry_SYSCALL_64_after_hwframe+0x76/0x7e
  RIP: 0033:0x7f032eaffa8d
  [...]

Fixes: 3bb17a25bc ("btrfs: add get_tree callback for new mount API")
CC: stable@vger.kernel.org # 6.12+
Reviewed-by: Daniel Vacek <neelx@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Dewei Meng <mengdewei@cqsoftware.com.cn>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-10-17 18:33:27 +02:00
Fernando Fernandez Mancera
c7fbb8218b sysfs: check visibility before changing group attribute ownership
Since commit 0c17270f9b ("net: sysfs: Implement is_visible for
phys_(port_id, port_name, switch_id)"), __dev_change_net_namespace() can
hit WARN_ON() when trying to change owner of a file that isn't visible.
See the trace below:

 WARNING: CPU: 6 PID: 2938 at net/core/dev.c:12410 __dev_change_net_namespace+0xb89/0xc30
 CPU: 6 UID: 0 PID: 2938 Comm: incusd Not tainted 6.17.1-1-mainline #1 PREEMPT(full)  4b783b4a638669fb644857f484487d17cb45ed1f
 Hardware name: Framework Laptop 13 (AMD Ryzen 7040Series)/FRANMDCP07, BIOS 03.07 02/19/2025
 RIP: 0010:__dev_change_net_namespace+0xb89/0xc30
 [...]
 Call Trace:
  <TASK>
  ? if6_seq_show+0x30/0x50
  do_setlink.isra.0+0xc7/0x1270
  ? __nla_validate_parse+0x5c/0xcc0
  ? security_capable+0x94/0x1a0
  rtnl_newlink+0x858/0xc20
  ? update_curr+0x8e/0x1c0
  ? update_entity_lag+0x71/0x80
  ? sched_balance_newidle+0x358/0x450
  ? psi_task_switch+0x113/0x2a0
  ? __pfx_rtnl_newlink+0x10/0x10
  rtnetlink_rcv_msg+0x346/0x3e0
  ? sched_clock+0x10/0x30
  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
  netlink_rcv_skb+0x59/0x110
  netlink_unicast+0x285/0x3c0
  ? __alloc_skb+0xdb/0x1a0
  netlink_sendmsg+0x20d/0x430
  ____sys_sendmsg+0x39f/0x3d0
  ? import_iovec+0x2f/0x40
  ___sys_sendmsg+0x99/0xe0
  __sys_sendmsg+0x8a/0xf0
  do_syscall_64+0x81/0x970
  ? __sys_bind+0xe3/0x110
  ? syscall_exit_work+0x143/0x1b0
  ? do_syscall_64+0x244/0x970
  ? sock_alloc_file+0x63/0xc0
  ? syscall_exit_work+0x143/0x1b0
  ? do_syscall_64+0x244/0x970
  ? alloc_fd+0x12e/0x190
  ? put_unused_fd+0x2a/0x70
  ? do_sys_openat2+0xa2/0xe0
  ? syscall_exit_work+0x143/0x1b0
  ? do_syscall_64+0x244/0x970
  ? exc_page_fault+0x7e/0x1a0
  entry_SYSCALL_64_after_hwframe+0x76/0x7e
 [...]
  </TASK>

Fix this by checking is_visible() before trying to touch the attribute.

Fixes: 303a42769c ("sysfs: add sysfs_group{s}_change_owner()")
Fixes: 0c17270f9b ("net: sysfs: Implement is_visible for phys_(port_id, port_name, switch_id)")
Reported-by: Cynthia <cynthia@kosmx.dev>
Closes: https://lore.kernel.org/netdev/01070199e22de7f8-28f711ab-d3f1-46d9-b9a0-048ab05eb09b-000000@eu-central-1.amazonses.com/
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20251016101456.4087-1-fmancera@suse.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-17 09:48:34 +02:00
Gao Xiang
a429b76114 erofs: fix crafted invalid cases for encoded extents
Robert recently reported two corrupted images that can cause system
crashes, which are related to the new encoded extents introduced
in Linux 6.15:

  - The first one [1] has plen != 0 (e.g. plen == 0x2000000) but
    (plen & Z_EROFS_EXTENT_PLEN_MASK) == 0. It is used to represent
    special extents such as sparse extents (!EROFS_MAP_MAPPED), but
    previously only plen == 0 was handled;

  - The second one [2] has pa 0xffffffffffdcffed and plen 0xb4000,
    then "cur [0xfffffffffffff000] += bvec.bv_len [0x1000]" in
    "} while ((cur += bvec.bv_len) < end);" wraps around, causing an
    out-of-bound access of pcl->compressed_bvecs[] in
    z_erofs_submit_queue().  EROFS only supports 48-bit physical block
    addresses (up to 1EiB for 4k blocks), so add a sanity check to
    enforce this.

Fixes: 1d191b4ca5 ("erofs: implement encoded extent metadata")
Reported-by: Robert Morris <rtm@csail.mit.edu>
Closes: https://lore.kernel.org/r/75022.1759355830@localhost  [1]
Closes: https://lore.kernel.org/r/80524.1760131149@localhost  [2]
Reviewed-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2025-10-17 15:21:36 +08:00
Linus Torvalds
98ac9cc4b4 Merge tag 'f2fs-fix-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs fixes from Jaegeuk Kim:

 - fix soft lockupg caused by iput() added in bc986b1d75 ("fs: stop
   accessing ->i_count directly in f2fs and gfs2")

 - fix a wrong block address map on multiple devices

Link: https://lore.kernel.org/oe-lkp/202509301450.138b448f-lkp@intel.com [1]

* tag 'f2fs-fix-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
  f2fs: fix wrong block mapping for multi-devices
  f2fs: don't call iput() from f2fs_drop_inode()
2025-10-16 10:58:49 -07:00
Linus Torvalds
9f388a653c Merge tag 'for-6.18-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:

 - in tree-checker fix extref bounds check

 - reorder send context structure to avoid
   -Wflex-array-member-not-at-end warning

 - fix extent readahead length for compressed extents

 - fix memory leaks on error paths (qgroup assign ioctl, zone loading
   with raid stripe tree enabled)

 - fix how device specific mount options are applied, in particular the
   'ssd' option will be set unexpectedly

 - fix tracking of relocation state when tasks are running and
   cancellation is attempted

 - adjust assertion condition for folios allocated for scrub

 - remove incorrect assertion checking for block group when populating
   free space tree

* tag 'for-6.18-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: send: fix -Wflex-array-member-not-at-end warning in struct send_ctx
  btrfs: tree-checker: fix bounds check in check_inode_extref()
  btrfs: fix memory leaks when rejecting a non SINGLE data profile without an RST
  btrfs: fix incorrect readahead expansion length
  btrfs: do not assert we found block group item when creating free space tree
  btrfs: do not use folio_test_partial_kmap() in ASSERT()s
  btrfs: only set the device specific options after devices are opened
  btrfs: fix memory leak on duplicated memory in the qgroup assign ioctl
  btrfs: fix clearing of BTRFS_FS_RELOC_RUNNING if relocation already running
2025-10-16 10:22:38 -07:00
Linus Torvalds
05de41f3e2 Merge tag 'v6.18-rc1-smb-server-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:

 - Fix RPC hang due to locking bug

 - Fix for memory leak in read and refcount leak (in session setup)

 - Minor cleanup

* tag 'v6.18-rc1-smb-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: fix recursive locking in RPC handle list access
  smb/server: fix possible refcount leak in smb2_sess_setup()
  smb/server: fix possible memory leak in smb2_read()
  smb: server: Use common error handling code in smb_direct_rdma_xmit()
2025-10-16 10:16:41 -07:00
Eric Biggers
3c15a6df61 smb: client: Consolidate cmac(aes) shash allocation
Now that smb3_crypto_shash_allocate() and smb311_crypto_shash_allocate()
are identical and only allocate "cmac(aes)", delete the latter and
replace the call to it with the former.

Reviewed-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-10-15 22:10:28 -05:00
Eric Biggers
2c09630d09 smb: client: Remove obsolete crypto_shash allocations
Now that the SMB client accesses MD5, HMAC-MD5, HMAC-SHA256, and SHA-512
only via the library API and not via crypto_shash, allocating
crypto_shash objects for these algorithms is no longer necessary.
Remove all these allocations, their corresponding kconfig selections,
and their corresponding module soft dependencies.

Reviewed-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-10-15 22:10:28 -05:00
Eric Biggers
395a77b030 smb: client: Use HMAC-MD5 library for NTLMv2
For the HMAC-MD5 computations in NTLMv2, use the HMAC-MD5 library
instead of a "hmac(md5)" crypto_shash.  This is simpler and faster.
With the library there's no need to allocate memory, no need to handle
errors, and the HMAC-MD5 code is accessed directly without inefficient
indirect calls and other unnecessary API overhead.

To preserve the existing behavior of NTLMv2 support being disabled when
the kernel is booted with "fips=1", make setup_ntlmv2_rsp() check
fips_enabled itself.  Previously it relied on the error from
cifs_alloc_hash("hmac(md5)", &hmacmd5).

Reviewed-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-10-15 22:10:28 -05:00
Eric Biggers
c04e55b257 smb: client: Use MD5 library for SMB1 signature calculation
Convert cifs_calc_signature() to use the MD5 library instead of a "md5"
crypto_shash.  This is simpler and faster.  With the library there's no
need to allocate memory, no need to handle errors, and the MD5 code is
accessed directly without inefficient indirect calls and other
unnecessary API overhead.

To preserve the existing behavior of MD5 signature support being
disabled when the kernel is booted with "fips=1", make
cifs_calc_signature() check fips_enabled itself.  Previously it relied
on the error from cifs_alloc_hash("md5", &server->secmech.md5).

Reviewed-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-10-15 22:10:28 -05:00
Eric Biggers
ae04b1bb06 smb: client: Use MD5 library for M-F symlink hashing
Convert parse_mf_symlink() and format_mf_symlink() to use the MD5
library instead of a "md5" crypto_shash.  This is simpler and faster.
With the library there's no need to allocate memory, no need to handle
errors, and the MD5 code is accessed directly without inefficient
indirect calls and other unnecessary API overhead.

This also fixes an issue where these functions did not work on kernels
booted in FIPS mode.  The use of MD5 here is for data integrity rather
than a security purpose, so it can use a non-FIPS-approved algorithm.

Reviewed-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-10-15 22:10:28 -05:00
Eric Biggers
e05b3115e7 smb: client: Use HMAC-SHA256 library for SMB2 signature calculation
Convert smb2_calc_signature() to use the HMAC-SHA256 library instead of
a "hmac(sha256)" crypto_shash.  This is simpler and faster.  With the
library there's no need to allocate memory, no need to handle errors,
and the HMAC-SHA256 code is accessed directly without inefficient
indirect calls and other unnecessary API overhead.

To make this possible, make __cifs_calc_signature() support both the
HMAC-SHA256 library and crypto_shash.  (crypto_shash is still needed for
HMAC-MD5 and AES-CMAC.  A later commit will switch HMAC-MD5 from shash
to the library.  I'd like to eventually do the same for AES-CMAC, but it
doesn't have a library API yet.  So for now, shash is still needed.)

Also remove the unnecessary 'sigptr' variable.

For now smb3_crypto_shash_allocate() still allocates a "hmac(sha256)"
crypto_shash.  It will be removed in a later commit.

Reviewed-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-10-15 22:10:28 -05:00
Eric Biggers
4b4c6fdb25 smb: client: Use HMAC-SHA256 library for key generation
Convert generate_key() to use the HMAC-SHA256 library instead of a
"hmac(sha256)" crypto_shash.  This is simpler and faster.  With the
library there's no need to allocate memory, no need to handle errors,
and the HMAC-SHA256 code is accessed directly without inefficient
indirect calls and other unnecessary API overhead.

Also remove the unnecessary 'hashptr' variable.

For now smb3_crypto_shash_allocate() still allocates a "hmac(sha256)"
crypto_shash.  It will be removed in a later commit.

Reviewed-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-10-15 22:10:28 -05:00
Eric Biggers
af5fea5141 smb: client: Use SHA-512 library for SMB3.1.1 preauth hash
Convert smb311_update_preauth_hash() to use the SHA-512 library instead
of a "sha512" crypto_shash.  This is simpler and faster.  With the
library there's no need to allocate memory, no need to handle errors,
and the SHA-512 code is accessed directly without inefficient indirect
calls and other unnecessary API overhead.

Remove the call to smb311_crypto_shash_allocate() from
smb311_update_preauth_hash(), since it appears to have been needed only
to allocate the "sha512" crypto_shash.  (It also had the side effect of
allocating the "cmac(aes)" crypto_shash, but that's also done in
generate_key() which is where the AES-CMAC key is initialized.)

For now the "sha512" crypto_shash is still being allocated elsewhere.
It will be removed in a later commit.

Reviewed-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-10-15 22:10:28 -05:00
Eugene Korenevsky
6447b0e355 cifs: parse_dfs_referrals: prevent oob on malformed input
Malicious SMB server can send invalid reply to FSCTL_DFS_GET_REFERRALS

- reply smaller than sizeof(struct get_dfs_referral_rsp)
- reply with number of referrals smaller than NumberOfReferrals in the
header

Processing of such replies will cause oob.

Return -EINVAL error on such replies to prevent oob-s.

Signed-off-by: Eugene Korenevsky <ekorenevsky@aliyun.com>
Cc: stable@vger.kernel.org
Suggested-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-10-15 22:10:28 -05:00
Shuhao Fu
c2b77f4220 smb: client: Fix refcount leak for cifs_sb_tlink
Fix three refcount inconsistency issues related to `cifs_sb_tlink`.

Comments for `cifs_sb_tlink` state that `cifs_put_tlink()` needs to be
called after successful calls to `cifs_sb_tlink()`. Three calls fail to
update refcount accordingly, leading to possible resource leaks.

Fixes: 8ceb984379 ("CIFS: Move rename to ops struct")
Fixes: 2f1afe2599 ("cifs: Use smb 2 - 3 and cifsacl mount options getacl functions")
Fixes: 366ed846df ("cifs: Use smb 2 - 3 and cifsacl mount options setacl function")
Cc: stable@vger.kernel.org
Signed-off-by: Shuhao Fu <sfual@cse.ust.hk>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-10-15 22:09:46 -05:00
Linus Torvalds
7ea30958b3 Merge tag 'vfs-6.18-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:

 - Handle inode number mismatches in nsfs file handles

 - Update the comment to init_file()

 - Add documentation link for EBADF in the rust file code

 - Skip read lock assertion for read-only filesystems when using dax

 - Don't leak disconnected dentries during umount

 - Fix new coredump input pattern validation

 - Handle ENOIOCTLCMD conversion in vfs_fileattr_{g,s}et() correctly

 - Remove redundant IOCB_DIO_CALLER_COMP clearing in overlayfs

* tag 'vfs-6.18-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  ovl: remove redundant IOCB_DIO_CALLER_COMP clearing
  fs: return EOPNOTSUPP from file_setattr/file_getattr syscalls
  Revert "fs: make vfs_fileattr_[get|set] return -EOPNOTSUPP"
  coredump: fix core_pattern input validation
  vfs: Don't leak disconnected dentries on umount
  dax: skip read lock assertion for read-only filesystems
  rust: file: add intra-doc link for 'EBADF'
  fs: update comment in init_file()
  nsfs: handle inode number mismatches gracefully in file handles
2025-10-15 15:12:58 -07:00
Deepanshu Kartikey
78a63493f8 ocfs2: clear extent cache after moving/defragmenting extents
The extent map cache can become stale when extents are moved or
defragmented, causing subsequent operations to see outdated extent flags. 
This triggers a BUG_ON in ocfs2_refcount_cal_cow_clusters().

The problem occurs when:
1. copy_file_range() creates a reflinked extent with OCFS2_EXT_REFCOUNTED
2. ioctl(FITRIM) triggers ocfs2_move_extents()
3. __ocfs2_move_extents_range() reads and caches the extent (flags=0x2)
4. ocfs2_move_extent()/ocfs2_defrag_extent() calls __ocfs2_move_extent()
   which clears OCFS2_EXT_REFCOUNTED flag on disk (flags=0x0)
5. The extent map cache is not invalidated after the move
6. Later write() operations read stale cached flags (0x2) but disk has
   updated flags (0x0), causing a mismatch
7. BUG_ON(!(rec->e_flags & OCFS2_EXT_REFCOUNTED)) triggers

Fix by clearing the extent map cache after each extent move/defrag
operation in __ocfs2_move_extents_range().  This ensures subsequent
operations read fresh extent data from disk.

Link: https://lore.kernel.org/all/20251009142917.517229-1-kartikey406@gmail.com/T/
Link: https://lkml.kernel.org/r/20251009154903.522339-1-kartikey406@gmail.com
Fixes: 53069d4e76 ("Ocfs2/move_extents: move/defrag extents within a certain range.")
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
Reported-by: syzbot+6fdd8fa3380730a4b22c@syzkaller.appspotmail.com
Tested-by: syzbot+6fdd8fa3380730a4b22c@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?id=2959889e1f6e216585ce522f7e8bc002b46ad9e7
Reviewed-by: Mark Fasheh <mark@fasheh.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-10-15 13:24:33 -07:00
Stefan Metzmacher
d451a0e88e smb: client: let smbd_destroy() wait for SMBDIRECT_SOCKET_DISCONNECTED
We should wait for the rdma_cm to become SMBDIRECT_SOCKET_DISCONNECTED,
it turns out that (at least running some xfstests e.g. cifs/001)
often triggers the case where wait_event_interruptible() returns
with -ERESTARTSYS instead of waiting for SMBDIRECT_SOCKET_DISCONNECTED
to be reached.

Or we are already in SMBDIRECT_SOCKET_DISCONNECTING and never wait
for SMBDIRECT_SOCKET_DISCONNECTED.

Fixes: 050b8c3740 ("smbd: Make upper layer decide when to destroy the transport")
Fixes: e8b3bfe9bc ("cifs: smbd: Don't destroy transport on RDMA disconnect")
Fixes: b0aa92a229 ("smb: client: make sure smbd_disconnect_rdma_work() doesn't run after smbd_destroy() took over")
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-10-15 11:21:13 -05:00
Linus Torvalds
66f8e4df00 Merge tag 'ext4_for_linus-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 bug fixes from Ted Ts'o:

 - Fix regression caused by removing CONFIG_EXT3_FS when testing some
   very old defconfigs

 - Avoid a BUG_ON when opening a file on a maliciously corrupted file
   system

 - Avoid mm warnings when freeing a very large orphan file metadata

 - Avoid a theoretical races between metadata writeback and checkpoints
   (it's very hard to hit in practice, since the race requires that the
   writeback take a very long time)

* tag 'ext4_for_linus-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  Use CONFIG_EXT4_FS instead of CONFIG_EXT3_FS in all of the defconfigs
  ext4: free orphan info with kvfree
  ext4: detect invalid INLINE_DATA + EXTENTS flag combination
  ext4, doc: fix and improve directory hash tree description
  ext4: wait for ongoing I/O to complete before freeing blocks
  jbd2: ensure that all ongoing I/O complete before freeing blocks
2025-10-15 07:51:57 -07:00
ZhangGuoDong
d877470b59 smb: move some duplicate definitions to common/cifsglob.h
In order to maintain the code more easily, move duplicate definitions to
new common header file.

Co-developed-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: ZhangGuoDong <zhangguodong@kylinos.cn>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-10-15 07:44:17 -05:00
Stefan Metzmacher
b0432201a1 smb: client: let destroy_mr_list() keep smbdirect_mr_io memory if registered
If a smbdirect_mr_io structure if still visible to callers of
smbd_register_mr() we can't free the related memory when the
connection is disconnected! Otherwise smbd_deregister_mr()
will crash.

Now we use a mutex and refcounting in order to keep the
memory around if the connection is disconnected.

It means smbd_deregister_mr() can be called at any later time to free
the memory, which is no longer referenced by nor referencing the
connection.

It also means smbd_destroy() no longer needs to wait for
mr_io.used.count to become 0.

Fixes: 050b8c3740 ("smbd: Make upper layer decide when to destroy the transport")
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-10-15 07:44:17 -05:00
Marios Makassikis
88f170814f ksmbd: fix recursive locking in RPC handle list access
Since commit 305853cce3 ("ksmbd: Fix race condition in RPC handle list
access"), ksmbd_session_rpc_method() attempts to lock sess->rpc_lock.

This causes hung connections / tasks when a client attempts to open
a named pipe. Using Samba's rpcclient tool:

 $ rpcclient //192.168.1.254 -U user%password
 $ rpcclient $> srvinfo
 <connection hung here>

Kernel side:
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  task:kworker/0:0 state:D stack:0 pid:5021 tgid:5021 ppid:2 flags:0x00200000
  Workqueue: ksmbd-io handle_ksmbd_work
  Call trace:
  __schedule from schedule+0x3c/0x58
  schedule from schedule_preempt_disabled+0xc/0x10
  schedule_preempt_disabled from rwsem_down_read_slowpath+0x1b0/0x1d8
  rwsem_down_read_slowpath from down_read+0x28/0x30
  down_read from ksmbd_session_rpc_method+0x18/0x3c
  ksmbd_session_rpc_method from ksmbd_rpc_open+0x34/0x68
  ksmbd_rpc_open from ksmbd_session_rpc_open+0x194/0x228
  ksmbd_session_rpc_open from create_smb2_pipe+0x8c/0x2c8
  create_smb2_pipe from smb2_open+0x10c/0x27ac
  smb2_open from handle_ksmbd_work+0x238/0x3dc
  handle_ksmbd_work from process_scheduled_works+0x160/0x25c
  process_scheduled_works from worker_thread+0x16c/0x1e8
  worker_thread from kthread+0xa8/0xb8
  kthread from ret_from_fork+0x14/0x38
  Exception stack(0x8529ffb0 to 0x8529fff8)

The task deadlocks because the lock is already held:
  ksmbd_session_rpc_open
    down_write(&sess->rpc_lock)
    ksmbd_rpc_open
      ksmbd_session_rpc_method
        down_read(&sess->rpc_lock)   <-- deadlock

Adjust ksmbd_session_rpc_method() callers to take the lock when necessary.

Fixes: 305853cce3 ("ksmbd: Fix race condition in RPC handle list access")
Signed-off-by: Marios Makassikis <mmakassikis@freebox.fr>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-10-15 06:03:09 -05:00
ZhangGuoDong
379510a815 smb/server: fix possible refcount leak in smb2_sess_setup()
Reference count of ksmbd_session will leak when session need reconnect.
Fix this by adding the missing ksmbd_user_session_put().

Co-developed-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: ZhangGuoDong <zhangguodong@kylinos.cn>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-10-15 06:03:09 -05:00
ZhangGuoDong
6fced056d2 smb/server: fix possible memory leak in smb2_read()
Memory leak occurs when ksmbd_vfs_read() fails.
Fix this by adding the missing kvfree().

Co-developed-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: ZhangGuoDong <zhangguodong@kylinos.cn>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-10-15 06:03:09 -05:00
Jeongjun Park
2d8636119b exfat: fix out-of-bounds in exfat_nls_to_ucs2()
Since the len argument value passed to exfat_ioctl_set_volume_label()
from exfat_nls_to_utf16() is passed 1 too large, an out-of-bounds read
occurs when dereferencing p_cstring in exfat_nls_to_ucs2() later.

And because of the NLS_NAME_OVERLEN macro, another error occurs when
creating a file with a period at the end using utf8 and other iocharsets.

So to avoid this, you should remove the code that uses NLS_NAME_OVERLEN
macro and make the len argument value be the length of the label string,
but with a maximum length of FSLABEL_MAX - 1.

Reported-by: syzbot+98cc76a76de46b3714d4@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=98cc76a76de46b3714d4
Fixes: d01579d590 ("exfat: Add support for FS_IOC_{GET,SET}FSLABEL")
Suggested-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-10-15 17:53:20 +09:00
Jaehun Gou
82ebecdc74 exfat: fix improper check of dentry.stream.valid_size
We found an infinite loop bug in the exFAT file system that can lead to a
Denial-of-Service (DoS) condition. When a dentry in an exFAT filesystem is
malformed, the following system calls — SYS_openat, SYS_ftruncate, and
SYS_pwrite64 — can cause the kernel to hang.

Root cause analysis shows that the size validation code in exfat_find()
does not check whether dentry.stream.valid_size is negative. As a result,
the system calls mentioned above can succeed and eventually trigger the DoS
issue.

This patch adds a check for negative dentry.stream.valid_size to prevent
this vulnerability.

Co-developed-by: Seunghun Han <kkamagui@gmail.com>
Signed-off-by: Seunghun Han <kkamagui@gmail.com>
Co-developed-by: Jihoon Kwon <jimmyxyz010315@gmail.com>
Signed-off-by: Jihoon Kwon <jimmyxyz010315@gmail.com>
Signed-off-by: Jaehun Gou <p22gone@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-10-15 14:37:21 +09:00
Linus Torvalds
9b332cece9 Merge tag 'nfsd-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fix from Chuck Lever:

 - Fix a crasher reported by rtm@csail.mit.edu

* tag 'nfsd-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  NFSD: Define a proc_layoutcommit for the FlexFiles layout type
2025-10-14 09:28:12 -07:00
Jaegeuk Kim
9d5c4f5c7a f2fs: fix wrong block mapping for multi-devices
Assuming the disk layout as below,

disk0: 0            --- 0x00035abfff
disk1: 0x00035ac000 --- 0x00037abfff
disk2: 0x00037ac000 --- 0x00037ebfff

and we want to read data from offset=13568 having len=128 across the block
devices, we can illustrate the block addresses like below.

0 .. 0x00037ac000 ------------------- 0x00037ebfff, 0x00037ec000 -------
          |          ^            ^                                ^
          |   fofs   0            13568                            13568+128
          |       ------------------------------------------------------
          |   LBA    0x37e8aa9    0x37ebfa9                        0x37ec029
          --- map    0x3caa9      0x3ffa9

In this example, we should give the relative map of the target block device
ranging from 0x3caa9 to 0x3ffa9 where the length should be calculated by
0x37ebfff + 1 - 0x37ebfa9.

In the below equation, however, map->m_pblk was supposed to be the original
address instead of the one from the target block address.

 - map->m_len = min(map->m_len, dev->end_blk + 1 - map->m_pblk);

Cc: stable@vger.kernel.org
Fixes: 71f2c82062 ("f2fs: multidevice: support direct IO")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-10-13 23:55:44 +00:00