Commit Graph

102029 Commits

Author SHA1 Message Date
Al Viro
12cdd1af7a do_change_type(): use guards
clean fit; namespace_excl to modify propagation graph

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-09-02 19:35:56 -04:00
Al Viro
4151c3cc58 __is_local_mountpoint(): use guards
clean fit; namespace_shared due to iterating through ns->mounts.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-09-02 19:35:56 -04:00
Al Viro
902e990467 __detach_mounts(): use guards
Clean fit for guards use; guards can't be weaker due to umount_tree() calls.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-09-02 19:35:56 -04:00
Al Viro
547af12dcd fs/namespace.c: allow to drop vfsmount references via __free(mntput)
Note that just as path_put, it should never be done in scope of
namespace_sem, be it shared or exclusive.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-09-02 19:35:56 -04:00
Al Viro
d154f18575 introduced guards for mount_lock
mount_writer: write_seqlock; that's an equivalent of {un,}lock_mount_hash()
mount_locked_reader: read_seqlock_excl; these tend to be open-coded.

No bulk conversions, please - if nothing else, quite a few places take
use mount_writer form when mount_locked_reader is sufficent.  It needs
to be dealt with carefully.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-09-02 19:35:56 -04:00
Al Viro
360600f8ec fs/namespace.c: fix the namespace_sem guard mess
If anything, namespace_lock should be DEFINE_LOCK_GUARD_0, not DEFINE_GUARD.
That way we
	* do not need to feed it a bogus argument
	* do not get gcc trying to compare an address of static in
file variable with -4097 - and, if we are unlucky, trying to keep
it in a register, with spills and all such.

The same problems apply to grabbing namespace_sem shared.

Rename it to namespace_excl, add namespace_shared, convert the existing users:

    guard(namespace_lock, &namespace_sem) => guard(namespace_excl)()
    guard(rwsem_read, &namespace_sem) => guard(namespace_shared)()
    scoped_guard(namespace_lock, &namespace_sem) => scoped_guard(namespace_excl)
    scoped_guard(rwsem_read, &namespace_sem) => scoped_guard(namespace_shared)

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-09-02 19:35:56 -04:00
Linus Torvalds
8026aed072 Merge tag 'mm-hotfixes-stable-2025-09-01-17-20' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
 "17 hotfixes. 13 are cc:stable and the remainder address post-6.16
  issues or aren't considered necessary for -stable kernels. 11 of these
  fixes are for MM.

  This includes a three-patch series from Harry Yoo which fixes an
  intermittent boot failure which can occur on x86 systems. And a
  two-patch series from Alexander Gordeev which fixes a KASAN crash on
  S390 systems"

* tag 'mm-hotfixes-stable-2025-09-01-17-20' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm: fix possible deadlock in kmemleak
  x86/mm/64: define ARCH_PAGE_TABLE_SYNC_MASK and arch_sync_kernel_mappings()
  mm: introduce and use {pgd,p4d}_populate_kernel()
  mm: move page table sync declarations to linux/pgtable.h
  proc: fix missing pde_set_flags() for net proc files
  mm: fix accounting of memmap pages
  mm/damon/core: prevent unnecessary overflow in damos_set_effective_quota()
  kexec: add KEXEC_FILE_NO_CMA as a legal flag
  kasan: fix GCC mem-intrinsic prefix with sw tags
  mm/kasan: avoid lazy MMU mode hazards
  mm/kasan: fix vmalloc shadow memory (de-)population races
  kunit: kasan_test: disable fortify string checker on kasan_strings() test
  selftests/mm: fix FORCE_READ to read input value correctly
  mm/userfaultfd: fix kmap_local LIFO ordering for CONFIG_HIGHPTE
  ocfs2: prevent release journal inode after journal shutdown
  rust: mm: mark VmaNew as transparent
  of_numa: fix uninitialized memory nodes causing kernel panic
2025-09-02 13:18:00 -07:00
Jaegeuk Kim
c872b6279c f2fs: allocate HOT_DATA for IPU writes
Let's split IPU writes in hot data area to improve the GC efficiency.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-09-02 20:16:04 +00:00
Linus Torvalds
e3c94a539e Merge tag 'for-6.17-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:

 - fix a few races related to inode link count

 - fix inode leak on failure to add link to inode

 - move transaction aborts closer to where they happen

* tag 'for-6.17-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: avoid load/store tearing races when checking if an inode was logged
  btrfs: fix race between setting last_dir_index_offset and inode logging
  btrfs: fix race between logging inode and checking if it was logged before
  btrfs: simplify error handling logic for btrfs_link()
  btrfs: fix inode leak on failure to add link to inode
  btrfs: abort transaction on failure to add link to inode
2025-09-02 13:13:22 -07:00
Omar Sandoval
f6a6c28005 btrfs: fix subvolume deletion lockup caused by inodes xarray race
There is a race condition between inode eviction and inode caching that
can cause a live struct btrfs_inode to be missing from the root->inodes
xarray. Specifically, there is a window during evict() between the inode
being unhashed and deleted from the xarray. If btrfs_iget() is called
for the same inode in that window, it will be recreated and inserted
into the xarray, but then eviction will delete the new entry, leaving
nothing in the xarray:

Thread 1                          Thread 2
---------------------------------------------------------------
evict()
  remove_inode_hash()
                                  btrfs_iget_path()
                                    btrfs_iget_locked()
                                    btrfs_read_locked_inode()
                                      btrfs_add_inode_to_root()
  destroy_inode()
    btrfs_destroy_inode()
      btrfs_del_inode_from_root()
        __xa_erase

In turn, this can cause issues for subvolume deletion. Specifically, if
an inode is in this lost state, and all other inodes are evicted, then
btrfs_del_inode_from_root() will call btrfs_add_dead_root() prematurely.
If the lost inode has a delayed_node attached to it, then when
btrfs_clean_one_deleted_snapshot() calls btrfs_kill_all_delayed_nodes(),
it will loop forever because the delayed_nodes xarray will never become
empty (unless memory pressure forces the inode out). We saw this
manifest as soft lockups in production.

Fix it by only deleting the xarray entry if it matches the given inode
(using __xa_cmpxchg()).

Fixes: 310b2f5d5a ("btrfs: use an xarray to track open inodes in a root")
Cc: stable@vger.kernel.org # 6.11+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Co-authored-by: Leo Martins <loemra.dev@gmail.com>
Signed-off-by: Leo Martins <loemra.dev@gmail.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-02 20:45:30 +02:00
Qu Wenruo
9786531399 btrfs: fix corruption reading compressed range when block size is smaller than page size
[BUG]
With 64K page size (aarch64 with 64K page size config) and 4K btrfs
block size, the following workload can easily lead to a corrupted read:

        mkfs.btrfs -f -s 4k $dev > /dev/null
        mount -o compress $dev $mnt
        xfs_io -f -c "pwrite -S 0xff 0 64k" $mnt/base > /dev/null
	echo "correct result:"
        od -Ad -t x1 $mnt/base
        xfs_io -f -c "reflink $mnt/base 32k 0 32k" \
		  -c "reflink $mnt/base 0 32k 32k" \
		  -c "pwrite -S 0xff 60k 4k" $mnt/new > /dev/null
	echo "incorrect result:"
        od -Ad -t x1 $mnt/new
        umount $mnt

This shows the following result:

correct result:
0000000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
*
0065536
incorrect result:
0000000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
*
0032768 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
0061440 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
*
0065536

Notice the zero in the range [32K, 60K), which is incorrect.

[CAUSE]
With extra trace printk, it shows the following events during od:
(some unrelated info removed like CPU and context)

 od-3457   btrfs_do_readpage: enter r/i=5/258 folio=0(65536) prev_em_start=0000000000000000

The "r/i" is indicating the root and inode number. In our case the file
"new" is using ino 258 from fs tree (root 5).

Here notice the @prev_em_start pointer is NULL. This means the
btrfs_do_readpage() is called from btrfs_read_folio(), not from
btrfs_readahead().

 od-3457   btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=0 got em start=0 len=32768
 od-3457   btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=4096 got em start=0 len=32768
 od-3457   btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=8192 got em start=0 len=32768
 od-3457   btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=12288 got em start=0 len=32768
 od-3457   btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=16384 got em start=0 len=32768
 od-3457   btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=20480 got em start=0 len=32768
 od-3457   btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=24576 got em start=0 len=32768
 od-3457   btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=28672 got em start=0 len=32768

These above 32K blocks will be read from the first half of the
compressed data extent.

 od-3457   btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=32768 got em start=32768 len=32768

Note here there is no btrfs_submit_compressed_read() call. Which is
incorrect now.
Although both extent maps at 0 and 32K are pointing to the same compressed
data, their offsets are different thus can not be merged into the same
read.

So this means the compressed data read merge check is doing something
wrong.

 od-3457   btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=36864 got em start=32768 len=32768
 od-3457   btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=40960 got em start=32768 len=32768
 od-3457   btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=45056 got em start=32768 len=32768
 od-3457   btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=49152 got em start=32768 len=32768
 od-3457   btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=53248 got em start=32768 len=32768
 od-3457   btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=57344 got em start=32768 len=32768
 od-3457   btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=61440 skip uptodate
 od-3457   btrfs_submit_compressed_read: cb orig_bio: file off=0 len=61440

The function btrfs_submit_compressed_read() is only called at the end of
folio read. The compressed bio will only have an extent map of range [0,
32K), but the original bio passed in is for the whole 64K folio.

This will cause the decompression part to only fill the first 32K,
leaving the rest untouched (aka, filled with zero).

This incorrect compressed read merge leads to the above data corruption.

There were similar problems that happened in the past, commit 808f80b467
("Btrfs: update fix for read corruption of compressed and shared
extents") is doing pretty much the same fix for readahead.

But that's back to 2015, where btrfs still only supports bs (block size)
== ps (page size) cases.
This means btrfs_do_readpage() only needs to handle a folio which
contains exactly one block.

Only btrfs_readahead() can lead to a read covering multiple blocks.
Thus only btrfs_readahead() passes a non-NULL @prev_em_start pointer.

With v5.15 kernel btrfs introduced bs < ps support. This breaks the above
assumption that a folio can only contain one block.

Now btrfs_read_folio() can also read multiple blocks in one go.
But btrfs_read_folio() doesn't pass a @prev_em_start pointer, thus the
existing bio force submission check will never be triggered.

In theory, this can also happen for btrfs with large folios, but since
large folio is still experimental, we don't need to bother it, thus only
bs < ps support is affected for now.

[FIX]
Instead of passing @prev_em_start to do the proper compressed extent
check, introduce one new member, btrfs_bio_ctrl::last_em_start, so that
the existing bio force submission logic will always be triggered.

CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-02 20:45:25 +02:00
Calvin Owens
6db1df415d btrfs: accept and ignore compression level for lzo
The compression level is meaningless for lzo, but before commit
3f093ccb95 ("btrfs: harden parsing of compression mount options"),
it was silently ignored if passed.

After that commit, passing a level with lzo fails to mount:

    BTRFS error: unrecognized compression value lzo:1

It seems reasonable for users to expect that lzo would permit a numeric
level option, as all the other algos do, even though the kernel's
implementation of LZO currently only supports a single level. Because it
has always worked to pass a level, it seems likely to me that users in
the real world are relying on doing so.

This patch restores the old behavior, giving "lzo:N" the same semantics
as all of the other compression algos.

To be clear, silly variants like "lzo:one", "lzo:the_first_option", or
"lzo:armageddon" also used to work. This isn't meant to suggest that
any possible mis-interpretation of mount options that once worked must
continue to work forever. This is an exceptional case where it makes
sense to preserve compatibility, both because the mis-interpretation is
reasonable, and because nothing tangible is sacrificed.

Finally update btrfs_show_options() to ignore the level of LZO, as it
is only the default level without any extra meaning.

Fixes: 3f093ccb95 ("btrfs: harden parsing of compression mount options")
Reviewed-by: Daniel Vacek <neelx@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Calvin Owens <calvin@wbinvd.org>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-02 20:45:19 +02:00
Boris Burkov
de134cb54c btrfs: fix squota compressed stats leak
The following workload on a squota enabled fs:

  btrfs subvol create mnt/subvol

  # ensure subvol extents get accounted
  sync
  btrfs qgroup create 1/1 mnt
  btrfs qgroup assign mnt/subvol 1/1 mnt
  btrfs qgroup delete mnt/subvol

  # make the cleaner thread run
  btrfs filesystem sync mnt
  sleep 1
  btrfs filesystem sync mnt
  btrfs qgroup destroy 1/1 mnt

will fail with EBUSY. The reason is that 1/1 does the quick accounting
when we assign subvol to it, gaining its exclusive usage as excl and
excl_cmpr. But then when we delete subvol, the decrement happens via
record_squota_delta() which does not update excl_cmpr, as squotas does
not make any distinction between compressed and normal extents. Thus,
we increment excl_cmpr but never decrement it, and are unable to delete
1/1. The two possible fixes are to make squota always mirror excl and
excl_cmpr or to make the fast accounting separately track the plain and
cmpr numbers. The latter felt cleaner to me so that is what I opted for.

Fixes: 1e0e9d5771 ("btrfs: add helper for recording simple quota deltas")
CC: stable@vger.kernel.org # 6.12+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-02 20:45:15 +02:00
Xichao Zhao
6746c36c94 fsnotify: fix "rewriten"->"rewritten"
Trivial fix to spelling mistake in comment text.

Signed-off-by: Xichao Zhao <zhao.xichao@vivo.com>
Link: https://patch.msgid.link/20250808084213.230592-1-zhao.xichao@vivo.com
Signed-off-by: Jan Kara <jack@suse.cz>
2025-09-02 14:37:33 +02:00
Aleksa Sarai
fe49652e36 procfs: add "pidns" mount option
Since the introduction of pid namespaces, their interaction with procfs
has been entirely implicit in ways that require a lot of dancing around
by programs that need to construct sandboxes with different PID
namespaces.

Being able to explicitly specify the pid namespace to use when
constructing a procfs super block will allow programs to no longer need
to fork off a process which does then does unshare(2) / setns(2) and
forks again in order to construct a procfs in a pidns.

So, provide a "pidns" mount option which allows such users to just
explicitly state which pid namespace they want that procfs instance to
use. This interface can be used with fsconfig(2) either with a file
descriptor or a path:

  fsconfig(procfd, FSCONFIG_SET_FD, "pidns", NULL, nsfd);
  fsconfig(procfd, FSCONFIG_SET_STRING, "pidns", "/proc/self/ns/pid", 0);

or with classic mount(2) / mount(8):

  // mount -t proc -o pidns=/proc/self/ns/pid proc /tmp/proc
  mount("proc", "/tmp/proc", "proc", MS_..., "pidns=/proc/self/ns/pid");

As this new API is effectively shorthand for setns(2) followed by
mount(2), the permission model for this mirrors pidns_install() to avoid
opening up new attack surfaces by loosening the existing permission
model.

In order to avoid having to RCU-protect all users of proc_pid_ns() (to
avoid UAFs), attempting to reconfigure an existing procfs instance's pid
namespace will error out with -EBUSY. Creating new procfs instances is
quite cheap, so this should not be an impediment to most users, and lets
us avoid a lot of churn in fs/proc/* for a feature that it seems
unlikely userspace would use.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Link: https://lore.kernel.org/20250805-procfs-pidns-api-v4-2-705f984940e7@cyphar.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-02 11:37:24 +02:00
Joanne Koong
02d47e213d fuse: remove fuse_readpages_end() null mapping check
Remove extra logic in fuse_readpages_end() that checks against null
folio mappings. This was added in commit ce534fb052 ("fuse: allow
splice to move pages"):

"Since the remove_from_page_cache() + add_to_page_cache_locked()
are non-atomic it is possible that the page cache is repopulated in
between the two and add_to_page_cache_locked() will fail.  This
could be fixed by creating a new atomic replace_page_cache_page()
function.

fuse_readpages_end() needed to be reworked so it works even if
page->mapping is NULL for some or all pages which can happen if the
add_to_page_cache_locked() failed."

Commit ef6a3c6311 ("mm: add replace_page_cache_page() function") added
atomic page cache replacement, which means the check against null
mappings can be removed.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-09-02 11:14:15 +02:00
Miklos Szeredi
b3c7ab1d25 fuse: fix references to fuse.rst -> fuse/fuse.rst
Commit 6be0ddb202 ("Documentation: fuse: Consolidate FUSE docs into its
own subdirectory") moved fuse docs to a subdirectory but didn't update
references inside the kernel tree.

Fixes: 6be0ddb202 ("Documentation: fuse: Consolidate FUSE docs into its own subdirectory")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202508261621.EaNMWVjm-lkp@intel.com/
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-09-02 11:14:15 +02:00
Miklos Szeredi
dfb84c3307 fuse: allow synchronous FUSE_INIT
FUSE_INIT has always been asynchronous with mount.  That means that the
server processed this request after the mount syscall returned.

This means that FUSE_INIT can't supply the root inode's ID, hence it
currently has a hardcoded value.  There are other limitations such as not
being able to perform getxattr during mount, which is needed by selinux.

To remove these limitations allow server to process FUSE_INIT while
initializing the in-core super block for the fuse filesystem.  This can
only be done if the server is prepared to handle this, so add
FUSE_DEV_IOC_SYNC_INIT ioctl, which

 a) lets the server know whether this feature is supported, returning
 ENOTTY othewrwise.

 b) lets the kernel know to perform a synchronous initialization

The implementation is slightly tricky, since fuse_dev/fuse_conn are set up
only during super block creation.  This is solved by setting the private
data of the fuse device file to a special value ((struct fuse_dev *) 1) and
waiting for this to be turned into a proper fuse_dev before commecing with
operations on the device file.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-09-02 11:14:15 +02:00
Askar Safin
042a60680d openat2: don't trigger automounts with RESOLVE_NO_XDEV
openat2 had a bug: if we pass RESOLVE_NO_XDEV, then openat2
doesn't traverse through automounts, but may still trigger them.
(See the link for full bug report with reproducer.)

This commit fixes this bug.

Link: https://lore.kernel.org/linux-fsdevel/20250817075252.4137628-1-safinaskar@zohomail.com/
Fixes: fddb5d430a ("open: introduce openat2(2) syscall")
Reviewed-by: Aleksa Sarai <cyphar@cyphar.com>
Cc: stable@vger.kernel.org
Signed-off-by: Askar Safin <safinaskar@zohomail.com>
Link: https://lore.kernel.org/20250825181233.2464822-5-safinaskar@zohomail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-02 10:40:43 +02:00
Askar Safin
8ded1fde08 namei: move cross-device check to __traverse_mounts
This is preparation to RESOLVE_NO_XDEV fix in following commits.
Also this commit makes LOOKUP_NO_XDEV logic more clear: now we
immediately fail with EXDEV on first mount crossing
instead of waiting for very end.

No functional change intended

Signed-off-by: Askar Safin <safinaskar@zohomail.com>
Link: https://lore.kernel.org/20250825181233.2464822-4-safinaskar@zohomail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-02 10:40:43 +02:00
Askar Safin
8b966d00b3 namei: remove LOOKUP_NO_XDEV check from handle_mounts
This is preparation to RESOLVE_NO_XDEV fix in following commits.
No functional change intended.

The only place that ever looks at
ND_JUMPED in nd->state is complete_walk()
and we are not going to reach
it if handle_mounts() returns an error

Signed-off-by: Askar Safin <safinaskar@zohomail.com>
Link: https://lore.kernel.org/20250825181233.2464822-3-safinaskar@zohomail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-02 10:40:42 +02:00
Askar Safin
11c2b7ec2e namei: move cross-device check to traverse_mounts
This is preparation to RESOLVE_NO_XDEV fix in following commits.
No functional change intended

Signed-off-by: Askar Safin <safinaskar@zohomail.com>
Link: https://lore.kernel.org/20250825181233.2464822-2-safinaskar@zohomail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-02 10:40:42 +02:00
Simon Schuster
edd3cb05c0 copy_process: pass clone_flags as u64 across calltree
With the introduction of clone3 in commit 7f192e3cd3 ("fork: add
clone3") the effective bit width of clone_flags on all architectures was
increased from 32-bit to 64-bit, with a new type of u64 for the flags.
However, for most consumers of clone_flags the interface was not
changed from the previous type of unsigned long.

While this works fine as long as none of the new 64-bit flag bits
(CLONE_CLEAR_SIGHAND and CLONE_INTO_CGROUP) are evaluated, this is still
undesirable in terms of the principle of least surprise.

Thus, this commit fixes all relevant interfaces of callees to
sys_clone3/copy_process (excluding the architecture-specific
copy_thread) to consistently pass clone_flags as u64, so that
no truncation to 32-bit integers occurs on 32-bit architectures.

Signed-off-by: Simon Schuster <schuster.simon@siemens-energy.com>
Link: https://lore.kernel.org/20250901-nios2-implement-clone3-v2-2-53fcf5577d57@siemens-energy.com
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-01 15:31:34 +02:00
Tetsuo Handa
7f9d34b0a7 cramfs: Verify inode mode when loading from disk
The inode mode loaded from corrupted disk can be invalid. Do like what
commit 0a9e740513 ("isofs: Verify inode mode when loading from disk")
does.

Reported-by: syzbot <syzbot+895c23f6917da440ed0d@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=895c23f6917da440ed0d
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Link: https://lore.kernel.org/429b3ef1-13de-4310-9a8e-c2dc9a36234a@I-love.SAKURA.ne.jp
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-01 13:13:10 +02:00
Greg Kroah-Hartman
e5bca063c1 fs: remove vfs_ioctl export
vfs_ioctl() is no longer called by anything outside of fs/ioctl.c, so
remove the global symbol and export as it is not needed.

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/2025083038-carving-amuck-a4ae@gregkh
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-01 13:08:01 +02:00
Christian Brauner
e23654f5b1 Merge tag 'fuse-fixes-6.17-rc5' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse into vfs.fixes
fuse fixes for 6.17-rc5

* tag 'fuse-fixes-6.17-rc5' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (6 commits)
  fuse: Block access to folio overlimit
  fuse: fix fuseblk i_blkbits for iomap partial writes
  fuse: reflect cached blocksize if blocksize was changed
  fuse: prevent overflow in copy_file_range return value
  fuse: check if copy_file_range() returns larger than requested size
  fuse: do not allow mapping a non-regular backing file

Link: https://lore.kernel.org/CAJfpeguEVMMyw_zCb+hbOuSxdE2Z3Raw=SJsq=Y56Ae6dn2W3g@mail.gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-01 12:48:28 +02:00
Christian Brauner
90ccf10de5 inode: fix whitespace issues
Fix two minor whitespace issues.

Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-01 12:41:09 +02:00
Josef Bacik
37b27bd5d6 fs: add an icount_read helper
Instead of doing direct access to ->i_count, add a helper to handle
this. This will make it easier to convert i_count to a refcount later.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/9bc62a84c6b9d6337781203f60837bd98fbc4a96.1756222464.git.josef@toxicpanda.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-01 12:41:09 +02:00
Josef Bacik
9e70e985bd fs: rework iput logic
Currently, if we are the last iput, and we have the I_DIRTY_TIME bit
set, we will grab a reference on the inode again and then mark it dirty
and then redo the put.  This is to make sure we delay the time update
for as long as possible.

We can rework this logic to simply dec i_count if it is not 1, and if it
is do the time update while still holding the i_count reference.

Then we can replace the atomic_dec_and_lock with locking the ->i_lock
and doing atomic_dec_and_test, since we did the atomic_add_unless above.

Co-developed-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/be208b89bdb650202e712ce2bcfc407ac7044c7a.1756222464.git.josef@toxicpanda.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-01 12:38:04 +02:00
Viacheslav Dubeyko
18b07c44f2 hfs: clear offset and space out of valid records in b-tree node
Currently, hfs_brec_remove() executes moving records
towards the location of deleted record and it updates
offsets of moved records. However, the hfs_brec_remove()
logic ignores the "mess" of b-tree node's free space and
it doesn't touch the offsets out of records number.
Potentially, it could confuse fsck or driver logic or
to be a reason of potential corruption cases.

This patch reworks the logic of hfs_brec_remove()
by means of clearing freed space of b-tree node
after the records moving. And it clear the last
offset that keeping old location of free space
because now the offset before this one is keeping
the actual offset to the free space after the record
deletion.

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
cc: Yangtao Li <frank.li@vivo.com>
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250815194918.38165-1-slava@dubeyko.com
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
2025-08-31 18:16:00 -07:00
Viacheslav Dubeyko
a06ec283e1 hfs: add logic of correcting a next unused CNID
The generic/736 xfstest fails for HFS case:

BEGIN TEST default (1 test): hfs Mon May 5 03:18:32 UTC 2025
DEVICE: /dev/vdb
HFS_MKFS_OPTIONS:
MOUNT_OPTIONS: MOUNT_OPTIONS
FSTYP -- hfs
PLATFORM -- Linux/x86_64 kvm-xfstests 6.15.0-rc4-xfstests-g00b827f0cffa #1 SMP PREEMPT_DYNAMIC Fri May 25
MKFS_OPTIONS -- /dev/vdc
MOUNT_OPTIONS -- /dev/vdc /vdc

generic/736 [03:18:33][ 3.510255] run fstests generic/736 at 2025-05-05 03:18:33
_check_generic_filesystem: filesystem on /dev/vdb is inconsistent
(see /results/hfs/results-default/generic/736.full for details)
Ran: generic/736
Failures: generic/736
Failed 1 of 1 tests

The HFS volume becomes corrupted after the test run:

sudo fsck.hfs -d /dev/loop50
** /dev/loop50
Using cacheBlockSize=32K cacheTotalBlock=1024 cacheSize=32768K.
Executing fsck_hfs (version 540.1-Linux).
** Checking HFS volume.
The volume name is untitled
** Checking extents overflow file.
** Checking catalog file.
** Checking catalog hierarchy.
** Checking volume bitmap.
** Checking volume information.
invalid MDB drNxtCNID
Master Directory Block needs minor repair
(1, 0)
Verify Status: VIStat = 0x8000, ABTStat = 0x0000 EBTStat = 0x0000
CBTStat = 0x0000 CatStat = 0x00000000
** Repairing volume.
** Rechecking volume.
** Checking HFS volume.
The volume name is untitled
** Checking extents overflow file.
** Checking catalog file.
** Checking catalog hierarchy.
** Checking volume bitmap.
** Checking volume information.
** The volume untitled was repaired successfully.

The main reason of the issue is the absence of logic that
corrects mdb->drNxtCNID/HFS_SB(sb)->next_id (next unused
CNID) after deleting a record in Catalog File. This patch
introduces a hfs_correct_next_unused_CNID() method that
implements the necessary logic. In the case of Catalog File's
record delete operation, the function logic checks that
(deleted_CNID + 1) == next_unused_CNID and it finds/sets the new
value of next_unused_CNID.

sudo ./check generic/736
FSTYP -- hfs
PLATFORM -- Linux/x86_64 hfsplus-testing-0001 6.15.0+ #6 SMP PREEMPT_DYNAMIC Tue Jun 10 15:02:48 PDT 2025
MKFS_OPTIONS -- /dev/loop51
MOUNT_OPTIONS -- /dev/loop51 /mnt/scratch

generic/736 33s
Ran: generic/736
Passed all 1 tests

sudo fsck.hfs -d /dev/loop50
** /dev/loop50
Using cacheBlockSize=32K cacheTotalBlock=1024 cacheSize=32768K.
Executing fsck_hfs (version 540.1-Linux).
** Checking HFS volume.
The volume name is untitled
** Checking extents overflow file.
** Checking catalog file.
** Checking catalog hierarchy.
** Checking volume bitmap.
** Checking volume information.
** The volume untitled appears to be OK

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
Link: https://lore.kernel.org/r/20250610231609.551930-1-slava@dubeyko.com
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
2025-08-31 18:15:21 -07:00
Viacheslav Dubeyko
9b3d15a758 hfsplus: fix KMSAN uninit-value issue in hfsplus_delete_cat()
The syzbot reported issue in hfsplus_delete_cat():

[   70.682285][ T9333] =====================================================
[   70.682943][ T9333] BUG: KMSAN: uninit-value in hfsplus_subfolders_dec+0x1d7/0x220
[   70.683640][ T9333]  hfsplus_subfolders_dec+0x1d7/0x220
[   70.684141][ T9333]  hfsplus_delete_cat+0x105d/0x12b0
[   70.684621][ T9333]  hfsplus_rmdir+0x13d/0x310
[   70.685048][ T9333]  vfs_rmdir+0x5ba/0x810
[   70.685447][ T9333]  do_rmdir+0x964/0xea0
[   70.685833][ T9333]  __x64_sys_rmdir+0x71/0xb0
[   70.686260][ T9333]  x64_sys_call+0xcd8/0x3cf0
[   70.686695][ T9333]  do_syscall_64+0xd9/0x1d0
[   70.687119][ T9333]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[   70.687646][ T9333]
[   70.687856][ T9333] Uninit was stored to memory at:
[   70.688311][ T9333]  hfsplus_subfolders_inc+0x1c2/0x1d0
[   70.688779][ T9333]  hfsplus_create_cat+0x148e/0x1800
[   70.689231][ T9333]  hfsplus_mknod+0x27f/0x600
[   70.689730][ T9333]  hfsplus_mkdir+0x5a/0x70
[   70.690146][ T9333]  vfs_mkdir+0x483/0x7a0
[   70.690545][ T9333]  do_mkdirat+0x3f2/0xd30
[   70.690944][ T9333]  __x64_sys_mkdir+0x9a/0xf0
[   70.691380][ T9333]  x64_sys_call+0x2f89/0x3cf0
[   70.691816][ T9333]  do_syscall_64+0xd9/0x1d0
[   70.692229][ T9333]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[   70.692773][ T9333]
[   70.692990][ T9333] Uninit was stored to memory at:
[   70.693469][ T9333]  hfsplus_subfolders_inc+0x1c2/0x1d0
[   70.693960][ T9333]  hfsplus_create_cat+0x148e/0x1800
[   70.694438][ T9333]  hfsplus_fill_super+0x21c1/0x2700
[   70.694911][ T9333]  mount_bdev+0x37b/0x530
[   70.695320][ T9333]  hfsplus_mount+0x4d/0x60
[   70.695729][ T9333]  legacy_get_tree+0x113/0x2c0
[   70.696167][ T9333]  vfs_get_tree+0xb3/0x5c0
[   70.696588][ T9333]  do_new_mount+0x73e/0x1630
[   70.697013][ T9333]  path_mount+0x6e3/0x1eb0
[   70.697425][ T9333]  __se_sys_mount+0x733/0x830
[   70.697857][ T9333]  __x64_sys_mount+0xe4/0x150
[   70.698269][ T9333]  x64_sys_call+0x2691/0x3cf0
[   70.698704][ T9333]  do_syscall_64+0xd9/0x1d0
[   70.699117][ T9333]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[   70.699730][ T9333]
[   70.699946][ T9333] Uninit was created at:
[   70.700378][ T9333]  __alloc_pages_noprof+0x714/0xe60
[   70.700843][ T9333]  alloc_pages_mpol_noprof+0x2a2/0x9b0
[   70.701331][ T9333]  alloc_pages_noprof+0xf8/0x1f0
[   70.701774][ T9333]  allocate_slab+0x30e/0x1390
[   70.702194][ T9333]  ___slab_alloc+0x1049/0x33a0
[   70.702635][ T9333]  kmem_cache_alloc_lru_noprof+0x5ce/0xb20
[   70.703153][ T9333]  hfsplus_alloc_inode+0x5a/0xd0
[   70.703598][ T9333]  alloc_inode+0x82/0x490
[   70.703984][ T9333]  iget_locked+0x22e/0x1320
[   70.704428][ T9333]  hfsplus_iget+0x5c/0xba0
[   70.704827][ T9333]  hfsplus_btree_open+0x135/0x1dd0
[   70.705291][ T9333]  hfsplus_fill_super+0x1132/0x2700
[   70.705776][ T9333]  mount_bdev+0x37b/0x530
[   70.706171][ T9333]  hfsplus_mount+0x4d/0x60
[   70.706579][ T9333]  legacy_get_tree+0x113/0x2c0
[   70.707019][ T9333]  vfs_get_tree+0xb3/0x5c0
[   70.707444][ T9333]  do_new_mount+0x73e/0x1630
[   70.707865][ T9333]  path_mount+0x6e3/0x1eb0
[   70.708270][ T9333]  __se_sys_mount+0x733/0x830
[   70.708711][ T9333]  __x64_sys_mount+0xe4/0x150
[   70.709158][ T9333]  x64_sys_call+0x2691/0x3cf0
[   70.709630][ T9333]  do_syscall_64+0xd9/0x1d0
[   70.710053][ T9333]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[   70.710611][ T9333]
[   70.710842][ T9333] CPU: 3 UID: 0 PID: 9333 Comm: repro Not tainted 6.12.0-rc6-dirty #17
[   70.711568][ T9333] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[   70.712490][ T9333] =====================================================
[   70.713085][ T9333] Disabling lock debugging due to kernel taint
[   70.713618][ T9333] Kernel panic - not syncing: kmsan.panic set ...
[   70.714159][ T9333] CPU: 3 UID: 0 PID: 9333 Comm: repro Tainted: G    B              6.12.0-rc6-dirty #17
[   70.715007][ T9333] Tainted: [B]=BAD_PAGE
[   70.715365][ T9333] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[   70.716311][ T9333] Call Trace:
[   70.716621][ T9333]  <TASK>
[   70.716899][ T9333]  dump_stack_lvl+0x1fd/0x2b0
[   70.717350][ T9333]  dump_stack+0x1e/0x30
[   70.717743][ T9333]  panic+0x502/0xca0
[   70.718116][ T9333]  ? kmsan_get_metadata+0x13e/0x1c0
[   70.718611][ T9333]  kmsan_report+0x296/0x2a0
[   70.719038][ T9333]  ? __msan_metadata_ptr_for_load_4+0x24/0x40
[   70.719859][ T9333]  ? __msan_warning+0x96/0x120
[   70.720345][ T9333]  ? hfsplus_subfolders_dec+0x1d7/0x220
[   70.720881][ T9333]  ? hfsplus_delete_cat+0x105d/0x12b0
[   70.721412][ T9333]  ? hfsplus_rmdir+0x13d/0x310
[   70.721880][ T9333]  ? vfs_rmdir+0x5ba/0x810
[   70.722458][ T9333]  ? do_rmdir+0x964/0xea0
[   70.722883][ T9333]  ? __x64_sys_rmdir+0x71/0xb0
[   70.723397][ T9333]  ? x64_sys_call+0xcd8/0x3cf0
[   70.723915][ T9333]  ? do_syscall_64+0xd9/0x1d0
[   70.724454][ T9333]  ? entry_SYSCALL_64_after_hwframe+0x77/0x7f
[   70.725110][ T9333]  ? vprintk_emit+0xd1f/0xe60
[   70.725616][ T9333]  ? vprintk_default+0x3f/0x50
[   70.726175][ T9333]  ? vprintk+0xce/0xd0
[   70.726628][ T9333]  ? _printk+0x17e/0x1b0
[   70.727129][ T9333]  ? __msan_metadata_ptr_for_load_4+0x24/0x40
[   70.727739][ T9333]  ? kmsan_get_metadata+0x13e/0x1c0
[   70.728324][ T9333]  __msan_warning+0x96/0x120
[   70.728854][ T9333]  hfsplus_subfolders_dec+0x1d7/0x220
[   70.729479][ T9333]  hfsplus_delete_cat+0x105d/0x12b0
[   70.729984][ T9333]  ? kmsan_get_shadow_origin_ptr+0x4a/0xb0
[   70.730646][ T9333]  ? __msan_metadata_ptr_for_load_4+0x24/0x40
[   70.731296][ T9333]  ? kmsan_get_metadata+0x13e/0x1c0
[   70.731863][ T9333]  hfsplus_rmdir+0x13d/0x310
[   70.732390][ T9333]  ? __pfx_hfsplus_rmdir+0x10/0x10
[   70.732919][ T9333]  vfs_rmdir+0x5ba/0x810
[   70.733416][ T9333]  ? kmsan_get_shadow_origin_ptr+0x4a/0xb0
[   70.734044][ T9333]  do_rmdir+0x964/0xea0
[   70.734537][ T9333]  __x64_sys_rmdir+0x71/0xb0
[   70.735032][ T9333]  x64_sys_call+0xcd8/0x3cf0
[   70.735579][ T9333]  do_syscall_64+0xd9/0x1d0
[   70.736092][ T9333]  ? irqentry_exit+0x16/0x60
[   70.736637][ T9333]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[   70.737269][ T9333] RIP: 0033:0x7fa9424eafc9
[   70.737775][ T9333] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 48
[   70.739844][ T9333] RSP: 002b:00007fff099cd8d8 EFLAGS: 00000202 ORIG_RAX: 0000000000000054
[   70.740760][ T9333] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fa9424eafc9
[   70.741642][ T9333] RDX: 006c6f72746e6f63 RSI: 000000000000000a RDI: 0000000020000100
[   70.742543][ T9333] RBP: 00007fff099cd8e0 R08: 00007fff099cd910 R09: 00007fff099cd910
[   70.743376][ T9333] R10: 0000000000000000 R11: 0000000000000202 R12: 0000565430642260
[   70.744247][ T9333] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[   70.745082][ T9333]  </TASK>

The main reason of the issue that struct hfsplus_inode_info
has not been properly initialized for the case of root folder.
In the case of root folder, hfsplus_fill_super() calls
the hfsplus_iget() that implements only partial initialization of
struct hfsplus_inode_info and subfolders field is not
initialized by hfsplus_iget() logic.

This patch implements complete initialization of
struct hfsplus_inode_info in the hfsplus_iget() logic with
the goal to prevent likewise issues for the case of
root folder.

Reported-by: syzbot <syzbot+fdedff847a0e5e84c39f@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=fdedff847a0e5e84c39f
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
cc: Yangtao Li <frank.li@vivo.com>
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250825225103.326401-1-slava@dubeyko.com
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
2025-08-31 18:14:32 -07:00
Viacheslav Dubeyko
2048ec5b98 hfs: fix KMSAN uninit-value issue in hfs_find_set_zero_bits()
The syzbot reported issue in hfs_find_set_zero_bits():

=====================================================
BUG: KMSAN: uninit-value in hfs_find_set_zero_bits+0x74d/0xb60 fs/hfs/bitmap.c:45
 hfs_find_set_zero_bits+0x74d/0xb60 fs/hfs/bitmap.c:45
 hfs_vbm_search_free+0x13c/0x5b0 fs/hfs/bitmap.c:151
 hfs_extend_file+0x6a5/0x1b00 fs/hfs/extent.c:408
 hfs_get_block+0x435/0x1150 fs/hfs/extent.c:353
 __block_write_begin_int+0xa76/0x3030 fs/buffer.c:2151
 block_write_begin fs/buffer.c:2262 [inline]
 cont_write_begin+0x10e1/0x1bc0 fs/buffer.c:2601
 hfs_write_begin+0x85/0x130 fs/hfs/inode.c:52
 cont_expand_zero fs/buffer.c:2528 [inline]
 cont_write_begin+0x35a/0x1bc0 fs/buffer.c:2591
 hfs_write_begin+0x85/0x130 fs/hfs/inode.c:52
 hfs_file_truncate+0x1d6/0xe60 fs/hfs/extent.c:494
 hfs_inode_setattr+0x964/0xaa0 fs/hfs/inode.c:654
 notify_change+0x1993/0x1aa0 fs/attr.c:552
 do_truncate+0x28f/0x310 fs/open.c:68
 do_ftruncate+0x698/0x730 fs/open.c:195
 do_sys_ftruncate fs/open.c:210 [inline]
 __do_sys_ftruncate fs/open.c:215 [inline]
 __se_sys_ftruncate fs/open.c:213 [inline]
 __x64_sys_ftruncate+0x11b/0x250 fs/open.c:213
 x64_sys_call+0xfe3/0x3db0 arch/x86/include/generated/asm/syscalls_64.h:78
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xd9/0x210 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Uninit was created at:
 slab_post_alloc_hook mm/slub.c:4154 [inline]
 slab_alloc_node mm/slub.c:4197 [inline]
 __kmalloc_cache_noprof+0x7f7/0xed0 mm/slub.c:4354
 kmalloc_noprof include/linux/slab.h:905 [inline]
 hfs_mdb_get+0x1cc8/0x2a90 fs/hfs/mdb.c:175
 hfs_fill_super+0x3d0/0xb80 fs/hfs/super.c:337
 get_tree_bdev_flags+0x6e3/0x920 fs/super.c:1681
 get_tree_bdev+0x38/0x50 fs/super.c:1704
 hfs_get_tree+0x35/0x40 fs/hfs/super.c:388
 vfs_get_tree+0xb0/0x5c0 fs/super.c:1804
 do_new_mount+0x738/0x1610 fs/namespace.c:3902
 path_mount+0x6db/0x1e90 fs/namespace.c:4226
 do_mount fs/namespace.c:4239 [inline]
 __do_sys_mount fs/namespace.c:4450 [inline]
 __se_sys_mount+0x6eb/0x7d0 fs/namespace.c:4427
 __x64_sys_mount+0xe4/0x150 fs/namespace.c:4427
 x64_sys_call+0xfa7/0x3db0 arch/x86/include/generated/asm/syscalls_64.h:166
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xd9/0x210 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

CPU: 1 UID: 0 PID: 12609 Comm: syz.1.2692 Not tainted 6.16.0-syzkaller #0 PREEMPT(none)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2025
=====================================================

The HFS_SB(sb)->bitmap buffer is allocated in hfs_mdb_get():

HFS_SB(sb)->bitmap = kmalloc(8192, GFP_KERNEL);

Finally, it can trigger the reported issue because kmalloc()
doesn't clear the allocated memory. If allocated memory contains
only zeros, then everything will work pretty fine.
But if the allocated memory contains the "garbage", then
it can affect the bitmap operations and it triggers
the reported issue.

This patch simply exchanges the kmalloc() on kzalloc()
with the goal to guarantee the correctness of bitmap operations.
Because, newly created allocation bitmap should have all
available blocks free. Potentially, initialization bitmap's read
operation could not fill the whole allocated memory and
"garbage" in the not initialized memory will be the reason of
volume coruptions and file system driver bugs.

Reported-by: syzbot <syzbot+773fa9d79b29bd8b6831@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=773fa9d79b29bd8b6831
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
cc: Yangtao Li <frank.li@vivo.com>
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250820230636.179085-1-slava@dubeyko.com
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
2025-08-31 18:13:10 -07:00
Viacheslav Dubeyko
c62663a986 hfs: make proper initalization of struct hfs_find_data
Potenatially, __hfs_ext_read_extent() could operate by
not initialized values of fd->key after hfs_brec_find() call:

static inline int __hfs_ext_read_extent(struct hfs_find_data *fd, struct hfs_extent *extent,
                                        u32 cnid, u32 block, u8 type)
{
        int res;

        hfs_ext_build_key(fd->search_key, cnid, block, type);
        fd->key->ext.FNum = 0;
        res = hfs_brec_find(fd);
        if (res && res != -ENOENT)
                return res;
        if (fd->key->ext.FNum != fd->search_key->ext.FNum ||
            fd->key->ext.FkType != fd->search_key->ext.FkType)
                return -ENOENT;
        if (fd->entrylength != sizeof(hfs_extent_rec))
                return -EIO;
        hfs_bnode_read(fd->bnode, extent, fd->entryoffset, sizeof(hfs_extent_rec));
        return 0;
}

This patch changes kmalloc() on kzalloc() in hfs_find_init()
and intializes fd->record, fd->keyoffset, fd->keylength,
fd->entryoffset, fd->entrylength for the case if hfs_brec_find()
has been found nothing in the b-tree node.

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
cc: Yangtao Li <frank.li@vivo.com>
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250818225252.126427-1-slava@dubeyko.com
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
2025-08-31 18:11:55 -07:00
Viacheslav Dubeyko
4840ceadef hfsplus: fix KMSAN uninit-value issue in __hfsplus_ext_cache_extent()
The syzbot reported issue in __hfsplus_ext_cache_extent():

[   70.194323][ T9350] BUG: KMSAN: uninit-value in __hfsplus_ext_cache_extent+0x7d0/0x990
[   70.195022][ T9350]  __hfsplus_ext_cache_extent+0x7d0/0x990
[   70.195530][ T9350]  hfsplus_file_extend+0x74f/0x1cf0
[   70.195998][ T9350]  hfsplus_get_block+0xe16/0x17b0
[   70.196458][ T9350]  __block_write_begin_int+0x962/0x2ce0
[   70.196959][ T9350]  cont_write_begin+0x1000/0x1950
[   70.197416][ T9350]  hfsplus_write_begin+0x85/0x130
[   70.197873][ T9350]  generic_perform_write+0x3e8/0x1060
[   70.198374][ T9350]  __generic_file_write_iter+0x215/0x460
[   70.198892][ T9350]  generic_file_write_iter+0x109/0x5e0
[   70.199393][ T9350]  vfs_write+0xb0f/0x14e0
[   70.199771][ T9350]  ksys_write+0x23e/0x490
[   70.200149][ T9350]  __x64_sys_write+0x97/0xf0
[   70.200570][ T9350]  x64_sys_call+0x3015/0x3cf0
[   70.201065][ T9350]  do_syscall_64+0xd9/0x1d0
[   70.201506][ T9350]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[   70.202054][ T9350]
[   70.202279][ T9350] Uninit was created at:
[   70.202693][ T9350]  __kmalloc_noprof+0x621/0xf80
[   70.203149][ T9350]  hfsplus_find_init+0x8d/0x1d0
[   70.203602][ T9350]  hfsplus_file_extend+0x6ca/0x1cf0
[   70.204087][ T9350]  hfsplus_get_block+0xe16/0x17b0
[   70.204561][ T9350]  __block_write_begin_int+0x962/0x2ce0
[   70.205074][ T9350]  cont_write_begin+0x1000/0x1950
[   70.205547][ T9350]  hfsplus_write_begin+0x85/0x130
[   70.206017][ T9350]  generic_perform_write+0x3e8/0x1060
[   70.206519][ T9350]  __generic_file_write_iter+0x215/0x460
[   70.207042][ T9350]  generic_file_write_iter+0x109/0x5e0
[   70.207552][ T9350]  vfs_write+0xb0f/0x14e0
[   70.207961][ T9350]  ksys_write+0x23e/0x490
[   70.208375][ T9350]  __x64_sys_write+0x97/0xf0
[   70.208810][ T9350]  x64_sys_call+0x3015/0x3cf0
[   70.209255][ T9350]  do_syscall_64+0xd9/0x1d0
[   70.209680][ T9350]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[   70.210230][ T9350]
[   70.210454][ T9350] CPU: 2 UID: 0 PID: 9350 Comm: repro Not tainted 6.12.0-rc5 #5
[   70.211174][ T9350] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[   70.212115][ T9350] =====================================================
[   70.212734][ T9350] Disabling lock debugging due to kernel taint
[   70.213284][ T9350] Kernel panic - not syncing: kmsan.panic set ...
[   70.213858][ T9350] CPU: 2 UID: 0 PID: 9350 Comm: repro Tainted: G    B              6.12.0-rc5 #5
[   70.214679][ T9350] Tainted: [B]=BAD_PAGE
[   70.215057][ T9350] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[   70.215999][ T9350] Call Trace:
[   70.216309][ T9350]  <TASK>
[   70.216585][ T9350]  dump_stack_lvl+0x1fd/0x2b0
[   70.217025][ T9350]  dump_stack+0x1e/0x30
[   70.217421][ T9350]  panic+0x502/0xca0
[   70.217803][ T9350]  ? kmsan_get_metadata+0x13e/0x1c0

[   70.218294][ Message fromT sy9350]  kmsan_report+0x296/slogd@syzkaller 0x2aat Aug 18 22:11:058 ...
 kernel
:[   70.213284][ T9350] Kernel panic - not syncing: kmsan.panic [   70.220179][ T9350]  ? kmsan_get_metadata+0x13e/0x1c0
set ...
[   70.221254][ T9350]  ? __msan_warning+0x96/0x120
[   70.222066][ T9350]  ? __hfsplus_ext_cache_extent+0x7d0/0x990
[   70.223023][ T9350]  ? hfsplus_file_extend+0x74f/0x1cf0
[   70.224120][ T9350]  ? hfsplus_get_block+0xe16/0x17b0
[   70.224946][ T9350]  ? __block_write_begin_int+0x962/0x2ce0
[   70.225756][ T9350]  ? cont_write_begin+0x1000/0x1950
[   70.226337][ T9350]  ? hfsplus_write_begin+0x85/0x130
[   70.226852][ T9350]  ? generic_perform_write+0x3e8/0x1060
[   70.227405][ T9350]  ? __generic_file_write_iter+0x215/0x460
[   70.227979][ T9350]  ? generic_file_write_iter+0x109/0x5e0
[   70.228540][ T9350]  ? vfs_write+0xb0f/0x14e0
[   70.228997][ T9350]  ? ksys_write+0x23e/0x490
[   70.229458][ T9350]  ? __x64_sys_write+0x97/0xf0
[   70.229939][ T9350]  ? x64_sys_call+0x3015/0x3cf0
[   70.230432][ T9350]  ? do_syscall_64+0xd9/0x1d0
[   70.230941][ T9350]  ? entry_SYSCALL_64_after_hwframe+0x77/0x7f
[   70.231926][ T9350]  ? kmsan_get_metadata+0x13e/0x1c0
[   70.232738][ T9350]  ? kmsan_internal_set_shadow_origin+0x77/0x110
[   70.233711][ T9350]  ? kmsan_get_metadata+0x13e/0x1c0
[   70.234516][ T9350]  ? kmsan_get_shadow_origin_ptr+0x4a/0xb0
[   70.235398][ T9350]  ? __msan_metadata_ptr_for_load_4+0x24/0x40
[   70.236323][ T9350]  ? hfsplus_brec_find+0x218/0x9f0
[   70.237090][ T9350]  ? __pfx_hfs_find_rec_by_key+0x10/0x10
[   70.237938][ T9350]  ? __msan_instrument_asm_store+0xbf/0xf0
[   70.238827][ T9350]  ? __msan_metadata_ptr_for_store_4+0x27/0x40
[   70.239772][ T9350]  ? __hfsplus_ext_write_extent+0x536/0x620
[   70.240666][ T9350]  ? kmsan_get_metadata+0x13e/0x1c0
[   70.241175][ T9350]  __msan_warning+0x96/0x120
[   70.241645][ T9350]  __hfsplus_ext_cache_extent+0x7d0/0x990
[   70.242223][ T9350]  hfsplus_file_extend+0x74f/0x1cf0
[   70.242748][ T9350]  hfsplus_get_block+0xe16/0x17b0
[   70.243255][ T9350]  ? kmsan_internal_set_shadow_origin+0x77/0x110
[   70.243878][ T9350]  ? kmsan_get_metadata+0x13e/0x1c0
[   70.244400][ T9350]  ? kmsan_get_shadow_origin_ptr+0x4a/0xb0
[   70.244967][ T9350]  __block_write_begin_int+0x962/0x2ce0
[   70.245531][ T9350]  ? __pfx_hfsplus_get_block+0x10/0x10
[   70.246079][ T9350]  cont_write_begin+0x1000/0x1950
[   70.246598][ T9350]  hfsplus_write_begin+0x85/0x130
[   70.247105][ T9350]  ? __pfx_hfsplus_get_block+0x10/0x10
[   70.247650][ T9350]  ? __pfx_hfsplus_write_begin+0x10/0x10
[   70.248211][ T9350]  generic_perform_write+0x3e8/0x1060
[   70.248752][ T9350]  __generic_file_write_iter+0x215/0x460
[   70.249314][ T9350]  generic_file_write_iter+0x109/0x5e0
[   70.249856][ T9350]  ? kmsan_internal_set_shadow_origin+0x77/0x110
[   70.250487][ T9350]  vfs_write+0xb0f/0x14e0
[   70.250930][ T9350]  ? __pfx_generic_file_write_iter+0x10/0x10
[   70.251530][ T9350]  ksys_write+0x23e/0x490
[   70.251974][ T9350]  __x64_sys_write+0x97/0xf0
[   70.252450][ T9350]  x64_sys_call+0x3015/0x3cf0
[   70.252924][ T9350]  do_syscall_64+0xd9/0x1d0
[   70.253384][ T9350]  ? irqentry_exit+0x16/0x60
[   70.253844][ T9350]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[   70.254430][ T9350] RIP: 0033:0x7f7a92adffc9
[   70.254873][ T9350] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 48
[   70.256674][ T9350] RSP: 002b:00007fff0bca3188 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[   70.257485][ T9350] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7a92adffc9
[   70.258246][ T9350] RDX: 000000000208e24b RSI: 0000000020000100 RDI: 0000000000000004
[   70.258998][ T9350] RBP: 00007fff0bca31a0 R08: 00007fff0bca31a0 R09: 00007fff0bca31a0
[   70.259769][ T9350] R10: 0000000000000000 R11: 0000000000000202 R12: 000055e0d75f8250
[   70.260520][ T9350] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[   70.261286][ T9350]  </TASK>
[   70.262026][ T9350] Kernel Offset: disabled

(gdb) l *__hfsplus_ext_cache_extent+0x7d0
0xffffffff8318aef0 is in __hfsplus_ext_cache_extent (fs/hfsplus/extents.c:168).
163		fd->key->ext.cnid = 0;
164		res = hfs_brec_find(fd, hfs_find_rec_by_key);
165		if (res && res != -ENOENT)
166			return res;
167		if (fd->key->ext.cnid != fd->search_key->ext.cnid ||
168		    fd->key->ext.fork_type != fd->search_key->ext.fork_type)
169			return -ENOENT;
170		if (fd->entrylength != sizeof(hfsplus_extent_rec))
171			return -EIO;
172		hfs_bnode_read(fd->bnode, extent, fd->entryoffset,

The __hfsplus_ext_cache_extent() calls __hfsplus_ext_read_extent():

res = __hfsplus_ext_read_extent(fd, hip->cached_extents, inode->i_ino,
				block, HFSPLUS_IS_RSRC(inode) ?
					HFSPLUS_TYPE_RSRC :
					HFSPLUS_TYPE_DATA);

And if inode->i_ino could be equal to zero or any non-available CNID,
then hfs_brec_find() could not find the record in the tree. As a result,
fd->key could be compared with fd->search_key. But hfsplus_find_init()
uses kmalloc() for fd->key and fd->search_key allocation:

int hfs_find_init(struct hfs_btree *tree, struct hfs_find_data *fd)
{
<skipped>
        ptr = kmalloc(tree->max_key_len * 2 + 4, GFP_KERNEL);
        if (!ptr)
                return -ENOMEM;
        fd->search_key = ptr;
        fd->key = ptr + tree->max_key_len + 2;
<skipped>
}

Finally, fd->key is still not initialized if hfs_brec_find()
has found nothing.

This patch changes kmalloc() on kzalloc() in hfs_find_init()
and intializes fd->record, fd->keyoffset, fd->keylength,
fd->entryoffset, fd->entrylength for the case if hfs_brec_find()
has been found nothing in the b-tree node.

Reported-by: syzbot <syzbot+55ad87f38795d6787521@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=55ad87f38795d6787521
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
cc: Yangtao Li <frank.li@vivo.com>
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250818225232.126402-1-slava@dubeyko.com
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
2025-08-31 18:11:43 -07:00
Yang Chenzhi
738d5a5186 hfs: validate record offset in hfsplus_bmap_alloc
hfsplus_bmap_alloc can trigger a crash if a
record offset or length is larger than node_size

[   15.264282] BUG: KASAN: slab-out-of-bounds in hfsplus_bmap_alloc+0x887/0x8b0
[   15.265192] Read of size 8 at addr ffff8881085ca188 by task test/183
[   15.265949]
[   15.266163] CPU: 0 UID: 0 PID: 183 Comm: test Not tainted 6.17.0-rc2-gc17b750b3ad9 #14 PREEMPT(voluntary)
[   15.266165] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[   15.266167] Call Trace:
[   15.266168]  <TASK>
[   15.266169]  dump_stack_lvl+0x53/0x70
[   15.266173]  print_report+0xd0/0x660
[   15.266181]  kasan_report+0xce/0x100
[   15.266185]  hfsplus_bmap_alloc+0x887/0x8b0
[   15.266208]  hfs_btree_inc_height.isra.0+0xd5/0x7c0
[   15.266217]  hfsplus_brec_insert+0x870/0xb00
[   15.266222]  __hfsplus_ext_write_extent+0x428/0x570
[   15.266225]  __hfsplus_ext_cache_extent+0x5e/0x910
[   15.266227]  hfsplus_ext_read_extent+0x1b2/0x200
[   15.266233]  hfsplus_file_extend+0x5a7/0x1000
[   15.266237]  hfsplus_get_block+0x12b/0x8c0
[   15.266238]  __block_write_begin_int+0x36b/0x12c0
[   15.266251]  block_write_begin+0x77/0x110
[   15.266252]  cont_write_begin+0x428/0x720
[   15.266259]  hfsplus_write_begin+0x51/0x100
[   15.266262]  cont_write_begin+0x272/0x720
[   15.266270]  hfsplus_write_begin+0x51/0x100
[   15.266274]  generic_perform_write+0x321/0x750
[   15.266285]  generic_file_write_iter+0xc3/0x310
[   15.266289]  __kernel_write_iter+0x2fd/0x800
[   15.266296]  dump_user_range+0x2ea/0x910
[   15.266301]  elf_core_dump+0x2a94/0x2ed0
[   15.266320]  vfs_coredump+0x1d85/0x45e0
[   15.266349]  get_signal+0x12e3/0x1990
[   15.266357]  arch_do_signal_or_restart+0x89/0x580
[   15.266362]  irqentry_exit_to_user_mode+0xab/0x110
[   15.266364]  asm_exc_page_fault+0x26/0x30
[   15.266366] RIP: 0033:0x41bd35
[   15.266367] Code: bc d1 f3 0f 7f 27 f3 0f 7f 6f 10 f3 0f 7f 77 20 f3 0f 7f 7f 30 49 83 c0 0f 49 29 d0 48 8d 7c 17 31 e9 9f 0b 00 00 66 0f ef c0 <f3> 0f 6f 0e f3 0f 6f 56 10 66 0f 74 c1 66 0f d7 d0 49 83 f8f
[   15.266369] RSP: 002b:00007ffc9e62d078 EFLAGS: 00010283
[   15.266371] RAX: 00007ffc9e62d100 RBX: 0000000000000000 RCX: 0000000000000000
[   15.266372] RDX: 00000000000000e0 RSI: 0000000000000000 RDI: 00007ffc9e62d100
[   15.266373] RBP: 0000400000000040 R08: 00000000000000e0 R09: 0000000000000000
[   15.266374] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[   15.266375] R13: 0000000000000000 R14: 0000000000000000 R15: 0000400000000000
[   15.266376]  </TASK>

When calling hfsplus_bmap_alloc to allocate a free node, this function
first retrieves the bitmap from header node and map node using node->page
together with the offset and length from hfs_brec_lenoff

```
len = hfs_brec_lenoff(node, 2, &off16);
off = off16;

off += node->page_offset;
pagep = node->page + (off >> PAGE_SHIFT);
data = kmap_local_page(*pagep);
```

However, if the retrieved offset or length is invalid(i.e. exceeds
node_size), the code may end up accessing pages outside the allocated
range for this node.

This patch adds proper validation of both offset and length before use,
preventing out-of-bounds page access. Move is_bnode_offset_valid and
check_and_correct_requested_length to hfsplus_fs.h, as they may be
required by other functions.

Reported-by: syzbot+356aed408415a56543cd@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/67bcb4a6.050a0220.bbfd1.008f.GAE@google.com/
Signed-off-by: Yang Chenzhi <yang.chenzhi@vivo.com>
Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
Link: https://lore.kernel.org/r/20250818141734.8559-2-yang.chenzhi@vivo.com
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
2025-08-31 18:09:44 -07:00
Yangtao Li
9282bc905f hfsplus: return EIO when type of hidden directory mismatch in hfsplus_fill_super()
If Catalog File contains corrupted record for the case of
hidden directory's type, regard it as I/O error instead of
Invalid argument.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
Link: https://lore.kernel.org/r/20250805165905.3390154-1-frank.li@vivo.com
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
2025-08-31 18:07:59 -07:00
Philipp Kerling
b5ee94ac65 ksmbd: allow a filename to contain colons on SMB3.1.1 posix extensions
If the client sends SMB2_CREATE_POSIX_CONTEXT to ksmbd, allow the filename
to contain a colon (':'). This requires disabling the support for Alternate
Data Streams (ADS), which are denoted by a colon-separated suffix to the
filename on Windows. This should not be an issue, since this concept is not
known to POSIX anyway and the client has to explicitly request a POSIX
context to get this behavior.

Link: https://lore.kernel.org/all/f9401718e2be2ab22058b45a6817db912784ef61.camel@rx2.rx-server.de/
Signed-off-by: Philipp Kerling <pkerling@casix.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-31 17:48:38 -05:00
Wang Zhaolong
6976c7a69d smb: client: Fix NULL pointer dereference in cifs_debug_dirs_proc_show()
Reading /proc/fs/cifs/open_dirs may hit a NULL dereference when
tcon->cfids is NULL.

Add NULL check before accessing cfids to prevent the crash.

Reproduction:
- Mount CIFS share
- cat /proc/fs/cifs/open_dirs

Fixes: 844e5c0eb1 ("smb3 client: add way to show directory leases for improved debugging")
Signed-off-by: Wang Zhaolong <wangzhaolong@huaweicloud.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-31 17:46:57 -05:00
Liao Yuanhong
b639c20e74 f2fs: Use allocate_section_policy to control write priority in multi-devices setups
Introduces two new sys nodes: allocate_section_hint and
allocate_section_policy. The allocate_section_hint identifies the boundary
between devices, measured in sections; it defaults to the end of the device
for single storage setups, and the end of the first device for multiple
storage setups. The allocate_section_policy determines the write strategy,
with a default value of 0 for normal sequential write strategy. A value of
1 prioritizes writes before the allocate_section_hint, while a value of 2
prioritizes writes after it.

This strategy addresses the issue where, despite F2FS supporting multiple
devices, SOC vendors lack multi-devices support (currently only supporting
zoned devices). As a workaround, multiple storage devices are mapped to a
single dm device. Both this workaround and the F2FS multi-devices solution
may require prioritizing writing to certain devices, such as a device with
better performance or when switching is needed due to performance
degradation near a device's end. For scenarios with more than two devices,
sort them at mount time to utilize this feature.

When using this feature with a single storage device, it has almost no
impact. However, for configurations where multiple storage devices are
mapped to the same dm device using F2FS, utilizing this feature can provide
some optimization benefits. Therefore, I believe it should not be limited
to just multi-devices usage.

Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-08-29 20:48:47 +00:00
Trond Myklebust
4fb2b677fc NFSv4: Clear the NFS_CAP_XATTR flag if not supported by the server
nfs_server_set_fsinfo() shouldn't assume that NFS_CAP_XATTR is unset
on entry to the function.

Fixes: b78ef845c3 ("NFSv4.2: query the server for extended attribute support")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-08-29 12:56:43 -04:00
Trond Myklebust
b3ac334360 NFSv4: Clear NFS_CAP_OPEN_XOR and NFS_CAP_DELEGTIME if not supported
_nfs4_server_capabilities() should clear capabilities that are not
supported by the server.

Fixes: d2a00cceb9 ("NFSv4: Detect support for OPEN4_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-08-29 12:56:43 -04:00
Trond Myklebust
dd5a8621b8 NFSv4: Clear the NFS_CAP_FS_LOCATIONS flag if it is not set
_nfs4_server_capabilities() is expected to clear any flags that are not
supported by the server.

Fixes: 8a59bb93b7 ("NFSv4 store server support for fs_location attribute")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-08-29 12:56:43 -04:00
Trond Myklebust
31f1a960ad NFSv4: Don't clear capabilities that won't be reset
Don't clear the capabilities that are not going to get reset by the call
to _nfs4_server_capabilities().

Reported-by: Scott Haiden <scott.b.haiden@gmail.com>
Fixes: b01f21cacd ("NFS: Fix the setting of capabilities when automounting a new filesystem")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-08-29 12:56:27 -04:00
Linus Torvalds
fb679c832b Merge tag 'efi-fixes-for-v6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI fixes from Ard Biesheuvel:

 - Assorted fixes for the OP-TEE based pseudo-EFI variable store

 - Fix for an OOB access when looking up the same non-existing efivarfs
   entry multiple times in parallel

* tag 'efi-fixes-for-v6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  efivarfs: Fix slab-out-of-bounds in efivarfs_d_compare
  efi: stmm: Drop unneeded null pointer check
  efi: stmm: Drop unused EFI error from setup_mm_hdr arguments
  efi: stmm: Do not return EFI_OUT_OF_RESOURCES on internal errors
  efi: stmm: Fix incorrect buffer allocation method
2025-08-29 09:15:46 -07:00
Linus Torvalds
2575e638e2 Merge tag 'v6.17-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:

 - Fix possible refcount leak in compound operations

 - Fix remap_file_range() return code mapping, found by generic/157

* tag 'v6.17-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  fs/smb: Fix inconsistent refcnt update
  smb3 client: fix return code mapping of remap_file_range
2025-08-29 08:51:34 -07:00
Linus Torvalds
469447200a Merge tag 'xfs-fixes-6.17-rc4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Carlos Maiolino:
 "The highlight I'd like to point here is related to the XFS_RT
  Kconfig, which has been updated to be enabled by default now if
  CONFIG_BLK_DEV_ZONED is enabled.

  This also contains a few fixes for zoned devices support in XFS,
  specially related to swapon requests in inodes belonging to the zoned
  FS.

  A null-ptr dereference fix in the xattr data, due to a mishandling of
  medium errors generated by block devices is also included"

* tag 'xfs-fixes-6.17-rc4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: do not propagate ENODATA disk errors into xattr code
  xfs: reject swapon for inodes on a zoned file system earlier
  xfs: kick off inodegc when failing to reserve zoned blocks
  xfs: remove xfs_last_used_zone
  xfs: Default XFS_RT to Y if CONFIG_BLK_DEV_ZONED is enabled
2025-08-29 08:09:34 -07:00
Lauri Vasama
db2ab24a34 Add RWF_NOSIGNAL flag for pwritev2
For a user mode library to avoid generating SIGPIPE signals (e.g.
because this behaviour is not portable across operating systems) is
cumbersome. It is generally bad form to change the process-wide signal
mask in a library, so a local solution is needed instead.

For I/O performed directly using system calls (synchronous or readiness
based asynchronous) this currently involves applying a thread-specific
signal mask before the operation and reverting it afterwards. This can be
avoided when it is known that the file descriptor refers to neither a
pipe nor a socket, but a conservative implementation must always apply
the mask. This incurs the cost of two additional system calls. In the
case of sockets, the existing MSG_NOSIGNAL flag can be used with send.

For asynchronous I/O performed using io_uring, currently the only option
(apart from MSG_NOSIGNAL for sockets), is to mask SIGPIPE entirely in the
call to io_uring_enter. Thankfully io_uring_enter takes a signal mask, so
only a single syscall is needed. However, copying the signal mask on
every call incurs a non-zero performance penalty. Furthermore, this mask
applies to all completions, meaning that if the non-signaling behaviour
is desired only for some subset of operations, the desired signals must
be raised manually from user-mode depending on the completed operation.

Add RWF_NOSIGNAL flag for pwritev2. This flag prevents the SIGPIPE signal
from being raised when writing on disconnected pipes or sockets. The flag
is handled directly by the pipe filesystem and converted to the existing
MSG_NOSIGNAL flag for sockets.

Signed-off-by: Lauri Vasama <git@vasama.org>
Link: https://lore.kernel.org/20250827133901.1820771-1-git@vasama.org
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-08-29 15:08:07 +02:00
Xichao Zhao
38d1227fa7 fs: Replace offsetof() with struct_size() in ioctl_file_dedupe_range()
When dealing with structures containing flexible arrays, struct_size()
provides additional compile-time checks compared to offsetof(). This
enhances code robustness and reduces the risk of potential errors.

Signed-off-by: Xichao Zhao <zhao.xichao@vivo.com>
Link: https://lore.kernel.org/20250829091510.597858-1-zhao.xichao@vivo.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-08-29 12:00:58 +02:00
Amir Goldstein
bb585591eb fhandle: use more consistent rules for decoding file handle from userns
Commit 620c266f39 ("fhandle: relax open_by_handle_at() permission
checks") relaxed the coditions for decoding a file handle from non init
userns.

The conditions are that that decoded dentry is accessible from the user
provided mountfd (or to fs root) and that all the ancestors along the
path have a valid id mapping in the userns.

These conditions are intentionally more strict than the condition that
the decoded dentry should be "lookable" by path from the mountfd.

For example, the path /home/amir/dir/subdir is lookable by path from
unpriv userns of user amir, because /home perms is 755, but the owner of
/home does not have a valid id mapping in unpriv userns of user amir.

The current code did not check that the decoded dentry itself has a
valid id mapping in the userns.  There is no security risk in that,
because that final open still performs the needed permission checks,
but this is inconsistent with the checks performed on the ancestors,
so the behavior can be a bit confusing.

Add the check for the decoded dentry itself, so that the entire path,
including the last component has a valid id mapping in the userns.

Fixes: 620c266f39 ("fhandle: relax open_by_handle_at() permission checks")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/20250827194309.1259650-1-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-08-29 09:48:31 +02:00