Commit Graph

96808 Commits

Author SHA1 Message Date
Kent Overstreet
5906dcb993 bcachefs: Silence read-only errors when deleting snapshots
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:42 -05:00
Kent Overstreet
8b1f46bff3 bcachefs: Dropped superblock write is no longer a fatal error
Just emit a warning if errors=continue or fix_safe.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:42 -05:00
Kent Overstreet
8cfdc6ce1f bcachefs: bch2_trans_node_drop()
Factor out a small common helper.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:42 -05:00
Kent Overstreet
0971a72c3d bcachefs: bch2_trans_unlock_write()
New helper for dropping all write locks; which is distinct from the
helper the transaction commit path uses, which is faster and only
touches updates.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:42 -05:00
Kent Overstreet
e1911d7a69 bcachefs: btree_node_unlock() can now drop write locks
Prep work for reworking btree node locking during interior btree
updates.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:41 -05:00
Kent Overstreet
9a5232ef0a bcachefs: six locks: write locks can now be held recursively
This is needed for the interior update locking rework, where we'll be
holding node write locks for the duration of the update - which is
needed for synchronizing with online check_allocations.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:41 -05:00
Kent Overstreet
8f3aaa5d5d bcachefs: bch2_fs_btree_gc_init()
Now returns errors, prep work for check_allocations_done_lock

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:41 -05:00
Kent Overstreet
cb3f34982c bcachefs: Assert that btree write buffer only touches the right btrees
More asserts, more better.

Also, clean up the per-btree flags a bit.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:41 -05:00
Kent Overstreet
bdedae70f5 bcachefs: bch2_inum_path() now crosses subvolumes correctly
The dirent that points to a subvolume root is in the parent subvolume.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:41 -05:00
Kent Overstreet
ce9a21713b bcachefs: bch2_inum_path() no longer returns an error for disconnected inums
bch2_inum_path() should work even if the filesystem is corrupted - we
don't want it to cause fsck to fail.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:41 -05:00
Kent Overstreet
6adc5af50a bcachefs: btree_path_very_locks(): verify lock seq
If the btree_path's lock seq is wrong, the next bch2_trans_relock()
operation is guaranteed to fail and we take an unnecessary transaction
restart.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:41 -05:00
Kent Overstreet
f908eacc34 bcachefs: fix bch2_btree_key_cache_drop()
When evicting, we shouldn't leave a pointer to the key cache entry lying
around - that screws up btree path asserts we're adding.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:41 -05:00
Kent Overstreet
bc6fce7870 bcachefs: bch2_btree_node_write_trans()
Avoiding screwing up path->lock_seq.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:41 -05:00
Kent Overstreet
4bd06f07bc bcachefs: Fixes for snapshot_tree.master_subvol
Ensure that snapshot_tree.master_subvol is cleared when we delete the
master subvolume in a tree of snapshots, and allow for snapshot trees
that don't have a master subvolume in fsck.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:41 -05:00
Kent Overstreet
b5e4cd0871 bcachefs: Don't rely on snapshot_tree.master_subvol for reattaching
Previously, fsck used the snapshot tree's master subvol for finding the
root inode number - but the master subvol might have been deleting, and
setting a new one should be a user operation; meaning we can't rely on
it existing.

Fortunately, for finding the root inode number in a tree of snapshots,
finding any associated subvolume works.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:41 -05:00
Kent Overstreet
4541408391 bcachefs: bch2_kvmalloc()
Add a version of kvmalloc() that doesn't have the INT_MAX limit; large
filesystems do hit this.

We'll want to get rid of the in-memory bucket gens array, but we're not
there quite yet.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:41 -05:00
Kent Overstreet
fa3e5135e4 bcachefs: Fix assert for online fsck
We can't check if we're racing with fsck ending until mark_lock is held.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:41 -05:00
Kent Overstreet
cf3da2d627 bcachefs: Handle -BCH_ERR_need_mark_replicas in gc
Locking considerations (possibly no longer relevant?) mean that when an
accounting update needs a new superblock replicas entry to be created,
it's deferred to the transaction commit error path.

But accounting updates for gc/fcsk aren't done from the transaction
commit path - so we need to handle
-BCH_ERR_btree_insert_need_mark_replicas locally.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:41 -05:00
Kent Overstreet
861cd0f606 bcachefs: Write lock btree node in key cache fills
this addresses a key cache coherency bug

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:41 -05:00
Kent Overstreet
baf13d8344 bcachefs: kill __bch2_btree_iter_flags()
bch2_btree_iter_flags() now takes a level parameter; this fixes a bug
where using a node iterator on a leaf wouldn't set
BTREE_ITER_with_key_cache, leading to fun cache coherency bugs.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:41 -05:00
Kent Overstreet
30e32692d6 bcachefs: Drop redundant "read error" call from btree_gc
The btree node read error path already calls topology error, so this is
entirely redundant, and we're not specific enough about our error codes
- this was triggering for bucket_ref_update() errors.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:41 -05:00
Kent Overstreet
6542afe299 bcachefs: Drop racy warning
Checking for writing past i_size after unlocking the folio and clearing
the dirty bit is racy, and we already check it at the start.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:41 -05:00
Kent Overstreet
0475c7639e bcachefs: better check_bp_exists() error message
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:41 -05:00
Hongbo Li
d01ea14da7 bcachefs: add counter_flags for counters
In bcachefs, io_read and io_write counter record the amount
of data which has been read and written. They increase in
unit of sector, so to display correctly, they need to be
shifted to the left by the size of a sector. Other counters
like io_move, move_extent_{read, write, finish} also have
this problem.

In order to support different unit, we add extra column to
mark the counter type by using TYPE_COUNTER and TYPE_SECTORS
in BCH_PERSISTENT_COUNTERS().

Fixes: 1c6fdbd8f2 ("bcachefs: Initial commit")
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:41 -05:00
Kent Overstreet
3db3084a86 bcachefs: bcachefs_metadata_version_autofix_errors
It's time to make self healing the default: change the error action for
old filesystems to fix_safe, matching the default for current
filesystems.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:41 -05:00
Kent Overstreet
df448ca355 bcachefs: bcachefs_metadata_version_persistent_inode_cursors
Persistent cursors for inode allocation.

A free inodes btree would add substantial overhead to inode allocation
and freeing - a "next num to allocate" cursor is always going to be
faster.

We just need it to be persistent, to avoid scanning the inodes btree
from the start on startup.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:41 -05:00
Linus Torvalds
2144da2558 Merge tag '6.13-rc6-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
 "Four ksmbd server fixes, most also for stable:

   - fix for reporting special file type more accurately when POSIX
     extensions negotiated

   - minor cleanup

   - fix possible incorrect creation path when dirname is not present.
     In some cases, Windows apps create files without checking if they
     exist.

   - fix potential NULL pointer dereference sending interim response"

* tag '6.13-rc6-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: Implement new SMB3 POSIX type
  ksmbd: fix unexpectedly changed path in ksmbd_vfs_kern_path_locked
  ksmbd: Remove unneeded if check in ksmbd_rdma_capable_netdev()
  ksmbd: fix a missing return value check bug
2025-01-09 18:19:59 -08:00
Linus Torvalds
643e2e259c Merge tag 'for-6.13-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
 "A few more fixes.

  Besides the one-liners in Btrfs there's fix to the io_uring and
  encoded read integration (added in this development cycle). The update
  to io_uring provides more space for the ongoing command that is then
  used in Btrfs to handle some cases.

   - io_uring and encoded read:
       - provide stable storage for io_uring command data
       - make a copy of encoded read ioctl call, reuse that in case the
         call would block and will be called again

   - properly initialize zlib context for hardware compression on s390

   - fix max extent size calculation on filesystems with non-zoned
     devices

   - fix crash in scrub on crafted image due to invalid extent tree"

* tag 'for-6.13-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: zlib: fix avail_in bytes for s390 zlib HW compression path
  btrfs: zoned: calculate max_extent_size properly on non-zoned setup
  btrfs: avoid NULL pointer dereference if no valid extent tree
  btrfs: don't read from userspace twice in btrfs_uring_encoded_read()
  io_uring: add io_uring_cmd_get_async_data helper
  io_uring/cmd: add per-op data to struct io_uring_cmd_data
  io_uring/cmd: rename struct uring_cache to io_uring_cmd_data
2025-01-09 10:16:45 -08:00
Lizhi Xu
17a4fde81d afs: Fix merge preference rule failure condition
syzbot reported a lock held when returning to userspace[1].  This is
because if argc is less than 0 and the function returns directly, the held
inode lock is not released.

Fix this by store the error in ret and jump to done to clean up instead of
returning directly.

[dh: Modified Lizhi Xu's original patch to make it honour the error code
from afs_split_string()]

[1]
WARNING: lock held when returning to user space!
6.13.0-rc3-syzkaller-00209-g499551201b5f #0 Not tainted
------------------------------------------------
syz-executor133/5823 is leaving the kernel with locks still held!
1 lock held by syz-executor133/5823:
 #0: ffff888071cffc00 (&sb->s_type->i_mutex_key#9){++++}-{4:4}, at: inode_lock include/linux/fs.h:818 [inline]
 #0: ffff888071cffc00 (&sb->s_type->i_mutex_key#9){++++}-{4:4}, at: afs_proc_addr_prefs_write+0x2bb/0x14e0 fs/afs/addr_prefs.c:388

Reported-by: syzbot+76f33569875eb708e575@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=76f33569875eb708e575
Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20241226012616.2348907-1-lizhi.xu@windriver.com/
Link: https://lore.kernel.org/r/529850.1736261552@warthog.procyon.org.uk
Tested-by: syzbot+76f33569875eb708e575@syzkaller.appspotmail.com
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-01-09 17:21:41 +01:00
David Howells
904abff4b1 netfs: Fix read-retry for fs with no ->prepare_read()
Fix netfslib's read-retry to only call ->prepare_read() in the backing
filesystem such a function is provided.  We can get to this point if a
there's an active cache as failed reads from the cache need negotiating
with the server instead.

Fixes: ee4cdf7ba8 ("netfs: Speed up buffered reading")
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/529329.1736261010@warthog.procyon.org.uk
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-01-09 17:20:04 +01:00
David Howells
3f6bc9e3ab netfs: Fix kernel async DIO
Netfslib needs to be able to handle kernel-initiated asynchronous DIO that
is supplied with a bio_vec[] array.  Currently, because of the async flag,
this gets passed to netfs_extract_user_iter() which throws a warning and
fails because it only handles IOVEC and UBUF iterators.  This can be
triggered through a combination of cifs and a loopback blockdev with
something like:

        mount //my/cifs/share /foo
        dd if=/dev/zero of=/foo/m0 bs=4K count=1K
        losetup --sector-size 4096 --direct-io=on /dev/loop2046 /foo/m0
        echo hello >/dev/loop2046

This causes the following to appear in syslog:

        WARNING: CPU: 2 PID: 109 at fs/netfs/iterator.c:50 netfs_extract_user_iter+0x170/0x250 [netfs]

and the write to fail.

Fix this by removing the check in netfs_unbuffered_write_iter_locked() that
causes async kernel DIO writes to be handled as userspace writes.  Note
that this change relies on the kernel caller maintaining the existence of
the bio_vec array (or kvec[] or folio_queue) until the op is complete.

Fixes: 153a9961b5 ("netfs: Implement unbuffered/DIO write support")
Reported-by: Nicolas Baranger <nicolas.baranger@3xo.fr>
Closes: https://lore.kernel.org/r/fedd8a40d54b2969097ffa4507979858@3xo.fr/
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/608725.1736275167@warthog.procyon.org.uk
Tested-by: Nicolas Baranger <nicolas.baranger@3xo.fr>
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
cc: Steve French <smfrench@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-01-09 17:18:51 +01:00
Christian Brauner
482d520d86 Merge tag 'vfs-6.14-rc7.mount.fixes'
Bring in the fix for the mount namespace rbtree. It is used as the base
for the vfs mount work for this cycle and so shouldn't be applied
directly.

Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-01-09 17:03:21 +01:00
Christian Brauner
22eb23b8a7 fs: remove useless lockdep assertion
mnt_ns_release() can run asynchronously via call_rcu() so hitting that
lockdep assertion means someone else already grabbed the
mnt_ns_tree_lock and causes a false positive. That assertion has likely
always been wrong. call_rcu() just makes it more likely to hit.

Link: https://lore.kernel.org/r/Z2PlT5rcRTIhCpft@ly-workstation
Link: https://lore.kernel.org/r/20241219-darben-quietschen-b6e1d80327bb@brauner
Reported-by: Lai, Yi <yi1.lai@linux.intel.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-01-09 16:58:55 +01:00
Christian Brauner
7f9bfafc5f fs: use xarray for old mount id
While the ida does use the xarray internally we can use it explicitly
which allows us to increment the unique mount id under the xa lock.
This allows us to remove the atomic as we're now allocating both ids in
one go.

Link: https://lore.kernel.org/r/20241217-erhielten-regung-44bb1604ca8f@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-01-09 16:58:55 +01:00
Christian Brauner
2ce23285d7 fs: cache first and last mount
Speed up listmount() by caching the first and last node making retrieval
of the first and last mount of each mount namespace O(1).

Link: https://lore.kernel.org/r/20241215-vfs-6-14-mount-work-v1-2-fd55922c4af8@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-01-09 16:58:54 +01:00
Christian Brauner
e7c8dde368 fs: simplify rwlock to spinlock
We're not taking the read_lock() anymore now that all lookup is
lockless. Just use a simple spinlock.

Link: https://lore.kernel.org/r/20241213-work-mount-rbtree-lockless-v3-6-6e3cdaf9b280@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-01-09 16:58:52 +01:00
Christian Brauner
4368898b27 fs: lockless mntns lookup for nsfs
We already made the rbtree lookup lockless for the simple lookup case.
However, walking the list of mount namespaces via nsfs still happens
with taking the read lock blocking concurrent additions of new mount
namespaces pointlessly. Plus, such additions are rare anyway so allow
lockless lookup of the previous and next mount namespace by keeping a
separate list. This also allows to make some things simpler in the code.

Link: https://lore.kernel.org/r/20241213-work-mount-rbtree-lockless-v3-5-6e3cdaf9b280@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-01-09 16:58:52 +01:00
Christian Brauner
5dcbd85d35 fs: lockless mntns rbtree lookup
Currently we use a read-write lock but for the simple search case we can
make this lockless. Creating a new mount namespace is a rather rare
event compared with querying mounts in a foreign mount namespace. Once
this is picked up by e.g., systemd to list mounts in another mount in
it's isolated services or in containers this will be used a lot so this
seems worthwhile doing.

Link: https://lore.kernel.org/r/20241213-work-mount-rbtree-lockless-v3-3-6e3cdaf9b280@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-01-09 16:58:52 +01:00
Christian Brauner
144acef333 fs: add mount namespace to rbtree late
There's no point doing that under the namespace semaphore it just gives
the false impression that it protects the mount namespace rbtree and it
simply doesn't.

Link: https://lore.kernel.org/r/20241213-work-mount-rbtree-lockless-v3-2-6e3cdaf9b280@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-01-09 16:58:51 +01:00
Christian Brauner
62b8dee925 mount: remove inlude/nospec.h include
It's not needed, so remove it.

Link: https://lore.kernel.org/r/20241213-work-mount-rbtree-lockless-v3-1-6e3cdaf9b280@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-01-09 16:58:51 +01:00
Jeff Layton
056d33137b fs: prepend statmount.mnt_opts string with security_sb_mnt_opts()
Currently these mount options aren't accessible via statmount().

The read handler for /proc/#/mountinfo calls security_sb_show_options()
to emit the security options after emitting superblock flag options, but
before calling sb->s_op->show_options.

Have statmount_mnt_opts() call security_sb_show_options() before
calling ->show_options.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20241115-statmount-v2-2-cd29aeff9cbb@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-01-09 16:58:51 +01:00
Christian Brauner
344bac8f0d fs: kill MNT_ONRB
Move mnt->mnt_node into the union with mnt->mnt_rcu and mnt->mnt_llist
instead of keeping it with mnt->mnt_list. This allows us to use
RB_CLEAR_NODE(&mnt->mnt_node) in umount_tree() as well as
list_empty(&mnt->mnt_node). That in turn allows us to remove MNT_ONRB.

This also fixes the bug reported in [1] where seemingly MNT_ONRB wasn't
set in @mnt->mnt_flags even though the mount was present in the mount
rbtree of the mount namespace.

The root cause is the following race. When a btrfs subvolume is mounted
a temporary mount is created:

btrfs_get_tree_subvol()
{
        mnt = fc_mount()
        // Register the newly allocated mount with sb->mounts:
        lock_mount_hash();
        list_add_tail(&mnt->mnt_instance, &mnt->mnt.mnt_sb->s_mounts);
        unlock_mount_hash();
}

and registered on sb->s_mounts. Later it is added to an anonymous mount
namespace via mount_subvol():

-> mount_subvol()
   -> mount_subtree()
      -> alloc_mnt_ns()
         mnt_add_to_ns()
         vfs_path_lookup()
         put_mnt_ns()

The mnt_add_to_ns() call raises MNT_ONRB in @mnt->mnt_flags. If someone
concurrently does a ro remount:

reconfigure_super()
-> sb_prepare_remount_readonly()
   {
           list_for_each_entry(mnt, &sb->s_mounts, mnt_instance) {
   }

all mounts registered in sb->s_mounts are visited and first
MNT_WRITE_HOLD is raised, then MNT_READONLY is raised, and finally
MNT_WRITE_HOLD is removed again.

The flag modification for MNT_WRITE_HOLD/MNT_READONLY and MNT_ONRB race
so MNT_ONRB might be lost.

Fixes: 2eea9ce431 ("mounts: keep list of mounts in an rbtree")
Cc: <stable@kernel.org> # v6.8+
Link: https://lore.kernel.org/r/20241215-vfs-6-14-mount-work-v1-1-fd55922c4af8@kernel.org
Link: https://lore.kernel.org/r/ec6784ed-8722-4695-980a-4400d4e7bd1a@gmx.com [1]
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-01-09 16:58:50 +01:00
Christoph Hellwig
468210ec76 xfs: report larger dio alignment for COW inodes
For I/O to reflinked blocks we always need to write an entire new file
system block, and the code enforces the file system block alignment for
the entire file if it has any reflinked blocks.  Mirror the larger
value reported in the statx in the dio_offset_align in the xfs-specific
XFS_IOC_DIOINFO ioctl for the same reason.

Don't bother adding a new field for the read alignment to this legacy
ioctl as all new users should use statx instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250109083109.1441561-6-hch@lst.de
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-01-09 16:23:18 +01:00
Christoph Hellwig
7422bbd030 xfs: report the correct read/write dio alignment for reflinked inodes
For I/O to reflinked blocks we always need to write an entire new
file system block, and the code enforces the file system block alignment
for the entire file if it has any reflinked blocks.

Use the new STATX_DIO_READ_ALIGN flag to report the asymmetric read
vs write alignments for reflinked files.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250109083109.1441561-5-hch@lst.de
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-01-09 16:23:18 +01:00
Christoph Hellwig
7e17483c7b xfs: cleanup xfs_vn_getattr
Split the two bits of optional statx reporting into their own helpers
so that they are self-contained instead of deeply indented in the main
getattr handler.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250109083109.1441561-4-hch@lst.de
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-01-09 16:23:17 +01:00
Christoph Hellwig
7ed6cbe0f8 fs: add STATX_DIO_READ_ALIGN
Add a separate dio read align field, as many out of place write
file systems can easily do reads aligned to the device sector size,
but require bigger alignment for writes.

This is usually papered over by falling back to buffered I/O for smaller
writes and doing read-modify-write cycles, but performance for this
sucks, so applications benefit from knowing the actual write alignment.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250109083109.1441561-3-hch@lst.de
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-01-09 16:23:17 +01:00
Marco Nelissen
c13094b894 iomap: avoid avoid truncating 64-bit offset to 32 bits
on 32-bit kernels, iomap_write_delalloc_scan() was inadvertently using a
32-bit position due to folio_next_index() returning an unsigned long.
This could lead to an infinite loop when writing to an xfs filesystem.

Signed-off-by: Marco Nelissen <marco.nelissen@gmail.com>
Link: https://lore.kernel.org/r/20250109041253.2494374-1-marco.nelissen@gmail.com
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-01-09 16:09:20 +01:00
James Bottomley
7e365c7e2c efivarfs: make variable_is_present use dcache lookup
Instead of searching the variable entry list for a variable, use the
dcache lookup functions to find it instead.  Also add an efivarfs_
prefix to the function now it is no longer static.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-01-09 15:29:08 +01:00
James Bottomley
1aba87f92d efivarfs: add helper to convert from UC16 name and GUID to utf8 name
These will be used by a later patch to check for uniqueness on initial
EFI variable iteration.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-01-09 15:29:08 +01:00
James Bottomley
c57b6e1d8a efivarfs: remove unused efi_variable.Attributes and efivar_entry.kobj
These fields look to be remnants of older code: Attributes was likely
meant to stash the variable attributes, but doesn't because we always
read them from the variable store and kobj was likely left over from
an older iteration of code where we manually created the objects
instead of using a filesystem.

[ ardb: these fields were used by the sysfs based 'efivars' precursor to
	efivarfs, which was removed in commit 0f5b2c69a4 ("efi: vars:
	Remove deprecated 'efivars' sysfs interface") ]

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-01-09 15:25:14 +01:00