There are almost no users of the typedef left, kill it and switch the
remaining users to use the underlying struct.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
Actual removal is done under the lock, but for checking if need to bother
the lockless RB_EMPTY_NODE() is safe - either that namespace had never
been added to mnt_ns_tree, in which case the the node will stay empty, or
whoever had allocated it has called mnt_ns_tree_add() and it has already
run to completion. After that point RB_EMPTY_NODE() will become false and
will remain false, no matter what we do with other nodes in the tree.
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
f2fs_zero_post_eof_page() may cuase more overhead due to invalidate_lock
and page lookup, change as below to mitigate its overhead:
- check new_size before grabbing invalidate_lock
- lookup and invalidate pages only in range of [old_size, new_size]
Fixes: ba8dac350f ("f2fs: fix to zero post-eof page")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
It reports a bug from device w/ zufs:
F2FS-fs (dm-64): Inconsistent segment (173822) type [1, 0] in SSA and SIT
F2FS-fs (dm-64): Stopped filesystem due to reason: 4
Thread A Thread B
- f2fs_expand_inode_data
- f2fs_allocate_pinning_section
- f2fs_gc_range
- do_garbage_collect w/ segno #x
- writepage
- f2fs_allocate_data_block
- new_curseg
- allocate segno #x
The root cause is: fallocate on pinning file may race w/ block allocation
as above, result in do_garbage_collect() from fallocate() may migrate
segment which is just allocated by a log, the log will update segment type
in its in-memory structure, however GC will get segment type from on-disk
SSA block, once segment type changes by log, we can detect such
inconsistency, then shutdown filesystem.
In this case, on-disk SSA shows type of segno #173822 is 1 (SUM_TYPE_NODE),
however segno #173822 was just allocated as data type segment, so in-memory
SIT shows type of segno #173822 is 0 (SUM_TYPE_DATA).
Change as below to fix this issue:
- check whether current section is empty before gc
- add sanity checks on do_garbage_collect() to avoid any race case, result
in migrating segment used by log.
- btw, it fixes misc issue in printed logs: "SSA and SIT" -> "SIT and SSA".
Fixes: 9703d69d9d ("f2fs: support file pinning for zoned devices")
Cc: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
syzbot reports a bug as below:
loop0: detected capacity change from 0 to 40427
F2FS-fs (loop0): Wrong SSA boundary, start(3584) end(4096) blocks(3072)
F2FS-fs (loop0): Can't find valid F2FS filesystem in 1th superblock
F2FS-fs (loop0): invalid crc value
F2FS-fs (loop0): f2fs_convert_inline_folio: corrupted inline inode ino=3, i_addr[0]:0x1601, run fsck to fix.
------------[ cut here ]------------
kernel BUG at fs/inode.c:753!
RIP: 0010:clear_inode+0x169/0x190 fs/inode.c:753
Call Trace:
<TASK>
evict+0x504/0x9c0 fs/inode.c:810
f2fs_fill_super+0x5612/0x6fa0 fs/f2fs/super.c:5047
get_tree_bdev_flags+0x40e/0x4d0 fs/super.c:1692
vfs_get_tree+0x8f/0x2b0 fs/super.c:1815
do_new_mount+0x2a2/0x9e0 fs/namespace.c:3808
do_mount fs/namespace.c:4136 [inline]
__do_sys_mount fs/namespace.c:4347 [inline]
__se_sys_mount+0x317/0x410 fs/namespace.c:4324
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
During f2fs_evict_inode(), clear_inode() detects that we missed to truncate
all page cache before destorying inode, that is because in below path, we
will create page #0 in cache, but missed to drop it in error path, let's fix
it.
- evict
- f2fs_evict_inode
- f2fs_truncate
- f2fs_convert_inline_inode
- f2fs_grab_cache_folio
: create page #0 in cache
- f2fs_convert_inline_folio
: sanity check failed, return -EFSCORRUPTED
- clear_inode detects that inode->i_data.nrpages is not zero
Fixes: 92dffd0179 ("f2fs: convert inline_data when i_size becomes large")
Reported-by: syzbot+90266696fe5daacebd35@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/68c09802.050a0220.3c6139.000e.GAE@google.com
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
we are holding namespace_sem and a reference to root of tree;
iterating through that tree does not need mount_lock. Neither
does the insertion into the rbtree of new namespace or incrementing
the mount count of that namespace.
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Needed there since the callback passed to d_walk() (path_check_mount())
is using __path_is_mountpoint(), which uses __lookup_mnt().
Has to be taken in the caller - d_walk() might take rename_lock spinlock
component and that nests inside mount_lock.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Comments regarding "shadow mounts" were stale - no such thing anymore.
Document the locking requirements for __lookup_mnt().
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
->lower_path.mnt has the same value for all dentries on given ecryptfs
instance and if somebody goes for mountpoint-crossing variant where that
would not be true, we can deal with that when it happens (and _not_
with duplicating these reference into each dentry).
As it is, we are better off just sticking a reference into ecryptfs-private
part of superblock and keeping it pinned until ->kill_sb().
That way we can stick a reference to underlying dentry right into ->d_fsdata
of ecryptfs one, getting rid of indirection through struct ecryptfs_dentry_info,
along with the entire struct ecryptfs_dentry_info machinery.
[kudos to Dan Carpenter for spotting a bug in ecryptfs_get_tree() part]
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
For each removed mount we need to calculate where the slaves will end up.
To avoid duplicating that work, do it for all mounts to be removed
at once, taking the mounts themselves out of propagation graph as
we go, then do all transfers; the duplicate work on finding destinations
is avoided since if we run into a mount that already had destination found,
we don't need to trace the rest of the way. That's guaranteed
O(removed mounts) for finding destinations and removing from propagation
graph and O(surviving mounts that have master removed) for transfers.
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Currently do_lock_mount() has the target path switched to whatever
might be overmounting it. We _do_ want to have the parent
mount/mountpoint chosen on top of the overmounting pile; however,
the way it's done has unpleasant races - if umount propagation
removes the overmount while we'd been trying to set the environment
up, we might end up failing if our target path strays into that overmount
just before the overmount gets kicked out.
Users of do_lock_mount() do not need the target path changed - they
have all information in res->{parent,mp}; only one place (in
do_move_mount()) currently uses the resulting path->mnt, and that value
is trivial to reconstruct by the original value of path->mnt + chosen
parent mount.
Let's keep the target path unchanged; it avoids a bunch of subtle races
and it's not hard to do:
do
as mount_locked_reader
find the prospective parent mount/mountpoint dentry
grab references if it's not the original target
lock the prospective mountpoint dentry
take namespace_sem exclusive
if prospective parent/mountpoint would be different now
err = -EAGAIN
else if location has been unmounted
err = -ENOENT
else if mountpoint dentry is not allowed to be mounted on
err = -ENOENT
else if beneath and the top of the pile was the absolute root
err = -EINVAL
else
try to get struct mountpoint (by dentry), set
err to 0 on success and -ENO{MEM,ENT} on failure
if err != 0
res->parent = ERR_PTR(err)
drop locks
else
res->parent = prospective parent
drop temporary references
while err == -EAGAIN
A somewhat subtle part is that dropping temporary references is allowed.
Neither mounts nor dentries should be evicted by a thread that holds
namespace_sem. On success we are dropping those references under
namespace_sem, so we need to be sure that these are not the last
references remaining. However, on success we'd already verified (under
namespace_sem) that original target is still mounted and that mount
and dentry we are about to drop are still reachable from it via the
mount tree. That guarantees that we are not about to drop the last
remaining references.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Returns the final (topmost) mount in the chain of overmounts
starting at given mount. Same locking rules as for any mount
tree traversal - either the spinlock side of mount_lock, or
rcu + sample the seqcount side of mount_lock before the call
and recheck afterwards.
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
That kills the last place where callers of lock_mount(path, &mp)
used path->dentry.
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
parent and mountpoint always come from the same struct pinned_mountpoint
now.
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Both callers pass it a mountpoint reference picked from pinned_mountpoint
and path it corresponds to.
First of all, path->dentry is equal to mp.mp->m_dentry. Furthermore, path->mnt
is &mp.parent->mnt, making struct path contents redundant.
Pass it the address of that pinned_mountpoint instead; what's more, if we
teach it to treat ERR_PTR(error) in ->parent as "bail out with that error"
we can simplify the callers even more - do_add_mount() will do the right
thing even when called after lock_mount() failure.
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
After successful do_lock_mount() call, mp.parent is set to either
real_mount(path->mnt) (for !beneath case) or to ->mnt_parent of that
(for beneath). p is set to real_mount(path->mnt) and after
several uses it's made equal to mp.parent. All uses prior to that
care only about p->mnt_ns and since p->mnt_ns == parent->mnt_ns,
we might as well use mp.parent all along.
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
1) pinned_mountpoint gets a new member - struct mount *parent.
Set only if we locked the sucker; ERR_PTR() - on failed attempt.
2) do_lock_mount() et.al. return void and set ->parent to
* on success with !beneath - mount corresponding to path->mnt
* on success with beneath - the parent of mount corresponding
to path->mnt
* in case of error - ERR_PTR(-E...).
IOW, we get the mount we will be actually mounting upon or ERR_PTR().
3) we can't use CLASS, since the pinned_mountpoint is placed on
hlist during initialization, so we define local macros:
LOCK_MOUNT(mp, path)
LOCK_MOUNT_MAYBE_BENEATH(mp, path, beneath)
LOCK_MOUNT_EXACT(mp, path)
All of them declare and initialize struct pinned_mountpoint mp,
with unlock_mount done via __cleanup().
Users converted.
[
lock_mount() is unused now; removed.
Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
]
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
... and get rid of path argument - it turns into a local variable in get_target()
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>