Commit Graph

50492 Commits

Author SHA1 Message Date
Jaegeuk Kim
ef095d19e8 f2fs: write small sized IO to hot log
It would better split small and large IOs separately in order to get more
consecutive big writes.

The default threshold is set to 64KB, but configurable by sysfs/min_hot_blocks.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-05 11:05:05 -07:00
Chao Yu
a7eeb82385 f2fs: use bitmap in discard_entry
This patch changes to use bitmap instead of extent in struct discard_entry
to indicate discard range in one segment, for fragmented space, this
implementation can save memory footprint.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-05 11:05:04 -07:00
Chao Yu
f099405fc8 f2fs: clean up destroy_discard_cmd_control
Remove unneeded parameter and simply change flow in
destroy_discard_cmd_control.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-05 11:05:04 -07:00
Chao Yu
5f32366a29 f2fs: count discard command entry
Adds to count discard command entry and show the number in debugfs,
also fix to add cost of discard command cache into total comsumed
memory footprint.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-05 11:05:03 -07:00
Chao Yu
8b8dd65f72 f2fs: show issued flush/discard count
Show historical count of flush command and discard command.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-05 11:05:02 -07:00
Andrew Price
d4d7fc12b6 gfs2: Re-enable fallocate for the rindex
Commit 86066914ed "gfs2: Don't support
fallocate on jdata files" removed the ability of gfs2_grow to reserve
space at the end of the rindex, which could prevent a second gfs2_grow
from succeeding if the fs is full. Allow fallocate to work on the rindex
once again.

Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-04-05 11:45:26 -04:00
Jan Kara
1e0e653f11 reiserfs: Protect dquot_writeback_dquots() by s_umount semaphore
dquot_writeback_dquots() expects s_umount semaphore to be held to
protect it from other concurrent quota operations. reiserfs_sync_fs()
can call dquot_writeback_dquots() without holding s_umount semaphore
when called from flush_old_commits().

Fix the problem by grabbing s_umount in flush_old_commits(). However we
have to be careful and use only trylock since reiserfs_cancel_old_sync()
can be waiting for flush_old_commits() to complete while holding
s_umount semaphore. Possible postponing of sync work is not a big deal
though as that is only an opportunistic flush.

Fixes: 9d1ccbe70e
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-05 14:24:55 +02:00
Jan Kara
71b0576bdb reiserfs: Make cancel_old_flush() reliable
Currently canceling of delayed work that flushes old data using
cancel_old_flush() does not prevent work from being requeued. Thus
in theory new work can be queued after cancel_old_flush() from
reiserfs_freeze() has run. This will become larger problem once
flush_old_commits() can requeue the work itself.

Fix the problem by recording in sbi->work_queue that flushing work is
canceled and should not be requeued.

Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-05 14:24:16 +02:00
Jan Kara
6554766150 ext2: Call dquot_writeback_dquots() with s_umount held
ext2_sync_fs() could be called without s_umount semaphore held when
called through ext2_write_super() from __ext2_write_inode(). This
function then calls dquot_writeback_dquots() which relies on s_umount to
be held for protection against other quota operations.

In fact __ext2_write_inode() does not need all the functionality
ext2_write_super() provides. It is enough to just write the superblock.
So use ext2_sync_super() instead.

Fixes: 9d1ccbe70e
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-05 14:23:45 +02:00
Darrick J. Wong
4c934c7dd6 xfs: report realtime space information via the rtbitmap
Use the realtime bitmap to return free space information via getfsmap.
Eventually this will be superseded by the realtime rmapbt code.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2017-04-03 15:18:18 -07:00
Darrick J. Wong
a1cae7283d xfs: have getfsmap fall back to the freesp btrees when rmap is not present
If the reverse-mapping btree isn't available, fall back to the
free space btrees to provide partial reverse mapping information.
The online scrub tool can make use of even partial information to
speed up the data block scan.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2017-04-03 15:18:18 -07:00
Darrick J. Wong
e89c041338 xfs: implement the GETFSMAP ioctl
Introduce a new ioctl that uses the reverse mapping btree to return
information about the physical layout of the filesystem.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2017-04-03 15:18:17 -07:00
Darrick J. Wong
fb3c3de2f6 xfs: add a couple of queries to iterate free extents in the rtbitmap
Add _query_range and _query_all functions to the realtime bitmap
allocator.  These two functions are similar in usage to the btree
functions with the same name and will be used for getfsmap and scrub.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2017-04-03 15:18:17 -07:00
Darrick J. Wong
e9a2599a24 xfs: create a function to query all records in a btree
Create a helper function that will query all records in a btree.
This will be used by the online repair functions to examine every
record in a btree to rebuild a second btree.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2017-04-03 15:18:17 -07:00
Darrick J. Wong
2d520bfaa2 xfs: provide a query_range function for freespace btrees
Implement a query_range function for the bnobt and cntbt.  This will
be used for getfsmap fallback if there is no rmapbt and by the online
scrub and repair code.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2017-04-03 15:18:17 -07:00
Darrick J. Wong
08438b1e38 xfs: plumb in needed functions for range querying of the freespace btrees
Plumb in the pieces (init_high_key, diff_two_keys) necessary to call
query_range on the free space btrees.  Remove the debugging asserts
so that we can make queries starting from block 0.

While we're at it, merge the redundant "if (btnum ==" hunks.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2017-04-03 15:18:17 -07:00
Darrick J. Wong
be6324c00c xfs: fix over-copying of getbmap parameters from userspace
In xfs_ioc_getbmap, we should only copy the fields of struct getbmap
from userspace, or else we end up copying random stack contents into the
kernel.  struct getbmap is a strict subset of getbmapx, so a partial
structure copy should work fine.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-04-03 15:18:16 -07:00
Nikolay Borisov
422e5b53ed xfs: Remove obsolete declaration of xfs_buf_get_empty
This function has been removed ever since at least 3.12-era. No need to
keep its declaration in the header so nuke it.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-03 15:18:16 -07:00
Eric Sandeen
bc593eebfd xfs: fix up inode validation failure message
"xfs_iread: validation failed for inode 96 failed"

One "failed" seems like enough.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-03 15:18:16 -07:00
Christoph Hellwig
63fbb4c18d xfs: remove the ISUNWRITTEN macro
Opencoding the trivial checks makes it much easier to read (and grep..).

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-03 15:18:16 -07:00
Christoph Hellwig
9c4f29d391 xfs: factor out a xfs_bmap_is_real_extent helper
This checks for all the non-normal extent types, including handling both
encodings of delayed allocations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-03 15:18:16 -07:00
Brian Foster
696a562072 xfs: use dedicated log worker wq to avoid deadlock with cil wq
The log covering background task used to be part of the xfssyncd
workqueue. That workqueue was removed as of commit 5889608df ("xfs:
syncd workqueue is no more") and the associated work item scheduled
to the xfs-log wq. The latter is used for log buffer I/O completion.

Since xfs_log_worker() can invoke a log flush, a deadlock is
possible between the xfs-log and xfs-cil workqueues. Consider the
following codepath from xfs_log_worker():

xfs_log_worker()
  xfs_log_force()
    _xfs_log_force()
      xlog_cil_force()
        xlog_cil_force_lsn()
          xlog_cil_push_now()
            flush_work()

The above is in xfs-log wq context and blocked waiting on the
completion of an xfs-cil work item. Concurrently, the cil push in
progress can end up blocked here:

xlog_cil_push_work()
  xlog_cil_push()
    xlog_write()
      xlog_state_get_iclog_space()
        xlog_wait(&log->l_flush_wait, ...)

The above is in xfs-cil context waiting on log buffer I/O
completion, which executes in xfs-log wq context. In this scenario
both workqueues are deadlocked waiting on eachother.

Add a new workqueue specifically for the high level log covering and
ail pushing worker, as was the case prior to commit 5889608df.

Diagnosed-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-03 15:18:15 -07:00
Darrick J. Wong
f1c0e20243 xfs: fix kernel memory exposure problems
Fix a memory exposure problems in inumbers where we allocate an array of
structures with holes, fail to zero the holes, then blindly copy the
kernel memory contents (junk and all) into userspace.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-04-03 15:18:15 -07:00
Calvin Owens
105664df51 xfs: Honor FALLOC_FL_KEEP_SIZE when punching ends of files
When punching past EOF on XFS, fallocate(mode=PUNCH_HOLE|KEEP_SIZE) will
round the file size up to the nearest multiple of PAGE_SIZE:

  calvinow@vm-disks/generic-xfs-1 ~$ dd if=/dev/urandom of=test bs=2048 count=1
  calvinow@vm-disks/generic-xfs-1 ~$ stat test
    Size: 2048            Blocks: 8          IO Block: 4096   regular file
  calvinow@vm-disks/generic-xfs-1 ~$ fallocate -n -l 2048 -o 2048 -p test
  calvinow@vm-disks/generic-xfs-1 ~$ stat test
    Size: 4096            Blocks: 8          IO Block: 4096   regular file

Commit 3c2bdc912a ("xfs: kill xfs_zero_remaining_bytes") replaced
xfs_zero_remaining_bytes() with calls to iomap helpers. The new helpers
don't enforce that [pos,offset) lies strictly on [0,i_size) when being
called from xfs_free_file_space(), so by "leaking" these ranges into
xfs_zero_range() we get this buggy behavior.

Fix this by reintroducing the checks xfs_zero_remaining_bytes() did
against i_size at the bottom of xfs_free_file_space().

Reported-by: Aaron Gao <gzh@fb.com>
Fixes: 3c2bdc912a ("xfs: kill xfs_zero_remaining_bytes")
Cc: Christoph Hellwig <hch@lst.de>
Cc: Brian Foster <bfoster@redhat.com>
Cc: <stable@vger.kernel.org> # 4.8+
Signed-off-by: Calvin Owens <calvinowens@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-03 15:18:15 -07:00
Darrick J. Wong
bf9216f922 xfs: fix kernel memory exposure problems
Fix a memory exposure problems in inumbers where we allocate an array of
structures with holes, fail to zero the holes, then blindly copy the
kernel memory contents (junk and all) into userspace.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-04-03 12:22:39 -07:00
Calvin Owens
3dd09d5a85 xfs: Honor FALLOC_FL_KEEP_SIZE when punching ends of files
When punching past EOF on XFS, fallocate(mode=PUNCH_HOLE|KEEP_SIZE) will
round the file size up to the nearest multiple of PAGE_SIZE:

  calvinow@vm-disks/generic-xfs-1 ~$ dd if=/dev/urandom of=test bs=2048 count=1
  calvinow@vm-disks/generic-xfs-1 ~$ stat test
    Size: 2048            Blocks: 8          IO Block: 4096   regular file
  calvinow@vm-disks/generic-xfs-1 ~$ fallocate -n -l 2048 -o 2048 -p test
  calvinow@vm-disks/generic-xfs-1 ~$ stat test
    Size: 4096            Blocks: 8          IO Block: 4096   regular file

Commit 3c2bdc912a ("xfs: kill xfs_zero_remaining_bytes") replaced
xfs_zero_remaining_bytes() with calls to iomap helpers. The new helpers
don't enforce that [pos,offset) lies strictly on [0,i_size) when being
called from xfs_free_file_space(), so by "leaking" these ranges into
xfs_zero_range() we get this buggy behavior.

Fix this by reintroducing the checks xfs_zero_remaining_bytes() did
against i_size at the bottom of xfs_free_file_space().

Reported-by: Aaron Gao <gzh@fb.com>
Fixes: 3c2bdc912a ("xfs: kill xfs_zero_remaining_bytes")
Cc: Christoph Hellwig <hch@lst.de>
Cc: Brian Foster <bfoster@redhat.com>
Cc: <stable@vger.kernel.org> # 4.8+
Signed-off-by: Calvin Owens <calvinowens@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-03 12:22:29 -07:00
Darrick J. Wong
78420281a9 xfs: rework the inline directory verifiers
The inline directory verifiers should be called on the inode fork data,
which means after iformat_local on the read side, and prior to
ifork_flush on the write side.  This makes the fork verifier more
consistent with the way buffer verifiers work -- i.e. they will operate
on the memory buffer that the code will be reading and writing directly.

Furthermore, revise the verifier function to return -EFSCORRUPTED so
that we don't flood the logs with corruption messages and assert
notices.  This has been a particular problem with xfs/348, which
triggers the XFS_WANT_CORRUPTED_RETURN assertions, which halts the
kernel when CONFIG_XFS_DEBUG=y.  Disk corruption isn't supposed to do
that, at least not in a verifier.

Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-03 12:22:20 -07:00
Jan Kara
c97476400d fanotify: Move recalculation of inode / vfsmount mask under mark_mutex
Move recalculation of inode / vfsmount notification mask under
group->mark_mutex of the mark which was modified. These are the only
places where mask recalculation happens without mark being protected
from detaching from inode / vfsmount which will cause issues with the
following patches.

Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-03 16:56:40 +02:00
Jan Kara
25c829afbd inotify: Remove inode pointers from debug messages
Printing inode pointers in warnings has dubious value and with future
changes we won't be able to easily get them without either locking or
chances we oops along the way. So just remove inode pointers from the
warning messages.

Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-03 16:56:34 +02:00
Jan Kara
5198adf649 fsnotify: Remove unnecessary tests when showing fdinfo
show_fdinfo() iterates group's list of marks. All marks found there are
guaranteed to be alive and they stay so until we release
group->mark_mutex. So remove uncecessary tests whether mark is alive.

Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-03 16:56:18 +02:00
Andreas Gruenbacher
d4da31986c Revert "GFS2: Wait for iopen glock dequeues"
Revert commit 86d067a797: it turns out
that waiting for iopen glock dequeues here isn't needed anymore because
the bugs that commit was meant to fix have been fixed otherwise.

In addition, we want to avoid waiting on glocks in gfs2_evict_inode in
shrinker context because the shrinker may be invoked on behalf of DLM,
in which case calling into DLM again would deadlock.  This commit makes
the described scenario less likely without completely avoiding it; it's
still a step in the right direction, though.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-04-03 09:14:47 -04:00
Andreas Gruenbacher
0a52aba7c2 gfs2: Switch to rhashtable_lookup_get_insert_fast
Switch from rhashtable_lookup_insert_fast + rhashtable_lookup_fast to
rhashtable_lookup_get_insert_fast, which is cleaner and avoids an extra
rhashtable lookup.

At the same time, turn the retry loop in gfs2_glock_get into an infinite
loop.  The lookup or insert will eventually succeed, usually very fast,
but there is no reason to give up trying at a fixed number of
iterations.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-04-03 09:14:41 -04:00
David Howells
3209f68b3c statx: Include a mask for stx_attributes in struct statx
Include a mask in struct stat to indicate which bits of stx_attributes the
filesystem actually supports.

This would also be useful if we add another system call that allows you to
do a 'bulk attribute set' and pass in a statx struct with the masks
appropriately set to say what you want to set.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-03 01:06:00 -04:00
David Howells
47071aee6a statx: Reserve the top bit of the mask for future struct expansion
Reserve the top bit of the mask for future expansion of the statx struct
and give an error if statx() sees it set.  All the other bits are ignored
if we see them set but don't support the bit; we just clear the bit in the
returned mask.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-03 01:05:59 -04:00
Darrick J. Wong
5f955f26f3 xfs: report crtime and attribute flags to statx
statx has the ability to report inode creation times and inode flags, so
hook up di_crtime and di_flags to that functionality.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-03 01:05:59 -04:00
David Howells
99652ea56a ext4: Add statx support
Return enhanced file attributes from the Ext4 filesystem.  This includes
the following:

 (1) The inode creation time (i_crtime) as stx_btime, setting STATX_BTIME.

 (2) Certain FS_xxx_FL flags are mapped to stx_attribute flags.

This requires that all ext4 inodes have a getattr call, not just some of
them, so to this end, split the ext4_getattr() function and only call part
of it where appropriate.

Example output:

	[root@andromeda ~]# touch foo
	[root@andromeda ~]# chattr +ai foo
	[root@andromeda ~]# /tmp/test-statx foo
	statx(foo) = 0
	results=fff
	  Size: 0               Blocks: 0          IO Block: 4096    regular file
	Device: 08:12           Inode: 2101950     Links: 1
	Access: (0644/-rw-r--r--)  Uid:     0   Gid:     0
	Access: 2016-02-11 17:08:29.031795451+0000
	Modify: 2016-02-11 17:08:29.031795451+0000
	Change: 2016-02-11 17:11:11.987790114+0000
	 Birth: 2016-02-11 17:08:29.031795451+0000
	Attributes: 0000000000000030 (-------- -------- -------- -------- -------- -------- -------- --ai----)

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-03 01:05:58 -04:00
Eric Biggers
64bd72048a statx: optimize copy of struct statx to userspace
I found that statx() was significantly slower than stat().  As a
microbenchmark, I compared 10,000,000 invocations of fstat() on a tmpfs
file to the same with statx() passed a NULL path:

	$ time ./stat_benchmark

	real	0m1.464s
	user	0m0.275s
	sys	0m1.187s

	$ time ./statx_benchmark

	real	0m5.530s
	user	0m0.281s
	sys	0m5.247s

statx is expected to be a little slower than stat because struct statx
is larger than struct stat, but not by *that* much.  It turns out that
most of the overhead was in copying struct statx to userspace, mostly in
all the stac/clac instructions that got generated for each __put_user()
call.  (This was on x86_64, but some other architectures, e.g. arm64,
have something similar now too.)

stat() instead initializes its struct on the stack and copies it to
userspace with a single call to copy_to_user().  This turns out to be
much faster, and changing statx to do this makes it almost as fast as
stat:

	$ time ./statx_benchmark

	real	0m1.624s
	user	0m0.270s
	sys	0m1.354s

For zeroing the reserved fields, start by zeroing the full struct with
memset.  This makes it clear that every byte copied to userspace is
initialized, even implicit padding bytes (though there are none
currently).  In the scenarios I tested, it also performed the same as a
designated initializer.  Manually initializing each field was still
slightly faster, but would have been more error-prone and less
verifiable.

Also rename statx_set_result() to cp_statx() for consistency with
cp_old_stat() et al., and make it noinline so that struct statx doesn't
add to the stack usage during the main portion of the syscall execution.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-03 01:05:57 -04:00
Eric Biggers
b15fb70b82 statx: remove incorrect part of vfs_statx() comment
request_mask and query_flags are function arguments, not passed in
struct kstat.  So remove the part of the comment which claims otherwise.
This was apparently left over from an earlier version of the statx
patch.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-03 01:05:57 -04:00
Eric Biggers
8c7493aa3e statx: reject unknown flags when using NULL path
The statx() system call currently accepts unknown flags when called with
a NULL path to operate on a file descriptor.  Left unchanged, this could
make it hard to introduce new query flags in the future, since
applications may not be able to tell whether a given flag is supported.

Fix this by failing the system call with EINVAL if any flags other than
KSTAT_QUERY_FLAGS are specified in combination with a NULL path.

Arguably, we could still permit known lookup-related flags such as
AT_SYMLINK_NOFOLLOW.  However, that would be inconsistent with how
sys_utimensat() behaves when passed a NULL path, which seems to be the
closest precedent.  And given that the NULL path case is (I believe)
mainly intended to be used to implement a wrapper function like fstatx()
that doesn't have a path argument, I think rejecting lookup-related
flags too is probably the best choice.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-03 01:05:56 -04:00
Linus Torvalds
978e0f92cd Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "11 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  kasan: do not sanitize kexec purgatory
  drivers/rapidio/devices/tsi721.c: make module parameter variable name unique
  mm/hugetlb.c: don't call region_abort if region_chg fails
  kasan: report only the first error by default
  hugetlbfs: initialize shared policy as part of inode allocation
  mm: fix section name for .data..ro_after_init
  mm, hugetlb: use pte_present() instead of pmd_present() in follow_huge_pmd()
  mm: workingset: fix premature shadow node shrinking with cgroups
  mm: rmap: fix huge file mmap accounting in the memcg stats
  mm: move mm_percpu_wq initialization earlier
  mm: migrate: fix remove_migration_pte() for ksm pages
2017-04-01 19:45:05 -07:00
Linus Torvalds
09c8b3d1d6 Merge tag 'nfsd-4.11-1' of git://linux-nfs.org/~bfields/linux
Pull nfsd fixes from Bruce Fields:
 "The restriction of NFSv4 to TCP went overboard and also broke the
  backchannel; fix.

  Also some minor refinements to the nfsd version-setting interface that
  we'd like to get fixed before release"

* tag 'nfsd-4.11-1' of git://linux-nfs.org/~bfields/linux:
  svcrdma: set XPT_CONG_CTRL flag for bc xprt
  NFSD: fix nfsd_reset_versions for NFSv4.
  NFSD: fix nfsd_minorversion(.., NFSD_AVAIL)
  NFSD: further refinement of content of /proc/fs/nfsd/versions
  nfsd: map the ENOKEY to nfserr_perm for avoiding warning
  SUNRPC/backchanel: set XPT_CONG_CTRL flag for bc xprt
2017-04-01 10:43:37 -07:00
Linus Torvalds
fe8e12b503 Merge branch 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "We have three small fixes queued up in my for-linus-4.11 branch"

* 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix an integer overflow check
  btrfs: Change qgroup_meta_rsv to 64bit
  Btrfs: bring back repair during read
2017-03-31 17:58:48 -07:00
Mike Kravetz
4742a35d9d hugetlbfs: initialize shared policy as part of inode allocation
Any time after inode allocation, destroy_inode can be called.  The
hugetlbfs inode contains a shared_policy structure, and
mpol_free_shared_policy is unconditionally called as part of
hugetlbfs_destroy_inode.  Initialize the policy as part of inode
allocation so that any quick (error path) calls to destroy_inode will be
handed an initialized policy.

syzkaller fuzzer found this bug, that resulted in the following:

    BUG: KASAN: user-memory-access in atomic_inc
    include/asm-generic/atomic-instrumented.h:87 [inline] at addr
    000000131730bd7a
    BUG: KASAN: user-memory-access in __lock_acquire+0x21a/0x3a80
    kernel/locking/lockdep.c:3239 at addr 000000131730bd7a
    Write of size 4 by task syz-executor6/14086
    CPU: 3 PID: 14086 Comm: syz-executor6 Not tainted 4.11.0-rc3+ #364
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    Call Trace:
     atomic_inc include/asm-generic/atomic-instrumented.h:87 [inline]
     __lock_acquire+0x21a/0x3a80 kernel/locking/lockdep.c:3239
     lock_acquire+0x1ee/0x590 kernel/locking/lockdep.c:3762
     __raw_write_lock include/linux/rwlock_api_smp.h:210 [inline]
     _raw_write_lock+0x33/0x50 kernel/locking/spinlock.c:295
     mpol_free_shared_policy+0x43/0xb0 mm/mempolicy.c:2536
     hugetlbfs_destroy_inode+0xca/0x120 fs/hugetlbfs/inode.c:952
     alloc_inode+0x10d/0x180 fs/inode.c:216
     new_inode_pseudo+0x69/0x190 fs/inode.c:889
     new_inode+0x1c/0x40 fs/inode.c:918
     hugetlbfs_get_inode+0x40/0x420 fs/hugetlbfs/inode.c:734
     hugetlb_file_setup+0x329/0x9f0 fs/hugetlbfs/inode.c:1282
     newseg+0x422/0xd30 ipc/shm.c:575
     ipcget_new ipc/util.c:285 [inline]
     ipcget+0x21e/0x580 ipc/util.c:639
     SYSC_shmget ipc/shm.c:673 [inline]
     SyS_shmget+0x158/0x230 ipc/shm.c:657
     entry_SYSCALL_64_fastpath+0x1f/0xc2

Analysis provided by Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

Link: http://lkml.kernel.org/r/1490477850-7944-1-git-send-email-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-31 17:13:30 -07:00
Linus Torvalds
f9799ad21b Merge tag 'nfs-for-4.11-3' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client fixes from Anna Schumaker:
 "Here are a few more bugfixes that came in over the last couple of
  weeks. Most of these fix various hangs and loops that people found,
  but we also had a few error handling fixes.

  Stable Bugfixes:
   - fix infinite loop on BAD_STATEID error

  Other Bugfixes:
   - fix old dentry rehash after move
   - fix pnfs GETDEVINFO hangs
   - fix pnfs fallback to MDS on commit errors
   - fix flexfiles kernel oops"

* tag 'nfs-for-4.11-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  nfs: flexfiles: fix kernel OOPS if MDS returns unsupported DS type
  NFSv4.1 fix infinite loop on IO BAD_STATEID error
  PNFS fix fallback to MDS if got error on commit to DS
  NFS filelayout:call GETDEVICEINFO after pnfs_layout_process completes
  NFS store nfs4_deviceid in struct nfs4_filelayout_segment
  NFS cleanup struct nfs4_filelayout_segment
  NFS: Fix old dentry rehash after move
2017-03-31 12:29:03 -07:00
Tigran Mkrtchyan
f17f8a14e8 nfs: flexfiles: fix kernel OOPS if MDS returns unsupported DS type
this fix aims to fix dereferencing of a mirror in an error state when MDS
returns unsupported DS type (IOW, not v3), which causes the following oops:

[  220.370709] BUG: unable to handle kernel NULL pointer dereference at 0000000000000065
[  220.370842] IP: ff_layout_mirror_valid+0x2d/0x110 [nfs_layout_flexfiles]
[  220.370920] PGD 0

[  220.370972] Oops: 0000 [#1] SMP
[  220.371013] Modules linked in: nfnetlink_queue nfnetlink_log bluetooth nfs_layout_flexfiles rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_raw ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security iptable_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_security ebtable_filter ebtables ip6table_filter ip6_tables binfmt_misc intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel btrfs kvm arc4 snd_hda_codec_hdmi iwldvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate mac80211 xor uvcvideo
[  220.371814]  videobuf2_vmalloc videobuf2_memops snd_hda_codec_idt mei_wdt videobuf2_v4l2 snd_hda_codec_generic iTCO_wdt ppdev videobuf2_core iTCO_vendor_support dell_rbtn dell_wmi iwlwifi sparse_keymap dell_laptop dell_smbios snd_hda_intel dcdbas videodev snd_hda_codec dell_smm_hwmon snd_hda_core media cfg80211 intel_uncore snd_hwdep raid6_pq snd_seq intel_rapl_perf snd_seq_device joydev i2c_i801 rfkill lpc_ich snd_pcm parport_pc mei_me parport snd_timer dell_smo8800 mei snd shpchp soundcore tpm_tis tpm_tis_core tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc i915 nouveau mxm_wmi ttm i2c_algo_bit drm_kms_helper crc32c_intel e1000e drm sdhci_pci firewire_ohci sdhci serio_raw mmc_core firewire_core ptp crc_itu_t pps_core wmi fjes video
[  220.372568] CPU: 7 PID: 4988 Comm: cat Not tainted 4.10.5-200.fc25.x86_64 #1
[  220.372647] Hardware name: Dell Inc. Latitude E6520/0J4TFW, BIOS A06 07/11/2011
[  220.372729] task: ffff94791f6ea580 task.stack: ffffb72b88c0c000
[  220.372802] RIP: 0010:ff_layout_mirror_valid+0x2d/0x110 [nfs_layout_flexfiles]
[  220.372883] RSP: 0018:ffffb72b88c0f970 EFLAGS: 00010246
[  220.372945] RAX: 0000000000000000 RBX: ffff9479015ca600 RCX: ffffffffffffffed
[  220.373025] RDX: ffffffffffffffed RSI: ffff9479753dc980 RDI: 0000000000000000
[  220.373104] RBP: ffffb72b88c0f988 R08: 000000000001c980 R09: ffffffffc0ea6112
[  220.373184] R10: ffffef17477d9640 R11: ffff9479753dd6c0 R12: ffff9479211c7440
[  220.373264] R13: ffff9478f45b7790 R14: 0000000000000001 R15: ffff9479015ca600
[  220.373345] FS:  00007f555fa3e700(0000) GS:ffff9479753c0000(0000) knlGS:0000000000000000
[  220.373435] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  220.373506] CR2: 0000000000000065 CR3: 0000000196044000 CR4: 00000000000406e0
[  220.373586] Call Trace:
[  220.373627]  nfs4_ff_layout_prepare_ds+0x5e/0x200 [nfs_layout_flexfiles]
[  220.373708]  ff_layout_pg_init_read+0x81/0x160 [nfs_layout_flexfiles]
[  220.373806]  __nfs_pageio_add_request+0x11f/0x4a0 [nfs]
[  220.373886]  ? nfs_create_request.part.14+0x37/0x330 [nfs]
[  220.373967]  nfs_pageio_add_request+0xb2/0x260 [nfs]
[  220.374042]  readpage_async_filler+0xaf/0x280 [nfs]
[  220.374103]  read_cache_pages+0xef/0x1b0
[  220.374166]  ? nfs_read_completion+0x210/0x210 [nfs]
[  220.374239]  nfs_readpages+0x129/0x200 [nfs]
[  220.374293]  __do_page_cache_readahead+0x1d0/0x2f0
[  220.374352]  ondemand_readahead+0x17d/0x2a0
[  220.374403]  page_cache_sync_readahead+0x2e/0x50
[  220.374460]  generic_file_read_iter+0x6c8/0x950
[  220.374532]  ? nfs_mapping_need_revalidate_inode+0x17/0x40 [nfs]
[  220.374617]  nfs_file_read+0x6e/0xc0 [nfs]
[  220.374670]  __vfs_read+0xe2/0x150
[  220.374715]  vfs_read+0x96/0x130
[  220.374758]  SyS_read+0x55/0xc0
[  220.374801]  entry_SYSCALL_64_fastpath+0x1a/0xa9
[  220.374856] RIP: 0033:0x7f555f570bd0
[  220.374900] RSP: 002b:00007ffeb73e1b38 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[  220.374986] RAX: ffffffffffffffda RBX: 00007f555f839ae0 RCX: 00007f555f570bd0
[  220.375066] RDX: 0000000000020000 RSI: 00007f555fa41000 RDI: 0000000000000003
[  220.375145] RBP: 0000000000021010 R08: ffffffffffffffff R09: 0000000000000000
[  220.375226] R10: 00007f555fa40010 R11: 0000000000000246 R12: 0000000000022000
[  220.375305] R13: 0000000000021010 R14: 0000000000001000 R15: 0000000000002710
[  220.375386] Code: 66 66 90 55 48 89 e5 41 54 53 49 89 fc 48 83 ec 08 48 85 f6 74 2e 48 8b 4e 30 48 89 f3 48 81 f9 00 f0 ff ff 77 1e 48 85 c9 74 15 <48> 83 79 78 00 b8 01 00 00 00 74 2c 48 83 c4 08 5b 41 5c 5d c3
[  220.375653] RIP: ff_layout_mirror_valid+0x2d/0x110 [nfs_layout_flexfiles] RSP: ffffb72b88c0f970
[  220.375748] CR2: 0000000000000065
[  220.403538] ---[ end trace bcdca752211b7da9 ]---

Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-03-31 13:30:49 -04:00
Olga Kornievskaia
0e3d3e5df0 NFSv4.1 fix infinite loop on IO BAD_STATEID error
Commit 63d63cbf5e "NFSv4.1: Don't recheck delegations that
have already been checked" introduced a regression where when a
client received BAD_STATEID error it would not send any TEST_STATEID
and instead go into an infinite loop of resending the IO that caused
the BAD_STATEID.

Fixes: 63d63cbf5e ("NFSv4.1: Don't recheck delegations that have already been checked")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Cc: stable@vger.kernel.org # 4.9+
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-03-31 13:30:21 -04:00
Olga Kornievskaia
fabbbee0eb PNFS fix fallback to MDS if got error on commit to DS
Upong receiving some errors (EACCES) on commit to the DS the code
doesn't fallback to MDS and intead retrieds to the same DS again.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-03-30 13:27:20 -04:00
Felix Fietkau
c3d9fda688 ubifs: Fix RENAME_WHITEOUT support
Remove faulty leftover check in do_rename(), apparently introduced in a
merge that combined whiteout support changes with commit f03b8ad8d3
("fs: support RENAME_NOREPLACE for local filesystems")

Fixes: f03b8ad8d3 ("fs: support RENAME_NOREPLACE for local
filesystems")
Fixes: 9e0a1fff8d ("ubifs: Implement RENAME_WHITEOUT")
Cc: stable@vger.kernel.org
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Richard Weinberger <richard@nod.at>
2017-03-30 09:28:02 +02:00
Hyunchul Lee
33fda9fa9f ubifs: Fix debug messages for an invalid filename in ubifs_dump_inode
instead of filenames, print inode numbers, file types, and length.

Signed-off-by: Hyunchul Lee <cheol.lee@lge.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2017-03-30 09:27:53 +02:00
Hyunchul Lee
e328379a18 ubifs: Fix debug messages for an invalid filename in ubifs_dump_node
if a character is not printable, print '?' instead of that.

Signed-off-by: Hyunchul Lee <cheol.lee@lge.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2017-03-30 09:27:47 +02:00