Commit Graph

50492 Commits

Author SHA1 Message Date
Christoph Hellwig
5e30c23d13 xfs: don't fail xfs_extent_busy allocation
We don't just need the structure to track busy extents which can be
avoided with a synchronous transaction, but also to keep track of
pending discard.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-02-09 10:50:25 -08:00
Bill O'Donnell
b20fe4730e xfs: correct null checks and error processing in xfs_initialize_perag
If pag cannot be allocated, the current error exit path will trip
a null pointer deference error when calling xfs_buf_hash_destroy
with a null pag.  Fix this by adding a new error exit labels and
jumping to those accordingly, avoiding the hash destroy and
unnecessary kmem_free on pag.

Up to three things need to be properly unwound:

1) pag memory allocation
2) xfs_buf_hash_init
3) radix_tree_insert

For any given iteration through the loop, any of the above which
succeed must be unwound for /this/ pag, and then all prior
initialized pags must be unwound.

Addresses-Coverity-Id: 1397628 ("Dereference after null check")

Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-02-09 10:50:25 -08:00
Christoph Hellwig
c5ecb42342 xfs: update ctime and mtime on clone destinatation inodes
We're changing both metadata and data, so we need to update the
timestamps for clone operations.  Dedupe on the other hand does
not change file data, and only changes invisible metadata so the
timestamps should not be updated.

This follows existing btrfs behavior.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: remove redundant is_dedupe test]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-02-09 10:50:24 -08:00
Fabian Frederick
684666e515 jfs: atomically read inode size
See i_size_read() comments in include/linux/fs.h

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2017-02-09 11:57:22 -06:00
Kinglong Mee
6c71100db5 fanotify: simplify the code of fanotify_merge
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-02-09 14:09:22 +01:00
Trond Myklebust
a974deee47 NFSv4: Fix memory and state leak in _nfs4_open_and_get_state
If we exit because the file access check failed, we currently
leak the struct nfs4_state. We need to attach it to the
open context before returning.

Fixes: 3efb972247 ("NFSv4: Refactor _nfs4_open_and_get_state..")
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-02-08 17:02:47 -05:00
Kinglong Mee
863d7d9c2e sunrpc/nfs: cleanup procfs/pipefs entry in cache_detail
Record flush/channel/content entries is useless, remove them.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-02-08 17:02:45 -05:00
Nicholas Piggin
600424e3d9 nfs: no PG_private waiters remain, remove waker
Since commit 4f52b6bb ("NFS: Don't call COMMIT in ->releasepage()"),
no tasks wait on PagePrivate, so the wake introduced in commit 95905446
("NFS: avoid deadlocks with loop-back mounted NFS filesystems.") can
be removed.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-02-08 17:02:44 -05:00
Benjamin Coddington
920b4530fb NFS: nfs_rename() handle -ERESTARTSYS dentry left behind
An interrupted rename will leave the old dentry behind if the rename
succeeds.  Fix this by moving the final local work of the rename to
rpc_call_done so that the results of the RENAME can always be handled,
even if the original process has already returned with -ERESTARTSYS.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-02-08 17:02:44 -05:00
Christoph Hellwig
168316db35 dax: assert that i_rwsem is held exclusive for writes
Make sure all callers follow the same locking protocol, given that DAX
transparantly replaced the normal buffered I/O path.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
2017-02-08 14:43:13 -05:00
Christoph Hellwig
ff5462e39c ext4: fix DAX write locking
Unlike O_DIRECT DAX is not an optional opt-in feature selected by the
application, so we'll have to provide the traditional synchronіzation
of overlapping writes as we do for buffered writes.

This was broken historically for DAX, but got fixed for ext2 and XFS
as part of the iomap conversion.  Fix up ext4 as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
2017-02-08 14:39:27 -05:00
Jeff Mahoney
2a36224918 btrfs: fix btrfs_compat_ioctl failures on non-compat ioctls
Commit 4c63c2454e incorrectly assumed that returning -ENOIOCTLCMD would
cause the native ioctl to be called.  The ->compat_ioctl callback is
expected to handle all ioctls, not just compat variants.  As a result,
when using 32-bit userspace on 64-bit kernels, everything except those
three ioctls would return -ENOTTY.

Fixes: 4c63c2454e ("btrfs: bugfix: handle FS_IOC32_{GETFLAGS,SETFLAGS,GETVERSION} in btrfs_ioctl")
Cc: stable@vger.kernel.org
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-08 17:47:30 +01:00
Eric Biggers
6f69f0ed61 fscrypt: constify struct fscrypt_operations
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Richard Weinberger <richard@nod.at>
2017-02-08 10:59:57 -05:00
David S. Miller
3efa70d78f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
The conflict was an interaction between a bug fix in the
netvsc driver in 'net' and an optimization of the RX path
in 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07 16:29:30 -05:00
Hugh Dickins
b6789123bc mm: fix KPF_SWAPCACHE in /proc/kpageflags
Commit 6326fec112 ("mm: Use owner_priv bit for PageSwapCache, valid
when PageSwapBacked") aliased PG_swapcache to PG_owner_priv_1 (and
depending on PageSwapBacked being true).

As a result, the KPF_SWAPCACHE bit in '/proc/kpageflags' should now be
synthesized, instead of being shown on unrelated pages which just happen
to have PG_owner_priv_1 set.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-07 12:08:32 -08:00
Konstantin Khlebnikov
51f8f3c4e2 ovl: drop CAP_SYS_RESOURCE from saved mounter's credentials
If overlay was mounted by root then quota set for upper layer does not work
because overlay now always use mounter's credentials for operations.
Also overlay might deplete reserved space and inodes in ext4.

This patch drops capability SYS_RESOURCE from saved credentials.
This affects creation new files, whiteouts, and copy-up operations.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Fixes: 1175b6b8d9 ("ovl: do operations on underlying file system in mounter's context")
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-02-07 15:47:14 +01:00
Amir Goldstein
e593b2bf51 ovl: properly implement sync_filesystem()
overlayfs syncs all inode pages on sync_filesystem(), but it also
needs to call s_op->sync_fs() of upper fs for metadata sync.

This fixes correctness of syncfs(2) as demonstrated by following
xfs specific test:

xfs_sync_stats()
{
	echo $1
	echo -n "xfs_log_force = "
	grep log /proc/fs/xfs/stat  | awk '{ print $5 }'
}

xfs_sync_stats "before touch"
touch x
xfs_sync_stats "after touch"
xfs_io -c syncfs .
xfs_sync_stats "after syncfs"
xfs_io -c fsync x
xfs_sync_stats "after fsync"
xfs_io -c fsync x
xfs_sync_stats "after fsync #2"

When this test is run in overlay mount over xfs, log force
count does not increase with syncfs command.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-02-07 15:47:14 +01:00
Amir Goldstein
01ad3eb8a0 ovl: concurrent copy up of regular files
Now that copy up of regular file is done using O_TMPFILE,
we don't need to hold rename_lock throughout copy up.

Use the copy up waitqueue to synchronize concurrent copy up
of the same file. Different regular files can be copied up
concurrently.

The upper dir inode_lock is taken instead of rename_lock,
because it is needed for lookup and later for linking the
temp file, but it is released while copying up data.

Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-02-07 15:47:14 +01:00
Amir Goldstein
39d3d60a54 ovl: introduce copy up waitqueue
The overlay sb 'copyup_wq' and overlay inode 'copying' condition
variable are about to replace the upper sb rename_lock, as finer
grained synchronization objects for concurrent copy up.

Suggested-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-02-07 15:47:14 +01:00
Amir Goldstein
d8514d8edb ovl: copy up regular file using O_TMPFILE
In preparation for concurrent copy up, implement copy up
of regular file as O_TMPFILE that is linked to upperdir
instead of a file in workdir that is moved to upperdir.

Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-02-07 15:47:14 +01:00
Amir Goldstein
42f269b925 ovl: rearrange code in ovl_copy_up_locked()
As preparation to implementing copy up with O_TMPFILE,
name the variable for dentry before final rename 'temp' and
assign it to 'newdentry' only after rename.

Also lookup upper dentry before looking up temp dentry and
move ovl_set_timestamps() into ovl_copy_up_locked(), because
that is going to be more convenient for upcoming change.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-02-07 15:47:14 +01:00
Amir Goldstein
e7f52429b4 ovl: check if upperdir fs supports O_TMPFILE
This is needed for choosing between concurrent copyup
using O_TMPFILE and legacy copyup using workdir+rename.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-02-07 15:47:14 +01:00
Amir Goldstein
bfe219d373 vfs: wrap write f_ops with file_{start,end}_write()
Before calling write f_ops, call file_start_write() instead
of sb_start_write().

Replace {sb,file}_start_write() for {copy,clone}_file_range() and
for fallocate().

Beyond correct semantics, this avoids freeze protection to sb when
operating on special inodes, such as fallocate() on a blockdev.

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-02-07 15:05:04 +01:00
Amir Goldstein
11cbfb1077 vfs: deny copy_file_range() for non regular files
There is no in-tree file system that implements copy_file_range()
for non regular files.

Deny an attempt to copy_file_range() a directory with EISDIR
and any other non regualr file with EINVAL to conform with
behavior of vfs_{clone,dedup}_file_range().

This change is needed prior to converting sb_start_write()
to  file_start_write() in the vfs helper.

Cc: linux-api@vger.kernel.org
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-02-07 15:05:04 +01:00
Amir Goldstein
9e79b13263 vfs: deny fallocate() on directory
There was an obscure use case of fallocate of directory inode
in the vfs helper with the comment:
"Let individual file system decide if it supports preallocation
 for directories or not."

But there is no in-tree file system that implements fallocate
for directory operations.

Deny an attempt to fallocate a directory with EISDIR error.

This change is needed prior to converting sb_start_write()
to  file_start_write(), so freeze protection is correctly
handled for cases of fallocate file and blockdev.

Cc: linux-api@vger.kernel.org
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-02-07 15:05:04 +01:00
Amir Goldstein
af7bd4dc13 vfs: create vfs helper vfs_tmpfile()
Factor out some common vfs bits from do_tmpfile()
to be used by overlayfs for concurrent copy up.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-02-07 15:05:04 +01:00
Richard Weinberger
b14c8e6afd fscrypt: properly declare on-stack completion
When a completion is declared on-stack we have to use
COMPLETION_INITIALIZER_ONSTACK().

Fixes: 0b81d07790 ("fs crypto: move per-file encryption from f2fs
tree to fs/crypto")
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-02-06 23:45:28 -05:00
Eric Biggers
46f47e4800 fscrypt: split supp and notsupp declarations into their own headers
Previously, each filesystem configured without encryption support would
define all the public fscrypt functions to their notsupp_* stubs.  This
list of #defines had to be updated in every filesystem whenever a change
was made to the public fscrypt functions.  To make things more
maintainable now that we have three filesystems using fscrypt, split the
old header fscrypto.h into several new headers.  fscrypt_supp.h contains
the real declarations and is included by filesystems when configured
with encryption support, whereas fscrypt_notsupp.h contains the inline
stubs and is included by filesystems when configured without encryption
support.  fscrypt_common.h contains common declarations needed by both.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-02-06 23:26:43 -05:00
Colin Ian King
02680b31a0 fscrypt: remove redundant assignment of res
res is assigned to sizeof(ctx), however, this is unused and res
is updated later on without that assigned value to res ever being
used.  Remove this redundant assignment.

Fixes CoverityScan CID#1395546 "Unused value"

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-02-06 23:25:53 -05:00
Christoph Hellwig
3c68d44a2b xfs: allocate direct I/O COW blocks in iomap_begin
Instead of preallocating all the required COW blocks in the high-level
write code do it inside the iomap code, like we do for all other I/O.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-02-06 17:47:47 -08:00
Christoph Hellwig
a14234c72b xfs: go straight to real allocations for direct I/O COW writes
When we allocate COW fork blocks for direct I/O writes we currently first
create a delayed allocation, and then convert it to a real allocation
once we've got the delayed one.

As there is no good reason for that this patch instead makes use call
xfs_bmapi_write from the COW allocation path.  The only interesting bits
are a few tweaks the low-level allocator to allow for this, most notably
the need to remove the call to xfs_bmap_extsize_align for the cowextsize
in xfs_bmap_btalloc - for the existing convert case it's a no-op, but
for the direct allocation case it would blow up our block reservation
way beyond what we reserved for the transaction.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-02-06 17:47:47 -08:00
Christoph Hellwig
dcf9585a75 xfs: return the converted extent in __xfs_reflink_convert_cow
We'll need it for the direct I/O code.  Also rename the function to
xfs_reflink_convert_cow_extent to describe it a bit better.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-02-06 17:47:46 -08:00
Christoph Hellwig
f13eb2055a xfs: introduce xfs_aligned_fsb_count
Factor a helper to calculate the extent-size aligned block out of the
iomap code, so that it can be reused by the upcoming reflink dio code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-02-06 17:47:46 -08:00
Christoph Hellwig
54a4ef8af4 xfs: reject all unaligned direct writes to reflinked files
We currently fall back from direct to buffered writes if we detect a
remaining shared extent in the iomap_begin callback.  But by the time
iomap_begin is called for the potentially unaligned end block we might
have already written most of the data to disk, which we'd now write
again using buffered I/O.  To avoid this reject all writes to reflinked
files before starting I/O so that we are guaranteed to only write the
data once.

The alternative would be to unshare the unaligned start and/or end block
before doing the I/O. I think that's doable, and will actually be
required to support reflinks on DAX file system.  But it will take a
little more time and I'd rather get rid of the double write ASAP.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-02-06 17:47:46 -08:00
NeilBrown
b880092109 NFSDv4: use export cache flushtime for changeid on V4ROOT objects.
If you change the set of filesystems that are exported, then
the contents of various directories in the NFSv4 pseudo-root
is likely to change.  However the change-id of those
directories is currently tied to the underlying directory,
so the client may not see the changes in a timely fashion.

This patch changes the change-id number to be derived from the
"flush_time" of the export cache.  Whenever any changes are
made to the set of exported filesystems, this flush_time is
updated.  The result is that clients see changes to the set
of exported filesystems much more quickly, often immediately.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-02-06 17:29:22 -05:00
Theodore Ts'o
783d948544 ext4: add EXT4_IOC_GOINGDOWN ioctl
This ioctl is modeled after the xfs's XFS_IOC_GOINGDOWN ioctl.  (In
fact, it uses the same code points.)

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-02-05 19:47:14 -05:00
Theodore Ts'o
0db1ff222d ext4: add shutdown bit and check for it
Add a shutdown bit that will cause ext4 processing to fail immediately
with EIO.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-02-05 01:28:48 -05:00
Theodore Ts'o
9549a168bd ext4: rename s_resize_flags to s_ext4_flags
We are currently using one bit in s_resize_flags; rename it in order
to allow more of the bits in that unsigned long for other purposes.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-02-05 01:27:48 -05:00
Theodore Ts'o
4753d8a24d ext4: return EROFS if device is r/o and journal replay is needed
If the file system requires journal recovery, and the device is
read-ony, return EROFS to the mount system call.  This allows xfstests
generic/050 to pass.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
2017-02-05 01:26:48 -05:00
Theodore Ts'o
97abd7d4b5 ext4: preserve the needs_recovery flag when the journal is aborted
If the journal is aborted, the needs_recovery feature flag should not
be removed.  Otherwise, it's the journal might not get replayed and
this could lead to more data getting lost.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
2017-02-04 23:38:06 -05:00
Theodore Ts'o
e112666b49 jbd2: don't leak modified metadata buffers on an aborted journal
If the journal has been aborted, we shouldn't mark the underlying
buffer head as dirty, since that will cause the metadata block to get
modified.  And if the journal has been aborted, we shouldn't allow
this since it will almost certainly lead to a corrupted file system.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
2017-02-04 23:14:19 -05:00
Theodore Ts'o
eb5efbcb76 ext4: fix inline data error paths
The write_end() function must always unlock the page and drop its ref
count, even on an error.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
2017-02-04 23:04:00 -05:00
Hou Tao
4dd2eb6335 xfs: reset b_first_retry_time when clear the retry status of xfs_buf_t
After successful IO or permanent error, b_first_retry_time also
needs to be cleared, else the invalid first retry time will be
used by the next retry check.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-02-03 14:39:07 -08:00
Michal Hocko
d1908f5255 fs: break out of iomap_file_buffered_write on fatal signals
Tetsuo has noticed that an OOM stress test which performs large write
requests can cause the full memory reserves depletion.  He has tracked
this down to the following path

	__alloc_pages_nodemask+0x436/0x4d0
	alloc_pages_current+0x97/0x1b0
	__page_cache_alloc+0x15d/0x1a0          mm/filemap.c:728
	pagecache_get_page+0x5a/0x2b0           mm/filemap.c:1331
	grab_cache_page_write_begin+0x23/0x40   mm/filemap.c:2773
	iomap_write_begin+0x50/0xd0             fs/iomap.c:118
	iomap_write_actor+0xb5/0x1a0            fs/iomap.c:190
	? iomap_write_end+0x80/0x80             fs/iomap.c:150
	iomap_apply+0xb3/0x130                  fs/iomap.c:79
	iomap_file_buffered_write+0x68/0xa0     fs/iomap.c:243
	? iomap_write_end+0x80/0x80
	xfs_file_buffered_aio_write+0x132/0x390 [xfs]
	? remove_wait_queue+0x59/0x60
	xfs_file_write_iter+0x90/0x130 [xfs]
	__vfs_write+0xe5/0x140
	vfs_write+0xc7/0x1f0
	? syscall_trace_enter+0x1d0/0x380
	SyS_write+0x58/0xc0
	do_syscall_64+0x6c/0x200
	entry_SYSCALL64_slow_path+0x25/0x25

the oom victim has access to all memory reserves to make a forward
progress to exit easier.  But iomap_file_buffered_write and other
callers of iomap_apply loop to complete the full request.  We need to
check for fatal signals and back off with a short write instead.

As the iomap_apply delegates all the work down to the actor we have to
hook into those.  All callers that work with the page cache are calling
iomap_write_begin so we will check for signals there.  dax_iomap_actor
has to handle the situation explicitly because it copies data to the
userspace directly.  Other callers like iomap_page_mkwrite work on a
single page or iomap_fiemap_actor do not allocate memory based on the
given len.

Fixes: 68a9f5e700 ("xfs: implement iomap based buffered write path")
Link: http://lkml.kernel.org/r/20170201092706.9966-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>	[4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-03 14:13:19 -08:00
Jan Kara
70823b9bf3 orangefs: Remove orangefs_backing_dev_info
It is not used anywhere.

CC: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-02-03 14:38:35 -05:00
Martin Brandenburg
31c829f364 orangefs: Support readahead_readcnt parameter.
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-02-03 14:38:28 -05:00
Dan Carpenter
eb82fbcf82 orangefs: silence harmless integer overflow warning
The issue here is that in orangefs_bufmap_alloc() we do:

	bufmap->buffer_index_array =
		kzalloc(DIV_ROUND_UP(bufmap->desc_count, BITS_PER_LONG), GFP_KERNEL);

If we choose a bufmap->desc_count like -31 then it means the
DIV_ROUND_UP ends up having an integer overflow.   The result is that
kzalloc() returns the ZERO_SIZE_PTR and there is a static checker
warning.

But this bug is harmless because on the next lines we use ->desc_count
to do a kcalloc().  That has integer overflow checking built in so the
kcalloc() fails and we return an error code.

Anyway, it doesn't make sense to talk about negative sizes and blocking
them silences the static checker warning.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-02-03 14:37:15 -05:00
Fabian Frederick
a074faad51 udf: simplify udf_ioctl()
"out" label was only returning error code.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-02-03 16:24:18 +01:00
Fabian Frederick
782deb2eec udf: fix ioctl errors
Currently, lsattr for instance in udf directory gives
"udf: Invalid argument While reading flags on ..."

This patch returns -ENOIOCTLCMD
when command is unknown to have more accurate message like this:
"Inappropriate ioctl for device While reading flags on ..."

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-02-03 16:19:54 +01:00
Andrew Price
c548a1c175 gfs2: Make gfs2_write_full_page static
It only gets called from aops.c and doesn't appear in any headers.

Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-02-03 08:23:47 -05:00