Commit Graph

65123 Commits

Author SHA1 Message Date
Linus Torvalds
26c20ffcb5 Merge tag 'afs-fixes-20200616' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull AFS fixes from David Howells:
 "I've managed to get xfstests kind of working with afs. Here are a set
  of patches that fix most of the bugs found.

  There are a number of primary issues:

   - Incorrect handling of mtime and non-handling of ctime. It might be
     argued, that the latter isn't a bug since the AFS protocol doesn't
     support ctime, but I should probably still update it locally.

   - Shared-write mmap, truncate and writeback bugs. This includes not
     changing i_size under the callback lock, overwriting local i_size
     with the reply from the server after a partial writeback, not
     limiting the writeback from an mmapped page to EOF.

   - Checks for an abort code indicating that the primary vnode in an
     operation was deleted by a third-party are done in the wrong place.

   - Silly rename bugs. This includes an incomplete conversion to the
     new operation handling, duplicate nlink handling, nlink changing
     not being done inside the callback lock and insufficient handling
     of third-party conflicting directory changes.

  And some secondary ones:

   - The UAEOVERFLOW abort code should map to EOVERFLOW not EREMOTEIO.

   - Remove a couple of unused or incompletely used bits.

   - Remove a couple of redundant success checks.

  These seem to fix all the data-corruption bugs found by

	./check -afs -g quick

  along with the obvious silly rename bugs and time bugs.

  There are still some test failures, but they seem to fall into two
  classes: firstly, the authentication/security model is different to
  the standard UNIX model and permission is arbitrated by the server and
  cached locally; and secondly, there are a number of features that AFS
  does not support (such as mknod). But in these cases, the tests
  themselves need to be adapted or skipped.

  Using the in-kernel afs client with xfstests also found a bug in the
  AuriStor AFS server that has been fixed for a future release"

* tag 'afs-fixes-20200616' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  afs: Fix silly rename
  afs: afs_vnode_commit_status() doesn't need to check the RPC error
  afs: Fix use of afs_check_for_remote_deletion()
  afs: Remove afs_operation::abort_code
  afs: Fix yfs_fs_fetch_status() to honour vnode selector
  afs: Remove yfs_fs_fetch_file_status() as it's not used
  afs: Fix the mapping of the UAEOVERFLOW abort code
  afs: Fix truncation issues and mmap writeback size
  afs: Concoct ctimes
  afs: Fix EOF corruption
  afs: afs_write_end() should change i_size under the right lock
  afs: Fix non-setting of mtime when writing into mmap
2020-06-16 17:40:51 -07:00
Linus Torvalds
ffbc93768e Merge tag 'flex-array-conversions-5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux
Pull flexible-array member conversions from Gustavo A. R. Silva:
 "Replace zero-length arrays with flexible-array members.

  Notice that all of these patches have been baking in linux-next for
  two development cycles now.

  There is a regular need in the kernel to provide a way to declare
  having a dynamically sized set of trailing elements in a structure.
  Kernel code should always use “flexible array members”[1] for these
  cases. The older style of one-element or zero-length arrays should no
  longer be used[2].

  C99 introduced “flexible array members”, which lacks a numeric size
  for the array declaration entirely:

        struct something {
                size_t count;
                struct foo items[];
        };

  This is the way the kernel expects dynamically sized trailing elements
  to be declared. It allows the compiler to generate errors when the
  flexible array does not occur last in the structure, which helps to
  prevent some kind of undefined behavior[3] bugs from being
  inadvertently introduced to the codebase.

  It also allows the compiler to correctly analyze array sizes (via
  sizeof(), CONFIG_FORTIFY_SOURCE, and CONFIG_UBSAN_BOUNDS). For
  instance, there is no mechanism that warns us that the following
  application of the sizeof() operator to a zero-length array always
  results in zero:

        struct something {
                size_t count;
                struct foo items[0];
        };

        struct something *instance;

        instance = kmalloc(struct_size(instance, items, count), GFP_KERNEL);
        instance->count = count;

        size = sizeof(instance->items) * instance->count;
        memcpy(instance->items, source, size);

  At the last line of code above, size turns out to be zero, when one
  might have thought it represents the total size in bytes of the
  dynamic memory recently allocated for the trailing array items. Here
  are a couple examples of this issue[4][5].

  Instead, flexible array members have incomplete type, and so the
  sizeof() operator may not be applied[6], so any misuse of such
  operators will be immediately noticed at build time.

  The cleanest and least error-prone way to implement this is through
  the use of a flexible array member:

        struct something {
                size_t count;
                struct foo items[];
        };

        struct something *instance;

        instance = kmalloc(struct_size(instance, items, count), GFP_KERNEL);
        instance->count = count;

        size = sizeof(instance->items[0]) * instance->count;
        memcpy(instance->items, source, size);

  instead"

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")
[4] commit f2cd32a443 ("rndis_wlan: Remove logically dead code")
[5] commit ab91c2a89f ("tpm: eventlog: Replace zero-length array with flexible-array member")
[6] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html

* tag 'flex-array-conversions-5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: (41 commits)
  w1: Replace zero-length array with flexible-array
  tracing/probe: Replace zero-length array with flexible-array
  soc: ti: Replace zero-length array with flexible-array
  tifm: Replace zero-length array with flexible-array
  dmaengine: tegra-apb: Replace zero-length array with flexible-array
  stm class: Replace zero-length array with flexible-array
  Squashfs: Replace zero-length array with flexible-array
  ASoC: SOF: Replace zero-length array with flexible-array
  ima: Replace zero-length array with flexible-array
  sctp: Replace zero-length array with flexible-array
  phy: samsung: Replace zero-length array with flexible-array
  RxRPC: Replace zero-length array with flexible-array
  rapidio: Replace zero-length array with flexible-array
  media: pwc: Replace zero-length array with flexible-array
  firmware: pcdp: Replace zero-length array with flexible-array
  oprofile: Replace zero-length array with flexible-array
  block: Replace zero-length array with flexible-array
  tools/testing/nvdimm: Replace zero-length array with flexible-array
  libata: Replace zero-length array with flexible-array
  kprobes: Replace zero-length array with flexible-array
  ...
2020-06-16 17:23:57 -07:00
David Howells
b6489a49f7 afs: Fix silly rename
Fix AFS's silly rename by the following means:

 (1) Set the destination directory in afs_do_silly_rename() so as to avoid
     misbehaviour and indicate that the directory data version will
     increment by 1 so as to avoid warnings about unexpected changes in the
     DV.  Also indicate that the ctime should be updated to avoid xfstest
     grumbling.

 (2) Note when the server indicates that a directory changed more than we
     expected (AFS_OPERATION_DIR_CONFLICT), indicating a conflict with a
     third party change, checking on successful completion of unlink and
     rename.

     The problem is that the FS.RemoveFile RPC op doesn't report the status
     of the unlinked file, though YFS.RemoveFile2 does.  This can be
     mitigated by the assumption that if the directory DV cranked by
     exactly 1, we can be sure we removed one link from the file; further,
     ordinarily in AFS, files cannot be hardlinked across directories, so
     if we reduce nlink to 0, the file is deleted.

     However, if the directory DV jumps by more than 1, we cannot know if a
     third party intervened by adding or removing a link on the file we
     just removed a link from.

     The same also goes for any vnode that is at the destination of the
     FS.Rename RPC op.

 (3) Make afs_vnode_commit_status() apply the nlink drop inside the cb_lock
     section along with the other attribute updates if ->op_unlinked is set
     on the descriptor for the appropriate vnode.

 (4) Issue a follow up status fetch to the unlinked file in the event of a
     third party conflict that makes it impossible for us to know if we
     actually deleted the file or not.

 (5) Provide a flag, AFS_VNODE_SILLY_DELETED, to make afs_getattr() lie to
     the user about the nlink of a silly deleted file so that it appears as
     0, not 1.

Found with the generic/035 and generic/084 xfstests.

Fixes: e49c7b2f6d ("afs: Build an abstraction around an "operation" concept")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2020-06-16 22:00:28 +01:00
David Howells
7c295eec1e afs: afs_vnode_commit_status() doesn't need to check the RPC error
afs_vnode_commit_status() is only ever called if op->error is 0, so remove
the op->error checks from the function.

Fixes: e49c7b2f6d ("afs: Build an abstraction around an "operation" concept")
Signed-off-by: David Howells <dhowells@redhat.com>
2020-06-16 16:26:57 +01:00
David Howells
728279a5a1 afs: Fix use of afs_check_for_remote_deletion()
afs_check_for_remote_deletion() checks to see if error ENOENT is returned
by the server in response to an operation and, if so, marks the primary
vnode as having been deleted as the FID is no longer valid.

However, it's being called from the operation success functions, where no
abort has happened - and if an inline abort is recorded, it's handled by
afs_vnode_commit_status().

Fix this by actually calling the operation aborted method if provided and
having that point to afs_check_for_remote_deletion().

Fixes: e49c7b2f6d ("afs: Build an abstraction around an "operation" concept")
Signed-off-by: David Howells <dhowells@redhat.com>
2020-06-16 16:26:57 +01:00
David Howells
44767c3531 afs: Remove afs_operation::abort_code
Remove afs_operation::abort_code as it's read but never set.  Use
ac.abort_code instead.

Signed-off-by: David Howells <dhowells@redhat.com>
2020-06-16 16:26:57 +01:00
David Howells
9bd87ec631 afs: Fix yfs_fs_fetch_status() to honour vnode selector
Fix yfs_fs_fetch_status() to honour the vnode selector in
op->fetch_status.which as does afs_fs_fetch_status() that allows
afs_do_lookup() to use this as an alternative to the InlineBulkStatus RPC
call if not implemented by the server.

This doesn't matter in the current code as YFS servers always implement
InlineBulkStatus, but a subsequent will call it on YFS servers too in some
circumstances.

Fixes: e49c7b2f6d ("afs: Build an abstraction around an "operation" concept")
Signed-off-by: David Howells <dhowells@redhat.com>
2020-06-16 16:26:57 +01:00
David Howells
6c85cacc8c afs: Remove yfs_fs_fetch_file_status() as it's not used
Remove yfs_fs_fetch_file_status() as it's no longer used.

Signed-off-by: David Howells <dhowells@redhat.com>
2020-06-16 16:26:57 +01:00
Gustavo A. R. Silva
b2b32e3aa0 Squashfs: Replace zero-length array with flexible-array
There is a regular need in the kernel to provide a way to declare having a
dynamically sized set of trailing elements in a structure. Kernel code should
always use “flexible array members”[1] for these cases. The older style of
one-element or zero-length arrays should no longer be used[2].

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://github.com/KSPP/linux/issues/21

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-06-15 23:08:32 -05:00
Gustavo A. R. Silva
6112bad79f jffs2: Replace zero-length array with flexible-array
There is a regular need in the kernel to provide a way to declare having a
dynamically sized set of trailing elements in a structure. Kernel code should
always use “flexible array members”[1] for these cases. The older style of
one-element or zero-length arrays should no longer be used[2].

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://github.com/KSPP/linux/issues/21

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-06-15 23:08:31 -05:00
Gustavo A. R. Silva
241cb28e38 aio: Replace zero-length array with flexible-array
There is a regular need in the kernel to provide a way to declare having a
dynamically sized set of trailing elements in a structure. Kernel code should
always use “flexible array members”[1] for these cases. The older style of
one-element or zero-length arrays should no longer be used[2].

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://github.com/KSPP/linux/issues/21

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-06-15 23:08:25 -05:00
Linus Torvalds
3be20b6fc1 Merge tag 'ext4-for-linus-5.8-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull more ext4 updates from Ted Ts'o:
 "This is the second round of ext4 commits for 5.8 merge window [1].

  It includes the per-inode DAX support, which was dependant on the DAX
  infrastructure which came in via the XFS tree, and a number of
  regression and bug fixes; most notably the "BUG: using
  smp_processor_id() in preemptible code in ext4_mb_new_blocks" reported
  by syzkaller"

[1] The pull request actually came in 15 minutes after I had tagged the
    rc1 release. Tssk, tssk, late..   - Linus

* tag 'ext4-for-linus-5.8-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4, jbd2: ensure panic by fix a race between jbd2 abort and ext4 error handlers
  ext4: support xattr gnu.* namespace for the Hurd
  ext4: mballoc: Use this_cpu_read instead of this_cpu_ptr
  ext4: avoid utf8_strncasecmp() with unstable name
  ext4: stop overwrite the errcode in ext4_setup_super
  ext4: fix partial cluster initialization when splitting extent
  ext4: avoid race conditions when remounting with options that change dax
  Documentation/dax: Update DAX enablement for ext4
  fs/ext4: Introduce DAX inode flag
  fs/ext4: Remove jflag variable
  fs/ext4: Make DAX mount option a tri-state
  fs/ext4: Only change S_DAX on inode load
  fs/ext4: Update ext4_should_use_dax()
  fs/ext4: Change EXT4_MOUNT_DAX to EXT4_MOUNT_DAX_ALWAYS
  fs/ext4: Disallow verity if inode is DAX
  fs/ext4: Narrow scope of DAX check in setflags
2020-06-15 09:32:10 -07:00
David Howells
4ec89596d0 afs: Fix the mapping of the UAEOVERFLOW abort code
Abort code UAEOVERFLOW is returned when we try and set a time that's out of
range, but it's currently mapped to EREMOTEIO by the default case.

Fix UAEOVERFLOW to map instead to EOVERFLOW.

Found with the generic/258 xfstest.  Note that the test is wrong as it
assumes that the filesystem will support a pre-UNIX-epoch date.

Fixes: 1eda8bab70 ("afs: Add support for the UAE error table")
Signed-off-by: David Howells <dhowells@redhat.com>
2020-06-15 15:41:03 +01:00
David Howells
793fe82ee3 afs: Fix truncation issues and mmap writeback size
Fix the following issues:

 (1) Fix writeback to reduce the size of a store operation to i_size,
     effectively discarding the extra data.

     The problem comes when afs_page_mkwrite() records that a page is about
     to be modified by mmap().  It doesn't know what bits of the page are
     going to be modified, so it records the whole page as being dirty
     (this is stored in page->private as start and end offsets).

     Without this, the marshalling for the store to the server extends the
     size of the file to the end of the page (in afs_fs_store_data() and
     yfs_fs_store_data()).

 (2) Fix setattr to actually truncate the pagecache, thereby clearing
     the discarded part of a file.

 (3) Fix setattr to check that the new size is okay and to disable
     ATTR_SIZE if i_size wouldn't change.

 (4) Force i_size to be updated as the result of a truncate.

 (5) Don't truncate if ATTR_SIZE is not set.

 (6) Call pagecache_isize_extended() if the file was enlarged.

Note that truncate_set_size() isn't used because the setting of i_size is
done inside afs_vnode_commit_status() under the vnode->cb_lock.

Found with the generic/029 and generic/393 xfstests.

Fixes: 31143d5d51 ("AFS: implement basic file write support")
Fixes: 4343d00872 ("afs: Get rid of the afs_writeback record")
Signed-off-by: David Howells <dhowells@redhat.com>
2020-06-15 15:41:02 +01:00
David Howells
da8d075512 afs: Concoct ctimes
The in-kernel afs filesystem ignores ctime because the AFS fileserver
protocol doesn't support ctimes.  This, however, causes various xfstests to
fail.

Work around this by:

 (1) Setting ctime to attr->ia_ctime in afs_setattr().

 (2) Not ignoring ATTR_MTIME_SET, ATTR_TIMES_SET and ATTR_TOUCH settings.

 (3) Setting the ctime from the server mtime when on the target file when
     creating a hard link to it.

 (4) Setting the ctime on directories from their revised mtimes when
     renaming/moving a file.

Found by the generic/221 and generic/309 xfstests.

Signed-off-by: David Howells <dhowells@redhat.com>
2020-06-15 15:41:02 +01:00
David Howells
3f4aa98181 afs: Fix EOF corruption
When doing a partial writeback, afs_write_back_from_locked_page() may
generate an FS.StoreData RPC request that writes out part of a file when a
file has been constructed from pieces by doing seek, write, seek, write,
... as is done by ld.

The FS.StoreData RPC is given the current i_size as the file length, but
the server basically ignores it unless the data length is 0 (in which case
it's just a truncate operation).  The revised file length returned in the
result of the RPC may then not reflect what we suggested - and this leads
to i_size getting moved backwards - which causes issues later.

Fix the client to take account of this by ignoring the returned file size
unless the data version number jumped unexpectedly - in which case we're
going to have to clear the pagecache and reload anyway.

This can be observed when doing a kernel build on an AFS mount.  The
following pair of commands produce the issue:

  ld -m elf_x86_64 -z max-page-size=0x200000 --emit-relocs \
      -T arch/x86/realmode/rm/realmode.lds \
      arch/x86/realmode/rm/header.o \
      arch/x86/realmode/rm/trampoline_64.o \
      arch/x86/realmode/rm/stack.o \
      arch/x86/realmode/rm/reboot.o \
      -o arch/x86/realmode/rm/realmode.elf
  arch/x86/tools/relocs --realmode \
      arch/x86/realmode/rm/realmode.elf \
      >arch/x86/realmode/rm/realmode.relocs

This results in the latter giving:

	Cannot read ELF section headers 0/18: Success

as the realmode.elf file got corrupted.

The sequence of events can also be driven with:

	xfs_io -t -f \
		-c "pwrite -S 0x58 0 0x58" \
		-c "pwrite -S 0x59 10000 1000" \
		-c "close" \
		/afs/example.com/scratch/a

Fixes: 31143d5d51 ("AFS: implement basic file write support")
Signed-off-by: David Howells <dhowells@redhat.com>
2020-06-15 15:41:02 +01:00
David Howells
1f32ef7989 afs: afs_write_end() should change i_size under the right lock
Fix afs_write_end() to change i_size under vnode->cb_lock rather than
->wb_lock so that it doesn't race with afs_vnode_commit_status() and
afs_getattr().

The ->wb_lock is only meant to guard access to ->wb_keys which isn't
accessed by that piece of code.

Fixes: 4343d00872 ("afs: Get rid of the afs_writeback record")
Signed-off-by: David Howells <dhowells@redhat.com>
2020-06-15 15:41:02 +01:00
David Howells
bb41348928 afs: Fix non-setting of mtime when writing into mmap
The mtime on an inode needs to be updated when a write is made into an
mmap'ed section.  There are three ways in which this could be done: update
it when page_mkwrite is called, update it when a page is changed from dirty
to writeback or leave it to the server and fix the mtime up from the reply
to the StoreData RPC.

Found with the generic/215 xfstest.

Fixes: 1cf7a1518a ("afs: Implement shared-writeable mmap")
Signed-off-by: David Howells <dhowells@redhat.com>
2020-06-15 15:41:02 +01:00
Linus Torvalds
9d645db853 Merge tag 'for-5.8-part2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
 "This reverts the direct io port to iomap infrastructure of btrfs
  merged in the first pull request. We found problems in invalidate page
  that don't seem to be fixable as regressions or without changing iomap
  code that would not affect other filesystems.

  There are four reverts in total, but three of them are followup
  cleanups needed to revert a43a67a2d7 cleanly. The result is the
  buffer head based implementation of direct io.

  Reverts are not great, but under current circumstances I don't see
  better options"

* tag 'for-5.8-part2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  Revert "btrfs: switch to iomap_dio_rw() for dio"
  Revert "fs: remove dio_end_io()"
  Revert "btrfs: remove BTRFS_INODE_READDIO_NEED_LOCK"
  Revert "btrfs: split btrfs_direct_IO to read and write part"
2020-06-14 09:47:25 -07:00
David Sterba
55e20bd12a Revert "btrfs: switch to iomap_dio_rw() for dio"
This reverts commit a43a67a2d7.

This patch reverts the main part of switching direct io implementation
to iomap infrastructure. There's a problem in invalidate page that
couldn't be solved as regression in this development cycle.

The problem occurs when buffered and direct io are mixed, and the ranges
overlap. Although this is not recommended, filesystems implement
measures or fallbacks to make it somehow work. In this case, fallback to
buffered IO would be an option for btrfs (this already happens when
direct io is done on compressed data), but the change would be needed in
the iomap code, bringing new semantics to other filesystems.

Another problem arises when again the buffered and direct ios are mixed,
invalidation fails, then -EIO is set on the mapping and fsync will fail,
though there's no real error.

There have been discussions how to fix that, but revert seems to be the
least intrusive option.

Link: https://lore.kernel.org/linux-btrfs/20200528192103.xm45qoxqmkw7i5yl@fiona/
Signed-off-by: David Sterba <dsterba@suse.com>
2020-06-14 01:19:02 +02:00
Linus Torvalds
f82e7b57b5 Merge tag '5.8-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6
Pull more cifs updates from Steve French:
 "12 cifs/smb3 fixes, 2 for stable.

   - add support for idsfromsid on create and chgrp/chown allowing
     ability to save owner information more naturally for some workloads

   - improve query info (getattr) when SMB3.1.1 posix extensions are
     negotiated by using new query info level"

* tag '5.8-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
  smb3: Add debug message for new file creation with idsfromsid mount option
  cifs: fix chown and chgrp when idsfromsid mount option enabled
  smb3: allow uid and gid owners to be set on create with idsfromsid mount option
  smb311: Add tracepoints for new compound posix query info
  smb311: add support for using info level for posix extensions query
  smb311: Add support for lookup with posix extensions query info
  smb311: Add support for SMB311 query info (non-compounded)
  SMB311: Add support for query info using posix extensions (level 100)
  smb3: add indatalen that can be a non-zero value to calculation of credit charge in smb2 ioctl
  smb3: fix typo in mount options displayed in /proc/mounts
  cifs: Add get_security_type_str function to return sec type.
  smb3: extend fscache mount volume coherency check
2020-06-13 13:43:56 -07:00
Linus Torvalds
6adc19fd13 Merge tag 'kbuild-v5.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull more Kbuild updates from Masahiro Yamada:

 - fix build rules in binderfs sample

 - fix build errors when Kbuild recurses to the top Makefile

 - covert '---help---' in Kconfig to 'help'

* tag 'kbuild-v5.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  treewide: replace '---help---' in Kconfig files with 'help'
  kbuild: fix broken builds because of GZIP,BZIP2,LZOP variables
  samples: binderfs: really compile this sample and fix build issues
2020-06-13 13:29:16 -07:00
Linus Torvalds
593bd5e5d3 Merge tag 'iomap-5.8-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull iomap fix from Darrick Wong:
 "A single iomap bug fix for a variable type mistake on 32-bit
  architectures, fixing an integer overflow problem in the unshare
  actor"

* tag 'iomap-5.8-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  iomap: Fix unsharing of an extent >2GB on a 32-bit machine
2020-06-13 12:44:30 -07:00
Linus Torvalds
c555722768 Merge tag 'xfs-5.8-merge-9' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fix from Darrick Wong:
 "We've settled down into the bugfix phase; this one fixes a resource
  leak on an error bailout path"

* tag 'xfs-5.8-merge-9' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: Add the missed xfs_perag_put() for xfs_ifree_cluster()
2020-06-13 12:40:24 -07:00
Masahiro Yamada
a7f7f6248d treewide: replace '---help---' in Kconfig files with 'help'
Since commit 84af7a6194 ("checkpatch: kconfig: prefer 'help' over
'---help---'"), the number of '---help---' has been gradually
decreasing, but there are still more than 2400 instances.

This commit finishes the conversion. While I touched the lines,
I also fixed the indentation.

There are a variety of indentation styles found.

  a) 4 spaces + '---help---'
  b) 7 spaces + '---help---'
  c) 8 spaces + '---help---'
  d) 1 space + 1 tab + '---help---'
  e) 1 tab + '---help---'    (correct indentation)
  f) 1 tab + 1 space + '---help---'
  g) 1 tab + 2 spaces + '---help---'

In order to convert all of them to 1 tab + 'help', I ran the
following commend:

  $ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/'

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-14 01:57:21 +09:00
Linus Torvalds
6c32978414 Merge tag 'notifications-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull notification queue from David Howells:
 "This adds a general notification queue concept and adds an event
  source for keys/keyrings, such as linking and unlinking keys and
  changing their attributes.

  Thanks to Debarshi Ray, we do have a pull request to use this to fix a
  problem with gnome-online-accounts - as mentioned last time:

     https://gitlab.gnome.org/GNOME/gnome-online-accounts/merge_requests/47

  Without this, g-o-a has to constantly poll a keyring-based kerberos
  cache to find out if kinit has changed anything.

  [ There are other notification pending: mount/sb fsinfo notifications
    for libmount that Karel Zak and Ian Kent have been working on, and
    Christian Brauner would like to use them in lxc, but let's see how
    this one works first ]

  LSM hooks are included:

   - A set of hooks are provided that allow an LSM to rule on whether or
     not a watch may be set. Each of these hooks takes a different
     "watched object" parameter, so they're not really shareable. The
     LSM should use current's credentials. [Wanted by SELinux & Smack]

   - A hook is provided to allow an LSM to rule on whether or not a
     particular message may be posted to a particular queue. This is
     given the credentials from the event generator (which may be the
     system) and the watch setter. [Wanted by Smack]

  I've provided SELinux and Smack with implementations of some of these
  hooks.

  WHY
  ===

  Key/keyring notifications are desirable because if you have your
  kerberos tickets in a file/directory, your Gnome desktop will monitor
  that using something like fanotify and tell you if your credentials
  cache changes.

  However, we also have the ability to cache your kerberos tickets in
  the session, user or persistent keyring so that it isn't left around
  on disk across a reboot or logout. Keyrings, however, cannot currently
  be monitored asynchronously, so the desktop has to poll for it - not
  so good on a laptop. This facility will allow the desktop to avoid the
  need to poll.

  DESIGN DECISIONS
  ================

   - The notification queue is built on top of a standard pipe. Messages
     are effectively spliced in. The pipe is opened with a special flag:

        pipe2(fds, O_NOTIFICATION_PIPE);

     The special flag has the same value as O_EXCL (which doesn't seem
     like it will ever be applicable in this context)[?]. It is given up
     front to make it a lot easier to prohibit splice&co from accessing
     the pipe.

     [?] Should this be done some other way?  I'd rather not use up a new
         O_* flag if I can avoid it - should I add a pipe3() system call
         instead?

     The pipe is then configured::

        ioctl(fds[1], IOC_WATCH_QUEUE_SET_SIZE, queue_depth);
        ioctl(fds[1], IOC_WATCH_QUEUE_SET_FILTER, &filter);

     Messages are then read out of the pipe using read().

   - It should be possible to allow write() to insert data into the
     notification pipes too, but this is currently disabled as the
     kernel has to be able to insert messages into the pipe *without*
     holding pipe->mutex and the code to make this work needs careful
     auditing.

   - sendfile(), splice() and vmsplice() are disabled on notification
     pipes because of the pipe->mutex issue and also because they
     sometimes want to revert what they just did - but one or more
     notification messages might've been interleaved in the ring.

   - The kernel inserts messages with the wait queue spinlock held. This
     means that pipe_read() and pipe_write() have to take the spinlock
     to update the queue pointers.

   - Records in the buffer are binary, typed and have a length so that
     they can be of varying size.

     This allows multiple heterogeneous sources to share a common
     buffer; there are 16 million types available, of which I've used
     just a few, so there is scope for others to be used. Tags may be
     specified when a watchpoint is created to help distinguish the
     sources.

   - Records are filterable as types have up to 256 subtypes that can be
     individually filtered. Other filtration is also available.

   - Notification pipes don't interfere with each other; each may be
     bound to a different set of watches. Any particular notification
     will be copied to all the queues that are currently watching for it
     - and only those that are watching for it.

   - When recording a notification, the kernel will not sleep, but will
     rather mark a queue as having lost a message if there's
     insufficient space. read() will fabricate a loss notification
     message at an appropriate point later.

   - The notification pipe is created and then watchpoints are attached
     to it, using one of:

        keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fds[1], 0x01);
        watch_mount(AT_FDCWD, "/", 0, fd, 0x02);
        watch_sb(AT_FDCWD, "/mnt", 0, fd, 0x03);

     where in both cases, fd indicates the queue and the number after is
     a tag between 0 and 255.

   - Watches are removed if either the notification pipe is destroyed or
     the watched object is destroyed. In the latter case, a message will
     be generated indicating the enforced watch removal.

  Things I want to avoid:

   - Introducing features that make the core VFS dependent on the
     network stack or networking namespaces (ie. usage of netlink).

   - Dumping all this stuff into dmesg and having a daemon that sits
     there parsing the output and distributing it as this then puts the
     responsibility for security into userspace and makes handling
     namespaces tricky. Further, dmesg might not exist or might be
     inaccessible inside a container.

   - Letting users see events they shouldn't be able to see.

  TESTING AND MANPAGES
  ====================

   - The keyutils tree has a pipe-watch branch that has keyctl commands
     for making use of notifications. Proposed manual pages can also be
     found on this branch, though a couple of them really need to go to
     the main manpages repository instead.

     If the kernel supports the watching of keys, then running "make
     test" on that branch will cause the testing infrastructure to spawn
     a monitoring process on the side that monitors a notifications pipe
     for all the key/keyring changes induced by the tests and they'll
     all be checked off to make sure they happened.

        https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/keyutils.git/log/?h=pipe-watch

   - A test program is provided (samples/watch_queue/watch_test) that
     can be used to monitor for keyrings, mount and superblock events.
     Information on the notifications is simply logged to stdout"

* tag 'notifications-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  smack: Implement the watch_key and post_notification hooks
  selinux: Implement the watch_key security hook
  keys: Make the KEY_NEED_* perms an enum rather than a mask
  pipe: Add notification lossage handling
  pipe: Allow buffers to be marked read-whole-or-error for notifications
  Add sample notification program
  watch_queue: Add a key/keyring notification facility
  security: Add hooks to rule on setting a watch
  pipe: Add general notification queue support
  pipe: Add O_NOTIFICATION_PIPE
  security: Add a hook for the point of notification insertion
  uapi: General notification queue definitions
2020-06-13 09:56:21 -07:00
Steve French
a7a519a492 smb3: Add debug message for new file creation with idsfromsid mount option
Pavel noticed that a debug message (disabled by default) in creating the security
descriptor context could be useful for new file creation owner fields
(as we already have for the mode) when using mount parm idsfromsid.

[38120.392272] CIFS: FYI: owner S-1-5-88-1-0, group S-1-5-88-2-0
[38125.792637] CIFS: FYI: owner S-1-5-88-1-1000, group S-1-5-88-2-1000

Also cleans up a typo in a comment

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2020-06-12 16:31:06 -05:00
Linus Torvalds
44ebe016df Merge branch 'proc-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull proc fix from Eric Biederman:
 "Much to my surprise syzbot found a very old bug in proc that the
  recent changes made easier to reproce. This bug is subtle enough it
  looks like it fooled everyone who should know better"

* 'proc-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  proc: Use new_inode not new_inode_pseudo
2020-06-12 12:38:18 -07:00
Eric W. Biederman
ef1548adad proc: Use new_inode not new_inode_pseudo
Recently syzbot reported that unmounting proc when there is an ongoing
inotify watch on the root directory of proc could result in a use
after free when the watch is removed after the unmount of proc
when the watcher exits.

Commit 69879c01a0 ("proc: Remove the now unnecessary internal mount
of proc") made it easier to unmount proc and allowed syzbot to see the
problem, but looking at the code it has been around for a long time.

Looking at the code the fsnotify watch should have been removed by
fsnotify_sb_delete in generic_shutdown_super.  Unfortunately the inode
was allocated with new_inode_pseudo instead of new_inode so the inode
was not on the sb->s_inodes list.  Which prevented
fsnotify_unmount_inodes from finding the inode and removing the watch
as well as made it so the "VFS: Busy inodes after unmount" warning
could not find the inodes to warn about them.

Make all of the inodes in proc visible to generic_shutdown_super,
and fsnotify_sb_delete by using new_inode instead of new_inode_pseudo.
The only functional difference is that new_inode places the inodes
on the sb->s_inodes list.

I wrote a small test program and I can verify that without changes it
can trigger this issue, and by replacing new_inode_pseudo with
new_inode the issues goes away.

Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/000000000000d788c905a7dfa3f4@google.com
Reported-by: syzbot+7d2debdcdb3cb93c1e5e@syzkaller.appspotmail.com
Fixes: 0097875bd4 ("proc: Implement /proc/thread-self to point at the directory of the current thread")
Fixes: 021ada7dff ("procfs: switch /proc/self away from proc_dir_entry")
Fixes: 51f0885e54 ("vfs,proc: guarantee unique inodes in /proc")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2020-06-12 14:13:33 -05:00
zhangyi (F)
7b97d868b7 ext4, jbd2: ensure panic by fix a race between jbd2 abort and ext4 error handlers
In the ext4 filesystem with errors=panic, if one process is recording
errno in the superblock when invoking jbd2_journal_abort() due to some
error cases, it could be raced by another __ext4_abort() which is
setting the SB_RDONLY flag but missing panic because errno has not been
recorded.

jbd2_journal_commit_transaction()
 jbd2_journal_abort()
  journal->j_flags |= JBD2_ABORT;
  jbd2_journal_update_sb_errno()
                                    | ext4_journal_check_start()
                                    |  __ext4_abort()
                                    |   sb->s_flags |= SB_RDONLY;
                                    |   if (!JBD2_REC_ERR)
                                    |        return;
  journal->j_flags |= JBD2_REC_ERR;

Finally, it will no longer trigger panic because the filesystem has
already been set read-only. Fix this by introduce j_abort_mutex to make
sure journal abort is completed before panic, and remove JBD2_REC_ERR
flag.

Fixes: 4327ba52af ("ext4, jbd2: ensure entering into panic after recording an error in superblock")
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200609073540.3810702-1-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-12 14:51:41 -04:00
Steve French
a660339827 cifs: fix chown and chgrp when idsfromsid mount option enabled
idsfromsid was ignored in chown and chgrp causing it to fail
when upcalls were not configured for lookup.  idsfromsid allows
mapping users when setting user or group ownership using
"special SID" (reserved for this).  Add support for chmod and chgrp
when idsfromsid mount option is enabled.

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2020-06-12 13:21:32 -05:00
Steve French
975221eca5 smb3: allow uid and gid owners to be set on create with idsfromsid mount option
Currently idsfromsid mount option allows querying owner information from the
special sids used to represent POSIX uids and gids but needed changes to
populate the security descriptor context with the owner information when
idsfromsid mount option was used.

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2020-06-12 13:21:15 -05:00
Jan (janneke) Nieuwenhuizen
88ee9d571b ext4: support xattr gnu.* namespace for the Hurd
The Hurd gained[0] support for moving the translator and author
fields out of the inode and into the "gnu.*" xattr namespace.

In anticipation of that, an xattr INDEX was reserved[1].  The Hurd has
now been brought into compliance[2] with that.

This patch adds support for reading and writing such attributes from
Linux; you can now do something like

    mkdir -p hurd-root/servers/socket
    touch hurd-root/servers/socket/1
    setfattr --name=gnu.translator --value='"/hurd/pflocal\0"' \
        hurd-root/servers/socket/1
    getfattr --name=gnu.translator hurd-root/servers/socket/1
    # file: 1
    gnu.translator="/hurd/pflocal"

to setup a pipe translator, which is being used to create[3] a
vm-image for the Hurd from GNU Guix.

[0] https://summerofcode.withgoogle.com/projects/#5869799859027968
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3980bd3b406addb327d858aebd19e229ea340b9a
[2] https://git.savannah.gnu.org/cgit/hurd/hurd.git/commit/?id=a04c7bf83172faa7cb080fbe3b6c04a8415ca645
[3] https://git.savannah.gnu.org/cgit/guix.git/log/?h=wip-hurd-vm

Signed-off-by: Jan Nieuwenhuizen <janneke@gnu.org>
Link: https://lore.kernel.org/r/20200525193940.878-1-janneke@gnu.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-12 13:23:34 -04:00
Steve French
e4bd7c4a8d smb311: Add tracepoints for new compound posix query info
Add dynamic tracepoints for new SMB3.1.1. posix extensions query info level (100)

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2020-06-12 08:55:18 -05:00
Steve French
d313852d7a smb311: add support for using info level for posix extensions query
Adds calls to the newer info level for query info using SMB3.1.1 posix extensions.
The remaining two places that call the older query info (non-SMB3.1.1 POSIX)
require passing in the fid and can be updated in a later patch.

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2020-06-12 08:54:12 -05:00
Steve French
790434ff98 smb311: Add support for lookup with posix extensions query info
Improve support for lookup when using SMB3.1.1 posix mounts.
Use new info level 100 (posix query info)

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2020-06-12 06:21:19 -05:00
Steve French
b1bc1874b8 smb311: Add support for SMB311 query info (non-compounded)
Add worker function for non-compounded SMB3.1.1 POSIX Extensions query info.
This is needed for revalidate of root (cached) directory for example.

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2020-06-12 06:21:06 -05:00
Steve French
6a5f6592a0 SMB311: Add support for query info using posix extensions (level 100)
Adds support for better query info on dentry revalidation (using
the SMB3.1.1 POSIX extensions level 100).  Followon patch will
add support for translating the UID/GID from the SID and also
will add support for using the posix query info on lookup.

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2020-06-12 06:20:38 -05:00
Namjae Jeon
ebf57440ec smb3: add indatalen that can be a non-zero value to calculation of credit charge in smb2 ioctl
Some of tests in xfstests failed with cifsd kernel server since commit
e80ddeb2f7. cifsd kernel server validates credit charge from client
by calculating it base on max((InputCount + OutputCount) and
(MaxInputResponse + MaxOutputResponse)) according to specification.

MS-SMB2 specification describe credit charge calculation of smb2 ioctl :

If Connection.SupportsMultiCredit is TRUE, the server MUST validate
CreditCharge based on the maximum of (InputCount + OutputCount) and
(MaxInputResponse + MaxOutputResponse), as specified in section 3.3.5.2.5.
If the validation fails, it MUST fail the IOCTL request with
STATUS_INVALID_PARAMETER.

This patch add indatalen that can be a non-zero value to calculation of
credit charge in SMB2_ioctl_init().

Fixes: e80ddeb2f7 ("smb3: fix incorrect number of credits when ioctl
MaxOutputResponse > 64K")
Cc: Stable <stable@vger.kernel.org>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Cc: Steve French <smfrench@gmail.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2020-06-12 06:20:17 -05:00
Linus Torvalds
b1a6274994 Merge branch 'akpm' (patches from Andrew)
Pull updates from Andrew Morton:
 "A few fixes and stragglers.

  Subsystems affected by this patch series: mm/memory-failure, ocfs2,
  lib/lzo, misc"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  amdgpu: a NULL ->mm does not mean a thread is a kthread
  lib/lzo: fix ambiguous encoding bug in lzo-rle
  ocfs2: fix build failure when TCP/IP is disabled
  mm/memory-failure: send SIGBUS(BUS_MCEERR_AR) only to current thread
  mm/memory-failure: prioritize prctl(PR_MCE_KILL) over vm.memory_failure_early_kill
2020-06-11 18:18:50 -07:00
Tom Seewald
fce1affe4e ocfs2: fix build failure when TCP/IP is disabled
After commit 12abc5ee78 ("tcp: add tcp_sock_set_nodelay") and commit
c488aeadcb ("tcp: add tcp_sock_set_user_timeout"), building the kernel
with OCFS2_FS=y but without INET=y causes it to fail with:

  ld: fs/ocfs2/cluster/tcp.o: in function `o2net_accept_many':
  tcp.c:(.text+0x21b1): undefined reference to `tcp_sock_set_nodelay'
  ld: tcp.c:(.text+0x21c1): undefined reference to `tcp_sock_set_user_timeout'
  ld: fs/ocfs2/cluster/tcp.o: in function `o2net_start_connect':
  tcp.c:(.text+0x2633): undefined reference to `tcp_sock_set_nodelay'
  ld: tcp.c:(.text+0x2643): undefined reference to `tcp_sock_set_user_timeout'

This is due to tcp_sock_set_nodelay() and tcp_sock_set_user_timeout()
being declared in linux/tcp.h and defined in net/ipv4/tcp.c, which
depend on TCP/IP being enabled.

To fix this, make OCFS2_FS depend on INET=y which already requires
NET=y.

Fixes: 12abc5ee78 ("tcp: add tcp_sock_set_nodelay")
Fixes: c488aeadcb ("tcp: add tcp_sock_set_user_timeout")
Signed-off-by: Tom Seewald <tseewald@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Link: http://lkml.kernel.org/r/20200606190827.23954-1-tseewald@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-11 18:17:47 -07:00
Linus Torvalds
b961f8dc89 Merge tag 'io_uring-5.8-2020-06-11' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
 "A few late stragglers in here. In particular:

   - Validate full range for provided buffers (Bijan)

   - Fix bad use of kfree() in buffer registration failure (Denis)

   - Don't allow close of ring itself, it's not fully safe. Making it
     fully safe would require making the system call more expensive,
     which isn't worth it.

   - Buffer selection fix

   - Regression fix for O_NONBLOCK retry

   - Make IORING_OP_ACCEPT honor O_NONBLOCK (Jiufei)

   - Restrict opcode handling for SQ/IOPOLL (Pavel)

   - io-wq work handling cleanups and improvements (Pavel, Xiaoguang)

   - IOPOLL race fix (Xiaoguang)"

* tag 'io_uring-5.8-2020-06-11' of git://git.kernel.dk/linux-block:
  io_uring: fix io_kiocb.flags modification race in IOPOLL mode
  io_uring: check file O_NONBLOCK state for accept
  io_uring: avoid unnecessary io_wq_work copy for fast poll feature
  io_uring: avoid whole io_wq_work copy for requests completed inline
  io_uring: allow O_NONBLOCK async retry
  io_wq: add per-wq work handler instead of per work
  io_uring: don't arm a timeout through work.func
  io_uring: remove custom ->func handlers
  io_uring: don't derive close state from ->func
  io_uring: use kvfree() in io_sqe_buffer_register()
  io_uring: validate the full range of provided buffers for access
  io_uring: re-set iov base/len for buffer select retry
  io_uring: move send/recv IOPOLL check into prep
  io_uring: deduplicate io_openat{,2}_prep()
  io_uring: do build_open_how() only once
  io_uring: fix {SQ,IO}POLL with unsupported opcodes
  io_uring: disallow close of ring itself
2020-06-11 16:10:08 -07:00
David Howells
b3597945c8 afs: Fix afs_store_data() to set mtime in new operation descriptor
Fix afs_store_data() so that it sets the mtime in the new operation
descriptor otherwise the mtime on the server gets set to 0 when a write is
stored to the server.

Fixes: e49c7b2f6d ("afs: Build an abstraction around an "operation" concept")
Reported-by: Dave Botsch <botsch@cnf.cornell.edu>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-11 16:04:30 -07:00
Linus Torvalds
623f6dc593 Merge branch 'akpm' (patches from Andrew)
Merge some more updates from Andrew Morton:

 - various hotfixes and minor things

 - hch's use_mm/unuse_mm clearnups

Subsystems affected by this patch series: mm/hugetlb, scripts, kcov,
lib, nilfs, checkpatch, lib, mm/debug, ocfs2, lib, misc.

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  kernel: set USER_DS in kthread_use_mm
  kernel: better document the use_mm/unuse_mm API contract
  kernel: move use_mm/unuse_mm to kthread.c
  kernel: move use_mm/unuse_mm to kthread.c
  stacktrace: cleanup inconsistent variable type
  lib: test get_count_order/long in test_bitops.c
  mm: add comments on pglist_data zones
  ocfs2: fix spelling mistake and grammar
  mm/debug_vm_pgtable: fix kernel crash by checking for THP support
  lib: fix bitmap_parse() on 64-bit big endian archs
  checkpatch: correct check for kernel parameters doc
  nilfs2: fix null pointer dereference at nilfs_segctor_do_construct()
  lib/lz4/lz4_decompress.c: document deliberate use of `&'
  kcov: check kcov_softirq in kcov_remote_stop()
  scripts/spelling: add a few more typos
  khugepaged: selftests: fix timeout condition in wait_for_scan()
2020-06-11 13:25:53 -07:00
Linus Torvalds
a539568299 Merge tag 'nfs-for-5.8-1' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client updates from Anna Schumaker:
 "New features and improvements:
   - Sunrpc receive buffer sizes only change when establishing a GSS credentials
   - Add more sunrpc tracepoints
   - Improve on tracepoints to capture internal NFS I/O errors

  Other bugfixes and cleanups:
   - Move a dprintk() to after a call to nfs_alloc_fattr()
   - Fix off-by-one issues in rpc_ntop6
   - Fix a few coccicheck warnings
   - Use the correct SPDX license identifiers
   - Fix rpc_call_done assignment for BIND_CONN_TO_SESSION
   - Replace zero-length array with flexible array
   - Remove duplicate headers
   - Set invalid blocks after NFSv4 writes to update space_used attribute
   - Fix direct WRITE throughput regression"

* tag 'nfs-for-5.8-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (27 commits)
  NFS: Fix direct WRITE throughput regression
  SUNRPC: rpc_xprt lifetime events should record xprt->state
  xprtrdma: Make xprt_rdma_slot_table_entries static
  nfs: set invalid blocks after NFSv4 writes
  NFS: remove redundant initialization of variable result
  sunrpc: add missing newline when printing parameter 'auth_hashtable_size' by sysfs
  NFS: Add a tracepoint in nfs_set_pgio_error()
  NFS: Trace short NFS READs
  NFS: nfs_xdr_status should record the procedure name
  SUNRPC: Set SOFTCONN when destroying GSS contexts
  SUNRPC: rpc_call_null_helper() should set RPC_TASK_SOFT
  SUNRPC: rpc_call_null_helper() already sets RPC_TASK_NULLCREDS
  SUNRPC: trace RPC client lifetime events
  SUNRPC: Trace transport lifetime events
  SUNRPC: Split the xdr_buf event class
  SUNRPC: Add tracepoint to rpc_call_rpcerror()
  SUNRPC: Update the RPC_SHOW_SOCKET() macro
  SUNRPC: Update the rpc_show_task_flags() macro
  SUNRPC: Trace GSS context lifetimes
  SUNRPC: receive buffer size estimation values almost never change
  ...
2020-06-11 12:22:41 -07:00
Linus Torvalds
7cf035cc83 Merge tag 'vfs-5.8-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull DAX updates part three from Darrick Wong:
 "Now that the xfs changes have landed, this third piece changes the
  FS_XFLAG_DAX ioctl code in xfs to request that the inode be reloaded
  after the last program closes the file, if doing so would make a S_DAX
  change happen. The goal here is to make dax access mode switching
  quicker when possible.

  Summary:

   - Teach XFS to ask the VFS to drop an inode if the administrator
     changes the FS_XFLAG_DAX inode flag such that the S_DAX state would
     change. This can result in files changing access modes without
     requiring an unmount cycle"

* tag 'vfs-5.8-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  fs/xfs: Update xfs_ioctl_setattr_dax_invalidate()
  fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags()
  fs/xfs: Create function xfs_inode_should_enable_dax()
  fs/xfs: Make DAX mount option a tri-state
  fs/xfs: Change XFS_MOUNT_DAX to XFS_MOUNT_DAX_ALWAYS
  fs/xfs: Remove unnecessary initialization of i_rwsem
2020-06-11 10:48:12 -07:00
Chuck Lever
ba838a75e7 NFS: Fix direct WRITE throughput regression
I measured a 50% throughput regression for large direct writes.

The observed on-the-wire behavior is that the client sends every
NFS WRITE twice: once as an UNSTABLE WRITE plus a COMMIT, and once
as a FILE_SYNC WRITE.

This is because the nfs_write_match_verf() check in
nfs_direct_commit_complete() fails for every WRITE.

Buffered writes use nfs_write_completion(), which sets req->wb_verf
correctly. Direct writes use nfs_direct_write_completion(), which
does not set req->wb_verf at all. This leaves req->wb_verf set to
all zeroes for every direct WRITE, and thus
nfs_direct_commit_completion() always sets NFS_ODIRECT_RESCHED_WRITES.

This fix appears to restore nearly all of the lost performance.

Fixes: 1f28476dcb ("NFS: Fix O_DIRECT commit verifier handling")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-06-11 13:33:48 -04:00
Zheng Bin
3a39e77869 nfs: set invalid blocks after NFSv4 writes
Use the following command to test nfsv4(size of file1M is 1MB):
mount -t nfs -o vers=4.0,actimeo=60 127.0.0.1/dir1 /mnt
cp file1M /mnt
du -h /mnt/file1M  -->0 within 60s, then 1M

When write is done(cp file1M /mnt), will call this:
nfs_writeback_done
  nfs4_write_done
    nfs4_write_done_cb
      nfs_writeback_update_inode
        nfs_post_op_update_inode_force_wcc_locked(change, ctime, mtime
nfs_post_op_update_inode_force_wcc_locked
   nfs_set_cache_invalid
   nfs_refresh_inode_locked
     nfs_update_inode

nfsd write response contains change, ctime, mtime, the flag will be
clear after nfs_update_inode. Howerver, write response does not contain
space_used, previous open response contains space_used whose value is 0,
so inode->i_blocks is still 0.

nfs_getattr  -->called by "du -h"
  do_update |= force_sync || nfs_attribute_cache_expired -->false in 60s
  cache_validity = READ_ONCE(NFS_I(inode)->cache_validity)
  do_update |= cache_validity & (NFS_INO_INVALID_ATTR    -->false
  if (do_update) {
        __nfs_revalidate_inode
  }

Within 60s, does not send getattr request to nfsd, thus "du -h /mnt/file1M"
is 0.

Add a NFS_INO_INVALID_BLOCKS flag, set it when nfsv4 write is done.

Fixes: 16e1437517 ("NFS: More fine grained attribute tracking")
Signed-off-by: Zheng Bin <zhengbin13@huawei.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-06-11 13:33:48 -04:00
Colin Ian King
86b936672e NFS: remove redundant initialization of variable result
The variable result is being initialized with a value that is never read
and it is being updated later with a new value.  The initialization is
redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-06-11 13:33:48 -04:00
Chuck Lever
cd2ed9bdc0 NFS: Add a tracepoint in nfs_set_pgio_error()
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-06-11 13:33:48 -04:00