Commit Graph

933062 Commits

Author SHA1 Message Date
Pavel Begunkov
b90cd197f9 io_uring: set @poll->file after @poll init
It's a good practice to modify fields of a struct after but not before
it was initialised. Even though io_init_poll_iocb() doesn't touch
poll->file, call it first.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-21 20:46:05 -06:00
Pavel Begunkov
24c7467863 io_uring: remove REQ_F_MUST_PUNT
REQ_F_MUST_PUNT may seem looking good and clear, but it's the same
as not having REQ_F_NOWAIT set. That rather creates more confusion.
Moreover, it doesn't even affect any behaviour (e.g. see the patch
removing it from io_{read,write}).

Kill theg flag and update already outdated comments.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-21 20:46:05 -06:00
Pavel Begunkov
62ef731650 io_uring: remove setting REQ_F_MUST_PUNT in rw
io_{read,write}() {
	...
copy_iov: // prep async
  	if (!(flags & REQ_F_NOWAIT) && !file_can_poll(file))
		flags |= REQ_F_MUST_PUNT;
}

REQ_F_MUST_PUNT there is pointless, because if it happens then
REQ_F_NOWAIT is known to be _not_ set, and the request will go
async path in __io_queue_sqe() anyway. file_can_poll() check
is also repeated in arm_poll*(), so don't need it.

Remove the mentioned assignment REQ_F_MUST_PUNT in preparation
for killing the flag.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-21 20:46:03 -06:00
Jens Axboe
895aa7b1a0 Merge branch 'async-buffered.8' into for-5.9/io_uring
Pull in async buffered reads branch.

* async-buffered.8:
  io_uring: support true async buffered reads, if file provides it
  mm: add kiocb_wait_page_queue_init() helper
  btrfs: flag files as supporting buffered async reads
  xfs: flag files as supporting buffered async reads
  block: flag block devices as supporting IOCB_WAITQ
  fs: add FMODE_BUF_RASYNC
  mm: support async buffered reads in generic_file_buffered_read()
  mm: add support for async page locking
  mm: abstract out wake_page_match() from wake_page_function()
  mm: allow read-ahead with IOCB_NOWAIT set
  io_uring: re-issue block requests that failed because of resources
  io_uring: catch -EIO from buffered issue request failure
  io_uring: always plug for any number of IOs
  block: provide plug based way of signaling forced no-wait semantics
2020-06-21 20:44:36 -06:00
Jens Axboe
bcf5a06304 io_uring: support true async buffered reads, if file provides it
If the file is flagged with FMODE_BUF_RASYNC, then we don't have to punt
the buffered read to an io-wq worker. Instead we can rely on page
unlocking callbacks to support retry based async IO. This is a lot more
efficient than doing async thread offload.

The retry is done similarly to how we handle poll based retry. From
the unlock callback, we simply queue the retry to a task_work based
handler.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-21 20:44:26 -06:00
Jens Axboe
d1932dc3dc mm: add kiocb_wait_page_queue_init() helper
Checks if the file supports it, and initializes the values that we need.
Caller passes in 'data' pointer, if any, and the callback function to
be used.

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-21 20:44:26 -06:00
Jens Axboe
8730f12b79 btrfs: flag files as supporting buffered async reads
btrfs uses generic_file_read_iter(), which already supports this.

Acked-by: Chris Mason <clm@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-21 20:44:25 -06:00
Jens Axboe
f89fb730aa xfs: flag files as supporting buffered async reads
XFS uses generic_file_read_iter(), which already supports this.

Acked-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-21 20:44:25 -06:00
Jens Axboe
a304f07448 block: flag block devices as supporting IOCB_WAITQ
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-21 20:44:25 -06:00
Jens Axboe
c2a25ec0f1 fs: add FMODE_BUF_RASYNC
If set, this indicates that the file system supports IOCB_WAITQ for
buffered reads.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-21 20:44:25 -06:00
Jens Axboe
1a0a7853b9 mm: support async buffered reads in generic_file_buffered_read()
Use the async page locking infrastructure, if IOCB_WAITQ is set in the
passed in iocb. The caller must expect an -EIOCBQUEUED return value,
which means that IO is started but not done yet. This is similar to how
O_DIRECT signals the same operation. Once the callback is received by
the caller for IO completion, the caller must retry the operation.

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-21 20:44:25 -06:00
Jens Axboe
dd3e6d5039 mm: add support for async page locking
Normally waiting for a page to become unlocked, or locking the page,
requires waiting for IO to complete. Add support for lock_page_async()
and wait_on_page_locked_async(), which are callback based instead. This
allows a caller to get notified when a page becomes unlocked, rather
than wait for it.

We add a new iocb field, ki_waitq, to pass in the necessary data for this
to happen. We can unionize this with ki_cookie, since that is only used
for polled IO. Polled IO can never co-exist with async callbacks, as it is
(by definition) polled completions. struct wait_page_key is made public,
and we define struct wait_page_async as the interface between the caller
and the core.

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-21 20:44:25 -06:00
Jens Axboe
c7510ab2cf mm: abstract out wake_page_match() from wake_page_function()
No functional changes in this patch, just in preparation for allowing
more callers.

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-21 20:44:25 -06:00
Jens Axboe
2e85abf053 mm: allow read-ahead with IOCB_NOWAIT set
The read-ahead shouldn't block, so allow it to be done even if
IOCB_NOWAIT is set in the kiocb.

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-21 20:44:25 -06:00
Jens Axboe
b63534c41e io_uring: re-issue block requests that failed because of resources
Mark the plug with nowait == true, which will cause requests to avoid
blocking on request allocation. If they do, we catch them and reissue
them from a task_work based handler.

Normally we can catch -EAGAIN directly, but the hard case is for split
requests. As an example, the application issues a 512KB request. The
block core will split this into 128KB if that's the max size for the
device. The first request issues just fine, but we run into -EAGAIN for
some latter splits for the same request. As the bio is split, we don't
get to see the -EAGAIN until one of the actual reads complete, and hence
we cannot handle it inline as part of submission.

This does potentially cause re-reads of parts of the range, as the whole
request is reissued. There's currently no better way to handle this.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-21 20:44:25 -06:00
Jens Axboe
4503b7676a io_uring: catch -EIO from buffered issue request failure
-EIO bubbles up like -EAGAIN if we fail to allocate a request at the
lower level. Play it safe and treat it like -EAGAIN in terms of sync
retry, to avoid passing back an errant -EIO.

Catch some of these early for block based file, as non-mq devices
generally do not support NOWAIT. That saves us some overhead by
not first trying, then retrying from async context. We can go straight
to async punt instead.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-21 20:44:25 -06:00
Jens Axboe
ac8691c415 io_uring: always plug for any number of IOs
Currently we only plug if we're doing more than two request. We're going
to be relying on always having the plug there to pass down information,
so plug unconditionally.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-21 20:44:25 -06:00
Jens Axboe
5a473e8311 block: provide plug based way of signaling forced no-wait semantics
Provide a way for the caller to specify that IO should be marked
with REQ_NOWAIT to avoid blocking on allocation.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-21 20:44:25 -06:00
Bijan Mottahedeh
2e0464d48f io_uring: separate reporting of ring pages from registered pages
Ring pages are not pinned so it is more appropriate to report them
as locked.

Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-21 20:44:01 -06:00
Bijan Mottahedeh
309758254e io_uring: report pinned memory usage
Report pinned memory usage always, regardless of whether locked memory
limit is enforced.

Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-21 20:44:01 -06:00
Bijan Mottahedeh
aad5d8da1b io_uring: rename ctx->account_mem field
Rename account_mem to limit_name to clarify its purpose.

Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-21 20:44:01 -06:00
Bijan Mottahedeh
a087e2b519 io_uring: add wrappers for memory accounting
Facilitate separation of locked memory usage reporting vs. limiting for
upcoming patches.  No functional changes.

Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
[axboe: kill unnecessary () around return in io_account_mem()]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-21 20:44:00 -06:00
Jiufei Xue
a31eb4a2f1 io_uring: use EPOLLEXCLUSIVE flag to aoid thundering herd type behavior
Applications can pass this flag in to avoid accept thundering herd.

Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-21 20:44:00 -06:00
Jiufei Xue
5769a351b8 io_uring: change the poll type to be 32-bits
poll events should be 32-bits to cover EPOLLEXCLUSIVE.

Explicit word-swap the poll32_events for big endian to make sure the ABI
is not changed.  We call this feature IORING_FEAT_POLL_32BITS,
applications who want to use EPOLLEXCLUSIVE should check the feature bit
first.

Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-21 20:44:00 -06:00
Linus Torvalds
48778464bb Linux 5.8-rc2 v5.8-rc2 2020-06-21 15:45:29 -07:00
Linus Torvalds
817d914d17 Merge tag 'selinux-pr-20200621' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull SELinux fixes from Paul Moore:
 "Three small patches to fix problems in the SELinux code, all found via
  clang.

  Two patches fix potential double-free conditions and one fixes an
  undefined return value"

* tag 'selinux-pr-20200621' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  selinux: fix undefined return of cond_evaluate_expr
  selinux: fix a double free in cond_read_node()/cond_read_list()
  selinux: fix double free
2020-06-21 15:41:24 -07:00
Linus Torvalds
16f4aa9b7c Merge tag 'pinctrl-v5.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
 "Some early fixes collected during the first week after the merge
  window, all pretty self-evident, with the details below. The revert is
  the crucial thing.

   - Fix a warning on the Qualcomm SPMI GPIO chip being instatiated
     twice without a unique irqchip struct

   - Use the noirq variants of the suspend and resume callbacks in the
     Tegra driver

   - Clean up the errorpath on the MCP23s08 driver

   - Revert the use of devm_of_iomap() in the Freescale driver as it was
     regressing the platform

   - Add some missing pins in the Qualcomm IPQ6018 driver

   - Fix a simple documentation bug in the pinctrl-single driver"

* tag 'pinctrl-v5.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: single: fix function name in documentation
  pinctrl: qcom: ipq6018 Add missing pins in qpic pin group
  Revert "pinctrl: freescale: imx: Use 'devm_of_iomap()' to avoid a resource leak in case of error in 'imx_pinctrl_probe()'"
  pinctrl: mcp23s08: Split to three parts: fix ptr_ret.cocci warnings
  pinctrl: tegra: Use noirq suspend/resume callbacks
  pinctrl: qcom: spmi-gpio: fix warning about irq chip reusage
2020-06-21 13:04:57 -07:00
Linus Torvalds
be9160a90d Merge tag 'kbuild-fixes-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:

 - fix -gz=zlib compiler option test for CONFIG_DEBUG_INFO_COMPRESSED

 - improve cc-option in scripts/Kbuild.include to clean up temp files

 - improve cc-option in scripts/Kconfig.include for more reliable
   compile option test

 - do not copy modules.builtin by 'make install' because it would break
   existing systems

 - use 'userprogs' syntax for watch_queue sample

* tag 'kbuild-fixes-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  samples: watch_queue: build sample program for target architecture
  Revert "Makefile: install modules.builtin even if CONFIG_MODULES=n"
  scripts: Fix typo in headers_install.sh
  kconfig: unify cc-option and as-option
  kbuild: improve cc-option to clean up all temporary files
  Makefile: Improve compressed debug info support detection
2020-06-21 12:44:52 -07:00
Linus Torvalds
7561393908 Merge tag 'powerpc-5.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:

 - One fix for the interrupt rework we did last release which broke
   KVM-PR

 - Three commits fixing some fallout from the READ_ONCE() changes
   interacting badly with our 8xx 16K pages support, which uses a pte_t
   that is a structure of 4 actual PTEs

 - A cleanup of the 8xx pte_update() to use the newly added pmd_off()

 - A fix for a crash when handling an oops if CONFIG_DEBUG_VIRTUAL is
   enabled

 - A minor fix for the SPU syscall generation

Thanks to Aneesh Kumar K.V, Christian Zigotzky, Christophe Leroy, Mike
Rapoport, Nicholas Piggin.

* tag 'powerpc-5.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/8xx: Provide ptep_get() with 16k pages
  mm: Allow arches to provide ptep_get()
  mm/gup: Use huge_ptep_get() in gup_hugepte()
  powerpc/syscalls: Use the number when building SPU syscall table
  powerpc/8xx: use pmd_off() to access a PMD entry in pte_update()
  powerpc/64s: Fix KVM interrupt using wrong save area
  powerpc: Fix kernel crash in show_instructions() w/DEBUG_VIRTUAL
2020-06-21 10:02:53 -07:00
Linus Torvalds
93bbca271a Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:

 - NULL dereference in octeontx

 - PM reference imbalance in ks-sa

 - deadlock in crypto manager

 - memory leak in drbg

 - missing socket limit check on receive SG list size in algif_skcipher

 - typos in caam

 - warnings in ccp and hisilicon

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: drbg - always try to free Jitter RNG instance
  crypto: marvell/octeontx - Fix a potential NULL dereference
  crypto: algboss - don't wait during notifier callback
  crypto: caam - fix typos
  crypto: ccp - Fix sparse warnings in sev-dev
  crypto: hisilicon - Cap block size at 2^31
  crypto: algif_skcipher - Cap recv SG list at ctx->used
  hwrng: ks-sa - Fix runtime PM imbalance on error
2020-06-21 10:01:03 -07:00
Masahiro Yamada
214377e9b7 samples: watch_queue: build sample program for target architecture
This userspace program includes UAPI headers exported to usr/include/.
'make headers' always works for the target architecture (i.e. the same
architecture as the kernel), so the sample program should be built for
the target as well. Kbuild now supports 'userprogs' for that.

I also guarded the CONFIG option by 'depends on CC_CAN_LINK' because
$(CC) may not provide libc.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-22 01:56:09 +09:00
Masahiro Yamada
2c6d9636ad Revert "Makefile: install modules.builtin even if CONFIG_MODULES=n"
This reverts commit e0b250b57d,
which broke build systems that need to install files to a certain
path, but do not set INSTALL_MOD_PATH when invoking 'make install'.

  $ make INSTALL_PATH=/tmp/destdir install
  mkdir: cannot create directory ‘/lib/modules/5.8.0-rc1+/’: Permission denied
  Makefile:1342: recipe for target '_builtin_inst_' failed
  make: *** [_builtin_inst_] Error 1

While modules.builtin is useful also for CONFIG_MODULES=n, this change
in the behavior is quite unexpected. Maybe "make modules_install"
can install modules.builtin irrespective of CONFIG_MODULES as Jonas
originally suggested.

Anyway, that commit should be reverted ASAP.

Reported-by: Douglas Anderson <dianders@chromium.org>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Cc: Jonas Karlman <jonas@kwiboo.se>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
2020-06-22 00:19:14 +09:00
Linus Torvalds
64677779e8 Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
 "One minor fix and two patches reworking the ata dma drain for the
  !CONFIG_LIBATA case. The latter is a 5.7 regression fix"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: Wire up ata_scsi_dma_need_drain for SAS HBA drivers
  scsi: libata: Provide an ata_scsi_dma_need_drain stub for !CONFIG_ATA
  scsi: ufs-bsg: Fix runtime PM imbalance on error
2020-06-20 19:23:13 -07:00
Linus Torvalds
a5c6a1f0fe Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:

 - a small collection of remaining API conversion patches (all acked)
   which allow to finally remove the deprecated API

 - some documentation fixes and a MAINTAINERS addition

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  MAINTAINERS: Add robert and myself as qcom i2c cci maintainers
  i2c: smbus: Fix spelling mistake in the comments
  Documentation/i2c: SMBus start signal is S not A
  i2c: remove deprecated i2c_new_device API
  Documentation: media: convert to use i2c_new_client_device()
  video: backlight: tosa_lcd: convert to use i2c_new_client_device()
  x86/platform/intel-mid: convert to use i2c_new_client_device()
  drm: encoder_slave: use new I2C API
  drm: encoder_slave: fix refcouting error for modules
2020-06-20 19:18:27 -07:00
Drew Fustini
25fae75215 pinctrl: single: fix function name in documentation
Use the correct the function name in the documentation for
"pcs_parse_one_pinctrl_entry()".

"smux_parse_one_pinctrl_entry()" appears to be an artifact from the
development of a prior patch series ("simple pinmux driver") which
transformed into pinctrl-single.

Signed-off-by: Drew Fustini <drew@beagleboard.org>
Link: https://lore.kernel.org/r/20200612112758.GA3407886@x1
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2020-06-20 22:41:32 +02:00
Linus Torvalds
8b6ddd10d6 Merge tag 'trace-v5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:

 - Have recordmcount work with > 64K sections (to support LTO)

 - kprobe RCU fixes

 - Correct a kprobe critical section with missing mutex

 - Remove redundant arch_disarm_kprobe() call

 - Fix lockup when kretprobe triggers within kprobe_flush_task()

 - Fix memory leak in fetch_op_data operations

 - Fix sleep in atomic in ftrace trace array sample code

 - Free up memory on failure in sample trace array code

 - Fix incorrect reporting of function_graph fields in format file

 - Fix quote within quote parsing in bootconfig

 - Fix return value of bootconfig tool

 - Add testcases for bootconfig tool

 - Fix maybe uninitialized warning in ftrace pid file code

 - Remove unused variable in tracing_iter_reset()

 - Fix some typos

* tag 'trace-v5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ftrace: Fix maybe-uninitialized compiler warning
  tools/bootconfig: Add testcase for show-command and quotes test
  tools/bootconfig: Fix to return 0 if succeeded to show the bootconfig
  tools/bootconfig: Fix to use correct quotes for value
  proc/bootconfig: Fix to use correct quotes for value
  tracing: Remove unused event variable in tracing_iter_reset
  tracing/probe: Fix memleak in fetch_op_data operations
  trace: Fix typo in allocate_ftrace_ops()'s comment
  tracing: Make ftrace packed events have align of 1
  sample-trace-array: Remove trace_array 'sample-instance'
  sample-trace-array: Fix sleeping function called from invalid context
  kretprobe: Prevent triggering kretprobe from within kprobe_flush_task
  kprobes: Remove redundant arch_disarm_kprobe() call
  kprobes: Fix to protect kick_kprobe_optimizer() by kprobe_mutex
  kprobes: Use non RCU traversal APIs on kprobe_tables if possible
  kprobes: Suppress the suspicious RCU warning on kprobes
  recordmcount: support >64k sections
2020-06-20 13:17:47 -07:00
Linus Torvalds
eede2b9b3f Merge tag 'libnvdimm-for-5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Dan Williams:
 "A feature (papr_scm health retrieval) and a fix (sysfs attribute
  visibility) for v5.8.

  Vaibhav explains in the merge commit below why missing v5.8 would be
  painful and I agreed to try a -rc2 pull because only cosmetics kept
  this out of -rc1 and his initial versions were posted in more than
  enough time for v5.8 consideration:

   'These patches are tied to specific features that were committed to
    customers in upcoming distros releases (RHEL and SLES) whose
    time-lines are tied to 5.8 kernel release.

    Being able to track the health of an nvdimm is critical for our
    customers that are running workloads leveraging papr-scm nvdimms.
    Missing the 5.8 kernel would mean missing the distro timelines and
    shifting forward the availability of this feature in distro kernels
    by at least 6 months'

  Summary:

   - Fix the visibility of the region 'align' attribute.

     The new unit tests for region alignment handling caught a corner
     case where the alignment cannot be specified if the region is
     converted from static to dynamic provisioning at runtime.

   - Add support for device health retrieval for the persistent memory
     supported by the papr_scm driver.

     This includes both the standard sysfs "health flags" that the nfit
     persistent memory driver publishes and a mechanism for the ndctl
     tool to retrieve a health-command payload"

* tag 'libnvdimm-for-5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  nvdimm/region: always show the 'align' attribute
  powerpc/papr_scm: Implement support for PAPR_PDSM_HEALTH
  ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods
  powerpc/papr_scm: Improve error logging and handling papr_scm_ndctl()
  powerpc/papr_scm: Fetch nvdimm health information from PHYP
  seq_buf: Export seq_buf_printf
  powerpc: Document details on H_SCM_HEALTH hcall
2020-06-20 13:13:21 -07:00
Sivaprakash Murugesan
7f5f4de83c pinctrl: qcom: ipq6018 Add missing pins in qpic pin group
The patch adds missing qpic data pins to qpic pingroup. These pins are
necessary for the qpic nand to work.

Fixes: ef1ea54eab ("pinctrl: qcom: Add ipq6018 pinctrl driver")
Signed-off-by: Sivaprakash Murugesan <sivaprak@codeaurora.org>
Link: https://lore.kernel.org/r/1592541089-17700-1-git-send-email-sivaprak@codeaurora.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2020-06-20 22:05:24 +02:00
Haibo Chen
13f2d25b95 Revert "pinctrl: freescale: imx: Use 'devm_of_iomap()' to avoid a resource leak in case of error in 'imx_pinctrl_probe()'"
This reverts commit ba40324261.

After commit 26d8cde526 ("pinctrl: freescale: imx: add shared
input select reg support"). i.MX7D has two iomux controllers
iomuxc and iomuxc-lpsr which share select_input register for
daisy chain settings.
If use 'devm_of_iomap()', when probe the iomuxc-lpsr, will call
devm_request_mem_region() for the region <0x30330000-0x3033ffff>
for the first time. Then, next time when probe the iomuxc, API
devm_platform_ioremap_resource() will also use the API
devm_request_mem_region() for the share region <0x30330000-0x3033ffff>
again, then cause issue, log like below:

[    0.179561] imx7d-pinctrl 302c0000.iomuxc-lpsr: initialized IMX pinctrl driver
[    0.191742] imx7d-pinctrl 30330000.pinctrl: can't request region for resource [mem 0x30330000-0x3033ffff]
[    0.191842] imx7d-pinctrl: probe of 30330000.pinctrl failed with error -16

Fixes: ba40324261 ("pinctrl: freescale: imx: Use 'devm_of_iomap()' to avoid a resource leak in case of error in 'imx_pinctrl_probe()'")
Signed-off-by: Haibo Chen <haibo.chen@nxp.com>
Reviewed-by: Dong Aisheng <aisheng.dong@nxp.com>
Link: https://lore.kernel.org/r/1591673223-1680-1-git-send-email-haibo.chen@nxp.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2020-06-20 22:04:39 +02:00
Linus Torvalds
1566feea45 Merge tag 's390-5.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Vasily Gorbik:

 - a few ptrace fixes mostly for strace and seccomp_bpf kernel tests
   findings

 - cleanup unused pm callbacks in virtio ccw

 - replace kmalloc + memset with kzalloc in crypto

 - use $(LD) for vDSO linkage to make clang happy

 - fix vDSO clock_getres() to preserve the same behaviour as
   posix_get_hrtimer_res()

 - fix workqueue cpumask warning when NUMA=n and nr_node_ids=2

 - reduce SLSB writes during input processing, improve warnings and
   cleanup qdio_data usage in qdio

 - a few fixes to use scnprintf() instead of snprintf()

* tag 's390-5.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390: fix syscall_get_error for compat processes
  s390/qdio: warn about unexpected SLSB states
  s390/qdio: clean up usage of qdio_data
  s390/numa: let NODES_SHIFT depend on NEED_MULTIPLE_NODES
  s390/vdso: fix vDSO clock_getres()
  s390/vdso: Use $(LD) instead of $(CC) to link vDSO
  s390/protvirt: use scnprintf() instead of snprintf()
  s390: use scnprintf() in sys_##_prefix##_##_name##_show
  s390/crypto: use scnprintf() instead of snprintf()
  s390/zcrypt: use kzalloc
  s390/virtio: remove unused pm callbacks
  s390/qdio: reduce SLSB writes during Input Queue processing
  selftests/seccomp: s390 shares the syscall and return value register
  s390/ptrace: fix setting syscall number
  s390/ptrace: pass invalid syscall numbers to tracing
  s390/ptrace: return -ENOSYS when invalid syscall is supplied
  s390/seccomp: pass syscall arguments via seccomp_data
  s390/qdio: fine-tune SLSB update
2020-06-20 12:31:08 -07:00
Linus Torvalds
7fdfbe08a2 Merge tag 'riscv-for-linus-5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:

 - a workaround for a compiler surprise related to the "r" inline
   assembly that allows LLVM to boot.

 - a fix to avoid WX-only mappings, which the ISA does not allow. While
   this probably manifests in many ways, the bug was found in stress-ng.

 - a missing lock in set_direct_map_*(), which due to a recent lockdep
   change started asserting.

* tag 'riscv-for-linus-5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  RISC-V: Acquire mmap lock before invoking walk_page_range
  RISC-V: Don't allow write+exec only page mapping request in mmap
  riscv/atomic: Fix sign extension for RV64I
2020-06-20 12:14:29 -07:00
Linus Torvalds
27c2760561 Merge tag 'linux-kselftest-5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest cleanups from Shuah Khan:

 - ftrace "requires:" list for simplifying and unifying requirement
   checks for each test case, adding "requires:" line instead of
   checking required ftrace interfaces in each test case.

 - a minor spelling correction patch

* tag 'linux-kselftest-5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests/ftrace: Support ":README" suffix for requires
  selftests/ftrace: Support ":tracer" suffix for requires
  selftests/ftrace: Convert check_filter_file() with requires list
  selftests/ftrace: Convert required interface checks into requires list
  selftests/ftrace: Add "requires:" list support
  selftests/ftrace: Return unsupported for the unconfigured features
  selftests/ftrace: Allow ":" in description
  tools: testing: ftrace: trigger: fix spelling mistake
2020-06-20 12:10:09 -07:00
David Howells
5481fc6eb8 afs: Fix hang on rmmod due to outstanding timer
The fileserver probe timer, net->fs_probe_timer, isn't cancelled when
the kafs module is being removed and so the count it holds on
net->servers_outstanding doesn't get dropped..

This causes rmmod to wait forever.  The hung process shows a stack like:

	afs_purge_servers+0x1b5/0x23c [kafs]
	afs_net_exit+0x44/0x6e [kafs]
	ops_exit_list+0x72/0x93
	unregister_pernet_operations+0x14c/0x1ba
	unregister_pernet_subsys+0x1d/0x2a
	afs_exit+0x29/0x6f [kafs]
	__do_sys_delete_module.isra.0+0x1a2/0x24b
	do_syscall_64+0x51/0x95
	entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fix this by:

 (1) Attempting to cancel the probe timer and, if successful, drop the
     count that the timer was holding.

 (2) Make the timer function just drop the count and not schedule the
     prober if the afs portion of net namespace is being destroyed.

Also, whilst we're at it, make the following changes:

 (3) Initialise net->servers_outstanding to 1 and decrement it before
     waiting on it so that it doesn't generate wake up events by being
     decremented to 0 until we're cleaning up.

 (4) Switch the atomic_dec() on ->servers_outstanding for ->fs_timer in
     afs_purge_servers() to use the helper function for that.

Fixes: f6cbb368bc ("afs: Actively poll fileservers to maintain NAT or firewall openings")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-20 12:01:58 -07:00
David Howells
f8ea5c7bce afs: Fix afs_do_lookup() to call correct fetch-status op variant
Fix afs_do_lookup()'s fallback case for when FS.InlineBulkStatus isn't
supported by the server.

In the fallback, it calls FS.FetchStatus for the specific vnode it's
meant to be looking up.  Commit b6489a49f7 broke this by renaming one
of the two identically-named afs_fetch_status_operation descriptors to
something else so that one of them could be made non-static.  The site
that used the renamed one, however, wasn't renamed and didn't produce
any warning because the other was declared in a header.

Fix this by making afs_do_lookup() use the renamed variant.

Note that there are two variants of the success method because one is
called from ->lookup() where we may or may not have an inode, but can't
call iget until after we've talked to the server - whereas the other is
called from within iget where we have an inode, but it may or may not be
initialised.

The latter variant expects there to be an inode, but because it's being
called from there former case, there might not be - resulting in an oops
like the following:

  BUG: kernel NULL pointer dereference, address: 00000000000000b0
  ...
  RIP: 0010:afs_fetch_status_success+0x27/0x7e
  ...
  Call Trace:
    afs_wait_for_operation+0xda/0x234
    afs_do_lookup+0x2fe/0x3c1
    afs_lookup+0x3c5/0x4bd
    __lookup_slow+0xcd/0x10f
    walk_component+0xa2/0x10c
    path_lookupat.isra.0+0x80/0x110
    filename_lookup+0x81/0x104
    vfs_statx+0x76/0x109
    __do_sys_newlstat+0x39/0x6b
    do_syscall_64+0x4c/0x78
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: b6489a49f7 ("afs: Fix silly rename")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-20 12:01:58 -07:00
Christophe Leroy
c0e1c8c22b powerpc/8xx: Provide ptep_get() with 16k pages
READ_ONCE() now enforces atomic read, which leads to:

  CC      mm/gup.o
In file included from ./include/linux/kernel.h:11:0,
                 from mm/gup.c:2:
In function 'gup_hugepte.constprop',
    inlined from 'gup_huge_pd.isra.79' at mm/gup.c:2465:8:
./include/linux/compiler.h:392:38: error: call to '__compiletime_assert_222' declared with attribute error: Unsupported access size for {READ,WRITE}_ONCE().
  _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
                                      ^
./include/linux/compiler.h:373:4: note: in definition of macro '__compiletime_assert'
    prefix ## suffix();    \
    ^
./include/linux/compiler.h:392:2: note: in expansion of macro '_compiletime_assert'
  _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
  ^
./include/linux/compiler.h:405:2: note: in expansion of macro 'compiletime_assert'
  compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \
  ^
./include/linux/compiler.h:291:2: note: in expansion of macro 'compiletime_assert_rwonce_type'
  compiletime_assert_rwonce_type(x);    \
  ^
mm/gup.c:2428:8: note: in expansion of macro 'READ_ONCE'
  pte = READ_ONCE(*ptep);
        ^
In function 'gup_get_pte',
    inlined from 'gup_pte_range' at mm/gup.c:2228:9,
    inlined from 'gup_pmd_range' at mm/gup.c:2613:15,
    inlined from 'gup_pud_range' at mm/gup.c:2641:15,
    inlined from 'gup_p4d_range' at mm/gup.c:2666:15,
    inlined from 'gup_pgd_range' at mm/gup.c:2694:15,
    inlined from 'internal_get_user_pages_fast' at mm/gup.c:2795:3:
./include/linux/compiler.h:392:38: error: call to '__compiletime_assert_219' declared with attribute error: Unsupported access size for {READ,WRITE}_ONCE().
  _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
                                      ^
./include/linux/compiler.h:373:4: note: in definition of macro '__compiletime_assert'
    prefix ## suffix();    \
    ^
./include/linux/compiler.h:392:2: note: in expansion of macro '_compiletime_assert'
  _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
  ^
./include/linux/compiler.h:405:2: note: in expansion of macro 'compiletime_assert'
  compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \
  ^
./include/linux/compiler.h:291:2: note: in expansion of macro 'compiletime_assert_rwonce_type'
  compiletime_assert_rwonce_type(x);    \
  ^
mm/gup.c:2199:9: note: in expansion of macro 'READ_ONCE'
  return READ_ONCE(*ptep);
         ^
make[2]: *** [mm/gup.o] Error 1

Define ptep_get() on 8xx when using 16k pages.

Fixes: 9e343b467c ("READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/341688399c1b102756046d19ea6ce39db1ae4742.1592225558.git.christophe.leroy@csgroup.eu
2020-06-20 22:14:54 +10:00
Christophe Leroy
481e980a7c mm: Allow arches to provide ptep_get()
Since commit 9e343b467c ("READ_ONCE: Enforce atomicity for
{READ,WRITE}_ONCE() memory accesses") it is not possible anymore to
use READ_ONCE() to access complex page table entries like the one
defined for powerpc 8xx with 16k size pages.

Define a ptep_get() helper that architectures can override instead
of performing a READ_ONCE() on the page table entry pointer.

Fixes: 9e343b467c ("READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/087fa12b6e920e32315136b998aa834f99242695.1592225558.git.christophe.leroy@csgroup.eu
2020-06-20 22:14:53 +10:00
Christophe Leroy
55ca22633a mm/gup: Use huge_ptep_get() in gup_hugepte()
gup_hugepte() reads hugepage table entries, it can't read
them directly, huge_ptep_get() must be used.

Fixes: 9e343b467c ("READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ffc3714334c3bfaca6f13788ad039e8759ae413f.1592225558.git.christophe.leroy@csgroup.eu
2020-06-20 22:14:53 +10:00
Dan Williams
9df24eaef8 Merge branch 'for-5.8/papr_scm' into libnvdimm-for-next
Include the papr_scm health retrieval feature for v5.8-rc2. The
functionality was initially posted well in advance of the merge window,
but review comments and a late build-bot warning kept them out of the
v5.8-rc1 libnvdimm pull request.

Vaibhav notes:
These patches are tied to specific features that were committed to
customers in upcoming distros releases (RHEL and SLES) whose time-lines
are tied to 5.8 kernel release.

Being able to track the health of an nvdimm is critical for our
customers that are running workloads leveraging papr-scm nvdimms.
Missing the 5.8 kernel would mean missing the distro timelines and
shifting forward the availability of this feature in distro kernels by
at least 6 months.
2020-06-19 14:18:51 -07:00
Linus Torvalds
4333a9b0b6 Merge tag 'io_uring-5.8-2020-06-19' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:

 - Catch a case where io_sq_thread() didn't do proper mm acquire

 - Ensure poll completions are reaped on shutdown

 - Async cancelation and run fixes (Pavel)

 - io-poll race fixes (Xiaoguang)

 - Request cleanup race fix (Xiaoguang)

* tag 'io_uring-5.8-2020-06-19' of git://git.kernel.dk/linux-block:
  io_uring: fix possible race condition against REQ_F_NEED_CLEANUP
  io_uring: reap poll completions while waiting for refs to drop on exit
  io_uring: acquire 'mm' for task_work for SQPOLL
  io_uring: add memory barrier to synchronize io_kiocb's result and iopoll_completed
  io_uring: don't fail links for EAGAIN error in IOPOLL mode
  io_uring: cancel by ->task not pid
  io_uring: lazy get task
  io_uring: batch cancel in io_uring_cancel_files()
  io_uring: cancel all task's requests on exit
  io-wq: add an option to cancel all matched reqs
  io-wq: reorder cancellation pending -> running
  io_uring: fix lazy work init
2020-06-19 13:16:58 -07:00
Linus Torvalds
d2b1c81f5f Merge tag 'block-5.8-2020-06-19' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:

 - Use import_uuid() where appropriate (Andy)

 - bcache fixes (Coly, Mauricio, Zhiqiang)

 - blktrace sparse warnings fix (Jan)

 - blktrace concurrent setup fix (Luis)

 - blkdev_get use-after-free fix (Jason)

 - Ensure all blk-mq maps are updated (Weiping)

 - Loop invalidate bdev fix (Zheng)

* tag 'block-5.8-2020-06-19' of git://git.kernel.dk/linux-block:
  block: make function 'kill_bdev' static
  loop: replace kill_bdev with invalidate_bdev
  partitions/ldm: Replace uuid_copy() with import_uuid() where it makes sense
  block: update hctx map when use multiple maps
  blktrace: Avoid sparse warnings when assigning q->blk_trace
  blktrace: break out of blktrace setup on concurrent calls
  block: Fix use-after-free in blkdev_get()
  trace/events/block.h: drop kernel-doc for dropped function parameter
  blk-mq: Remove redundant 'return' statement
  bcache: pr_info() format clean up in bcache_device_init()
  bcache: use delayed kworker fo asynchronous devices registration
  bcache: check and adjust logical block size for backing devices
  bcache: fix potential deadlock problem in btree_gc_coalesce
2020-06-19 13:11:26 -07:00