Commit Graph

104104 Commits

Author SHA1 Message Date
Christoph Hellwig
6e768461c2 block: remove bvec_to_phys
We only use it in biovec_phys_mergeable and a m68k paravirt driver,
so just opencode it there.  Also remove the pointless unsigned long cast
for the offset in the opencoded instances.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-24 12:33:59 -06:00
Christoph Hellwig
3dccdae54f block: merge BIOVEC_SEG_BOUNDARY into biovec_phys_mergeable
These two checks should always be performed together, so merge them into
a single helper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-24 12:33:57 -06:00
Christoph Hellwig
6a9f5f240a block: simplify BIOVEC_PHYS_MERGEABLE
Turn the macro into an inline, move it to blk.h and simplify the
arch hooks a bit.

Also rename the function to biovec_phys_mergeable as there is no need
to shout.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-24 12:33:54 -06:00
Christoph Hellwig
27ca1d4ed0 block: move req_gap_back_merge to blk.h
No need to expose these helpers outside the block layer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-24 12:33:53 -06:00
Christoph Hellwig
e9907009cb block: move req_gap_{back,front}_merge to blk-merge.c
Keep it close to the actual users instead of exposing the function to all
drivers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-24 12:33:51 -06:00
Christoph Hellwig
43b729bfe9 block: move integrity_req_gap_{back,front}_merge to blk.h
No need to expose these to drivers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-24 12:33:50 -06:00
Dennis Zhou (Facebook)
101246ec02 blkcg: rename blkg_try_get to blkg_tryget
blkg reference counting now uses percpu_ref rather than atomic_t. Let's
make this consistent with css_tryget. This renames blkg_try_get to
blkg_tryget and now returns a bool rather than the blkg or NULL.

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-21 20:29:19 -06:00
Dennis Zhou (Facebook)
b3b9f24f5f blkcg: change blkg reference counting to use percpu_ref
Now that every bio is associated with a blkg, this puts the use of
blkg_get, blkg_try_get, and blkg_put on the hot path. This switches over
the refcnt in blkg to use percpu_ref.

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-21 20:29:18 -06:00
Dennis Zhou (Facebook)
e2b0989954 blkcg: cleanup and make blk_get_rl use blkg_lookup_create
blk_get_rl is responsible for identifying which request_list a request
should be allocated to. Try get logic was added earlier, but
semantically the logic was not changed.

This patch makes better use of the bio already having a reference to the
blkg in the hot path. The cold path uses a better fallback of
blkg_lookup_create rather than just blkg_lookup and then falling back to
the q->root_rl. If lookup_create fails with anything but -ENODEV, it
falls back to q->root_rl.

A clarifying comment is added to explain why q->root_rl is used rather
than the root blkg's rl.

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-21 20:29:16 -06:00
Dennis Zhou (Facebook)
f0fcb3ec89 blkcg: remove additional reference to the css
The previous patch in this series removed carrying around a pointer to
the css in blkg. However, the blkg association logic still relied on
taking a reference on the css to ensure we wouldn't fail in getting a
reference for the blkg.

Here the implicit dependency on the css is removed. The association
continues to rely on the tryget logic walking up the blkg tree. This
streamlines the three ways that association can happen: normal, swap,
and writeback.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-21 20:29:15 -06:00
Dennis Zhou (Facebook)
c839e7a03f blkcg: remove bio->bi_css and instead use bio->bi_blkg
Prior patches ensured that all bios are now associated with some blkg.
This now makes bio->bi_css unnecessary as blkg maintains a reference to
the blkcg already.

This patch removes the field bi_css and transfers corresponding uses to
access via bi_blkg.

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-21 20:29:13 -06:00
Dennis Zhou (Facebook)
bdc2491708 blkcg: associate writeback bios with a blkg
One of the goals of this series is to remove a separate reference to
the css of the bio. This can and should be accessed via bio_blkcg. In
this patch, the wbc_init_bio call is changed such that it must be called
after a queue has been associated with the bio.

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-21 20:29:11 -06:00
Dennis Zhou (Facebook)
74b7c02a9b blkcg: associate a blkg for pages being evicted by swap
A prior patch in this series added blkg association to bios issued by
cgroups. There are two other paths that we want to attribute work back
to the appropriate cgroup: swap and writeback. Here we modify the way
swap tags bios to include the blkg. Writeback will be tackle in the next
patch.

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-21 20:29:09 -06:00
Dennis Zhou (Facebook)
5bf9a1f3b4 blkcg: consolidate bio_issue_init to be a part of core
bio_issue_init among other things initializes the timestamp for an IO.
Rather than have this logic handled by policies, this consolidates it to
be on the init paths (normal, clone, bounce clone).

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-21 20:29:08 -06:00
Dennis Zhou (Facebook)
a7b39b4e96 blkcg: always associate a bio with a blkg
Previously, blkg's were only assigned as needed by blk-iolatency and
blk-throttle. bio->css was also always being associated while blkg was
being looked up and then thrown away in blkcg_bio_issue_check.

This patch begins the cleanup of bio->css and bio->bi_blkg by always
associating a blkg in blkcg_bio_issue_check. This tries to create the
blkg, but if it is not possible, falls back to using the root_blkg of
the request_queue. Therefore, a bio will always be associated with a
blkg. The duplicate association logic is removed from blk-throttle and
blk-iolatency.

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-21 20:29:06 -06:00
Dennis Zhou (Facebook)
07b05bcc32 blkcg: convert blkg_lookup_create to find closest blkg
There are several scenarios where blkg_lookup_create can fail. Examples
include the blkcg dying, request_queue is dying, or simply being OOM. At
the end of the day, most handle this by simply falling back to the
q->root_blkg and calling it a day.

This patch implements the notion of closest blkg. During
blkg_lookup_create, if it fails to create, return the closest blkg
found or the q->root_blkg. blkg_try_get_closest is introduced and used
during association so a bio is always attached to a blkg.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-21 20:29:05 -06:00
Dennis Zhou (Facebook)
49f4c2dc2b blkcg: update blkg_lookup_create to do locking
To know when to create a blkg, the general pattern is to do a
blkg_lookup and if that fails, lock and then do a lookup again and if
that fails finally create. It doesn't make much sense for everyone who
wants to do creation to write this themselves.

This changes blkg_lookup_create to do locking and implement this
pattern. The old blkg_lookup_create is renamed to __blkg_lookup_create.
If a call site wants to do its own error handling or already owns the
queue lock, they can use __blkg_lookup_create. This will be used in
upcoming patches.

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-21 20:29:03 -06:00
Dennis Zhou (Facebook)
27e6fa996c blkcg: fix ref count issue with bio_blkcg using task_css
The accessor function bio_blkcg either returns the blkcg associated with
the bio or finds one in the current context. This can cause an issue
when trying to associate a bio with a blkcg. Particularly, it's the
third case that is problematic:

	return css_to_blkcg(task_css(current, io_cgrp_id));

As the above may race against task migration and the cgroup exiting, it
is not always ok to take a reference on the blkcg returned from
bio_blkcg.

This patch adds association ahead of calling bio_blkcg rather than
after. This makes association a required and explicit step along the
code paths for calling bio_blkcg. blk_get_rl is modified as well to get
a reference to the blkcg it may use and blk_put_rl will always put the
reference back. Association is also moved above the bio_blkcg call to
ensure it will not return NULL in blk-iolatency.

BFQ and CFQ utilize this flaw, but due to the complexity, I do not want
to address this in this series. I've created a private version of the
function with notes not to use it describing the flaw. Hopefully soon,
that code can be cleaned up.

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-21 20:29:02 -06:00
Ming Lei
7759eb23fd block: remove bio_rewind_iter()
It is pointed that bio_rewind_iter() is one very bad API[1]:

1) bio size may not be restored after rewinding

2) it causes some bogus change, such as 5151842b9d (block: reset
bi_iter.bi_done after splitting bio)

3) rewinding really makes things complicated wrt. bio splitting

4) unnecessary updating of .bi_done in fast path

[1] https://marc.info/?t=153549924200005&r=1&w=2

So this patch takes Kent's suggestion to restore one bio into its original
state via saving bio iterator(struct bvec_iter) in bio_integrity_prep(),
given now bio_rewind_iter() is only used by bio integrity code.

Cc: Dmitry Monakhov <dmonakhov@openvz.org>
Cc: Hannes Reinecke <hare@suse.com>
Suggested-by: Kent Overstreet <kent.overstreet@gmail.com>
Acked-by: Kent Overstreet <kent.overstreet@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-06 15:12:24 -06:00
Linus Torvalds
ca16eb342e Merge tag 'for-linus-20180906' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "Small collection of fixes that should go into this release. This
  contains:

   - Small series that fixes a race between blkcg teardown and writeback
     (Dennis Zhou)

   - Fix disallowing invalid block size settings from the nbd ioctl (me)

   - BFQ fix for a use-after-free on last release of a bfqg (Konstantin
     Khlebnikov)

   - Fix for the "don't warn for flush" fix (Mikulas)"

* tag 'for-linus-20180906' of git://git.kernel.dk/linux-block:
  block: bfq: swap puts in bfqg_and_blkg_put
  block: don't warn when doing fsync on read-only devices
  nbd: don't allow invalid blocksize settings
  blkcg: use tryget logic when associating a blkg with a bio
  blkcg: delay blkg destruction until after writeback has finished
  Revert "blk-throttle: fix race between blkcg_bio_issue_check() and cgroup_rmdir()"
2018-09-06 14:01:15 -07:00
Linus Torvalds
be65e2595b Merge tag 'trace-v4.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
 "This fixes two annoying bugs:

   - The first one is a side effect caused by using SRCU for rcuidle
     tracepoints. It seems that the perf was depending on the rcuidle
     tracepoints to make RCU watch when it wasn't.

     The real fix will be to have perf use SRCU instead of depending on
     RCU watching, but that can't be done until SRCU is safe to use in
     NMI context (Paul's working on that).

   - The second bug fix is for a bug that's been periodically making my
     tests fail randomly for some time. I haven't had time to track it
     down, but finally have. It has to do with stressing NMIs (via perf)
     while enabling or disabling ftrace function handling with lockdep
     enabled.

     If an interrupt happens and just as it returns, it sets lockdep
     back to "interrupts enabled" but before it returns an NMI is
     triggered, and if this happens while printk_nmi_enter has a
     breakpoint attached to it (because ftrace is converting it to or
     from nop to call fentry), the breakpoint trap also calls into
     lockdep, and since returning from the NMI to a interrupt handler,
     interrupts were disabled when the NMI went off, lockdep keeps its
     state as interrupts disabled when it returns back from the
     interrupt handler where interrupts are enabled.

     This causes lockdep_assert_irqs_enabled() to trigger a false
     positive"

* tag 'trace-v4.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  printk/tracing: Do not trace printk_nmi_enter()
  tracing: Add back in rcu_irq_enter/exit_irqson() for rcuidle tracepoints
2018-09-06 09:06:49 -07:00
Steven Rostedt (VMware)
865e63b04e tracing: Add back in rcu_irq_enter/exit_irqson() for rcuidle tracepoints
Borislav reported the following splat:

 =============================
 WARNING: suspicious RCU usage
 4.19.0-rc1+ #1 Not tainted
 -----------------------------
 ./include/linux/rcupdate.h:631 rcu_read_lock() used illegally while idle!
 other info that might help us debug this:

 RCU used illegally from idle CPU!
 rcu_scheduler_active = 2, debug_locks = 1
 RCU used illegally from extended quiescent state!
 1 lock held by swapper/0/0:
  #0: 000000004557ee0e (rcu_read_lock){....}, at: perf_event_output_forward+0x0/0x130

 stack backtrace:
 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0-rc1+ #1
 Hardware name: LENOVO 2320CTO/2320CTO, BIOS G2ET86WW (2.06 ) 11/13/2012
 Call Trace:
  dump_stack+0x85/0xcb
  perf_event_output_forward+0xf6/0x130
  __perf_event_overflow+0x52/0xe0
  perf_swevent_overflow+0x91/0xb0
  perf_tp_event+0x11a/0x350
  ? find_held_lock+0x2d/0x90
  ? __lock_acquire+0x2ce/0x1350
  ? __lock_acquire+0x2ce/0x1350
  ? retint_kernel+0x2d/0x2d
  ? find_held_lock+0x2d/0x90
  ? tick_nohz_get_sleep_length+0x83/0xb0
  ? perf_trace_cpu+0xbb/0xd0
  ? perf_trace_buf_alloc+0x5a/0xa0
  perf_trace_cpu+0xbb/0xd0
  cpuidle_enter_state+0x185/0x340
  do_idle+0x1eb/0x260
  cpu_startup_entry+0x5f/0x70
  start_kernel+0x49b/0x4a6
  secondary_startup_64+0xa4/0xb0

This is due to the tracepoints moving to SRCU usage which does not require
RCU to be "watching". But perf uses these tracepoints with RCU and expects
it to be. Hence, we still need to add in the rcu_irq_enter/exit_irqson()
calls for "rcuidle" tracepoints. This is a temporary fix until we have SRCU
working in NMI context, and then perf can be converted to use that instead
of normal RCU.

Link: http://lkml.kernel.org/r/20180904162611.6a120068@gandalf.local.home

Cc: x86-ml <x86@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Reported-by: Borislav Petkov <bp@alien8.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Fixes: e6753f23d9 ("tracepoint: Make rcuidle tracepoint callers use SRCU")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-09-05 11:23:21 -04:00
Linus Torvalds
0e9b103950 Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "17 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  nilfs2: convert to SPDX license tags
  drivers/dax/device.c: convert variable to vm_fault_t type
  lib/Kconfig.debug: fix three typos in help text
  checkpatch: add __ro_after_init to known $Attribute
  mm: fix BUG_ON() in vmf_insert_pfn_pud() from VM_MIXEDMAP removal
  uapi/linux/keyctl.h: don't use C++ reserved keyword as a struct member name
  memory_hotplug: fix kernel_panic on offline page processing
  checkpatch: add optional static const to blank line declarations test
  ipc/shm: properly return EIDRM in shm_lock()
  mm/hugetlb: filter out hugetlb pages if HUGEPAGE migration is not supported.
  mm/util.c: improve kvfree() kerneldoc
  tools/vm/page-types.c: fix "defined but not used" warning
  tools/vm/slabinfo.c: fix sign-compare warning
  kmemleak: always register debugfs file
  mm: respect arch_dup_mmap() return value
  mm, oom: fix missing tlb_finish_mmu() in __oom_reap_task_mm().
  mm: memcontrol: print proper OOM header when no eligible victim left
2018-09-04 17:01:11 -07:00
Randy Dunlap
8a2336e549 uapi/linux/keyctl.h: don't use C++ reserved keyword as a struct member name
Since this header is in "include/uapi/linux/", apparently people want to
use it in userspace programs -- even in C++ ones.  However, the header
uses a C++ reserved keyword ("private"), so change that to "dh_private"
instead to allow the header file to be used in C++ userspace.

Fixes https://bugzilla.kernel.org/show_bug.cgi?id=191051
Link: http://lkml.kernel.org/r/0db6c314-1ef4-9bfa-1baa-7214dd2ee061@infradead.org
Fixes: ddbb411487 ("KEYS: Add KEYCTL_DH_COMPUTE command")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: David Howells <dhowells@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Mat Martineau <mathew.j.martineau@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-09-04 16:45:02 -07:00
Linus Torvalds
28619527b8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Must perform TXQ teardown before unregistering interfaces in
    mac80211, from Toke Høiland-Jørgensen.

 2) Don't allow creating mac80211_hwsim with less than one channel, from
    Johannes Berg.

 3) Division by zero in cfg80211, fix from Johannes Berg.

 4) Fix endian issue in tipc, from Haiqing Bai.

 5) BPF sockmap use-after-free fixes from Daniel Borkmann.

 6) Spectre-v1 in mac80211_hwsim, from Jinbum Park.

 7) Missing rhashtable_walk_exit() in tipc, from Cong Wang.

 8) Revert kvzalloc() conversion of AF_PACKET, it breaks mmap() when
    kvzalloc() tries to use kmalloc() pages. From Eric Dumazet.

 9) Fix deadlock in hv_netvsc, from Dexuan Cui.

10) Do not restart timewait timer on RST, from Florian Westphal.

11) Fix double lwstate refcount grab in ipv6, from Alexey Kodanev.

12) Unsolicit report count handling is off-by-one, fix from Hangbin Liu.

13) Sleep-in-atomic in cadence driver, from Jia-Ju Bai.

14) Respect ttl-inherit in ip6 tunnel driver, from Hangbin Liu.

15) Use-after-free in act_ife, fix from Cong Wang.

16) Missing hold to meta module in act_ife, from Vlad Buslov.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (91 commits)
  net: phy: sfp: Handle unimplemented hwmon limits and alarms
  net: sched: action_ife: take reference to meta module
  act_ife: fix a potential use-after-free
  net/mlx5: Fix SQ offset in QPs with small RQ
  tipc: correct spelling errors for tipc_topsrv_queue_evt() comments
  tipc: correct spelling errors for struct tipc_bc_base's comment
  bnxt_en: Do not adjust max_cp_rings by the ones used by RDMA.
  bnxt_en: Clean up unused functions.
  bnxt_en: Fix firmware signaled resource change logic in open.
  sctp: not traverse asoc trans list if non-ipv6 trans exists for ipv6_flowlabel
  sctp: fix invalid reference to the index variable of the iterator
  net/ibm/emac: wrong emac_calc_base call was used by typo
  net: sched: null actions array pointer before releasing action
  vhost: fix VHOST_GET_BACKEND_FEATURES ioctl request definition
  r8169: add support for NCube 8168 network card
  ip6_tunnel: respect ttl inherit for ip6tnl
  mac80211: shorten the IBSS debug messages
  mac80211: don't Tx a deauth frame if the AP forbade Tx
  mac80211: Fix station bandwidth setting after channel switch
  mac80211: fix a race between restart and CSA flows
  ...
2018-09-04 12:45:11 -07:00
David S. Miller
fc3e3bf55f Merge tag 'mac80211-for-davem-2018-09-03' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:

====================
Here are quite a large number of fixes, notably:
 * various A-MSDU building fixes (currently only affects mt76)
 * syzkaller & spectre fixes in hwsim
 * TXQ vs. teardown fix that was causing crashes
 * embed WMM info in reg rule, bad code here had been causing crashes
 * one compilation issue with fix from Arnd (rfkill-gpio includes)
 * fixes for a race and bad data during/after channel switch
 * nl80211: a validation fix, attribute type & unit fixes
along with other small fixes.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-03 22:12:02 -07:00
Gleb Fotengauer-Malinovskiy
c48300c92a vhost: fix VHOST_GET_BACKEND_FEATURES ioctl request definition
The _IOC_READ flag fits this ioctl request more because this request
actually only writes to, but doesn't read from userspace.
See NOTEs in include/uapi/asm-generic/ioctl.h for more information.

Fixes: 429711aec2 ("vhost: switch to use new message format")
Signed-off-by: Gleb Fotengauer-Malinovskiy <glebfm@altlinux.org>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-03 21:23:24 -07:00
Anthony Wong
9fd0e09a4e r8169: add support for NCube 8168 network card
This card identifies itself as:
  Ethernet controller [0200]: NCube Device [10ff:8168] (rev 06)
  Subsystem: TP-LINK Technologies Co., Ltd. Device [7470:3468]

Adding a new entry to rtl8169_pci_tbl makes the card work.

Link: http://launchpad.net/bugs/1788730
Signed-off-by: Anthony Wong <anthony.wong@ubuntu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-03 19:05:13 -07:00
Vinson Lee
59a03fea13 uapi: Fix linux/rds.h userspace compilation errors.
Include linux/in6.h for struct in6_addr.

/usr/include/linux/rds.h:156:18: error: field ‘laddr’ has incomplete type
  struct in6_addr laddr;
                  ^~~~~
/usr/include/linux/rds.h:157:18: error: field ‘faddr’ has incomplete type
  struct in6_addr faddr;
                  ^~~~~
/usr/include/linux/rds.h:178:18: error: field ‘laddr’ has incomplete type
  struct in6_addr laddr;
                  ^~~~~
/usr/include/linux/rds.h:179:18: error: field ‘faddr’ has incomplete type
  struct in6_addr faddr;
                  ^~~~~
/usr/include/linux/rds.h:198:18: error: field ‘bound_addr’ has incomplete type
  struct in6_addr bound_addr;
                  ^~~~~~~~~~
/usr/include/linux/rds.h:199:18: error: field ‘connected_addr’ has incomplete type
  struct in6_addr connected_addr;
                  ^~~~~~~~~~~~~~
/usr/include/linux/rds.h:219:18: error: field ‘local_addr’ has incomplete type
  struct in6_addr local_addr;
                  ^~~~~~~~~~
/usr/include/linux/rds.h:221:18: error: field ‘peer_addr’ has incomplete type
  struct in6_addr peer_addr;
                  ^~~~~~~~~
/usr/include/linux/rds.h:245:18: error: field ‘src_addr’ has incomplete type
  struct in6_addr src_addr;
                  ^~~~~~~~
/usr/include/linux/rds.h:246:18: error: field ‘dst_addr’ has incomplete type
  struct in6_addr dst_addr;
                  ^~~~~~~~

Fixes: b7ff8b1036 ("rds: Extend RDS API for IPv6 support")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-02 16:14:44 -07:00
Linus Torvalds
fd6868d82b Merge tag 'devicetree-fixes-for-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree updates from Rob Herring:
 "A couple of new helper functions in preparation for some tree wide
  clean-ups.

  I'm sending these new helpers now for rc2 in order to simplify the
  dependencies on subsequent cleanups across the tree in 4.20"

* tag 'devicetree-fixes-for-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  of: Add device_type access helper functions
  of: add node name compare helper functions
  of: add helper to lookup compatible child node
2018-09-02 10:56:01 -07:00
Dennis Zhou (Facebook)
59b57717ff blkcg: delay blkg destruction until after writeback has finished
Currently, blkcg destruction relies on a sequence of events:
  1. Destruction starts. blkcg_css_offline() is called and blkgs
     release their reference to the blkcg. This immediately destroys
     the cgwbs (writeback).
  2. With blkgs giving up their reference, the blkcg ref count should
     become zero and eventually call blkcg_css_free() which finally
     frees the blkcg.

Jiufei Xue reported that there is a race between blkcg_bio_issue_check()
and cgroup_rmdir(). To remedy this, blkg destruction becomes contingent
on the completion of all writeback associated with the blkcg. A count of
the number of cgwbs is maintained and once that goes to zero, blkg
destruction can follow. This should prevent premature blkg destruction
related to writeback.

The new process for blkcg cleanup is as follows:
  1. Destruction starts. blkcg_css_offline() is called which offlines
     writeback. Blkg destruction is delayed on the cgwb_refcnt count to
     avoid punting potentially large amounts of outstanding writeback
     to root while maintaining any ongoing policies. Here, the base
     cgwb_refcnt is put back.
  2. When the cgwb_refcnt becomes zero, blkcg_destroy_blkgs() is called
     and handles destruction of blkgs. This is where the css reference
     held by each blkg is released.
  3. Once the blkcg ref count goes to zero, blkcg_css_free() is called.
     This finally frees the blkg.

It seems in the past blk-throttle didn't do the most understandable
things with taking data from a blkg while associating with current. So,
the simplification and unification of what blk-throttle is doing caused
this.

Fixes: 08e18eab0c ("block: add bi_blkg to the bio for cgroups")
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Cc: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-31 14:48:56 -06:00
Dennis Zhou (Facebook)
6b06546206 Revert "blk-throttle: fix race between blkcg_bio_issue_check() and cgroup_rmdir()"
This reverts commit 4c6994806f.

Destroying blkgs is tricky because of the nature of the relationship. A
blkg should go away when either a blkcg or a request_queue goes away.
However, blkg's pin the blkcg to ensure they remain valid. To break this
cycle, when a blkcg is offlined, blkgs put back their css ref. This
eventually lets css_free() get called which frees the blkcg.

The above commit (4c6994806f) breaks this order of events by trying to
destroy blkgs in css_free(). As the blkgs still hold references to the
blkcg, css_free() is never called.

The race between blkcg_bio_issue_check() and cgroup_rmdir() will be
addressed in the following patch by delaying destruction of a blkg until
all writeback associated with the blkcg has been finished.

Fixes: 4c6994806f ("blk-throttle: fix race between blkcg_bio_issue_check() and cgroup_rmdir()")
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Cc: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-31 14:48:54 -06:00
Linus Torvalds
420f51f4ab Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
 "A few arm64 fixes came in this week, specifically fixing some nasty
  truncation of return values from firmware calls and resolving a
  VM_BUG_ON due to accessing uninitialised struct pages corresponding to
  NOMAP pages.

  Summary:

   - Fix typos in SVE documentation

   - Fix type-checking and implicit truncation for SMCCC calls

   - Force CONFIG_HOLES_IN_ZONE=y so that SLAB doesn't fall over NOMAP
     regions"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: mm: always enable CONFIG_HOLES_IN_ZONE
  arm/arm64: smccc-1.1: Handle function result as parameters
  arm/arm64: smccc-1.1: Make return values unsigned long
  Documentation/arm64/sve: Couple of improvements and typos
2018-08-31 09:20:30 -07:00
Linus Torvalds
754cf4b243 Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:

 - regression fixes for i801 and designware

 - better API and leak fix for releasing DMA safe buffers

 - better greppable strings for the bitbang algorithm

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: sh_mobile: fix leak when using DMA bounce buffer
  i2c: sh_mobile: define start_ch() void as it only returns 0 anyhow
  i2c: refactor function to release a DMA safe buffer
  i2c: algos: bit: make the error messages grepable
  i2c: designware: Re-init controllers with pm_disabled set on resume
  i2c: i801: Allow ACPI AML access I/O ports not reserved for SMBus
2018-08-31 08:38:53 -07:00
Rob Herring
0413bedabc of: Add device_type access helper functions
In preparation to remove direct access to device_node.type, add
of_node_is_type() and of_node_get_device_type() helpers to check and
retrieve the device type.

Cc: Frank Rowand <frowand.list@gmail.com>
Signed-off-by: Rob Herring <robh@kernel.org>
2018-08-31 08:30:42 -04:00
Wolfram Sang
82fe39a6bc i2c: refactor function to release a DMA safe buffer
a) rename to 'put' instead of 'release' to match 'get' when obtaining
   the buffer
b) change the argument order to have the buffer as first argument
c) add a new argument telling the function if the message was
   transferred. This allows the function to be used also in cases
   where setting up DMA failed, so the buffer needs to be freed without
   syncing to the message buffer.

Also convert the only user.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2018-08-30 23:13:15 +02:00
Rob Herring
f42b0e18f2 of: add node name compare helper functions
In preparation to remove device_node.name pointer, add helper functions
for node name comparisons which are a common pattern throughout the kernel.

Cc: Frank Rowand <frowand.list@gmail.com>
Signed-off-by: Rob Herring <robh@kernel.org>
2018-08-30 13:53:05 -05:00
Marc Zyngier
755a8bf557 arm/arm64: smccc-1.1: Handle function result as parameters
If someone has the silly idea to write something along those lines:

	extern u64 foo(void);

	void bar(struct arm_smccc_res *res)
	{
		arm_smccc_1_1_smc(0xbad, foo(), res);
	}

they are in for a surprise, as this gets compiled as:

	0000000000000588 <bar>:
	 588:   a9be7bfd        stp     x29, x30, [sp, #-32]!
	 58c:   910003fd        mov     x29, sp
	 590:   f9000bf3        str     x19, [sp, #16]
	 594:   aa0003f3        mov     x19, x0
	 598:   aa1e03e0        mov     x0, x30
	 59c:   94000000        bl      0 <_mcount>
	 5a0:   94000000        bl      0 <foo>
	 5a4:   aa0003e1        mov     x1, x0
	 5a8:   d4000003        smc     #0x0
	 5ac:   b4000073        cbz     x19, 5b8 <bar+0x30>
	 5b0:   a9000660        stp     x0, x1, [x19]
	 5b4:   a9010e62        stp     x2, x3, [x19, #16]
	 5b8:   f9400bf3        ldr     x19, [sp, #16]
	 5bc:   a8c27bfd        ldp     x29, x30, [sp], #32
	 5c0:   d65f03c0        ret
	 5c4:   d503201f        nop

The call to foo "overwrites" the x0 register for the return value,
and we end up calling the wrong secure service.

A solution is to evaluate all the parameters before assigning
anything to specific registers, leading to the expected result:

	0000000000000588 <bar>:
	 588:   a9be7bfd        stp     x29, x30, [sp, #-32]!
	 58c:   910003fd        mov     x29, sp
	 590:   f9000bf3        str     x19, [sp, #16]
	 594:   aa0003f3        mov     x19, x0
	 598:   aa1e03e0        mov     x0, x30
	 59c:   94000000        bl      0 <_mcount>
	 5a0:   94000000        bl      0 <foo>
	 5a4:   aa0003e1        mov     x1, x0
	 5a8:   d28175a0        mov     x0, #0xbad
	 5ac:   d4000003        smc     #0x0
	 5b0:   b4000073        cbz     x19, 5bc <bar+0x34>
	 5b4:   a9000660        stp     x0, x1, [x19]
	 5b8:   a9010e62        stp     x2, x3, [x19, #16]
	 5bc:   f9400bf3        ldr     x19, [sp, #16]
	 5c0:   a8c27bfd        ldp     x29, x30, [sp], #32
	 5c4:   d65f03c0        ret

Reported-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-08-30 14:18:03 +01:00
Linus Torvalds
af3a5fe4dd Merge tag 'hwmon-for-linus-v4.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:

 - Fix potential Spectre v1 in nct6775

 - Add error checking to adt7475 driver

 - Fix reading shunt resistor value in ina2xx driver

* tag 'hwmon-for-linus-v4.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (nct6775) Fix potential Spectre v1
  hwmon: (adt7475) Make adt7475_read_word() return errors
  hwmon: (adt7475) Potential error pointer dereferences
  hwmon: (ina2xx) fix sysfs shunt resistor read access
2018-08-29 16:03:45 -07:00
Linus Torvalds
f3f106dac0 Merge tag 'for_v4.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull misc fs fixes from Jan Kara:

 - make UDF to properly mount media created by Win7

 - make isofs to properly refuse devices with large physical block size

 - fix a Spectre gadget in quotactl(2)

 - fix a warning in fsnotify code hit by syzkaller

* tag 'for_v4.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  udf: Fix mounting of Win7 created UDF filesystems
  udf: Remove dead code from udf_find_fileset()
  fs/quota: Fix spectre gadget in do_quotactl
  fs/quota: Replace XQM_MAXQUOTAS usage with MAXQUOTAS
  isofs: reject hardware sector size > 2048 bytes
  fsnotify: fix false positive warning on inode delete
2018-08-29 14:56:45 -07:00
Johan Hovold
36156f9241 of: add helper to lookup compatible child node
Add of_get_compatible_child() helper that can be used to lookup
compatible child nodes.

Several drivers currently use of_find_compatible_node() to lookup child
nodes while failing to notice that the of_find_ functions search the
entire tree depth-first (from a given start node) and therefore can
match unrelated nodes. The fact that these functions also drop a
reference to the node they start searching from (e.g. the parent node)
is typically also overlooked, something which can lead to use-after-free
bugs.

Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Rob Herring <robh@kernel.org>
2018-08-29 08:06:46 -05:00
Marc Zyngier
1d8f574708 arm/arm64: smccc-1.1: Make return values unsigned long
An unfortunate consequence of having a strong typing for the input
values to the SMC call is that it also affects the type of the
return values, limiting r0 to 32 bits and r{1,2,3} to whatever
was passed as an input.

Let's turn everything into "unsigned long", which satisfies the
requirements of both architectures, and allows for the full
range of return values.

Reported-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-08-29 11:42:20 +01:00
Stanislaw Gruszka
38cb87ee47 cfg80211: make wmm_rule part of the reg_rule structure
Make wmm_rule be part of the reg_rule structure. This simplifies the
code a lot at the cost of having bigger memory usage. However in most
cases we have only few reg_rule's and when we do have many like in
iwlwifi we do not save memory as it allocates a separate wmm_rule for
each channel anyway.

This also fixes a bug reported in various places where somewhere the
pointers were corrupted and we ended up doing a null-dereference.

Fixes: 230ebaa189 ("cfg80211: read wmm rules from regulatory database")
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
[rephrase commit message slightly]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-08-28 11:11:47 +02:00
Linus Torvalds
050cdc6c95 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) ICE, E1000, IGB, IXGBE, and I40E bug fixes from the Intel folks.

 2) Better fix for AB-BA deadlock in packet scheduler code, from Cong
    Wang.

 3) bpf sockmap fixes (zero sized key handling, etc.) from Daniel
    Borkmann.

 4) Send zero IPID in TCP resets and SYN-RECV state ACKs, to prevent
    attackers using it as a side-channel. From Eric Dumazet.

 5) Memory leak in mediatek bluetooth driver, from Gustavo A. R. Silva.

 6) Hook up rt->dst.input of ipv6 anycast routes properly, from Hangbin
    Liu.

 7) hns and hns3 bug fixes from Huazhong Tan.

 8) Fix RIF leak in mlxsw driver, from Ido Schimmel.

 9) iova range check fix in vhost, from Jason Wang.

10) Fix hang in do_tcp_sendpages() with tls, from John Fastabend.

11) More r8152 chips need to disable RX aggregation, from Kai-Heng Feng.

12) Memory exposure in TCA_U32_SEL handling, from Kees Cook.

13) TCP BBR congestion control fixes from Kevin Yang.

14) hv_netvsc, ignore non-PCI devices, from Stephen Hemminger.

15) qed driver fixes from Tomer Tayar.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (77 commits)
  net: sched: Fix memory exposure from short TCA_U32_SEL
  qed: fix spelling mistake "comparsion" -> "comparison"
  vhost: correctly check the iova range when waking virtqueue
  qlge: Fix netdev features configuration.
  net: macb: do not disable MDIO bus at open/close time
  Revert "net: stmmac: fix build failure due to missing COMMON_CLK dependency"
  net: macb: Fix regression breaking non-MDIO fixed-link PHYs
  mlxsw: spectrum_switchdev: Do not leak RIFs when removing bridge
  i40e: fix condition of WARN_ONCE for stat strings
  i40e: Fix for Tx timeouts when interface is brought up if DCB is enabled
  ixgbe: fix driver behaviour after issuing VFLR
  ixgbe: Prevent unsupported configurations with XDP
  ixgbe: Replace GFP_ATOMIC with GFP_KERNEL
  igb: Replace mdelay() with msleep() in igb_integrated_phy_loopback()
  igb: Replace GFP_ATOMIC with GFP_KERNEL in igb_sw_init()
  igb: Use an advanced ctx descriptor for launchtime
  e1000: ensure to free old tx/rx rings in set_ringparam()
  e1000: check on netif_running() before calling e1000_up()
  ixgb: use dma_zalloc_coherent instead of allocator/memset
  ice: Trivial formatting fixes
  ...
2018-08-27 11:59:39 -07:00
Lothar Felten
3ad867001c hwmon: (ina2xx) fix sysfs shunt resistor read access
fix the sysfs shunt resistor read access: return the shunt resistor
value, not the calibration register contents.

update email address

Signed-off-by: Lothar Felten <lothar.felten@gmail.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2018-08-26 17:45:25 -07:00
Linus Torvalds
b933d6ebf2 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer update from Thomas Gleixner:
 "New defines for the compat time* types so they can be shared between
  32bit and 64bit builds. Not used yet, but merging them now allows the
  actual conversions to be merged through different maintainer trees
  without dependencies

  We still have compat interfaces for 32bit on 64bit even with the new
  2038 safe timespec/val variants because pointer size is different. And
  for the old style timespec/val interfaces we need yet another 'compat'
  interface for both 32bit native and 32bit on 64bit"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  y2038: Provide aliases for compat helpers
2018-08-26 13:39:05 -07:00
Linus Torvalds
aba16dc5cf Merge branch 'ida-4.19' of git://git.infradead.org/users/willy/linux-dax
Pull IDA updates from Matthew Wilcox:
 "A better IDA API:

      id = ida_alloc(ida, GFP_xxx);
      ida_free(ida, id);

  rather than the cumbersome ida_simple_get(), ida_simple_remove().

  The new IDA API is similar to ida_simple_get() but better named.  The
  internal restructuring of the IDA code removes the bitmap
  preallocation nonsense.

  I hope the net -200 lines of code is convincing"

* 'ida-4.19' of git://git.infradead.org/users/willy/linux-dax: (29 commits)
  ida: Change ida_get_new_above to return the id
  ida: Remove old API
  test_ida: check_ida_destroy and check_ida_alloc
  test_ida: Convert check_ida_conv to new API
  test_ida: Move ida_check_max
  test_ida: Move ida_check_leaf
  idr-test: Convert ida_check_nomem to new API
  ida: Start new test_ida module
  target/iscsi: Allocate session IDs from an IDA
  iscsi target: fix session creation failure handling
  drm/vmwgfx: Convert to new IDA API
  dmaengine: Convert to new IDA API
  ppc: Convert vas ID allocation to new IDA API
  media: Convert entity ID allocation to new IDA API
  ppc: Convert mmu context allocation to new IDA API
  Convert net_namespace to new IDA API
  cb710: Convert to new IDA API
  rsxx: Convert to new IDA API
  osd: Convert to new IDA API
  sd: Convert to new IDA API
  ...
2018-08-26 11:48:42 -07:00
Linus Torvalds
d207ea8e74 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Thomas Gleixner:
 "Kernel:
   - Improve kallsyms coverage
   - Add x86 entry trampolines to kcore
   - Fix ARM SPE handling
   - Correct PPC event post processing

  Tools:
   - Make the build system more robust
   - Small fixes and enhancements all over the place
   - Update kernel ABI header copies
   - Preparatory work for converting libtraceevnt to a shared library
   - License cleanups"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (100 commits)
  tools arch: Update arch/x86/lib/memcpy_64.S copy used in 'perf bench mem memcpy'
  tools arch x86: Update tools's copy of cpufeatures.h
  perf python: Fix pyrf_evlist__read_on_cpu() interface
  perf mmap: Store real cpu number in 'struct perf_mmap'
  perf tools: Remove ext from struct kmod_path
  perf tools: Add gzip_is_compressed function
  perf tools: Add lzma_is_compressed function
  perf tools: Add is_compressed callback to compressions array
  perf tools: Move the temp file processing into decompress_kmodule
  perf tools: Use compression id in decompress_kmodule()
  perf tools: Store compression id into struct dso
  perf tools: Add compression id into 'struct kmod_path'
  perf tools: Make is_supported_compression() static
  perf tools: Make decompress_to_file() function static
  perf tools: Get rid of dso__needs_decompress() call in __open_dso()
  perf tools: Get rid of dso__needs_decompress() call in symbol__disassemble()
  perf tools: Get rid of dso__needs_decompress() call in read_object_code()
  tools lib traceevent: Change to SPDX License format
  perf llvm: Allow passing options to llc in addition to clang
  perf parser: Improve error message for PMU address filters
  ...
2018-08-26 11:25:21 -07:00
Linus Torvalds
2923b27e54 Merge tag 'libnvdimm-for-4.19_dax-memory-failure' of gitolite.kernel.org:pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm memory-failure update from Dave Jiang:
 "As it stands, memory_failure() gets thoroughly confused by dev_pagemap
  backed mappings. The recovery code has specific enabling for several
  possible page states and needs new enabling to handle poison in dax
  mappings.

  In order to support reliable reverse mapping of user space addresses:

   1/ Add new locking in the memory_failure() rmap path to prevent races
      that would typically be handled by the page lock.

   2/ Since dev_pagemap pages are hidden from the page allocator and the
      "compound page" accounting machinery, add a mechanism to determine
      the size of the mapping that encompasses a given poisoned pfn.

   3/ Given pmem errors can be repaired, change the speculatively
      accessed poison protection, mce_unmap_kpfn(), to be reversible and
      otherwise allow ongoing access from the kernel.

  A side effect of this enabling is that MADV_HWPOISON becomes usable
  for dax mappings, however the primary motivation is to allow the
  system to survive userspace consumption of hardware-poison via dax.
  Specifically the current behavior is:

     mce: Uncorrected hardware memory error in user-access at af34214200
     {1}[Hardware Error]: It has been corrected by h/w and requires no further action
     mce: [Hardware Error]: Machine check events logged
     {1}[Hardware Error]: event severity: corrected
     Memory failure: 0xaf34214: reserved kernel page still referenced by 1 users
     [..]
     Memory failure: 0xaf34214: recovery action for reserved kernel page: Failed
     mce: Memory error not recovered
     <reboot>

  ...and with these changes:

     Injecting memory failure for pfn 0x20cb00 at process virtual address 0x7f763dd00000
     Memory failure: 0x20cb00: Killing dax-pmd:5421 due to hardware memory corruption
     Memory failure: 0x20cb00: recovery action for dax page: Recovered

  Given all the cross dependencies I propose taking this through
  nvdimm.git with acks from Naoya, x86/core, x86/RAS, and of course dax
  folks"

* tag 'libnvdimm-for-4.19_dax-memory-failure' of gitolite.kernel.org:pub/scm/linux/kernel/git/nvdimm/nvdimm:
  libnvdimm, pmem: Restore page attributes when clearing errors
  x86/memory_failure: Introduce {set, clear}_mce_nospec()
  x86/mm/pat: Prepare {reserve, free}_memtype() for "decoy" addresses
  mm, memory_failure: Teach memory_failure() about dev_pagemap pages
  filesystem-dax: Introduce dax_lock_mapping_entry()
  mm, memory_failure: Collect mapping size in collect_procs()
  mm, madvise_inject_error: Let memory_failure() optionally take a page reference
  mm, dev_pagemap: Do not clear ->mapping on final put
  mm, madvise_inject_error: Disable MADV_SOFT_OFFLINE for ZONE_DEVICE pages
  filesystem-dax: Set page->index
  device-dax: Set page->index
  device-dax: Enable page_mapping()
  device-dax: Convert to vmf_insert_mixed and vm_fault_t
2018-08-25 18:43:59 -07:00
Linus Torvalds
b326272010 Merge tag 'armsoc-late' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC late updates from Olof Johansson:
 "A couple of late-merged changes that would be useful to get in this
  merge window:

   - Driver support for reset of audio complex on Meson platforms. The
     audio driver went in this merge window, and these changes have been
     in -next for a while (just not in our tree).

   - Power management fixes for IOMMU on Rockchip platforms, getting
     closer to kexec working on them, including Chromebooks.

   - Another pass updating "arm,psci" -> "psci" for some properties that
     have snuck in since last time it was done"

* tag 'armsoc-late' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  iommu/rockchip: Move irq request past pm_runtime_enable
  iommu/rockchip: Handle errors returned from PM framework
  arm64: rockchip: Force CONFIG_PM on Rockchip systems
  ARM: rockchip: Force CONFIG_PM on Rockchip systems
  arm64: dts: Fix various entry-method properties to reflect documentation
  reset: imx7: Fix always writing bits as 0
  reset: meson: add meson audio arb driver
  reset: meson: add dt-bindings for meson-axg audio arb
2018-08-25 14:12:36 -07:00