Commit Graph

855367 Commits

Author SHA1 Message Date
Linus Torvalds
a29a0a467e Merge branch 'access-creds'
The access() (and faccessat()) credentials change can cause an
unnecessary load on the RCU machinery because every access() call ends
up freeing the temporary access credential using RCU.

This isn't really noticeable on small machines, but if you have hundreds
of cores you can cause huge slowdowns due to RCU storms.

It's easy to avoid: the temporary access crededntials aren't actually
normally accessed using RCU at all, so we can avoid the whole issue by
just marking them as such.

* access-creds:
  access: avoid the RCU grace period for the temporary subjective credentials
2019-07-25 08:36:29 -07:00
Linus Torvalds
d7852fbd0f access: avoid the RCU grace period for the temporary subjective credentials
It turns out that 'access()' (and 'faccessat()') can cause a lot of RCU
work because it installs a temporary credential that gets allocated and
freed for each system call.

The allocation and freeing overhead is mostly benign, but because
credentials can be accessed under the RCU read lock, the freeing
involves a RCU grace period.

Which is not a huge deal normally, but if you have a lot of access()
calls, this causes a fair amount of seconday damage: instead of having a
nice alloc/free patterns that hits in hot per-CPU slab caches, you have
all those delayed free's, and on big machines with hundreds of cores,
the RCU overhead can end up being enormous.

But it turns out that all of this is entirely unnecessary.  Exactly
because access() only installs the credential as the thread-local
subjective credential, the temporary cred pointer doesn't actually need
to be RCU free'd at all.  Once we're done using it, we can just free it
synchronously and avoid all the RCU overhead.

So add a 'non_rcu' flag to 'struct cred', which can be set by users that
know they only use it in non-RCU context (there are other potential
users for this).  We can make it a union with the rcu freeing list head
that we need for the RCU case, so this doesn't need any extra storage.

Note that this also makes 'get_current_cred()' clear the new non_rcu
flag, in case we have filesystems that take a long-term reference to the
cred and then expect the RCU delayed freeing afterwards.  It's not
entirely clear that this is required, but it makes for clear semantics:
the subjective cred remains non-RCU as long as you only access it
synchronously using the thread-local accessors, but you _can_ use it as
a generic cred if you want to.

It is possible that we should just remove the whole RCU markings for
->cred entirely.  Only ->real_cred is really supposed to be accessed
through RCU, and the long-term cred copies that nfs uses might want to
explicitly re-enable RCU freeing if required, rather than have
get_current_cred() do it implicitly.

But this is a "minimal semantic changes" change for the immediate
problem.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Jan Glauber <jglauber@marvell.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Jayachandran Chandrasekharan Nair <jnair@marvell.com>
Cc: Greg KH <greg@kroah.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-24 10:12:09 -07:00
Linus Torvalds
bed38c3e2d Merge tag 'powerpc-5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
 "An assortment of non-regression fixes that have accumulated since the
  start of the merge window.

   - A fix for a user triggerable oops on machines where transactional
     memory is disabled, eg. Power9 bare metal, Power8 with TM disabled
     on the command line, or all Power7 or earlier machines.

   - Three fixes for handling of PMU and power saving registers when
     running nested KVM on Power9.

   - Two fixes for bugs found while stress testing the XIVE interrupt
     controller code, also on Power9.

   - A fix to allow guests to boot under Qemu/KVM on Power9 using the
     the Hash MMU with >= 1TB of memory.

   - Two fixes for bugs in the recent DMA cleanup, one of which could
     lead to checkstops.

   - And finally three fixes for the PAPR SCM nvdimm driver.

  Thanks to: Alexey Kardashevskiy, Andrea Arcangeli, Cédric Le Goater,
  Christoph Hellwig, David Gibson, Gautham R. Shenoy, Michael Neuling,
  Oliver O'Halloran, Satheesh Rajendran, Shawn Anastasio, Suraj Jitindar
  Singh, Vaibhav Jain"

* tag 'powerpc-5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/papr_scm: Force a scm-unbind if initial scm-bind fails
  powerpc/papr_scm: Update drc_pmem_unbind() to use H_SCM_UNBIND_ALL
  powerpc/pseries: Update SCM hcall op-codes in hvcall.h
  powerpc/tm: Fix oops on sigreturn on systems without TM
  powerpc/dma: Fix invalid DMA mmap behavior
  KVM: PPC: Book3S HV: XIVE: fix rollback when kvmppc_xive_create fails
  powerpc/xive: Fix loop exit-condition in xive_find_target_in_mask()
  powerpc: fix off by one in max_zone_pfn initialization for ZONE_DMA
  KVM: PPC: Book3S HV: Save and restore guest visible PSSCR bits on pseries
  powerpc/pmu: Set pmcregs_in_use in paca when running as LPAR
  KVM: PPC: Book3S HV: Always save guest pmu for guest capable of nesting
  powerpc/mm: Limit rma_size to 1TB when running without HV mode
2019-07-24 09:58:39 -07:00
Linus Torvalds
7626077457 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
 "Bugfixes, a pvspinlock optimization, and documentation moving"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: X86: Boost queue head vCPU to mitigate lock waiter preemption
  Documentation: move Documentation/virtual to Documentation/virt
  KVM: nVMX: Set cached_vmcs12 and cached_shadow_vmcs12 NULL after free
  KVM: X86: Dynamically allocate user_fpu
  KVM: X86: Fix fpu state crash in kvm guest
  Revert "kvm: x86: Use task structs fpu field for user"
  KVM: nVMX: Clear pending KVM_REQ_GET_VMCS12_PAGES when leaving nested
2019-07-24 09:46:13 -07:00
Linus Torvalds
c2626876c2 Merge tag 'dma-mapping-5.3-2' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping regression fix from Christoph Hellwig:
 "Ensure that dma_addressing_limited doesn't crash on devices without a
  dma mask (Eric Auger)"

* tag 'dma-mapping-5.3-2' of git://git.infradead.org/users/hch/dma-mapping:
  dma-mapping: use dma_get_mask in dma_addressing_limited
2019-07-24 09:28:55 -07:00
Wanpeng Li
266e85a5ec KVM: X86: Boost queue head vCPU to mitigate lock waiter preemption
Commit 11752adb (locking/pvqspinlock: Implement hybrid PV queued/unfair locks)
introduces hybrid PV queued/unfair locks
 - queued mode (no starvation)
 - unfair mode (good performance on not heavily contended lock)
The lock waiter goes into the unfair mode especially in VMs with over-commit
vCPUs since increaing over-commitment increase the likehood that the queue
head vCPU may have been preempted and not actively spinning.

However, reschedule queue head vCPU timely to acquire the lock still can get
better performance than just depending on lock stealing in over-subscribe
scenario.

Testing on 80 HT 2 socket Xeon Skylake server, with 80 vCPUs VM 80GB RAM:
ebizzy -M
             vanilla     boosting    improved
 1VM          23520        25040         6%
 2VM           8000        13600        70%
 3VM           3100         5400        74%

The lock holder vCPU yields to the queue head vCPU when unlock, to boost queue
head vCPU which is involuntary preemption or the one which is voluntary halt
due to fail to acquire the lock after a short spin in the guest.

Cc: Waiman Long <longman@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-24 13:56:53 +02:00
Christoph Hellwig
2f5947dfca Documentation: move Documentation/virtual to Documentation/virt
Renaming docs seems to be en vogue at the moment, so fix on of the
grossly misnamed directories.  We usually never use "virtual" as
a shortcut for virtualization in the kernel, but always virt,
as seen in the virt/ top-level directory.  Fix up the documentation
to match that.

Fixes: ed16648eb5 ("Move kvm, uml, and lguest subdirectories under a common "virtual" directory, I.E:")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-24 10:52:11 +02:00
Linus Torvalds
ad5e427e0f Merge branch 'parisc-5.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fixes from Helge Deller:

 - Fix build issues when kprobes are enabled

 - Speed up ITLB/DTLB cache flushes when running on machines with
   combined TLBs

* 'parisc-5.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Flush ITLB in flush_tlb_all_local() only on split TLB machines
  parisc: add kprobe_fault_handler()
2019-07-23 15:34:59 -07:00
Eric Auger
0653275001 dma-mapping: use dma_get_mask in dma_addressing_limited
We currently have cases where the dma_addressing_limited() gets
called with dma_mask unset. This causes a NULL pointer dereference.

Use dma_get_mask() accessor to prevent the crash.

Fixes: b866455423 ("dma-mapping: add a dma_addressing_limited helper")
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-07-23 17:43:58 +02:00
Linus Torvalds
7b5cf701ea Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull preemption Kconfig fix from Thomas Gleixner:
 "The PREEMPT_RT stub config renamed PREEMPT to PREEMPT_LL and defined
  PREEMPT outside of the menu and made it selectable by both PREEMPT_LL
  and PREEMPT_RT.

  Stupid me missed that 114 defconfigs select CONFIG_PREEMPT which
  obviously can't work anymore. oldconfig builds are affected as well,
  but it's more obvious as the user gets asked. [old]defconfig silently
  fixes it up and selects PREEMPT_NONE.

  Unbreak it by undoing the rename and adding a intermediate config
  symbol which is selected by both PREEMPT and PREEMPT_RT. That requires
  to chase down a few #ifdefs, but it's better than tweaking 114
  defconfigs and annoying users"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/rt, Kconfig: Unbreak def/oldconfig with CONFIG_PREEMPT=y
2019-07-22 09:30:34 -07:00
Linus Torvalds
44b912cd0b Merge tag 'for-linus-20190722' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull pidfd polling fix from Christian Brauner:
 "A fix for pidfd polling. It ensures that the task's exit state is
  visible to all waiters"

* tag 'for-linus-20190722' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  pidfd: fix a poll race when setting exit_state
2019-07-22 09:14:19 -07:00
Linus Torvalds
21c730d734 Merge tag 'for-5.3-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:

 - fixes for leaks caused by recently merged patches

 - one build fix

 - a fix to prevent mixing of incompatible features

* tag 'for-5.3-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: don't leak extent_map in btrfs_get_io_geometry()
  btrfs: free checksum hash on in close_ctree
  btrfs: Fix build error while LIBCRC32C is module
  btrfs: inode: Don't compress if NODATASUM or NODATACOW set
2019-07-22 09:08:38 -07:00
Thomas Gleixner
b8d3349803 sched/rt, Kconfig: Unbreak def/oldconfig with CONFIG_PREEMPT=y
The merge of the CONFIG_PREEMPT_RT stub renamed CONFIG_PREEMPT to
CONFIG_PREEMPT_LL which causes all defconfigs which have CONFIG_PREEMPT=y
set to fall back to CONFIG_PREEMPT_NONE because CONFIG_PREEMPT depends on
the preemption mode choice wich defaults to NONE. This also affects
oldconfig builds.

So rather than changing 114 defconfig files and being an annoyance to
users, revert the rename and select a new config symbol PREEMPTION. That
keeps everything working smoothly and the revelant ifdef's are going to be
fixed up step by step.

Reported-by: Mark Rutland <mark.rutland@arm.com>
Fixes: a50a3f4b6a ("sched/rt, Kconfig: Introduce CONFIG_PREEMPT_RT")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-07-22 18:05:11 +02:00
Linus Torvalds
c92f038067 Merge tag 'media/v5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
 "For two regressions in media core:

   - v4l2-subdev: fix regression in check_pad()

   - videodev2.h: change V4L2_PIX_FMT_BGRA444 define: fourcc was already
     in use"

* tag 'media/v5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  media: videodev2.h: change V4L2_PIX_FMT_BGRA444 define: fourcc was already in use
  media: v4l2-subdev: fix regression in check_pad()
2019-07-22 09:01:47 -07:00
Linus Torvalds
83768245a3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Several netfilter fixes including a nfnetlink deadlock fix from
    Florian Westphal and fix for dropping VRF packets from Miaohe Lin.

 2) Flow offload fixes from Pablo Neira Ayuso including a fix to restore
    proper block sharing.

 3) Fix r8169 PHY init from Thomas Voegtle.

 4) Fix memory leak in mac80211, from Lorenzo Bianconi.

 5) Missing NULL check on object allocation in cxgb4, from Navid
    Emamdoost.

 6) Fix scaling of RX power in sfp phy driver, from Andrew Lunn.

 7) Check that there is actually an ip header to access in skb->data in
    VRF, from Peter Kosyh.

 8) Remove spurious rcu unlock in hv_netvsc, from Haiyang Zhang.

 9) One more tweak the the TCP fragmentation memory limit changes, to be
    less harmful to applications setting small SO_SNDBUF values. From
    Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (40 commits)
  tcp: be more careful in tcp_fragment()
  hv_netvsc: Fix extra rcu_read_unlock in netvsc_recv_callback()
  vrf: make sure skb->data contains ip header to make routing
  connector: remove redundant input callback from cn_dev
  qed: Prefer pcie_capability_read_word()
  igc: Prefer pcie_capability_read_word()
  cxgb4: Prefer pcie_capability_read_word()
  be2net: Synchronize be_update_queues with dev_watchdog
  bnx2x: Prevent load reordering in tx completion processing
  net: phy: sfp: hwmon: Fix scaling of RX power
  net: sched: verify that q!=NULL before setting q->flags
  chelsio: Fix a typo in a function name
  allocate_flower_entry: should check for null deref
  net: hns3: typo in the name of a constant
  kbuild: add net/netfilter/nf_tables_offload.h to header-test blacklist.
  tipc: Fix a typo
  mac80211: don't warn about CW params when not using them
  mac80211: fix possible memory leak in ieee80211_assign_beacon
  nl80211: fix NL80211_HE_MAX_CAPABILITY_LEN
  nl80211: fix VENDOR_CMD_RAW_DATA
  ...
2019-07-22 08:49:22 -07:00
Suren Baghdasaryan
b191d6491b pidfd: fix a poll race when setting exit_state
There is a race between reading task->exit_state in pidfd_poll and
writing it after do_notify_parent calls do_notify_pidfd. Expected
sequence of events is:

CPU 0                            CPU 1
------------------------------------------------
exit_notify
  do_notify_parent
    do_notify_pidfd
  tsk->exit_state = EXIT_DEAD
                                  pidfd_poll
                                     if (tsk->exit_state)

However nothing prevents the following sequence:

CPU 0                            CPU 1
------------------------------------------------
exit_notify
  do_notify_parent
    do_notify_pidfd
                                   pidfd_poll
                                      if (tsk->exit_state)
  tsk->exit_state = EXIT_DEAD

This causes a polling task to wait forever, since poll blocks because
exit_state is 0 and the waiting task is not notified again. A stress
test continuously doing pidfd poll and process exits uncovered this bug.

To fix it, we make sure that the task's exit_state is always set before
calling do_notify_pidfd.

Fixes: b53b0b9d9a ("pidfd: add polling support")
Cc: kernel-team@android.com
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Link: https://lore.kernel.org/r/20190717172100.261204-1-joel@joelfernandes.org
[christian@brauner.io: adapt commit message and drop unneeded changes from wait_task_zombie]
Signed-off-by: Christian Brauner <christian@brauner.io>
2019-07-22 16:02:03 +02:00
Vaibhav Jain
3a855b7ac7 powerpc/papr_scm: Force a scm-unbind if initial scm-bind fails
In some cases initial bind of scm memory for an lpar can fail if
previously it wasn't released using a scm-unbind hcall. This situation
can arise due to panic of the previous kernel or forced lpar
fadump. In such cases the H_SCM_BIND_MEM return a H_OVERLAP error.

To mitigate such cases the patch updates papr_scm_probe() to force a
call to drc_pmem_unbind() in case the initial bind of scm memory fails
with EBUSY error. In case scm-bind operation again fails after the
forced scm-unbind then we follow the existing error path. We also
update drc_pmem_bind() to handle the H_OVERLAP error returned by phyp
and indicate it as a EBUSY error back to the caller.

Suggested-by: "Oliver O'Halloran" <oohall@gmail.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Reviewed-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190629160610.23402-4-vaibhav@linux.ibm.com
2019-07-22 23:31:00 +10:00
Vaibhav Jain
0d7fc080ba powerpc/papr_scm: Update drc_pmem_unbind() to use H_SCM_UNBIND_ALL
The new hcall named H_SCM_UNBIND_ALL has been introduce that can
unbind all or specific scm memory assigned to an lpar. This is
more efficient than using H_SCM_UNBIND_MEM as currently we don't
support partial unbind of scm memory.

Hence this patch proposes following changes to drc_pmem_unbind():

    * Update drc_pmem_unbind() to replace hcall H_SCM_UNBIND_MEM to
      H_SCM_UNBIND_ALL.

    * Update drc_pmem_unbind() to handles cases when PHYP asks the guest
      kernel to wait for specific amount of time before retrying the
      hcall via the 'LONG_BUSY' return value.

    * Ensure appropriate error code is returned back from the function
      in case of an error.

Reviewed-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190629160610.23402-3-vaibhav@linux.ibm.com
2019-07-22 23:31:00 +10:00
Vaibhav Jain
6d140e7569 powerpc/pseries: Update SCM hcall op-codes in hvcall.h
Update the hvcalls.h to include op-codes for new hcalls introduce to
manage SCM memory. Also update existing hcall definitions to reflect
current papr specification for SCM.

The removed hcall op-codes H_SCM_MEM_QUERY, H_SCM_BLOCK_CLEAR were
transient proposals and there support was never implemented by
Power-VM nor they were used anywhere in Linux kernel. Hence we don't
expect anyone to be impacted by this change.

Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190629160610.23402-2-vaibhav@linux.ibm.com
2019-07-22 23:31:00 +10:00
Jan Kiszka
c6bf2ae931 KVM: nVMX: Set cached_vmcs12 and cached_shadow_vmcs12 NULL after free
Shall help finding use-after-free bugs earlier.

Suggested-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-22 13:55:49 +02:00
Wanpeng Li
d9a710e5fc KVM: X86: Dynamically allocate user_fpu
After reverting commit 240c35a378 (kvm: x86: Use task structs fpu field
for user), struct kvm_vcpu is 19456 bytes on my server, PAGE_ALLOC_COSTLY_ORDER(3)
is the order at which allocations are deemed costly to service. In serveless
scenario, one host can service hundreds/thoudands firecracker/kata-container
instances, howerver, new instance will fail to launch after memory is too
fragmented to allocate kvm_vcpu struct on host, this was observed in some
cloud provider product environments.

This patch dynamically allocates user_fpu, kvm_vcpu is 15168 bytes now on my
Skylake server.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-22 13:55:48 +02:00
Wanpeng Li
e751732486 KVM: X86: Fix fpu state crash in kvm guest
The idea before commit 240c35a37 (which has just been reverted)
was that we have the following FPU states:

               userspace (QEMU)             guest
---------------------------------------------------------------------------
               processor                    vcpu->arch.guest_fpu
>>> KVM_RUN: kvm_load_guest_fpu
               vcpu->arch.user_fpu          processor
>>> preempt out
               vcpu->arch.user_fpu          current->thread.fpu
>>> preempt in
               vcpu->arch.user_fpu          processor
>>> back to userspace
>>> kvm_put_guest_fpu
               processor                    vcpu->arch.guest_fpu
---------------------------------------------------------------------------

With the new lazy model we want to get the state back to the processor
when schedule in from current->thread.fpu.

Reported-by: Thomas Lambertz <mail@thomaslambertz.de>
Reported-by: anthony <antdev66@gmail.com>
Tested-by: anthony <antdev66@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Lambertz <mail@thomaslambertz.de>
Cc: anthony <antdev66@gmail.com>
Cc: stable@vger.kernel.org
Fixes: 5f409e20b (x86/fpu: Defer FPU state load until return to userspace)
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
[Add a comment in front of the warning. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-22 13:55:47 +02:00
Paolo Bonzini
ec269475cb Revert "kvm: x86: Use task structs fpu field for user"
This reverts commit 240c35a378
("kvm: x86: Use task structs fpu field for user", 2018-11-06).
The commit is broken and causes QEMU's FPU state to be destroyed
when KVM_RUN is preempted.

Fixes: 240c35a378 ("kvm: x86: Use task structs fpu field for user")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-22 13:55:47 +02:00
Jan Kiszka
cf64527bb3 KVM: nVMX: Clear pending KVM_REQ_GET_VMCS12_PAGES when leaving nested
Letting this pend may cause nested_get_vmcs12_pages to run against an
invalid state, corrupting the effective vmcs of L1.

This was triggerable in QEMU after a guest corruption in L2, followed by
a L1 reset.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Cc: stable@vger.kernel.org
Fixes: 7f7f1ba33c ("KVM: x86: do not load vmcs12 pages while still in SMM")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-22 13:55:46 +02:00
Eric Dumazet
b617158dc0 tcp: be more careful in tcp_fragment()
Some applications set tiny SO_SNDBUF values and expect
TCP to just work. Recent patches to address CVE-2019-11478
broke them in case of losses, since retransmits might
be prevented.

We should allow these flows to make progress.

This patch allows the first and last skb in retransmit queue
to be split even if memory limits are hit.

It also adds the some room due to the fact that tcp_sendmsg()
and tcp_sendpage() might overshoot sk_wmem_queued by about one full
TSO skb (64KB size). Note this allowance was already present
in stable backports for kernels < 4.15

Note for < 4.15 backports :
 tcp_rtx_queue_tail() will probably look like :

static inline struct sk_buff *tcp_rtx_queue_tail(const struct sock *sk)
{
	struct sk_buff *skb = tcp_send_head(sk);

	return skb ? tcp_write_queue_prev(sk, skb) : tcp_write_queue_tail(sk);
}

Fixes: f070ef2ac6 ("tcp: tcp_fragment() should apply sane memory limits")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrew Prout <aprout@ll.mit.edu>
Tested-by: Andrew Prout <aprout@ll.mit.edu>
Tested-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Tested-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Christoph Paasch <cpaasch@apple.com>
Cc: Jonathan Looney <jtl@netflix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-21 20:41:24 -07:00
Haiyang Zhang
be4363bdf0 hv_netvsc: Fix extra rcu_read_unlock in netvsc_recv_callback()
There is an extra rcu_read_unlock left in netvsc_recv_callback(),
after a previous patch that removes RCU from this function.
This patch removes the extra RCU unlock.

Fixes: 345ac08990 ("hv_netvsc: pass netvsc_device to receive callback")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-21 20:40:28 -07:00
Michael Neuling
f16d80b75a powerpc/tm: Fix oops on sigreturn on systems without TM
On systems like P9 powernv where we have no TM (or P8 booted with
ppc_tm=off), userspace can construct a signal context which still has
the MSR TS bits set. The kernel tries to restore this context which
results in the following crash:

  Unexpected TM Bad Thing exception at c0000000000022fc (msr 0x8000000102a03031) tm_scratch=800000020280f033
  Oops: Unrecoverable exception, sig: 6 [#1]
  LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
  Modules linked in:
  CPU: 0 PID: 1636 Comm: sigfuz Not tainted 5.2.0-11043-g0a8ad0ffa4 #69
  NIP:  c0000000000022fc LR: 00007fffb2d67e48 CTR: 0000000000000000
  REGS: c00000003fffbd70 TRAP: 0700   Not tainted  (5.2.0-11045-g7142b497d8)
  MSR:  8000000102a03031 <SF,VEC,VSX,FP,ME,IR,DR,LE,TM[E]>  CR: 42004242  XER: 00000000
  CFAR: c0000000000022e0 IRQMASK: 0
  GPR00: 0000000000000072 00007fffb2b6e560 00007fffb2d87f00 0000000000000669
  GPR04: 00007fffb2b6e728 0000000000000000 0000000000000000 00007fffb2b6f2a8
  GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR12: 0000000000000000 00007fffb2b76900 0000000000000000 0000000000000000
  GPR16: 00007fffb2370000 00007fffb2d84390 00007fffea3a15ac 000001000a250420
  GPR20: 00007fffb2b6f260 0000000010001770 0000000000000000 0000000000000000
  GPR24: 00007fffb2d843a0 00007fffea3a14a0 0000000000010000 0000000000800000
  GPR28: 00007fffea3a14d8 00000000003d0f00 0000000000000000 00007fffb2b6e728
  NIP [c0000000000022fc] rfi_flush_fallback+0x7c/0x80
  LR [00007fffb2d67e48] 0x7fffb2d67e48
  Call Trace:
  Instruction dump:
  e96a0220 e96a02a8 e96a0330 e96a03b8 394a0400 4200ffdc 7d2903a6 e92d0c00
  e94d0c08 e96d0c10 e82d0c18 7db242a6 <4c000024> 7db243a6 7db142a6 f82d0c18

The problem is the signal code assumes TM is enabled when
CONFIG_PPC_TRANSACTIONAL_MEM is enabled. This may not be the case as
with P9 powernv or if `ppc_tm=off` is used on P8.

This means any local user can crash the system.

Fix the problem by returning a bad stack frame to the user if they try
to set the MSR TS bits with sigreturn() on systems where TM is not
supported.

Found with sigfuz kernel selftest on P9.

This fixes CVE-2019-13648.

Fixes: 2b0a576d15 ("powerpc: Add new transactional memory state to the signal context")
Cc: stable@vger.kernel.org # v3.9
Reported-by: Praveen Pandey <Praveen.Pandey@in.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190719050502.405-1-mikey@neuling.org
2019-07-22 13:05:23 +10:00
Linus Torvalds
5f9e832c13 Linus 5.3-rc1 v5.3-rc1 2019-07-21 14:05:38 -07:00
Peter Kosyh
107e47cc80 vrf: make sure skb->data contains ip header to make routing
vrf_process_v4_outbound() and vrf_process_v6_outbound() do routing
using ip/ipv6 addresses, but don't make sure the header is available
in skb->data[] (skb_headlen() is less then header size).

Case:

1) igb driver from intel.
2) Packet size is greater then 255.
3) MPLS forwards to VRF device.

So, patch adds pskb_may_pull() calls in vrf_process_v4/v6_outbound()
functions.

Signed-off-by: Peter Kosyh <p.kosyh@gmail.com>
Reviewed-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-21 13:32:51 -07:00
Vasily Averin
903e9d1bff connector: remove redundant input callback from cn_dev
A small cleanup: this callback is never used.
Originally fixed by Stanislav Kinsburskiy <skinsbursky@virtuozzo.com>
for OpenVZ7 bug OVZ-6877

cc: stanislav.kinsburskiy@gmail.com
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-21 13:31:14 -07:00
Frederick Lawler
93428c5826 qed: Prefer pcie_capability_read_word()
Commit 8c0d3a02c1 ("PCI: Add accessors for PCI Express Capability")
added accessors for the PCI Express Capability so that drivers didn't
need to be aware of differences between v1 and v2 of the PCI
Express Capability.

Replace pci_read_config_word() and pci_write_config_word() calls with
pcie_capability_read_word() and pcie_capability_write_word().

Signed-off-by: Frederick Lawler <fred@fredlawl.com>
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-21 13:29:47 -07:00
Frederick Lawler
a16f6d3a15 igc: Prefer pcie_capability_read_word()
Commit 8c0d3a02c1 ("PCI: Add accessors for PCI Express Capability")
added accessors for the PCI Express Capability so that drivers didn't
need to be aware of differences between v1 and v2 of the PCI
Express Capability.

Replace pci_read_config_word() and pci_write_config_word() calls with
pcie_capability_read_word() and pcie_capability_write_word().

Signed-off-by: Frederick Lawler <fred@fredlawl.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-21 13:29:47 -07:00
Frederick Lawler
6133b9204c cxgb4: Prefer pcie_capability_read_word()
Commit 8c0d3a02c1 ("PCI: Add accessors for PCI Express Capability")
added accessors for the PCI Express Capability so that drivers didn't
need to be aware of differences between v1 and v2 of the PCI
Express Capability.

Replace pci_read_config_word() and pci_write_config_word() calls with
pcie_capability_read_word() and pcie_capability_write_word().

Signed-off-by: Frederick Lawler <fred@fredlawl.com>
Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-21 13:29:47 -07:00
Benjamin Poirier
ffd342e087 be2net: Synchronize be_update_queues with dev_watchdog
As pointed out by Firo Yang, a netdev tx timeout may trigger just before an
ethtool set_channels operation is started. be_tx_timeout(), which dumps
some queue structures, is not written to run concurrently with
be_update_queues(), which frees/allocates those queues structures. Add some
synchronization between the two.

Message-id: <CH2PR18MB31898E033896F9760D36BFF288C90@CH2PR18MB3189.namprd18.prod.outlook.com>
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-21 13:22:03 -07:00
Brian King
ea811b795d bnx2x: Prevent load reordering in tx completion processing
This patch fixes an issue seen on Power systems with bnx2x which results
in the skb is NULL WARN_ON in bnx2x_free_tx_pkt firing due to the skb
pointer getting loaded in bnx2x_free_tx_pkt prior to the hw_cons
load in bnx2x_tx_int. Adding a read memory barrier resolves the issue.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-21 12:29:25 -07:00
Andrew Lunn
0cea0e1148 net: phy: sfp: hwmon: Fix scaling of RX power
The RX power read from the SFP uses units of 0.1uW. This must be
scaled to units of uW for HWMON. This requires a divide by 10, not the
current 100.

With this change in place, sensors(1) and ethtool -m agree:

sff2-isa-0000
Adapter: ISA adapter
in0:          +3.23 V
temp1:        +33.1 C
power1:      270.00 uW
power2:      200.00 uW
curr1:        +0.01 A

        Laser output power                        : 0.2743 mW / -5.62 dBm
        Receiver signal average optical power     : 0.2014 mW / -6.96 dBm

Reported-by: chris.healy@zii.aero
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Fixes: 1323061a01 ("net: phy: sfp: Add HWMON support for module sensors")
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-21 11:51:50 -07:00
Vlad Buslov
503d81d428 net: sched: verify that q!=NULL before setting q->flags
In function int tc_new_tfilter() q pointer can be NULL when adding filter
on a shared block. With recent change that resets TCQ_F_CAN_BYPASS after
filter creation, following NULL pointer dereference happens in case parent
block is shared:

[  212.925060] BUG: kernel NULL pointer dereference, address: 0000000000000010
[  212.925445] #PF: supervisor write access in kernel mode
[  212.925709] #PF: error_code(0x0002) - not-present page
[  212.925965] PGD 8000000827923067 P4D 8000000827923067 PUD 827924067 PMD 0
[  212.926302] Oops: 0002 [#1] SMP KASAN PTI
[  212.926539] CPU: 18 PID: 2617 Comm: tc Tainted: G    B             5.2.0+ #512
[  212.926938] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
[  212.927364] RIP: 0010:tc_new_tfilter+0x698/0xd40
[  212.927633] Code: 74 0d 48 85 c0 74 08 48 89 ef e8 03 aa 62 00 48 8b 84 24 a0 00 00 00 48 8d 78 10 48 89 44 24 18 e8 4d 0c 6b ff 48 8b 44 24 18 <83> 60 10 f
b 48 85 ed 0f 85 3d fe ff ff e9 4f fe ff ff e8 81 26 f8
[  212.928607] RSP: 0018:ffff88884fd5f5d8 EFLAGS: 00010296
[  212.928905] RAX: 0000000000000000 RBX: 0000000000000000 RCX: dffffc0000000000
[  212.929201] RDX: 0000000000000007 RSI: 0000000000000004 RDI: 0000000000000297
[  212.929402] RBP: ffff88886bedd600 R08: ffffffffb91d4b51 R09: fffffbfff7616e4d
[  212.929609] R10: fffffbfff7616e4c R11: ffffffffbb0b7263 R12: ffff88886bc61040
[  212.929803] R13: ffff88884fd5f950 R14: ffffc900039c5000 R15: ffff88835e927680
[  212.929999] FS:  00007fe7c50b6480(0000) GS:ffff88886f980000(0000) knlGS:0000000000000000
[  212.930235] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  212.930394] CR2: 0000000000000010 CR3: 000000085bd04002 CR4: 00000000001606e0
[  212.930588] Call Trace:
[  212.930682]  ? tc_del_tfilter+0xa40/0xa40
[  212.930811]  ? __lock_acquire+0x5b5/0x2460
[  212.930948]  ? find_held_lock+0x85/0xa0
[  212.931081]  ? tc_del_tfilter+0xa40/0xa40
[  212.931201]  rtnetlink_rcv_msg+0x4ab/0x5f0
[  212.931332]  ? rtnl_dellink+0x490/0x490
[  212.931454]  ? lockdep_hardirqs_on+0x260/0x260
[  212.931589]  ? netlink_deliver_tap+0xab/0x5a0
[  212.931717]  ? match_held_lock+0x1b/0x240
[  212.931844]  netlink_rcv_skb+0xd0/0x200
[  212.931958]  ? rtnl_dellink+0x490/0x490
[  212.932079]  ? netlink_ack+0x440/0x440
[  212.932205]  ? netlink_deliver_tap+0x161/0x5a0
[  212.932335]  ? lock_downgrade+0x360/0x360
[  212.932457]  ? lock_acquire+0xe5/0x210
[  212.932579]  netlink_unicast+0x296/0x350
[  212.932705]  ? netlink_attachskb+0x390/0x390
[  212.932834]  ? _copy_from_iter_full+0xe0/0x3a0
[  212.932976]  netlink_sendmsg+0x394/0x600
[  212.937998]  ? netlink_unicast+0x350/0x350
[  212.943033]  ? move_addr_to_kernel.part.0+0x90/0x90
[  212.948115]  ? netlink_unicast+0x350/0x350
[  212.953185]  sock_sendmsg+0x96/0xa0
[  212.958099]  ___sys_sendmsg+0x482/0x520
[  212.962881]  ? match_held_lock+0x1b/0x240
[  212.967618]  ? copy_msghdr_from_user+0x250/0x250
[  212.972337]  ? lock_downgrade+0x360/0x360
[  212.976973]  ? rwlock_bug.part.0+0x60/0x60
[  212.981548]  ? __mod_node_page_state+0x1f/0xa0
[  212.986060]  ? match_held_lock+0x1b/0x240
[  212.990567]  ? find_held_lock+0x85/0xa0
[  212.994989]  ? do_user_addr_fault+0x349/0x5b0
[  212.999387]  ? lock_downgrade+0x360/0x360
[  213.003713]  ? find_held_lock+0x85/0xa0
[  213.007972]  ? __fget_light+0xa1/0xf0
[  213.012143]  ? sockfd_lookup_light+0x91/0xb0
[  213.016165]  __sys_sendmsg+0xba/0x130
[  213.020040]  ? __sys_sendmsg_sock+0xb0/0xb0
[  213.023870]  ? handle_mm_fault+0x337/0x470
[  213.027592]  ? page_fault+0x8/0x30
[  213.031316]  ? lockdep_hardirqs_off+0xbe/0x100
[  213.034999]  ? mark_held_locks+0x24/0x90
[  213.038671]  ? do_syscall_64+0x1e/0xe0
[  213.042297]  do_syscall_64+0x74/0xe0
[  213.045828]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  213.049354] RIP: 0033:0x7fe7c527c7b8
[  213.052792] Code: 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 65 8f 0c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f
0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 89 54
[  213.060269] RSP: 002b:00007ffc3f7908a8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[  213.064144] RAX: ffffffffffffffda RBX: 000000005d34716f RCX: 00007fe7c527c7b8
[  213.068094] RDX: 0000000000000000 RSI: 00007ffc3f790910 RDI: 0000000000000003
[  213.072109] RBP: 0000000000000000 R08: 0000000000000001 R09: 00007fe7c5340cc0
[  213.076113] R10: 0000000000404ec2 R11: 0000000000000246 R12: 0000000000000080
[  213.080146] R13: 0000000000480640 R14: 0000000000000080 R15: 0000000000000000
[  213.084147] Modules linked in: act_gact cls_flower sch_ingress nfsv3 nfs_acl nfs lockd grace fscache bridge stp llc sunrpc intel_rapl_msr intel_rapl_common
[<1;69;32Msb_edac rdma_ucm rdma_cm x86_pkg_temp_thermal iw_cm intel_powerclamp ib_cm coretemp kvm_intel kvm irqbypass mlx5_ib ib_uverbs ib_core crct10dif_pclmul crc32_pc
lmul crc32c_intel ghash_clmulni_intel mlx5_core intel_cstate intel_uncore iTCO_wdt igb iTCO_vendor_support mlxfw mei_me ptp ses intel_rapl_perf mei pcspkr ipmi
_ssif i2c_i801 joydev enclosure pps_core lpc_ich ioatdma wmi dca ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad ast i2c_algo_bit drm_vram_helpe
r ttm drm_kms_helper drm mpt3sas raid_class scsi_transport_sas
[  213.112326] CR2: 0000000000000010
[  213.117429] ---[ end trace adb58eb0a4ee6283 ]---

Verify that q pointer is not NULL before setting the 'flags' field.

Fixes: 3f05e6886a ("net_sched: unset TCQ_F_CAN_BYPASS when adding filters")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-21 11:49:53 -07:00
Christophe JAILLET
85d9bf9795 chelsio: Fix a typo in a function name
It is likely that 'my3216_poll()' should be 'my3126_poll()'. (1 and 2
switched in 3126.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-21 11:48:25 -07:00
Navid Emamdoost
bb1320834b allocate_flower_entry: should check for null deref
allocate_flower_entry does not check for allocation success, but tries
to deref the result. I only moved the spin_lock under null check, because
 the caller is checking allocation's status at line 652.

Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-21 11:47:30 -07:00
Christophe JAILLET
4803d01001 net: hns3: typo in the name of a constant
All constant in 'enum HCLGE_MBX_OPCODE' start with HCLGE, except
'HLCGE_MBX_PUSH_VLAN_INFO' (C and L switched)

s/HLC/HCL/

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-21 11:46:31 -07:00
Jeremy Sowden
408d2bbbfd kbuild: add net/netfilter/nf_tables_offload.h to header-test blacklist.
net/netfilter/nf_tables_offload.h includes net/netfilter/nf_tables.h
which is itself on the blacklist.

Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-21 11:43:43 -07:00
Christophe JAILLET
bad7f869d8 tipc: Fix a typo
s/tipc_toprsv_listener_data_ready/tipc_topsrv_listener_data_ready/
(r and s switched in topsrv)

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-21 11:41:01 -07:00
David S. Miller
953ba0a638 Merge tag 'mac80211-for-davem-2019-07-20' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:

====================
We have a handful of fixes:
 * ignore bad CW parameters if we aren't using them,
   instead of warning
 * fix operation (and then build) with the new netlink vendor
   command policy requirement
 * fix a memory leak in an error path when setting beacons
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-21 11:39:05 -07:00
Linus Torvalds
c7bf0a0f37 Merge tag 'devicetree-fixes-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull Devicetree fixes from Rob Herring:
 "Fix several warnings/errors in validation of binding schemas"

* tag 'devicetree-fixes-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  dt-bindings: pinctrl: stm32: Fix missing 'clocks' property in examples
  dt-bindings: iio: ad7124: Fix dtc warnings in example
  dt-bindings: iio: avia-hx711: Fix avdd-supply typo in example
  dt-bindings: pinctrl: aspeed: Fix AST2500 example errors
  dt-bindings: pinctrl: aspeed: Fix 'compatible' schema errors
  dt-bindings: riscv: Limit cpus schema to only check RiscV 'cpu' nodes
  dt-bindings: Ensure child nodes are of type 'object'
2019-07-21 10:28:39 -07:00
Linus Torvalds
d6788eb7d0 Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs documentation typo fix from Al Viro.

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  typo fix: it's d_make_root, not d_make_inode...
2019-07-21 10:09:43 -07:00
Linus Torvalds
91962d0f79 Merge tag '5.3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
 "Two fixes for stable, one that had dependency on earlier patch in this
  merge window and can now go in, and a perf improvement in SMB3 open"

* tag '5.3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: update internal module number
  cifs: flush before set-info if we have writeable handles
  smb3: optimize open to not send query file internal info
  cifs: copy_file_range needs to strip setuid bits and update timestamps
  CIFS: fix deadlock in cached root handling
2019-07-21 10:01:17 -07:00
Qian Cai
8cf6650421 iommu/amd: fix a crash in iova_magazine_free_pfns
The commit b3aa14f022 ("iommu: remove the mapping_error dma_map_ops
method") incorrectly changed the checking from dma_ops_alloc_iova() in
map_sg() causes a crash under memory pressure as dma_ops_alloc_iova()
never return DMA_MAPPING_ERROR on failure but 0, so the error handling
is all wrong.

   kernel BUG at drivers/iommu/iova.c:801!
    Workqueue: kblockd blk_mq_run_work_fn
    RIP: 0010:iova_magazine_free_pfns+0x7d/0xc0
    Call Trace:
     free_cpu_cached_iovas+0xbd/0x150
     alloc_iova_fast+0x8c/0xba
     dma_ops_alloc_iova.isra.6+0x65/0xa0
     map_sg+0x8c/0x2a0
     scsi_dma_map+0xc6/0x160
     pqi_aio_submit_io+0x1f6/0x440 [smartpqi]
     pqi_scsi_queue_command+0x90c/0xdd0 [smartpqi]
     scsi_queue_rq+0x79c/0x1200
     blk_mq_dispatch_rq_list+0x4dc/0xb70
     blk_mq_sched_dispatch_requests+0x249/0x310
     __blk_mq_run_hw_queue+0x128/0x200
     blk_mq_run_work_fn+0x27/0x30
     process_one_work+0x522/0xa10
     worker_thread+0x63/0x5b0
     kthread+0x1d2/0x1f0
     ret_from_fork+0x22/0x40

Fixes: b3aa14f022 ("iommu: remove the mapping_error dma_map_ops method")
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-21 09:57:13 -07:00
Mike Rapoport
618381f09c hexagon: switch to generic version of pte allocation
The hexagon implementation pte_alloc_one(), pte_alloc_one_kernel(),
pte_free_kernel() and pte_free() is identical to the generic except of
lack of __GFP_ACCOUNT for the user PTEs allocation.

Switch hexagon to use generic version of these functions.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-21 09:53:00 -07:00
Linus Torvalds
bec5545ede Merge tag 'ntb-5.3' of git://github.com/jonmason/ntb
Pull NTB updates from Jon Mason:
 "New feature to add support for NTB virtual MSI interrupts, the ability
  to test and use this feature in the NTB transport layer.

  Also, bug fixes for the AMD and Switchtec drivers, as well as some
  general patches"

* tag 'ntb-5.3' of git://github.com/jonmason/ntb: (22 commits)
  NTB: Describe the ntb_msi_test client in the documentation.
  NTB: Add MSI interrupt support to ntb_transport
  NTB: Add ntb_msi_test support to ntb_test
  NTB: Introduce NTB MSI Test Client
  NTB: Introduce MSI library
  NTB: Rename ntb.c to support multiple source files in the module
  NTB: Introduce functions to calculate multi-port resource index
  NTB: Introduce helper functions to calculate logical port number
  PCI/switchtec: Add module parameter to request more interrupts
  PCI/MSI: Support allocating virtual MSI interrupts
  ntb_hw_switchtec: Fix setup MW with failure bug
  ntb_hw_switchtec: Skip unnecessary re-setup of shared memory window for crosslink case
  ntb_hw_switchtec: Remove redundant steps of switchtec_ntb_reinit_peer() function
  NTB: correct ntb_dev_ops and ntb_dev comment typos
  NTB: amd: Silence shift wrapping warning in amd_ntb_db_vector_mask()
  ntb_hw_switchtec: potential shift wrapping bug in switchtec_ntb_init_sndev()
  NTB: ntb_transport: Ensure qp->tx_mw_dma_addr is initaliazed
  NTB: ntb_hw_amd: set peer limit register
  NTB: ntb_perf: Clear stale values in doorbell and command SPAD register
  NTB: ntb_perf: Disable NTB link after clearing peer XLAT registers
  ...
2019-07-21 09:46:59 -07:00
Helge Deller
69245c9756 parisc: Flush ITLB in flush_tlb_all_local() only on split TLB machines
flush_tlb_all_local() flushes the ITLB and DTLB of the CPU.
In case the machine does not have separate ITLBs and DTLBs, use the
alternative functionality to replace the code which flushes the ITLB
with nops while keeping the code which flushes the DTLB.

Signed-off-by: Helge Deller <deller@gmx.de>
2019-07-21 11:03:02 +02:00