Commit Graph

782758 Commits

Author SHA1 Message Date
Paolo Bonzini
4cebf459b6 Merge tag 'kvmarm-fixes-for-4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master
KVM/arm fixes for 4.19, take #2

- Correctly order GICv3 SGI registers in the cp15 array
2018-10-09 18:49:32 +02:00
Paolo Bonzini
853c110982 KVM: x86: support CONFIG_KVM_AMD=y with CONFIG_CRYPTO_DEV_CCP_DD=m
SEV requires access to the AMD cryptographic device APIs, and this
does not work when KVM is builtin and the crypto driver is a module.
Actually the Kconfig conditions for CONFIG_KVM_AMD_SEV try to disable
SEV in that case, but it does not work because the actual crypto
calls are not culled, only sev_hardware_setup() is.

This patch adds two CONFIG_KVM_AMD_SEV checks that gate all the remaining
SEV code; it fixes this particular configuration, and drops 5 KiB of
code when CONFIG_KVM_AMD_SEV=n.

Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-09 18:38:42 +02:00
Marc Zyngier
ec876f4b25 ARM: KVM: Correctly order SGI register entries in the cp15 array
The ICC_ASGI1R and ICC_SGI0R register entries in the cp15 array
are not correctly ordered, leading to a BUG() at boot time.

Move them to their natural location.

Fixes: 3e8a8a50c7 ("KVM: arm: vgic-v3: Add support for ICC_SGI0R and ICC_ASGI1R accesses")
Reported-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-10-09 11:55:47 +01:00
Paolo Bonzini
cc906f07d7 Merge tag 'kvm-ppc-fixes-4.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into kvm-master
Third set of PPC KVM fixes for 4.19

One patch here, fixing a potential host crash introduced (or at least
exacerbated) by a previous fix for corruption relating to radix guest
page faults and THP operations.
2018-10-05 09:39:53 +02:00
Paolo Bonzini
7e7126846c kvm: nVMX: fix entry with pending interrupt if APICv is enabled
Commit b5861e5cf2 introduced a check on
the interrupt-window and NMI-window CPU execution controls in order to
inject an external interrupt vmexit before the first guest instruction
executes.  However, when APIC virtualization is enabled the host does not
need a vmexit in order to inject an interrupt at the next interrupt window;
instead, it just places the interrupt vector in RVI and the processor will
inject it as soon as possible.  Therefore, on machines with APICv it is
not enough to check the CPU execution controls: the same scenario can also
happen if RVI>vPPR.

Fixes: b5861e5cf2
Reviewed-by: Nikita Leshchenko <nikita.leshchenko@oracle.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Liran Alon <liran.alon@oracle.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-04 17:10:40 +02:00
Paolo Bonzini
2cf7ea9f40 KVM: VMX: hide flexpriority from guest when disabled at the module level
As of commit 8d860bbeed ("kvm: vmx: Basic APIC virtualization controls
have three settings"), KVM will disable VIRTUALIZE_APIC_ACCESSES when
a nested guest writes APIC_BASE MSR and kvm-intel.flexpriority=0,
whereas previously KVM would allow a nested guest to enable
VIRTUALIZE_APIC_ACCESSES so long as it's supported in hardware.  That is,
KVM now advertises VIRTUALIZE_APIC_ACCESSES to a guest but doesn't
(always) allow setting it when kvm-intel.flexpriority=0, and may even
initially allow the control and then clear it when the nested guest
writes APIC_BASE MSR, which is decidedly odd even if it doesn't cause
functional issues.

Hide the control completely when the module parameter is cleared.

reported-by: Sean Christopherson <sean.j.christopherson@intel.com>
Fixes: 8d860bbeed ("kvm: vmx: Basic APIC virtualization controls have three settings")
Cc: Jim Mattson <jmattson@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-04 13:40:44 +02:00
Sean Christopherson
fd6b6d9b82 KVM: VMX: check for existence of secondary exec controls before accessing
Return early from vmx_set_virtual_apic_mode() if the processor doesn't
support VIRTUALIZE_APIC_ACCESSES or VIRTUALIZE_X2APIC_MODE, both of
which reside in SECONDARY_VM_EXEC_CONTROL.  This eliminates warnings
due to VMWRITEs to SECONDARY_VM_EXEC_CONTROL (VMCS field 401e) failing
on processors without secondary exec controls.

Remove the similar check for TPR shadowing as it is incorporated in the
flexpriority_enabled check and the APIC-related code in
vmx_update_msr_bitmap() is further gated by VIRTUALIZE_X2APIC_MODE.

Reported-by: Gerhard Wiesinger <redhat@wiesinger.com>
Fixes: 8d860bbeed ("kvm: vmx: Basic APIC virtualization controls have three settings")
Cc: Jim Mattson <jmattson@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-04 13:40:21 +02:00
Paul Mackerras
6579804c43 KVM: PPC: Book3S HV: Avoid crash from THP collapse during radix page fault
Commit 71d29f43b6 ("KVM: PPC: Book3S HV: Don't use compound_order to
determine host mapping size", 2018-09-11) added a call to 
__find_linux_pte() and a dereference of the returned PTE pointer to the
radix page fault path in the common case where the page is normal
system memory.  Previously, __find_linux_pte() was only called for
mappings to physical addresses which don't have a page struct (e.g.
memory-mapped I/O) or where the page struct is marked as reserved
memory.

This exposes us to the possibility that the returned PTE pointer
could be NULL, for example in the case of a concurrent THP collapse
operation.  Dereferencing the returned NULL pointer causes a host
crash.

To fix this, we check for NULL, and if it is NULL, we retry the
operation by returning to the guest, with the expectation that it
will generate the same page fault again (unless of course it has
been fixed up by another CPU in the meantime).

Fixes: 71d29f43b6 ("KVM: PPC: Book3S HV: Don't use compound_order to determine host mapping size")
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-10-04 14:51:11 +10:00
Sean Christopherson
daa07cbc9a KVM: x86: fix L1TF's MMIO GFN calculation
One defense against L1TF in KVM is to always set the upper five bits
of the *legal* physical address in the SPTEs for non-present and
reserved SPTEs, e.g. MMIO SPTEs.  In the MMIO case, the GFN of the
MMIO SPTE may overlap with the upper five bits that are being usurped
to defend against L1TF.  To preserve the GFN, the bits of the GFN that
overlap with the repurposed bits are shifted left into the reserved
bits, i.e. the GFN in the SPTE will be split into high and low parts.
When retrieving the GFN from the MMIO SPTE, e.g. to check for an MMIO
access, get_mmio_spte_gfn() unshifts the affected bits and restores
the original GFN for comparison.  Unfortunately, get_mmio_spte_gfn()
neglects to mask off the reserved bits in the SPTE that were used to
store the upper chunk of the GFN.  As a result, KVM fails to detect
MMIO accesses whose GPA overlaps the repurprosed bits, which in turn
causes guest panics and hangs.

Fix the bug by generating a mask that covers the lower chunk of the
GFN, i.e. the bits that aren't shifted by the L1TF mitigation.  The
alternative approach would be to explicitly zero the five reserved
bits that are used to store the upper chunk of the GFN, but that
requires additional run-time computation and makes an already-ugly
bit of code even more inscrutable.

I considered adding a WARN_ON_ONCE(low_phys_bits-1 <= PAGE_SHIFT) to
warn if GENMASK_ULL() generated a nonsensical value, but that seemed
silly since that would mean a system that supports VMX has less than
18 bits of physical address space...

Reported-by: Sakari Ailus <sakari.ailus@iki.fi>
Fixes: d9b47449c1a1 ("kvm: x86: Set highest physical address bits in non-present/reserved SPTEs")
Cc: Junaid Shahid <junaids@google.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: stable@vger.kernel.org
Reviewed-by: Junaid Shahid <junaids@google.com>
Tested-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-01 15:41:00 +02:00
Stefan Raspl
fe804cd677 tools/kvm_stat: cut down decimal places in update interval dialog
We currently display the default number of decimal places for floats in
_show_set_update_interval(), which is quite pointless. Cutting down to a
single decimal place.

Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-01 15:40:59 +02:00
Liran Alon
62cf9bd811 KVM: nVMX: Fix emulation of VM_ENTRY_LOAD_BNDCFGS
L2 IA32_BNDCFGS should be updated with vmcs12->guest_bndcfgs only
when VM_ENTRY_LOAD_BNDCFGS is specified in vmcs12->vm_entry_controls.

Otherwise, L2 IA32_BNDCFGS should be set to vmcs01->guest_bndcfgs which
is L1 IA32_BNDCFGS.

Reviewed-by: Nikita Leshchenko <nikita.leshchenko@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-01 15:40:59 +02:00
Liran Alon
503234b3fd KVM: x86: Do not use kvm_x86_ops->mpx_supported() directly
Commit a87036add0 ("KVM: x86: disable MPX if host did not enable
MPX XSAVE features") introduced kvm_mpx_supported() to return true
iff MPX is enabled in the host.

However, that commit seems to have missed replacing some calls to
kvm_x86_ops->mpx_supported() to kvm_mpx_supported().

Complete original commit by replacing remaining calls to
kvm_mpx_supported().

Fixes: a87036add0 ("KVM: x86: disable MPX if host did not enable
MPX XSAVE features")

Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-01 15:40:57 +02:00
Liran Alon
5f76f6f5ff KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled
Before this commit, KVM exposes MPX VMX controls to L1 guest only based
on if KVM and host processor supports MPX virtualization.
However, these controls should be exposed to guest only in case guest
vCPU supports MPX.

Without this change, a L1 guest running with kernel which don't have
commit 691bd4340b ("kvm: vmx: allow host to access guest
MSR_IA32_BNDCFGS") asserts in QEMU on the following:
	qemu-kvm: error: failed to set MSR 0xd90 to 0x0
	qemu-kvm: .../qemu-2.10.0/target/i386/kvm.c:1801 kvm_put_msrs:
	Assertion 'ret == cpu->kvm_msr_buf->nmsrs failed'
This is because L1 KVM kvm_init_msr_list() will see that
vmx_mpx_supported() (As it only checks MPX VMX controls support) and
therefore KVM_GET_MSR_INDEX_LIST IOCTL will include MSR_IA32_BNDCFGS.
However, later when L1 will attempt to set this MSR via KVM_SET_MSRS
IOCTL, it will fail because !guest_cpuid_has_mpx(vcpu).

Therefore, fix the issue by exposing MPX VMX controls to L1 guest only
when vCPU supports MPX.

Fixes: 36be0b9deb ("KVM: x86: Add nested virtualization support for MPX")

Reported-by: Eyal Moscovici <eyal.moscovici@oracle.com>
Reviewed-by: Nikita Leshchenko <nikita.leshchenko@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-01 15:40:57 +02:00
Paolo Bonzini
4679b61f26 KVM: x86: never trap MSR_KERNEL_GS_BASE
KVM has an old optimization whereby accesses to the kernel GS base MSR
are trapped when the guest is in 32-bit and not when it is in 64-bit mode.
The idea is that swapgs is not available in 32-bit mode, thus the
guest has no reason to access the MSR unless in 64-bit mode and
32-bit applications need not pay the price of switching the kernel GS
base between the host and the guest values.

However, this optimization adds complexity to the code for little
benefit (these days most guests are going to be 64-bit anyway) and in fact
broke after commit 678e315e78 ("KVM: vmx: add dedicated utility to
access guest's kernel_gs_base", 2018-08-06); the guest kernel GS base
can be corrupted across SMIs and UEFI Secure Boot is therefore broken
(a secure boot Linux guest, for example, fails to reach the login prompt
about half the time).  This patch just removes the optimization; the
kernel GS base MSR is now never trapped by KVM, similarly to the FS and
GS base MSRs.

Fixes: 678e315e78
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-24 18:34:13 +02:00
Greg Kroah-Hartman
a27fb6d983 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Paolo writes:
  "It's mostly small bugfixes and cleanups, mostly around x86 nested
   virtualization.  One important change, not related to nested
   virtualization, is that the ability for the guest kernel to trap
   CPUID instructions (in Linux that's the ARCH_SET_CPUID arch_prctl) is
   now masked by default.  This is because the feature is detected
   through an MSR; a very bad idea that Intel seems to like more and
   more.  Some applications choke if the other fields of that MSR are
   not initialized as on real hardware, hence we have to disable the
   whole MSR by default, as was the case before Linux 4.12."

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (23 commits)
  KVM: nVMX: Fix bad cleanup on error of get/set nested state IOCTLs
  kvm: selftests: Add platform_info_test
  KVM: x86: Control guest reads of MSR_PLATFORM_INFO
  KVM: x86: Turbo bits in MSR_PLATFORM_INFO
  nVMX x86: Check VPID value on vmentry of L2 guests
  nVMX x86: check posted-interrupt descriptor addresss on vmentry of L2
  KVM: nVMX: Wake blocked vCPU in guest-mode if pending interrupt in virtual APICv
  KVM: VMX: check nested state and CR4.VMXE against SMM
  kvm: x86: make kvm_{load|put}_guest_fpu() static
  x86/hyper-v: rename ipi_arg_{ex,non_ex} structures
  KVM: VMX: use preemption timer to force immediate VMExit
  KVM: VMX: modify preemption timer bit only when arming timer
  KVM: VMX: immediately mark preemption timer expired only for zero value
  KVM: SVM: Switch to bitmap_zalloc()
  KVM/MMU: Fix comment in walk_shadow_page_lockless_end()
  kvm: selftests: use -pthread instead of -lpthread
  KVM: x86: don't reset root in kvm_mmu_setup()
  kvm: mmu: Don't read PDPTEs when paging is not enabled
  x86/kvm/lapic: always disable MMIO interface in x2APIC mode
  KVM: s390: Make huge pages unavailable in ucontrol VMs
  ...
2018-09-21 16:21:42 +02:00
Greg Kroah-Hartman
0eba8697bc Merge tag 'upstream-4.19-rc4' of git://git.infradead.org/linux-ubifs
Richard writes:
  "This pull request contains fixes for UBIFS:
   - A wrong UBIFS assertion in mount code
   - Fix for a NULL pointer deref in mount code
   - Revert of a bad fix for xattrs"

* tag 'upstream-4.19-rc4' of git://git.infradead.org/linux-ubifs:
  Revert "ubifs: xattr: Don't operate on deleted inodes"
  ubifs: drop false positive assertion
  ubifs: Check for name being NULL while mounting
2018-09-21 15:29:44 +02:00
Greg Kroah-Hartman
211b100a5c Merge tag 'for-linus-20180920' of git://git.kernel.dk/linux-block
Jens writes:
  "Storage fixes for 4.19-rc5

  - Fix for leaking kernel pointer in floppy ioctl (Andy Whitcroft)

  - NVMe pull request from Christoph, and a single ANA log page fix
    (Hannes)

  - Regression fix for libata qd32 support, where we trigger an illegal
    active command transition. This fixes a CD-ROM detection issue that
    was reported, but could also trigger premature completion of the
    internal tag (me)"

* tag 'for-linus-20180920' of git://git.kernel.dk/linux-block:
  floppy: Do not copy a kernel pointer to user memory in FDGETPRM ioctl
  libata: mask swap internal and hardware tag
  nvme: count all ANA groups for ANA Log page
2018-09-21 09:41:05 +02:00
Greg Kroah-Hartman
a38fd7d808 Merge tag 'drm-fixes-2018-09-21' of git://anongit.freedesktop.org/drm/drm
David writes:
  "drm fixes for 4.19-rc5:

   - core: fix debugfs for atomic, fix the check for atomic for
     non-modesetting drivers
   - amdgpu: adds a new PCI id, some kfd fixes and a sdma fix
   - i915: a bunch of GVT fixes.
   - vc4: scaling fix
   - vmwgfx: modesetting fixes and a old buffer eviction fix
   - udl: framebuffer destruction fix
   - sun4i: disable on R40 fix until next kernel
   - pl111: NULL termination on table fix"

* tag 'drm-fixes-2018-09-21' of git://anongit.freedesktop.org/drm/drm: (21 commits)
  drm/amdkfd: Fix ATS capablity was not reported correctly on some APUs
  drm/amdkfd: Change the control stack MTYPE from UC to NC on GFX9
  drm/amdgpu: Fix SDMA HQD destroy error on gfx_v7
  drm/vmwgfx: Fix buffer object eviction
  drm/vmwgfx: Don't impose STDU limits on framebuffer size
  drm/vmwgfx: limit mode size for all display unit to texture_max
  drm/vmwgfx: limit screen size to stdu_max during check_modeset
  drm/vmwgfx: don't check for old_crtc_state enable status
  drm/amdgpu: add new polaris pci id
  drm: sun4i: drop second PLL from A64 HDMI PHY
  drm: fix drm_drv_uses_atomic_modeset on non modesetting drivers.
  drm/i915/gvt: clear ggtt entries when destroy vgpu
  drm/i915/gvt: request srcu_read_lock before checking if one gfn is valid
  drm/i915/gvt: Add GEN9_CLKGATE_DIS_4 to default BXT mmio handler
  drm/i915/gvt: Init PHY related registers for BXT
  drm/atomic: Use drm_drv_uses_atomic_modeset() for debugfs creation
  drm/fb-helper: Remove set but not used variable 'connector_funcs'
  drm: udl: Destroy framebuffer only if it was initialized
  drm/sun4i: Remove R40 display pipeline compatibles
  drm/pl111: Make sure of_device_id tables are NULL terminated
  ...
2018-09-21 09:11:18 +02:00
Dave Airlie
4fcb7f8be8 Merge branch 'drm-fixes-4.19' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
A few fixes for 4.19:
- Add a new polaris pci id
- KFD fixes for raven and gfx7

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180920155850.5455-1-alexander.deucher@amd.com
2018-09-21 09:52:27 +10:00
Dave Airlie
618cc1514b Merge branch 'vmwgfx-fixes-4.19' of git://people.freedesktop.org/~thomash/linux into drm-fixes
A couple of modesetting fixes and a fix for a long-standing buffer-eviction
problem cc'd stable.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Hellstrom <thellstrom@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180920063935.35492-1-thellstrom@vmware.com
2018-09-21 09:51:09 +10:00
Junxiao Bi
234b69e3e0 ocfs2: fix ocfs2 read block panic
While reading block, it is possible that io error return due to underlying
storage issue, in this case, BH_NeedsValidate was left in the buffer head.
Then when reading the very block next time, if it was already linked into
journal, that will trigger the following panic.

[203748.702517] kernel BUG at fs/ocfs2/buffer_head_io.c:342!
[203748.702533] invalid opcode: 0000 [#1] SMP
[203748.702561] Modules linked in: ocfs2 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sunrpc dm_switch dm_queue_length dm_multipath bonding be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i iw_cxgb4 cxgb4 cxgb3i libcxgbi iw_cxgb3 cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipmi_devintf iTCO_wdt iTCO_vendor_support dcdbas ipmi_ssif i2c_core ipmi_si ipmi_msghandler acpi_pad pcspkr sb_edac edac_core lpc_ich mfd_core shpchp sg tg3 ptp pps_core ext4 jbd2 mbcache2 sr_mod cdrom sd_mod ahci libahci megaraid_sas wmi dm_mirror dm_region_hash dm_log dm_mod
[203748.703024] CPU: 7 PID: 38369 Comm: touch Not tainted 4.1.12-124.18.6.el6uek.x86_64 #2
[203748.703045] Hardware name: Dell Inc. PowerEdge R620/0PXXHP, BIOS 2.5.2 01/28/2015
[203748.703067] task: ffff880768139c00 ti: ffff88006ff48000 task.ti: ffff88006ff48000
[203748.703088] RIP: 0010:[<ffffffffa05e9f09>]  [<ffffffffa05e9f09>] ocfs2_read_blocks+0x669/0x7f0 [ocfs2]
[203748.703130] RSP: 0018:ffff88006ff4b818  EFLAGS: 00010206
[203748.703389] RAX: 0000000008620029 RBX: ffff88006ff4b910 RCX: 0000000000000000
[203748.703885] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00000000023079fe
[203748.704382] RBP: ffff88006ff4b8d8 R08: 0000000000000000 R09: ffff8807578c25b0
[203748.704877] R10: 000000000f637376 R11: 000000003030322e R12: 0000000000000000
[203748.705373] R13: ffff88006ff4b910 R14: ffff880732fe38f0 R15: 0000000000000000
[203748.705871] FS:  00007f401992c700(0000) GS:ffff880bfebc0000(0000) knlGS:0000000000000000
[203748.706370] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[203748.706627] CR2: 00007f4019252440 CR3: 00000000a621e000 CR4: 0000000000060670
[203748.707124] Stack:
[203748.707371]  ffff88006ff4b828 ffffffffa0609f52 ffff88006ff4b838 0000000000000001
[203748.707885]  0000000000000000 0000000000000000 ffff880bf67c3800 ffffffffa05eca00
[203748.708399]  00000000023079ff ffffffff81c58b80 0000000000000000 0000000000000000
[203748.708915] Call Trace:
[203748.709175]  [<ffffffffa0609f52>] ? ocfs2_inode_cache_io_unlock+0x12/0x20 [ocfs2]
[203748.709680]  [<ffffffffa05eca00>] ? ocfs2_empty_dir_filldir+0x80/0x80 [ocfs2]
[203748.710185]  [<ffffffffa05ec0cb>] ocfs2_read_dir_block_direct+0x3b/0x200 [ocfs2]
[203748.710691]  [<ffffffffa05f0fbf>] ocfs2_prepare_dx_dir_for_insert.isra.57+0x19f/0xf60 [ocfs2]
[203748.711204]  [<ffffffffa065660f>] ? ocfs2_metadata_cache_io_unlock+0x1f/0x30 [ocfs2]
[203748.711716]  [<ffffffffa05f4f3a>] ocfs2_prepare_dir_for_insert+0x13a/0x890 [ocfs2]
[203748.712227]  [<ffffffffa05f442e>] ? ocfs2_check_dir_for_entry+0x8e/0x140 [ocfs2]
[203748.712737]  [<ffffffffa061b2f2>] ocfs2_mknod+0x4b2/0x1370 [ocfs2]
[203748.713003]  [<ffffffffa061c385>] ocfs2_create+0x65/0x170 [ocfs2]
[203748.713263]  [<ffffffff8121714b>] vfs_create+0xdb/0x150
[203748.713518]  [<ffffffff8121b225>] do_last+0x815/0x1210
[203748.713772]  [<ffffffff812192e9>] ? path_init+0xb9/0x450
[203748.714123]  [<ffffffff8121bca0>] path_openat+0x80/0x600
[203748.714378]  [<ffffffff811bcd45>] ? handle_pte_fault+0xd15/0x1620
[203748.714634]  [<ffffffff8121d7ba>] do_filp_open+0x3a/0xb0
[203748.714888]  [<ffffffff8122a767>] ? __alloc_fd+0xa7/0x130
[203748.715143]  [<ffffffff81209ffc>] do_sys_open+0x12c/0x220
[203748.715403]  [<ffffffff81026ddb>] ? syscall_trace_enter_phase1+0x11b/0x180
[203748.715668]  [<ffffffff816f0c9f>] ? system_call_after_swapgs+0xe9/0x190
[203748.715928]  [<ffffffff8120a10e>] SyS_open+0x1e/0x20
[203748.716184]  [<ffffffff816f0d5e>] system_call_fastpath+0x18/0xd7
[203748.716440] Code: 00 00 48 8b 7b 08 48 83 c3 10 45 89 f8 44 89 e1 44 89 f2 4c 89 ee e8 07 06 11 e1 48 8b 03 48 85 c0 75 df 8b 5d c8 e9 4d fa ff ff <0f> 0b 48 8b 7d a0 e8 dc c6 06 00 48 b8 00 00 00 00 00 00 00 10
[203748.717505] RIP  [<ffffffffa05e9f09>] ocfs2_read_blocks+0x669/0x7f0 [ocfs2]
[203748.717775]  RSP <ffff88006ff4b818>

Joesph ever reported a similar panic.
Link: https://oss.oracle.com/pipermail/ocfs2-devel/2013-May/008931.html

Link: http://lkml.kernel.org/r/20180912063207.29484-1-junxiao.bi@oracle.com
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Changwei Ge <ge.changwei@h3c.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-20 22:01:12 +02:00
Roman Gushchin
172b06c32b mm: slowly shrink slabs with a relatively small number of objects
9092c71bb7 ("mm: use sc->priority for slab shrink targets") changed the
way that the target slab pressure is calculated and made it
priority-based:

    delta = freeable >> priority;
    delta *= 4;
    do_div(delta, shrinker->seeks);

The problem is that on a default priority (which is 12) no pressure is
applied at all, if the number of potentially reclaimable objects is less
than 4096 (1<<12).

This causes the last objects on slab caches of no longer used cgroups to
(almost) never get reclaimed.  It's obviously a waste of memory.

It can be especially painful, if these stale objects are holding a
reference to a dying cgroup.  Slab LRU lists are reparented on memcg
offlining, but corresponding objects are still holding a reference to the
dying cgroup.  If we don't scan these objects, the dying cgroup can't go
away.  Most likely, the parent cgroup hasn't any directly charged objects,
only remaining objects from dying children cgroups.  So it can easily hold
a reference to hundreds of dying cgroups.

If there are no big spikes in memory pressure, and new memory cgroups are
created and destroyed periodically, this causes the number of dying
cgroups grow steadily, causing a slow-ish and hard-to-detect memory
"leak".  It's not a real leak, as the memory can be eventually reclaimed,
but it could not happen in a real life at all.  I've seen hosts with a
steadily climbing number of dying cgroups, which doesn't show any signs of
a decline in months, despite the host is loaded with a production
workload.

It is an obvious waste of memory, and to prevent it, let's apply a minimal
pressure even on small shrinker lists.  E.g.  if there are freeable
objects, let's scan at least min(freeable, scan_batch) objects.

This fix significantly improves a chance of a dying cgroup to be
reclaimed, and together with some previous patches stops the steady growth
of the dying cgroups number on some of our hosts.

Link: http://lkml.kernel.org/r/20180905230759.12236-1-guro@fb.com
Fixes: 9092c71bb7 ("mm: use sc->priority for slab shrink targets")
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Rik van Riel <riel@surriel.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-20 22:01:11 +02:00
YueHaibing
3bf181bc5d kernel/sys.c: remove duplicated include
Link: http://lkml.kernel.org/r/20180821133424.18716-1-yuehaibing@huawei.com
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-20 22:01:11 +02:00
Joel Fernandes (Google)
b45d71fb89 mm: shmem.c: Correctly annotate new inodes for lockdep
Directories and inodes don't necessarily need to be in the same lockdep
class.  For ex, hugetlbfs splits them out too to prevent false positives
in lockdep.  Annotate correctly after new inode creation.  If its a
directory inode, it will be put into a different class.

This should fix a lockdep splat reported by syzbot:

> ======================================================
> WARNING: possible circular locking dependency detected
> 4.18.0-rc8-next-20180810+ #36 Not tainted
> ------------------------------------------------------
> syz-executor900/4483 is trying to acquire lock:
> 00000000d2bfc8fe (&sb->s_type->i_mutex_key#9){++++}, at: inode_lock
> include/linux/fs.h:765 [inline]
> 00000000d2bfc8fe (&sb->s_type->i_mutex_key#9){++++}, at:
> shmem_fallocate+0x18b/0x12e0 mm/shmem.c:2602
>
> but task is already holding lock:
> 0000000025208078 (ashmem_mutex){+.+.}, at: ashmem_shrink_scan+0xb4/0x630
> drivers/staging/android/ashmem.c:448
>
> which lock already depends on the new lock.
>
> -> #2 (ashmem_mutex){+.+.}:
>        __mutex_lock_common kernel/locking/mutex.c:925 [inline]
>        __mutex_lock+0x171/0x1700 kernel/locking/mutex.c:1073
>        mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1088
>        ashmem_mmap+0x55/0x520 drivers/staging/android/ashmem.c:361
>        call_mmap include/linux/fs.h:1844 [inline]
>        mmap_region+0xf27/0x1c50 mm/mmap.c:1762
>        do_mmap+0xa10/0x1220 mm/mmap.c:1535
>        do_mmap_pgoff include/linux/mm.h:2298 [inline]
>        vm_mmap_pgoff+0x213/0x2c0 mm/util.c:357
>        ksys_mmap_pgoff+0x4da/0x660 mm/mmap.c:1585
>        __do_sys_mmap arch/x86/kernel/sys_x86_64.c:100 [inline]
>        __se_sys_mmap arch/x86/kernel/sys_x86_64.c:91 [inline]
>        __x64_sys_mmap+0xe9/0x1b0 arch/x86/kernel/sys_x86_64.c:91
>        do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
>        entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> -> #1 (&mm->mmap_sem){++++}:
>        __might_fault+0x155/0x1e0 mm/memory.c:4568
>        _copy_to_user+0x30/0x110 lib/usercopy.c:25
>        copy_to_user include/linux/uaccess.h:155 [inline]
>        filldir+0x1ea/0x3a0 fs/readdir.c:196
>        dir_emit_dot include/linux/fs.h:3464 [inline]
>        dir_emit_dots include/linux/fs.h:3475 [inline]
>        dcache_readdir+0x13a/0x620 fs/libfs.c:193
>        iterate_dir+0x48b/0x5d0 fs/readdir.c:51
>        __do_sys_getdents fs/readdir.c:231 [inline]
>        __se_sys_getdents fs/readdir.c:212 [inline]
>        __x64_sys_getdents+0x29f/0x510 fs/readdir.c:212
>        do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
>        entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> -> #0 (&sb->s_type->i_mutex_key#9){++++}:
>        lock_acquire+0x1e4/0x540 kernel/locking/lockdep.c:3924
>        down_write+0x8f/0x130 kernel/locking/rwsem.c:70
>        inode_lock include/linux/fs.h:765 [inline]
>        shmem_fallocate+0x18b/0x12e0 mm/shmem.c:2602
>        ashmem_shrink_scan+0x236/0x630 drivers/staging/android/ashmem.c:455
>        ashmem_ioctl+0x3ae/0x13a0 drivers/staging/android/ashmem.c:797
>        vfs_ioctl fs/ioctl.c:46 [inline]
>        file_ioctl fs/ioctl.c:501 [inline]
>        do_vfs_ioctl+0x1de/0x1720 fs/ioctl.c:685
>        ksys_ioctl+0xa9/0xd0 fs/ioctl.c:702
>        __do_sys_ioctl fs/ioctl.c:709 [inline]
>        __se_sys_ioctl fs/ioctl.c:707 [inline]
>        __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:707
>        do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
>        entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> other info that might help us debug this:
>
> Chain exists of:
>   &sb->s_type->i_mutex_key#9 --> &mm->mmap_sem --> ashmem_mutex
>
>  Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(ashmem_mutex);
>                                lock(&mm->mmap_sem);
>                                lock(ashmem_mutex);
>   lock(&sb->s_type->i_mutex_key#9);
>
>  *** DEADLOCK ***
>
> 1 lock held by syz-executor900/4483:
>  #0: 0000000025208078 (ashmem_mutex){+.+.}, at:
> ashmem_shrink_scan+0xb4/0x630 drivers/staging/android/ashmem.c:448

Link: http://lkml.kernel.org/r/20180821231835.166639-1-joel@joelfernandes.org
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reported-by: syzbot <syzkaller@googlegroups.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Suggested-by: NeilBrown <neilb@suse.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-20 22:01:11 +02:00
Dominique Martinet
a1b3d2f217 fs/proc/kcore.c: fix invalid memory access in multi-page read optimization
The 'm' kcore_list item could point to kclist_head, and it is incorrect to
look at m->addr / m->size in this case.

There is no choice but to run through the list of entries for every
address if we did not find any entry in the previous iteration

Reset 'm' to NULL in that case at Omar Sandoval's suggestion.

[akpm@linux-foundation.org: add comment]
Link: http://lkml.kernel.org/r/1536100702-28706-1-git-send-email-asmadeus@codewreck.org
Fixes: bf991c2231 ("proc/kcore: optimize multiple page reads")
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Omar Sandoval <osandov@osandov.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: James Morse <james.morse@arm.com>
Cc: Bhupesh Sharma <bhsharma@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-20 22:01:11 +02:00
Pasha Tatashin
889c695d41 mm: disable deferred struct page for 32-bit arches
Deferred struct page init is needed only on systems with large amount of
physical memory to improve boot performance.  32-bit systems do not
benefit from this feature.

Jiri reported a problem where deferred struct pages do not work well with
x86-32:

[    0.035162] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
[    0.035725] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
[    0.036269] Initializing CPU#0
[    0.036513] Initializing HighMem for node 0 (00036ffe:0007ffe0)
[    0.038459] page:f6780000 is uninitialized and poisoned
[    0.038460] raw: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
[    0.039509] page dumped because: VM_BUG_ON_PAGE(1 && PageCompound(page))
[    0.040038] ------------[ cut here ]------------
[    0.040399] kernel BUG at include/linux/page-flags.h:293!
[    0.040823] invalid opcode: 0000 [#1] SMP PTI
[    0.041166] CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.0-rc1_pt_jiri #9
[    0.041694] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
[    0.042496] EIP: free_highmem_page+0x64/0x80
[    0.042839] Code: 13 46 d8 c1 e8 18 5d 83 e0 03 8d 04 c0 c1 e0 06 ff 80 ec 5f 44 d8 c3 8d b4 26 00 00 00 00 ba 08 65 28 d8 89 d8 e8 fc 71 02 00 <0f> 0b 8d 76 00 8d bc 27 00 00 00 00 ba d0 b1 26 d8 89 d8 e8 e4 71
[    0.044338] EAX: 0000003c EBX: f6780000 ECX: 00000000 EDX: d856cbe8
[    0.044868] ESI: 0007ffe0 EDI: d838df20 EBP: d838df00 ESP: d838defc
[    0.045372] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210086
[    0.045913] CR0: 80050033 CR2: 00000000 CR3: 18556000 CR4: 00040690
[    0.046413] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[    0.046913] DR6: fffe0ff0 DR7: 00000400
[    0.047220] Call Trace:
[    0.047419]  add_highpages_with_active_regions+0xbd/0x10d
[    0.047854]  set_highmem_pages_init+0x5b/0x71
[    0.048202]  mem_init+0x2b/0x1e8
[    0.048460]  start_kernel+0x1d2/0x425
[    0.048757]  i386_start_kernel+0x93/0x97
[    0.049073]  startup_32_smp+0x164/0x168
[    0.049379] Modules linked in:
[    0.049626] ---[ end trace 337949378db0abbb ]---

We free highmem pages before their struct pages are initialized:

mem_init()
 set_highmem_pages_init()
  add_highpages_with_active_regions()
   free_highmem_page()
    .. Access uninitialized struct page here..

Because there is no reason to have this feature on 32-bit systems, just
disable it.

Link: http://lkml.kernel.org/r/20180831150506.31246-1-pavel.tatashin@microsoft.com
Fixes: 2e3ca40f03 ("mm: relax deferred struct page requirements")
Signed-off-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
Reported-by: Jiri Slaby <jslaby@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-20 22:01:11 +02:00
KJ Tsanaktsidis
f83606f5eb fork: report pid exhaustion correctly
Make the clone and fork syscalls return EAGAIN when the limit on the
number of pids /proc/sys/kernel/pid_max is exceeded.

Currently, when the pid_max limit is exceeded, the kernel will return
ENOSPC from the fork and clone syscalls.  This is contrary to the
documented behaviour, which explicitly calls out the pid_max case as one
where EAGAIN should be returned.  It also leads to really confusing error
messages in userspace programs which will complain about a lack of disk
space when they fail to create processes/threads for this reason.

This error is being returned because alloc_pid() uses the idr api to find
a new pid; when there are none available, idr_alloc_cyclic() returns
-ENOSPC, and this is being propagated back to userspace.

This behaviour has been broken before, and was explicitly fixed in
commit 35f71bc0a0 ("fork: report pid reservation failure properly"),
so I think -EAGAIN is definitely the right thing to return in this case.
The current behaviour change dates from commit 95846ecf9d ("pid:
replace pid bitmap implementation with IDR AIP") and was I believe
unintentional.

This patch has no impact on the case where allocating a pid fails because
the child reaper for the namespace is dead; that case will still return
-ENOMEM.

Link: http://lkml.kernel.org/r/20180903111016.46461-1-ktsanaktsidis@zendesk.com
Fixes: 95846ecf9d ("pid: replace pid bitmap implementation with IDR AIP")
Signed-off-by: KJ Tsanaktsidis <ktsanaktsidis@zendesk.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Gargi Sharma <gs051095@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-20 22:01:11 +02:00
Richard Weinberger
f061c1cc40 Revert "ubifs: xattr: Don't operate on deleted inodes"
This reverts commit 11a6fc3dc7.
UBIFS wants to assert that xattr operations are only issued on files
with positive link count. The said patch made this operations return
-ENOENT for unlinked files such that the asserts will no longer trigger.
This was wrong since xattr operations are perfectly fine on unlinked
files.
Instead the assertions need to be fixed/removed.

Cc: <stable@vger.kernel.org>
Fixes: 11a6fc3dc7 ("ubifs: xattr: Don't operate on deleted inodes")
Reported-by: Koen Vandeputte <koen.vandeputte@ncentric.com>
Tested-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Richard Weinberger <richard@nod.at>
2018-09-20 21:37:41 +02:00
Sascha Hauer
d3bdc016c5 ubifs: drop false positive assertion
The following sequence triggers

	ubifs_assert(c, c->lst.taken_empty_lebs > 0);

at the end of ubifs_remount_fs():

mount -t ubifs /dev/ubi0_0 /mnt
echo 1 > /sys/kernel/debug/ubifs/ubi0_0/ro_error
umount /mnt
mount -t ubifs -o ro /dev/ubix_y /mnt
mount -o remount,ro /mnt

The resulting

UBIFS assert failed in ubifs_remount_fs at 1878 (pid 161)

is a false positive. In the case above c->lst.taken_empty_lebs has
never been changed from its initial zero value. This will only happen
when the deferred recovery is done.

Fix this by doing the assertion only when recovery has been done
already.

Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
2018-09-20 21:37:07 +02:00
Richard Weinberger
37f31b6ca4 ubifs: Check for name being NULL while mounting
The requested device name can be NULL or an empty string.
Check for that and refuse to continue. UBIFS has to do this manually
since we cannot use mount_bdev(), which checks for this condition.

Fixes: 1e51764a3c ("UBIFS: add new flash file system")
Reported-by: syzbot+38bd0f7865e5c6379280@syzkaller.appspotmail.com
Signed-off-by: Richard Weinberger <richard@nod.at>
2018-09-20 21:37:07 +02:00
Liran Alon
26b471c7e2 KVM: nVMX: Fix bad cleanup on error of get/set nested state IOCTLs
The handlers of IOCTLs in kvm_arch_vcpu_ioctl() are expected to set
their return value in "r" local var and break out of switch block
when they encounter some error.
This is because vcpu_load() is called before the switch block which
have a proper cleanup of vcpu_put() afterwards.

However, KVM_{GET,SET}_NESTED_STATE IOCTLs handlers just return
immediately on error without performing above mentioned cleanup.

Thus, change these handlers to behave as expected.

Fixes: 8fcc4b5923 ("kvm: nVMX: Introduce KVM_CAP_NESTED_STATE")

Reviewed-by: Mark Kanda <mark.kanda@oracle.com>
Reviewed-by: Patrick Colp <patrick.colp@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20 18:54:08 +02:00
Yong Zhao
44d8cc6f1a drm/amdkfd: Fix ATS capablity was not reported correctly on some APUs
Because CRAT_CU_FLAGS_IOMMU_PRESENT was not set in some BIOS crat, we
need to workaround this.

For future compatibility, we also overwrite the bit in capability according
to the value of needs_iommu_device.

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-20 10:25:23 -05:00
Yong Zhao
15426dbb65 drm/amdkfd: Change the control stack MTYPE from UC to NC on GFX9
CWSR fails on Raven if the control stack is MTYPE_UC, which is used
for regular GART mappings. As a workaround we map it using MTYPE_NC.

The MEC firmware expects the control stack at one page offset from the
start of the MQD so it is part of the MQD allocation on GFXv9. AMDGPU
added a memory allocation flag just for this purpose.

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-20 10:25:17 -05:00
Amber Lin
caaa4c8a6b drm/amdgpu: Fix SDMA HQD destroy error on gfx_v7
A wrong register bit was examinated for checking SDMA status so it reports
false failures. This typo only appears on gfx_v7. gfx_v8 checks the correct
bit.

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-20 10:25:01 -05:00
Jens Axboe
d611aaf336 Merge branch 'nvme-4.19' of git://git.infradead.org/nvme into for-linus
Pull NVMe fix from Christoph.

* 'nvme-4.19' of git://git.infradead.org/nvme:
  nvme: count all ANA groups for ANA Log page
2018-09-20 09:10:38 -06:00
Andy Whitcroft
65eea8edc3 floppy: Do not copy a kernel pointer to user memory in FDGETPRM ioctl
The final field of a floppy_struct is the field "name", which is a pointer
to a string in kernel memory.  The kernel pointer should not be copied to
user memory.  The FDGETPRM ioctl copies a floppy_struct to user memory,
including this "name" field.  This pointer cannot be used by the user
and it will leak a kernel address to user-space, which will reveal the
location of kernel code and data and undermine KASLR protection.

Model this code after the compat ioctl which copies the returned data
to a previously cleared temporary structure on the stack (excluding the
name pointer) and copy out to userspace from there.  As we already have
an inparam union with an appropriate member and that memory is already
cleared even for read only calls make use of that as a temporary store.

Based on an initial patch by Brian Belleville.

CVE-2018-7755
Signed-off-by: Andy Whitcroft <apw@canonical.com>

Broke up long line.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-20 09:09:48 -06:00
Jens Axboe
7ce5c8cd75 libata: mask swap internal and hardware tag
hen we're comparing the hardware completion mask passed in from the
driver with the internal tag pending mask, we need to account for the
fact that the internal tag is different from the hardware tag. If not,
then we can end up either prematurely completing the internal tag (since
it's not set in the hw mask), or simply flag an error:

ata2: illegal qc_active transition (100000000->00000001)

If the internal tag is set, then swap that with the hardware tag in this
case before comparing with what the hardware reports.

Fixes: 28361c4036 ("libata: add extra internal command")
Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=201151
Cc: stable@vger.kernel.org
Reported-by: Paul Sbarra <sbarra.paul@gmail.com>
Tested-by: Paul Sbarra <sbarra.paul@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-20 08:30:55 -06:00
Miguel Ojeda
ae596de1a0 Compiler Attributes: naked can be shared
The naked attribute is supported by at least gcc >= 4.6 (for ARM,
which is the only current user), gcc >= 8 (for x86), clang >= 3.1
and icc >= 13. See https://godbolt.org/z/350Dyc

Therefore, move it out of compiler-gcc.h so that the definition
is shared by all compilers.

This also fixes Clang support for ARM32 --- 815f0ddb34
("include/linux/compiler*.h: make compiler-*.h mutually exclusive").

Fixes: 815f0ddb34 ("include/linux/compiler*.h: make compiler-*.h mutually exclusive")
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Eli Friedman <efriedma@codeaurora.org>
Cc: Christopher Li <sparse@chrisli.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Joe Perches <joe@perches.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-sparse@vger.kernel.org
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Stefan Agner <stefan@agner.ch>
Reviewed-by: Stefan Agner <stefan@agner.ch>
Reviewed-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-20 15:23:58 +02:00
Miguel Ojeda
d124b44f09 Compiler Attributes: naked was fixed in gcc 4.6
Commit 9c695203a7 ("compiler-gcc.h: gcc-4.5 needs noclone
and noinline on __naked functions") added noinline and noclone
as a workaround for a gcc 4.5 bug, which was resolved in 4.6.0.

Since now the minimum gcc supported version is 4.6,
we can clean it up.

See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44290
and https://godbolt.org/z/h6NMIL

Fixes: 815f0ddb34 ("include/linux/compiler*.h: make compiler-*.h mutually exclusive")
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Eli Friedman <efriedma@codeaurora.org>
Cc: Christopher Li <sparse@chrisli.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Joe Perches <joe@perches.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-sparse@vger.kernel.org
Tested-by: Stefan Agner <stefan@agner.ch>
Reviewed-by: Stefan Agner <stefan@agner.ch>
Reviewed-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-20 15:23:58 +02:00
Greg Kroah-Hartman
4b92e7fd76 Merge tag 'mtd/fixes-for-4.19-rc5' of git://git.infradead.org/linux-mtd
Boris writes:
  "- Fixes a bug in the ->read/write_reg() implementation of the m25p80
     driver
   - Make sure of_node_get/put() calls are balanced in the partition
     parsing code
   - Fix a race in the denali NAND controller driver
   - Fix false positive WARN_ON() in the marvell NAND controller driver"

* tag 'mtd/fixes-for-4.19-rc5' of git://git.infradead.org/linux-mtd:
  mtd: devices: m25p80: Make sure the buffer passed in op is DMA-able
  mtd: partitions: fix unbalanced of_node_get/put()
  mtd: rawnand: denali: fix a race condition when DMA is kicked
  mtd: rawnand: marvell: prevent harmless warnings
2018-09-20 11:25:20 +02:00
Greg Kroah-Hartman
d82920849f Merge tag 'sound-4.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Takashi writes:
  "sound fixes for 4.19-rc5

   here comes a collection of various fixes, mostly for stable-tree
   or regression fixes.

   Two relatively high LOCs are about the (rather simple) conversion of
   uapi integer types in topology API, and a regression fix about HDMI
   hotplug notification on AMD HD-audio.  The rest are all small
   individual fixes like ASoC Intel Skylake race condition, minor
   uninitialized page leak in emu10k1 ioctl, Firewire audio error paths,
   and so on."

* tag 'sound-4.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (33 commits)
  ALSA: fireworks: fix memory leak of response buffer at error path
  ALSA: oxfw: fix memory leak of discovered stream formats at error path
  ALSA: oxfw: fix memory leak for model-dependent data at error path
  ALSA: bebob: fix memory leak for M-Audio FW1814 and ProjectMix I/O at error path
  ALSA: hda - Enable runtime PM only for discrete GPU
  ALSA: oxfw: fix memory leak of private data
  ALSA: firewire-tascam: fix memory leak of private data
  ALSA: firewire-digi00x: fix memory leak of private data
  sound: don't call skl_init_chip() to reset intel skl soc
  sound: enable interrupt after dma buffer initialization
  Revert "ASoC: Intel: Skylake: Acquire irq after RIRB allocation"
  ALSA: emu10k1: fix possible info leak to userspace on SNDRV_EMU10K1_IOCTL_INFO
  ASoC: cs4265: fix MMTLR Data switch control
  ASoC: AMD: Ensure reset bit is cleared before configuring
  ALSA: fireface: fix memory leak in ff400_switch_fetching_mode()
  ALSA: bebob: use address returned by kmalloc() instead of kernel stack for streaming DMA mapping
  ASoC: rsnd: don't fallback to PIO mode when -EPROBE_DEFER
  ASoC: rsnd: adg: care clock-frequency size
  ASoC: uniphier: change status to orphan
  ASoC: rsnd: fixup not to call clk_get/set under non-atomic
  ...
2018-09-20 09:50:49 +02:00
Thomas Hellstrom
e71cf59187 drm/vmwgfx: Fix buffer object eviction
Commit 19be557010 ("drm/ttm: add operation ctx to ttm_bo_validate v2")
introduced a regression where the vmwgfx driver refused to evict a
buffer that was still busy instead of waiting for it to become idle.

Fix this.

Cc: <stable@vger.kernel.org>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2018-09-20 08:05:14 +02:00
Deepak Rawat
a4bd815a94 drm/vmwgfx: Don't impose STDU limits on framebuffer size
If framebuffers are larger, we create bounce surfaces that are within
STDU limits.

Signed-off-by: Deepak Rawat <drawat@vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
2018-09-20 08:00:03 +02:00
Deepak Rawat
140b4e67c2 drm/vmwgfx: limit mode size for all display unit to texture_max
For all display units, limit mode size exposed to texture_max_width/
height as this is the maximum framebuffer size that virtual device can
create.

Signed-off-by: Deepak Rawat <drawat@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
2018-09-20 08:00:03 +02:00
Deepak Rawat
0c1b174b1b drm/vmwgfx: limit screen size to stdu_max during check_modeset
For STDU individual screen target size is limited by
SVGA_REG_SCREENTARGET_MAX_WIDTH/HEIGHT registers so add that limit
during atomic check_modeset.

An additional limit is placed in the update_layout ioctl to avoid
requesting layouts that current user-space typically can't support.
Also modified the comments to reflect current limitation on topology.

Signed-off-by: Deepak Rawat <drawat@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
2018-09-20 08:00:03 +02:00
Deepak Rawat
bfc8882614 drm/vmwgfx: don't check for old_crtc_state enable status
During atomic check to prepare the new topology no need to check if
old_crtc_state was enabled or not. This will cause atomic_check to fail
because due to connector routing a crtc can be in atomic_state even if
there was no change to enable status.

Detected this issue with igt run.

Signed-off-by: Deepak Rawat <drawat@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
2018-09-20 08:00:02 +02:00
Alex Deucher
30f3984ede drm/amdgpu: add new polaris pci id
Add new pci id.

Reviewed-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2018-09-19 22:35:23 -05:00
Dave Airlie
8ca4fff974 Merge tag 'drm-intel-fixes-2018-09-19' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
Only fixes coming from gvt containing "Two more BXT fixes from Colin,
one srcu locking fix and one fix for GGTT clear when destroy vGPU."

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180919151915.GA6309@intel.com
2018-09-20 10:01:53 +10:00
Dave Airlie
d5b3a31b1c Merge tag 'drm-misc-fixes-2018-09-19' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
drm-misc-fixes for v4.19-rc5:
- Fix crash in vgem in drm_drv_uses_atomic_modeset.
- Allow atomic drivers that don't set DRIVER_ATOMIC to create debugfs entries.
- Fix compiler warning for unused connector_funcs.
- Fix null pointer deref on UDL unplug.
- Disable DRM support for sun4i's R40 for now.
  (Not all patches went in for v4.19, so it has to wait a cycle.)
- NULL-terminate the of_device_id table in pl111.
- Make sure vc4 NV12 planar format works when displaying an unscaled fb.

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/dda393bb-f13f-8d36-711b-cacfc578e5a3@linux.intel.com
2018-09-20 10:00:46 +10:00
Drew Schmitt
8b56ee91ff kvm: selftests: Add platform_info_test
Test guest access to MSR_PLATFORM_INFO when the capability is enabled
or disabled.

Signed-off-by: Drew Schmitt <dasch@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20 00:51:47 +02:00